From: Bruce Momjian on
Dimitri Fontaine wrote:
> Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> > Well, what we *really* need is a convincing argument that it's worth
> > taking some risk for. I find that not obvious. You can pipe the output
> > of pg_dump into your-choice-of-compressor, for example, and that gets
> > you the ability to spread the work across multiple CPUs in addition to
> > eliminating legal risk to the PG project.
>
> Well, I like -Fc and playing with the catalog to restore only the
> "interesting" data into staging environments. I even automated all the
> catalog mangling in pg_staging so that I just have to set up which
> schema I want, with only the DDL or with the DATA too.
>
> The fun part is when you want to exclude functions that are used in
> triggers based on the schema where the function lives rather than the
> trigger's schema, BTW, but that's another story.
>
> So yes, having both -Fc and a compression facility other than plain
> gzip would be good news. And benefiting from better compression in
> TOAST would be good too, I guess (a small size hit, lots faster; it
> would fit).
>
> Summary: my convincing argument is using the dumps to efficiently
> prepare development and testing environments from production data,
> thanks to -Fc. That includes skipping data on restore.

I assume people realize that if they are using pg_dump -Fc and then
compressing the output again later, they should turn off compression in
pg_dump. Or is that something we should document/suggest?
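For concreteness, the pattern would be something along these lines (just a
sketch, not a recommendation: the database name, output file, and thread
count are placeholders, and it assumes a parallel compressor such as pigz
is installed):

#!/usr/bin/env python3
# Sketch: pg_dump custom format with internal compression disabled (-Z 0),
# piped through an external multi-threaded compressor.
import subprocess

DBNAME = "production"            # placeholder
OUTFILE = "production.dump.gz"   # placeholder
JOBS = "4"                       # compressor threads (placeholder)

with open(OUTFILE, "wb") as out:
    # pg_dump writes an uncompressed custom-format archive to stdout
    dump = subprocess.Popen(
        ["pg_dump", "-Fc", "-Z", "0", DBNAME],
        stdout=subprocess.PIPE,
    )
    # pigz (parallel gzip) spreads the compression across several CPUs
    compress = subprocess.Popen(
        ["pigz", "-p", JOBS],
        stdin=dump.stdout,
        stdout=out,
    )
    dump.stdout.close()          # so pigz sees EOF when pg_dump exits
    if compress.wait() != 0 or dump.wait() != 0:
        raise SystemExit("dump or external compression failed")

The trade-off, of course, is that such an archive has to be run back
through the decompressor before pg_restore can use it, whereas -Fc's
built-in compression keeps the file directly restorable.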

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: daveg on
On Tue, Apr 13, 2010 at 03:03:58PM -0400, Tom Lane wrote:
> Joachim Wieland <joe(a)mcknight.de> writes:
> > If we still cannot do this, then what I am asking is: What does the
> > project need to be able to at least link against such a compression
> > algorithm?
>
> Well, what we *really* need is a convincing argument that it's worth
> taking some risk for. I find that not obvious. You can pipe the output
> of pg_dump into your-choice-of-compressor, for example, and that gets
> you the ability to spread the work across multiple CPUs in addition to
> eliminating legal risk to the PG project. And in any case the general
> impression seems to be that the main dump-speed bottleneck is on the
> backend side, not in pg_dump's compression.

My client uses pg_dump -Fc to produce about 700GB of compressed PostgreSQL
dumps nightly from multiple hosts. They also depend on being able to read
and filter the dump catalog. A faster compression algorithm would be a huge
benefit for dealing with this volume.
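For reference, the catalog filtering in question is the pg_restore -l / -L
listing workflow; automating it can look roughly like this (a sketch only:
the schema name and target database are placeholders, and the TOC-line
matching is simplified):

#!/usr/bin/env python3
# Sketch: restore a custom-format dump while skipping the table data of
# one schema, by editing the archive's table of contents.
import subprocess

DUMPFILE = "production.dump"   # placeholder
SKIP_SCHEMA = "audit"          # placeholder: schema whose data we skip
TARGET_DB = "staging"          # placeholder

# pg_restore -l prints the archive's table of contents, one entry per line,
# e.g. "2398; 0 16396 TABLE DATA audit log_entries owner" (simplified).
toc = subprocess.run(
    ["pg_restore", "-l", DUMPFILE],
    check=True, capture_output=True, text=True,
).stdout.splitlines()

# A leading ';' comments an entry out, so pg_restore -L will skip it.
filtered = []
for line in toc:
    if "TABLE DATA" in line and f" {SKIP_SCHEMA} " in line:
        filtered.append(";" + line)
    else:
        filtered.append(line)

with open("restore.list", "w") as f:
    f.write("\n".join(filtered) + "\n")

# Restore only the entries still enabled in the edited list.
subprocess.run(
    ["pg_restore", "-L", "restore.list", "-d", TARGET_DB, DUMPFILE],
    check=True,
)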

-dg

--
David Gould daveg(a)sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.
