From: Robert Haas on
On Sat, Jul 31, 2010 at 12:40 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Another misbehavior that I noted while playing with Artur's example is
> that, while GIN index build seems to adhere pretty well to whatever
> maintenance_work_mem limit you give it in 8.4, it blows right by that
> limit in 9.0 and HEAD --- I observed it happily eating something north
> of 128MB with a maintenance_work_mem of 70MB. �It looks to me like the
> reason is the new "rbtree.c" code, which adds a whole lot of data
> structure that's not being counted against the maintenance_work_mem
> limit.
>
> Now the first question that might be asked is what we'd need to do to
> rbtree.c's API to allow its memory consumption to be tracked. �However,
> I think there's a considerably more pressing question, which is what
> is it that rbtree.c is doing that justifies a 2X bloat factor in GIN
> index build --- which is pretty much what it's costing us, given the
> observation above. �We know that the amount of memory available for
> the build has an enormous effect on build time. �If we just do the
> obvious thing of including the rbtree data structure in the
> maintenance_work_mem calculation, what we're going to get is a very
> substantial slowdown in build speed for the same maintenance_work_mem
> setting, compared to the way 8.4 worked.
>
> So, I would like somebody to show cause why that whole module shouldn't
> be ripped out and the code reverted to where it was in 8.4. �My
> recollection is that the argument for adding it was to speed things up
> in corner cases, but what I think it's actually going to do is slow
> things down in every case.

I've always been a bit suspicious of this code, too, even though I
didn't think about the memory consumption issue. But see here:

http://archives.postgresql.org/pgsql-hackers/2010-02/msg00307.php

I think there may be a better way to avoid the pathological cases, but
I'm not sure what it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sat, Jul 31, 2010 at 12:40 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> So, I would like somebody to show cause why that whole module shouldn't
>> be ripped out and the code reverted to where it was in 8.4. �My
>> recollection is that the argument for adding it was to speed things up
>> in corner cases, but what I think it's actually going to do is slow
>> things down in every case.

> I've always been a bit suspicious of this code, too, even though I
> didn't think about the memory consumption issue. But see here:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00307.php

I did a bit of experimentation and confirmed my fears: HEAD is willing
to eat about double the specified maintenance_work_mem. If you cut
back the setting so that its actual memory use is no more than 8.4's,
it's about 33% slower on non-pathological data (I'm testing the dataset
from Artur Dabrowski here). The referenced tests showing significant
speedup over 8.4 were presumably all done without controlling for memory
usage. I'm not sure how much those results need to be discounted, but
they can't be taken at face value.

> I think there may be a better way to avoid the pathological cases, but
> I'm not sure what it is.

After a bit of further study I think maybe something could be done
towards making rbtree.c less memory-hungry: it's basically been coded
with zero attention to memory efficiency. The RBNode structs are
individually palloc'd, which means that on a 64-bit machine they take
about 80 bytes apiece. The EntryAccumulator structs they are frontends
for are only 32 bytes apiece (plus the probably-pass-by-reference Datum
value, which we can't easily do anything with), and they are allocated in
groups to avoid palloc overhead. EntryAccumulator did get 16 bytes
smaller since 8.4 as a result of removing the left/right pointers that
the rbtree structure is substituting for, but that doesn't make up
for this.

I'm tempted to suggest that making RBNode be a hidden struct containing
a pointer to somebody else's datum is fundamentally the wrong way to
go about things, because the extra void pointer is pure overhead,
and we aren't ever going to be using these things in a context where
memory usage isn't of concern. If we refactored the API so that RBNode
was intended to be the first field of some larger struct, as is done in
dynahash tables for instance, we could eliminate the void pointer and
the palloc inefficiency. The added storage compared to what 8.4 used
would be a parent link and the iteratorState/color fields, which would
end up costing us 16 more bytes per EntryAccumulator rather than 64.
Still not great but at least it's not a 2X penalty, and the memory
allocation would become the caller's problem not rbtree's, so the
problem of tracking usage would be no different from before.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sat, Jul 31, 2010 at 12:02 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Sat, Jul 31, 2010 at 12:40 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> So, I would like somebody to show cause why that whole module shouldn't
>>> be ripped out and the code reverted to where it was in 8.4. �My
>>> recollection is that the argument for adding it was to speed things up
>>> in corner cases, but what I think it's actually going to do is slow
>>> things down in every case.
>
>> I've always been a bit suspicious of this code, too, even though I
>> didn't think about the memory consumption issue. �But see here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00307.php
>
> I did a bit of experimentation and confirmed my fears: HEAD is willing
> to eat about double the specified maintenance_work_mem. �If you cut
> back the setting so that its actual memory use is no more than 8.4's,
> it's about 33% slower on non-pathological data (I'm testing the dataset
> from Artur Dabrowski here).

That seems like a pretty serious regression.

> I'm tempted to suggest that making RBNode be a hidden struct containing
> a pointer to somebody else's datum is fundamentally the wrong way to
> go about things, because the extra void pointer is pure overhead,
> and we aren't ever going to be using these things in a context where
> memory usage isn't of concern. �If we refactored the API so that RBNode
> was intended to be the first field of some larger struct, as is done in
> dynahash tables for instance, we could eliminate the void pointer and
> the palloc inefficiency. �The added storage compared to what 8.4 used
> would be a parent link and the iteratorState/color fields, which would
> end up costing us 16 more bytes per EntryAccumulator rather than 64.
> Still not great but at least it's not a 2X penalty, and the memory
> allocation would become the caller's problem not rbtree's, so the
> problem of tracking usage would be no different from before.

Even if we do that, is it still going to be too much of a performance
regression overall?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sat, Jul 31, 2010 at 12:02 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I'm tempted to suggest that making RBNode be a hidden struct containing
>> a pointer to somebody else's datum is fundamentally the wrong way to
>> go about things, because the extra void pointer is pure overhead,
>> and we aren't ever going to be using these things in a context where
>> memory usage isn't of concern. �If we refactored the API so that RBNode
>> was intended to be the first field of some larger struct, as is done in
>> dynahash tables for instance, we could eliminate the void pointer and
>> the palloc inefficiency.

> Even if we do that, is it still going to be too much of a performance
> regression overall?

Looking back, EntryAccumulator was 56 bytes on 64-bit machines in 8.4
(it should have been smaller, but a poor choice of field ordering left
a lot of pad space). Right now it's 32 bytes, and if we stick an RBNode
field in the front it'd be 64. So that'd be a 14% penalty compared to
8.4, as opposed to the present situation which is a 100% penalty (32+80
bytes per entry). On 32-bit machines the numbers are 32 bytes (8.4)
versus 20+40 (HEAD) versus 36 bytes (my proposal), so 12.5% penalty
versus 87.5%. (All of these numbers should be discounted by whatever
space you want to assume the pass-by-reference key datum takes.)

So it'd definitely be a lot better than now. Maybe there'd be some
remaining performance gap for non-pathological cases, but I think we
would be willing to pay that in order to avoid bad behavior for the
pathological cases. It's difficult to say for sure of course
without going to the trouble of coding and testing it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sat, Jul 31, 2010 at 12:32 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Sat, Jul 31, 2010 at 12:02 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> I'm tempted to suggest that making RBNode be a hidden struct containing
>>> a pointer to somebody else's datum is fundamentally the wrong way to
>>> go about things, because the extra void pointer is pure overhead,
>>> and we aren't ever going to be using these things in a context where
>>> memory usage isn't of concern. �If we refactored the API so that RBNode
>>> was intended to be the first field of some larger struct, as is done in
>>> dynahash tables for instance, we could eliminate the void pointer and
>>> the palloc inefficiency.
>
>> Even if we do that, is it still going to be too much of a performance
>> regression overall?
>
> Looking back, EntryAccumulator was 56 bytes on 64-bit machines in 8.4
> (it should have been smaller, but a poor choice of field ordering left
> a lot of pad space). �Right now it's 32 bytes, and if we stick an RBNode
> field in the front it'd be 64. �So that'd be a 14% penalty compared to
> 8.4, as opposed to the present situation which is a 100% penalty (32+80
> bytes per entry). �On 32-bit machines the numbers are 32 bytes (8.4)
> versus 20+40 (HEAD) versus 36 bytes (my proposal), so 12.5% penalty
> versus 87.5%. �(All of these numbers should be discounted by whatever
> space you want to assume the pass-by-reference key datum takes.)
>
> So it'd definitely be a lot better than now. �Maybe there'd be some
> remaining performance gap for non-pathological cases, but I think we
> would be willing to pay that in order to avoid bad behavior for the
> pathological cases. �It's difficult to say for sure of course
> without going to the trouble of coding and testing it.

Well, it sounds like a reasonable thing to try, then. You going to
take a crack at it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers