From: Tom Lane on
I was thinking a bit about how we pad columns of type NAME to
fixed-width, even though they're semantically equivalent to C strings.
The reason for wasting that space is that it makes it possible to
overlay a C struct onto the leading columns of most system catalogs.
I don't wish to propose changing that (at least not today), but it
struck me that there is no reason to overlay a C struct onto index
entries, and that getting rid of the padding space would be even more
useful in an index than in the catalog itself. It turns out to be
dead easy to implement this: effectively, we just decree that the
index column storage type for NAME is always CSTRING. Because the
two types are effectively binary-compatible as long as you don't
look at the padding, the attached ugly-but-impressively-short patch
seems to accomplish this. It passes the regression tests anyway.
Here are some numbers about the space savings in a virgin database:

                                                    CVS HEAD   w/patch  savings

pg_database_size('postgres')                         4439752   4071112  8.3%
pg_relation_size('pg_class_relname_nsp_index')         57344     40960  28%
pg_relation_size('pg_proc_proname_args_nsp_index')    319488    204800  35%

Cutting a third off the size of a system index has got to be worth
something, but is it worth a hack as ugly as this one?

regards, tom lane


From: Tom Lane on
Mark Mielke <mark(a)mark.mielke.cc> writes:
> Tom Lane wrote:
>> Cutting a third off the size of a system index has got to be worth
>> something, but is it worth a hack as ugly as this one?

> Were you able to time any speedup?

I didn't try; can you suggest any suitable benchmark?

The performance impact is probably going to be limited by our extensive
use of catalog caches --- once a desired row is in a backend's catcache,
it doesn't take a btree search to fetch it again. Still, the system
indexes are probably "hot" enough to stay in shared buffers most of the
time, and the smaller they are the more space will be left for other
stuff, so I think there should be a distributed benefit.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Simon Riggs <simon(a)2ndquadrant.com> writes:
> On Mon, 2008-06-23 at 15:52 -0400, Tom Lane wrote:
>> Cutting a third off the size of a system index has got to be worth
>> something, but is it worth a hack as ugly as this one?

> Not doing it would be more ugly, unless there is some negative
> side-effect?

I thought some more about why this seems ugly to me, and realized that a
lot of it has to do with the change in typalign. Currently, a compiler
is entitled to assume that a pointer to Name is 4-byte aligned; thus
for instance it could generate word-wide instructions for copying a Name
from one place to another. A "Name" that is stored as just CSTRING
might break that. We are already at risk of this, really, because of
all the places where we gaily pass plain old C strings to syscache and
index searches on Name columns. I think the only reason we've not been
burnt is that it's hard to optimize strcmp() into word-wide operations.

However the solution to that seems fairly obvious: let's downgrade Name
to typalign 1 instead of 4.

regards, tom lane

From: "Heikki Linnakangas" on
Shane Ambler wrote:
> My question is whether this is limited to system catalogs? or will this
> benefit char() index used on any table? The second would make it more
> worthwhile.

char(n) fields are already stored as variable-length on disk. This isn't
applicable to them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From: Tom Lane on
Teodor Sigaev <teodor(a)sigaev.ru> writes:
>> dead easy to implement this: effectively, we just decree that the
>> index column storage type for NAME is always CSTRING. Because the

> Isn't it a reason to add STORAGE option of CREATE OPERATOR CLASS to BTree? as
> it's done for GiST and GIN indexes.

Hmm ... I don't see a point in exposing that as a user-level facility,
unless you can point to other use-cases besides NAME. But it would be
cute to implement the hack by changing the initial contents of
pg_opclass instead of inserting code in the backend. I'll give that
a try.

regards, tom lane
