|
Prev: Auto-explain patch
Next: Fragments in tsearch2 headline
From: Gregory Stark on 30 Jun 2008 14:56 "Tom Lane" <tgl(a)sss.pgh.pa.us> writes: > After playing with it for a little bit, I'm not convinced that it buys > enough performance win to be worth applying --- the restriction of cache > lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive. > (For example, sorts that involve toasted sort keys continue to suck, > because the tuples being sorted aren't in Slots.) It would probably > fix the specific case that the PostGIS hackers were complaining of, > but I think we need something more. > > Still, I wanted to get it into the archives because the idea of indirect > toast pointers might be useful for something else. I do like that it handles even inline-compressed cases. What I didn't like about the managed cache was that it couldn't handle such cases. I could easily imagine the PostGIS case arising for inline compressed data structures. I wonder if it isn't worthwhile just for that case even if there's a further cache behind it for repeated fetches of out-of-line data. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Mark Cave-Ayland on 1 Jul 2008 07:15 Tom Lane wrote: > Attached is a worked-out patch for the approach proposed here: > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php > namely, that cache management for de-TOASTed datums is handled > by TupleTableSlots. > > To avoid premature detoasting of values that we might never need, the > patch introduces a concept of an "indirect TOAST pointer", which has > the same 0x80 or 0x01 header as an external TOAST pointer, but can > be told apart by having a different length byte. Within that we have > * pointer to original toasted field within the Slot's tuple > * pointer to the owning Slot > * pointer to decompressed copy, or NULL if not decompressed yet > Some fairly straightforward extensions to the TupleTableSlot code, > heaptuple.c, and tuptoaster.c make it all go. > > My original thoughts had included turning FREE_IF_COPY() into a no-op, > but on investigation that seems impractical. One case that still > depends on that pfree is where we have palloc'd a 4-byte-header copy > of a short-header datum to support code that needs properly aligned > datum content. The solution adopted in the patch is to arrange for > pfree() applied to a cacheable detoasted object to be a no-op, whereas > it still works normally for non-cached detoasted objects. We do this > by inserting a dummy chunk header that points to a dummy memory context > whose pfree support method does nothing. I think this part of the patch > would be required for any toast caching method, not just this one. > > What I like about this patch is that it's a fairly small-footprint > change, it doesn't add much overhead, and it covers caching of > decompression for in-line-compressed datums as well as the out-of-line > case. > > One thing I really *don't* like about it is that it requires everyplace > that copies Datums to know about indirect pointers: in general, the copy > must be a copy of the original toasted Datum, not of the indirect > pointer, else we have indirect pointers that can outlive their owning > TupleTableSlot (or at least outlive its current tuple cycle). There > only seem to be about half a dozen such places in the current code, > but still it seems a rather fragile coding rule. > > After playing with it for a little bit, I'm not convinced that it buys > enough performance win to be worth applying --- the restriction of cache > lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive. > (For example, sorts that involve toasted sort keys continue to suck, > because the tuples being sorted aren't in Slots.) It would probably > fix the specific case that the PostGIS hackers were complaining of, > but I think we need something more. > > Still, I wanted to get it into the archives because the idea of indirect > toast pointers might be useful for something else. Hi Tom, Thanks very much for supplying the WIP patch. In the interest of testing and feedback, I've just downloaded PostgreSQL CVS head and applied your patch, compiled up a slightly modified version of PostGIS (without RECHECKs on the GiST opclass) and loaded in the same dataset. Unfortunately I have to report back that with your WIP patch applied, timings seem to have become several orders of magnitude *worse*: pg84(a)zeno:~$ psql -d postgis psql (8.4devel) Type "help" for help. postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1) InitPlan -> Seq Scan on geography (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1) Filter: (id = 69495::numeric) -> Index Scan using geography_centroid_idx on geography (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1) Index Cond: (centroid && $0) Total runtime: 18239.918 ms (7 rows) In fact, even sequential scans seem to have gone up by several orders of magnitude: postgis=# set enable_indexscan = 'f'; SET postgis=# set enable_bitmapscan = 'f'; SET postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Aggregate (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1) InitPlan -> Seq Scan on geography (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1) Filter: (id = 69495::numeric) -> Seq Scan on geography (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1) Filter: (centroid && $0) Total runtime: 9117.174 ms (7 rows) ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: "Guillaume Smet" on 1 Jul 2008 08:39 Mark, On Tue, Jul 1, 2008 at 1:15 PM, Mark Cave-Ayland <mark.cave-ayland(a)siriusit.co.uk> wrote: > Thanks very much for supplying the WIP patch. In the interest of testing and > feedback, I've just downloaded PostgreSQL CVS head and applied your patch, > compiled up a slightly modified version of PostGIS (without RECHECKs on the > GiST opclass) and loaded in the same dataset. From the discussion we had a few months ago, I don't think the no RECHECK trick works with CVS tip anymore. See my post on the "Remove lossy-operator RECHECK flag?" thread: http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php and follow-ups. That said, perhaps it's not the only problem you have but I thought it was worth mentioning. -- Guillaume -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Mark Cave-Ayland on 1 Jul 2008 09:11 Guillaume Smet wrote: > From the discussion we had a few months ago, I don't think the no > RECHECK trick works with CVS tip anymore. > > See my post on the "Remove lossy-operator RECHECK flag?" thread: > http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php > and follow-ups. > > That said, perhaps it's not the only problem you have but I thought it > was worth mentioning. Yeah sorry, that might not have been as clear as I hoped. What I meant was that I removed the explicit RECHECK clause from the GiST operator class definition - since as you correctly mention, CVS tip throws an error if this is present. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 1 Jul 2008 09:53
Mark Cave-Ayland <mark.cave-ayland(a)siriusit.co.uk> writes: > Unfortunately I have to report back that with your WIP patch applied, > timings seem to have become several orders of magnitude *worse*: Ugh. Could I pester you to run the case under gprof or oprofile? Or, if you can give me step-by-step directions for setting up the test scenario, I could do that here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |