From: Steve Howard on
Hi All,

10.2.0.4 three node cluster EE on SLES 10

Can someone give me a good definition of *exactly* what this means and
what causes it (mainly the latter). The documentation is not very
descriptive, with "Number of undo records applied to transaction
tables that have been rolled back for consistent read purposes".

It sounds like undo on undo, but we don't have any larger number for
rollbacks (or commits) when this happens than we do at any other time.

We have been plagued by this for over a year, and after multiple SRs
where the support analyst just reads us the documentation, I am at my
wits' end.

We have a fairly large table (almost 1TB with about 300 million rows)
with a large XMLTYPE column. Once a day, a job scans this table for
records added that day for propagation to an external system. The
longer the query runs, the more we see the session doing single block
reads against the undo tablespace, with the stat in the subject
climbing into the millions. Eventually, after several hours, an
ORA-01555 is thrown.
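
For reference, this is roughly how we watch the stat climb for the
session (the SID is just a placeholder for whatever the extraction
session happens to be):

```sql
-- Watch the two transaction-table CR statistics for one session.
-- SID 123 is a placeholder; the statistic names are as they appear
-- in V$STATNAME on 10.2.
SELECT sn.name, ss.value
  FROM v$sesstat  ss,
       v$statname sn
 WHERE sn.statistic# = ss.statistic#
   AND ss.sid = 123
   AND sn.name IN (
         'transaction tables consistent read rollbacks',
         'transaction tables consistent reads - undo records applied');
```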

I even grabbed one of the P1/P2 parameters for the session querying
and dumped the undo block in the P2 value. While it was a second or
two after the event was posted, the block itself didn't even contain
any references to the table being queried!
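
In case it helps, this is roughly what I did. The wait was a 'db file
sequential read', so P1 is the absolute file number and P2 the block
number (the literals in the dump command are placeholders for whatever
the query returns):

```sql
-- Grab the current wait parameters for the session (SID 123 is a placeholder)
SELECT event, p1 AS file#, p2 AS block#
  FROM v$session_wait
 WHERE sid = 123;

-- Then dump that block to a trace file, substituting the values from above
ALTER SYSTEM DUMP DATAFILE 45 BLOCK 67890;
```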

Can anyone shed some light?

Thanks,

Steve
From: Steve Howard on
On Jun 24, 10:20 pm, Steve Howard <stevedhow...(a)gmail.com> wrote:
> Hi All,
>
> 10.2.0.4 three node cluster EE on SLES 10


...I should also mention the explain plan for the query is OK. It is
a range scan for the time period being retrieved, which may be two or
three hours, representing 300,000 rows (out of 300 million).

Also, when I look at "table fetch by rowid" for the querying session,
it will periodically just stall. It is as if the session is reading
thousands upon thousands of undo blocks for that next row. The
arraysize is 15 (standard SQL*Plus).

Sometimes this happens, sometimes it doesn't.

We did apply patch 7527908.
From: joel garry on
On Jun 24, 7:25 pm, Steve Howard <stevedhow...(a)gmail.com> wrote:
> On Jun 24, 10:20 pm, Steve Howard <stevedhow...(a)gmail.com> wrote:
>
> > Hi All,
>
> > 10.2.0.4 three node cluster EE on SLES 10
>
> ...I should also mention the explain plan for the query is OK.  It is
> a range scan for the time period being retrieved, which may be two or
> three hours, representing 300,000 rows (out of 300 million).
>
> Also, when I look at "table fetch by rowid" for the querying session,
> it will periodically just stall.  It is as if the session is reading
> thousands upon thousands of undo blocks for that next row.  The
> arraysize is 15 (standard SQL*Plus)
>
> Sometimes this happens, sometimes it doesn't.
>
> We did apply patch 7527908

I have no idea, but I speculate you could use Tanel Poder's latchprofx
and poke around on his site about in-memory undo to figure this out.
Does your plan show lots of recursion?

jg
--
@home.com is bogus.
http://www.businessweek.com/news/2010-06-25/oracle-rises-after-sun-acquisition-fuels-profit-gain.html
From: Jonathan Lewis on

This happens when your query hits a block that was updated
"a long time" in the past by a transaction that has committed
but not been cleaned out.

Your query can tell that it is a committed transaction because
the ITL entry for the transaction points to a transaction table slot
(in an undo segment header block) that has been re-used for
a newer transaction. (Part of the transaction id is the "transaction
sequence number", which counts the number of times a transaction
slot has been used.)

Your query therefore needs to know WHEN the transaction committed,
so that it can decide whether or not it's supposed to see the new version
or the old version of the row. (If the transaction committed before the
query started then the query doesn't need to know exactly when the
transaction committed; if it committed after the query started then it
has to be rolled back - and it's possible that the "snapshot too old" is
the result of the data rollback rather than the transaction table
rollback.)

To find out when the transaction committed, your query copies the undo
segment header block and starts rolling it back. The number of times this
happens is recorded as:
    "transaction tables consistent read rollbacks"

To perform the rollback, your query will read the transaction control
block (another part of the undo segment header), which contains a number
of important details - including the first undo block address of the most
recent transaction to use that undo segment header. That undo block will
hold the first record of that transaction *** - which includes
information about the PREVIOUS state of the transaction control block.
By using this undo record your query can take the undo segment header
block backwards in time by one step - at which point it reads the older
version of the transaction control block and repeats the process until it
reaches the point where the transaction slot it's interested in has been
taken back to the correct sequence number (or a change has taken the undo
segment header block back to a point in time before the start of the
query). Each record it reads in this process is counted in the
    "transaction tables consistent reads - undo records applied"
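
If you want to see the transaction table and transaction control block
for yourself, you can dump an undo segment header to a trace file (the
segment name '_SYSSMU1$' is just an example - pick one from
dba_rollback_segs):

```sql
-- List the undo segment names, then dump one segment header.
-- The trace shows the transaction control block ("TRN CTL::") and the
-- transaction table ("TRN TBL::"), one row per slot, including the wrap#.
SELECT segment_name FROM dba_rollback_segs;

ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU1$';
```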


(*** This is why the block you dumped had nothing to do with your table.)

The trouble with your requirement is that we really need to do a
backwards tablescan - because it's probably the data near the end of the
table that is changing while you are "wasting" time reading all the data
from the start of the table.

Unfortunately there is no such hint - but if it's really critical, you
could write some code to scan the table one extent at a time in reverse
order.
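
As a sketch of what I mean - the table and column names are stand-ins
for yours, and the numeric literals (data_object_id, file, block) are
placeholders you would fill in per extent from the first query:

```sql
-- Walk the extents newest-first ...
SELECT e.extent_id, e.relative_fno, e.block_id, e.blocks
  FROM dba_extents e
 WHERE e.owner        = 'SCOTT'
   AND e.segment_name = 'BIG_TABLE'
 ORDER BY e.extent_id DESC;

-- ... and for each extent query just that rowid range, so the most
-- recently extended (and most volatile) part of the table is read first.
-- dbms_rowid.rowid_create(type, data_object_id, relative_fno, block#, row#)
SELECT /*+ rowid(t) */ *
  FROM big_table t
 WHERE t.rowid >= dbms_rowid.rowid_create(1, 12345, 8, 1664, 0)
   AND t.rowid <  dbms_rowid.rowid_create(1, 12345, 8, 1664 + 128, 0)
   AND t.created_date >= TRUNC(SYSDATE);
```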


--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


"Steve Howard" <stevedhoward(a)gmail.com> wrote in message
news:82fa462e-574c-461d-b1c6-65a5473a3afc(a)d37g2000yqm.googlegroups.com...
> [snip - full original post quoted above]


From: joel garry on
On Jun 25, 9:37 am, "Jonathan Lewis" <jonat...(a)jlcomp.demon.co.uk>
wrote:
> [snip]
>
> The trouble with your requirement is that we really need to do a
> backwards tablescan - because it's probably the data near the end of
> the table that is changing while you are "wasting" time reading all
> the data from the start of the table.

Excellent explanation, but I lost you here. He says the plan shows a
range scan, for 1% of the table? (Maybe you hadn't seen the subsequent
post yet, where he mentions a fetch suddenly exhibiting the
characteristics you describe.)

>
> Unfortunately there is no such hint - but if it's really critical, you
> could write some code to scan the table one extent at a time in
> reverse order.

This cleanout makes perfect sense, but I'm wondering if there is some
administrative tuning, like adjusting undo size or retention, or some
fiddling with initrans? It sounds critical if it's interrupting data
extraction. I'm also wondering if the mysterious translation of the
XMLTYPE from a column might be a problem here. Steve, how exactly are
you inserting and accessing this column?
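
Something like this might at least show whether undo retention is part
of it - v$undostat keeps 10-minute buckets, and ssolderrcnt counts the
ORA-01555s per bucket:

```sql
-- Longest query, ORA-01555 count, and the retention the instance thinks
-- it needs, per 10-minute bucket (10.2 columns)
SELECT begin_time, maxquerylen, ssolderrcnt, tuned_undoretention
  FROM v$undostat
 ORDER BY begin_time DESC;
```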

jg
--
@home.com is bogus.
snake oil 2.0 http://www.gapingvoidgallery.com/product_info.php?products_id=1614