Latch / Wait Events [Oracle]

Prev: performance issue after upgrade to oracle 11.2.0.1 linux 32 bit.
Next: stored outlines set?

From: The Magnet on 12 Nov 2009 10:11

Hi,

I've been reading some documentation on latches to try and see if our
database can be improved. While the concept is rather simple, I
cannot really find information on which latches are really important
and what numbers may indicate a problem.

We have many, many events "rdbms ipc message". Some values are:

Total Waits: 27674285
Timeouts: 535295
Time Waited: 164032311

If this bad? Why are the so many of these events?

Here is another one related to redo logs:

log file parallel write
Total Waits: 27705118
Total Timeouts: 0
Time Waited: 983231

This query has been posted everywhere. I've run it for 3 days and the
same addresses appear at the top:

select CHILD# "cCHILD"
, ADDR "sADDR"
, GETS "sGETS"
, MISSES "sMISSES"
, SLEEPS "sSLEEPS"
from v$latch_children
where name = 'cache buffers chains'
order by 5 desc, 1, 2, 3;

sMisses: 364252
sSleeps: 8957

This good or bad???

I know that any value for 'cache buffers chains' or 'latch_free' is
probably not good. But, how to decide what latch wait is bad, what
value is unacceptable and what can be done?

We are running 10gR2.

Many Thanks

From: Mladen Gogala on 12 Nov 2009 10:56

On Thu, 12 Nov 2009 07:11:50 -0800, The Magnet wrote:

> Hi,
>
> I've been reading some documentation on latches to try and see if our
> database can be improved. While the concept is rather simple, I cannot
> really find information on which latches are really important and what
> numbers may indicate a problem.
>
> We have many, many events "rdbms ipc message". Some values are:
>
> Total Waits: 27674285
> Timeouts: 535295
> Time Waited: 164032311
>
> If this bad? Why are the so many of these events?
>
> Here is another one related to redo logs:
>
> log file parallel write
> Total Waits: 27705118
> Total Timeouts: 0
> Time Waited: 983231
>
> This query has been posted everywhere. I've run it for 3 days and the
> same addresses appear at the top:
>
> select CHILD# "cCHILD"
> , ADDR "sADDR"
> , GETS "sGETS"
> , MISSES "sMISSES"
> , SLEEPS "sSLEEPS"
> from v$latch_children
> where name = 'cache buffers chains'
> order by 5 desc, 1, 2, 3;
>
> sMisses: 364252
> sSleeps: 8957
>
> This good or bad???
>
> I know that any value for 'cache buffers chains' or 'latch_free' is
> probably not good. But, how to decide what latch wait is bad, what
> value is unacceptable and what can be done?
>
> We are running 10gR2.
>
> Many Thanks

Do you want to "improve database" or do you want to improve the response
time of the applications? If latter is the case, try seeing where the
applications are spending time and see how you can reduce this time. The
phrase "improve database" doesn't make much sense as the database is just
a storage area where your data is stored. You don't improve the
performance of the warehouse, you improve the business processes that
access the warehouse. The answer to your numbers will be the only correct
one: "it depends". How long since you started the instance? What are these
latches? What documentation did you read?

--
http://mgogala.freehostia.com

From: hpuxrac on 12 Nov 2009 19:43

On Nov 12, 10:11 am, The Magnet <a...(a)unsu.com> wrote:

snip

> Hi,
>
> I've been reading some documentation on latches to try and see if our
> database can be improved. While the concept is rather simple, I
> cannot really find information on which latches are really important
> and what numbers may indicate a problem.
>
> We have many, many events "rdbms ipc message". Some values are:
>
> Total Waits: 27674285
> Timeouts: 535295
> Time Waited: 164032311
>
> If this bad? Why are the so many of these events?
>
> Here is another one related to redo logs:
>
> log file parallel write
> Total Waits: 27705118
> Total Timeouts: 0
> Time Waited: 983231
>
> This query has been posted everywhere. I've run it for 3 days and the
> same addresses appear at the top:
>
> select CHILD# "cCHILD"
> , ADDR "sADDR"
> , GETS "sGETS"
> , MISSES "sMISSES"
> , SLEEPS "sSLEEPS"
> from v$latch_children
> where name = 'cache buffers chains'
> order by 5 desc, 1, 2, 3;
>
> sMisses: 364252
> sSleeps: 8957
>
> This good or bad???
>
> I know that any value for 'cache buffers chains' or 'latch_free' is
> probably not good. But, how to decide what latch wait is bad, what
> value is unacceptable and what can be done?
>
> We are running 10gR2.
>
> Many Thanks

Have you read Tom Kyte's books? That is where I would recommend you
start at this point.

From: Mark D Powell on 13 Nov 2009 10:12

On Nov 12, 10:11 am, The Magnet <a...(a)unsu.com> wrote:
> Hi,
>
> I've been reading some documentation on latches to try and see if our
> database can be improved. While the concept is rather simple, I
> cannot really find information on which latches are really important
> and what numbers may indicate a problem.
>
> We have many, many events "rdbms ipc message". Some values are:
>
> Total Waits: 27674285
> Timeouts: 535295
> Time Waited: 164032311
>
> If this bad? Why are the so many of these events?
>
> Here is another one related to redo logs:
>
> log file parallel write
> Total Waits: 27705118
> Total Timeouts: 0
> Time Waited: 983231
>
> This query has been posted everywhere. I've run it for 3 days and the
> same addresses appear at the top:
>
> select CHILD# "cCHILD"
> , ADDR "sADDR"
> , GETS "sGETS"
> , MISSES "sMISSES"
> , SLEEPS "sSLEEPS"
> from v$latch_children
> where name = 'cache buffers chains'
> order by 5 desc, 1, 2, 3;
>
> sMisses: 364252
> sSleeps: 8957
>
> This good or bad???
>
> I know that any value for 'cache buffers chains' or 'latch_free' is
> probably not good. But, how to decide what latch wait is bad, what
> value is unacceptable and what can be done?
>
> We are running 10gR2.
>
> Many Thanks

To add to what mgogala said most Oracle wait statistic values have
meaning only in relation to other Oracle statistics. If you wait
1,000 times for a resoure is that significant? The answer depends on
what the resource is and how many times it was requested. If the
total request count was 10,000 then a 10% wait factor is probably an
issue worth further research but 10,000 waits against 100,000,000
requests would be a less significant wait percentage so the absolute
values of the statistics are not what counts.

One way to help learn which statistics might be significant is to run
statspack or if Licensed AWR reports against multiple databases
hosting similar applications and compare the reports looking to see if
you can spot identical patterns in the relationship between
statistics.

HTH -- Mark D Powell --

From: vsevolod afanassiev on 14 Nov 2009 18:47

1. "rdbms ipc mesage" is idle event experienced by Oracle background
processes (DBWR, SMON, PMON, etc) and it can be ignored.
2. "log file parallel write" is event experienced by log writer. It is
not experienced by application sessions.

Start from examining V$SESSION_EVENT/V$SESSTAT for sessions created by
application.
Let's say a session was connected 1000 seconds ago. It spent:
- 100 seconds on CPU
- 100 seconds on 'db file sequential read/db file scattered read"
- less than 10 seconds on other non-idle events (buffer busy/latch
free, etc)
- idle rest of the time (SQL*Net message from client)
This indicates healthy session

Let's say a session was connected 1000 seconds ago. It spent:
- 300 seconds waiting on 'latch free'
- 100 seconds on CPU
- 50 seconds on other non-idle events
- idle rest of the time
This indicates latch issue.

In other words: start from comparing time spent on various wait
events.

|
Pages: 1
Prev: performance issue after upgrade to oracle 11.2.0.1 linux 32 bit.
Next: stored outlines set?