From: raja on
Hi Charles/ Steeve,

Query 1 ( Query given by charles ) :
select
to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
dhse.instance_number,
time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition
by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
time_waited,
total_waits - nvl(lag(total_waits, 1, 0) over (partition by
dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
total_waits
from dba_hist_snapshot dhs,
dba_hist_system_event dhse
where dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
order by 1, 2;

The output of your query is present in excel file named as : wait1


Query 2 ( Query given by steeve ) :
select * from (
select event_name, wait_class,to_char(begin_interval_time,'YYYY_MM_DD
HH24:MI'),
dhse.instance_number,
time_waited_micro - lag(time_waited_micro, 1, 0) over (order by
dhse.instance_number, dhse.snap_id) as time_waited,
total_waits - lag(total_waits, 1, 0) over (order by
dhse.instance_number, dhse.snap_id) as total_waits
from dba_hist_snapshot dhs,
dba_hist_system_event dhse
where dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
and begin_interval_time > sysdate - num_of_days_to_analyze
order by 5 desc)
where rownum < 10000;

The output of your query is present in excel file named as : wait2
Here, the wait event : "PX qref latch" tops the list

Since i am not able to post those files in our group's thread, i am
sending those files to ur mail ids.
Please check these 2 files and tell me your observations.
Also, another doubt is : whether should be neglect the idle events
always ? then why are they notified to us...

With Regards,
Raja.
From: Charles Hooper on
On Dec 12, 7:39 am, raja <dextersu...(a)gmail.com> wrote:
> Hi Charles/ Steeve,
>
> Query 1 ( Query given by charles ) :
> select
>   to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
>   dhse.instance_number,
>   time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition
> by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
> time_waited,
>   total_waits - nvl(lag(total_waits, 1, 0) over (partition by
> dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
> total_waits
> from dba_hist_snapshot dhs,
>      dba_hist_system_event dhse
> where dhs.snap_id = dhse.snap_id
>   and dhs.instance_number = dhse.instance_number
> order by 1, 2;
>
> Since i am not able to post those files in our group's thread, i am
> sending those files to ur mail ids.
> Please check these 2 files and tell me your observations.
> Also, another doubt is : whether should be neglect the idle events
> always ? then why are they notified to us...
>
> With Regards,
> Raja.

Raja,

Now that I am able to see the output of the query that I suggested
that you run, I see that I did not tell you to select the
dhse.event_name column, so I am not able to see the wait event names
associated with the various waits, also the sort order probably needs
to be improved. The fixed query should look something like this:
select
dhse.event_name,
to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
dhse.instance_number,
time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over
(partition
by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
time_waited,
total_waits - nvl(lag(total_waits, 1, 0) over (partition by
dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
total_waits
from dba_hist_snapshot dhs,
dba_hist_system_event dhse
where dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
order by 2, 4;

The idle wait events do have value, as they tell you to what degree
the database server is not causing performance problems (the user took
a 2 hour nap, resulting in a SQL*Net Message from Client exceeding 2
hours). Idle waits have a limited value in Statspack and AWR reports
(for instance "SQL*Net more data to client" shows 4,642,420 waits,
possibly indicating that the clients are requesting a lot of data to
be returned which will not fit into a single SDU size per request).
However, the idle wait events are very important wait events when
examining 10046 traces for sessions, as a 10046 trace is time scoped
for a single session.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
From: raja on
Hi,

Kindly check the attached excel sheet.
Since i am not able to post here, i am sending the output to your mail
id.

Query :
select
dhse.event_name,
to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
dhse.instance_number,
time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition
by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
time_waited,
total_waits - nvl(lag(total_waits, 1, 0) over (partition by
dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
total_waits
from dba_hist_snapshot dhs, dba_hist_system_event dhse
where dhs.snap_id = dhse.snap_id
and dhs.instance_number = dhse.instance_number
order by 2, 4;

Kindly provide me your suggestions.

With Regards,
Raja.
From: Charles Hooper on
On Dec 19, 6:17 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> Kindly check the attached excel sheet.
> Since i am not able to post here, i am sending the output to your mail
> id.
>
> Query :
> select
>   dhse.event_name,
>   to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
>   dhse.instance_number,
>   time_waited_micro - nvl(lag(time_waited_micro, 1, 0) over (partition
> by dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
> time_waited,
>   total_waits - nvl(lag(total_waits, 1, 0) over (partition by
> dhse.instance_number, dhse.event_name order by dhse.snap_id),0) as
> total_waits
> from dba_hist_snapshot dhs, dba_hist_system_event dhse
> where dhs.snap_id = dhse.snap_id
>   and dhs.instance_number = dhse.instance_number
> order by 2, 4;
>
> Kindly provide me your suggestions.
>
> With Regards,
> Raja.

As an experiment, I examined four time intervals for a single day, and
posted those numbers below. "Wait/Sec" is a calculation where I
divided the original TIME_WAITED value by 1,000,000 so that the wait
time was stated in seconds, and then divided that value by 3,600
(there are 3,600 seconds in an hour, as the totals were over the
course of an hour). The number calculated is the number of seconds
waited per second, accumulated for all sessions/processes waiting on
that event (that would explain the value of 9.52 below).

Between 2008_12_16 02:00 and 2008_12_16 03:00
EVENT_NAME TOTAL_WAITS Wait/Sec
jobq slave wait 1,394 1.05
db file sequential read 469,456 0.59
SQL*Net break/reset to client 9,236,797 0.35
SQL*Net more data from dblink 220,844 0.28
db file parallel write 260,297 0.17
Backup: sbtwrite2 36,033 0.10
log file parallel write 8,771 0.06
log file sequential read 6,602 0.02
single-task message 489 0.01
log file sync 2,238 0.01
Log archive I/O 7,783 0.00
RMAN backup & recovery I/O 5,302 0.00

Between 2008_12_16 07:00 and 2008_12_16 08:00
EVENT_NAME TOTAL_WAITS Wait/Sec
PX Deq: Execution Msg 97,375 9.52
PX Idle Wait 14,453 6.52
PX Deq Credit: send blkd 3,162,989 5.72
PX Deq: Table Q Normal 17,889,617 4.82
jobq slave wait 4,112 3.20
db file scattered read 435,471 1.61
db file sequential read 881,400 0.69
PX Deq: Execute Reply 17,145 0.58
SQL*Net break/reset to client 7,065,596 0.37
direct path write temp 10,411 0.33
read by other session 705,527 0.26
Backup: sbtwrite2 40,263 0.24
PX Deq Credit: need buffer 287,494 0.22
direct path read 1,363,276 0.16
db file parallel write 20,396 0.13
PX qref latch 204,808,981 0.10
direct path read temp 36,759 0.07
log file parallel write 9,155 0.04
single-task message 516 0.01
log file sequential read 3,559 0.01
log file sync 3,387 0.01
optimizer stats update retry 130 0.01
PX Deq: Msg Fragment 21,945 0.01
enq: TX - row lock contention 233 0.00
enq: MS - contention 3 0.00

Between 2008_12_16 14:00 and 2008_12_16 15:00
EVENT_NAME TOTAL_WAITS Wait/Sec
jobq slave wait 3,792 2.95
db file sequential read 1,044,182 0.49
SQL*Net break/reset to client 9,797,440 0.33
Backup: sbtwrite2 13,535 0.04
db file scattered read 152,387 0.03
read by other session 30,474 0.02
single-task message 516 0.01
db file parallel write 3,351 0.00
log file parallel write 5,155 0.00
log file sync 2,678 0.00

Between 2008_12_16 19:00 and 2008_12_16 20:00
EVENT_NAME TOTAL_WAITS Wait/Sec
Backup: sbtwrite2 352,599 3.74
jobq slave wait 3,709 2.89
SQL*Net break/reset to client 8,473,325 0.36
db file sequential read 161,017 0.21
RMAN backup & recovery I/O 13,208 0.11
log file parallel write 7,232 0.03
db file parallel write 13,223 0.03
single-task message 511 0.01
log file sequential read 3,066 0.01
log file sync 2,695 0.00
read by other session 972 0.00

The above may contain some "idle" wait events. What is possibly
interesting is the consistently high number of waits on the event
"SQL*Net break/reset to client", indicating that the client is
submitting something, possibly a badly formed SQL statement, and the
database instance is rejecting the message sent by the client. The
time consumed seems to be a faily consistent 1/3 of a second per
second in all four of the time intervals. You might check the per
session statistics to see which sessions are involved.

I will delay my analysis of the above until you have had an
opportunity to research and report your findings regarding what the
above may be showing. I would encourage others viewing this thread to
offer suggestions or an analysis of the above in the context of this
thread.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
From: raja on
Hi,

Thanks for guiding me Charles and Steve.

From the wait list given by Charles, i have tried to look into the
wait event descriptions and made my following observations.

In general the wait list has been split into 4 parts :
Part 1 -> 2am to 3am ; Part 2 -> 7am to 8am ; Part 3 -> 2pm to 3pm ;
Part 4 -> 7pm to 8pm

First let us Neglect following the idle events from the wait list :
jobq slave wait
PX Deq Credit: need buffer
PX Deq: Execute Reply
PX Deq: Execution Msg
PX Deq: Msg Fragment
PX Deq: Table Q Normal
PX Idle Wait
single-task message

In general - about the application analysis - the following 2
observations are :
a. if we see the no.of wait events, it looks like they are at : Part
2 -> 7am to 8am, which should be the peak hour of the transaction.
b. the jobs, backup and recovery, archiving should be run when it is
in less peak hour : Part 1 -> 2am to 3am,
since there are the following events :
Backup: sbtwrite2 - ( Administrative - Wait class )             
Log archive I/O                  
RMAN backup & recovery I/O.

With respect to wait events, the observations are as follows :
c. The following wait events are due to parallel process - it is like
running parallel queries, using parallel hints.
PX Deq Credit: send blkd
PX qref latch
solution to solve this problem is (from web) : increasing the value of
parallel_execution_message_size parameter.
But I feel that reducing the usage of parallel hints will reduce this
problem. but this would either cause not the efficient usage of the
CPU or not finishing the query execution faster !!!

d. read by other session :
In previous versions this wait was classified under the "buffer busy
waits" event. However, in Oracle 10.1 and higher this wait time is now
broken out into the "read by other session" wait event. Excessive
waits for this event are typically due to several processes repeatedly
reading the same blocks, e.g. many sessions scanning the same index or
performing full table scans on the same table.
solution to solve this problem is (from web) :
->buffer busy waits / read by other session: Increase DB_CACHE_SIZE
(DB_BLOCK_BUFFERS prior to 9i)/ Analyze contention from SYS.V$BH
But going through the link : http://www.confio.com/English/Tips/Read_By_Other_Session.php,
i feel that tuning the queries would be the option to reduce this
wait. Have to check for hot blocks/objects too ?

e.
In Oracle, we see two types of data block access - db file sequential
read and db file scattered read :
->db file sequential read — A single-block read (i.e., index fetch by
ROWID).
A large number of waits here could indicate poor joining orders of
tables, or unselective indexing. It is normal for this number to be
large for a high-transaction, well-tuned system, but it can indicate
problems in some circumstances.
They can also show up as direct path read/write waits.

->db file scattered read — A multiblock read (a full-table scan, OPQ,
sorting)
A large number here indicates that your table may have missing or
suppressed indexes. Although it may be more efficient in your
situation to perform a full table scan than an index scan, check to
ensure that full table scans are necessary when you see these waits.
Try to cache small tables to avoid reading them in over and over
again
solution to solve this problem is (from web) : increase the RAM size
of the db_cache_size to reduce disk I/O, tune the SQL to reduce disk I/
O, or invest in a faster disk I/O subsystem.

->log file sequential read - indicates that the process is waiting for
blocks to be read from the online redo log into memory. This primarily
occurs at instance startup and when the ARCH process archives filled
online redo logs.

->db file parallel write - occurs when the process, typically DBWR,
has issued multiple I/O requests in parallel to write dirty blocks
from the buffer cache to disk, and is waiting for all requests to
complete.
( dont know much - have to check regarding DBWR )
Looks like this is mostly due to i/o issue.

The direct path read, direct path write wait events are also
associated with the data block access (db file sequential read and db
file scattered read ) - dont know about these wait events, have to
check regarding these.

Info : Starting in Oracle 10g release2, Oracle recommends not setting
the db_file_multiblock_read_count parameter.
solution to solve these problem are (from web) :
->db file sequential read: Tune SQL to do less I/O. Make sure all
objects are analyzed. Redistribute I/O across disks.

So, in general to reduce the following waits, we have to tune the sql
queries (checking the indexes and joins) may give improvement in
performance :
db file sequential read,
db file scattered read,
log file sequential read,
db file parallel write,
direct path read, and
direct path write

f. The following 3 wait events comes together, since they are related
to LGWR/redo log ( dont know much - have to check regarding LGWR/redo
log )
->log file sequential read - LGWR background process waits for this
event while it is copying redo records from the memory Log Buffer
cache to the current redo group's member logfiles on disk.
->log file sync - User session waits on this wait event while waiting
for LGWR to post after commit write of dirty blocks  
->log file parallel write - Waiting for the writes of redo records to
the redo log files

solution to solve these problem are (from web) :
->log file sync: If this event is in the top 5, you are committing too
often (talk to your developers)                 
->log file parallel write: deals with flushing out the redo log buffer
to disk. Your disks may be too slow or you have an I/O bottleneck.

Conclusion from the above wait event list given by you :
More waits are seen (db file sequential read, db file scattered
read ), which are due to i/o's.
To reduce these wait events, have to tune the sql ( check index and
joins).

Thanks. I have learnt something about the wait events.

Please check whether the above observations are correct ?

Also, Please provide your suggestions too...

With Regards,
Raja.
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8
Prev: Role , System privilege and Object privilege
Next: RMAN Error