From: raja on
Hi,

I will try to get a AWR Report which has an interval of 10-15 minutes
interval.
Currently i have only this report and i am trying to analyze it.

In AWR Report, SQL Statistics, there are many parts like :
SQL ordered by Elapsed Time
SQL ordered by CPU Time
SQL ordered by Gets
SQL ordered by Reads
SQL ordered by Executions
SQL ordered by Parse Calls
SQL ordered by Sharable Memory
SQL ordered by Version Count
Which i should look into first ? why ?

With Regards,
Raja.
From: Charles Hooper on
On Nov 21, 9:28 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.

On first examination, I see:
Buffer Hit %: 99.89

One might say, that is a near perfect buffer cache hit ratio, move on
to the next report. But I have an example which shows a 100% buffer
cache hit ratio and the performance of the database

instance was terrible, so maybe we should permanently ignore this
statistic.

Watch for line wrapping (word wrapping) in the following:

Just a couple statistics to frame the rest of the report:
Elapsed: 359.46 (mins) (21,567.6 seconds)
DB Time: 1,723.72 (mins) {DB Time includes CPU time and wait events
for all sessions}

One approach to looking at the AWR/Statspack reports might be to start
with the CPU usage and the wait events, and see if any further
investigation is needed.
NUM_CPUS 14
NUM_CPU_CORES 7

Total CPU Time Possible: 14 * 359.46 * 60 seconds = 301,946.4 seconds
- note that it is unlikely that processes will jump from one CPU to
another CPU, although if parallel processing is in

use, more than one CPU might be used to retrieve the data for SQL
statements. It is quite possible that one CPU might become nearly
100% busy, yet the average CPU busy calculation might

might indicate that the CPUs are only 7% busy.

CPU time: 56,931 seconds

Percentage of total possible CPU Time that was consumed by Oracle:
56,931/301,946 = 18.9%

The above is not conclusive, digging deeper...

What is happening at the system level:
OS Statistics
Statistic Total
AVG_BUSY_TIME 773,431/100= 7,734.31s (total divided by # CPUs)
AVG_IDLE_TIME 1,382,735/100= 13,827.35s
AVG_IOWAIT_TIME 198,722/100= 1,987.22s
AVG_SYS_TIME 112,958/100= 1,129.58s
AVG_USER_TIME 659,771/100= 6,597.71s
OS_CPU_WAIT_TIME 11,375,200/100=113,752.00s

OS_CPU_WAIT_TIME is defined as: Total number of seconds that processes
have been in a ready state, waiting to be selected by the operating
system scheduler to run. This seems to indicate

that there was significant waiting for CPU time slices (competition
for the same CPU), even though the average CPU utilization at the
operating system level is 35.9%

[7,734.31/(7,734.31+13,827.35)]. Several processes are apparently
fighting for time on the same CPU.

You could look at the Time Model Statistics, but they may not be
telling much other than there may be a lot of context switches, and
more than half of the total SQL execution time was spent

on the CPUs (rather than waiting for disk or other wait events):
Statistic Name Time (s) % of DB Time
sql execute elapsed time 102,885.27 99.48
DB CPU 56,931.50 55.05
Java execution elapsed time 18,173.85 17.57
PL/SQL execution elapsed time 1,146.88 1.11
....
DB time 103,423.35
background elapsed time 9,817.43
background cpu time 1,169.73

The hierarchy of the above Time Model Statistics should be read like
this:
1) background elapsed time 9,817.43
2) background cpu time 1,169.73 (of the 9,817.43)
1) DB time 103,423.35
2) DB CPU 56,931.50 (of the 103,423.35)
2) connection management call elapsed time
2) sequence load elapsed time
2) sql execute elapsed time
2) parse time elapsed
3) hard parse elapsed time
4) hard parse (sharing criteria) elapsed time
5) hard parse (bind mismatch) elapsed time
3) failed parse elapsed time
4) failed parse (out of shared memory) elapsed time
2) PL/SQL execution elapsed time 1,146.88
2) inbound PL/SQL rpc elapsed time
2) PL/SQL compilation elapsed time
2) Java execution elapsed time 18,173.85 (of the 103,423.35)

Let's switch our attention to the wait events, but first a couple
statistics to provide a frame of reference:
SQL*Net roundtrips to/from client 4,813,090
SQL*Net roundtrips to/from dblink 43,646
user calls 4,836,293
user commits 24,259
user rollbacks 20

Event Waits %Time -outs Total Wait
Time (s) Avg wait (ms) Waits /txn
db file sequential read 4,036,055 0.00
13,310 3 166.24
db file parallel write 567,009 0.00
6,515 11 23.35
db file scattered read 934,341 0.00
1,383 1 38.48
log file parallel write 38,621 0.00
1,019 26 1.59
Backup: sbtwrite2 51,438 0.00
896 17 2.12
Log archive I/O 15,175 0.00
658 43 0.63
log file sync 17,706 0.10
387 22 0.73
direct path write temp 98,515 0.00
313 3 4.06
SQL*Net more data to client 4,642,420 0.00
154 0 191.21
enq: TX - row lock contention 1,475 3.19
138 94 0.06
control file parallel write 10,484 0.00
107 10 0.43
log file sequential read 12,839 0.00
87 7 0.53
read by other session 40,059 0.00
79 2 1.65
direct path read 272,647 0.00
54 0 11.23
log buffer space 181 0.00
46 256 0.01
enq: WL - contention 16 87.50
44 2742 0.00
latch: cache buffers chains 363,571 0.00
37 0 14.97
control file single write 448 0.00
28 63 0.02
os thread startup 424 0.00
23 55 0.02
RMAN backup & recovery I/O 6,130 0.00
18 3 0.25
control file sequential read 955,614 0.00
9 0 39.36
log file switch completion 75 0.00
7 89 0.00
SQL*Net more data from dblink 127,162 0.00
6 0 5.24
cursor: pin S wait on X 365 100.00
4 10 0.02
SQL*Net message to client 4,819,165 0.00
4 0 198.49
....

According to the number of waits for the SQL*Net events, on average,
just about every request for data from the server requires two sends
as the data returned is too large to fit into the SDU

(defaults to 2,000 bytes) - this is based on comparing the value of
"SQL*Net message to client" to the value of "SQL*Net more data to
client". This indicates that network latency, if

significant, could have a significant impact on apparent database
performance.

There are timeouts reported for some of the wait events - this may be
a critical indicator. If the timeout for a paricular wait event
happens to be 3 seconds, and 88% of the waits for an

event happen to result in time outs, that means on average 88% of the
time the event happens, it will last 3 seconds in duration. However,
the wait count increases by 1 every time a time-out

occurs, thus it might be a single session waiting non-stop for 42
seconds (16 * 87.50 * 3 seconds) = 42 seconds.

"db file sequential read" indicates that Oracle is reading one block
from disk - this usually indicates that Oracle is reading from an
index, but might also happen rarely during a tablescan

(if the other blocks needed are already in the buffer cache), or when
reading the undo blocks to provide a consistent read. "db file
scattered read" indicates that Oracle is reading more

than one block at a time from disk - this usually indicates that
Oracle is reading from a table during a full table scan.

"db file parallel write" indicates waits related to the DBWR process,
and may be related to the number of DBWR processes or DBWR I/O
slaves. May also indicate slow or high contention devices. I believe
that I previously commented on the value of the parameter
db_writer_processes, and suggested that you take a look at an article
on Kevin Closson's website related to what happens when that parameter
is set to a high value - that article talks about the impact on CPU
queuing, which seems to be a problem in this AWR report.

"log file parallel write" indicates Writing redo records to the redo
log files from the log buffer. This usually indicates slow devices or
contention where the online redo logs are located.

Take a look at the definitions for the other wait events.

Use the CPU utilization and wait events to determine what other
sections of the AWR report need to be examined. Post the results of
your findings.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
From: joel garry on
On Nov 21, 6:28 am, raja <dextersu...(a)gmail.com> wrote:
> Hi,
>
> I will try to get a AWR Report which has an interval of 10-15 minutes
> interval.
> Currently i have only this report and i am trying to analyze it.
>
> In AWR Report, SQL Statistics, there are many parts like :
> SQL ordered by Elapsed Time
> SQL ordered by CPU Time
> SQL ordered by Gets
> SQL ordered by Reads
> SQL ordered by Executions
> SQL ordered by Parse Calls
> SQL ordered by Sharable Memory
> SQL ordered by Version Count
> Which i should look into first ? why ?
>
> With Regards,
> Raja.

The why relates to why you are looking at this report in the first
place. You should be familiar with the system and how it is running,
and have some idea of what is wrong. It's easy to fall into the trap
of tuning for tuning's sake, also wittily referred to as OTD -
Obsessive Tuning Disorder. It's also easy to fall into the trap of
saying "no one is complaining, so therefore nothing is wrong."

Personally, I think there is value in checking a system to be sure
nothing is about to become a problem, or become a problem when two not-
problems run concurrently. Also, when a bottleneck is created, things
can go downhill in a hurry, one wants to be aware of what to watch
for. This falls under the part of a DBA's job called "knowing your
system."

Tuning is an iterative process, too. You need to order the importance
of what is wrong, and from there decide which to look for first. Once
you've fixed it, something else may show up. It is even possible the
fix may make things worse. For example, you may discover lots of
context switching due to unnecessary PL calls by a major app, and
decide to fix it by converting a particular bit of code to SQL. Then
the processor is free to do everything faster, overwhelming the I/O
system with calls.

So, it depends. That's why you follow a particular methodology,
preferably based on rational inquiry like Charles is demonstrating,
rather than just twiddling with ratios or expecting some cookbook
scheme to help. The trick is to tease out the clues that send you in
the right direction, and ignore the red herrings. So what is the
database waiting for? Are the waits what you would expect given what
your application is doing? Charles' SDU comment is certainly worth
investigating further.

You might want to briefly look at top sql by parse calls if you are
using a packaged app, some vendors write poor code (ie, without bind
variables) and it becomes painfully obvious when your most-used code
suffers from excessive parsing. Also compare the top executed with
all the others, at least to be sure your most executed code is working
as it should. Sometimes some blatant easily fixable errors pop out
there. But then again, the same applies to all those top sql lists.

jg
--
@home.com is bogus.
http://home.nestor.minsk.by/jazz/news/2008/11/1804.html
From: raja on
Ok.
I will check with the definitions of wait events and get back to you.

Thanks.
With Regards,
Raja.
From: raja on
Hi,

I have also another information which would help to solve the problem
along with the AWR Report.
From the advisor tables, i got the following information ( these were
the findings made by oracle ) :

Application Analysis :
-----------------------
Wait event "Backup: sbtwrite2" in wait class "Administrative"
Wait event "Data file init write" in wait class "User I/O"
Wait event "enq: CF - contention" in wait class "Other"
Wait event "enq: JI - contention" in wait class "Other"
Wait event "enq: TC - contention" in wait class "Other"
Wait event "inactive session" in wait class "Other"
Wait event "SQL*Net more data from dblink" in wait class "Network"
Wait event "wait for a undo record" in wait class "Other"
Waits on event "log file sync" while performing COMMIT and ROLLBACK
operations

I will check over the above wait events also and get back to you.

Waits on event "log file sync" while performing COMMIT and ROLLBACK
operations - i think we should not use more commits in the
application, to solve this problem.

I hope most of the above mentioned wait events were due to the Backup
activity that has been taken place ( This was also present in the AWR
Report).
I have consolidated the above wait events ( data was taken for 2
months )
So the problem with the database should be mostly with the backup
activity only.
Correct ?

With Regards,
Raja.
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5 6 7 8
Prev: Role , System privilege and Object privilege
Next: RMAN Error