From: GPage on
We have a server hosting some large data warehouse databases which is
experiencing slowness during the day. We are fairly sure that the main
culprits are poorly written SQL and the SAN. We mirror the databases from
this server to another to provide an isolated environment for reporting.

When we look at the DMVs to try and see why the server is slow we're not
seeing the IO related issues we expect to see. Based on PAL logs and other
metrics our disk is very slow and saturated, but the waits we're seeing as
the highest % look like system waits, and infact buffer IO waits are under
sleep waits which seems odd. Can I ignore these waits? And how can I tell
what the real issue is? I've looked up the states in BOL and the descriptions
are not very helpful.

The highest waits deal with the mirroring at 36.6% wait time
DBMIRRORING_CMD 99%- BOL states that it's expected for this wait to be long
DB_MIRROR_SEND 0.55%

Next down is other at 28%
BROKER_TASK_STOP - 80% seems to be related to mirroring, but I don't really
know what this is doing. We're not using SQL Broker. Is this an issue and
what steps should be taken to allieviate it?

ASYNC_IO_COMPLETION is at 4.78% in this category


Sleep is next down at 22% with
SQLTRACE_BUFFER_FLUSH at 49.5% - we have traces running and BOL states this
is waiting for the trace log to flush to disk. That seems to mean that it's
waiting on the disk to flush the log, so is this any sort of indicator for a
slow disk?

LAZYWRITER_SLEEP 49%. I assume this means that the lazy writer has nothing
to do during this period and thus is waiting for a task

After these waits we get to the BUFFER_IO waits which I'm fairly sure is the
actual cause of the problem.

Are these other waits indicative of a problem on the box? If so how do I
determine what the problem and solution is?

Thanks

From: Andrew J. Kelly on
For the most part you can ignore most of these waits:

WHERE [wait_type] IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' ) ;

Filter those out of the results and look at the percentages from there. If
you are concerned with IO then look at the dmv for physical stats.

--

Andrew J. Kelly SQL MVP
Solid Quality Mentors

"GPage" <GPage(a)discussions.microsoft.com> wrote in message
news:080AF8A2-0693-4398-9DDA-807B553E5FE5(a)microsoft.com...
> We have a server hosting some large data warehouse databases which is
> experiencing slowness during the day. We are fairly sure that the main
> culprits are poorly written SQL and the SAN. We mirror the databases from
> this server to another to provide an isolated environment for reporting.
>
> When we look at the DMVs to try and see why the server is slow we're not
> seeing the IO related issues we expect to see. Based on PAL logs and other
> metrics our disk is very slow and saturated, but the waits we're seeing as
> the highest % look like system waits, and infact buffer IO waits are under
> sleep waits which seems odd. Can I ignore these waits? And how can I tell
> what the real issue is? I've looked up the states in BOL and the
> descriptions
> are not very helpful.
>
> The highest waits deal with the mirroring at 36.6% wait time
> DBMIRRORING_CMD 99%- BOL states that it's expected for this wait to be
> long
> DB_MIRROR_SEND 0.55%
>
> Next down is other at 28%
> BROKER_TASK_STOP - 80% seems to be related to mirroring, but I don't
> really
> know what this is doing. We're not using SQL Broker. Is this an issue and
> what steps should be taken to allieviate it?
>
> ASYNC_IO_COMPLETION is at 4.78% in this category
>
>
> Sleep is next down at 22% with
> SQLTRACE_BUFFER_FLUSH at 49.5% - we have traces running and BOL states
> this
> is waiting for the trace log to flush to disk. That seems to mean that
> it's
> waiting on the disk to flush the log, so is this any sort of indicator for
> a
> slow disk?
>
> LAZYWRITER_SLEEP 49%. I assume this means that the lazy writer has nothing
> to do during this period and thus is waiting for a task
>
> After these waits we get to the BUFFER_IO waits which I'm fairly sure is
> the
> actual cause of the problem.
>
> Are these other waits indicative of a problem on the box? If so how do I
> determine what the problem and solution is?
>
> Thanks
>
From: GPage on
Thanks for the response. I notice that the filtered list does not contain the
mirror waits though. Are those something that can be ignored as well, or do
they indicate some bottleneck?

"Andrew J. Kelly" wrote:

> For the most part you can ignore most of these waits:
>
> WHERE [wait_type] IN
> ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
> 'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
> 'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' ) ;
>
> Filter those out of the results and look at the percentages from there. If
> you are concerned with IO then look at the dmv for physical stats.
>
> --
>
> Andrew J. Kelly SQL MVP
> Solid Quality Mentors
>
> "GPage" <GPage(a)discussions.microsoft.com> wrote in message
> news:080AF8A2-0693-4398-9DDA-807B553E5FE5(a)microsoft.com...
> > We have a server hosting some large data warehouse databases which is
> > experiencing slowness during the day. We are fairly sure that the main
> > culprits are poorly written SQL and the SAN. We mirror the databases from
> > this server to another to provide an isolated environment for reporting.
> >
> > When we look at the DMVs to try and see why the server is slow we're not
> > seeing the IO related issues we expect to see. Based on PAL logs and other
> > metrics our disk is very slow and saturated, but the waits we're seeing as
> > the highest % look like system waits, and infact buffer IO waits are under
> > sleep waits which seems odd. Can I ignore these waits? And how can I tell
> > what the real issue is? I've looked up the states in BOL and the
> > descriptions
> > are not very helpful.
> >
> > The highest waits deal with the mirroring at 36.6% wait time
> > DBMIRRORING_CMD 99%- BOL states that it's expected for this wait to be
> > long
> > DB_MIRROR_SEND 0.55%
> >
> > Next down is other at 28%
> > BROKER_TASK_STOP - 80% seems to be related to mirroring, but I don't
> > really
> > know what this is doing. We're not using SQL Broker. Is this an issue and
> > what steps should be taken to allieviate it?
> >
> > ASYNC_IO_COMPLETION is at 4.78% in this category
> >
> >
> > Sleep is next down at 22% with
> > SQLTRACE_BUFFER_FLUSH at 49.5% - we have traces running and BOL states
> > this
> > is waiting for the trace log to flush to disk. That seems to mean that
> > it's
> > waiting on the disk to flush the log, so is this any sort of indicator for
> > a
> > slow disk?
> >
> > LAZYWRITER_SLEEP 49%. I assume this means that the lazy writer has nothing
> > to do during this period and thus is waiting for a task
> >
> > After these waits we get to the BUFFER_IO waits which I'm fairly sure is
> > the
> > actual cause of the problem.
> >
> > Are these other waits indicative of a problem on the box? If so how do I
> > determine what the problem and solution is?
> >
> > Thanks
> >
> .
>
From: Andrew J. Kelly on
Yes they can usually be ignored as well but they don't show up like some of
the SB ones unless you are actually using Mirroring.

--

Andrew J. Kelly SQL MVP
Solid Quality Mentors

"GPage" <GPage(a)discussions.microsoft.com> wrote in message
news:EE02FB16-834E-4A4E-9BEA-99758D7C9BF4(a)microsoft.com...
> Thanks for the response. I notice that the filtered list does not contain
> the
> mirror waits though. Are those something that can be ignored as well, or
> do
> they indicate some bottleneck?
>
> "Andrew J. Kelly" wrote:
>
>> For the most part you can ignore most of these waits:
>>
>> WHERE [wait_type] IN
>> ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
>> 'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
>> 'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' )
>> ;
>>
>> Filter those out of the results and look at the percentages from there.
>> If
>> you are concerned with IO then look at the dmv for physical stats.
>>
>> --
>>
>> Andrew J. Kelly SQL MVP
>> Solid Quality Mentors
>>
>> "GPage" <GPage(a)discussions.microsoft.com> wrote in message
>> news:080AF8A2-0693-4398-9DDA-807B553E5FE5(a)microsoft.com...
>> > We have a server hosting some large data warehouse databases which is
>> > experiencing slowness during the day. We are fairly sure that the main
>> > culprits are poorly written SQL and the SAN. We mirror the databases
>> > from
>> > this server to another to provide an isolated environment for
>> > reporting.
>> >
>> > When we look at the DMVs to try and see why the server is slow we're
>> > not
>> > seeing the IO related issues we expect to see. Based on PAL logs and
>> > other
>> > metrics our disk is very slow and saturated, but the waits we're seeing
>> > as
>> > the highest % look like system waits, and infact buffer IO waits are
>> > under
>> > sleep waits which seems odd. Can I ignore these waits? And how can I
>> > tell
>> > what the real issue is? I've looked up the states in BOL and the
>> > descriptions
>> > are not very helpful.
>> >
>> > The highest waits deal with the mirroring at 36.6% wait time
>> > DBMIRRORING_CMD 99%- BOL states that it's expected for this wait to be
>> > long
>> > DB_MIRROR_SEND 0.55%
>> >
>> > Next down is other at 28%
>> > BROKER_TASK_STOP - 80% seems to be related to mirroring, but I don't
>> > really
>> > know what this is doing. We're not using SQL Broker. Is this an issue
>> > and
>> > what steps should be taken to allieviate it?
>> >
>> > ASYNC_IO_COMPLETION is at 4.78% in this category
>> >
>> >
>> > Sleep is next down at 22% with
>> > SQLTRACE_BUFFER_FLUSH at 49.5% - we have traces running and BOL states
>> > this
>> > is waiting for the trace log to flush to disk. That seems to mean that
>> > it's
>> > waiting on the disk to flush the log, so is this any sort of indicator
>> > for
>> > a
>> > slow disk?
>> >
>> > LAZYWRITER_SLEEP 49%. I assume this means that the lazy writer has
>> > nothing
>> > to do during this period and thus is waiting for a task
>> >
>> > After these waits we get to the BUFFER_IO waits which I'm fairly sure
>> > is
>> > the
>> > actual cause of the problem.
>> >
>> > Are these other waits indicative of a problem on the box? If so how do
>> > I
>> > determine what the problem and solution is?
>> >
>> > Thanks
>> >
>> .
>>