Grouping sysjobhistory [Programming SQL]

From: stavros on 1 Sep 2006 15:04

I'm trying to report on the runtimes of a couple new jobs through
several iterations, testing various code and data changes in each
iteration. The sysjobs and sysjobhistory tables have all the info I
need for this, except I'd like to add a column showing the run
iteration for each job. I'm surprised this isn't already in the
sysjobhistory table; something that ties the individual steps together
into one instance of a job run.

For example, this query returns the data ordered the way I want (faked
data here):

select j.name, jh.step_id, jh.step_name, run_duration
from sysjobhistory jh join sysjobs j
on j.job_id = jh.job_id
order by j.name, jh.instance_id

name step_id step_name run_duration
Job1 1 Job 1 Step 1 30
Job1 2 Job 1 Step 2 30
Job1 3 Job 1 Step 3 30
Job1 0 (Job outcome) 90
Job1 1 Job 1 Step 1 20
Job1 2 Job 1 Step 2 20
Job1 3 Job 1 Step 3 20
Job1 0 (Job outcome) 60
Job2 1 Job 2 Step 1 10
Job2 2 Job 2 Step 2 10
Job2 3 Job 2 Step 3 10
Job2 4 Job 2 Step 4 10
Job2 0 (Job outcome) 40
Job2 1 Job 2 Step 1 15
Job2 2 Job 2 Step 2 15
Job2 3 Job 2 Step 3 15
Job2 4 Job 2 Step 4 15
Job2 0 (Job outcome) 60

But I'd like to add an instance/iteration column:

name iteration step_id step_name run_duration
Job1 1 1 Job 1 Step 1 30
Job1 1 2 Job 1 Step 2 30
Job1 1 3 Job 1 Step 3 30
Job1 1 0 (Job outcome) 90
Job1 2 1 Job 1 Step 1 20
Job1 2 2 Job 1 Step 2 20
Job1 2 3 Job 1 Step 3 20
Job1 2 0 (Job outcome) 60
Job2 1 1 Job 2 Step 1 10
Job2 1 2 Job 2 Step 2 10
Job2 1 3 Job 2 Step 3 10
Job2 1 4 Job 2 Step 4 10
Job2 1 0 (Job outcome) 40
Job2 2 1 Job 2 Step 1 15
Job2 2 2 Job 2 Step 2 15
Job2 2 3 Job 2 Step 3 15
Job2 2 4 Job 2 Step 4 15
Job2 2 0 (Job outcome) 60

I'd be happy with a unique id for each iteration; it doesn't have to
reset for each job_name the way I've shown. This is on 2005, and I
thought the ranking functions might help (DENSE_RANK), but I can't
figure out how to make it work. Any thoughts?

From: Anith Sen on 2 Sep 2006 02:40

Have you looked into the possibility of using the ROLLUP operation with
GROUP BY?

--
Anith

From: Erland Sommarskog on 2 Sep 2006 07:21

stavros (stavros(a)mailinator.com) writes:
> I'm trying to report on the runtimes of a couple new jobs through
> several iterations, testing various code and data changes in each
> iteration. The sysjobs and sysjobhistory tables have all the info I
> need for this, except I'd like to add a column showing the run
> iteration for each job. I'm surprised this isn't already in the
> sysjobhistory table; something that ties the individual steps together
> into one instance of a job run.
>
> For example, this query returns the data ordered the way I want (faked
> data here):
>
> select j.name, jh.step_id, jh.step_name, run_duration
> from sysjobhistory jh join sysjobs j
> on j.job_id = jh.job_id
> order by j.name, jh.instance_id
>...
> I'd be happy with a unique id for each iteration; it doesn't have to
> reset for each job_name the way I've shown. This is on 2005, and I
> thought the ranking functions might help (DENSE_RANK), but I can't
> figure out how to make it work. Any thoughts?

The problem is that the sysjobhistory does not seemed to be design to
cope with this sort of queries. A proper data model would probably have
add sysjobhistory and sysjobstephistory. Trying to get the data out
from this demorlalised design is not that easy. (But I should add that
I have not worked much with sysjobhistory.)

Here is a query that I came up with:

WITH jobhist AS (
SELECT job_id, step_id, step_name, run_duration,
run_date, run_time, instance_id,
sortkey = row_number()
OVER (ORDER BY run_date, run_time,
CASE WHEN step_id = 0 THEN -10000
ELSE instance_id
END)
FROM sysjobhistory
)
select j.name, jh.step_id, jh.step_name, jh.run_duration, jh.run_date,
jh.run_time, jh.instance_id, jh.sortkey, jh2.sortkey,
iteration = dense_rank() over (ORDER BY j.name, jh2.sortkey)
from jobhist jh
join jobhist jh2 ON jh.job_id = jh2.job_id
AND jh2.sortkey = (SELECT MAX(jh3.sortkey)
FROM jobhist jh3
WHERE jh3.job_id = jh2.job_id
AND jh3.step_id = 0
AND jh.sortkey >= jh3.sortkey)
join sysjobs j on j.job_id = jh.job_id
order by j.name, jh.sortkey

As you see it's quite messy. It could be simplified if you make the
assumption that instance id is growing monotonically, and if you
use step 1 as your baseline and not step 0. (As step 0 appears to
be written last to the table.)

You could still get funky results if the same job runs twice in the
same second, which could happen for jobs triggered by WMI events I
guess.

Once you have the grouping in place, using dense_rank() is trivial.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

From: stavros on 5 Sep 2006 18:37

Thanks very much. You're right, it's a little messy, but a good
introduction to common table expressions. And it works! Cheers...

Erland Sommarskog wrote:
> stavros (stavros(a)mailinator.com) writes:
> > I'm trying to report on the runtimes of a couple new jobs through
> > several iterations, testing various code and data changes in each
> > iteration. The sysjobs and sysjobhistory tables have all the info I
> > need for this, except I'd like to add a column showing the run
> > iteration for each job. I'm surprised this isn't already in the
> > sysjobhistory table; something that ties the individual steps together
> > into one instance of a job run.
> >
> > For example, this query returns the data ordered the way I want (faked
> > data here):
> >
> > select j.name, jh.step_id, jh.step_name, run_duration
> > from sysjobhistory jh join sysjobs j
> > on j.job_id = jh.job_id
> > order by j.name, jh.instance_id
> >...
> > I'd be happy with a unique id for each iteration; it doesn't have to
> > reset for each job_name the way I've shown. This is on 2005, and I
> > thought the ranking functions might help (DENSE_RANK), but I can't
> > figure out how to make it work. Any thoughts?
>
> The problem is that the sysjobhistory does not seemed to be design to
> cope with this sort of queries. A proper data model would probably have
> add sysjobhistory and sysjobstephistory. Trying to get the data out
> from this demorlalised design is not that easy. (But I should add that
> I have not worked much with sysjobhistory.)
>
> Here is a query that I came up with:
>
> WITH jobhist AS (
> SELECT job_id, step_id, step_name, run_duration,
> run_date, run_time, instance_id,
> sortkey = row_number()
> OVER (ORDER BY run_date, run_time,
> CASE WHEN step_id = 0 THEN -10000
> ELSE instance_id
> END)
> FROM sysjobhistory
> )
> select j.name, jh.step_id, jh.step_name, jh.run_duration, jh.run_date,
> jh.run_time, jh.instance_id, jh.sortkey, jh2.sortkey,
> iteration = dense_rank() over (ORDER BY j.name, jh2.sortkey)
> from jobhist jh
> join jobhist jh2 ON jh.job_id = jh2.job_id
> AND jh2.sortkey = (SELECT MAX(jh3.sortkey)
> FROM jobhist jh3
> WHERE jh3.job_id = jh2.job_id
> AND jh3.step_id = 0
> AND jh.sortkey >= jh3.sortkey)
> join sysjobs j on j.job_id = jh.job_id
> order by j.name, jh.sortkey
>
> As you see it's quite messy. It could be simplified if you make the
> assumption that instance id is growing monotonically, and if you
> use step 1 as your baseline and not step 0. (As step 0 appears to
> be written last to the table.)
>
> You could still get funky results if the same job runs twice in the
> same second, which could happen for jobs triggered by WMI events I
> guess.
>
> Once you have the grouping in place, using dense_rank() is trivial.
>
>
> --
> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

|
Pages: 1
Prev: Reading .LDF
Next: Working Now...