From: obiron on
Hi guys,

first post here.

I have the following problem
I have records in a source table which I have pulled out to a temp table and
built some indexes. As part of this extract I have ranked the records using
the rank() over funcitonality to force a sequence key resetting on
PATHWAY_ID. I am now trying to validate that the records have been entered
in the correct order. For example. It is incorrect to have a main_code = 90
if you have not had a main_code = 30. The following code is correctly
identifying every PATHWAY_ID where this is occuring

select '90 with no 30' as 'error',* from #PATHWAY where Patnt_pathwauy_id in
(
SELECT pathway_id from #PATHWAY X where main_code = 90 and Patnt_pathway_id
not in
(
SELECT pathway_id from #PATHWAY Y where Y.main_code = 30 and Y.rank < X.
rank
)
)

in English, get all the data where the main_code is 90 and there is not a
record with a 30 with a higher ranked number.

The problem is that there may be any number of 90s before the 30 and I only
really need the inner query to continue processing if there is one and then
it can bomb out. I know that once I have the results set I could SELECT
DISTINCT but I am looking to save time by not processing every record.

E.g. if the pathway is

10,20,90,90,90,30 then currently the middle select statement fires three
times, but I only need it to fire once because I will get an error on the
first 90 code.

In the above example, I suppose I could use MIN() to find the first 90 and
first 30 and only compare them. On some of the more complex validations
though, a 10 woudl restart the sequence and so I would be looking for a 90
after a 10 but before a 30 (with any codes in between the 10 and 90 and any
codes between the 90 and the 30) which would make life more difficult as I
would have to find the smallest 30 that is larger than the largest 10 which
is smaller than the 90....

The data set is 385000 records on 82365 pathways and it is therefore doing
lots of nested loops which are unneccessary. This query runs in 46 minutes,
returning 60200 rows and I have about 20 of these sequence checks to carry
out, some of which may require 4 or 5 passes of the pathway and the current
performance is not acceptable.

I have already built indexes against the temp table based on pathway,rank,
pathway,main_code and patway,main_code,rank.

Can anyone suggest a way to speed this up, either with better indexes or a
way of forcing the query to drop out as soon as the error is discovered on
the pathway




Thanks in advance

Obiron





Development environment is SQL2005