|
From: Chris on 17 Jul 2008 12:01 Hi, We have a table called Clients that allows up to 4 telephone numbers to be entered. I am trying to write a query that will identify if another client has the same telephone number. I need to look in any of the 4 fields. E.g. If Client 1 has a tel no of 555555555 in TelNo1ID column and Client 2 has a tel no of 555555555 in TelNo3ID I want to bring these up as a duplicate. Note: Both clients should be listed. This is the query I have at the moment... SELECT * FROM dbo.Clients C WHERE EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <> C.ClientSerialNo AND (D.TelNo1ID = C.TelNo1ID OR D.TelNo1ID = C.TelNo2ID OR D.TelNo1ID = C.TelNo3ID OR D.TelNo1ID = C.MobileNoID)) OR EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <> C.ClientSerialNo AND (D.TelNo2ID = C.TelNo1ID OR D.TelNo2ID = C.TelNo2ID OR D.TelNo2ID = C.TelNo3ID OR D.TelNo2ID = C.MobileNoID)) OR EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <> C.ClientSerialNo AND (D.TelNo3ID = C.TelNo1ID OR D.TelNo3ID = C.TelNo2ID OR D.TelNo3ID = C.TelNo3ID OR D.TelNo3ID = C.MobileNoID)) OR EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <> C.ClientSerialNo AND (D.MobileNoID = C.TelNo1ID OR D.MobileNoID = C.TelNo2ID OR D.MobileNoID = C.TelNo3ID OR D.MobileNoID = C.MobileNoID)) It works but on large data files it is incredibly slow. This is the line that is slow with my current data... EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <> C.ClientSerialNo AND (D.TelNo3ID = C.TelNo1ID OR D.TelNo3ID = C.TelNo2ID OR D.TelNo3ID = C.TelNo3ID OR D.TelNo3ID = C.MobileNoID)) OR .... More specifically if I remove "D.TelNo3ID = C.TelNo1ID OR" from the query it is suddenly fast. I am assuming here (not sure if this is the reason) this is happening because TelNo1ID column has lots of telephone numbers and TelNo3ID has just a few numbers and lots of NULLS. The data has 90000 clients and TelNo1ID is populated for 76000 telephone numbers and TelNo3ID is populated for only 250 telephone numbers. All 4 telephone number columns are indexed. Is there a much better way around this? Thanks, Chris
From: Eric Isaacs on 17 Jul 2008 12:27 > Is there a much better way around this? The better way around this is in the table design, not in the query design. Instead of adding 4 columns to the same table, create a new table with ClientSerialNo, TelNoID The serial number for the client identifies the client and the telephone number id identifies the telephone number record in what I assume is the telephone number table. With that type of design, you can have 1, 2, 3, 4, or more telephone numbers for a client and queriying for duplicates is as easy as this... SELECT ct1.ClientSerialNo AS Client1, ct2.ClientSerialNo AS Client2, ct1.TelNoID AS DuplicateTelephoneNumber FROM ClientTelNo AS ct1 INNER JOIN ClientTelNo ct1 ON ct1.TelNoID = ct2.TelNoID AND ct1.ClientSerialNo <> ct2.ClientSerialNo -Eric Isaacs
From: Chris on 17 Jul 2008 12:49 Hi Eric, Thanks for that. I know you are right and that is the way forwards. I was just after an easier solution until I've got the time to change the structure! Might have to do it now, as I will need to do it soon rather than later. Thanks, Chris "Eric Isaacs" <eisaacs(a)gmail.com> wrote in message news:2c3e10d2-5476-4ca2-a1b1-72001600d242(a)k13g2000hse.googlegroups.com... >> Is there a much better way around this? > > The better way around this is in the table design, not in the query > design. Instead of adding 4 columns to the same table, create a new > table with > > ClientSerialNo, > TelNoID > > The serial number for the client identifies the client and the > telephone number id identifies the telephone number record in what I > assume is the telephone number table. > > With that type of design, you can have 1, 2, 3, 4, or more telephone > numbers for a client and queriying for duplicates is as easy as > this... > > SELECT > ct1.ClientSerialNo AS Client1, > ct2.ClientSerialNo AS Client2, > ct1.TelNoID AS DuplicateTelephoneNumber > FROM > ClientTelNo AS ct1 > INNER JOIN ClientTelNo ct1 ON ct1.TelNoID = ct2.TelNoID AND > ct1.ClientSerialNo <> ct2.ClientSerialNo > > -Eric Isaacs > >
From: Payson on 17 Jul 2008 13:16 > > Thanks for that. I know you are right and that is the way forwards. I was > just after an easier solution until I've got the time to change the > structure! Might have to do it now, as I will need to do it soon rather > than later. > You can use Eric's basic approach to clean your existing data. Try something like: (untested) SELECT ClientSerialNo, TelNoId, COUNT(*) FROM ( SELECT ClientSerialNo, TelNo1ID AS TelNoId FROM dbo.Clients C UNION ALL SELECT ClientSerialNo, TelNo2ID AS TelNoId FROM dbo.Clients C UNION ALL SELECT ClientSerialNo, TelNo3ID AS TelNoId FROM dbo.Clients C UNION ALL SELECT ClientSerialNo, TelNo4ID AS TelNoId FROM dbo.Clients C ) CT GROUP BY ClientSerialNo, TelNoId HAVING COUNT(*) > 1 HTH Payson
From: Payson on 17 Jul 2008 14:09 Just noticed your statement that some of the columns have a lot of nulls. To speed things up, you might add "WHERE TelNo?ID IS NOT NULL" to each of the selects in the union query. Payson
|
Pages: 1 Prev: Handling null dates ? Next: How to display free spaces after massive truncate. |