From: Chris on
Hi,

We have a table called Clients that allows up to 4 telephone numbers to be
entered. I am trying to write a query that will identify if another client
has the same telephone number. I need to look in any of the 4 fields. E.g.
If Client 1 has a tel no of 555555555 in TelNo1ID column and Client 2 has a
tel no of 555555555 in TelNo3ID I want to bring these up as a duplicate.
Note: Both clients should be listed.

This is the query I have at the moment...

SELECT *
FROM dbo.Clients C
WHERE EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <>
C.ClientSerialNo AND (D.TelNo1ID = C.TelNo1ID OR D.TelNo1ID = C.TelNo2ID OR
D.TelNo1ID = C.TelNo3ID OR D.TelNo1ID = C.MobileNoID)) OR
EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <>
C.ClientSerialNo AND (D.TelNo2ID = C.TelNo1ID OR D.TelNo2ID = C.TelNo2ID OR
D.TelNo2ID = C.TelNo3ID OR D.TelNo2ID = C.MobileNoID)) OR
EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <>
C.ClientSerialNo AND (D.TelNo3ID = C.TelNo1ID OR D.TelNo3ID = C.TelNo2ID OR
D.TelNo3ID = C.TelNo3ID OR D.TelNo3ID = C.MobileNoID)) OR
EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <>
C.ClientSerialNo AND (D.MobileNoID = C.TelNo1ID OR D.MobileNoID = C.TelNo2ID
OR D.MobileNoID = C.TelNo3ID OR D.MobileNoID = C.MobileNoID))

It works but on large data files it is incredibly slow. This is the line
that is slow with my current data...

EXISTS (SELECT * FROM dbo.Clients D WHERE D.ClientSerialNo <>
C.ClientSerialNo AND (D.TelNo3ID = C.TelNo1ID OR D.TelNo3ID = C.TelNo2ID OR
D.TelNo3ID = C.TelNo3ID OR D.TelNo3ID = C.MobileNoID)) OR

.... More specifically if I remove "D.TelNo3ID = C.TelNo1ID OR" from the
query it is suddenly fast. I am assuming here (not sure if this is the
reason) this is happening because TelNo1ID column has lots of telephone
numbers and TelNo3ID has just a few numbers and lots of NULLS. The data has
90000 clients and TelNo1ID is populated for 76000 telephone numbers and
TelNo3ID is populated for only 250 telephone numbers.

All 4 telephone number columns are indexed.

Is there a much better way around this?

Thanks,
Chris



From: Eric Isaacs on
> Is there a much better way around this?

The better way around this is in the table design, not in the query
design. Instead of adding 4 columns to the same table, create a new
table with

ClientSerialNo,
TelNoID

The serial number for the client identifies the client and the
telephone number id identifies the telephone number record in what I
assume is the telephone number table.

With that type of design, you can have 1, 2, 3, 4, or more telephone
numbers for a client and queriying for duplicates is as easy as
this...

SELECT
ct1.ClientSerialNo AS Client1,
ct2.ClientSerialNo AS Client2,
ct1.TelNoID AS DuplicateTelephoneNumber
FROM
ClientTelNo AS ct1
INNER JOIN ClientTelNo ct1 ON ct1.TelNoID = ct2.TelNoID AND
ct1.ClientSerialNo <> ct2.ClientSerialNo

-Eric Isaacs


From: Chris on
Hi Eric,

Thanks for that. I know you are right and that is the way forwards. I was
just after an easier solution until I've got the time to change the
structure! Might have to do it now, as I will need to do it soon rather
than later.

Thanks,
Chris


"Eric Isaacs" <eisaacs(a)gmail.com> wrote in message
news:2c3e10d2-5476-4ca2-a1b1-72001600d242(a)k13g2000hse.googlegroups.com...
>> Is there a much better way around this?
>
> The better way around this is in the table design, not in the query
> design. Instead of adding 4 columns to the same table, create a new
> table with
>
> ClientSerialNo,
> TelNoID
>
> The serial number for the client identifies the client and the
> telephone number id identifies the telephone number record in what I
> assume is the telephone number table.
>
> With that type of design, you can have 1, 2, 3, 4, or more telephone
> numbers for a client and queriying for duplicates is as easy as
> this...
>
> SELECT
> ct1.ClientSerialNo AS Client1,
> ct2.ClientSerialNo AS Client2,
> ct1.TelNoID AS DuplicateTelephoneNumber
> FROM
> ClientTelNo AS ct1
> INNER JOIN ClientTelNo ct1 ON ct1.TelNoID = ct2.TelNoID AND
> ct1.ClientSerialNo <> ct2.ClientSerialNo
>
> -Eric Isaacs
>
>


From: Payson on
>
> Thanks for that.  I know you are right and that is the way forwards.  I was
> just after an easier solution until I've got the time to change the
> structure!  Might have to do it now, as I will need to do it soon rather
> than later.
>

You can use Eric's basic approach to clean your existing data. Try
something like: (untested)

SELECT ClientSerialNo, TelNoId, COUNT(*)
FROM (
SELECT ClientSerialNo, TelNo1ID AS TelNoId
FROM dbo.Clients C
UNION ALL
SELECT ClientSerialNo, TelNo2ID AS TelNoId
FROM dbo.Clients C
UNION ALL
SELECT ClientSerialNo, TelNo3ID AS TelNoId
FROM dbo.Clients C
UNION ALL
SELECT ClientSerialNo, TelNo4ID AS TelNoId
FROM dbo.Clients C ) CT
GROUP BY ClientSerialNo, TelNoId
HAVING COUNT(*) > 1

HTH

Payson
From: Payson on
Just noticed your statement that some of the columns have a lot of
nulls. To speed things up, you might add "WHERE TelNo?ID IS NOT NULL"
to each of the selects in the union query.

Payson