From: DavidC on
I have a stored procedure that is suppose to find duplicate entries in a
table. I am using a subselect and GROUP BY to check for duplicates. The
code below is the section of my WHERE clause that is suppose to find
duplicates but a few are showing up that are not duplicates. The GROUP BY in
the subselect indicates the columns that must be the same for a duplicate.
Can anyone see what I am missing that returns fake duplicates? I can post
the table schema if helpful but I didn't think it was needed in this case.
Thank you.

AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T'
WHEN @OnlyDuplicates = 1 THEN
CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID
FROM dbo.Timesheets INNER JOIN
dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID =
dbo.PeopleLink.PeopleLinkID
WHERE (dbo.Timesheets.CheckID = 0)
GROUP BY dbo.Timesheets.WorkerLinkID,
dbo.Timesheets.ClientLinkID,
dbo.Timesheets.ServiceID,
dbo.Timesheets.Week2Ending,
dbo.Timesheets.PayRate,
dbo.Timesheets.AuthNumber,
dbo.PeopleLink.Branch
HAVING (COUNT(*) > 1)
AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T'
ELSE 'F'
END
END = 'T')

--
David
From: John Bell on
On Fri, 7 May 2010 09:54:01 -0700, DavidC <dlchase(a)lifetimeinc.com>
wrote:

>I have a stored procedure that is suppose to find duplicate entries in a
>table. I am using a subselect and GROUP BY to check for duplicates. The
>code below is the section of my WHERE clause that is suppose to find
>duplicates but a few are showing up that are not duplicates. The GROUP BY in
>the subselect indicates the columns that must be the same for a duplicate.
>Can anyone see what I am missing that returns fake duplicates? I can post
>the table schema if helpful but I didn't think it was needed in this case.
>Thank you.
>
>AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T'
> WHEN @OnlyDuplicates = 1 THEN
> CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID
> FROM dbo.Timesheets INNER JOIN
> dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID =
>dbo.PeopleLink.PeopleLinkID
> WHERE (dbo.Timesheets.CheckID = 0)
> GROUP BY dbo.Timesheets.WorkerLinkID,
> dbo.Timesheets.ClientLinkID,
> dbo.Timesheets.ServiceID,
> dbo.Timesheets.Week2Ending,
> dbo.Timesheets.PayRate,
> dbo.Timesheets.AuthNumber,
> dbo.PeopleLink.Branch
> HAVING (COUNT(*) > 1)
> AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T'
> ELSE 'F'
> END
> END = 'T')


Hi

You will certainly get false posititves for any WorkerLinkID that has
any duplicates but not necessarily the combination of WorkerLinkID,
ClientLinkID, ServiceID, Week2Ending, PayRate, AuthNumbe and Branch
returned from the main query. Try corrolating the Timesheets and
PeopleLink columns between the sub-query and the main query and
checking the branch in the where clause although that should be in the
main where clause anyhow!#

John


From: --CELKO-- on
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. If you know how, follow ISO-11179 data element naming
conventions and formatting rules. Temporal data should use ISO-8601
formats. Code should be in Standard SQL as much as possible and not
local dialect.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html

I am also trying to figure out what a "link_id" is and why it has two
different roles. To me, a link is part of a pointer chain from the old
network databases and an "id" is an identifier. I understand
"Personnel" but not what kind of entity a "PeopleLink" might be -- a
chain gang?

You might want to look up "flag coupling" in any book on Software
Engineering, since your "@only_duplicates" looks like this kind of
design problem.

From: Geoff Schaller on
Still acting like a traffic cop...

No-one's going to look for the tiny morsel of intelligence you tried to
offer at the end. Once they see the diatribe rolling out again it is a
simple invitation to stop reading altogether.

But keep practising. You might get it right one day.


From: Gert-Jan Strik on
In which table are you trying to find duplicates? In Timesheets or in
PeopleLink?

If PeopleLink.PeopleLinkID is not unique you will almost certainly get
duplicates.

Also, you might get fake duplicates if Timesheets.WorkerLinkID is not
unique. If WorkerLinkID is not unique, then you are not using a key,
which will go wrong. If the key is comprised of more than one element,
then you might have to rewrite you IN() to an EXISTS().

--
Gert-Jan


DavidC wrote:
>
> I have a stored procedure that is suppose to find duplicate entries in a
> table. I am using a subselect and GROUP BY to check for duplicates. The
> code below is the section of my WHERE clause that is suppose to find
> duplicates but a few are showing up that are not duplicates. The GROUP BY in
> the subselect indicates the columns that must be the same for a duplicate.
> Can anyone see what I am missing that returns fake duplicates? I can post
> the table schema if helpful but I didn't think it was needed in this case.
> Thank you.
>
> AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T'
> WHEN @OnlyDuplicates = 1 THEN
> CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID
> FROM dbo.Timesheets INNER JOIN
> dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID =
> dbo.PeopleLink.PeopleLinkID
> WHERE (dbo.Timesheets.CheckID = 0)
> GROUP BY dbo.Timesheets.WorkerLinkID,
> dbo.Timesheets.ClientLinkID,
> dbo.Timesheets.ServiceID,
> dbo.Timesheets.Week2Ending,
> dbo.Timesheets.PayRate,
> dbo.Timesheets.AuthNumber,
> dbo.PeopleLink.Branch
> HAVING (COUNT(*) > 1)
> AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T'
> ELSE 'F'
> END
> END = 'T')
>
> --
> David