From: SnapDive on
I have a table with only a simple int pk. I do a merge operation and
need to compare several columns in order to know if I need to 'when
not matched then insert'. There are 20,000 rows+ in the source table
(a temp table) and a few million in the destination table. The table
schemata are the same.

I know the approx number of unique values in each table, and think I
need to create a unique index across all of the columns. Is that true?
What is the right syntax for doing that?

-- ColumnA: 167 unique values in #temptable
-- ColumnB: 1 unique values in #temptable
-- ColumnC: 13 unique values in #temptable
-- ColumnD: 89 unique values in #temptable
-- ColumnE: 550 unique values in #temptable
-- ColumnF: 487 unique values in #temptable

-- ColumnA: 3690 unique values in desttable
-- ColumnB: 3 unique values (plus null is possible) in desttable
-- ColumnC: 1113 unique values in desttable
-- ColumnD: 2662 unique values in desttable
-- ColumnE: 1770 unique values in desttable
-- ColumnF: 1480 unique values in desttable
From: Erland Sommarskog on
SnapDive (SnapDive(a)community.nospam) writes:
> I have a table with only a simple int pk. I do a merge operation and
> need to compare several columns in order to know if I need to 'when
> not matched then insert'. There are 20,000 rows+ in the source table
> (a temp table) and a few million in the destination table. The table
> schemata are the same.
>
> I know the approx number of unique values in each table, and think I
> need to create a unique index across all of the columns. Is that true?
> What is the right syntax for doing that?

Not really sure why you would need to create an index over all
the columns, although creating the index on the desttable may be good for
performance.

The syntax? You did not try Books Online? Seems like a more efficient way to
get answer of a question than waiting for a response in a newsgroup. But
here it goes:

CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...)



--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

From: SnapDive on
"synatx" was not the right word. Should have used strategy. One thing
that came to mind was creating a new varchar(42) column and stuffing
the sha1 hash of the column values in there, then I could have a
simple unique index on that single column instead of a unique index on
the 6 columns. Not sure how that would perform though.




On Thu, 29 Jul 2010 00:07:15 +0200, Erland Sommarskog
<esquel(a)sommarskog.se> wrote:

>SnapDive (SnapDive(a)community.nospam) writes:
>> I have a table with only a simple int pk. I do a merge operation and
>> need to compare several columns in order to know if I need to 'when
>> not matched then insert'. There are 20,000 rows+ in the source table
>> (a temp table) and a few million in the destination table. The table
>> schemata are the same.
>>
>> I know the approx number of unique values in each table, and think I
>> need to create a unique index across all of the columns. Is that true?
>> What is the right syntax for doing that?
>
>Not really sure why you would need to create an index over all
>the columns, although creating the index on the desttable may be good for
>performance.
>
>The syntax? You did not try Books Online? Seems like a more efficient way to
>get answer of a question than waiting for a response in a newsgroup. But
>here it goes:
>
> CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...)
>

From: Dan on
I'm not sure if it would perform better, but how about a computed column
using CHECKSUM for the columns you need to compare - this might be quicker
than a "roll your own" hashing solution.

Dan


"SnapDive" <SnapDive(a)community.nospam> wrote in message
news:8nf156p8jaf05ueco0vmsc167qltm0hi16(a)4ax.com...
> "synatx" was not the right word. Should have used strategy. One thing
> that came to mind was creating a new varchar(42) column and stuffing
> the sha1 hash of the column values in there, then I could have a
> simple unique index on that single column instead of a unique index on
> the 6 columns. Not sure how that would perform though.
>
>
>
>
> On Thu, 29 Jul 2010 00:07:15 +0200, Erland Sommarskog
> <esquel(a)sommarskog.se> wrote:
>
>>SnapDive (SnapDive(a)community.nospam) writes:
>>> I have a table with only a simple int pk. I do a merge operation and
>>> need to compare several columns in order to know if I need to 'when
>>> not matched then insert'. There are 20,000 rows+ in the source table
>>> (a temp table) and a few million in the destination table. The table
>>> schemata are the same.
>>>
>>> I know the approx number of unique values in each table, and think I
>>> need to create a unique index across all of the columns. Is that true?
>>> What is the right syntax for doing that?
>>
>>Not really sure why you would need to create an index over all
>>the columns, although creating the index on the desttable may be good for
>>performance.
>>
>>The syntax? You did not try Books Online? Seems like a more efficient way
>>to
>>get answer of a question than waiting for a response in a newsgroup. But
>>here it goes:
>>
>> CREATE UNIQUE INDEX my_index ON desttable(ColumnA, ColumnB, ...)
>>
>



From: Erland Sommarskog on
SnapDive (SnapDive(a)community.nospam) writes:
> "synatx" was not the right word. Should have used strategy. One thing
> that came to mind was creating a new varchar(42) column and stuffing
> the sha1 hash of the column values in there, then I could have a
> simple unique index on that single column instead of a unique index on
> the 6 columns. Not sure how that would perform though.

I would go for the index on the six columns, at least at first. You know,
keep it simple.

Dan suggested CHECKSUM(), but with a few million rows in the destination
table, you are bound to get rows with the same checksum, even if they
have different content. A 32-bit value is not enough for that volume.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx