From: James Hunter Ross on
We are running a big (6 million entities) conversion for a customer. Rather
than BCP or anything "streamlined", we need to use our standard INSERT
procedure to ensure complex business rules are enforced. The procedure is
quite complex: many checks are done prior to the actual record INSERT, and
some of these checks are queries against the table(s) being inserted into.
(Presently it takes 30 hours to complete; we hope to pull it down under 20.)

Performance during processing is far from linear. Our own experiments show
that raw INSERTs into the main table are fairly linear (which to me rules
out, or at least down-weights, index maintenance, page splitting, etc.).
Until I can spend more time actually profiling things, I thought I'd pose a
question.

Do you think there is any value in clearing the query and/or procedure cache
periodically throughout processing? Is it even possible? My thinking is
that plans that were good while the main table was small are not so good once
millions of records exist.

I know it's pretty vague, but any loose thoughts of yours will be very
much appreciated!

Thanks, James


From: Erland Sommarskog on
James Hunter Ross (james.ross(a)oneilsoft.com) writes:
> We are running a big (6 million entities) conversion for a customer.
> Rather than BCP or anything "streamlined", we need to use our standard
> INSERT procedure to ensure complex business rules are enforced. The
> procedure is quite complex: many checks are done prior to the actual
> record INSERT, and some of these checks are queries against the table(s)
> being inserted into. (Presently it takes 30 hours to complete; we hope
> to pull it down under 20.)

Hah, that's nothing! I have done similar exercises, and for the biggest
conversion, it took six days to execute.

Anyway, I perfectly understand why you take the path you take, as we
did the same.

> Performance during processing is far from linear. Our own experiments
> show that raw INSERTs into the main table are fairly linear (which to me
> rules out, or at least down-weights, index maintenance, page splitting,
> etc.). Until I can spend more time actually profiling things, I thought
> I'd pose a question.
>
> Do you think there is any value in clearing the query and/or procedure
> cache periodically throughout processing? Is it even possible? My
> thinking is that plans that were good while the main table was small are
> not so good once millions of records exist.

What we did in our conversions was to periodically run reindexing on the
tables that we wrote to. By reindexing, we also got the statistics updated
with fullscan, not just sampled. And as you probably know, new statistics
trigger recompiles.

The reindexing was part of the conversion job. I think the logic was to
rebuild the indexes for every three months of data we passed. (Our job
replays the old transactions day by day, so to speak.)
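
On SQL 2005 and later, that periodic rebuild step can be as simple as this
(table name is a placeholder for whichever table you are loading):

```sql
-- Rebuild all indexes on the table; this rewrites the index pages and,
-- as a side effect, rebuilds the index statistics with a full scan.
ALTER INDEX ALL ON dbo.MainTable REBUILD;

-- Auto-created column statistics are not tied to an index and are not
-- touched by the rebuild, so refresh those separately.
UPDATE STATISTICS dbo.MainTable WITH FULLSCAN, COLUMNS;
```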

Another thing our job did was to run backups at certain points, so that if
something went very wrong, we did not always have to start over from the
beginning.
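
A sketch of what such a checkpoint could look like (database name and path
are made up for the example):

```sql
-- Take a full backup at a known-good point in the conversion, so that
-- a failure later means restoring to here rather than starting over.
BACKUP DATABASE ConversionDB
    TO DISK = 'E:\Backups\ConversionDB_checkpoint.bak'
    WITH INIT, CHECKSUM;
```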


--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx