From: Lars Tafjord on
A variable-length column needs to have its nullability property changed:

alter table T alter column c varchar(5000) not null;

The length of 5000 is not changed.

Changing a column's nullability to disallow NULLs should only require a scan
of the table followed by a change to the metadata. The actual data should not
be modified. That seems reasonable, and it is how we have understood it from
written sources on the subject. However, Profiler shows that executing the
statement requires a lot of resources: the table seems to be read twice and
written once. Why?

If the above observation is correct, then in a system where the table has one
billion (1,000,000,000) rows, issuing such a DDL statement is not feasible.
How should we proceed?

--
Lars T
Siemens Healthcare, Norway
From: Gert-Jan Strik on
Lars Tafjord wrote:
> If the above observation is correct, then in a system where the table has one
> billion (1,000,000,000) rows, issuing such a DDL statement is not feasible.

Why not? Obviously it would take quite some time, locking the table in the
process. You could create a medium-sized table and see how it reacts when you
run a statement like that.
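
For example, a rough test along these lines (the table, column, and row count
are made up; scale them to whatever is representative for you):

-- Throwaway test table with a nullable varchar column.
create table dbo.TestNullability (
    id int identity(1,1) primary key,
    c  varchar(5000) null
);

-- Load a couple of million rows via a cross join of a catalog view.
insert into dbo.TestNullability (c)
select top (2000000) replicate('x', 100)
from sys.all_objects a cross join sys.all_objects b;

-- Run the statement under test while watching it in Profiler;
-- note the elapsed time and the locks taken.
alter table dbo.TestNullability alter column c varchar(5000) not null;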

> How should we proceed?

Test and prepare. Calculate the resources and maintenance window you
would need.

Of course there are always alternatives, but IMO the best way to
disallow NULLs is to use the Nullability property.

Alternatives (1 and 2 are sketched after the list):
1. add a CHECK constraint that disallows NULL

2. add a trigger that disallows NULL

3. create a new table with the right definition. Copy all the data. Drop
the original table. Rename the new table.
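
Rough sketches of 1 and 2, assuming the table T and column c from the original
post (the constraint and trigger names are invented):

-- 1. CHECK constraint that rejects NULLs in c.
alter table T
  add constraint CK_T_c_not_null check (c is not null);

-- 2. Trigger that rejects NULLs in c on insert and update.
create trigger trg_T_c_not_null on T
after insert, update
as
if exists (select * from inserted where c is null)
begin
    rollback transaction;
    raiserror('Column c does not allow NULLs.', 16, 1);
end;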

--
Gert-Jan
From: Erland Sommarskog on
Lars Tafjord (lars(dot)tafjord(at)live(dot)no) writes:
> A variable-length column needs to have its nullability property changed:
>
> alter table T alter column c varchar(5000) not null;
>
> The length of 5000 is not changed.
>
> Changing a column's nullability to disallow NULLs should only require a scan
> of the table followed by a change to the metadata. The actual data should not
> be modified. That seems reasonable, and it is how we have understood it from
> written sources on the subject. However, Profiler shows that executing the
> statement requires a lot of resources: the table seems to be read twice and
> written once. Why?

Peeking with Profiler I find that ALTER TABLE emits this command:

UPDATE tbl SET col = col

The purpose is surely to trap NULL values.

In my test table I was not able to detect any actual write activity.
That is, SQL Server is smart enough to avoid writes when the value does
not change.
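
One crude way to double-check this on a test database that is otherwise idle
(log writes show up in these counters too, so treat the numbers as an upper
bound) is to compare the file-level write counters before and after:

-- Flush dirty pages so the baseline is clean.
checkpoint;

select sum(num_of_writes)        as writes,
       sum(num_of_bytes_written) as bytes_written
into   #before
from   sys.dm_io_virtual_file_stats(db_id(), null);

alter table tbl alter column col varchar(5000) not null;

-- Force out anything the ALTER dirtied before measuring again.
checkpoint;

select a.writes        - b.writes        as extra_writes,
       a.bytes_written - b.bytes_written as extra_bytes_written
from   #before as b
cross join (select sum(num_of_writes)        as writes,
                   sum(num_of_bytes_written) as bytes_written
            from   sys.dm_io_virtual_file_stats(db_id(), null)) as a;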

I tested this on both SQL 2005 and SQL 2008. Which version do you have?

> If the above observation is correct, then in a system where the table has one
> billion (1,000,000,000) rows, issuing such a DDL statement is not feasible.
> How should we proceed?

Any table change that is not a metadata-only change presents a major challenge
on a table that size. Particularly if the table must remain readable while the
change is running. Even more so if you must also support writes.

A CHECK constraint like Gert-Jan suggested is an option, but if you want the
optimizer to trust the constraint, you must permit a single scan of the table.
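
For example, building on Gert-Jan's option 1 (same invented constraint name):
WITH NOCHECK skips that scan but leaves the constraint untrusted, and you can
pay for the scan later:

-- No scan of existing rows, but the constraint is not trusted.
alter table T with nocheck
  add constraint CK_T_c_not_null check (c is not null);

-- One scan of the table later, the constraint becomes trusted.
alter table T with check check constraint CK_T_c_not_null;

-- is_not_trusted = 0 once all existing rows have been validated.
select name, is_not_trusted
from   sys.check_constraints
where  parent_object_id = object_id('T');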

The alternative is to take the long road and copy data over. If you need
to support concurrent reads and writes, this can be done, but it sure is
not trivial. Test carefully.
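
A bare-bones outline of that road for a maintenance window with no concurrent
writes (the key column, batch size, and names are all assumptions):

-- New table with the desired definition; assumes an int key named id.
create table dbo.T_new (
    id int           not null primary key,
    c  varchar(5000) not null
);

-- Copy in key-order batches to keep each transaction and the log small.
declare @batch int, @lastid int, @rows int;
set @batch  = 100000;
set @lastid = 0;
set @rows   = 1;

while @rows > 0
begin
    insert into dbo.T_new (id, c)
    select top (@batch) id, c
    from   dbo.T
    where  id > @lastid
    order by id;

    set @rows = @@rowcount;
    select @lastid = max(id) from dbo.T_new;
end;

-- Swap the tables once the copy has been verified.
begin transaction;
    exec sp_rename 'dbo.T', 'T_old';
    exec sp_rename 'dbo.T_new', 'T';
commit;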


--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx