From: Christoffer on
Hello, one or two of my colleague has managed to break a foreign key in our
database, I'd like some help to understand how this have happened. This might
be a long story so bear with me :)

There are two tables involved in this. The "DataRepository" that holds data
that are to be sent to a "DataAddress". Thus the DataRepository has a foreign
key to DataAddress. The DataRepository has a couple of columns, but the
important ones is Dar_Id (PK, bigint) and the Dar_Daa_Id (FK, bigint). The
DataAddres has a Daa_Id (PK, bigint). The foreign key is set between
Dar_Daa_Id and Daa_Id.

The story is that the DataRepository and DataAddress tables grew quickly in
our production systems because a daily clean up job wasn't started. This was
discovered when the server was low on disk space. The clean up job was
started and ran a stored procedures that did two things:

DELETE FROM [DataRepository]
WHERE [Dar_Id] NOT IN
(SELECT [Meq_Dar_Id] FROM [MessageQueue]) AND [Dar_CreateDate] <
DATEADD(hour, -1, getdate())

DELETE FROM [DataAddress]
WHERE [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Sender] FROM [DataRepository]
WHERE [Dar_Daa_Id_Sender] IS NOT NULL)
AND [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Receiver] FROM [DataRepository]
WHERE [Dar_Daa_Id_Receiver] IS NOT NULL)

The first statement pretty much deletes data from DataRepository that is not
in use and is older than one hour. The second deletes data from DataAddress
that isn't used be DataRepository.

After the clean up job (took about 55 minutes), we ended up with 20 rows in
DataRepository that does not exist in DataAddress(!). The rows were created
when the clean up job had just finished and were created in less than a
minute. I've checked the sys.foreign_keys table and both is_disabled and
is_not_trusted is 0.

I'm not sure what the next step is, how can this happen?

Cheers,
Chris

From: John Bell on
On Tue, 27 Apr 2010 23:36:01 -0700, Christoffer
<Christoffer(a)discussions.microsoft.com> wrote:

>Hello, one or two of my colleague has managed to break a foreign key in our
>database, I'd like some help to understand how this have happened. This might
>be a long story so bear with me :)
>
>There are two tables involved in this. The "DataRepository" that holds data
>that are to be sent to a "DataAddress". Thus the DataRepository has a foreign
>key to DataAddress. The DataRepository has a couple of columns, but the
>important ones is Dar_Id (PK, bigint) and the Dar_Daa_Id (FK, bigint). The
>DataAddres has a Daa_Id (PK, bigint). The foreign key is set between
>Dar_Daa_Id and Daa_Id.
>
>The story is that the DataRepository and DataAddress tables grew quickly in
>our production systems because a daily clean up job wasn't started. This was
>discovered when the server was low on disk space. The clean up job was
>started and ran a stored procedures that did two things:
>
>DELETE FROM [DataRepository]
>WHERE [Dar_Id] NOT IN
>(SELECT [Meq_Dar_Id] FROM [MessageQueue]) AND [Dar_CreateDate] <
>DATEADD(hour, -1, getdate())
>
>DELETE FROM [DataAddress]
>WHERE [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Sender] FROM [DataRepository]
>WHERE [Dar_Daa_Id_Sender] IS NOT NULL)
>AND [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Receiver] FROM [DataRepository]
>WHERE [Dar_Daa_Id_Receiver] IS NOT NULL)
>
>The first statement pretty much deletes data from DataRepository that is not
>in use and is older than one hour. The second deletes data from DataAddress
>that isn't used be DataRepository.
>
>After the clean up job (took about 55 minutes), we ended up with 20 rows in
>DataRepository that does not exist in DataAddress(!). The rows were created
>when the clean up job had just finished and were created in less than a
>minute. I've checked the sys.foreign_keys table and both is_disabled and
>is_not_trusted is 0.
>
>I'm not sure what the next step is, how can this happen?
>
>Cheers,
>Chris


Hi

You don't say how the data is inserted! To be consistent the delete
statements should be in a single transaction.

Posting DDL would be a lot clearer and easier to understand than your
description. Example data to re-create the scenario would also be
useful.

What does not seem to fit is that you are deleting from DataAddress
when Daa_id does not match Dar_Daa_id_Sender or Dar_Daa_id_Receiver
not Dar_Daa_ID which is where you say the FK defined.

John
From: Christoffer on
> On Tue, 27 Apr 2010 23:36:01 -0700, Christoffer
> <Christoffer(a)discussions.microsoft.com> wrote:
>
> >Hello, one or two of my colleague has managed to break a foreign key in our
> >database, I'd like some help to understand how this have happened. This might
> >be a long story so bear with me :)
> >
> >There are two tables involved in this. The "DataRepository" that holds data
> >that are to be sent to a "DataAddress". Thus the DataRepository has a foreign
> >key to DataAddress. The DataRepository has a couple of columns, but the
> >important ones is Dar_Id (PK, bigint) and the Dar_Daa_Id (FK, bigint). The
> >DataAddres has a Daa_Id (PK, bigint). The foreign key is set between
> >Dar_Daa_Id and Daa_Id.
> >
> >The story is that the DataRepository and DataAddress tables grew quickly in
> >our production systems because a daily clean up job wasn't started. This was
> >discovered when the server was low on disk space. The clean up job was
> >started and ran a stored procedures that did two things:
> >
> >DELETE FROM [DataRepository]
> >WHERE [Dar_Id] NOT IN
> >(SELECT [Meq_Dar_Id] FROM [MessageQueue]) AND [Dar_CreateDate] <
> >DATEADD(hour, -1, getdate())
> >
> >DELETE FROM [DataAddress]
> >WHERE [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Sender] FROM [DataRepository]
> >WHERE [Dar_Daa_Id_Sender] IS NOT NULL)
> >AND [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Receiver] FROM [DataRepository]
> >WHERE [Dar_Daa_Id_Receiver] IS NOT NULL)
> >
> >The first statement pretty much deletes data from DataRepository that is not
> >in use and is older than one hour. The second deletes data from DataAddress
> >that isn't used be DataRepository.
> >
> >After the clean up job (took about 55 minutes), we ended up with 20 rows in
> >DataRepository that does not exist in DataAddress(!). The rows were created
> >when the clean up job had just finished and were created in less than a
> >minute. I've checked the sys.foreign_keys table and both is_disabled and
> >is_not_trusted is 0.
> >
> >I'm not sure what the next step is, how can this happen?
> >
> >Cheers,
> >Chris
>
>
> Hi
>
> You don't say how the data is inserted! To be consistent the delete
> statements should be in a single transaction.
>
> Posting DDL would be a lot clearer and easier to understand than your
> description. Example data to re-create the scenario would also be
> useful.
>
> What does not seem to fit is that you are deleting from DataAddress
> when Daa_id does not match Dar_Daa_id_Sender or Dar_Daa_id_Receiver
> not Dar_Daa_ID which is where you say the FK defined.
>
> John
> .
>

Heya

The data is inserted by using INSERT over a period of time (no batch moves).
The underlying service using the database is a distribution network. So each
time a package arrives it is inserted in the DataRepository, the sender and
receiver are inserted in the DataAddress table.

I also noticed that the DELETE statements aren't in a transaction, but that
"should" not be a problem. They can be used separately. And sorry about the
column names for the FK, there is no Dar_Daa_Id, its Dar_Daa_Id_Receiver and
Dar_Daa_Id_Sender

I'm not sure how I can provide you with data to recreate the problem since
inserting the data would yield a error :) I'll post one of the rows below
though. Scripting the tables and stored procedures gives me this DDL.

CREATE TABLE [dbo].[DataAddress](
[Daa_Id] [bigint] IDENTITY(10,1) NOT NULL,
[Daa_AdapterId] [uniqueidentifier] NOT NULL,
[Daa_ClientUserName] [nvarchar](256) NOT NULL,
CONSTRAINT [PK_DataAddress] PRIMARY KEY CLUSTERED
(
[Daa_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[DataRepository](
[Dar_Id] [bigint] IDENTITY(10,1) NOT NULL,
[Dar_Key] [binary](32) NOT NULL,
[Dar_InitialVector] [binary](16) NOT NULL,
[Dar_CreateDate] [datetime] NOT NULL,
[Dar_Daa_Id_Sender] [bigint] NOT NULL,
[Dar_Daa_Id_Receiver] [bigint] NULL,
[Dar_DataTrackingId] [uniqueidentifier] NULL,
[Dar_Data] [varbinary](max) NULL,
CONSTRAINT [PK_DataRepository] PRIMARY KEY CLUSTERED
(
[Dar_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[DataRepository] WITH CHECK ADD CONSTRAINT
[FK_DataRepository_DataAddress_Receiver] FOREIGN KEY([Dar_Daa_Id_Receiver])
REFERENCES [dbo].[DataAddress] ([Daa_Id])
GO

ALTER TABLE [dbo].[DataRepository] CHECK CONSTRAINT
[FK_DataRepository_DataAddress_Receiver]
GO

ALTER TABLE [dbo].[DataRepository] WITH CHECK ADD CONSTRAINT
[FK_DataRepository_DataAddress_Sender] FOREIGN KEY([Dar_Daa_Id_Sender])
REFERENCES [dbo].[DataAddress] ([Daa_Id])
GO

ALTER TABLE [dbo].[DataRepository] CHECK CONSTRAINT
[FK_DataRepository_DataAddress_Sender]
GO

CREATE PROCEDURE [dbo].[DataAddress_Insert]
(
@Daa_AdapterId uniqueidentifier,
@Daa_ClientUserName nvarchar(256),
@Daa_Id bigint out
)
AS
BEGIN

DECLARE @IdTable TABLE ([Daa_Id] bigint)

INSERT INTO [DataAddress]
(
[Daa_AdapterId],
[Daa_ClientUserName]
)
OUTPUT INSERTED.[Daa_Id] INTO @IdTable
VALUES
(
@Daa_AdapterId,
@Daa_ClientUserName
)

SELECT @Daa_Id = [Daa_Id] FROM @IdTable

END
GO

CREATE PROCEDURE [dbo].[DataRepository_Insert]
@Dar_Key binary(32),
@Dar_InitialVector binary(16),
@Dar_Daa_Id_Sender bigint,
@Dar_Daa_Id_Receiver bigint,
@Dar_DataTrackingId uniqueidentifier,
@Dar_Data varbinary(max),
@Dar_Id bigint OUTPUT
AS
BEGIN

DECLARE @IdTable TABLE ([Dar_Id] bigint)

INSERT INTO [DataRepository]
(
[Dar_Key],
[Dar_InitialVector],
[Dar_CreateDate],
[Dar_Daa_Id_Sender],
[Dar_Daa_Id_Receiver],
[Dar_DataTrackingId],
[Dar_Data]
)
OUTPUT INSERTED.[Dar_Id] INTO @IdTable
VALUES
(
@Dar_Key,
@Dar_InitialVector,
getdate(),
@Dar_Daa_Id_Sender,
@Dar_Daa_Id_Receiver,
@Dar_DataTrackingId,
@Dar_Data
)

SELECT @Dar_Id = [Dar_Id] FROM @IdTable

END
GO

Dar_Id: 10010453
Dar_Key: 0x636E134BA5327071447F5CBD60BC48EB6A1FFB838F5DE65007CDDDF6C6F027BA
Dar_InitialVector: 0xEF395BD72247C3B5F4C506DBAE31EEFA
Dar_CreateDate: 2010-04-22 09:20:27.630
Dar_Daa_Id_Sender: 14786105
Dar_Daa_Id_Receiver: 14786106
Dar_DataTrackingId: 46212D94-349D-4289-95BA-178BAD7B6AE7
Dar_Data: NULL

Hope this helps :)
/Chris

From: John Bell on
On Thu, 6 May 2010 07:30:01 -0700, Christoffer
<Christoffer(a)discussions.microsoft.com> wrote:

>> On Tue, 27 Apr 2010 23:36:01 -0700, Christoffer
>> <Christoffer(a)discussions.microsoft.com> wrote:
>>
>> >Hello, one or two of my colleague has managed to break a foreign key in our
>> >database, I'd like some help to understand how this have happened. This might
>> >be a long story so bear with me :)
>> >
>> >There are two tables involved in this. The "DataRepository" that holds data
>> >that are to be sent to a "DataAddress". Thus the DataRepository has a foreign
>> >key to DataAddress. The DataRepository has a couple of columns, but the
>> >important ones is Dar_Id (PK, bigint) and the Dar_Daa_Id (FK, bigint). The
>> >DataAddres has a Daa_Id (PK, bigint). The foreign key is set between
>> >Dar_Daa_Id and Daa_Id.
>> >
>> >The story is that the DataRepository and DataAddress tables grew quickly in
>> >our production systems because a daily clean up job wasn't started. This was
>> >discovered when the server was low on disk space. The clean up job was
>> >started and ran a stored procedures that did two things:
>> >
>> >DELETE FROM [DataRepository]
>> >WHERE [Dar_Id] NOT IN
>> >(SELECT [Meq_Dar_Id] FROM [MessageQueue]) AND [Dar_CreateDate] <
>> >DATEADD(hour, -1, getdate())
>> >
>> >DELETE FROM [DataAddress]
>> >WHERE [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Sender] FROM [DataRepository]
>> >WHERE [Dar_Daa_Id_Sender] IS NOT NULL)
>> >AND [Daa_Id] NOT IN (SELECT [Dar_Daa_Id_Receiver] FROM [DataRepository]
>> >WHERE [Dar_Daa_Id_Receiver] IS NOT NULL)
>> >
>> >The first statement pretty much deletes data from DataRepository that is not
>> >in use and is older than one hour. The second deletes data from DataAddress
>> >that isn't used be DataRepository.
>> >
>> >After the clean up job (took about 55 minutes), we ended up with 20 rows in
>> >DataRepository that does not exist in DataAddress(!). The rows were created
>> >when the clean up job had just finished and were created in less than a
>> >minute. I've checked the sys.foreign_keys table and both is_disabled and
>> >is_not_trusted is 0.
>> >
>> >I'm not sure what the next step is, how can this happen?
>> >
>> >Cheers,
>> >Chris
>>
>>
>> Hi
>>
>> You don't say how the data is inserted! To be consistent the delete
>> statements should be in a single transaction.
>>
>> Posting DDL would be a lot clearer and easier to understand than your
>> description. Example data to re-create the scenario would also be
>> useful.
>>
>> What does not seem to fit is that you are deleting from DataAddress
>> when Daa_id does not match Dar_Daa_id_Sender or Dar_Daa_id_Receiver
>> not Dar_Daa_ID which is where you say the FK defined.
>>
>> John
>> .
>>
>
>Heya
>
>The data is inserted by using INSERT over a period of time (no batch moves).
>The underlying service using the database is a distribution network. So each
>time a package arrives it is inserted in the DataRepository, the sender and
>receiver are inserted in the DataAddress table.
>
>I also noticed that the DELETE statements aren't in a transaction, but that
>"should" not be a problem. They can be used separately. And sorry about the
>column names for the FK, there is no Dar_Daa_Id, its Dar_Daa_Id_Receiver and
>Dar_Daa_Id_Sender
>
>I'm not sure how I can provide you with data to recreate the problem since
>inserting the data would yield a error :) I'll post one of the rows below
>though. Scripting the tables and stored procedures gives me this DDL.
>
>CREATE TABLE [dbo].[DataAddress](
> [Daa_Id] [bigint] IDENTITY(10,1) NOT NULL,
> [Daa_AdapterId] [uniqueidentifier] NOT NULL,
> [Daa_ClientUserName] [nvarchar](256) NOT NULL,
>CONSTRAINT [PK_DataAddress] PRIMARY KEY CLUSTERED
>(
> [Daa_Id] ASC
>)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>) ON [PRIMARY]
>GO
>
>CREATE TABLE [dbo].[DataRepository](
> [Dar_Id] [bigint] IDENTITY(10,1) NOT NULL,
> [Dar_Key] [binary](32) NOT NULL,
> [Dar_InitialVector] [binary](16) NOT NULL,
> [Dar_CreateDate] [datetime] NOT NULL,
> [Dar_Daa_Id_Sender] [bigint] NOT NULL,
> [Dar_Daa_Id_Receiver] [bigint] NULL,
> [Dar_DataTrackingId] [uniqueidentifier] NULL,
> [Dar_Data] [varbinary](max) NULL,
>CONSTRAINT [PK_DataRepository] PRIMARY KEY CLUSTERED
>(
> [Dar_Id] ASC
>)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =
>OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
>) ON [PRIMARY]
>GO
>
>ALTER TABLE [dbo].[DataRepository] WITH CHECK ADD CONSTRAINT
>[FK_DataRepository_DataAddress_Receiver] FOREIGN KEY([Dar_Daa_Id_Receiver])
>REFERENCES [dbo].[DataAddress] ([Daa_Id])
>GO
>
>ALTER TABLE [dbo].[DataRepository] CHECK CONSTRAINT
>[FK_DataRepository_DataAddress_Receiver]
>GO
>
>ALTER TABLE [dbo].[DataRepository] WITH CHECK ADD CONSTRAINT
>[FK_DataRepository_DataAddress_Sender] FOREIGN KEY([Dar_Daa_Id_Sender])
>REFERENCES [dbo].[DataAddress] ([Daa_Id])
>GO
>
>ALTER TABLE [dbo].[DataRepository] CHECK CONSTRAINT
>[FK_DataRepository_DataAddress_Sender]
>GO
>
>CREATE PROCEDURE [dbo].[DataAddress_Insert]
>(
> @Daa_AdapterId uniqueidentifier,
> @Daa_ClientUserName nvarchar(256),
> @Daa_Id bigint out
>)
>AS
>BEGIN
>
>DECLARE @IdTable TABLE ([Daa_Id] bigint)
>
>INSERT INTO [DataAddress]
>(
> [Daa_AdapterId],
> [Daa_ClientUserName]
>)
>OUTPUT INSERTED.[Daa_Id] INTO @IdTable
>VALUES
>(
> @Daa_AdapterId,
> @Daa_ClientUserName
>)
>
>SELECT @Daa_Id = [Daa_Id] FROM @IdTable
>
>END
>GO
>
>CREATE PROCEDURE [dbo].[DataRepository_Insert]
> @Dar_Key binary(32),
> @Dar_InitialVector binary(16),
> @Dar_Daa_Id_Sender bigint,
> @Dar_Daa_Id_Receiver bigint,
> @Dar_DataTrackingId uniqueidentifier,
> @Dar_Data varbinary(max),
> @Dar_Id bigint OUTPUT
>AS
>BEGIN
>
>DECLARE @IdTable TABLE ([Dar_Id] bigint)
>
>INSERT INTO [DataRepository]
>(
> [Dar_Key],
> [Dar_InitialVector],
> [Dar_CreateDate],
> [Dar_Daa_Id_Sender],
> [Dar_Daa_Id_Receiver],
> [Dar_DataTrackingId],
> [Dar_Data]
>)
>OUTPUT INSERTED.[Dar_Id] INTO @IdTable
>VALUES
>(
> @Dar_Key,
> @Dar_InitialVector,
> getdate(),
> @Dar_Daa_Id_Sender,
> @Dar_Daa_Id_Receiver,
> @Dar_DataTrackingId,
> @Dar_Data
>)
>
>SELECT @Dar_Id = [Dar_Id] FROM @IdTable
>
>END
>GO
>
>Dar_Id: 10010453
>Dar_Key: 0x636E134BA5327071447F5CBD60BC48EB6A1FFB838F5DE65007CDDDF6C6F027BA
>Dar_InitialVector: 0xEF395BD72247C3B5F4C506DBAE31EEFA
>Dar_CreateDate: 2010-04-22 09:20:27.630
>Dar_Daa_Id_Sender: 14786105
>Dar_Daa_Id_Receiver: 14786106
>Dar_DataTrackingId: 46212D94-349D-4289-95BA-178BAD7B6AE7
>Dar_Data: NULL
>
>Hope this helps :)
>/Chris

Hi Chris

It's interesting that the FKs are created with CHECK as altering the
constraint to set CHECK is not necessary.

If the FKs were created with NOCHECK and there is was invalid data
before they were created, then altering the constraints to set CHECK
would still work, but if this occurred is_not_trusted would be 1 in
sys.foreign_keys which you say is not the case.

Therefore if you drop the foreign key and then re-created them it
should fail.

I'm not sure if you are going to find out how this occurred and I
don't think you are going to find out now.

Going forward I would make sure that your transactions are correctly
formed and that your error checking is thorough.

John