From: Rubens on
Hey Kevin,

Sorry for the late reply as I haven't been able to check in here until now.

I can appreciate where you are coming from. Last week was an extremely
stressful week as you can imagine. To top it off, we lost our Senior DBA
(we are supposed to have 3), and we recently hired 2 other DBAs. I've been
trying to train one of them and the more senior hire will be starting in 2
weeks. So I've been pretty much going at it solo and things became even
more stressful when that failover happened while I was on vacation, only to
come back and deal with the aftermath.

So I apologize if I came across poorly to you here as I am sure I let my
stresses and emotions get the better of me. I am sure I took your comments
the wrong way when essentially what you were saying was, "Stop everything
immediately and get on the horn to Microsoft!" And that made total sense.

I also appreciate your comments below. We actually have a new CIO and
Director that have been in place for about 4 months now and they are
definitely not happy with these types of outages. The building this server
is housed in has terrible power issues, and the new senior leadership team
has actually looked at buying a generator. But with a $250,000 price tag
just to run the wiring required, they aren't going to do it. They are now
looking at a new site altogether to move the hardware to. It's unfortunate
considering this is a 16 processor clustered box with 128 GB of RAM, only to
go down to bad power.

In case anyone is interested, we've identified the issue as a problem with
Change Tracking and something going wrong when the power went out. We
(Microsoft and I) essentially purged most of this system table until there
was no more key violation error. Once that was done, we were able to get
good backups again. We were only the second customer to report this issue
to Microsoft and are in somewhat uncharted territory.

Additionally, our Change Tracking process isn't cleaning up after itself the
way it's supposed to so they've recommended we install cumulative update 5.
We are going to do that soon after we've put it through our test
environment.

http://support.microsoft.com/default.aspx?scid=kb;EN-US;973696

Thanks again for your comments and feedback. I do appreciate the time you
put in and the advice you offer.

Rubens

"TheSQLGuru" <kgboles(a)earthlink.net> wrote in message
news:z8CdnejDS5LKz4zWnZ2dnUVZ_hSdnZ2d(a)earthlink.com...
> There have been many examples of people who have wasted a LOT of time
> looking for help on forums for a production outage when getting MS support
> (or a qualified consultant) on board immediately was by far their best
> option. I recommended that as your best option since you failed to
> mention that you were already in the queue with MS support.
>
> I could have also asked other questions but they would have all been
> related to what could/should have been done (had you been doing
> consistency checks, had you been doing test restores of your backups to
> make sure they were viable, etc), but I did not because they were not
> relevant or helpful to getting you back online. I could also question if
> appropriate measures were in place on the hardware and infrastructure side
> such as battery backups for caches and systems as a whole and why the
> system wasn't shut down gracefully when the power went out. Hopefully if
> all of those things were not being done you can now put them into place
> once you get back online.
>
> Best of luck with the recovery.
>
> --
> Kevin G. Boles
> Indicium Resources, Inc.
> SQL Server MVP
> kgboles a earthlink dt net
>
>
> "Rubens" <rubensrose(a)hotmail.com> wrote in message
> news:C598299C-72D1-4B15-A565-4CDA17ED5189(a)microsoft.com...
>> I'm a little stunned that you'd make the assumption that this was all I
>> was doing. I already had a call into Microsoft when I posted this and
>> was waiting for a call back, so I was looking to this group to provide me
>> a little direction as I was researching the problem on my own.
>>
>> Thanks for your help Kevin.
>>
>> Rubens
>>
>> "TheSQLGuru" <kgboles(a)earthlink.net> wrote in message
>> news:4ISdnWvcFrOt-JbWnZ2dnUVZ_uednZ2d(a)earthlink.com...
>>> Gotta say I am pretty stunned that you have failing backups and a
>>> potentially corrupted system on a 1.7TB OLTP system and you are hunting
>>> and pecking here on a forum for help? This should be zooming up the
>>> food chain at Microsoft customer support...
>>>
>>> --
>>> Kevin G. Boles
>>> Indicium Resources, Inc.
>>> SQL Server MVP
>>> kgboles a earthlink dt net
>>>
>>>
>>> "Rubens" <rubensrose(a)hotmail.com> wrote in message
>>> news:0D1604E3-BB25-488E-8251-4BFF84C5A257(a)microsoft.com...
>>>>I have a rather concerning issue on our OLTP system for our largest
>>>>database (1.7 TB's). On Friday (when I was on vacation of course), we
>>>>had a power outage and the cluster failed. Everything was brought back
>>>>online from the Network guys, and then our team of DBA's took over with
>>>>making sure SQL was ok using the documentation we have. Since then, the
>>>>log backups for the above database began to fail, but it still appears
>>>>to be writing the log files. We do weekly full backups, followed by
>>>>daily differentials. My hope was that after the full backup ran,
>>>>everything would be fine with the log backups. Nope, they are still
>>>>failing with the error message below...
>>>>
>>>> Processed 127550 pages for database 'Focus', file 'Focus2000_Log' on
>>>> file 1. [SQLSTATE 01000] (Message 4035) Processed 0 pages for database
>>>> 'Focus', file 'Focus2000_Log3' on file 1. [SQLSTATE 01000] (Message
>>>> 4035) Cannot insert duplicate key row in object 'sys.syscommittab' with
>>>> unique index 'si_xdes_id'. [SQLSTATE 23000] (Error 2601) Failed to
>>>> flush the commit table to disk in dbid 7 due to error 2601. Check the
>>>> errorlog for more information. [SQLSTATE 42000] (Error 3999) The
>>>> statement has been terminated. [SQLSTATE 01000] (Error 3621). The step
>>>> failed.
>>>>
>>>> I was hoping that the weekend full backups would reset the database and
>>>> then the log backups would run fine after. NOPE. Still having the
>>>> same issue. And interestingly, our index rebuild job failed with the
>>>> same error message for this database.
>>>>
>>>> SQL 2008 SP1, Enterprise Edition, active-active-active cluster on
>>>> Windows 2003 R2 x64 SP2
>>>>
>>>> Can someone offer any advice?
>>>>
>>>> Thanks,
>>>> Rubens
>>>
>>>
>
>
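
The job output quoted in the original post runs several messages together, which makes it easy to miss which numbers are real errors and which are informational. Below is a minimal sketch in Python for pulling them apart, assuming the output always uses the "[SQLSTATE xxxxx] (Message|Error nnnn)" tagging seen in that paste. The function names are made up for illustration and are not part of any SQL Server tooling.

```python
import re

# Matches one "[SQLSTATE xxxxx] (Message|Error nnnn)" tag as it appears
# in the pasted job output. The format is assumed from that paste only.
TAG = re.compile(r"\[SQLSTATE (\w{5})\] \((Message|Error) (\d+)\)")


def parse_step_output(text):
    """Return (sqlstate, kind, number) tuples in order of appearance."""
    # Collapse line wrapping so a tag split across two lines still matches.
    flat = " ".join(text.split())
    return [(state, kind, int(num)) for state, kind, num in TAG.findall(flat)]


def error_numbers(text):
    """Return only the numbers tagged Error, skipping informational Messages."""
    return [num for _, kind, num in parse_step_output(text) if kind == "Error"]


# The step output from the original post, with the quote markers removed.
SAMPLE = """Processed 127550 pages for database 'Focus', file 'Focus2000_Log' on
file 1. [SQLSTATE 01000] (Message 4035) Processed 0 pages for database
'Focus', file 'Focus2000_Log3' on file 1. [SQLSTATE 01000] (Message
4035) Cannot insert duplicate key row in object 'sys.syscommittab' with
unique index 'si_xdes_id'. [SQLSTATE 23000] (Error 2601) Failed to
flush the commit table to disk in dbid 7 due to error 2601. Check the
errorlog for more information. [SQLSTATE 42000] (Error 3999) The
statement has been terminated. [SQLSTATE 01000] (Error 3621). The step
failed."""

if __name__ == "__main__":
    print(error_numbers(SAMPLE))
```

On the paste above this separates the two informational 4035 messages from the three errors (2601, 3999, 3621). The first of those, the duplicate key on 'sys.syscommittab', is the one the other two follow from.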
From: Tibor Karaszi on
<<In case anyone is interested, we've identified the issue as a problem with
Change Tracking and something going wrong when the power went out. >>

Ouch. Thanks for reporting this for us, Rubens.

--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi



From: TheSQLGuru on
Quite the story there! Multiple levels of badness and unfortunateness for
sure.

Change Tracking is essentially a 1.0 product. I am not surprised some issues
are surfacing with it. Pretty complex stuff going on under the covers
there. Glad you could get things functional again! That isn't always
possible...

As a consultant, I have had to have several heart-to-hearts with CIOs and
CFOs about issues just like you mention. Generators are rarely a viable
option unless you have a major facility in place already that also has
sufficient infrastructure, zoning/permit/environmental issues covered, etc.
Hosting is the obvious solution for most entities.

Feel free to drop me a line if you get in a pickle or need some help getting
things going in the right direction. Sometimes C-level people will listen
to consultants when they won't listen to in-house resources (even if we
state the same things the staffers have been screaming until they are blue
in the face).


--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net




From: Rubens on
It certainly was a really nerve-racking week, one that I don't want to
experience again anytime soon.

I would agree, I think we were very fortunate to get things back up and
running. It was looking sketchy for a time there after I had worked 24
straight hours, and by then I was feeling pretty defeated, knowing that I
would have to be the one restoring it all. In the 26th hour, we were then
golden and I went home to sleep like a baby!

Good info about the generators. And as for dropping you a line, I really
appreciate that as well Kevin. The CIO and my Director seem to "get it"
which is why they are looking to host elsewhere. But if we need more
firepower, you're the man I shall recommend! :-)

Thanks everybody for your thoughts, and again, I apologize for the way I may
have come across last week. I really respect your opinions, knowledge and
posts.

Rubens

From: Jay on
"Rubens" <rubensrose(a)hotmail.com> wrote in message
news:21D27569-AC24-45D0-B823-DE8DFE88B203(a)microsoft.com...

....
> The building this server is housed in has terrible power issues, and the
> new senior leadership team has actually looked at buying a generator. But
> with a $250,000 price tag just to run the wiring required, they aren't
> going to do it. They are now looking at a new site altogether to move the
> hardware too. It's unfortunate considering this is a 16 processor
> clustered box with 128 GB of RAM, only to go down to bad power.
>

I had a similar issue back in the early/mid 1990s with a single database
server and two 16-drive SANs. I managed to get my CIO to pop about $1,000
for a UPS (and shutdown software) to prevent database corruption when there
were power issues. It worked out really well.

It should be even less expensive now and you might even be able to get
rack-mount UPSs.

