Sunday, February 19, 2012

Help: CHECKDB gives errors I can't fix

Hi,
I've got what appears to be a corrupt database, (SQL 7 SP4) running CHECKDB on it returns lots of errors, the first being:
Msg 8966, Level 16, State 5, Server 01AW01, Procedure , Line 3
[Microsoft][ODBC SQL Server Driver][SQL Server]Could not read and latch page
(1:177936) with latch type SH. PFS failed.
I switched the database to single user, and tried various repair flags on CHECKDB, and I always seem to get the same errors, and it returns lots of:
'The repair level on the DBCC statement caused this repair to be bypassed.'
Any ideas on where to start on this.. Its a customers database, I don't have information on what backups they have etc.. and its probably problems in the only large table in the database (some 2.5 million records, and 900Mb on disk, with 400Mb index).. Could this be a resource related problem somehow 'Hopefully you get some specific suggestions about your message. Below you
find my general recommendations. Note, however, that this *might* not
actually be a corrupt database (I can't say for sure) so opening a case with
MS might be a better option than start with the restore route (also search
Kb etc first):
Here are the general recommendations for handling a suspect or corrupt
database:
0. Ensure you have a backup strategy that you can use to recover from
hardware failures (including corruption). I recommend performing both
database and log backup in most situations.
1. If you can run DBCC CHECKDB against the database: Search Books Online and
KB for the error numbers that CHECKDB gives you. There might be specific
info for that type of error.
2. Find out why this happened. Check eventlog, do HW diagnostics etc.;
search Books Online and KB for those errors. You don't want this to happen
again! If the database is suspect, the file might have been in use by for
instance an anti-virus program and restarting SQL Server might be all that
is needed - but you still want to read logs etc to find out what happened.
3. If there is a hardware problem, ensure the faulty hardware is replaced.
4. Backup the log. This assumes that log backup schedule is in place, of
course. If the database is suspect, then the NO_TRUNCATE option for the
RESTORE command must be used. Also, you might want to do a file backup of
the mdf and ldf files, for extra safety.
5. Restore is the best thing to do now. If you managed to backup log as per
step 4, then you will most probably have zero dataloss. You should restore
the latest clean database backup and the subsequent log backups including
the one taken in above step.
If the database isn't suspect, then DBCC with a REPAIR option might be a
secondary option but this will often result in loss of data. Additional
solutions, depending on the errors, may be to manually rebuild non-clustered
indexes, manually drop and reload a table if the data is static, and so on.
If the database is suspect, a secondary option can be to try to "un-suspect"
the database using sp_resetstatus. Read about it (books online, KB, google
etc). It might help but if the database is too damaged, it might just pop
back to suspect again. There's also something called "emergency mode" which
is a "panic" status you can set in order to try to get data out of a damaged
database. I think the name of that option speaks for itself. Again search
the net for info.
If you feel uncertain with above steps, I recommend letting MS hand-hold you
through the steps appropriate for your particular situation.
--
Tibor Karaszi, SQL Server MVP
Archive at:
http://groups.google.com/groups?oi=djq&as_ugroup=microsoft.public.sqlserver
"Kevin" <anonymous@.discussions.microsoft.com> wrote in message
news:BC4EEEE9-BA17-416F-8C9A-8DED0C6AED3F@.microsoft.com...
> Hi,
> I've got what appears to be a corrupt database, (SQL 7 SP4) running
CHECKDB on it returns lots of errors, the first being:
> Msg 8966, Level 16, State 5, Server 01AW01, Procedure , Line 3
> [Microsoft][ODBC SQL Server Driver][SQL Server]Could not read and latch
page
> (1:177936) with latch type SH. PFS failed.
> I switched the database to single user, and tried various repair flags on
CHECKDB, and I always seem to get the same errors, and it returns lots of:
> 'The repair level on the DBCC statement caused this repair to be
bypassed.'
> Any ideas on where to start on this.. Its a customers database, I don't
have information on what backups they have etc.. and its probably problems
in the only large table in the database (some 2.5 million records, and 900Mb
on disk, with 400Mb index).. Could this be a resource related problem
somehow '
>
>|||This is a multi-part message in MIME format.
--=_NextPart_000_0058_01C3BD8A.EDA4E890
Content-Type: text/plain;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
Hi Kevin,
I think that this is probably a corruption in the database. This error =message indicates that DBCC was not able to read (or latch) page 177936 =in your customer's database. The page type in question (PFS) is a =system allocation page that tracks database free space and some other =state information. This page is actually used to drive the DBCC scan, =so an error here is pretty severe. This is made worse by the fact that =DBCC can't repair this type of page if there is a problem.
Without more error context it's hard for me to say why we can't read =this page: it could be hardware causing a bad read from disk, a bad =page header that resulted from a page corruption, etc.
I completely agree with Tibor that you should restore the database, =perhaps pairing that with a case with PSS to investigate root cause.
Thanks,
Ryan Stonecipher
SQL Server Storage Engine (DBCC)
"Tibor Karaszi" =<tibor.please_reply_to_public_forum.karaszi@.cornerstone.se> wrote in =message news:einCq2YvDHA.2712@.tk2msftngp13.phx.gbl...
Hopefully you get some specific suggestions about your message. Below =you
find my general recommendations. Note, however, that this *might* not
actually be a corrupt database (I can't say for sure) so opening a =case with
MS might be a better option than start with the restore route (also =search
Kb etc first):
Here are the general recommendations for handling a suspect or corrupt
database:
0. Ensure you have a backup strategy that you can use to recover from
hardware failures (including corruption). I recommend performing both
database and log backup in most situations.
1. If you can run DBCC CHECKDB against the database: Search Books =Online and
KB for the error numbers that CHECKDB gives you. There might be =specific
info for that type of error.
2. Find out why this happened. Check eventlog, do HW diagnostics etc.;
search Books Online and KB for those errors. You don't want this to =happen
again! If the database is suspect, the file might have been in use by =for
instance an anti-virus program and restarting SQL Server might be all =that
is needed - but you still want to read logs etc to find out what =happened.
3. If there is a hardware problem, ensure the faulty hardware is =replaced.
4. Backup the log. This assumes that log backup schedule is in place, =of
course. If the database is suspect, then the NO_TRUNCATE option for =the
RESTORE command must be used. Also, you might want to do a file backup =of
the mdf and ldf files, for extra safety.
5. Restore is the best thing to do now. If you managed to backup log =as per
step 4, then you will most probably have zero dataloss. You should =restore
the latest clean database backup and the subsequent log backups =including
the one taken in above step.
If the database isn't suspect, then DBCC with a REPAIR option might be =a
secondary option but this will often result in loss of data. =Additional
solutions, depending on the errors, may be to manually rebuild =non-clustered
indexes, manually drop and reload a table if the data is static, and =so on.
If the database is suspect, a secondary option can be to try to ="un-suspect"
the database using sp_resetstatus. Read about it (books online, KB, =google
etc). It might help but if the database is too damaged, it might just =pop
back to suspect again. There's also something called "emergency mode" =which
is a "panic" status you can set in order to try to get data out of a =damaged
database. I think the name of that option speaks for itself. Again =search
the net for info.
If you feel uncertain with above steps, I recommend letting MS =hand-hold you
through the steps appropriate for your particular situation.
-- Tibor Karaszi, SQL Server MVP
Archive at:
=http://groups.google.com/groups?oi=3Ddjq&as_ugroup=3Dmicrosoft.public.sql=
server
"Kevin" <anonymous@.discussions.microsoft.com> wrote in message
news:BC4EEEE9-BA17-416F-8C9A-8DED0C6AED3F@.microsoft.com...
> Hi,
>
> I've got what appears to be a corrupt database, (SQL 7 SP4) running
CHECKDB on it returns lots of errors, the first being:
>
> Msg 8966, Level 16, State 5, Server 01AW01, Procedure , Line 3
> [Microsoft][ODBC SQL Server Driver][SQL Server]Could not read and =latch
page
> (1:177936) with latch type SH. PFS failed.
>
> I switched the database to single user, and tried various repair =flags on
CHECKDB, and I always seem to get the same errors, and it returns lots =of:
>
> 'The repair level on the DBCC statement caused this repair to be
bypassed.'
>
> Any ideas on where to start on this.. Its a customers database, I =don't
have information on what backups they have etc.. and its probably =problems
in the only large table in the database (some 2.5 million records, and =900Mb
on disk, with 400Mb index).. Could this be a resource related problem
somehow '
>
>
>
>
--=_NextPart_000_0058_01C3BD8A.EDA4E890
Content-Type: text/html;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
&

Hi Kevin,
I think that this is probably a =corruption in the database. This error message indicates that DBCC was not =able to read (or latch) page 177936 in your customer's database. The page =type in question (PFS) is a system allocation page that tracks database free =space and some other state information. This page is actually used to drive =the DBCC scan, so an error here is pretty severe. This is made worse by the =fact that DBCC can't repair this type of page if there is a =problem.
Without more error context it's =hard for me to say why we can't read this page: it could be hardware causing a =bad read from disk, a bad page header that resulted from a page corruption, etc.
I completely agree with Tibor =that you should restore the database, perhaps pairing that with a case with PSS =to investigate root cause.
Thanks,
Ryan Stonecipher
SQL Server Storage Engine (DBCC)
"Tibor Karaszi" wrote in message news:einCq2YvDHA.2712=@.tk2msftngp13.phx.gbl...Hopefully you get some specific suggestions about your message. Below =youfind my general recommendations. Note, however, that this *might* =notactually be a corrupt database (I can't say for sure) so opening a case withMS =might be a better option than start with the restore route (also searchKb =etc first):Here are the general recommendations for handling a =suspect or corruptdatabase:0. Ensure you have a backup strategy that =you can use to recover fromhardware failures (including corruption). I =recommend performing bothdatabase and log backup in most =situations.1. If you can run DBCC CHECKDB against the database: Search Books Online =andKB for the error numbers that CHECKDB gives you. There might be =specificinfo for that type of error.2. Find out why this happened. Check =eventlog, do HW diagnostics etc.;search Books Online and KB for those =errors. You don't want this to happenagain! If the database is suspect, the =file might have been in use by forinstance an anti-virus program and =restarting SQL Server might be all thatis needed - but you still want to read =logs etc to find out what happened.3. If there is a hardware problem, =ensure the faulty hardware is replaced.4. Backup the log. This assumes =that log backup schedule is in place, ofcourse. If the database is suspect, =then the NO_TRUNCATE option for theRESTORE command must be used. Also, =you might want to do a file backup ofthe mdf and ldf files, for extra safety.5. Restore is the best thing to do now. If you managed =to backup log as perstep 4, then you will most probably have zero =dataloss. You should restorethe latest clean database backup and the =subsequent log backups includingthe one taken in above step.If the database =isn't suspect, then DBCC with a REPAIR option might be asecondary option =but this will often result in loss of data. Additionalsolutions, =depending on the errors, may be to manually rebuild non-clusteredindexes, =manually drop and reload a table if the data is static, and so on.If the =database is suspect, a secondary option can be to try to "un-suspect"the =database using sp_resetstatus. Read about it (books online, KB, googleetc). =It might help but if the database is too damaged, it might just =popback to suspect again. There's also something called "emergency mode" =whichis a "panic" status you can set in order to try to get data out of a damageddatabase. I think the name of that option speaks for =itself. Again searchthe net for info.If you feel uncertain with =above steps, I recommend letting MS hand-hold youthrough the steps appropriate =for your particular situation.-- Tibor Karaszi, SQL Server =MVPArchive at:http://groups.google.com/groups?oi=3Ddjq&as_ugrou=p=3Dmicrosoft.public.sqlserver"Kevin" wrote in messagenews:BC4=EEEE9-BA17-416F-8C9A-8DED0C6AED3F@.microsoft.com...> Hi,>> I've got what appears to be a corrupt database, =(SQL 7 SP4) runningCHECKDB on it returns lots of errors, the first being:>> Msg 8966, Level 16, State 5, Server 01AW01, =Procedure , Line 3> [Microsoft][ODBC SQL Server Driver][SQL Server]Could =not read and latchpage> (1:177936) with latch type SH. PFS failed.>> I switched the database to single user, and =tried various repair flags onCHECKDB, and I always seem to get the same =errors, and it returns lots of:>> 'The repair level on the DBCC statement caused this repair to bebypassed.'>> Any =ideas on where to start on this.. Its a customers database, I =don'thave information on what backups they have etc.. and its probably problemsin the only large table in the database (some 2.5 million =records, and 900Mbon disk, with 400Mb index).. Could this be a =resource related problemsomehow '>>>>

--=_NextPart_000_0058_01C3BD8A.EDA4E890--

No comments:

Post a Comment