<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" >
<!--
This is DocBook XML that can be rendered into a single HTML page with a
command like 'xmlto html-nochunks <this_file_name>'. It can
also be rendered into multi-page HTML (drop the "-nochunks") or pdf,
ps, txt, etc.
-->
<article id="index">
<articleinfo>
<title>Bad block HOWTO for smartmontools</title>
<author>
<firstname>Bruce</firstname>
<surname>Allen</surname>
<affiliation>
<address>
<email>smartmontools-support@lists.sourceforge.net</email>
</address>
</affiliation>
</author>
<authorinitials>ba</authorinitials>
<author>
<firstname>Douglas</firstname>
<surname>Gilbert</surname>
<affiliation>
<address>
<email>smartmontools-support@lists.sourceforge.net</email>
</address>
</affiliation>
</author>
<authorinitials>dpg</authorinitials>
<pubdate>2006-11-14</pubdate>
<revhistory>
<revision>
<revnumber>1.0</revnumber>
<date>2006-11-14</date>
<authorinitials>dpg</authorinitials>
<revremark>
merge BadBlockHowTo.txt and BadBlockSCSIHowTo.txt
</revremark>
</revision>
</revhistory>
<copyright>
<year>2004</year>
<year>2005</year>
<year>2006</year>
<holder>Bruce Allen</holder>
</copyright>
<legalnotice>
<para>
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with
no Back-Cover Texts.
</para>
<para>
For an online copy of the license see
<ulink url="http://www.fsf.org/copyleft/fdl.html">
<literal>www.fsf.org/copyleft/fdl.html</literal></ulink>.
</para>
</legalnotice>
<abstract>
<para>
This article describes what actions might be taken when smartmontools
detects a bad block on a disk. It demonstrates how to identify the file
associated with an unreadable disk sector, and how to force that sector
to reallocate.
</para>
</abstract>
</articleinfo>
<!--
<toc></toc>
-->
<sect1 id="intro">
<title>Introduction</title>
<para>
Handling bad blocks is a difficult problem as it often involves
decisions about losing information. Modern storage devices tend
to handle the simple cases automatically, for example by writing
a disk sector that was read with difficulty to another area on
the media. Even though such a remapping can be done by a disk
drive transparently, there is still a lingering worry about media
deterioration and the disk running out of spare sectors to remap.
</para>
<para>
Can smartmontools help? As the <acronym>SMART</acronym> acronym suggests,
the <command>smartctl</command> command and the <command>smartd</command>
daemon concentrate on monitoring and analysis. So, apart from changing some
reporting settings, smartmontools will not modify the raw data on a
device. Also, smartmontools only works with physical devices; it does
not know about partitions and file systems, so other tools are needed.
The job of smartmontools is to alert the user that something is wrong
and that user intervention may be required.
</para>
<para>
One approach is to work out the mapping between the logical block
address used by a storage device and a file or some other component of a
file system using that device. Note that there may be no such mapping,
which indicates that the bad block lies in an area not currently used by
the file system. A user may want to do this analysis to identify the
damaged file(s), and so minimize what must be retrieved from a backup.
This approach requires knowledge of the file system
involved, and this document uses the Linux ext2 and ext3 file systems for
examples. The type of content may also come into play. For example, if
an area storing video has a corrupted sector, it may be easiest to
accept that a frame or two might be corrupted and instruct the disk
not to retry, as retrying may have the visual effect of changing a momentary
blank into a 1 second pause.
</para>
<para>
Another approach is to ignore the upper level consequences (e.g. corrupting
a file or worse damage to a file system) and use the facilities offered by
a storage device to repair the damage. The SCSI disk command set is used to
illustrate this approach.
</para>
</sect1>
<sect1 id="rfile">
<title>Repairs in a file system</title>
<para>
This section contains examples of what to do at the file system level
when smartmontools reports a bad block. These examples assume the Linux
operating system and either the ext2 or ext3 file system. The various
Linux commands shown have man pages and the reader is encouraged to examine
these. Of note is the <command>dd</command> command, which is often used in
repair work
<footnote><para>
Starting with GNU coreutils release 5.3.0, the <command>dd</command>
command in Linux includes the options 'iflag=direct' and 'oflag=direct'.
Using these with the <command>dd</command> commands in this document should
help, because these flags avoid any interaction with the block buffering
IO layer in Linux and permit direct reads and writes on the raw device.
Use <command>dd --help</command> to see if your
version of dd supports these options. If not, the latest code for dd
can be found at <ulink url="http://alpha.gnu.org/gnu/coreutils">
<literal>alpha.gnu.org/gnu/coreutils</literal></ulink>.
</para></footnote>
and has a unique command line syntax.
</para>
<para>
The author would like to thank Sergey Vlasov, Theodore Ts'o,
Michael Bendzick, and others for explaining this approach. The author would
like to add text showing how to do this for other file systems, in
particular ReiserFS, XFS, and JFS: please email if you can provide this
information.
</para>
<sect2 id="example1">
<title>First example</title>
<para>
In this example, the disk is failing self-tests at Logical Block
Address LBA = 0x016561e9 = 23421417. The LBA counts sectors in units
of 512 bytes, and starts at zero.
</para>
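<para>
The self-test log reports the failing LBA in hexadecimal. As a small
convenience that is not part of the original procedure, the conversion to
decimal can be done with the shell's <command>printf</command> builtin. This
is a sketch assuming bash; the function name <literal>hex_lba_to_dec</literal>
is hypothetical:
<programlisting>
#!/bin/bash
# hex_lba_to_dec (hypothetical helper): print a 0x-prefixed hexadecimal
# LBA, as shown in the LBA_of_first_error column, in decimal.
hex_lba_to_dec() {
    printf '%d\n' "$1"
}

hex_lba_to_dec 0x016561e9    # prints 23421417
</programlisting>
</para>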
<para>
<programlisting>
root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       217         0x016561e9
</programlisting>
Another sign that there is a bad sector on the disk is the non-zero raw
value of the Current_Pending_Sector attribute:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
</programlisting>
</para>
<para>
First Step: We need to locate the partition on which this sector of
the disk lives:
<programlisting>
root]# fdisk -lu /dev/hda
Disk /dev/hda: 123.5 GB, 123522416640 bytes
255 heads, 63 sectors/track, 15017 cylinders, total 241254720 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63     4209029     2104483+  83  Linux
/dev/hda2         4209030     5269319      530145   82  Linux swap
/dev/hda3         5269320   238227884   116479282+  83  Linux
/dev/hda4       238227885   241248104     1510110   83  Linux
</programlisting>
The partition /dev/hda3 starts at LBA 5269320 and extends past the
'problem' LBA. The 'problem' LBA is offset 23421417 - 5269320 =
18152097 sectors into the partition /dev/hda3.
</para>
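<para>
The partition lookup can also be scripted. The sketch below uses a
hypothetical helper, <literal>find_partition</literal>, and assumes the
<command>fdisk -lu</command> layout shown above: the device, an optional
<literal>*</literal> in the Boot column, then Start and End in 512-byte
sectors. Other <command>fdisk</command> versions may lay out their output
differently, so check the result by hand:
<programlisting>
#!/bin/bash
# find_partition (hypothetical helper): read "fdisk -lu" output on stdin
# and print the partition containing LBA $1, followed by the offset of
# that LBA (in 512-byte sectors) from the start of the partition.
find_partition() {
    awk -v lba="$1" '
        $1 ~ "^/dev/" {
            f = ($2 == "*") ? 3 : 2     # skip the optional Boot flag
            start = $f + 0
            end = $(f + 1) + 0
            if (lba + 0 >= start) if (end >= lba + 0) print $1, lba - start
        }'
}

# Usage: fdisk -lu /dev/hda | find_partition 23421417
</programlisting>
For the disk above it should print <literal>/dev/hda3 18152097</literal>,
the same partition and offset as the hand calculation.
</para>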
<para>
To verify the type of the file system and the mount point, look in
/etc/fstab:
<programlisting>
root]# grep hda3 /etc/fstab
/dev/hda3 /data ext2 defaults 1 2
</programlisting>
You can see that this is an ext2 file system, mounted at /data.
</para>
<para>
Second Step: we need to find the block size of the file system
(normally 4096 bytes for ext2):
<programlisting>
root]# tune2fs -l /dev/hda3 | grep Block
Block count: 29119820
Block size: 4096
</programlisting>
In this case the block size is 4096 bytes.
</para>
<para>
Third Step: we need to determine which file system block contains this
LBA. The formula is:
<programlisting>
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
</programlisting>
In our example, L=23421417, S=5269320, and B=4096. Hence the
'problem' LBA is in block number
<programlisting>
b = (int)(18152097*512/4096) = (int)2269012.125
so b=2269012.
</programlisting>
</para>
<para>
Note: the fractional part of 0.125 indicates that this problem LBA is
actually the second of the eight sectors that make up this file system
block.
</para>
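<para>
The block formula is easy to get wrong by hand. A minimal bash sketch of it
follows; <literal>lba_to_fsblock</literal> is a hypothetical name, and the
file system block size is assumed to be a multiple of 512 bytes. Besides the
block number, it prints the index (counting from 0) of the bad sector within
that block:
<programlisting>
#!/bin/bash
# lba_to_fsblock (hypothetical helper): b = (int)((L-S)*512/B).
# Arguments: L = LBA of bad sector, S = partition start sector
# (from fdisk -lu), B = file system block size in bytes.
lba_to_fsblock() {
    local L=$1 S=$2 B=$3
    local spb=$(( B / 512 ))    # sectors per file system block
    local off=$(( L - S ))      # sector offset into the partition
    echo "$(( off / spb )) $(( off % spb ))"
}

lba_to_fsblock 23421417 5269320 4096    # prints "2269012 1"
</programlisting>
An index of 1 means the second sector of the block, which agrees with the
note above. Dividing by the number of sectors per block gives the same
result as multiplying by 512 and dividing by B, while keeping the
intermediate values smaller.
</para>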
<para>
Fourth Step: we use debugfs to find the inode that owns this block
(icheck), and then the pathname of the file with that inode (ncheck):
<programlisting>
root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs: open /dev/hda3
debugfs: icheck 2269012
Block Inode number
2269012 41032
debugfs: ncheck 41032
Inode Pathname
41032 /S1/R/H/714197568-714203359/H-R-714202192-16.gwf
</programlisting>
In this example, you can see that the problematic file (with the mount
point included in the path) is:
/data/S1/R/H/714197568-714203359/H-R-714202192-16.gwf
</para>
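<para>
The interactive <command>debugfs</command> session above can also be run
non-interactively with its <literal>-R</literal> option, which executes a
single request. The sketch below is an illustration rather than an
established tool: <literal>icheck_inode</literal> and
<literal>block_to_path</literal> are hypothetical names, and the parsing
assumes the two-column <literal>icheck</literal> output shown above:
<programlisting>
#!/bin/bash
# icheck_inode (hypothetical helper): read the output of the debugfs
# "icheck" request on stdin and print the inode number of each block.
# Blocks owned by no file (reported as "block not found") are skipped.
icheck_inode() {
    awk 'NR > 1 { if ($2 ~ /^[0-9]+$/) print $2 }'
}

# block_to_path (hypothetical helper): non-interactive version of the
# debugfs session above. Arguments: file system device, block number.
# debugfs opens the file system read-only unless -w is given.
block_to_path() {
    local dev=$1 blk=$2 ino
    for ino in $(debugfs -R "icheck $blk" "$dev" 2>/dev/null | icheck_inode); do
        debugfs -R "ncheck $ino" "$dev" 2>/dev/null
    done
}

# Usage: block_to_path /dev/hda3 2269012
</programlisting>
If the block is not owned by any file, nothing is printed; see
<xref linkend="unmapped"/>.
</para>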
<para>
To force the disk to reallocate this bad block we'll write zeros to
the bad block, and sync the disk:
<programlisting>
root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=2269012
root]# sync
</programlisting>
</para>
<para>
<emphasis>NOTE:</emphasis> This last step has <emphasis>permanently
</emphasis> and irretrievably <emphasis>destroyed</emphasis> some of
the data that was in this file. Don't do this unless you don't need
the file or you can replace it with a fresh or correct version.
</para>
<para>
Now everything is back to normal: the sector has been reallocated.
Compare the output just below to similar output near the top of this
article:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
</programlisting>
Note: for some disks it may be necessary to update the SMART Attribute values by running
<command>smartctl -t offline /dev/hda</command>.
</para>
<para>
The disk now passes its self-tests again:
<programlisting>
root]# smartctl -t long /dev/hda [wait until test completes, then]
root]# smartctl -l selftest /dev/hda
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       239         -
# 2  Extended offline    Completed: read failure       90%       217         0x016561e9
# 3  Extended offline    Completed: read failure       90%       212         0x016561e9
# 4  Extended offline    Completed: read failure       90%       181         0x016561e9
# 5  Extended offline    Completed without error       00%        14         -
# 6  Extended offline    Completed without error       00%         4         -
</programlisting>
</para>
<para>
and no longer shows any offline uncorrectable sectors:
<programlisting>
root]# smartctl -A /dev/hda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
</programlisting>
</para>
</sect2>
<sect2 id="example2">
<title>Second Example</title>
<para>
On this drive, the first sign of trouble was this email from smartd:
<programlisting>
To: ballen
Subject: SMART error (selftest) detected on host: medusa-slave166.medusa.phys.uwm.edu
This email was generated by the smartd daemon running on host:
medusa-slave166.medusa.phys.uwm.edu in the domain: master001-nis
The following warning/error was logged by the smartd daemon:
Device: /dev/hda, Self-Test Log error count increased from 0 to 1
</programlisting>
</para>
<para>
Running <command>smartctl -a /dev/hda</command> confirmed the problem:
<programlisting>
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%       682         0x021d9f44
</programlisting>
Note that the failing LBA reported is 0x021d9f44 (base 16) = 35495748 (base 10).
<programlisting>
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       3
</programlisting>
</para>
<para>
and one can see above that there are 3 sectors on the list of pending
sectors that the disk can't read but would like to reallocate.
</para>
<para>
The device also shows errors in the SMART error log:
<programlisting>
Error 212 occurred at disk power-on lifetime: 690 hours
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 12 46 9f 1d e2 Error: UNC 18 sectors at LBA = 0x021d9f46 = 35495750
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
25 00 12 46 9f 1d e0 00 2485545.000 READ DMA EXT
</programlisting>
</para>
<para>
Signs of trouble at this LBA may also be found in SYSLOG:
<programlisting>
[root]# grep LBA /var/log/messages | awk '{print $12}' | sort | uniq
LBAsect=35495748
LBAsect=35495750
</programlisting>
</para>
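<para>
The <literal>awk '{print $12}'</literal> above depends on the position of
the <literal>LBAsect=</literal> field in the log line, which may differ
between systems and kernel versions. A sketch that matches the field by
name instead, assuming a <command>grep</command> with the
<literal>-o</literal> option (GNU grep has it); <literal>log_lbas</literal>
is a hypothetical name:
<programlisting>
#!/bin/bash
# log_lbas (hypothetical helper): list the distinct LBAsect=NNN values
# found anywhere in the given log file, one per line.
log_lbas() {
    grep -o 'LBAsect=[0-9]*' "$1" | sort -u
}

# Usage: log_lbas /var/log/messages
</programlisting>
</para>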
<para>
So I decide to do a quick check to see how many bad sectors there
really are. Using the bash shell I check 70 sectors around the trouble
area:
<programlisting>
[root]# export i=35495730
[root]# while [ $i -lt 35495800 ]
> do echo $i
> dd if=/dev/hda of=/dev/null bs=512 count=1 skip=$i
> let i+=1
> done
&lt;SNIP&gt;
35495734
1+0 records in
1+0 records out
35495735
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out
&lt;SNIP&gt;
35495751
dd: reading `/dev/hda': Input/output error
0+0 records in
0+0 records out
35495752
1+0 records in
1+0 records out
&lt;SNIP&gt;
</programlisting>
</para>
<para>
which shows that the seventeen sectors 35495735-35495751 (inclusive)
are not readable.
</para>
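<para>
The loop above can be wrapped in a function that prints only the
unreadable sectors. This is a sketch assuming bash and a
<command>dd</command> that exits with a non-zero status on a read error
(GNU dd does); <literal>scan_unreadable</literal> is a hypothetical name.
On a real disk, consider adding <literal>iflag=direct</literal>, as described
in the footnote on <command>dd</command> earlier, so that reads bypass the
Linux block buffering layer:
<programlisting>
#!/bin/bash
# scan_unreadable (hypothetical helper): try to read each 512-byte sector
# from first to last (inclusive) and print the ones that cannot be read.
# Arguments: device or image file, first LBA, last LBA.
scan_unreadable() {
    local dev=$1 i=$2 last=$3
    while [ "$i" -le "$last" ]; do
        dd if="$dev" of=/dev/null bs=512 count=1 skip="$i" 2>/dev/null || echo "$i"
        i=$(( i + 1 ))
    done
}

# Usage: scan_unreadable /dev/hda 35495730 35495799
</programlisting>
</para>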
<para>
Next, we identify the files at those locations. The partitioning
information on this disk is identical to the first example above, and
as in that case the problem sectors are on the third partition
/dev/hda3. So we have:
<programlisting>
L=35495735 to 35495751
S=5269320
B=4096
</programlisting>
so that, applying the formula of the first example to both ends of the
range, b=3778301 to 3778303 are the three bad blocks in the file system.
<programlisting>
[root]# debugfs
debugfs 1.32 (09-Nov-2002)
debugfs: open /dev/hda3
debugfs: icheck 3778301
Block Inode number
3778301 45192
debugfs: icheck 3778302
Block Inode number
3778302 45192
debugfs: icheck 3778303
Block Inode number
3778303 45192
debugfs: ncheck 45192
Inode Pathname
45192 /S1/R/H/714979488-714985279/H-R-714979984-16.gwf
debugfs: quit
</programlisting>
</para>
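<para>
The block range above comes from applying the formula of the first example
to both ends of the bad sector range. A minimal bash sketch, using the
hypothetical name <literal>lba_range_to_fsblocks</literal>:
<programlisting>
#!/bin/bash
# lba_range_to_fsblocks (hypothetical helper): first and last file system
# blocks containing the bad sectors L1..L2 (inclusive), using
# b = (int)((L-S)*512/B) at both ends of the range.
# Arguments: L1, L2, partition start sector S, block size B in bytes.
lba_range_to_fsblocks() {
    local L1=$1 L2=$2 S=$3 B=$4
    local spb=$(( B / 512 ))    # sectors per file system block
    echo "$(( (L1 - S) / spb )) $(( (L2 - S) / spb ))"
}

lba_range_to_fsblocks 35495735 35495751 5269320 4096    # prints "3778301 3778303"

# debugfs icheck accepts several block numbers, so the whole range can be
# checked in one request, e.g.:
#   debugfs -R "icheck 3778301 3778302 3778303" /dev/hda3
</programlisting>
</para>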
<para>
Just to confirm that this is really the damaged file:
</para>
<para>
<programlisting>
[root]# md5sum /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf
md5sum: /data/S1/R/H/714979488-714985279/H-R-714979984-16.gwf: Input/output error
</programlisting>
</para>
<para>
Finally we force the disk to reallocate the three bad blocks:
<programlisting>
[root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=3 seek=3778301
[root]# sync
</programlisting>
</para>
<para>
We could probably also have written zeros to just the 17 unreadable
sectors, addressing the whole disk:
<programlisting>
[root]# dd if=/dev/zero of=/dev/hda bs=512 count=17 seek=35495735
</programlisting>
</para>
<para>
At this point we now have:
<programlisting>
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
</programlisting>
</para>
<para>
which is encouraging, since the pending sectors count is now zero.
Note that the drive reallocation count has not yet increased: the
drive may now have confidence in these sectors and have decided not to
reallocate them.
</para>
<para>
A long device self test:
<programlisting>
[root]# smartctl -t long /dev/hda
</programlisting>
then, after waiting about an hour, shows no unreadable sectors or errors:
<programlisting>
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       692         -
# 2  Extended offline    Completed: read failure       80%       682         0x021d9f44
</programlisting>
</para>
</sect2>
<sect2 id="unmapped">
<title>Unassigned sectors</title>
<para>
This section was written by Kay Diederichs.
</para>
<para>
I read your badblocks-howto and greatly
benefited from it. One thing that's (maybe) missing is that often the
<command>smartctl -t long</command> scan finds a bad sector which is
<emphasis>not</emphasis> assigned to
any file. In that case it does not help to run debugfs, or rather
debugfs reports the fact that no file owns that sector. Furthermore,
it is somewhat laborious to come up with the correct numbers for
debugfs, and debugfs is slow ...
</para>
<para>
So, when Current_Pending_Sector or Offline_Uncorrectable errors are
present, what I suggest is to create a huge file on that file system.
The command
<programlisting>
dd if=/dev/zero of=/some/mount/point/zerofile bs=4k
</programlisting>
creates the file (the file name is only an example). Leave it running
until the partition/file system is full. This will make the disk
reallocate those sectors which do not belong to a file. Then delete the
file to free the space again. Check the <command>smartctl -a</command>
output after that and make sure that the sectors are reallocated. If any
remain, use the debugfs method. Of course the usual caveats apply: back
it up first, and so on.
</para>
</sect2>
<sect2 id="lvm">
<title>LVM repairs</title>
<para>
This section was written by Frederic BOITEUX. It was titled: "HOW TO
LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME".
</para>
<para>
<command>smartd</command> reports an error in a short test:
<programlisting>
# smartctl -a /dev/hdb
...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%        66         37383668
</programlisting>
So the disk has a bad block at LBA 37383668.
</para>
<para>
In which physical partition is the bad block?
<programlisting>
# sfdisk -lu /dev/hdb
Disk /dev/hdb: 9729 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/hdb1            63    996029     995967  82  Linux swap / Solaris
/dev/hdb2   *    996030   1188809     192780  83  Linux
/dev/hdb3       1188810 156296384  155107575  8e  Linux LVM
/dev/hdb4             0         -          0   0  Empty
</programlisting>
It is in the /dev/hdb3 partition, an LVM2 partition.
From the beginning of the LVM2 partition, the bad block has an offset of
<programlisting>
(37383668 - 1188810) = 36194858
</programlisting>
</para>
<para>
We have to find which LVM2 logical partition the block belongs to.
</para>
<para>
In which logical partition is the bad block?
</para>
<para>
<emphasis>IMPORTANT</emphasis>: LVM2 can use different schemes to divide
its physical partitions into logical ones: linear, striped, contiguous or
not, and so on. The following example assumes that allocation is linear!
</para>
<para>
The physical partition used by LVM2 is divided into PE (Physical Extent)
units of the same size, starting 'pe_start' 512-byte blocks from
the beginning of the physical partition.
</para>
<para>
The 'pvdisplay' command gives the size of the PE (in KB) of the
LVM partition:
<programlisting>
# part=/dev/hdb3 ; pvdisplay -c $part | awk -F: '{print $8}'
4096
</programlisting>
</para>
<para>
To get its size in LBA block size (512 bytes or 0.5 KB), we multiply this
number by 2 : 4096 * 2 = 8192 blocks for each PE.
</para>
<para>
Finding the offset from the beginning of the physical partition is a
bit more difficult. If you have a recent LVM2 version, try:
<programlisting>
# pvs -o+pe_start $part
</programlisting>
</para>
<para>
Alternatively, you can look in /etc/lvm/backup:
<programlisting>
# grep pe_start $(grep -l $part /etc/lvm/backup/*)
pe_start = 384
</programlisting>
</para>
<para>
Then we find in which PE the bad block lies, by dividing the bad block's
offset in the physical partition by the size of a PE. Strictly, pe_start
should be subtracted from the offset first; here both forms give PE 4418:
<programlisting>
36194858 / 8192 = 4418.3176
(36194858 - 384) / 8192 = 4418.2708
</programlisting>
</para>
<para>
So we have to find which LVM2 logical partition uses PE
number 4418 (counting from 0):
<programlisting>
# lvdisplay --maps |egrep 'Physical|LV Name|Type'
LV Name /dev/WDC80Go/racine
Type linear
Physical volume /dev/hdb3
Physical extents 0 to 127
LV Name /dev/WDC80Go/usr
Type linear
Physical volume /dev/hdb3
Physical extents 128 to 1407
LV Name /dev/WDC80Go/var
Type linear
Physical volume /dev/hdb3
Physical extents 1408 to 1663
LV Name /dev/WDC80Go/tmp
Type linear
Physical volume /dev/hdb3
Physical extents 1664 to 1791
LV Name /dev/WDC80Go/home
Type linear
Physical volume /dev/hdb3
Physical extents 1792 to 3071
LV Name /dev/WDC80Go/ext1
Type linear
Physical volume /dev/hdb3
Physical extents 3072 to 10751
LV Name /dev/WDC80Go/ext2
Type linear
Physical volume /dev/hdb3
Physical extents 10752 to 18932
</programlisting>
</para>
<para>
So the PE #4418 is in the <filename>/dev/WDC80Go/ext1</filename>
LVM logical partition.
</para>
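<para>
For volume groups with many logical partitions, this lookup can be
scripted. The sketch below assumes the <command>lvdisplay --maps</command>
output format shown above, a single physical volume and linear allocation;
<literal>lv_for_pe</literal> is a hypothetical name, and other LVM2 versions
may label these lines differently:
<programlisting>
#!/bin/bash
# lv_for_pe (hypothetical helper): read "lvdisplay --maps" output on stdin
# and print the logical partition whose "Physical extents A to B" segment
# contains PE number $1. Assumes one physical volume, as in this example.
lv_for_pe() {
    awk -v pe="$1" '
        $1 == "LV" { if ($2 == "Name") lv = $3 }
        $1 == "Physical" {
            if ($2 == "extents") if (pe + 0 >= $3 + 0) if ($5 + 0 >= pe + 0) print lv
        }'
}

# Usage: lvdisplay --maps | lv_for_pe 4418
</programlisting>
</para>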
<para>
Block size of the file system on <filename>/dev/WDC80Go/ext1</filename>:
</para>
<para>
It is an ext3 file system, so I get it like this:
<programlisting>
# dumpe2fs /dev/WDC80Go/ext1 | grep 'Block size'
dumpe2fs 1.37 (21-Mar-2005)
Block size: 4096
</programlisting>
</para>
<para>
Bad block number for the file system:
</para>
<para>
The logical partition begins on PE 3072:
<programlisting>
(first PE of the logical partition * sizeof(PE)) + partition offset [pe_start] =
(3072 * 8192) + 384 = 25166208
</programlisting>
That is, the logical partition starts at 512-byte block 25166208 of the
physical partition, so the bad block number for the file system is:
<programlisting>
(36194858 - 25166208) / (sizeof(fs block) / 512)
= 11028650 / (4096 / 512) = 1378581.25
</programlisting>
</para>
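<para>
The whole LVM calculation can be collected into one function. This is a
sketch under the same assumptions as above (linear allocation, and a
logical partition made of a single segment on this physical partition);
<literal>lvm_lba_to_fsblock</literal> is a hypothetical name. It subtracts
pe_start from the partition offset before dividing by the PE size:
<programlisting>
#!/bin/bash
# lvm_lba_to_fsblock (hypothetical helper), linear LVM2 allocation only.
# Arguments:
#   lba       bad LBA on the disk
#   pstart    start sector of the LVM2 physical partition (sfdisk -lu)
#   pe_start  pe_start in 512-byte blocks (pvs or /etc/lvm/backup)
#   pe_kb     PE size in KB (pvdisplay -c, field 8)
#   lv_pe     first PE of the logical partition (lvdisplay --maps)
#   fs_bs     file system block size in bytes (dumpe2fs)
lvm_lba_to_fsblock() {
    local lba=$1 pstart=$2 pe_start=$3 pe_kb=$4 lv_pe=$5 fs_bs=$6
    local pe_sectors=$(( pe_kb * 2 ))                           # 1 KB = 2 sectors
    local off=$(( lba - pstart ))                               # offset in partition
    local pe=$(( (off - pe_start) / pe_sectors ))               # PE holding the block
    local lv_off=$(( off - (lv_pe * pe_sectors + pe_start) ))   # offset in the LV
    echo "PE $pe, file system block $(( lv_off / (fs_bs / 512) ))"
}

lvm_lba_to_fsblock 37383668 1188810 384 4096 3072 4096
# prints "PE 4418, file system block 1378581"
</programlisting>
</para>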
<para>
Test reading the file system bad block:
<programlisting>
dd if=/dev/WDC80Go/ext1 of=block1378581 bs=4096 count=1 skip=1378581
</programlisting>
</para>
<para>
If this dd command succeeds, without any error message on the console or
in syslog, then the block number calculation is probably wrong!
<emphasis>Don't</emphasis> go further: re-check it, and if you don't find
the error, please give up!
</para>
<para>
Search and correction follow the same scheme as for simple
partitions:
<itemizedlist>
<listitem><para>
find the possibly affected files with debugfs (icheck &lt;fs block nb&gt;,
then ncheck &lt;inode nb&gt;).
</para></listitem>
<listitem><para>
reallocate the bad block by writing zeros to it, <emphasis>using the fs block size</emphasis>:
</para></listitem>
</itemizedlist>
</para>
<para>
<programlisting>
dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581
</programlisting>
</para>
<para>
Et voilà!
</para>
</sect2>
</sect1>
<sect1 id="sdisk">
<title>Repairs at the disk level</title>
<para>
This section ignores the upper level impact of a bad block and just
repairs the underlying sector so that defective sectors will not cause
problems in the future. The SCSI disk command set and associated disk
architecture are assumed.
</para>
<para>
SCSI disks have their own logical to physical mapping allowing
a damaged sector (usually carrying 512 bytes of data) to be
remapped irrespective of the operating system, file system or software
RAID being used.
</para>
<sect2 id="sdetails">
<title>Details</title>
<para>
The terms <emphasis>block</emphasis> and <emphasis>sector</emphasis> are
used interchangeably, although block tends to get used in higher level or
more abstract contexts such as a <emphasis>logical block</emphasis>.
</para>
<para>
When a SCSI disk is formatted, defective sectors identified during
the manufacturing process (the so-called primary list: PLIST),
those found during the format itself (the certification list: CLIST),
those given explicitly to the format command (the DLIST) and optionally
the previous grown list (GLIST) are not used in the logical block
map. The number (and low level addresses) of the unmapped sectors can be
found with the READ DEFECT DATA SCSI command.
</para>
<para>
SCSI disks tend to be divided into zones which have spare sectors and
perhaps spare tracks, to support the logical block address mapping
process. The idea is that if a logical block is remapped, the heads do not
have to move a long way to access the replacement sector. Note that spare
sectors are a scarce resource.
</para>
<para>
Once a SCSI disk format has completed successfully, other problems
may appear over time. These fall into two categories:
<itemizedlist>
<listitem><para>
recoverable: the Error Correction Codes (ECC) detect a problem
but it is small enough to be corrected. Optionally other strategies
such as retrying the access may retrieve the data.
</para></listitem>
<listitem><para>
unrecoverable: try as it may, the disk logic and ECC algorithms
cannot recover the data. This is often reported as a
<emphasis>medium error</emphasis>.
</para></listitem>
</itemizedlist>
</para>
<para>
Other things can go wrong, typically associated with the transport, and
these will be reported using a term other than
<emphasis>medium error</emphasis>. For example a disk may decide a read
operation was successful but a computer's host bus adapter (HBA) checking
the incoming data detects a CRC error due to a bad cable or termination.
</para>
<para>
Depending on the disk vendor, recoverable errors can be ignored. After all,
some disks have up to 68 bytes of ECC above the payload size of 512 bytes,
so why use up spare sectors, which are limited in number?<footnote><para>
Detecting and fixing an error with ECC "on the fly" and not taking the further
step of reassigning the block in question may explain why some disks have
large numbers in their read error counter log. Various worried users have
reported large numbers in the "errors corrected without substantial delay"
counter field which is in the "Errors corrected by ECC fast" column in
the <command>smartctl -l error</command> output.
</para></footnote>
If the disk can recover the data and does decide to re-allocate (reassign)
a sector, then first it checks the settings of the ARRE and AWRE bits in the
read-write error recovery mode page. Usually these bits are set
<footnote><para>
Often disks inside a hardware RAID have the ARRE and AWRE bits
cleared (disabled) so the RAID controller can do things manually or flag
the disk for replacement.
</para></footnote>
enabling automatic (read or write) re-allocation. The automatic
re-allocation may also fail if the zone (or disk) has run out of spare
sectors.
</para>
<para>
Another consideration with RAIDs, and applications that require a high
data rate without pauses, is that the controller logic may not want a
disk to spend too long trying to recover an error.
</para>
<para>
Unrecoverable errors will cause a <emphasis>medium error</emphasis> sense
key, perhaps with some useful additional sense information. If the extended
background self test includes a full disk read scan, one would expect the
self test log to list the bad block, as shown in <xref linkend="rfile"/>.
Recent SCSI disks with a periodic background scan should also list
unrecoverable read errors (and some recoverable errors as well). The
advantage of the background scan is that it runs to completion while self
tests will often terminate at the first serious error.
</para>
<para>
SCSI disks expect unrecoverable errors to be fixed manually using the
REASSIGN BLOCKS SCSI command since loss of data is involved. It is possible
that an operating system or a file system could issue the REASSIGN BLOCKS
command itself, but the author is unaware of any examples. The REASSIGN BLOCKS
command will reassign one or more blocks, attempting to recover (perhaps only
partially) the data (a forlorn hope at this stage), fetch an unused spare sector from the
current zone while adding the damaged old sector to the GLIST (hence the
name "grown" list). The contents of the GLIST may not be that interesting
but <command>smartctl</command> prints out the number of entries in the grown
list and if that number grows quickly, the disk may be approaching the end
of its useful life.
</para>
<para>
Here is an alternate brute force technique to consider: if the data on the
SCSI or ATA disk has all been backed up (e.g. is held on the other disks in
a RAID 5 enclosure), then simply reformatting the disk may be the least
cumbersome approach.
</para>
</sect2>
<sect2 id="sexample">
<title>Example</title>
<para>
Given a "bad block", it still may be useful to look at the
<command>fdisk</command> command (if the disk has multiple partitions)
to find out which partition is involved, then use
<command>debugfs</command> (or a similar tool for the file system in
question) to find out which, if any, file or other part of the file system
may have been damaged. This is discussed in <xref linkend="rfile"/>.
</para>
<para>
Then a program that can execute the REASSIGN BLOCKS SCSI command is
required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows
the author's <command>sg_reassign</command> utility in the sg3_utils
package can be used. Also found in that package is
<command>sg_verify</command> which can be used to check that a block is
readable.
</para>
<para>
Assume that logical block address 1193046 (which is 123456 in hex) is
corrupt
<footnote><para>
In this case the corruption was manufactured by using the WRITE LONG
SCSI command. See <command>sg_write_long</command> in sg3_utils.
</para></footnote>
on the disk at <filename>/dev/sdb</filename>. A long selftest command like
<command>smartctl -t long /dev/sdb</command> may result in log results
like this:
<programlisting>
# smartctl -l selftest /dev/sdb
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment           -     354           1193046 [0x3 0x11 0x0]
# 2  Background short  Completed                   -     323                 - [-   -    -]
# 3  Background short  Completed                   -     194                 - [-   -    -]
</programlisting>
</para>
<para>
The <command>sg_verify</command> utility can be used to confirm that there
is a problem at that address:
<programlisting>
# sg_verify --lba=1193046 /dev/sdb
verify (10): Fixed format, current; Sense key: Medium Error
Additional sense: Unrecovered read error
Info fld=0x123456 [1193046]
Field replaceable unit code: 228
Actual retry count: 0x008b
medium or hardware error, reported lba=0x123456
</programlisting>
</para>
<para>
Now the GLIST length is checked before the block reassignment:
<programlisting>
# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 0
</programlisting>
</para>
<para>
And now for the actual reassignment followed by another check of the GLIST
length:
<programlisting>
# sg_reassign --address=1193046 /dev/sdb
# sg_reassign --grown /dev/sdb
>> Elements in grown defect list: 1
</programlisting>
</para>
<para>
The GLIST length has grown by one as expected. If the disk was unable to
recover any data, then the "new" block at LBA 0x123456 has vendor-specific
data in it. The <command>sg_reassign</command> utility can also do bulk
reassigns, see <command>man sg_reassign</command> for more information.
</para>
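<para>
The verify, reassign and check sequence above can be put into a small
script. This is a hedged sketch, not part of sg3_utils:
<literal>glist_count</literal> and <literal>reassign_if_bad</literal> are
hypothetical names. It assumes that <command>sg_verify</command> exits with
status 3 when it reports a medium or hardware error, and that
<command>sg_reassign --grown</command> prints the GLIST count on standard
output as shown above. Check both against the man pages for your sg3_utils
version before relying on it:
<programlisting>
#!/bin/bash
# glist_count (hypothetical helper): extract N from a line such as
# ">> Elements in grown defect list: N" printed by sg_reassign --grown.
glist_count() {
    sed -n 's/.*Elements in grown defect list: *\([0-9]*\).*/\1/p'
}

# reassign_if_bad (hypothetical helper): reassign one LBA only if
# sg_verify reports a medium error for it (assumed exit status 3), and
# print the GLIST length before and after. The old contents of the
# block are lost: back up first.
reassign_if_bad() {
    local dev=$1 lba=$2 status
    sg_verify --lba="$lba" "$dev"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "LBA $lba verified OK, not reassigning"
        return 0
    elif [ "$status" -ne 3 ]; then
        echo "sg_verify exit status $status is not a medium error, not reassigning"
        return "$status"
    fi
    echo "GLIST before: $(sg_reassign --grown "$dev" | glist_count)"
    sg_reassign --address="$lba" "$dev" || return 1
    echo "GLIST after:  $(sg_reassign --grown "$dev" | glist_count)"
}

# Usage: reassign_if_bad /dev/sdb 1193046
</programlisting>
</para>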
<para>
The <command>dd</command> command could be used to read the contents of
the "new" block:
<programlisting>
# dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1
</programlisting>
</para>
<para>
and a hex editor
<footnote><para>
Most window managers have a handy calculator that will do hex to
decimal conversions.
</para></footnote>
used to view and potentially change the
<filename>blk.img</filename> file. An altered <filename>blk.img</filename>
file (or <filename>/dev/zero</filename>) could be written back with:
<programlisting>
# dd if=blk.img of=/dev/sdb seek=1193046 oflag=direct bs=512 count=1
</programlisting>
</para>
<para>
More work may be needed at the file system level, especially if the
reassigned block held critical file system information such as
a superblock or a directory.
</para>
<para>
Even if a full backup of the disk is available, or the disk has been
"ejected" from a RAID, it may still be worthwhile to reassign the bad
block(s) that caused the problem, or simply to format the disk (see
<command>sg_format</command> in the sg3_utils package), and re-use the
disk later (not unlike the way a replacement disk from a manufacturer
might be used).
</para>
<para>
CVS $Id: badblockhowto.xml,v 1.1 2006/11/16 02:19:58 dpgilbert Exp $
</para>
</sect2>
</sect1>
<!--
<appendix id="appendix">
<title>annex a</title>
<sect1 id="what">
<title>what</title>
<para>
dummy
</para>
<para>
CVS $Id: badblockhowto.xml,v 1.1 2006/11/16 02:19:58 dpgilbert Exp $
</para>
</sect1>
</appendix>
-->
</article>