This is the home page for smartmontools. The smartmontools
package contains two utility programs (smartctl and smartd) to control
and monitor storage systems using the Self-Monitoring, Analysis and
Reporting Technology System (S.M.A.R.T.) built into most modern ATA
and SCSI hard disks. It is derived from the smartsuite package, and includes
support for ATA/ATAPI-5 disks. It should run on any modern Linux system.
For your convenience, this is a single page, so you can print
it easily.
How to download and install smartmontools
There are three different ways to get and install smartmontools.
You can use any one of these three procedures. Just after
"Method 3" below are some instructions for trying out smartmontools once
you have completed the installation.
- First Method - Install from the RPM file:
- Download the latest binary RPM file (*.rpm) from here.
Don't get the SRPM file (*.src.rpm).
- Install it using RPM. You must be root to do this:
su root (enter root password)
rpm -ivh smartmontools-5.0-1.i386.rpm
For most users, this is all that is needed.
- If you receive an error message, you have probably previously
installed the smartsuite package, or Red Hat's kernel-utils
package, which provide older versions of the smartd and
smartctl utilities. In this case you should use the
--nodeps or --force arguments of rpm to replace
these two utilities:
rpm -ivh --nodeps --force smartmontools-5.0-1.i386.rpm
- If you want to remove the package (rpm -e smartmontools-5.0-1)
and your system does not have chkconfig installed, you may
need to use the --noscripts option to rpm -e.
- Second Method - Install from the source tarball:
- Download the latest source-code tarball from here.
Note: you probably want the most recent release.
- Uncompress the tarball:
tar zxvf smartmontools-5.0-1.tar.gz
- The previous step created a directory called smartmontools-5.0-1
containing the code. Go to that directory, build, and install:
cd smartmontools-5.0-1
make
make install [only root can do this]
- Third Method - Download code directly from the CVS archive:
- Download the latest code snapshot from CVS. If prompted for
a password, simply press the Enter key. Note that the two lines
below that start "cvs" are long!
cvs -d:pserver:anonymous@cvs.smartmontools.sourceforge.net:/cvsroot/smartmontools login
cvs -z3 -d:pserver:anonymous@cvs.smartmontools.sourceforge.net:/cvsroot/smartmontools co sm5
- The previous step created a subdirectory called sm5/
containing the code. Go to that directory, build, and install:
cd sm5
make
make install [only root can do this]
After installing using Method 1, 2 or 3 above, you can read
the man pages, and try out the commands:
man 8 smartctl
man 8 smartd
/usr/sbin/smartctl -etf /dev/hda [only root can do this]
/usr/sbin/smartctl -a /dev/hda [only root can do this]
Note that the default location for the manual pages is
/usr/share/man/man8. If "man" does not find the manual pages, then you may
need to add /usr/share/man to your MANPATH environment variable.
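The MANPATH step can be sketched as follows, assuming a POSIX shell. The helper name add_to_manpath is hypothetical and not part of smartmontools; it appends a directory only if it is not already listed. On some systems, setting MANPATH replaces the default search path rather than extending it, so check the documentation for your system's man command before making the change permanent.

```shell
#!/bin/sh
# Hypothetical helper, not part of smartmontools: print a MANPATH value
# that includes directory $1, given the current value in $2.
add_to_manpath() {
    dir=$1
    current=$2
    case ":$current:" in
        # The directory is already listed: return the value unchanged.
        *":$dir:"*) printf '%s\n' "$current" ;;
        # Otherwise append it, adding a colon only if the value was non-empty.
        *)          printf '%s\n' "${current:+$current:}$dir" ;;
    esac
}

# Update MANPATH for the current shell session only.
MANPATH=$(add_to_manpath /usr/share/man "${MANPATH:-}")
export MANPATH
```

After this, "man 8 smartctl" should find the page if it was installed in /usr/share/man/man8.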
Frequently-asked questions
If your question is not here, please
email me.
- What do I do if I have problems, or need support? Suppose
I want to become a developer, or suggest some new extensions?
Please send an email to the
smartmontools-support mailing list.
- What are the future plans for smartmontools?
My plan is that smartmontools-5.x will support ATA/ATAPI-5 disks.
Eventually, we'll do smartmontools-6.x to support ATA/ATAPI-6
disks, smartmontools-7.x for the ATA/ATAPI-7 standard, and so on. The
"x" will denote revision level, as bugs get found and fixed, and as enhancements
get added. If it's possible to maintain backwards compatibility,
that would be nice, but I don't know if it will be possible or practical.
- Why are you doing this?
My research group runs a Beowulf cluster with 300 ATA-5 disks.
We have more than 20 TB of data stored on the system. It's
nice to have advance warning when a disk is going to fail.
- I see some strange output from smartctl. What does
it mean?
The raw S.M.A.R.T. attributes (temperature, power-on lifetime,
and so on) are stored in vendor-specific structures. Sometimes these
are strange. Hitachi disks (at least some of them) store power-on
lifetime in minutes, rather than hours. IBM disks (at least some
of them) have three temperatures stored in the raw structure, not just one.
And so on. If you find strange output, or unknown attributes, please
send an email to smartmontools-support and we'll help you try to
figure it out.
- What attributes does smartmontools not yet recognize?
From a Hitachi disk: (230)(250)
If you can attach names/meanings to these attributes, please send
a note to smartmontools-support.
- When I run smartd, the SYSLOG (/var/log/messages) contains
messages like this:
smartd: Reading Device /dev/sdv
modprobe: modprobe: Can't locate module block-major-65
This is because when smartd starts, it looks for
all ATA and SCSI devices to monitor (matching the pattern /dev/hd[a-z]
or /dev/sd[a-z]). The log messages appear because
your system doesn't have most of these devices.
A future release of smartd will have a command-line option to specify
which devices to include in or exclude from the start-up search.
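The start-up search described in the last answer can be approximated in the shell. This is only a sketch of the name pattern, not smartd's actual code; the function name list_candidate_devices and its optional directory argument are assumptions made for illustration. It lists the names matching hd[a-z] or sd[a-z] that exist in a directory. Note that on systems with static device nodes, a node may exist even though no disk is attached, so an existence check alone may not avoid the modprobe messages shown above.

```shell
#!/bin/sh
# Hypothetical illustration, not smartd's actual code: list which names
# matching hd[a-z] or sd[a-z] exist in a directory (default: /dev).
list_candidate_devices() {
    devdir=${1:-/dev}
    for prefix in hd sd; do
        for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
            path="$devdir/$prefix$letter"
            # -e tests only whether the name exists; it does not open it.
            if [ -e "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    done
}
```

Running "list_candidate_devices" with no argument checks /dev; passing a different directory is mainly useful for trying the function out safely.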
Help needed in testing smartmontools, especially on SCSI disks/systems
I have access to a number of systems with ATA S.M.A.R.T. disks,
but I don't have any access to systems with SCSI S.M.A.R.T. devices.
I'd be very grateful to find someone who could help me test the
smartmontools code on SCSI disks. Since it's derived from the smartsuite
package, it should initially work about the same way with SCSI devices
as the smartsuite tools did.
I'd be especially happy if someone would like to take on the task,
as a developer, of maintaining the SCSI code. Do you have a Beowulf
cluster with a few hundred SCSI disks? Please volunteer!
How does smartmontools differ from smartsuite?
The smartsuite code was originally developed as a Senior Thesis by Michael Cornwell
at the Concurrent Systems Laboratory (now part of the
Storage Systems Research Center), Jack
Baskin School of Engineering, University of California, Santa Cruz.
Smartmontools was derived directly from smartsuite. It differs
from smartsuite in that it supports the ATA/ATAPI-5 standard. For
example, smartctl from smartsuite has no facility for printing the
S.M.A.R.T. self-test logs, and doesn't print timestamp information in
the most usable way. The smartctl utility in smartmontools has added
functionality (the -l, -L, -f, -F and -m options) and updated
documentation, and also fixes small technical bugs in smartsuite.
The other principal difference is that smartmontools is an open-source
development project, meaning that we keep the files in CVS, and that other
developers who wish to contribute can commit changes to the archive. If
you would like to contribute, please write to smartmontools-support.
But the bottom line is that the code in smartmontools is derived directly
from smartsuite and is very similar. The smartsuite package can be found
here.
Useful references on S.M.A.R.T. and the ATA/ATAPI standards
If you are having trouble understanding the output of smartctl
or smartd, please first read the manual pages:
man 8 smartctl
man 8 smartd
If you'd like to know more about S.M.A.R.T., then the following
references may be helpful:
Sample output from smartmontools
root# /usr/sbin/smartctl -am /dev/hda
smartctl version 5.0-6 Copyright (C) 2002 Bruce Allen
Home page of smartctl is http://smartmontools.sourceforge.net/
Device Model: HITACHI_DK23BA-20
Serial Number: 12H7M8
Firmware Version: 00E0A0D2
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
SMART support is: Enabled
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Off-line data collection status: (0x00) Offline data collection activity was
never started.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete off-line
data collection: (1530) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Automatic timer ON/OFF support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 26) minutes.
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute Flag Value Worst Threshold Raw Value
( 1)Raw Read Error Rate 0x000d 100 084 050 247
( 3)Spin Up Time 0x0007 100 100 050 0
( 4)Start Stop Count 0x0032 100 100 050 197
( 5)Reallocated Sector Ct 0x0033 100 100 010 12
( 7)Seek Error Rate 0x000f 100 100 050 330
( 9)Power On Hours 0x0032 100 100 060 482 h + 4 m
( 10)Spin Retry Count 0x0013 100 100 050 0
( 12)Power Cycle Count 0x0032 100 100 050 197
(192)Power-Off Retract Count 0x0032 100 100 050 13
(195)Hardware ECC Recovered 0x001a 100 065 050 191
(196)Reallocated Event Count 0x0032 099 099 001 12
(197)Current Pending Sector 0x0032 097 096 001 3
(198)Offline Uncorrectable 0x0010 097 096 001 15
(199)UDMA CRC Error Count 0x003e 200 200 000 0
(221)G-Sense Error Rate 0x000a 100 100 050 0
(223)Load Retry Count 0x0012 100 100 050 0
(225)Load Cycle Count 0x0032 098 098 050 822100607
(230)Unknown Attribute 0x0032 100 100 060 13875
(250)Unknown Attribute 0x000a 100 070 050 937
SMART Error Log
SMART Error Logging Version: 1
ATA Error Count: 9 (only the most recent five errors are shown below)
Acronyms used below:
DCR = Device Control Register
FR = Features Register
SC = Sector Count Register
SN = Sector Number Register
CL = Cylinder Low Register
CH = Cylinder High Register
D/H = Device/Head Register
CR = Content written to Command Register
ER = Error register
STA = Status register
Timestamp is time (in seconds) since the command that caused an error was accepted,
measured from the time the disk was powered-on, during the session when the error occurred.
Note: timestamp "wraps" after 1193.046 hours = 49.710 days = 2^32 seconds.
Error Log Structure 1:
Error occurred at disk power-on lifetime: 458 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:01 SN:15 CL:be CH:2e D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 01 15 be 2e e0 c8 831.599
00 00 01 14 be 2e e0 c8 831.594
00 00 01 13 be 2e e0 c8 831.594
00 00 01 12 be 2e e0 c8 831.594
00 00 01 11 be 2e e0 c8 831.594
Error Log Structure 2:
Error occurred at disk power-on lifetime: 458 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:45 SN:15 CL:be CH:2e D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 80 da bd 2e e0 c8 829.680
00 00 80 5a bd 2e e0 c8 829.677
00 00 80 da bc 2e e0 c8 829.673
00 00 80 5a bc 2e e0 c8 829.671
00 00 01 58 bc 2e e0 c8 829.671
Error Log Structure 3:
Error occurred at disk power-on lifetime: 458 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:01 SN:47 CL:bc CH:2e D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 01 47 bc 2e e0 c8 826.962
00 00 01 46 bc 2e e0 c8 826.961
00 00 01 45 bc 2e e0 c8 826.961
00 00 01 44 bc 2e e0 c8 826.961
00 00 01 43 bc 2e e0 c8 826.961
Error Log Structure 4:
Error occurred at disk power-on lifetime: 458 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:13 SN:47 CL:bc CH:2e D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 80 da bb 2e e0 c8 825.038
00 00 80 5a bb 2e e0 c8 825.033
00 00 80 da ba 2e e0 c8 825.030
00 00 80 5a ba 2e e0 c8 824.940
00 00 80 da b9 2e e0 c8 824.937
Error Log Structure 5:
Error occurred at disk power-on lifetime: 458 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER:40 SC:01 SN:85 CL:19 CH:2c D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 01 85 19 2c e0 c8 816.487
00 00 01 84 19 2c e0 c8 816.487
00 00 01 83 19 2c e0 c8 816.486
00 00 01 82 19 2c e0 c8 816.486
00 00 01 81 19 2c e0 c8 816.486
SMART Self-test log, version number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed 00% 459
# 2 Short captive Completed 00% 459
# 3 Extended off-line Completed: read failure 40% 455 0x002c1985
# 4 Extended off-line Aborted by host 50% 455
# 5 Short off-line Completed 00% 451
# 6 Short off-line Completed 00% 451
# 7 Extended off-line Completed: read failure 40% 449 0x002c1985
# 8 Short off-line Completed: read failure 20% 391 0x0003e00a
# 9 Short captive Interrupted (host reset) 40% 390
#10 Short captive Interrupted (host reset) 40% 390
#11 Short off-line Completed: read failure 20% 390 0x0003e00a
#12 Extended off-line Completed: read failure 40% 247 0x002c1979
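Two figures in the sample output above can be checked with a little arithmetic. The sketch below assumes, as the three-decimal timestamps suggest, that the error-log timestamp is a 32-bit count of milliseconds: 2^32 milliseconds is about 4294967.296 seconds, which agrees with the 1193.046 hours and 49.710 days quoted in the wrap note. It also shows how a raw power-on value stored in minutes (as on some Hitachi disks) maps to the "482 h + 4 m" form. Both function names are hypothetical and not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helpers, not part of smartmontools.

# Print how long a 32-bit millisecond counter runs before wrapping,
# in hours and in days (assumes the counter unit is milliseconds).
timestamp_wrap() {
    awk 'BEGIN {
        seconds = 4294967296 / 1000    # 2^32 milliseconds, in seconds
        printf "%.3f hours = %.3f days\n", seconds / 3600, seconds / 86400
    }'
}

# Convert a raw power-on value given in minutes to "H h + M m".
minutes_to_hm() {
    printf '%d h + %d m\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

timestamp_wrap          # prints: 1193.046 hours = 49.710 days
minutes_to_hm 28924     # prints: 482 h + 4 m
```

The value 28924 is simply 482 * 60 + 4, chosen to reproduce the Power On Hours line in the sample; the actual raw value on the disk is not shown in the output.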
Page maintained by
Bruce Allen