This is the home page for smartmontools. The smartmontools
package contains two utility programs (smartctl and smartd) to control and
monitor storage systems using the Self-Monitoring, Analysis and Reporting
Technology System (S.M.A.R.T.) built into most modern ATA and SCSI hard
disks. It is derived from the smartsuite package, and includes support
for ATA/ATAPI-5 disks. It should run on any modern Linux system.
For your convenience, this is a single page, so you can print it easily.
How to download and install smartmontools
There are three ways to get and install smartmontools; use any one
of them. Instructions for trying out smartmontools once you have
completed the installation follow "Method 3" below.
- First Method - Install from the RPM file:
- Download the latest binary RPM file (*.rpm) from here. Don't get
the SRPM file (*.src.rpm).
- Install it using RPM. You must be root to do this:
su root (enter root password)
rpm -ivh smartmontools-5.0-1.i386.rpm
- If you receive an error message, you have probably previously
installed the smartsuite package, or RedHat's kernel-utils package,
which provide older versions of the smartd and smartctl
utilities. In this case you should use the --nodeps or
--force arguments of rpm to replace these two utilities:
rpm -ivh --nodeps --force smartmontools-5.0-1.i386.rpm
- Second Method - Install from the source tarball:
- Download the latest source-code tarball from here
- Uncompress the tarball (note: you probably want the most
recent release)
tar zxvf smartmontools-5.0.tar.gz
- The previous step created a directory called smartmontools-5.0
containing the code. Go to that directory, build, and install:
cd smartmontools-5.0
make
make install [only root can do this]
- Third Method - Download code directly from the CVS archive:
- Download the latest code snapshot from CVS. If prompted for a
password, simply press the Enter key. Note that the two lines below
that start "cvs" are long!
cvs -d:pserver:anonymous@cvs.smartmontools.sourceforge.net:/cvsroot/smartmontools login
cvs -z3 -d:pserver:anonymous@cvs.smartmontools.sourceforge.net:/cvsroot/smartmontools co sm5
- The previous step created a subdirectory called sm5/
containing the code. Go to that directory, build, and install:
cd sm5
make
make install [only root can do this]
After installing using Method 1, 2 or 3 above, you can read the
man pages, and try out the commands:
man 8 smartctl
man 8 smartd
smartctl -etf /dev/hda [only root can do this]
smartctl -a /dev/hda [only root can do this]
Frequently-asked questions
If your question is not here, please email me.
- What do I do if I have problems, or need support? Suppose
I want to become a developer, or suggest some new extensions?
Please send an email to the
smartmontools-support mailing list.
- What are the future plans for smartmontools?
My plan is that smartmontools-5.x will support ATA/ATAPI-5 disks.
Eventually, we'll do smartmontools-6.x to support ATA/ATAPI-6 disks,
smartmontools-7.x for the ATA/ATAPI-7 standard, and so on. The
"x" will denote revision level, as bugs get found and fixed, and as enhancements
get added. If it's possible to maintain backwards compatibility,
that would be nice, but I don't know if it will be possible or practical.
- Why are you doing this?
My research group runs a Beowulf cluster with 300 ATA-5 disks. We
have more than 20 TB of data stored on the system. It's nice to
have advance warning when a disk is going to fail.
- I see some strange output from smartctl. What does it
mean?
The raw S.M.A.R.T. attributes (temperature, power-on lifetime, and
so on) are stored in vendor-specific structures. Sometimes these are
strange. Hitachi disks (at least some of them) store power-on lifetime
in minutes, rather than hours. IBM disks (at least some of them) have
three temperatures stored in the raw structure, not just one. And so on.
If you find strange output, or unknown attributes, please send an email
to smartmontools-support and we'll help you try to figure it out.
- What attributes does smartmontools not yet recognize?
From a Hitachi disk: (221)(223)(225)(230)(250)
If you can attach names/meanings to these attributes, please send a
note to smartmontools-support.
- When I run smartd, the SYSLOG (/var/log/messages) contains
messages like this:
smartd: Reading Device /dev/sdv
modprobe: modprobe: Can't locate module block-major-65
This is because when smartd starts, it looks for all
ATA and SCSI devices to monitor (matching the pattern /dev/hd[a-z]
or /dev/sd[a-z]). The log messages appear because your
system doesn't have most of these devices.
A future release of smartd will have a command-line option to specify which
devices to include in or exclude from the start-up search.
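To see why so many of these messages appear, it helps to enumerate the device paths that the two patterns cover. The following Python sketch is only an illustration and is not taken from the smartd source; the helper names candidate_devices and is_candidate are hypothetical, and it assumes the scan considers exactly the paths matching /dev/hd[a-z] and /dev/sd[a-z].

```python
import re
import string

# Assumption: the start-up scan considers exactly the paths matching
# /dev/hd[a-z] and /dev/sd[a-z], as described in the answer above.
DEVICE_PATTERN = re.compile(r"/dev/(?:hd|sd)[a-z]")


def candidate_devices():
    """Return every path the two patterns can match (hd first, then sd).

    The ordering is illustrative; the real scan order is not documented here.
    """
    return [f"/dev/{prefix}{letter}"
            for prefix in ("hd", "sd")
            for letter in string.ascii_lowercase]


def is_candidate(path):
    """True if path matches /dev/hd[a-z] or /dev/sd[a-z] exactly."""
    return DEVICE_PATTERN.fullmatch(path) is not None


if __name__ == "__main__":
    paths = candidate_devices()
    print(len(paths), "candidate device paths")
    print("/dev/sdv is a candidate:", is_candidate("/dev/sdv"))
```

The two patterns cover 52 paths in total, including /dev/sdv from the log excerpt above. A typical system has only a few of them, which is consistent with the explanation that the messages come from devices the system does not have.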
Help needed in testing smartmontools, especially on SCSI disks/systems
I have access to a number of systems with ATA S.M.A.R.T. disks,
but I don't have any access to systems with SCSI S.M.A.R.T. devices. I'd
be very grateful to find someone who could help me test the smartmontools
code on SCSI disks. Since it's derived from the smartsuite package,
it should initially work about the same way with SCSI devices as the smartsuite
tools did.
I'd be especially happy if someone would like to take on the task,
as a developer, of maintaining the SCSI code. Do you have a Beowulf
cluster with a few hundred SCSI disks? Please volunteer!
How does smartmontools differ from smartsuite?
Initially, only in that it supports the ATA/ATAPI-5 standard.
So, for example, smartctl from smartsuite has no facility
for printing the self-test logs, and doesn't print timestamp information
in the most usable way. But smartmontools is derived directly from smartsuite
and is very similar.
The other principal difference is that I'd like to have smartmontools
be a true open-source project, meaning that we keep the files in CVS,
and that other developers who wish to contribute can commit changes to
the archive.
Useful references on S.M.A.R.T. and the ATA/ATAPI standards
If you are having trouble understanding the output of smartctl
or smartd, please first read the manual pages:
man 8 smartctl
man 8 smartd
If you'd like to know more about S.M.A.R.T., then the following
references may be helpful:
Sample output from smartmontools
root# smartctl -a /dev/hda
Device: HITACHI_DK23BA-20 Supports ATA Version 5
Serial Number: 12H7M8
Firmware Version: 00E0A0D2
ATA minor number (version support) 0x15
Drive supports S.M.A.R.T. and is enabled
Check S.M.A.R.T. Passed.
General Smart Values:
Off-line data collection status: (0x00) Offline data collection activity was
never started
Self-test execution status: ( 114) The previous self-test completed having
the read element of the test failed
Total time to complete off-line
data collection: (1530) Seconds
Offline data collection
Capabilities: (0x1b)SMART EXECUTE OFF-LINE IMMEDIATE
Automatic timer ON/OFF support
Suspend Offline Collection upon new
command
Offline surface scan supported
Self-test supported
Smart Capablilities: (0x0003) Saves SMART data before entering
power-saving mode
Supports SMART auto save timer
Error logging capability: (0x01) Error logging supported
Short self-test routine
recommended polling time: ( 2) Minutes
Extended self-test routine
recommended polling time: ( 26) Minutes
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                     Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate      0x000d   100   100   050       23
(  3)Spin Up Time             0x0007   100   100   050       0
(  4)Start Stop Count         0x0032   100   100   050       182
(  5)Reallocated Sector Ct    0x0033   100   100   010       7
(  7)Seek Error Rate          0x000f   100   100   050       506
(  9)Power On Hours           0x0032   100   100   060       437
( 10)Spin Retry Count         0x0013   100   100   050       0
( 12)Power Cycle Count        0x0032   100   100   050       182
(192)Power-Off Retract Count  0x0032   100   100   050       11
(195)Hardware ECC Recovered   0x001a   100   080   050       80
(196)Reallocated Event Count  0x0032   100   100   001       7
(197)Current Pending Sector   0x0032   098   097   001       2
(198)Offline Uncorrectable    0x0010   097   097   001       3
(199)UDMA CRC Error Count     0x003e   200   200   000       0
(221)Unknown Attribute        0x000a   100   100   050       0
(223)Unknown Attribute        0x0012   100   100   050       0
(225)Unknown Attribute        0x0032   098   098   050       2113943230
(230)Unknown Attribute        0x0032   100   100   060       12873
(250)Unknown Attribute        0x000a   100   070   050       432
SMART Error Log
SMART Error Logging Version: 1
ATA Error Count: 3
Acronyms used below:
DCR = Device Control Register
FR = Features Register
SC = Sector Count Register
SN = Sector Number Register
CL = Cylinder Low Register
CH = Cylinder High Register
D/H = Device/Head Register
CR = Content written to Command Register
ER = Error register
STA = Status register
Timestamp is time (in seconds) since the command that caused an error was accepted,
measured from the time the disk was powered-on, during the session when the error occured.
Note: timestamp "wraps" after 1193.046 hours = 49.710 days = 2^32 seconds.
Error Log Structure 1:
Error occured at disk power-on lifetime: 424 hours
When the command that caused the error occured, the device was active or idle.
After command completion occured, registers were:
ER:40 SC:05 SN:79 CL:19 CH:2c D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 08 76 19 2c e0 c8 57586.716
00 00 08 c6 79 19 e0 c8 57586.691
00 00 08 36 9d 01 e0 ca 57586.690
00 00 80 76 20 2c e0 c8 57586.676
00 00 80 f6 1f 2c e0 c8 57586.672
Error Log Structure 2:
Error occured at disk power-on lifetime: 424 hours
When the command that caused the error occured, the device was active or idle.
After command completion occured, registers were:
ER:40 SC:71 SN:85 CL:19 CH:2c D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 78 7e 19 2c e0 c8 57583.243
00 00 80 76 19 2c e0 c8 57581.389
00 00 80 f6 18 2c e0 c8 57581.385
00 00 80 76 18 2c e0 c8 57581.380
00 00 80 f6 17 2c e0 c8 57581.376
Error Log Structure 3:
Error occured at disk power-on lifetime: 424 hours
When the command that caused the error occured, the device was active or idle.
After command completion occured, registers were:
ER:40 SC:7d SN:79 CL:19 CH:2c D/H:e0 ST:51
Sequence of commands leading to the command that caused the error were:
DCR FR SC SN CL CH D/H CR Timestamp
00 00 80 76 19 2c e0 c8 57581.389
00 00 80 f6 18 2c e0 c8 57581.385
00 00 80 76 18 2c e0 c8 57581.380
00 00 80 f6 17 2c e0 c8 57581.376
00 00 80 76 17 2c e0 c8 57581.372
SMART Self-test log, version number 1
Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line     Completed: read failure   20%        391              0x0003e00a
# 2  Short captive      Interrupted (host reset)  40%        390
# 3  Short captive      Interrupted (host reset)  40%        390
# 4  Short off-line     Completed: read failure   20%        390              0x0003e00a
# 5  Extended off-line  Completed: read failure   40%        247              0x002c1979
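Two details of the sample output can be checked by hand, and the Python sketch below does so. It is an illustration and not part of smartmontools; the function name decode_lba28 is hypothetical. The first part assumes the drive was addressed in 28-bit LBA mode (bit 6 of the Device/Head register set, which holds for the value e0 shown above), so that the SN, CL, CH and low four bits of D/H together form the block address. The second part assumes the timestamp counter is a 32-bit count of milliseconds, which is the reading that matches the "1193.046 hours = 49.710 days" figures in the wrap note.

```python
def decode_lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes LBA mode (bit 6 of the Device/Head register set), in which
    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble
    of D/H bits 24-27.
    """
    if not dh & 0x40:
        raise ValueError("Device/Head register is not in LBA mode")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers reported after the failing command in Error Log Structure 1:
# SN:79 CL:19 CH:2c D/H:e0
lba = decode_lba28(0x79, 0x19, 0x2C, 0xE0)
print(f"0x{lba:08x}")

# Timestamp wrap, assuming a 32-bit millisecond counter:
# 2**32 milliseconds expressed in hours and days.
wrap_seconds = 2**32 / 1000
wrap_hours = wrap_seconds / 3600
wrap_days = wrap_hours / 24
print(f"{wrap_hours:.3f} hours = {wrap_days:.3f} days")
```

Under these assumptions the registers from Error Log Structure 1 decode to 0x002c1979, the same value listed as LBA_of_first_error for self-test #5, and the wrap interval works out to 1193.046 hours, or 49.710 days. That suggests the "2^32 seconds" in the wrap note is meant as 2^32 milliseconds.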
Page maintained by
Bruce Allen