# Hardware Details

[_TOC_]
## "History"
AEI/Hannover has been operating basic [NAS](https://en.wikipedia.org/wiki/Network-attached_storage) servers for many years, but only recently explored different usage scenarios. The initial idea was to define a simple but capable hardware platform that covers a large variety of usage scenarios with decent performance at low cost. Although this idea can be extended along various dimensions to cover more usage scenarios, possibly at the expense of some base principles, this kind of storage system should **not** be expected to be a jack-of-all-trades ("eierlegende Wollmilchsau")!

In mid-2019, we looked for a decent hardware base at a price point just below €10,000 net, which would allow us to procure three of these systems without going through a full tender. At that time, none of the systems available through our frame contracts matched the proposed one closely enough: either the hardware combination was not available or the price per system was much higher.
## The original 24 HDD example system (288TByte)

Our sub-€10,000 system is based on a Supermicro chassis with

* a single CPU socket board *MBD-X10SRH-CF-B*
* *Intel Xeon E5-1650 v4* 3.6 GHz CPU (6 physical, 12 logical cores)
* 4 x 32GB DDR4 ECC reg. DIMMs
* 2 x 250GB Biwin SSD C2004 for OS
* 24 x 12TB 3.5" 24x7 enterprise HDDs
* 1.6 TB NVMe
* dual 10GBase-T network

Apart from the raw storage capacity of 288TByte, there are a couple of notable details here:
### Chassis

For these systems, we opted for a 4U chassis with 24 3.5" disk bays, two additional 2.5" slots at the back and 1+1 redundant power supplies. In principle, one could scale this from a 1U 4-disk system to a system with 46 disks in 4U, or even beyond by adding JBODs.
### CPU

The CPU was chosen to offer high single-thread performance as well as enough threads to handle both software RAID and networking. Also, the later-generation Intel CPUs were vastly more expensive and would have required a different memory layout (see the RAM section below). With the arrival of inexpensive, high-performance many-core CPUs from AMD (Epyc/Rome), one could now use a 16-core 3 GHz CPU at the same price point.
### RAM

In this kind of system, the more RAM is present, the more data can be cached; more memory is therefore always desirable.

For this system, we settled on four memory modules because the CPU offers four memory channels and, for performance reasons, all available memory channels should be populated with an equal number of modules. Thus, a later-generation Intel CPU would have required six memory modules and an AMD CPU eight (or multiples thereof). In this case, we were somewhat limited by the available budget and thus restricted the system to *only* 128GByte RAM.

As data integrity is very important, any file server should only be used with ECC-capable memory.
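The channel-matching rule from this section can be turned into a small calculation. The following is a minimal sketch, not a sizing tool: the function name `dimm_options` and the `max_per_channel` limit are hypothetical choices made for illustration, and the channel counts are simply the ones quoted in the text above.

```python
from typing import List, Tuple


def dimm_options(channels: int, module_gb: int,
                 max_per_channel: int = 2) -> List[Tuple[int, int]]:
    """Return (module_count, total_gb) for every balanced population.

    A population is "balanced" when every memory channel holds the same
    number of modules, i.e. the module count is a multiple of `channels`.
    `max_per_channel` is an assumed board limit, not taken from the text.
    """
    return [(channels * n, channels * n * module_gb)
            for n in range(1, max_per_channel + 1)]


# Channel counts as stated in the text above.
PLATFORMS = {
    "Xeon E5 v4 (this system)": 4,
    "later-generation Intel": 6,
    "AMD Epyc": 8,
}

for name, channels in PLATFORMS.items():
    print(name, dimm_options(channels, module_gb=32))
```

With 32 GByte modules, the first entry for the four-channel CPU is `(4, 128)`, which matches the configuration described above.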
### SSD/Flash

The system has three flash-based block storage devices: two small, cheap ones for the operating system and a large, reliable one for caching.
#### OS drives

For use within the Atlas cluster at AEI/Hannover, we usually do not mirror operating system devices: in our experience, a failure of one of the two mirrored components quite often leads to unstable operation and/or further failures soon after. One could usually restore operation manually, but we prefer to automatically reinstall the system on a replaced OS drive without touching the data sets on the other disks, as this is faster and less error-prone than manual intervention.

However, as many people consider mirrored boot/OS devices good practice, we opted to put two of these into the design.
#### Cache device

Note: So far, we have only used the cache device together with ZFS. Hence, the guidelines given here should be taken with a grain of salt when considering such a device as a log device for XFS or as a block layer cache with bcache.

The requirements for the larger flash storage device are manifold. This particular model has an integrated capacitor which allows the device to write out all data held in its internal RAM to flash in case of a power failure. This ought to help ensure data integrity.

The size and write endurance of the device are mostly governed by the usage scenario. If the system is expected to see heavy synchronous writes, a small device with high write endurance and power-loss protection should be preferred; for systems which mostly serve reads, a much larger device should be selected.

In the former case, the device should be at least large enough to hold the data the system is expected to receive within 5-10 seconds, e.g. a host connected with 40Gbit/s ought to be able to buffer bursts of ~20-40GByte. In the latter case, one should choose a device much larger than the physical RAM of the system.
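The 5-10 second rule of thumb for write-heavy systems can be checked with a short calculation. This is a rough sketch only: the function name `burst_buffer_gbyte` is hypothetical, and the `efficiency` factor of 0.8 is an assumption that is not stated in the text. It was chosen here because it reproduces the ~20-40GByte figure quoted above; at full line rate (`efficiency=1.0`) the same window corresponds to 25-50 GByte.

```python
def burst_buffer_gbyte(link_gbit_s: float, seconds: float,
                       efficiency: float = 0.8) -> float:
    """GByte of data that can arrive within `seconds` on the given link.

    `efficiency` is an assumed fraction of the line rate that ends up as
    payload written to the device (hypothetical value, not from the text).
    """
    gbyte_per_second = link_gbit_s / 8.0 * efficiency  # 8 bits per byte
    return gbyte_per_second * seconds


for seconds in (5, 10):
    size = burst_buffer_gbyte(40, seconds)
    print(f"{seconds:>2} s at 40 Gbit/s: ~{size:.0f} GByte")
```

The same function can be used for a 10Gbit/s host, where the result shrinks by a factor of four, which is why a fairly small but durable device is usually sufficient for this role.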
### HDD

The major cost factor of these systems is the rotating disk storage. In mid-2019, we typically opted for 10 or 12 TByte disks rated for 24/7 operation. Obviously, external requirements and the available budget will usually dictate which kind and size of disks can be used. Please note that, depending on the set-up, a considerable fraction of the raw disk space may not be available (see below).
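To illustrate how raw and usable capacity can differ, the sketch below computes both for the 24 x 12 TByte system. The redundancy layout used here (three groups of eight disks with two parity disks each) is purely hypothetical and is not necessarily the layout used on these systems; the function names are likewise made up for this example. File system metadata and reserved space are ignored, so real figures will be somewhat lower.

```python
def raw_tbyte(disks: int, disk_tbyte: float) -> float:
    """Raw capacity in decimal TByte as advertised by disk vendors."""
    return disks * disk_tbyte


def usable_tbyte(disks: int, disk_tbyte: float,
                 group_size: int, parity_per_group: int) -> float:
    """Capacity left after parity, for equally sized redundancy groups.

    Disks that do not fill a complete group are left out (e.g. as spares).
    """
    groups = disks // group_size
    return groups * (group_size - parity_per_group) * disk_tbyte


def tbyte_to_tib(tbyte: float) -> float:
    """Convert decimal TByte (10**12 bytes) to binary TiB (2**40 bytes)."""
    return tbyte * 1e12 / 2**40


raw = raw_tbyte(24, 12)
# Hypothetical layout: 3 groups of 8 disks, 2 parity disks per group.
usable = usable_tbyte(24, 12, group_size=8, parity_per_group=2)

print(f"raw:    {raw:.0f} TByte ({tbyte_to_tib(raw):.1f} TiB)")
print(f"usable: {usable:.0f} TByte ({tbyte_to_tib(usable):.1f} TiB)")
print(f"usable fraction: {usable / raw:.0%}")
```

Under these assumptions, a quarter of the raw 288 TByte goes to parity, and the conversion to binary units makes the number reported by the operating system look smaller still.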
### Network/interconnect

AEI/Hannover has so far exclusively used standard 10Gbit/s Ethernet (currently 10GBase-T), with the notable exception of 40Gbit/s Ethernet for iSCSI. In principle, these systems should work regardless of the network they are connected to, as long as the host adapters are well supported by the Linux kernel.
# Operating systems

All our internal testing was performed with [Debian GNU/Linux](https://www.debian.org/) Buster, ZFS on Linux (from Debian's backports repository) and [scst](http://scst.sourceforge.net/) as the iSCSI implementation.

However, we do not see any major obstacles to using any recent version of another Linux distribution (RedHat, SuSE, Ubuntu, ...).

Although untested so far, both (Free)BSD and [FreeNAS](https://www.freenas.org/) should also be viable routes.