Some open questions:
- Again, their datasets are messy, e.g. negative values for some disk capacities (some from HGST) and blank SMART values (some records even lack temperature).
- Again, Unexpected Power Loss Count (ID 174) for SSDs (Seagate Barracudas)? I don't get it. Why?
- Again, they have 108 pcs of unspecified "Seagate SSD" in operation. No details.
- On June 21st the max temperature of 55 HDDs reached 86C! It was a mix of different capacities, different classes (desktop/enterprise) and different form factors (3.5"/2.5").
- They keep the maximum HDD temperature at 53C (median of the max temperatures), which is too much for a data centre.
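For anyone who wants to repeat the sanity checks from the list above, here is a minimal sketch. It assumes the column names of the public BB CSV export (`serial_number`, `capacity_bytes`, `smart_194_raw` for temperature); the function name `audit_rows` and the sample rows are made up for illustration and are not taken from the real dataset.

```python
import csv
import io
from statistics import median


def audit_rows(rows):
    """Count suspicious records and summarise the per-drive maximum temperature."""
    negative_capacity = 0
    blank_temperature = 0
    max_temp = {}  # serial_number -> highest temperature seen for that drive
    for row in rows:
        cap = row.get("capacity_bytes")
        if cap and int(cap) < 0:
            negative_capacity += 1
        temp = row.get("smart_194_raw")
        if not temp:
            # Blank temperature record: count it and skip the max-temp update.
            blank_temperature += 1
            continue
        serial = row["serial_number"]
        max_temp[serial] = max(max_temp.get(serial, float("-inf")), float(temp))
    return {
        "negative_capacity_rows": negative_capacity,
        "blank_temperature_rows": blank_temperature,
        "median_of_max_temp": median(max_temp.values()) if max_temp else None,
    }


# Invented sample in the shape of the daily CSV files (not real data).
sample = io.StringIO(
    "date,serial_number,model,capacity_bytes,failure,smart_194_raw\n"
    "2021-06-21,A1,HGST X,-1,0,31\n"
    "2021-06-21,B2,ST Y,12000138625024,0,\n"
    "2021-06-22,A1,HGST X,4000787030016,0,35\n"
    "2021-06-22,C3,ST Z,8001563222016,0,53\n"
)
report = audit_rows(csv.DictReader(sample))
print(report)
```

On the real files you would pass a `csv.DictReader` opened on each daily CSV instead of the in-memory sample.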
I will continue with the median failure index for disk-to-disk comparisons (not finished here). In this model, you can look at the complete context of individual events.
This is one of the reasons why you cannot blindly trust interpretations of data unless the author/source provides the context or more details.
As usual, I tried to contact the author directly, who again did not respond.
HUS (helium-filled) drives (now the WDC HC5xx series) performed well! 90% of them are 12 and 14 TB capacity (attachment 4591).
Exos drives are at a similar level (75.23K HDDs there), with almost 2.2x more devices in operation than the HUS drives (attachment 4592).
But without the specific operating conditions (workload, FS, BB custom pool levels, ...), no one except the architect of the data centre is able to build a clean picture of the reliability.
Interesting, good info. I am about to get new higher capacity drives. I was going to AVOID Helium filled drives, as (1) Helium escapes over time leading to failure (2) It's impossible to repair/recover if failed... but is that not the case anymore?
Helium filled HDD
Helium is about 7x less dense than air, and this is the basis for using helium in HDDs.
Warning: this is a long read. Useful only for those who want to be more than followers.
Pros:
- Helium creates less drag and turbulence when the HDD platters spin = less noise
- Less turbulence allows tracks to be squeezed closer together = more data tracks per disk = more data per HDD
- Thinner disks = more disks = more data per HDD
- Thinner disks + less drag = less power to spin
- Higher thermal conductivity of helium vs air = less overheating
- "Sufficiently" sealed drives keep helium in and keep contaminants out, unlike a standard (air-filled) HDD
- A helium-filled HDD is therefore recommended for high-altitude operating environments
Cons:
- You can't repair HW parts (stuck heads, ...) of a He-filled HDD in a common domestic environment, unlike a standard HDD (open, repair, close)
- He leakage will be the biggest cause of major damage to the internal parts of the HDD = all the advantages named above turn into disadvantages
There is a SMART indicator, ID 22, for He leakage from the HDD.
The theory behind He leak detection, i.e. what it means, from US patent #434987:
A method to detect helium leakage from a disk drive enclosure is disclosed and claimed. A measurement electrical current is passed through a temperature sensor disposed within the disk drive enclosure. A reference electrical resistance corresponds to a reference temperature of the temperature sensor. A heating electrical current is passed through the temperature sensor. A heated electrical resistance of the temperature sensor, corresponding to a heated temperature of the temperature sensor that exceeds the reference temperature by at least 5° C., is determined. A value that corresponds to a quantity of helium within the disk drive enclosure is determined based on the reference electrical resistance and heated electrical resistance.
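To make the idea in the abstract more tangible, here is a rough sketch of the physics only, not the patent's actual formula (which is not given here). It assumes a sensor whose resistance grows linearly with temperature, R = R_ref * (1 + alpha * dT), with alpha = 0.00385 per °C as a typical platinum value. Because helium conducts heat better than air, the same heating current should give a smaller temperature rise in a helium-filled enclosure than in one where air has replaced the helium. The two calibration rises and the linear interpolation between them are purely hypothetical.

```python
def temperature_rise(r_reference, r_heated, alpha=0.00385):
    """Temperature rise in °C implied by the resistance change of a linear sensor."""
    return (r_heated - r_reference) / (alpha * r_reference)


def helium_fraction(rise, rise_full_helium, rise_air):
    """Map a measured rise onto 0..1 between two HYPOTHETICAL calibration points.

    rise_full_helium: rise measured with the enclosure fully helium-filled.
    rise_air: rise measured with the helium fully replaced by air.
    """
    if rise_air <= rise_full_helium:
        raise ValueError("the rise in air must exceed the rise in helium")
    fraction = (rise_air - rise) / (rise_air - rise_full_helium)
    return min(1.0, max(0.0, fraction))  # clamp to the physical range


# Invented numbers for illustration only.
rise = temperature_rise(r_reference=1000.0, r_heated=1019.25)
print(rise, helium_fraction(15.0, rise_full_helium=5.0, rise_air=25.0))
```

Whatever the drive firmware really does, the raw value of ID 22 is presumably some scaled version of such a measurement, but that is a guess.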
However, nowhere on the official HDD vendor sites (or associated sites) have I found what exactly a specific value in ID 22 means.
I have followed the data from Backblaze (BB) for a long time (I am interested in storage media of every kind, because it is part of my education). According to BB, the source of the dataset (nothing has changed since 2018):
We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute.
+ my note: they also have Toshiba He-filled HDDs in operation.
I can read SMART ID 22 only from the HGST/WDC models.
Note: the SMART ID 22 indicator is not even among the most critical indicators that BB monitors.
He filled HDD in BB operation (Q2/2021 dataset):
| model | vendor | model name | model type |
|---|---|---|---|
| ST10000NM0086 | Seagate | Exos | X10 |
| ST12000NM0007 | Seagate | Exos | X12 |
| ST12000NM0008 | Seagate | Exos | X14 |
| ST14000NM0018 | Seagate | Exos | X14 |
| ST14000NM0138 | Seagate | Exos | X14 |
| ST10000NM001G | Seagate | Exos | X16 |
| ST12000NM001G | Seagate | Exos | X16 |
| ST14000NM001G | Seagate | Exos | X16 |
| ST16000NM001G | Seagate | Exos | X16 |
| ST16000NM005G | Seagate | Exos | X16 |
| ST18000NM000J | Seagate | Exos | X18 |
| TOSHIBA HDWE160 | Toshiba | X300 | |
| TOSHIBA HDWF180 | Toshiba | X300 | |
| TOSHIBA MG07ACA14TA | Toshiba | MG07 | |
| TOSHIBA MG07ACA14TEY | Toshiba | MG07 | |
| TOSHIBA MG08ACA16TA | Toshiba | MG08 | |
| TOSHIBA MG08ACA16TEY | Toshiba | MG08 | |
| HGST HUH721010ALE600 | Western Digital | Ultrastar | DC HC510 |
| HGST HUH721212ALE600 | Western Digital | Ultrastar | DC HC510 |
| HGST HUH721212ALE604 | Western Digital | Ultrastar | DC HC510 |
| HGST HUH721212ALN604 | Western Digital | Ultrastar | DC HC510 |
| HGST HUH728080ALE600 | Western Digital | Ultrastar | DC HC510 |
| WDC WUH721414ALE6L4 | Western Digital | Ultrastar | DC HC510 |
| WDC WUH721816ALE6L0 | Western Digital | Ultrastar | DC HC510 |
| WDC WUH721414ALE6L4 | Western Digital | Ultrastar | DC HC530 |
| WDC WUH721816ALE6L0 | Western Digital | Ultrastar | DC HC530 |
ALL Helium-filled HDDs in BB operations. Filter: ALL Helium filled (2Q/2021 dataset source):
Based on distinct serial numbers, 66.5% of ALL drives (HDD+SSD) are He-filled HDDs: 128.11K of 192.63K
or 98.7% of ALL HDDs are He-filled HDDs: 190.06K of 192.63K
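The "distinct serial number" counting can be sketched like this. The column names `serial_number` and `model` are those of the public BB CSV export; the function name, the shortened model set and the sample rows are assumptions for illustration. Counting distinct serials matters because every drive appears once per day in the daily files.

```python
def helium_share(rows, helium_models):
    """Return (helium drives, all drives, helium share in %) by distinct serial number."""
    all_serials = set()
    helium_serials = set()
    for row in rows:
        serial = row["serial_number"]
        all_serials.add(serial)
        if row["model"] in helium_models:
            helium_serials.add(serial)
    total = len(all_serials)
    share = 100.0 * len(helium_serials) / total if total else 0.0
    return len(helium_serials), total, share


# Shortened, illustrative model set; the full list is the table in this post.
HELIUM_MODELS = {"ST12000NM0007", "HGST HUH721212ALE600"}

# Invented rows: drive D1 shows up on two days but must be counted once.
rows = [
    {"serial_number": "D1", "model": "ST12000NM0007"},
    {"serial_number": "D1", "model": "ST12000NM0007"},
    {"serial_number": "D2", "model": "HGST HUH721212ALE600"},
    {"serial_number": "D3", "model": "HGST HMS5C4040ALE640"},
]
helium, total, share = helium_share(rows, HELIUM_MODELS)
print(helium, total, round(share, 1))

# Arithmetic check of the two percentages quoted above, from the quoted counts.
print(round(100 * 128.11 / 192.63, 1), round(100 * 190.06 / 192.63, 1))
```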
Note: this is interesting because, according to BB's 2Q/2021 evaluation, they have up to 177,935 disks in operation. I have written to them many times that I don't like the regular discrepancies between the results in their official blog and what they export/publish as RAW datasets. I often encounter the problem that someone lets DB analysts search the data unsupervised, without knowing the context. The quality of their datasets also shows that they are far from following data taxonomy rules.
Just for comparison, a copy of their report from the BB blog:
In their dataset for Q2/2021 there are 3254 unique HGST HMS5C4040ALE640 drives (first row in their table),
but their blog evaluation (table above) lists only 3209 drives.
Source:
Can someone explain to me why they publish a different dataset than the one they use for the blog? What is the reason?
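A small sketch of how such dataset-vs-blog mismatches can be listed per model. The only real numbers here are the two counts quoted above (3254 and 3209); the function name and the dictionaries are hypothetical helpers, and you would have to fill them from the dataset and from the blog table yourself.

```python
def count_discrepancies(dataset_counts, blog_counts):
    """List models whose distinct-drive count differs between dataset and blog."""
    result = {}
    for model, published in blog_counts.items():
        counted = dataset_counts.get(model, 0)
        diff = counted - published
        if diff != 0:
            # Relative difference in % of the published count (None if it is 0).
            pct = round(100.0 * diff / published, 2) if published else None
            result[model] = (counted, published, diff, pct)
    return result


dataset_counts = {"HGST HMS5C4040ALE640": 3254}  # distinct serials in the dataset
blog_counts = {"HGST HMS5C4040ALE640": 3209}     # drive count from the blog table
print(count_discrepancies(dataset_counts, blog_counts))
```

For this one model the gap is 45 drives, about 1.4% of the published count.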
BB dataset, filtered by He-filled HDD & by non-blank ID 22:
- 34.29K HDDs (distinct serial numbers) = 18% of ALL He-filled HDDs in operation (all of them have a failure flag = 0 (No) or 1 (Yes))
- Ultrastar models only, none older than 3.6 years (based on the Power-On Hours ID)
- you can see their capacities, models (part numbers), ....
After filtering by failure flag = 1, you get a total of just 28 failures (with ID 22 reported) in Q2/2021, for HDDs with power-on hours within 0-2.5 years.
- 15 of the failures were drives not older than 1 year ... 54% of all failures (which supports the experience that most HDD failures happen in the first year of operation)
- no relation to overheating (max temp was about 40C)
- 66.67% of the failures were 14TB drives ... WUH721414ALE6L4 (DC HC530)
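The filter chain above (ID 22 non-blank, failure flag = 1, then split by age and by model) can be sketched as follows. It assumes the BB CSV column names `smart_22_raw`, `smart_9_raw` (power-on hours), `failure`, `model` and `serial_number`, with values as strings the way `csv.DictReader` delivers them. The sample rows are invented, and "first year" is taken as 8760 power-on hours, which is my own simplification.

```python
from collections import Counter

HOURS_PER_YEAR = 24 * 365  # 8760 power-on hours, used as the "first year" cut-off


def leak_reporting_failures(rows):
    """Failed drives that report SMART 22: total, first-year count, count per model."""
    failed = {}
    for row in rows:
        if not row.get("smart_22_raw") or row.get("failure") != "1":
            continue  # skip drives without ID 22 data and drives that did not fail
        failed[row["serial_number"]] = row  # one entry per distinct serial number
    first_year = sum(
        1 for r in failed.values() if float(r["smart_9_raw"]) <= HOURS_PER_YEAR
    )
    by_model = Counter(r["model"] for r in failed.values())
    return len(failed), first_year, by_model


# Invented sample rows (not real data).
rows = [
    {"serial_number": "S1", "model": "WDC WUH721414ALE6L4", "failure": "1",
     "smart_9_raw": "4000", "smart_22_raw": "100"},
    {"serial_number": "S2", "model": "WDC WUH721414ALE6L4", "failure": "1",
     "smart_9_raw": "12000", "smart_22_raw": "100"},
    {"serial_number": "S3", "model": "HGST HUH721212ALE600", "failure": "1",
     "smart_9_raw": "2000", "smart_22_raw": "99"},
    {"serial_number": "S4", "model": "ST12000NM0007", "failure": "1",
     "smart_9_raw": "500", "smart_22_raw": ""},
    {"serial_number": "S5", "model": "HGST HUH721212ALE600", "failure": "0",
     "smart_9_raw": "900", "smart_22_raw": "100"},
]
total, first_year, by_model = leak_reporting_failures(rows)
print(total, first_year, dict(by_model))

# The share quoted above: 15 of 28 failures within the first year.
print(round(100 * 15 / 28))
```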
more here:
Conclusion:
There is no fundamental association between He leakage and power-on hours for the HDDs within the dataset, except for a few ID 22 events on a negligible number of disks out of all those in operation.
Of course, a huge amount of data from other disks is missing. Seagate also does not provide the data. Since it is not at all clear what those numbers mean, there is no need to worry about He leaking from the disks. For now.
Cheers.
-- post merged: --
Just a last note: BB does not specify anywhere in the datasets which disks are used for booting. More than likely these are not the ones with a capacity above 250GB.
It would be interesting to get information on which disks were used in the same pool.
And then compare how their values influence each other (how I hate that term).
The same goes for workload performance.
Thanks for looking at the data. So, we can't really tell right now, correct? I was looking to find the largest non-HE filled drive, which seems to be WD Red 10TB, but have to match part# to be sure.
Is it worth digging to find one of those? Or do HE filled 12TB drives seem to be doing ok (albeit without much long-term data)? My 6TB drives are a few years old and still good.