Info Backblaze drive statistics 2021Q1 are in, now with SSD statistics included

EAZ1964 · 7. May 2021

The Q1 statistics are in. For the first time, SSDs are included. At first glance, SSDs appear to be 10-20 times more reliable than HDDs (with a clear note on the lifetime of the drives).

Hard Drive Failure Rates for Q1 2021

A look at the quarterly and lifetime failure rates of 175,443 drives, including a comparison of failure rates of HDD and SSD boot drives.

www.backblaze.com

Interesting read, but take the usual precautions when interpreting it: keep in mind the datacenter use, the number of drives, the hours in service, etc.

jeyare · 8. May 2021

In the source from BB:
- no exact failure types described for each kind of drive
- no redundancy mode described
- no FS described
- no workload described
- all of the mentioned drives are used only as boot storage (same or different operation?)
and last:
- it is not described which apples were compared with which pears, just $50 per 200G SSD or per 500G HDD.

A pretty partially described comparison.
I will check the data samples, but this is common behavior from BB in their reports. Enough to serve as a hammer against the drive vendors. Empty for an experienced reader.

EAZ1964 · 8. May 2021

@jeyare
I understand your position, and would like to add that my opinion is much more positive. There is a huge amount of underlying data on the Backblaze Hard Drive Stats pages.
I am grateful that BB publishes the data. As far as I know it is the only source for this kind of data, and they handle their statistics correctly, leaving data out or flagging it when the confidence level is low.
They clearly state that they do not yet have enough data to report at the SSD drive level. I think that is fair.

jeyare · 9. May 2021

I know the datasets from BB (I have analyzed them many times in the past).

Some insights from the BB Q1 2021 datasets regarding the SSDs, which BB uses for boot purposes only:

92 CSVs, each per single day of operation from Jan 1st, 2021 till March 31st, 2021

1. there is no generally defined disk type. But it's simple to recognize just the SSD drives there (incl. bus type):
Micron MTFDDAV240TCB SSD M.2
Micron MTFDDAV240TDU SSD M.2
Seagate Seagate BarraCuda 120 SSD ZA250CM10003 SSD 2.5
Seagate Seagate BarraCuda SSD ZA2000CM10002 SSD 2.5
Seagate Seagate BarraCuda SSD ZA250CM10002 SSD 2.5
Seagate Seagate BarraCuda SSD ZA500CM10002 SSD 2.5
Seagate Seagate SSD SSD
Dell (Intel) SSDSCKKB480G8R SSD M.2

- as you can see, there is one drive defined only as "Seagate SSD" ... btw, more than 100 unique SNs with 250GB capacity were discovered under that name.
No one knows what kind of drives are used there.
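
A minimal sketch of that recognition step, assuming each daily CSV carries `serial_number` and `model` columns and that matching on substrings of the model names listed above is good enough. The function names are hypothetical, and the sample rows and serial numbers are made up for illustration; they are not taken from the BB files.

```python
import csv
import io

# Substrings taken from the model list above. Matching by substring is an
# assumption; the real files may spell the model names differently.
SSD_MODEL_HINTS = (
    "MTFDDAV240TCB", "MTFDDAV240TDU", "ZA250CM10003", "ZA2000CM10002",
    "ZA250CM10002", "ZA500CM10002", "Seagate SSD", "SSDSCKKB480G8R",
)

def is_ssd(model):
    """True if the model string contains one of the known SSD hints."""
    return any(hint in model for hint in SSD_MODEL_HINTS)

def unique_ssd_serials(daily_files):
    """Map serial number -> model for every SSD row seen in any daily file."""
    found = {}
    for fh in daily_files:
        for row in csv.DictReader(fh):
            if is_ssd(row["model"]):
                found[row["serial_number"]] = row["model"]
    return found

# Made-up sample standing in for two of the daily CSVs.
day1 = io.StringIO(
    "date,serial_number,model\n"
    "2021-01-01,SN1,Seagate SSD\n"
    "2021-01-01,SN2,ST4000DM000\n"
)
day2 = io.StringIO(
    "date,serial_number,model\n"
    "2021-01-02,SN1,Seagate SSD\n"
    "2021-01-02,SN3,MTFDDAV240TCB\n"
)
ssds = unique_ssd_serials([day1, day2])
print(len(ssds), "unique SSD serial numbers")
```

With the real files you would pass open file handles instead of the `io.StringIO` samples; counting the keys per model then gives the number of unique SNs per drive type.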

2. Every single disk drive (defined by SN, then by disk vendor and PN) has a complete daily record of SMART values for Q1 2021: the normalized value and the recorded value(s). This includes all the SSD drives.

So you could compare the HDD and SSD types (bad idea).
The reason it is a bad idea is explained in the next point.

3. In this huge amount of data you can't find which HDDs were used for boot operation (there are no definitions for it). So you can't compare failure types (by SMART code) between HDDs and SSDs for boot operation.

4. I found SMART ID 233 (error/failure) - Unexpected Power Loss ... two events in one day (Jan 1st 2021) for a single SN. ... OMG, really in BB DCs? And it wasn't the last such event in Q1. ... This isn't the first time I have found this in BB datasets.

5. There is a strange Power-On Hours value for one Seagate SSD: 65809 = 7.5y ... but they stated:

A little over two years ago, Backblaze started using SSDs as boot drives.

then 2y x 365d x 24h = 17 520 hours = max possible value for this indicator. But in a deeper dive I found 789 more SSDs with a value higher than 17 520, much higher.

I like data, and data interpretation too. But something is missing or wrong here - I mean the data quality.
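
The arithmetic behind that check, as a small sketch. It assumes a year of 365 days (leap days ignored) and a service window of exactly two years, which is only my reading of "a little over two years ago"; the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap days

def max_power_on_hours(years_in_service):
    """Upper bound on Power-On Hours for a drive running non-stop."""
    return years_in_service * HOURS_PER_YEAR

def exceeds_service_window(power_on_hours, years_in_service=2):
    """True if the counter is higher than continuous operation allows."""
    return power_on_hours > max_power_on_hours(years_in_service)

limit = max_power_on_hours(2)    # 2 x 365 x 24
years = 65809 / HOURS_PER_YEAR   # the Seagate SSD value in years
print(limit, round(years, 1), exceeds_service_window(65809))
```

Running the same predicate over the Power-On Hours value of every SSD serial number is how a count of drives above the limit can be obtained.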

On the one hand, I appreciate BB's willingness to publish the data. On the other hand, there is an unprofessional mess:
- again the same mistakes in the datasets as in the previous quarter
- basic principles of data taxonomy are missing
- a dedicated column for the vendor is missing
- a dedicated column for the PN is missing
- especially when you need to merge 92 CSVs, you need cleaner datasets
- OFC the types mentioned above (HDD/SSD) are also missing
- no defined date of operation, which is mandatory for evaluating read errors, etc. ... OK, there is the SMART E09 value (but you can't check the validity of the value)
- no defined FS or redundancy operation mode for each drive (no chance to understand the failures deeply).

Finally:
- It is not exactly defined what "Failure" means. No methodology is described.
Judging by the data from BB, it means every single SMART record. Which is really strange, because it's wrong.
And when the methodology is missing, there is room for unanswered questions. Btw, I have a list of unanswered questions from the last BB drive report.

So this is another point of view on the data regularly published by BB.
For some people it's enough. Fast evaluation: SSD is better.

jeyare · 10. May 2021

One last point of view:
- from the BB datasets you could extract some clusters (similar technology used in the HDDs) for a comparison (drive geometry, cache, block density, ...)
- OFC you need to be experienced in HDD technologies to be able to get this missing data from internet sources (vendor related)
- then you could get a more reliable (segmented) point of view
however
- these BB datasets contain information that is too shallow for such a deep dive (as mentioned above, regarding the dependencies on the operation setup) - just some numbers for anyone outside BB operations.

So the datasets are just a "marketing" driven step of BB public relations - which obviously works for the masses of followers.

Coop777 · 10. May 2021

HGST all the way.....

jeyare · 12. May 2021

For a correct evaluation I must say that BB doesn’t have standard RAID in operation. They have their own proprietary Vault architecture, based on 20 drives in each vault. You can’t compare their output with standard Linux (software-based) RAIDs. Only Ext4 is in operation.
Which is OK for me. No doubt.

What isn’t OK is the grouping of all SMART events into the failure count.
Because not every SMART event can be regarded as a disk drive failure or as a failure caused by the disk drive.

And this is my concern about the reliability of the BB reports for readers.
That includes the mentioned anomalies in their datasets vs their report outputs. Some consistency is missing.
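
To illustrate the difference between a SMART event and a failure, here is a minimal sketch that tallies the two separately per serial number. It assumes the daily rows carry a 0/1 `failure` flag and a raw SMART counter column (called `smart_233_raw` here); both column names are assumptions on my side, the `tally` helper is hypothetical, and the sample rows are invented.

```python
import csv
import io

def tally(rows, smart_column):
    """Return (serials flagged as failed, serials whose SMART counter rose).

    Rows must be ordered by date within each serial number.
    """
    failed, eventful = set(), set()
    last_seen = {}
    for row in rows:
        sn = row["serial_number"]
        if row["failure"] == "1":
            failed.add(sn)
        value = int(row[smart_column] or 0)  # empty cell counts as 0
        if sn in last_seen and value > last_seen[sn]:
            eventful.add(sn)
        last_seen[sn] = value
    return failed, eventful

# Invented sample: SN1 logs new SMART events but keeps running,
# SN2 is flagged as failed without any change in that counter.
sample = io.StringIO(
    "date,serial_number,failure,smart_233_raw\n"
    "2021-01-01,SN1,0,4\n"
    "2021-01-02,SN1,0,6\n"
    "2021-01-01,SN2,0,0\n"
    "2021-01-02,SN2,1,0\n"
)
failed, eventful = tally(csv.DictReader(sample), "smart_233_raw")
print("failed:", failed, "with SMART events:", eventful)
```

If the two sets differ a lot on the real data, a count based on SMART events and a count based on real drive failures are not the same number.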
