I know the BB datasets well (I have analyzed them many times in the past).
Some insights on the SSDs that BB uses as boot drives only, from the BB Q1 2021 datasets:
92 CSVs, one per day of operation, from January 1st, 2021 to March 31st, 2021
1. There is no generally defined disk type. But it's simple to recognize the SSD drives there (incl. form factor):
Vendor | Model | Type | Form factor
Micron | MTFDDAV240TCB | SSD | M.2
Micron | MTFDDAV240TDU | SSD | M.2
Seagate | Seagate BarraCuda 120 SSD ZA250CM10003 | SSD | 2.5
Seagate | Seagate BarraCuda SSD ZA2000CM10002 | SSD | 2.5
Seagate | Seagate BarraCuda SSD ZA250CM10002 | SSD | 2.5
Seagate | Seagate BarraCuda SSD ZA500CM10002 | SSD | 2.5
Seagate | Seagate SSD | SSD | -
Dell (Intel) | SSDSCKKB480G8R | SSD | M.2
- As you can see, one drive model is defined only as "Seagate SSD". By the way, that covers 100+ unique SNs, all with 250 GB capacity.
No one knows what kind of drive is actually used there.
2. Every single disk drive (identified by SN, then by disk vendor and PN) has a comprehensive SMART record on a daily basis for Q1 2021: normalized value and recorded value(s). This includes all the SSD drives.
So you could compare the HDD and SSD types (a bad idea).
The reason it is a bad idea is explained in the next point.
3. In this huge amount of data you can't find which HDDs were used as boot drives (there is no such definition). So you can't compare failure types (by SMART code) between HDDs and SSDs used as boot drives.
4. I found SMART ID 233 (error/failure), Unexpected Power Loss: two events in one day (Jan 1st, 2021) for a single SN. OMG, really, in BB DCs? And it wasn't the last such event in Q1. This isn't the first time I have found this in BB datasets.
5. There is a strange Power-On Hours value for one Seagate SSD: 65809 hours = 7.5 years. But they stated:
A little over two years ago, Backblaze started using SSDs as boot drives.
So 2 y x 365 d x 24 h = 17 520 hours should be the maximum possible value for this indicator. But in a deep dive I found 789 more SSDs with a value higher than 17 520, much higher.
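A minimal sketch of the Power-On Hours check from point 5, in Python. The column names (`serial_number`, `smart_9_raw`) are assumptions about the CSV layout, and the sample rows are hypothetical apart from the 65809 value quoted above; the helper names are made up for the illustration.

```python
import csv
import io

HOURS_PER_YEAR = 365 * 24  # 8760


def max_power_on_hours(years_in_service):
    """Upper bound on power-on hours for a drive deployed that many years ago."""
    return int(years_in_service * HOURS_PER_YEAR)


def drives_over_limit(rows, limit_hours, hours_column="smart_9_raw"):
    """Return {serial_number: highest power-on hours seen} for drives above the limit."""
    over = {}
    for row in rows:
        raw = row.get(hours_column, "")
        if not raw:  # skip rows where the attribute was not reported
            continue
        hours = int(float(raw))
        if hours > limit_hours:
            sn = row["serial_number"]
            over[sn] = max(hours, over.get(sn, 0))
    return over


# Hypothetical sample rows standing in for two daily CSVs (assumed column names).
SAMPLE = """date,serial_number,model,smart_9_raw
2021-01-01,SN-A,Seagate SSD,65809
2021-01-01,SN-B,Seagate SSD,9000
2021-01-02,SN-A,Seagate SSD,65833
2021-01-02,SN-C,Seagate SSD,
"""

limit = max_power_on_hours(2)
suspicious = drives_over_limit(csv.DictReader(io.StringIO(SAMPLE)), limit)
print(limit, suspicious)
```

With real data, the same function would be fed the rows of all daily files; the count of keys in the result is the number of drives that contradict the "two years" statement.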
I like data, and data interpretation too. But something is missing or wrong there; I mean the data quality.
On the one hand, I appreciate BB's attitude in publishing the data. On the other hand, it is an unprofessional mess:
- the same mistakes in the datasets again as in the previous quarter
- basic principles of data taxonomy are missing
- a single column for the vendor is missing
- a single column for the PN is missing
- especially when you need to merge 92 CSVs, you need cleaner datasets
- of course, the drive types mentioned above (HDD/SSD) are also missing
- no defined date of putting into operation, which is mandatory for evaluating read errors, etc. OK, there is the SMART E09 value (but you can't check the validity of the value)
- no defined FS or redundancy mode for each drive (no chance to understand the failures in depth).
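To illustrate what the missing vendor, PN and type columns cost, here is a rough Python sketch of the cleanup needed while merging the daily files. Everything in it is an assumption for the illustration: the column names (`date`, `serial_number`, `model`), the vendor list, the parsing rule (last token that mixes letters and digits is taken as the PN) and the sample rows.

```python
import csv
import io

# Assumption: PNs known to be SSDs even though the model string does not say so.
KNOWN_SSD_PNS = {"MTFDDAV240TCB", "MTFDDAV240TDU", "SSDSCKKB480G8R"}
# Assumption: vendor names that may appear as the first word of the model string.
KNOWN_VENDORS = ("Seagate", "Micron", "Dell")


def split_model(model):
    """Derive vendor, PN and drive type from a free-form model string."""
    tokens = model.split()
    vendor = tokens[0] if tokens and tokens[0] in KNOWN_VENDORS else ""
    # Heuristic: the PN is the last token containing both letters and digits.
    pn = next(
        (t for t in reversed(tokens)
         if any(c.isdigit() for c in t) and any(c.isalpha() for c in t)),
        "",
    )
    is_ssd = "SSD" in tokens or pn in KNOWN_SSD_PNS
    return {"vendor": vendor, "pn": pn, "drive_type": "SSD" if is_ssd else "unknown"}


def merge_daily(csv_files):
    """Concatenate daily CSVs and add the derived columns to every row."""
    merged = []
    for f in csv_files:
        for row in csv.DictReader(f):
            row.update(split_model(row["model"]))
            merged.append(row)
    return merged


# Hypothetical sample data standing in for two of the daily files.
DAY1 = ("date,serial_number,model\n"
        "2021-01-01,SN-A,Seagate BarraCuda SSD ZA250CM10002\n"
        "2021-01-01,SN-B,MTFDDAV240TCB\n")
DAY2 = ("date,serial_number,model\n"
        "2021-01-02,SN-A,Seagate BarraCuda SSD ZA250CM10002\n"
        "2021-01-02,SN-C,Seagate SSD\n")

rows = merge_daily([io.StringIO(DAY1), io.StringIO(DAY2)])
for r in rows:
    print(r["date"], r["serial_number"], r["vendor"], r["pn"], r["drive_type"])
```

Note that the "Seagate SSD" rows end up with an empty PN, and the Micron row with an empty vendor. A heuristic like this cannot recover information that was never recorded, which is exactly the problem with the published layout.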
Finally:
- It isn't exactly defined what a failure means. There is no described methodology.
Going by the data from BB, it would mean every single SMART record. That is really strange, because it's wrong.
And when the methodology is missing, there is room for unanswered questions. By the way, I have a list of unanswered questions from the last BB drive report.
So this is another point of view on the data regularly published by BB.
For some people it's enough. Quick evaluation: SSD is better.