Thunder, Lightning & NAS Goes Beep! - SSD Failure

NAS: RS1221+, RS819, RS217
Operating system: macOS
Mobile operating system: iOS
0512hrs, and I was already awake due to a big thunderstorm. The arrival of the email and the muffled beep were nearly simultaneous. DSM had the bad news:

 2023-09-18 at 08.04.44.png


I think this is my first ever SSD failure. There was no warning at all beforehand; the drive just dropped from the pool. I've just put the spare SSD (which I never thought I would use) into the bay, and DSM is doing its thing, rebuilding the SHR-1 array:

 2023-09-18 at 10.53.52.png
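For anyone wondering why one dropped drive doesn't lose any data: as far as I know, SHR-1 on three or more equal-sized drives is built on an ordinary single-parity RAID 5 layer underneath. A minimal sketch of the idea, assuming plain XOR parity and a made-up three-drive stripe (the block contents and the `xor_blocks` helper are hypothetical, not anything from DSM): the parity block is the XOR of the data blocks, so XOR-ing whatever survives gives back the missing block.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    # zip(*blocks) walks the blocks column by column; each column is a tuple of ints.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"NAS!", b"goes", b"beep"]
parity = xor_blocks(data)

# The third data drive drops out of the pool...
surviving = [data[0], data[1], parity]

# ...and its block is recomputed from the survivors for the replacement drive.
rebuilt = xor_blocks(surviving)
print(rebuilt)  # b'beep'
```

Real RAID 5 rotates parity across the drives and works in much larger chunks, but the arithmetic is the same, and it is why a rebuild has to read every surviving drive end to end.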


It's kinda weird not hearing the NAS thrashing itself to death during the rebuild. The 're-silver' isn't going to take a day or two to complete, either. Allegedly it's 50% done already.

 2023-09-18 at 10.57.49.png


Anyway, we don't get to see many Syno SSD arrays fail & (hopefully) rebuild. Wish it wasn't mine. 🙃

Time for a cup of tea.

☕
 
The statistics of someone getting struck twice are as low as winning the lottery, so you should be safe now!
 
The lightning show could have been a coincidence; everything else was OK, and with the joy of FTTP even the WAN stayed up. A once-trusted source of timing for log analysis is no more, with the slow demise of DSL copper. The UPS didn't trigger either, but I know they can be out-paced by lightning transients.

I'll try and do diagnostics on the failed SSD, if I can get anything to talk to it. Meanwhile the RAID rebuild is at the data-scrubbing stage, with 5 mins left to run... We shall see.

☕
 
For those keeping score:

Code:
18/09/2023    10:28:49    Info    System started to perform Fast Repair on [Storage Pool 1] with [Drive 3].
18/09/2023    12:32:45    Info    System successfully repaired [Storage Pool 1] with drive [Drive 3].
18/09/2023    14:20:46    Info    System successfully finished data scrubbing on [Storage Pool 1].

2hrs 03min 56s to repair array
+
1hr 48min 01s to data scrub repaired volume
=
3hrs 51min 57s to get back to normal
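The elapsed times can be checked by subtracting the log timestamps directly. A minimal Python sketch, assuming the data scrub started the moment the repair finished (the log above records when the scrub ended, not when it began):

```python
from datetime import datetime

# Timestamps copied from the DSM log lines above (day/month/year format).
FMT = "%d/%m/%Y %H:%M:%S"
repair_start = datetime.strptime("18/09/2023 10:28:49", FMT)
repair_done = datetime.strptime("18/09/2023 12:32:45", FMT)
scrub_done = datetime.strptime("18/09/2023 14:20:46", FMT)

repair = repair_done - repair_start
scrub = scrub_done - repair_done  # assumes scrubbing began as soon as the repair finished
total = scrub_done - repair_start

print(repair)  # 2:03:56
print(scrub)   # 1:48:01
print(total)   # 3:51:57
```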

Phew.

 2023-09-18 at 14.22.33.png



Fastest read was 1.8 Gbps
Fastest write was 380 Mbps

Typical read was 1.4 Gbps
Typical write was 295 Mbps

No more WD Blue 4TBs on Amazon at the moment. 🛸

☕
 
I think this is my first ever SSD failure
Have you tested this device? It may have been dropped by DSM but be otherwise fine. It seems odd that only one drive would "fail" if there was a serious overvoltage (apparently the unit was unaffected).
 
Only tried it on a Win server - no joy. Now trying macOS, which can read the drive info, so now trying to format it. Seems to be taking a long time though...

I'm only talking about EMI transients, so bit flips and the like. The system is protected against typical over-voltages or surges.

 2023-09-18 at 15.29.36.png

 2023-09-18 at 15.30.41.png

 2023-09-18 at 15.31.13.png

 2023-09-18 at 15.32.15.png


☕



OK, it took a while, but it accepted an APFS format:

 2023-09-18 at 15.42.05.png

Now it accepts an exFAT format too:

 2023-09-18 at 15.51.50.png


 2023-09-18 at 15.52.22.png

I'll have to mount it on a native SATA port to get proper SMART data from it. It's just on a USB dongle currently.

☕
 
A few years back, a nearby hit took out most devices on my HDMI distribution in the house: one TV, 2x HDMI DAs, 2x HDMI switchers, the HDMI I/O on a $$ camera capture system, and one 24-port gigabit Ethernet switch.
Lightning is strange. The camera capture system's HDMI ports were dead, yet its processing and other I/O ports worked. I believe it came in on the RF cable to the TV and went out on the HDMI & Ethernet cables.
Multiple people on the mountainside were affected.
In my case nothing else died in the following weeks, but other neighbors had lingering failures for weeks.

Watch out for that: delayed lightning failures elsewhere.
 
Installed the 'failed' but reformatted SSD in my test NAS and it was recognised, adopted to a storage pool and allowed a volume to be created. All with no apparent issues:

20230918-Syno-WD4TB SSD-Previously failed drive working ok-1.png


Ran a full SMART extended test - again, nothing of real note:

20230918-Syno-WD4TB Drive 3 post-fail SMART data.png


3 years 260 days 12 hours of power-on time & 99% lifetime remaining - presumably out of warranty!

Not sure if I should trust it or not?

☕
 
If you have a security camera with Surveillance Station (SS) creating motion videos, set the sensitivity way too high so it starts and stops quite often, and let it run for a week or so… That'll test it! I have many SSDs here in PCs & NASes, in SE or RAID 0 & 1, but as yet not one failure.
 
The statistics of someone getting struck twice are as low as winning the lottery, so you should be safe now!
Maybe you were speaking in jest; if so, sorry for the intrusion, but that is a common statistical fallacy (the gambler's fallacy). The odds of getting struck are the same as they were before, because lightning strikes are independent events.
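The independence point is easy to see with a quick simulation. A minimal sketch, assuming strikes are modelled as independent events with a made-up per-season probability `P` (the numbers are illustrative, not real lightning statistics): the strike rate in seasons right after a strike comes out the same as the overall rate, so a past strike doesn't buy you any safety.

```python
import random

# Toy model (hypothetical numbers): each season a given spot is struck with
# probability P, independently of every other season.
P = 0.01
SEASONS = 1_000_000
rng = random.Random(42)  # fixed seed so the run is repeatable

strikes = [rng.random() < P for _ in range(SEASONS)]

# Overall strike rate.
overall = sum(strikes) / SEASONS

# Strike rate only in the seasons that immediately follow a strike.
after_strike = [nxt for prev, nxt in zip(strikes, strikes[1:]) if prev]
conditional = sum(after_strike) / len(after_strike)

print(f"P(strike)                    ~ {overall:.4f}")
print(f"P(strike | struck last time) ~ {conditional:.4f}")
```

Both estimates land close to `P`. What is small is the up-front chance of being struck in both of two given seasons, P × P; once the first strike has happened, the chance of the second is just P again.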
Maybe the lightning upset the operating system's memory in a weird way, making it 'think' the disk had failed when the disk wasn't really broken. It could also have been a bunch of flipped bits (soft errors) in the disk that caused DSM to see it as failed, with a power cycle and running a test restoring it. An SSD is, after all, just memory, not a magnetic patch on a platter, so it is sensitive to static flipping bits (actually, static could even flip bits on a platter).
Run the test a second time to verify, and then put it in a not-so-critical area to test it out.

Since it is really just memory underneath a disk interface, there should be a way somewhere (though I have no idea where, except maybe at the manufacturer) to test the disk interface into the memory and then test the underlying memory: write/read all 0s, then all 1s, then 0101s (hex 55s), then 1010s (hex AAs), etc.
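A minimal sketch of that write/read pattern test in Python, assuming it runs against an ordinary file rather than the raw device (so it is safe to try; `pattern_test` is a hypothetical helper, not a real tool). Two caveats: the read-back may be served from the OS page cache rather than the drive, and an SSD's flash translation layer remaps logical blocks (many controllers also scramble data before it reaches the NAND), so host-written patterns don't map one-to-one onto physical cells.

```python
import os
import tempfile

# Patterns from the post: all 0s, all 1s, 0101... (0x55), 1010... (0xAA).
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def pattern_test(path: str, size: int, block: int = 1 << 20) -> bool:
    """Hypothetical helper: write each pattern over `size` bytes of `path`,
    read it back and compare. Returns False on the first mismatch."""
    for value in PATTERNS:
        chunk = bytes([value]) * block
        with open(path, "wb") as f:
            written = 0
            while written < size:
                n = min(block, size - written)
                f.write(chunk[:n])
                written += n
            f.flush()             # Python buffer -> OS
            os.fsync(f.fileno())  # ask the OS to write it out to the device
        # Caveat: this read may still be satisfied from the OS page cache.
        with open(path, "rb") as f:
            remaining = size
            while remaining > 0:
                data = f.read(min(block, remaining))
                if not data or data != chunk[:len(data)]:
                    return False  # short read or corrupted pattern
                remaining -= len(data)
    return True


# Usage: run against a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ok = pattern_test(os.path.join(tmp, "pattern.bin"), size=4 * 1024 * 1024)
    print("pass" if ok else "FAIL")
```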
 
Most likely a bit flip. I would not want to do multiple full drive writes on an SSD for fun. It's probably a good way to find out that your SSD was fine, until it was worn out.

Remember when we looked forward to, or even bought, SSDs that effectively* never wore out?

☕

(In fairness, I do have a very small SSD that gets pounded with writes and only has 57.33 years of predicted life left. As I am over 50, even it should see me out...)
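Predicted-life figures like that are often a straight-line extrapolation of wear so far, though I can't say what method any particular tool uses. A minimal sketch applying that assumption to the 'failed' drive's numbers from earlier in the thread (3 years 260 days 12 hours powered on, 99% life remaining), treating DSM's years as 365-day years:

```python
# Figures reported for the 'failed' drive earlier in the thread.
years, days, hours = 3, 260, 12
life_remaining_pct = 99

# Assumption: DSM's "years" are 365-day years.
power_on_hours = (years * 365 + days) * 24 + hours
used_pct = 100 - life_remaining_pct

# Assumption: wear keeps accruing at the same average rate (linear extrapolation).
hours_per_pct = power_on_hours / used_pct
remaining_hours = life_remaining_pct * hours_per_pct
remaining_years = remaining_hours / (365 * 24)

print(power_on_hours)            # 32532
print(f"{remaining_years:.0f}")  # 368
```

Because the 99% is rounded, the real answer could be well above or below this, and wear figures only track NAND write endurance; they say nothing about the sort of sudden drop-out that started this thread.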
 


Welcome to SynoForum.com!

SynoForum.com is an unofficial Synology forum for NAS owners and enthusiasts.
