Reset drive "critical" state

atakacs · 13. May 2022

OK, bear with me, as I know this might sound like a bit of an odd request.

I am looking for a way to reset the “critical” status of a drive without actually changing it.

Yes, I know what I am doing (or I hope I do)…

I have a virtualisation lab setup in which a lab machine runs ESX, and one of the VMs runs Synology DSM with 5 disks exposed to it via pass-through.

Overall this is working perfectly, but from time to time one of the drives encounters an I/O timeout. It is pretty rare, but it happens… and (for good reasons) DSM will flag the drive as “bad”. It actually seems to be a bug / limitation of my SATA controller being used in that unusual way, and despite researching it fairly extensively there isn’t much I can do about it. I am 100% sure the disk is functioning perfectly (I have swapped a few already and ran extensive diagnostics just in case). The only errors logged are those random timeouts, but DSM doesn’t want to touch the drive anymore.

Hence my question: is there a way to reset the error count or otherwise tell DSM that I still want to use said drive?

Rusty · 13. May 2022

How are the 5 drives configured on DSM layer?

atakacs · 13. May 2022

As a RAID-6 array

Rusty · 13. May 2022

Have you tried to degrade the array, remove the drive, wipe it and then add it again? Counter stays the same?

atakacs · 13. May 2022

No have not - will give it a try

Telos · 13. May 2022

atakacs said:
No have not - will give it a try

That may have no effect as, AFAIK, DSM keeps a drive database with the status of individual drives. If you attempt to reintroduce the drive and it still fails (critical), use a PC to change the drive serial number. Others have also successfully edited the DSM database file to remove the drive flag.

atakacs · 18. May 2022

others have successfully edited the DSM database file to remove the drive flag, as well.

Any pointers on how this is done? (I can't seem to find relevant links.)

Rusty · 18. May 2022

atakacs said:
Any pointers on how this is done? (I can't seem to find relevant links.)

It's unofficial, just bear that in mind.

atakacs · 18. May 2022

Of course - this is definitely hacking territory

Telos · 18. May 2022

atakacs said:
Any pointers on how this is done? (I can't seem to find relevant links.)

See this post. Quite nice!

atakacs · 21. Jun 2022

I can confirm that this worked for me with DSM 7.1

1. Enable SSH under `Control Panel/Terminal & SNMP`.
2. Open a terminal on your local machine.
3. Run `ssh [username]@[ip address] -p[port]`, replacing the bracketed fields with your own values (drop the square brackets).
4. Run `sudo sqlite3 /var/log/synolog/.SYNODISKDB`.
5. Enter your password if you are asked for it.
6. Run `DELETE FROM logs WHERE serial ='[drive S/N]';`, replacing the bracketed field with the drive's serial number (drop the square brackets).

After that the drive should return to healthy.

Hossy · 15. Jun 2023

atakacs said:
I can confirm that this worked for me with DSM 7.1

1. Enable SSH under `Control Panel/Terminal & SNMP`.
2. Open a terminal on your local machine.
3. Run `ssh [username]@[ip address] -p[port]`, replacing the bracketed fields with your own values (drop the square brackets).
4. Run `sudo sqlite3 /var/log/synolog/.SYNODISKDB`.
5. Enter your password if you are asked for it.
6. Run `DELETE FROM logs WHERE serial ='[drive S/N]';`, replacing the bracketed field with the drive's serial number (drop the square brackets).

After that the drive should return to healthy.

I just tried this and I still show Critical for my two SSDs. Is there a timer/refresh action I need to take?

Code:

root@Hossy-NAS01:~# sudo sqlite3 /var/log/synolog/.SYNODISKDB
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> SELECT * FROM logs WHERE serial LIKE 'S5GDNG0MC00%';
21959|1683305012|debug|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|disk_refresh|0||0||||2147438647
21960|1683305012|debug|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|disk_refresh|0||0||||2147438647
21975|1686857043|info|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|secure_erase_done|1||0||||2147438647
21976|1686857052|info|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|secure_erase_done|1||0||||2147438647
sqlite> .quit
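For anyone reading the rows above: the second field looks like a Unix epoch timestamp in seconds. That is an assumption based on the values, not something documented in this thread. A small Python sketch to decode it (the `epoch_to_utc` helper is hypothetical):

```python
from datetime import datetime, timezone


def epoch_to_utc(ts):
    """Format a Unix timestamp (seconds) as a UTC date and time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Values copied from the second column of the rows above.
refresh_time = epoch_to_utc(1683305012)  # disk_refresh rows
erase_time = epoch_to_utc(1686857043)    # secure_erase_done row for /dev/nvc1
print(refresh_time)
print(erase_time)
```

Under that assumption, the `secure_erase_done` entries are from the same day as this post, so all four remaining rows are recent action history rather than old health events.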

Deleted member 5784 · 16. Jun 2023

What are we looking at above - the code you ran to fix your issue, or are you showing us that your drives are still listed in the db?

If you haven't already, you need to run a SQL DELETE query to remove your drive entries from the db. Based on your code above, in your case you need to run DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%'; after you've opened the db. This is step 6 in the instructions above.

This will delete the 4 entries above from the .SYNODISKDB, assuming that's what you want to do. If you've already done this and the code above is showing that the entries are still there, then post back with any message returned after trying the DELETE command.

SQL DELETEs take effect immediately; no refresh or other mechanism is required to make them happen.
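A small Python sketch of that last point at the database level. It assumes the sqlite3 shell's default behaviour (each statement commits on its own), which `isolation_level=None` imitates here. A second connection sees the rows gone straight after the DELETE, and `SELECT changes()` reports how many rows were removed. The two-column table and the `SN-DEMO` serials are invented for the demo. Whether DSM itself re-reads the table right away is a separate question that this does not answer.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connection in autocommit mode.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE logs (serial TEXT, msg TEXT)")
writer.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("SN-DEMO-1", "status_critical"),
        ("SN-DEMO-2", "status_critical"),
        ("SN-OTHER", "disk_refresh"),
    ],
)

reader = sqlite3.connect(path)  # a separate connection to the same file


def count_rows(conn):
    # fetchall() runs the query to completion so no read lock lingers.
    return conn.execute("SELECT COUNT(*) FROM logs").fetchall()[0][0]


before = count_rows(reader)
writer.execute("DELETE FROM logs WHERE serial LIKE 'SN-DEMO%'")
deleted = writer.execute("SELECT changes()").fetchall()[0][0]
after = count_rows(reader)

print(before, deleted, after)
writer.close()
reader.close()
```

In the sqlite3 shell a successful DELETE prints nothing, so running `SELECT changes();` right afterwards is one way to confirm that it matched any rows.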

Hossy · 17. Jun 2023

Fortran said:
What are we looking at above - the code you ran to fix your issue, or are you showing us that your drives are still listed in the db?

If you haven't already, you need to run a SQL DELETE query to remove your drive entries from the db. Based on your code above, in your case you need to run DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%'; after you've opened the db. This is step 6 in the instructions above.

This will delete the 4 entries above from the .SYNODISKDB, assuming that's what you want to do. If you've already done this and the code above is showing that the entries are still there, then post back with any message returned after trying the DELETE command.

SQL DELETEs take effect immediately; no refresh or other mechanism is required to make them happen.

I deleted the entries that pertained to the health of the disks (life_below_thre_with_value and status_critical). The disk_refresh and secure_erase_done are just action history.

I also checked in .SYNODISKHEALTHDB and .SYNODISKTESTDB, but found nothing of interest in either of those.

Code:

sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' LIMIT 10;
74|1631176000|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|5 5|||2147438647
75|1631532707|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|4 5|||2147438647
76|1631863270|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|3 5|||2147438647
77|1632219859|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|2 5|||2147438647
78|1632644858|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|1 5|||2147438647
79|1632795370|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
80|1633061237|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
81|1633067056|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
82|1633350441|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
83|1633355901|warning|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|life_below_thre_with_value|1||0|5 5|||2147438647
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' AND msg LIKE '%critical%' LIMIT 10;
79|1632795370|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
80|1633061237|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
81|1633067056|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
82|1633350441|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
85|1633934515|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
87|1634336167|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
90|1634760134|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
91|1634760134|err|/dev/nvc2|SSD 860 EVO M.2 2TB     |S5GDNG0MC00519W|DS3617xs|Cache device 1-2|status_critical|1||0||||2147438647
92|1634878874|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
93|1634878876|err|/dev/nvc2|SSD 860 EVO M.2 2TB     |S5GDNG0MC00519W|DS3617xs|Cache device 1-2|status_critical|1||0||||2147438647
sqlite> DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%' AND msg LIKE '%critical%';
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' AND msg LIKE '%critical%' LIMIT 10;
sqlite> DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%' AND msg LIKE 'life_below_thre_with_value';
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' LIMIT 10;
21959|1683305012|debug|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|disk_refresh|0||0||||2147438647
21960|1683305012|debug|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|disk_refresh|0||0||||2147438647
21975|1686857043|info|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|secure_erase_done|1||0||||2147438647
21976|1686857052|info|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|secure_erase_done|1||0||||2147438647
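One side note on the queries above: in a SQL `LIKE` pattern, `_` matches any single character, so `msg LIKE 'life_below_thre_with_value'` is slightly looser than an exact comparison. It made no difference here, but the Python sketch below shows it on an in-memory table. The schema, the `SN-DEMO-1` serial and the `lifeXbelow...` value are invented to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (serial TEXT, msg TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("SN-DEMO-1", "life_below_thre_with_value"),
        ("SN-DEMO-1", "lifeXbelowXthreXwithXvalue"),  # made-up near miss
        ("SN-DEMO-1", "disk_refresh"),
    ],
)


def count(where, params=()):
    """Count rows matching a WHERE clause (demo helper, trusted input only)."""
    sql = f"SELECT COUNT(*) FROM logs WHERE {where}"
    return conn.execute(sql, params).fetchall()[0][0]


# "_" acts as a one-character wildcard, so the near miss also matches.
like_hits = count("msg LIKE 'life_below_thre_with_value'")

# "=" compares the whole string literally.
exact_hits = count("msg = ?", ("life_below_thre_with_value",))

# ESCAPE makes "!_" stand for a literal underscore inside LIKE.
escaped_hits = count("msg LIKE 'life!_below!_thre!_with!_value' ESCAPE '!'")

print(like_hits, exact_hits, escaped_hits)
conn.close()
```

When the full message name is known, `msg = '...'` is the safer choice for a DELETE.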

atakacs · 26. Jun 2023

Well, you still seem to have entries that report issues with your disk (be they real or bogus). Are you sure your DELETE operation actually worked?

Hossy · 27. Jun 2023

atakacs said:
Well, you still seem to have entries that report issues with your disk (be they real or bogus). Are you sure your DELETE operation actually worked?

Not sure what you're talking about. To which entries are you referring?

Irksome · 18. Jul 2023

Worked for me - thank you.

PS - needed a reboot to clear the error

PlacePort · 30. Mar 2024

The procedure worked for me too on a DS415+. Actually this drive reset error happened on another drive first. I replaced it with a new one. Then I wiped it and reformatted the "critical drive" in Windows and ran multiple tests. There were no bad sectors, nor any other faults, nor any SMART critical values. Now I'm using it as an external USB on the DS415+ and no problems. Then this happened to another drive. Same error messages on multiple days. Drive critical, replace ASAP, etc. So I applied this sqlite procedure to clear the logs. Now the drive has been working for two days without any errors. It also passed the extended SMART test. I read that this error may stem from the controller instead of the drive. For now I'll keep the drive for which I cleared the logs and monitor it. If it goes critical again, I may replace it with a new one to be safe. But if a new one goes critical I will not replace it.
