Reset drive "critical" state

OK, bear with me, as I know this might sound like a bit of an odd request.

I am looking for a way to reset the “critical” status of a drive without actually changing it.

Yes, I know what I am doing (or I hope I do)…

I have a virtualisation lab in which one machine runs ESX, and one of its VMs runs Synology DSM with 5 disks exposed to it via pass-through.

Overall this works perfectly, but from time to time one of the drives encounters an I/O timeout. It is pretty rare, but it happens… and (for good reasons) DSM then flags the drive as “bad”. It actually seems to be a bug or limitation of my SATA controller being used in this unusual way, and despite researching it fairly extensively there isn’t much I can do about it. I am 100% sure the disks are functioning perfectly (I have swapped a few already and ran extensive diagnostics just in case). The only errors logged are those random timeouts, but DSM doesn’t want to touch the drive anymore.

Hence my question: is there a way to reset the error count, or otherwise tell DSM that I still want to use said drive?
 
No, I have not. I will give it a try.
That may have no effect, as AFAIK DSM keeps a drive database with the status of individual drives. If you attempt to reintroduce the drive and it fails (critical), use a PC to change the drive serial number (others have also successfully edited the DSM database file to remove the drive flag).
 
I can confirm that this worked for me with DSM 7.1
  1. Enable SSH in `Control Panel/Terminal & SNMP`
  2. Open a terminal on your local machine
  3. Run `ssh [username]@[ip address] -p[port]`, replacing the bracketed fields with your own values (leave out the square brackets)
  4. Run `sudo sqlite3 /var/log/synolog/.SYNODISKDB`
  5. Enter your password if prompted
  6. Run `DELETE FROM logs WHERE serial ='[drive S/N]';`, replacing the bracketed field with your drive's serial number (leave out the square brackets)
After that the drive should return to healthy.
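
Since step 6 removes rows for good, it may be worth copying the database and looking at what matches before deleting. Below is a minimal Python sketch of that idea, run against an in-memory stand-in rather than the real `.SYNODISKDB`. Only the table name `logs` and the columns `serial`, `model` and `msg` come from this thread; the rest of the schema, the demo serials and the helper names are assumptions for illustration.

```python
import sqlite3


def make_demo_db():
    # Stand-in for /var/log/synolog/.SYNODISKDB with a simplified, assumed schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT, "
        "model TEXT, serial TEXT, msg TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (level, model, serial, msg) VALUES (?, ?, ?, ?)",
        [
            ("err", "DEMO DISK", "SERIAL-A", "status_critical"),
            ("debug", "DEMO DISK", "SERIAL-A", "disk_refresh"),
            ("err", "DEMO DISK", "SERIAL-B", "status_critical"),
        ],
    )
    conn.commit()
    return conn


def backup_then_delete(conn, serial):
    """Copy the database, list the rows for one serial, then delete them."""
    backup = sqlite3.connect(":memory:")  # with a real database, use a file path
    conn.backup(backup)
    preview = conn.execute(
        "SELECT level, msg FROM logs WHERE serial = ? ORDER BY id", (serial,)
    ).fetchall()
    conn.execute("DELETE FROM logs WHERE serial = ?", (serial,))
    conn.commit()
    return backup, preview


conn = make_demo_db()
backup, preview = backup_then_delete(conn, "SERIAL-A")
remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
backed_up = backup.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(preview, remaining, backed_up)
```

The parameterised `?` placeholder avoids pasting the serial number straight into the SQL text, and the backup copy still holds every row if the delete turns out to be a mistake.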
 
I just tried this and I still show Critical for my two SSDs. Is there a timer/refresh action I need to take?

Code:
root@Hossy-NAS01:~# sudo sqlite3 /var/log/synolog/.SYNODISKDB
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> SELECT * FROM logs WHERE serial LIKE 'S5GDNG0MC00%';
21959|1683305012|debug|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|disk_refresh|0||0||||2147438647
21960|1683305012|debug|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|disk_refresh|0||0||||2147438647
21975|1686857043|info|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|secure_erase_done|1||0||||2147438647
21976|1686857052|info|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|secure_erase_done|1||0||||2147438647
sqlite> .quit
 
What are we looking at above - the code you ran to fix your issue, or are you showing us that your drives are still listed in the db?

If you haven't already, you need to run a SQL DELETE query to remove your drive entries from the db. Based on your code above, in your case you need to run DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%'; after you've opened the db. This is step 6 in the instructions above.

This will delete the 4 entries above from the .SYNODISKDB, assuming that's what you want to do. If you've already done this and the code above is showing that the entries are still there, then post back with any message returned after trying the DELETE command.

SQL DELETEs take effect immediately; no refresh or other mechanism is required to apply them.
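
One detail worth checking when typing the query: SQLite's DELETE takes no column list, so a form with a star after DELETE is rejected as a syntax error, while plain `DELETE FROM ...` runs. The Python sketch below shows this on a throwaway in-memory table (a two-column stand-in, not the real DSM schema). In the `sqlite3` shell, a statement run outside an explicit transaction is committed as soon as it executes; the Python module needs the explicit `commit()` shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; the real `logs` table has more columns.
conn.execute("CREATE TABLE logs (serial TEXT, msg TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("S5GDNG0MC00524H", "status_critical"), ("OTHER-DISK", "status_critical")],
)
conn.commit()

# The star form is not valid SQLite syntax and raises an error.
try:
    conn.execute("DELETE * FROM logs WHERE serial LIKE 'S5GDNG0MC00%'")
    star_form_accepted = True
except sqlite3.OperationalError:
    star_form_accepted = False

# The valid form deletes the matching rows.
cur = conn.execute("DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%'")
deleted = cur.rowcount
conn.commit()

left = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(star_form_accepted, deleted, left)
```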
 
I deleted the entries that pertained to the health of the disks (life_below_thre_with_value and status_critical). The disk_refresh and secure_erase_done are just action history.

I also checked in .SYNODISKHEALTHDB and .SYNODISKTESTDB, but found nothing of interest in either of those.

Code:
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' LIMIT 10;
74|1631176000|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|5 5|||2147438647
75|1631532707|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|4 5|||2147438647
76|1631863270|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|3 5|||2147438647
77|1632219859|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|2 5|||2147438647
78|1632644858|warning|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|life_below_thre_with_value|1||0|1 5|||2147438647
79|1632795370|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
80|1633061237|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
81|1633067056|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
82|1633350441|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
83|1633355901|warning|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|life_below_thre_with_value|1||0|5 5|||2147438647
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' AND msg LIKE '%critical%' LIMIT 10;
79|1632795370|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
80|1633061237|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
81|1633067056|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
82|1633350441|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
85|1633934515|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
87|1634336167|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
90|1634760134|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
91|1634760134|err|/dev/nvc2|SSD 860 EVO M.2 2TB     |S5GDNG0MC00519W|DS3617xs|Cache device 1-2|status_critical|1||0||||2147438647
92|1634878874|err|/dev/nvc1|SSD 860 EVO M.2 2TB     |S5GDNG0MC00524H|DS3617xs|Cache device 1-1|status_critical|1||0||||2147438647
93|1634878876|err|/dev/nvc2|SSD 860 EVO M.2 2TB     |S5GDNG0MC00519W|DS3617xs|Cache device 1-2|status_critical|1||0||||2147438647
sqlite> DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%' AND msg LIKE '%critical%';
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' AND msg LIKE '%critical%' LIMIT 10;
sqlite> DELETE FROM logs WHERE serial LIKE 'S5GDNG0MC00%' AND msg LIKE 'life_below_thre_with_value';
sqlite> SELECT * FROM logs WHERE model LIKE '%SSD%' LIMIT 10;
21959|1683305012|debug|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|disk_refresh|0||0||||2147438647
21960|1683305012|debug|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|disk_refresh|0||0||||2147438647
21975|1686857043|info|/dev/nvc1|SSD 860 EVO M.2 2TB|S5GDNG0MC00524H|DS3617xs|Cache device 1-1|secure_erase_done|1||0||||2147438647
21976|1686857052|info|/dev/nvc2|SSD 860 EVO M.2 2TB|S5GDNG0MC00519W|DS3617xs|Cache device 1-2|secure_erase_done|1||0||||2147438647
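
For anyone repeating the session above, here is a rough Python sketch of the same selective cleanup: count the events per `msg` value for a serial prefix, delete only the health-related ones, and count again. It runs on an in-memory stand-in with an assumed two-column schema. The event names and serial numbers are taken from the output above; the helper `counts_by_msg` is hypothetical.

```python
import sqlite3

# Event names seen in this thread that relate to drive health.
HEALTH_EVENTS = ("status_critical", "life_below_thre_with_value")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (serial TEXT, msg TEXT)")  # simplified schema
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("S5GDNG0MC00524H", "life_below_thre_with_value"),
        ("S5GDNG0MC00524H", "status_critical"),
        ("S5GDNG0MC00524H", "status_critical"),
        ("S5GDNG0MC00519W", "status_critical"),
        ("S5GDNG0MC00524H", "disk_refresh"),
        ("S5GDNG0MC00519W", "secure_erase_done"),
    ],
)
conn.commit()


def counts_by_msg(conn, serial_prefix):
    """Return {msg: row count} for serials starting with serial_prefix."""
    rows = conn.execute(
        "SELECT msg, COUNT(*) FROM logs WHERE serial LIKE ? GROUP BY msg",
        (serial_prefix + "%",),
    ).fetchall()
    return dict(rows)


before = counts_by_msg(conn, "S5GDNG0MC00")

# IN (...) matches the event names exactly; in a LIKE pattern each
# underscore would act as a single-character wildcard.
placeholders = ",".join("?" for _ in HEALTH_EVENTS)
conn.execute(
    f"DELETE FROM logs WHERE serial LIKE ? AND msg IN ({placeholders})",
    ("S5GDNG0MC00%", *HEALTH_EVENTS),
)
conn.commit()

after = counts_by_msg(conn, "S5GDNG0MC00")
print(before)
print(after)
```

Only the action-history events (`disk_refresh`, `secure_erase_done`) remain afterwards, which matches what the final SELECT above shows.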
 
The procedure worked for me too on a DS415+. This "drive critical" error actually happened on another drive first, which I replaced with a new one. I then wiped and reformatted the "critical" drive in Windows and ran multiple tests. There were no bad sectors, no other faults, and no critical SMART values. I'm now using it as an external USB drive on the DS415+ without problems. Then the same thing happened to another drive: the same error messages on multiple days (drive critical, replace ASAP, etc.). So I applied this sqlite procedure to clear the logs. The drive has now been working for two days without any errors, and it also passed the extended SMART test. I read that this error may stem from the controller rather than the drive. For now I'll keep the drive whose logs I cleared and monitor it. If it goes critical again, I may replace it with a new one to be safe. But if a new one goes critical, I will not replace it.
 
