Question Intermittent file transfers, system responsiveness and drive #5 pegged at 100% usage

5
2
NAS
RS2211+
Operating system
  1. Windows
Mobile operating system
  1. Android
Hello everyone and thanks for having me.

I've already posted this question on the official forum, but it quickly got buried under a ton of new (also unanswered) questions, so I hope this attempt here is a little more fruitful.

Even though this might come off as a pretty noob question, I'm not exactly new to Synology and competing products, given that I've set up a couple dozen 2- to 6-bay stations (so RAID1, 5 or 6) for clients.

So, I got a pretty sweet deal on a used RackStation RS2211+ including a couple of drives, and I've been setting this thing up for the past couple of days (more like weeks now). I threw in seven 2TB and two 4TB drives, went for an SHR-2 array, and the consistency check ran for a whole week or so.
The thing I noticed was that for the entire time the consistency check ran, only disk #5 was blinking and constantly showing usage, while the other 8 disks showed usage only occasionally. The resource monitor showed the same thing.

After it had finished its consistency check and showed me 12.7TB of usable storage capacity, I happily started copying my files over to the NAS at gigabit speed. Now, after every couple of transferred gigabytes, the file transfer comes to a complete halt and resumes after one or two minutes. During this time no file access to the NAS is possible, even though the DSM web interface works fine. I've tried Windows Explorer, Robocopy and Total Commander; it's all the same. It has nothing to do with SMB, because I've also tried FTP and see the same strange phenomenon. I've also troubleshot this from different machines, over different network switches and all that fun stuff. It also occurs while reading from the NAS, which makes this strange misbehaviour even more irritating. There's no encryption going on either.

The resource monitor shows that every time the file transfer pauses, disk usage of drive #5 is pegged at 100% until it's done with whatever it's doing, and then the transfer resumes at full gigabit speed.

So, my question is: is there any way for me to find out what exactly the NAS is reading or writing to or from drive #5 at that particular moment?

The drive is a known good drive with no S.M.A.R.T. hiccups or anything else showing up. If push comes to shove, I'll just have to replace the drive and see what happens.

As far as I know, the RackStation only supports the ext4 filesystem, so I don't think it has anything to do with scrubbing or defragging.
When files are transferred the

And yes, I'm well aware it's an old NAS and only has a dual-core Atom with 3GB of RAM, but seriously?
Also, I've never experienced performance issues like this on any of my clients' DiskStations.

When the file transfer is running normally, the resource monitor shows even usage across the drives, with drive #5 at about 80%, and the md2_raid6 task shows about 15-25% usage, both there and in the top command over SSH.
[Screenshot: 2020-08-02 23_36_24-Window.png]


But every minute or so, Drive #5 goes nuts at 100% usage while the other drives show 0% and the network transfer stops completely.
[Screenshot: 2020-08-02 20_31_29-Window.png]

[Screenshot: 2020-08-02 20_31_17-Window.png]


And does anyone know the maximum drive size this RackStation will accept? 4TB drives were not even a thing when the RS2211+ came out, but they work fine, so is there a chance it will take 10-14TB drives as well?

Any hints in the right direction are highly appreciated!
 

jeyare

Subscriber
1,885
625
Welcome!

First, I need to understand some basics about your setup:
1. Disk vendors, models and part numbers.

2. You wrote:
"I threw in seven 2TB and two 4TB drives, went for a SHR-2 array ..... After it had finished its consistency check and showed me 12.7TB of usable storage capacity."
But 7x2TB plus 2x4TB in SHR-2 is 14TB of total capacity, so why do you have only 12.7TB usable?

Test cases (please post the results here):

smartctl ... if it isn't installed, install smartmontools first
smartctl --smart=on /dev/sdx        # for drive no. 5
smartctl --info /dev/sdx            # ditto
smartctl --capabilities /dev/sdx    # ditto
smartctl --attributes /dev/sdx      # ditto
hdparm -i /dev/sdx                  # ditto
hdparm -N /dev/sdx                  # ditto

Then run a short test:
smartctl -t short /dev/sdx          # ditto
And finally:
smartctl -H /dev/sdx                # ditto
 
Original poster (NAS: RS2211+)
Thanks for your quick reply! I knew this forum would be more helpful than the official one.

Three of the 2TB drives are Hitachis (Deskstar 5K3000 Model: HDS5C3020ALA632)
the other four are Seagates (NAS HDD Model: ST2000VN000-1HJ164) and the two 4TB drives are also Seagates (BarraCuda 3.5 Model: ST4000DM004-2CV104)

And the 14 terabytes (TB) in total come down to a usable 12.73 tebibytes (TiB), so that's all fine and normal.
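
For anyone who wants to check that conversion, here is a minimal sketch in Python. It assumes drive labels use decimal terabytes (10^12 bytes) and the NAS reports binary tebibytes (2^40 bytes); the function name is made up for illustration.

```python
# Decimal terabytes (drive labels) vs. binary tebibytes (what the NAS reports).
TB = 10**12    # 1 terabyte = 10^12 bytes
TIB = 2**40    # 1 tebibyte = 2^40 bytes


def tb_to_tib(terabytes):
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    # 14 TB is the total capacity figure quoted in this thread.
    print(f"{tb_to_tib(14):.2f} TiB")  # prints 12.73 TiB
```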

Here are the results you asked for:
=== START OF INFORMATION SECTION ===
Model Family: Seagate NAS HDD
Device Model: ST2000VN000-1HJ164
Serial Number: W720BTL6
LU WWN Device Id: 5 000c50 08b6b5cce
Firmware Version: SC60
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Aug 4 01:29:33 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 97) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 244) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 113   099   006    Pre-fail Always  -           51252808
  3 Spin_Up_Time            0x0003 096   095   000    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 098   098   020    Old_age  Always  -           2117
  5 Reallocated_Sector_Ct   0x0033 100   100   010    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000f 087   060   030    Pre-fail Always  -           663388232
  9 Power_On_Hours          0x0032 066   066   000    Old_age  Always  -           30148
 10 Spin_Retry_Count        0x0013 100   100   097    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   020    Old_age  Always  -           52
184 End-to-End_Error        0x0032 100   100   099    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 100   100   000    Old_age  Always  -           0
188 Command_Timeout         0x0032 100   099   000    Old_age  Always  -           8590065666
189 High_Fly_Writes         0x003a 001   001   000    Old_age  Always  -           387
190 Airflow_Temperature_Cel 0x0022 070   055   045    Old_age  Always  -           30 (Min/Max 29/31)
191 G-Sense_Error_Rate      0x0032 100   100   000    Old_age  Always  -           0
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           24
193 Load_Cycle_Count        0x0032 099   099   000    Old_age  Always  -           2694
194 Temperature_Celsius     0x0022 030   045   000    Old_age  Always  -           30 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   200   000    Old_age  Always  -           9

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
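
To compare several drives at a glance, an attribute table like the one above could be boiled down with a small script. This is only a sketch in Python, assuming the text comes from `smartctl --attributes`. The function names and the choice of "watched" attribute IDs are illustrative assumptions, not an official list, and the sample rows are copied from the output above.

```python
# Attribute IDs picked for illustration: reallocated, pending and
# offline-uncorrectable sectors, plus interface CRC errors.
WATCHED_IDS = {5, 197, 198, 199}

SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0022 070 055 045 Old_age Always - 30 (Min/Max 29/31)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 9
"""


def parse_smart_attributes(text):
    """Turn the attribute rows of `smartctl --attributes` output into dicts."""
    rows = []
    for line in text.splitlines():
        # The 10th column (RAW_VALUE) may contain spaces, so split at most 9 times.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header or other non-attribute line
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "when_failed": parts[8],
            # Assumes the raw value starts with an integer, as in the output above.
            "raw": int(parts[9].split()[0]),
        })
    return rows


def summarize(rows):
    """Return (names of attributes marked as failed, raw values of watched attributes)."""
    failed = [r["name"] for r in rows if r["when_failed"] != "-"]
    watched = {r["name"]: r["raw"] for r in rows if r["id"] in WATCHED_IDS}
    return failed, watched


if __name__ == "__main__":
    failed, watched = summarize(parse_smart_attributes(SAMPLE))
    print("failed:", failed)
    print("watched:", watched)
```

On the sample rows this reports no failed attributes, and the only non-zero watched counter is UDMA_CRC_Error_Count with a raw value of 9. CRC errors are generally associated with the link between drive and controller (cable, backplane) rather than with the platters, so it may be worth checking whether that counter keeps growing.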

The short test passed with flying colors, so nothing really surprising here.

I'm gonna start filling up my NAS and continue to post updates on my observations.
 

jeyare
First finding:
The BarraCuda HDDs (ST4000DM004-2CV104) are SMR drives. Take them out of the RAID ASAP.

Next:
Try to use the same disk line from the same vendor (different capacities are OK) in RAID6/SHR-2, otherwise you can run into "mysterious" behaviour.
 

jeyare
Then use
Code:
iotop -o
to see which processes are using your storage, filtered to those actually doing I/O.
Then, for better detail (threads hidden, processes only):
Code:
iotop -o -P
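
If iotop shows nothing useful, the kernel's per-disk counters in /proc/diskstats can at least tell whether a busy drive is actually moving data. Below is a minimal sketch in Python, assuming a Linux box with a Python 3 interpreter available (not a given on DSM). The function names are made up for illustration; the column positions follow the kernel's documented diskstats layout.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14:  # major, minor, name, then at least 11 counters
            stats[f[2]] = (int(f[5]), int(f[9]), int(f[12]))
    return stats


def rates(before, after, interval_s):
    """Per-device throughput (KiB/s) and busy time (%) between two snapshots."""
    out = {}
    for dev, (r1, w1, b1) in after.items():
        if dev not in before:
            continue
        r0, w0, b0 = before[dev]
        out[dev] = {
            "read_kib_s": (r1 - r0) * SECTOR_BYTES / 1024 / interval_s,
            "write_kib_s": (w1 - w0) * SECTOR_BYTES / 1024 / interval_s,
            "busy_pct": 100.0 * (b1 - b0) / (interval_s * 1000),
        }
    return out


def snapshot(path="/proc/diskstats"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as fh:
        return parse_diskstats(fh.read())
```

Taking one `snapshot()` before and one after a pause of a few seconds and passing both to `rates()` gives the figures per device. A drive that is close to 100% busy while reading and writing almost nothing would suggest it is stalling internally rather than serving a particular process.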
 
Original poster (NAS: RS2211+)
Well, yesterday everything was looking great. I could write and read for hours at full gigabit speed, but today it's back to the usual mysterious activity on disk #5. This one is actually one of the four 2TB Seagate NAS drives, and the one I posted the smartctl info for.
If I had the money I'd go all IronWolf, but unfortunately that's not the case right now.

I had really high hopes that iotop would be exactly the tool I was looking for to finally figure out what the heck #5 is doing when it ramps up. So I ran iotop, and while I'm writing to or reading from the NAS it shows smb, jbd/md0-8 and some kworkers with 20-30% IO, but when the disk usage goes to 100%, the screen goes blank and shows no activity at all.
Darn, I thought we were onto something...

[Screenshot: 2020-08-04 21_10_03-OpenSSH iotop-o-P.png]


But(!) I think you're absolutely right about the 4TB Seagate drives. I didn't even realize they were SMR drives. They are probably the root cause of this whole fiasco, because they're throwing the RAID6 array out of whack and my RackStation is using disk #5 to cope with whatever the SMR drives are doing to the array.

I'll take the 4TB drives out, replace them with some other 2TB drives I have left (which are definitely not SMR drives), nuke the whole array and start anew. Or maybe I'll use them for a standalone RAID1 array, where they'll probably do fine on their own.

Thank you very much for all your advice! 🙇‍♂️
I've actually learned a lot along this journey.
 

jeyare
In this part of our forum you can find recommendations for disk drives:


You have to reckon with these limitations of your NAS model:
- it's a model from 2011
- SATA II support only; SATA III drives are of course backward compatible, but run at 50% of the link throughput
- no VMM support
So you don't need to purchase fast disks, except for RAID0 operation.
This model is not on the roadmap for the new DSM 7.
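
The 50% figure follows from the link rates. Here is a rough sketch of the arithmetic in Python, assuming the nominal SATA line rates and 8b/10b encoding; the function name is made up for illustration.

```python
def sata_payload_mb_s(line_rate_gbit_s):
    """Usable payload bandwidth of a SATA link in MB/s.

    SATA uses 8b/10b encoding, so every data byte takes 10 bits on the wire.
    """
    return line_rate_gbit_s * 1e9 / 10 / 1e6


if __name__ == "__main__":
    for name, rate in (("SATA II", 3.0), ("SATA III", 6.0)):
        print(f"{name}: {sata_payload_mb_s(rate):.0f} MB/s")
```

That gives roughly 300 MB/s per SATA II port against 600 MB/s for SATA III. A single spinning drive usually sustains less than the SATA II figure, which is why faster disks bring little benefit on this controller.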
 
Original poster (NAS: RS2211+)
A little update on my RackStation adventure.
I nuked the array, yanked all the drives out and put them back in a different order, with the four Seagate NAS drives (predecessors to their famous IronWolves, with similar model numbers) in bays 1-4 and the other three drives in bays 5-7. The previously odd-acting drive #5 was now in bay #4. After creating a completely fresh storage pool, all of a sudden drives #1 and #2 were showing the same strange behaviour, alternating as to which one was pegged at 100%, and bogging down the whole system again.

So I went ahead and yanked all of those NAS drives out again, pulled all the 2TB Seagate drives out of my computers, backed up my stuff onto the offending NAS drives and replaced them in the RackStation with the four Seagate Barracudas from my computers. And lo and behold: the Barracudas work absolutely normally, as expected. No funny 100% usage all the time, only occasionally, when parity is written to a drive, I presume. And even when one drive is at 100% usage, the system still responds normally instead of being completely inaccessible at times.

The NAS drives themselves are doing absolutely fine as normal non-RAID computer drives, with normal read and write speeds.
I'm just highly curious as to what's going on with these NAS drives when they're actually in the RackStation.
I just can't believe I'm the only one with this particularly peculiar case, which is driving me bonkers.
The drives are from 2015 and the RackStation is from around 2011, so there should be no compatibility issues. I haven't checked for firmware updates for the NAS drives yet; that's maybe another thing I could try. I'm running out of ideas here...

Another thing I've tried is putting a borrowed 6TB IronWolf in my RackStation, and it happily accepts it, so I think 10 or 12TB drives should also work with it. I'm just worried that new, bigger (and expensive) IronWolf drives might behave in the same strange way, which would make this whole endeavour quite futile.

And thanks for letting me know about all the limitations I impose on myself with this old RackStation, but I just needed a stable set-it-and-forget-it file server where I could throw in all of my spare drives until I can afford a reasonable upgrade. New RackStations are crazy expensive, even the basic ones.

Happy NASing.
 

jeyare
I can't tell you what is wrong, because in post #3 I can see SMART info from just a single disk drive (and you have 9 HDDs). Check your homework again.

Some theory:

1. Every SATA III disk drive will run on a SATA II controller; the drive's performance will just be lower (depending on the HDDs used).
You are in safe waters.

2. There is no defined capacity limit for SATA II, only the limit of the file system used. In your case (ext4, I hope) we are far away from that limit.
So safe again.

Back to the new test scenarios from your last post:

3. Your issue is poorly described; please be more specific:
- you pulled all the disk drives out of the NAS
- are you still using a single SHR-2 config for all connected HDDs?
- correct your wording:
"Three of the 2TB drives are Hitachis (Deskstar 5K3000 Model: HDS5C3020ALA632)
the other four are Seagates (NAS HDD Model: ST2000VN000-1HJ164) and the two 4TB drives are also Seagates (BarraCuda 3.5 Model: ST4000DM004-2CV104)"
vs.
"where the four Seagate NAS drives...were in bays 1-4 and the other three drives in 5-7."
Please take a pencil and paper and write down the previous and the new bay configuration (HDD PN (model name) / bay no.).

4. Test scenarios (next homework):
- create different RAID groups, each with the same disk PN, e.g. RAID5 (SHR), which is possible for the 4x Seagate NAS HDD models (ST2000VN000). Check whether it shows similar behaviour.
- do the same for the 3x Deskstar HDDs only

5. Regarding the miracle with the Barracuda SMR drives:
- SMR has no problem during initialization. So no miracle.
- SMR also has no problem during the first write: it's a purely sequential write operation, without intermediate seeks, which is absolutely OK for SMR technology. So no miracle.
- You will get into trouble when you delete and rewrite blocks: that is random writing, i.e. many small seek-and-write operations that lean heavily on the drive's internal cache, and the drive needs more idle time to record them. That is a problem for SMR.
This is the reason why SMR is better in JBOD than in RAID.

Final stage:

You have a really old grandpa there, so you need to run more tests to find the best storage architecture for your workload.
BTW:
- I don't know your usage model, but take into account that a RAID rebuild of a 12TB SHR-2 will take a lot of time, and the SATA II controller adds to that.
- Is it necessary to have just a single group?
- SHR-2 is a really slow RAID for write operations, and there is only a SATA II controller... Good for archive storage (90% sequential write operations) or for backup (for similar reasons).
 
Original poster (NAS: RS2211+)
Yeah, sorry, I forgot about that. I actually did try building a RAID5 or SHR-1 array with only the four NAS drives after rearranging them, and that's where I first noticed the same strange behaviour on disks #1 and #2, alternating at 100% usage while the system was unresponsive, which had previously happened on only one drive. I also ran the other three drives (two Hitachis and one Mediamax) that initially came with the RackStation as a separate 3-disk RAID5, and they showed no such strange behaviour.
After double and triple checking everything, I really am sure that the funny stuff only starts happening when the Seagate NAS drives are involved.

Regarding favouring two separate RAID5/SHR arrays over a single RAID6/SHR2:
is an SHR2 RAID rebuild going to take as long as its initial consistency check? I'm growing a 3x2TB disk SHR to a 7x2TB SHR2 array right now, and it's probably going to be finished by tomorrow after running for four days straight. If so, I think I could live with that.
And I'm also (painfully) aware that a rebuild is going to put stress on all drives, so hopefully I won't have any failing disks soon. Fingers crossed. Maybe that's another reason to go for two separate arrays.

I also still have one Seagate 4TB SMR drive similar to the two in the RackStation and I'm considering running them in a 3-disk RAID5/SHR for backup purposes only. I mean, since performance is not the issue, write once read many should be okay for SMR drives.
If only the RackStation had 12 bays instead of only 10. I could try a couple more different arrangements.

Oh, and BTW yes, ext4 seems to be the only filesystem my RackStation supports.

Regarding my use case: I was getting tired of having five or six disks in each of my computers with all of my movies and TV series distributed across them, so I just want one big central network storage location that my whole family can access and save their own stuff to, and that's actually convenient and safe. Neither write performance nor random access is really an issue for me. I'll mostly use it to back up and store stuff that's very important to me, or that I want to access from everywhere as my very own cloud storage. The old RackStation will serve this purpose just fine. I might even get another one on the cheap, so I can play around with that one while my current one is running.

As always, thanks for the advice. Very much appreciated.
 
