

uk.d-i-y / OT Ubuntu drive/ directory/ NFS issue

Subject                                        Author
* OT Ubuntu drive/ directory/ NFS issue        leen...@yahoo.co.uk
+- Re: OT Ubuntu drive/ directory/ NFS issue   Ottavio Caruso
+- Re: OT Ubuntu drive/ directory/ NFS issue   Jim Jackson
+* Re: OT Ubuntu drive/ directory/ NFS issue   Paul
|+- Re: OT Ubuntu drive/ directory/ NFS issue  Bob Eager
|`- Re: OT Ubuntu drive/ directory/ NFS issue  Rod Speed
`* Re: OT Ubuntu drive/ directory/ NFS issue   The Natural Philosopher
 +- Re: OT Ubuntu drive/ directory/ NFS issue  Rod Speed
 `- Re: OT Ubuntu drive/ directory/ NFS issue  Paul

OT Ubuntu drive/ directory/ NFS issue

<9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>


https://www.novabbs.com/aus+uk/article-flat.php?id=89347&group=uk.d-i-y#89347

X-Received: by 2002:a37:aa95:0:b0:738:bca9:d4dd with SMTP id t143-20020a37aa95000000b00738bca9d4ddmr1681876qke.12.1677135804661;
Wed, 22 Feb 2023 23:03:24 -0800 (PST)
X-Received: by 2002:a05:6870:d3ca:b0:163:5449:2b22 with SMTP id
l10-20020a056870d3ca00b0016354492b22mr2140879oag.189.1677135804381; Wed, 22
Feb 2023 23:03:24 -0800 (PST)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: uk.d-i-y
Date: Wed, 22 Feb 2023 23:03:24 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=90.254.14.102; posting-account=S5azwAoAAACdr0U6eS6P_6NnzXINWTuF
NNTP-Posting-Host: 90.254.14.102
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
Subject: OT Ubuntu drive/ directory/ NFS issue
From: leenow...@yahoo.co.uk (leen...@yahoo.co.uk)
Injection-Date: Thu, 23 Feb 2023 07:03:24 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5097
 by: leen...@yahoo.co.uk - Thu, 23 Feb 2023 07:03 UTC

Hi All,

I have all my files stored on a central Ubuntu based server with 3 drives
1. the OS
2. all my data
3. local backup

It has been fine for a few years, but annoyingly, recently, when accessing the data through an NFS mount it times out when reading the directory. Logging on to the server remotely, if I try to "ls" that directory it takes, say, 30 minutes. Once done, a subsequent "ls" works immediately and the NFS works correctly again.

I initially thought it was because drive 2 is starting to fail, but looking at smartctl (I ran long and short tests) and then reading each block with "dd", it seems like there are 2 dodgy blocks, but besides that I think it is OK?

smartctl -a gives
=========================================
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 199   199   051    Pre-fail Always  -           92225
  3 Spin_Up_Time            0x0027 186   171   021    Pre-fail Always  -           5683
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           427
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 100   253   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 010   010   000    Old_age  Always  -           66060
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           424
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           226
193 Load_Cycle_Count        0x0032 192   192   000    Old_age  Always  -           25655
194 Temperature_Celsius     0x0022 119   102   000    Old_age  Always  -           31
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           1
198 Offline_Uncorrectable   0x0030 100   253   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 100   253   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure  90%        489              1326848392
# 2  Short offline       Completed: read failure  90%        489              1326848392
# 3  Conveyance offline  Completed without error  00%        0                -
# 4  Short offline       Completed without error  00%        0                -
============================================
sudo dd if=/dev/sdb1 of=/dev/null bs=64k conv=noerror
===================================================
dd: error reading '/dev/sdb1': Input/output error
43920419+1 records in
43920419+1 records out
2878368583680 bytes (2.9 TB, 2.6 TiB) copied, 24480.1 s, 118 MB/s
45785391+1 records in
45785391+1 records out
3000591388672 bytes (3.0 TB, 2.7 TiB) copied, 26169.8 s, 115 MB/s
================
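
(To pinpoint the failing spot without re-reading the whole drive, the LBA
from the SMART self-test log can be probed directly; a rough sketch, assuming
the whole-disk device is /dev/sdb and 512-byte logical sectors:)

# read a couple dozen sectors around LBA 1326848392 from the self-test log
sudo dd if=/dev/sdb of=/dev/null bs=512 skip=1326848380 count=24
# the kernel log should show the corresponding ATA/I-O errors
sudo dmesg | tail -20
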
Running smartctl on disk 1 (the OS) seems clear, although having run the "short" test overnight it is stuck at 90%.

So I am thinking the drives are not the cause of this issue. Anyone have any ideas?

Thanks

Lee.

Re: OT Ubuntu drive/ directory/ NFS issue

<tt7fqm$1rjnj$1@dont-email.me>


https://www.novabbs.com/aus+uk/article-flat.php?id=89389&group=uk.d-i-y#89389

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ottavio2...@yahoo.com (Ottavio Caruso)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Thu, 23 Feb 2023 10:40:22 +0000
Organization: A noiseless patient Spider
Lines: 15
Message-ID: <tt7fqm$1rjnj$1@dont-email.me>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 23 Feb 2023 10:40:22 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="46d1d53a9170d7abf16c6117bc146133";
logging-data="1953523"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18WqT/UBcfQf9UvtEMEoir6tYQLFjYhk3s="
User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:102.0) Gecko/20100101
Firefox/102.0
Cancel-Lock: sha1:a5Y7uWl5n+2erI8zcbgGd66zPrE=
X-No-Archive: yes
In-Reply-To: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
Content-Language: en-GB
 by: Ottavio Caruso - Thu, 23 Feb 2023 10:40 UTC

On 23/02/2023 at 07:03, leen...@yahoo.co.uk wrote:
> Hi All,
>
> I have all my files stored on a central Ubuntu based server with 3 drives
> 1. the OS
> 2. all my data
> 3. local backup
>

Have you tried using sshfs instead?
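
(A minimal sketch of that, assuming sshfs is installed on the client; the
host name and paths below are placeholders:)

sudo apt install sshfs
mkdir -p ~/mnt/data
sshfs user@myserver:/srv/data ~/mnt/data
# and to detach again:
fusermount -u ~/mnt/data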

--
Ottavio Caruso

Re: OT Ubuntu drive/ directory/ NFS issue

<slrntveij2.3em.jj@iridium.wf32df>


https://www.novabbs.com/aus+uk/article-flat.php?id=89396&group=uk.d-i-y#89396

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: jj...@franjam.org.uk (Jim Jackson)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Thu, 23 Feb 2023 11:13:38 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <slrntveij2.3em.jj@iridium.wf32df>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
Injection-Date: Thu, 23 Feb 2023 11:13:38 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="f5f01e7cf516fd52dbfc155e19760fe2";
logging-data="1958398"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+A4x7F8kQzPpoCmV8NR7U/8IlURkXJq/g="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:LzED4uorUTb3f3nG+sFwwqUD0i8=
 by: Jim Jackson - Thu, 23 Feb 2023 11:13 UTC

On 2023-02-23, leen...@yahoo.co.uk <leenowell@yahoo.co.uk> wrote:
> Hi All,
>
> I have all my files stored on a central Ubuntu based server with 3 drives
> 1. the OS
> 2. all my data
> 3. local backup
>
> It has been fine for a few years but annoyingly recently when
> accessing the data through an NFS mount it times out when reading the
> directory. Remotely logging on to the server if I try to "ls" that
> directory it takes say 30 mins to do it. Once done, the subsequent
> "ls" works immediately and also the NFS works correctly again.
>
> I initially thought it was because drive 2 is starting to fail but
> looking at smartctrl (run long and short tests) and then reading each
> block with "dd" it seems like there are 2 dodgy blocks but besides
> that I think it is ok?

Have you run an fsck on the partitions? You will need to unmount each
partition and then fsck it. You'd need to stop NFS exporting any
relevant partitions, or the umount will report the mount point busy.
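
(A sketch of that sequence, assuming the data partition is /dev/sdb1 mounted
at /srv/data; the mount point is a placeholder, and on Ubuntu the NFS service
unit is nfs-kernel-server:)

sudo systemctl stop nfs-kernel-server   # or: sudo exportfs -ua
sudo umount /srv/data
sudo fsck -f /dev/sdb1
sudo mount /dev/sdb1 /srv/data
sudo systemctl start nfs-kernel-server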

Re: OT Ubuntu drive/ directory/ NFS issue

<tt7mkr$1sai9$1@dont-email.me>


https://www.novabbs.com/aus+uk/article-flat.php?id=89409&group=uk.d-i-y#89409

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Thu, 23 Feb 2023 07:36:42 -0500
Organization: A noiseless patient Spider
Lines: 118
Message-ID: <tt7mkr$1sai9$1@dont-email.me>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 23 Feb 2023 12:36:43 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="bc69fa78907c73e6f0ebc4bc09f6410c";
logging-data="1976905"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX186q1gZSIZfJCaA8rh+vqtacw+ASbTaghc="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:HP+SYFTfpCLbqYPm6WEnZtuFk/I=
In-Reply-To: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
Content-Language: en-US
 by: Paul - Thu, 23 Feb 2023 12:36 UTC

On 2/23/2023 2:03 AM, leen...@yahoo.co.uk wrote:
> Hi All,
>
> I have all my files stored on a central Ubuntu based server with 3 drives
> 1. the OS
> 2. all my data
> 3. local backup
>
> It has been fine for a few years but annoyingly recently when accessing the data through an NFS mount it times out when reading the directory. Remotely logging on to the server if I try to "ls" that directory it takes say 30 mins to do it. Once done, the subsequent "ls" works immediately and also the NFS works correctly again.
>
> I initially thought it was because drive 2 is starting to fail but looking at smartctrl (run long and short tests) and then reading each block with "dd" it seems like there are 2 dodgy blocks but besides that I think it is ok?
>
> smartctl -a gives
> ==========================================
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x002f 199 199 051 Pre-fail Always - 92225
> 3 Spin_Up_Time 0x0027 186 171 021 Pre-fail Always - 5683
> 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 427
> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
> 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
> 9 Power_On_Hours 0x0032 010 010 000 Old_age Always - 66060
> 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
> 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 424
> 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 226
> 193 Load_Cycle_Count 0x0032 192 192 000 Old_age Always - 25655
> 194 Temperature_Celsius 0x0022 119 102 000 Old_age Always - 31
> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
> 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
> 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
> 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed: read failure 90% 489 1326848392
> # 2 Short offline Completed: read failure 90% 489 1326848392
> # 3 Conveyance offline Completed without error 00% 0 -
> # 4 Short offline Completed without error 00% 0 -
> =============================================
>
> sudo dd if=/dev/sdb1 of=/dev/null bs=64k conv=noerror
> ====================================================
> dd: error reading '/dev/sdb1': Input/output error
> 43920419+1 records in
> 43920419+1 records out
> 2878368583680 bytes (2.9 TB, 2.6 TiB) copied, 24480.1 s, 118 MB/s
> 45785391+1 records in
> 45785391+1 records out
> 3000591388672 bytes (3.0 TB, 2.7 TiB) copied, 26169.8 s, 115 MB/s
> =================
>
> running smartctl on disk 1(the OS) seems clear although having run the "short" test overnight it is stuck at 90%
>
> So I am thinking the drives are not the cause of this issue. Anyone have any ideas?
>
> Thanks
>
> Lee.
>

Someone is getting a new hard drive for Valentine's Day.
And that was nine days ago.

Power_On_Hours 66060        # I do have one this old. One drive of thirty-two drives.
Load_Cycle_Count 25655      # head park every two or three hours;
                            # it is not an aggressive parking drive

Current_Pending_Sector 1    # No spare is available nearby, by the looks of it.
                            # Normally, Current_Pending never accumulates a count;
                            # that means we can't make the error go away.
                            # But you can certainly try. It could, for example,
                            # be a high-fly error.

If I was a parish priest, I would tell you to "do a write pass followed
by a read pass". Which would be 6 hours to write the entire drive and
6 hours to read the entire drive. Then, run smartctl again and see if
the Current Pending is gone. It's like a Hail Mary for having sinned.

This may cause the things that were "sticking" or "slow" before to perk
up a tiny bit, as you're no longer waiting 15 seconds for a timeout. If it's
a high-fly error, a rewrite can "fix" the sector.
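
(A sketch of that write-then-read pass; the write pass is DESTRUCTIVE and
wipes the whole drive, so only do it once the data is safely copied off.
/dev/sdb is assumed from the dd run above:)

sudo dd if=/dev/zero of=/dev/sdb bs=1M status=progress   # write pass
sudo dd if=/dev/sdb of=/dev/null bs=1M status=progress   # read pass
sudo smartctl -A /dev/sdb | grep -i pending              # has Current_Pending cleared?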

But as a shoot-from-the-hip comment, at 66000 hours on a 3TB drive,
it "has served you well". The only way I could get that hour
count was on a 500GB drive. Some of the less-dense drives were
exceptionally good on hours. The bigger ones tend to be more shitty.
I've had drives start to show their true self at 5000 hours.

Just don't buy the cheapest SKU. There are exceptions to that rule,
but then again, they are not the absolute cheapest. My WD Blue, now
that was crap. The recent WD Black 1TB has been low-cost for some
reason, but usually costs a tiny bit more than a WD Blue.

Seagate can vary from generation to generation. You need some customer
reviews that haven't been fudged, to capture the essence of the product.

You want a Perpendicular Magnetic Recording (PMR), not
a Shingled Magnetic Recording (SMR) drive. SMR drives are not
good as boot drives. They may be used as data drives... if
you are "desperate for trouble". The manufacturers do not
want to identify the SMR ones, and they have had to apologize
on at least one occasion, for slipping SMR into applications
where they do not belong (as a near-line NAS drive).

Helium drives start at either 6TB or 8TB capacity. There still isn't a
good sense of how long the helium stays inside the drive.
Apparently there is a sensor inside the drive, and some SMART
parameter may cover that. I have some 6TB drives here, and those
are air breather drives (the normal kind), rather than (sealed)
Helium drives. Helium drives have two covers and no breather hole.
(A breather hole is marked as "do not cover this hole", although
some models do not have a warning on the label any more.)

Paul

Re: OT Ubuntu drive/ directory/ NFS issue

<k5pabjFmvtoU2@mid.individual.net>


https://www.novabbs.com/aus+uk/article-flat.php?id=89418&group=uk.d-i-y#89418

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: news0...@eager.cx (Bob Eager)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: 23 Feb 2023 14:08:51 GMT
Lines: 44
Message-ID: <k5pabjFmvtoU2@mid.individual.net>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
<tt7mkr$1sai9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net M8y6EuliCFXpMD7zbzhCSQvPk4IqkzwJTqtYH8Ug+7mx8bWJnt
Cancel-Lock: sha1:u5GIGpqPAtydQJfqxOHG/47Hbmg=
User-Agent: Pan/0.145 (Duplicitous mercenary valetism; d7e168a
git.gnome.org/pan2)
 by: Bob Eager - Thu, 23 Feb 2023 14:08 UTC

On Thu, 23 Feb 2023 07:36:42 -0500, Paul wrote:

> But as a shoot-from-the-hip comment, 66000 hours on a 3TB drive,
> it "has served you well". The only way I could get that hour count, was
> on a 500GB drive. Some of the less-dense drives, were exceptionally good
> on hours. The bigger ones tend to be more shitty.
> I've had drives start to show their true-self, at 5000 hours.

I normally pension drives off at about 50,000 hours, but noticed recently
that I had some that had reached longer values - in one case nearly
90,000. I think they were 1TB or 2TB.

I have replaced them all. Easy enough as they were paired in mirrors -
take one out of the mirror, change it, insert it back, wait for a sync.
Then do it with the other one.
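
(In Linux mdadm terms, that swap is roughly the following; /dev/md0 and
/dev/sdb1 are placeholder names:)

sudo mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# physically swap the drive, partition it, then:
sudo mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat   # wait for the resync to finish before doing the other half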

> Just don't buy the cheapest SKU. There are exceptions to that rule,
> but then again, they are not the absolute cheapest. My WD Blue, now that
> was crap. The recent WD Black 1TB have been low-cost for some reason,
> but are usually cost a tiny bit more than a WD Blue.

I tend to use WD Red - but always Plus or Pro, as they are not SMR.

> You want a Perpendicular Magnetic Recording (PMR), not a Shingled
> Magnetic Recording (SMR) drive.

WD tend to call the non-shingled ones 'CMR' - Conventional Magnetic
Recording. Probably some marketing thing to make them sound old fashioned.

> SMR drives are not good as boot drives.

They are a disaster as part of a RAID array.

> They may be used as data drives... if you are "desperate for trouble".
> The manufacturers do not want to identify the SMR ones

WD don't make it crystal clear, but they do tell you if you look
carefully.

--
My posts are my copyright and if @diy_forums or Home Owners' Hub
wish to copy them they can pay me £1 a message.
Use the BIG mirror service in the UK: http://www.mirrorservice.org
*lightning surge protection* - a w_tom conductor

Re: OT Ubuntu drive/ directory/ NFS issue

<op.10t5dlqhbyq249@pvr2.lan>


https://www.novabbs.com/aus+uk/article-flat.php?id=89463&group=uk.d-i-y#89463

Path: i2pn2.org!rocksolid2!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rod.spee...@gmail.com (Rod Speed)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Fri, 24 Feb 2023 07:18:47 +1100
Lines: 159
Message-ID: <op.10t5dlqhbyq249@pvr2.lan>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
<tt7mkr$1sai9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 9Ebz9ED+hMvEjpzK3UWtXw/FUDv8882pk/p3/TJiMsrHzvj2s=
Cancel-Lock: sha1:8BF9QEgLvkRHfAWyyEuXRP15E/E=
User-Agent: Opera Mail/1.0 (Win32)
 by: Rod Speed - Thu, 23 Feb 2023 20:18 UTC

On Thu, 23 Feb 2023 23:36:42 +1100, Paul <nospam@needed.invalid> wrote:

> On 2/23/2023 2:03 AM, leen...@yahoo.co.uk wrote:
>> Hi All,
>> I have all my files stored on a central Ubuntu based server with 3
>> drives
>> 1. the OS
>> 2. all my data
>> 3. local backup
>> It has been fine for a few years but annoyingly recently when
>> accessing the data through an NFS mount it times out when reading the
>> directory. Remotely logging on to the server if I try to "ls" that
>> directory it takes say 30 mins to do it. Once done, the subsequent
>> "ls" works immediately and also the NFS works correctly again.
>> I initially thought it was because drive 2 is starting to fail but
>> looking at smartctrl (run long and short tests) and then reading each
>> block with "dd" it seems like there are 2 dodgy blocks but besides that
>> I think it is ok?
>> smartctl -a gives
>> ==========================================
>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
>> UPDATED WHEN_FAILED RAW_VALUE
>> 1 Raw_Read_Error_Rate 0x002f 199 199 051 Pre-fail
>> Always - 92225
>> 3 Spin_Up_Time 0x0027 186 171 021 Pre-fail
>> Always - 5683
>> 4 Start_Stop_Count 0x0032 100 100 000 Old_age
>> Always - 427
>> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail
>> Always - 0
>> 7 Seek_Error_Rate 0x002e 100 253 000 Old_age
>> Always - 0
>> 9 Power_On_Hours 0x0032 010 010 000 Old_age
>> Always - 66060
>> 10 Spin_Retry_Count 0x0032 100 100 000 Old_age
>> Always - 0
>> 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age
>> Always - 0
>> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
>> Always - 424
>> 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age
>> Always - 226
>> 193 Load_Cycle_Count 0x0032 192 192 000 Old_age
>> Always - 25655
>> 194 Temperature_Celsius 0x0022 119 102 000 Old_age
>> Always - 31
>> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age
>> Always - 0
>> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age
>> Always - 1
>> 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age
>> Offline - 0
>> 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age
>> Always - 0
>> 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age
>> Offline - 0
>> SMART Error Log Version: 1
>> No Errors Logged
>> SMART Self-test log structure revision number 1
>> Num Test_Description Status Remaining
>> LifeTime(hours) LBA_of_first_error
>> # 1 Extended offline Completed: read failure 90%
>> 489 1326848392
>> # 2 Short offline Completed: read failure 90%
>> 489 1326848392
>> # 3 Conveyance offline Completed without error 00%
>> 0 -
>> # 4 Short offline Completed without error 00%
>> 0 -
>> =============================================
>> sudo dd if=/dev/sdb1 of=/dev/null bs=64k conv=noerror
>> ====================================================
>> dd: error reading '/dev/sdb1': Input/output error
>> 43920419+1 records in
>> 43920419+1 records out
>> 2878368583680 bytes (2.9 TB, 2.6 TiB) copied, 24480.1 s, 118 MB/s
>> 45785391+1 records in
>> 45785391+1 records out
>> 3000591388672 bytes (3.0 TB, 2.7 TiB) copied, 26169.8 s, 115 MB/s
>> =================
>> running smartctl on disk 1(the OS) seems clear although having run the
>> "short" test overnight it is stuck at 90%
>> So I am thinking the drives are not the cause of this issue. Anyone
>> have any ideas?
>> Thanks
>> Lee.
>>
>
> Someone is getting a new hard drive for Valentines Day.
> And that was nine days ago.

Remains to be seen if that fixes the problem.

> Power_On_Hours 66060

One of mine is 101724

> # I do have one this old. One drive of thirty two drives.
> Load_Cycle_Count 25655 # head park every two or three hours
> # it is not an aggressive parking-drive
>
> Current_Pending_Sector 1 # No spare is available nearby, by the
> looks of it

Wrong. It means that that sector has not been written to yet.

> # Normally, Current_Pending never
> accumulates a count
> # That means we can't make the error go
> away.
> # But you can certainly try. It could,
> for example
> # be a high-fly error.
>
> If I was a parish priest, I would tell you to "do a write pass followed
> by a read pass". Which would be 6 hours to write the entire drive and
> 6 hours to read the entire drive. Then, run smartctl again and see if
> the Current Pending is gone. It's like a Hail Mary for having sinned.

Nope, it's what the drive is waiting for before it reallocates the sector.

> This may cause the things that were "sticking" or "slow" before, to perk
> up a tiny bit. As you're not waiting 15 seconds for a timeout. If it's
> a high fly error, a rewrite can "fix" the sector.
>
> But as a shoot-from-the-hip comment, 66000 hours on a 3TB drive,
> it "has served you well". The only way I could get that hour
> count, was on a 500GB drive.

Mine is a 2TB drive.

> Some of the less-dense drives, were
> exceptionally good on hours. The bigger ones tend to be more shitty.
> I've had drives start to show their true-self, at 5000 hours.

> Just don't buy the cheapest SKU. There are exceptions to that rule,
> but then again, they are not the absolute cheapest. My WD Blue, now
> that was crap. The recent WD Black 1TB have been low-cost for some
> reason, but are usually cost a tiny bit more than a WD Blue.

> Seagate can vary from generation to generation. You need some customer
> reviews that haven't been fudged, to capture the essence of the product.

> You want a Perpendicular Magnetic Recording (PMR), not
> a Shingled Magnetic Recording (SMR) drive. SMR drives are not
> good as boot drives. They may be used as data drives... if
> you are "desperate for trouble". The manufacturers do not
> want to identify the SMR ones, and they have had to apologize
> on at least one occasion, for slipping SMR into applications
> where they do not belong (as a near-line NAS drive).

> Helium drives start at either 6TB or 8TB capacity. There still isn't a
> good idea as to how long the Helium stays inside the drive.
> Apparently there is a sensor inside the drive, and some SMART
> parameter may cover that. I have some 6TB drives here, and those
> are air breather drives (the normal kind), rather than (sealed)
> Helium drives. Helium drives have two covers and no breather hole.
> (A breather hole is marked as "do not cover this hole", although
> some models do not have a warning on the label any more.)

Re: OT Ubuntu drive/ directory/ NFS issue

<ttabgq$279gm$3@dont-email.me>


https://www.novabbs.com/aus+uk/article-flat.php?id=89565&group=uk.d-i-y#89565

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Fri, 24 Feb 2023 12:45:14 +0000
Organization: A little, after lunch
Lines: 24
Message-ID: <ttabgq$279gm$3@dont-email.me>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 24 Feb 2023 12:45:14 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="9a37c6725f4cfa5f0c4c846de80a481a";
logging-data="2336278"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QiW9mRguBkZksdgUoHo5opfwGYy1bHI8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.7.1
Cancel-Lock: sha1:5XjOrfragtcWT+a6IpZfVRUhE/U=
Content-Language: en-GB
In-Reply-To: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
 by: The Natural Philosopher - Fri, 24 Feb 2023 12:45 UTC

On 23/02/2023 07:03, leen...@yahoo.co.uk wrote:
> Hi All,
>
> I have all my files stored on a central Ubuntu based server with 3 drives
> 1. the OS
> 2. all my data
> 3. local backup
>
> It has been fine for a few years but annoyingly recently when accessing the data through an NFS mount it times out when reading the directory. Remotely logging on to the server if I try to "ls" that directory it takes say 30 mins to do it. Once done, the subsequent "ls" works immediately and also the NFS works correctly again.
>
> I initially thought it was because drive 2 is starting to fail but looking at smartctrl (run long and short tests) and then reading each block with "dd" it seems like there are 2 dodgy blocks but besides that I think it is ok?

No. Your raw error rate should be zero.

Replace the disk.

--
“The ultimate result of shielding men from the effects of folly is to
fill the world with fools.”

Herbert Spencer

Re: OT Ubuntu drive/ directory/ NFS issue

<op.10vtktbgbyq249@pvr2.lan>


https://www.novabbs.com/aus+uk/article-flat.php?id=89635&group=uk.d-i-y#89635

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: rod.spee...@gmail.com (Rod Speed)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Sat, 25 Feb 2023 04:59:07 +1100
Lines: 27
Message-ID: <op.10vtktbgbyq249@pvr2.lan>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
<ttabgq$279gm$3@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
X-Trace: individual.net biJkmEjwolGKVCmifIU+Awat5VjGSU4lhpQFg9vTDqCGI0MHE=
Cancel-Lock: sha1:ptHEPTGf5tNAbpapS13fySNRIR8=
User-Agent: Opera Mail/1.0 (Win32)
 by: Rod Speed - Fri, 24 Feb 2023 17:59 UTC

On Fri, 24 Feb 2023 23:45:14 +1100, The Natural Philosopher
<tnp@invalid.invalid> wrote:

> On 23/02/2023 07:03, leen...@yahoo.co.uk wrote:
>> Hi All,
>> I have all my files stored on a central Ubuntu based server with 3
>> drives
>> 1. the OS
>> 2. all my data
>> 3. local backup
>> It has been fine for a few years but annoyingly recently when
>> accessing the data through an NFS mount it times out when reading the
>> directory. Remotely logging on to the server if I try to "ls" that
>> directory it takes say 30 mins to do it. Once done, the subsequent
>> "ls" works immediately and also the NFS works correctly again.
>> I initially thought it was because drive 2 is starting to fail but
>> looking at smartctrl (run long and short tests) and then reading each
>> block with "dd" it seems like there are 2 dodgy blocks but besides that
>> I think it is ok?
>
> No. Your raw error rate should be zero

Nope.

> Replace the disk.

Nope.

Re: OT Ubuntu drive/ directory/ NFS issue

<ttg9jf$3021s$1@dont-email.me>


https://www.novabbs.com/aus+uk/article-flat.php?id=89897&group=uk.d-i-y#89897

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: nos...@needed.invalid (Paul)
Newsgroups: uk.d-i-y
Subject: Re: OT Ubuntu drive/ directory/ NFS issue
Date: Sun, 26 Feb 2023 13:49:18 -0500
Organization: A noiseless patient Spider
Lines: 114
Message-ID: <ttg9jf$3021s$1@dont-email.me>
References: <9745376c-a045-45b4-892b-a1c965245324n@googlegroups.com>
<ttabgq$279gm$3@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 26 Feb 2023 18:49:19 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="08de53d0478900a56d421115fbd9cb17";
logging-data="3147836"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190tUtAqOHfHV6oOb/w0EXcrQbfiLR4nmM="
User-Agent: Ratcatcher/2.0.0.25 (Windows/20130802)
Cancel-Lock: sha1:PRAjqOezCMTRlkJVmmfaW2GPjBQ=
In-Reply-To: <ttabgq$279gm$3@dont-email.me>
Content-Language: en-US
 by: Paul - Sun, 26 Feb 2023 18:49 UTC

On 2/24/2023 7:45 AM, The Natural Philosopher wrote:
> On 23/02/2023 07:03, leen...@yahoo.co.uk wrote:
>> Hi All,
>>
>> I have all my files stored on a central Ubuntu based server with 3 drives
>> 1. the OS
>> 2. all my data
>> 3. local backup
>>
>> It has been fine for a few years but annoyingly recently when accessing the data through an NFS mount it times out when reading the directory.  Remotely logging on to the server if I try to "ls" that directory it takes say 30 mins to do it.  Once done, the subsequent "ls" works immediately and also the NFS works correctly again.
>>
>> I initially thought it was because drive 2 is starting to fail but looking at smartctrl (run long and short tests) and then reading each block with "dd" it seems like there are 2 dodgy blocks but besides that I think it is ok?
>
> No. Your raw error rate should be zero
>
> Replace the disk.
>

ID# ATTRIBUTE_NAME      FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 199   199   051    Pre-fail Always  -           92225
                               ^^^         ^^^

The "Value" is higher than the "Threshold", so the drive passes.

Let's look at my ST4000DM000-2AE1 drive, a drive I bought to see if
Seagate had learned their lesson yet. The "Value" is still higher than
the "Threshold". The drive is not in any trouble (the drive has 550 hours
on it). You can see the "Worst" it has ever been, is a bit closer to the failure threshold
(so we know which direction the statistic goes in).

ID# ATTRIBUTE_NAME      FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 076   064   006    Pre-fail Always  -           35656066

The field that has been removed and is not visible is:

Hardware_ECC_Corrected 35656066

Every bloody error was fixed! Yet they don't tell you that.
That is why the Raw Value in question is "not a death sentence".
It is a quality metric. If it is 10^7, that's still OK. If it
is 2*10^8, that rates as "failed".

There is yet another missing field.

Recorded_Uncorrected_Errors 1078

What does that mean?

Should not (Raw - Corrected) be equal to Uncorrected?

Apparently not!

Even the fields with names are NOT being used for their
intended purpose. Is the purpose defined in the standard?
I've never seen ANY standards text leak, so I don't even know
whether good-quality definitions are available or not.

There is a field called "Current Pending", which was supposed to be
a queue of questionable sectors that needed "processing". As far as
I know, the processing might happen around a "write event" to the
sector. Either the write works, or it doesn't. You could do a read
verify. You could use automatic sparing and replace the sector with
another if attempts to make the existing sector work failed.

Now, I watched a whole bunch of HDTune SMART tables (I own 32 drives).
I watched as a drive declined in health. The Reallocated Sector Count field
was accumulating counts: 200 one day for raw data, 300 the next day.
Yet, while that was going on, Current Pending stayed at 0 like
an obedient puppy! It was quite obvious that the *actual*
queue of dodgy sectors was hidden from us.

Then one day, finally, Current Pending went to a raw value of 1.
What was that? Well, since a hard CRC error appeared after that,
it occurred to me that spares in the local area were exhausted,
and that seemed to correlate with the Current Pending *finally*
going off the peg. THIS is why I am recommending that the OP
replace the drive: it is based on correlating Current Pending
activity with all spares being exhausted in that area.

As it is, Reallocated Sector Count is thresholded. The first
hundred thousand corrected sectors are ignored. At some point,
they start displaying the reallocations, and there is a finite
number of those remaining to be counted. On one drive, if I had
accumulated 5500 reallocations past the thresholded value,
the drive would likely be declared "failed" at that point.

Summary: While it is fun to poke your finger at "Raw" field values,
their interpretation is not really possible, with fields missing
and fields being used for the wrong purpose, so all we can
continue to do is look at "Value" and "Threshold" as indicators.
For the smartctl run that produced the OP's table, we don't know
the drive model number, and the summary field above that table
undoubtedly says "Good". Which in many cases is bullshit, because
the drive is not "Good"; done properly, the analysis would rate
it "Fair".

I use a benchmark transfer curve to spot trouble. On Windows,
there is HDTune for this. On Linux, gnome-disks does have
a benchmark (make SURE to turn off the write tick box, as
you don't want the bench overwriting the drive), although its
graphical resolution is too crude for determining drive health.
If you spot a 50GB-wide swath of disk surface running the bench
at only 10MB/sec, that is an indicator to replace the drive as
well, even though smartctl has rated the drive "Good". SMART
works best when defects are uniformly spread over the surface.
If the defects concentrate in one spot on the disk, then the
metrics in the table will not work properly to declare a "Fail".
Thus, if you, the user, use a good-quality read benchmark curve,
you can spot trouble before smartctl does.
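
(For a crude transfer curve on Linux without a GUI, read speed can be sampled
at intervals across the disk; a sketch assuming a 3TB /dev/sdb:)

# read 100 MiB at every 100 GiB offset and print the rate dd reports
for off in $(seq 0 100 2700); do
  echo -n "${off} GiB: "
  sudo dd if=/dev/sdb of=/dev/null bs=1M count=100 skip=$((off*1024)) iflag=direct 2>&1 | tail -1
done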

Paul
