Subject: best guess for mount-verification problem
From: Phillip Helbig (undress to reply)
Newsgroups: comp.os.vms, comp.sys.dec
Organization: Multivax C&R
Date: Mon, 28 Jun 2021 10:27:29 +0000 (UTC)
Message-ID: <sbc86h$656$1@gioia.aioe.org>

I have a three-node cluster (when no satellite or test system has joined
it) and physical disks (blue SBB in BA356) on each node (no dual-ported
disks; each disk has a direct connection to only one node).  All disks
are members of HBVS (host-based volume shadowing) shadow sets; system
disks have both members on one node, while the others have their two
(in one case three) members on different nodes.  I've been running
such a setup (though with different machines, even different
architectures, different disks, different expansion boxes) for decades.

When something fails, I just replace it with something of similar build.
(The main reason for moving to SBB disks was to be able to replace a
disk (the most common failure) without having to dismount the members it
hosts, shut down the system, remove it from the shelf, open it, replace
the disk, close it, put it back on the shelf, boot it, remount the
members it hosts.)

For a while now I've noticed disks going in and out of mount
verification.  It is clear which node is involved.  So, my plan is to
replace hardware (and maybe try to find the problem when the hardware is
out of the cluster) and hope that it goes away.  Since all disks with
members on this system, but no others, are involved, it is clear that
the problem is only on one node.  It is unlikely to be a problem with
the physical SCSI disks.

Theoretically it could be the SCSI cable, but my guess is that it is
either the expansion box or the SCSI card.  (I have had one expansion
box fail, but it failed completely.)  Which is more likely?

Has anyone seen anything like this before?  The mount-verification
problem occurs regularly every few minutes, but always completes
automatically after a few seconds or half a minute or so (depending on
the shadow set).

It would be easiest to replace the BA356: dismount the members, power
down the box, remove the members, stick them in another box, swap the
cables, power up the other box, remount the members (and be very
thankful for MINICOPY).  Of course, if it is exceedingly unlikely that
the box is the problem, as opposed to the SCSI card (or something else
which I haven't thought of), then that would be a waste of time.

Thoughts?



Subject: Re: best guess for mount-verification problem
From: Phillip Helbig (undress to reply)
Newsgroups: comp.os.vms, comp.sys.dec
Organization: Multivax C&R
Date: Mon, 28 Jun 2021 10:40:30 +0000 (UTC)
Message-ID: <sbc8uu$hn9$1@gioia.aioe.org>
References: <sbc86h$656$1@gioia.aioe.org>

In article <sbc86h$656$1@gioia.aioe.org>,
helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
writes:

> I have a three-node cluster (when no satellite or test system has joined
> it) and physical disks (blue SBB in BA356) on each node (no dual-ported
> disks; each disk has a direct connection to only one node).
>
> When something fails, I just replace it with something of similar build.
> (The main reason for moving to SBB disks was to be able to replace a
> disk (the most common failure) without having to dismount the members it
> hosts, shut down the system, remove it from the shelf, open it, replace
> the disk, close it, put it back on the shelf, boot it, remount the
> members it hosts.)
>
> For a while now I've noticed disks going in and out of mount
> verification.  It is clear which node is involved.  So, my plan is to
> replace hardware (and maybe try to find the problem when the hardware is
> out of the cluster) and hope that it goes away.
>
> Theoretically it could be the SCSI cable, but my guess is that it is
> either the expansion box or the SCSI card.  (I have had one expansion
> box fail, but it failed completely.)  Which is more likely?

I try to keep enough hardware on hand to last me until I die.  However,
since a BA356 failed a couple of years ago, and another one might be
failing now, the fact that I have only two spares is a bit unsettling. 
Does anyone in Europe have any BLUE BA356 boxes they would like to give
to a good home?  (I might be willing to offer some older hardware in
exchange, in particular VAX-related stuff.)  Shipping is probably too
expensive, but I'll probably be passing through sometime in the next
several months.

The boxes have to be Top-Gun Blue, since I also have the blue disks. 
(Since disks almost always fail at some point, such disks would also be
interesting for me; one can't have too many spare disks.)



Subject: Re: best guess for mount-verification problem
From: Phillip Helbig (undress to reply)
Newsgroups: comp.os.vms, comp.sys.dec
Organization: Multivax C&R
Date: Mon, 28 Jun 2021 10:57:39 +0000 (UTC)
Message-ID: <sbc9v3$v43$1@gioia.aioe.org>
References: <sbc86h$656$1@gioia.aioe.org> <sbc8uu$hn9$1@gioia.aioe.org>

In article <sbc8uu$hn9$1@gioia.aioe.org>,
helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
writes:

> In article <sbc86h$656$1@gioia.aioe.org>,
> helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
> writes:
>
> > I have a three-node cluster (when no satellite or test system has joined
> > it) and physical disks (blue SBB in BA356) on each node (no dual-ported
> > disks; each disk has a direct connection to only one node).
> >
> > When something fails, I just replace it with something of similar build.
> > (The main reason for moving to SBB disks was to be able to replace a
> > disk (the most common failure) without having to dismount the members it
> > hosts, shut down the system, remove it from the shelf, open it, replace
> > the disk, close it, put it back on the shelf, boot it, remount the
> > members it hosts.)
> >
> > For a while now I've noticed disks going in and out of mount
> > verification.  It is clear which node is involved.  So, my plan is to
> > replace hardware (and maybe try to find the problem when the hardware is
> > out of the cluster) and hope that it goes away.
> >
> > Theoretically it could be the SCSI cable, but my guess is that it is
> > either the expansion box or the SCSI card.  (I have had one expansion
> > box fail, but it failed completely.)  Which is more likely?

OK, spent some time staring at hardware in the cellar.  :-|  It seems
that before the mount verification sets in, the two LEDs to the left of
the plug in the power supply go out, then come back on, then all the
disks light up briefly.  So probably a problem with the box or the power
supply.

I can try replacing the power supply first, if that doesn't help then
the SCSI interface at the top, then if that doesn't help the entire box.

Any other ideas?



Subject: Re: best guess for mount-verification problem
From: Hans Bachner
Newsgroups: comp.sys.dec
Date: Mon, 28 Jun 2021 16:04:59 +0200
Message-ID: <iju38bFspqiU1@mid.individual.net>
References: <sbc86h$656$1@gioia.aioe.org> <sbc8uu$hn9$1@gioia.aioe.org> <sbc9v3$v43$1@gioia.aioe.org>

Phillip,

Phillip Helbig (undress to reply) wrote on 28.06.2021 at 12:57:
> In article <sbc8uu$hn9$1@gioia.aioe.org>,
> helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
> writes:
>
> > In article <sbc86h$656$1@gioia.aioe.org>,
> > helbig@asclothestro.multivax.de (Phillip Helbig (undress to reply))
> > writes:
> >
> > > I have a three-node cluster (when no satellite or test system has joined
> > > it) and physical disks (blue SBB in BA356) on each node (no dual-ported
> > > disks; each disk has a direct connection to only one node).
> > >
> > > When something fails, I just replace it with something of similar build.
> > > (The main reason for moving to SBB disks was to be able to replace a
> > > disk (the most common failure) without having to dismount the members it
> > > hosts, shut down the system, remove it from the shelf, open it, replace
> > > the disk, close it, put it back on the shelf, boot it, remount the
> > > members it hosts.)
> > >
> > > For a while now I've noticed disks going in and out of mount
> > > verification.  It is clear which node is involved.  So, my plan is to
> > > replace hardware (and maybe try to find the problem when the hardware is
> > > out of the cluster) and hope that it goes away.
> > >
> > > Theoretically it could be the SCSI cable, but my guess is that it is
> > > either the expansion box or the SCSI card.  (I have had one expansion
> > > box fail, but it failed completely.)  Which is more likely?
>
> OK, spent some time staring at hardware in the cellar.  :-|  It seems
> that before the mount verification sets in, the two LEDs to the left of
> the plug in the power supply go out, then come back on, then all the
> disks light up briefly.  So probably a problem with the box or the power
> supply.
>
> I can try replacing the power supply first, if that doesn't help then
> the SCSI interface at the top, then if that doesn't help the entire box.
>
> Any other ideas?

If you don't have a disk in slot 6 you could plug in a second power supply and watch whether the problem (the mount verifications, not the flashing LEDs) disappears.

Anything in VMS's error log? $ DIAG is your friend in this case.

Hope this helps,
Hans.


Subject: Re: best guess for mount-verification problem
From: Phillip Helbig (undress to reply)
Newsgroups: comp.sys.dec
Organization: Multivax C&R
Date: Mon, 28 Jun 2021 19:21:39 +0000 (UTC)
Message-ID: <sbd7g3$11ed$1@gioia.aioe.org>
References: <sbc86h$656$1@gioia.aioe.org> <sbc8uu$hn9$1@gioia.aioe.org> <sbc9v3$v43$1@gioia.aioe.org> <iju38bFspqiU1@mid.individual.net>

In article <iju38bFspqiU1@mid.individual.net>, Hans Bachner
<hans@bachner.priv.at> writes:

> If you don't have a disk in slot 6 you could plug in a second power
> supply and watch whether the problem (the mount verifications, not the
> flashing LEDs) disappears.

All slots are full.  :-(  I like the idea of two power supplies, but not
the idea of not using one of the addresses.  Even in the case of
complete failure, each shadow set would have at least one member on at
least one node, so processing could continue.  For my hobbyist purposes,
it is enough (and saves on power) to have just one power supply and swap
if necessary.

> Anything in VMS's error log? $ DIAG is your friend in this case.

Will have to take a look.  The problem has been solved, but I can learn
something about DIAG and so on.


