Re: Weird code crash

<slrnug79vi.ni1.spamtrap42@one.localnet>

https://www.novabbs.com/computers/article-flat.php?id=7167&group=comp.sys.raspberry-pi#7167

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: spamtra...@jacob21819.net (Robert Riches)
Newsgroups: comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: 15 Sep 2023 00:40:18 GMT
Organization: none-at-all
Lines: 31
Message-ID: <slrnug79vi.ni1.spamtrap42@one.localnet>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me>
Reply-To: spamtrap42@jacob21819.net
X-Trace: individual.net 9Amils9Vrif7LpL3OwUlIwwiAz2aaeOtiq8JWg0ABFJXJku/jm
Cancel-Lock: sha1:nALC9YwgBnrr2JZofUddNTH8Cas= sha256:QmCVVeT1F0B3FGviNwNiUxLYpspKaWdqKL1x4kBvu8s=
User-Agent: slrn/1.0.3 (Linux)
 by: Robert Riches - Fri, 15 Sep 2023 00:40 UTC

On 2023-09-14, The Natural Philosopher <tnp@invalid.invalid> wrote:
> On 14/09/2023 16:29, Ralf Fassel wrote:
>> * The Natural Philosopher <tnp@invalid.invalid>
>> | One possibility is that it is opening and reading a file at the
>> | precise time another process is writing it...in both cases the read
>> | and write
>> | operations are atomic and done with C code.
>>>
>> | READ
>> | ====
>> | fp=fopen(fullname, "r");
>> | len=fread(filbuf,1,255,fp); // read entire file
>>
>> Check for fp != NULL is missing here in this example code before
>> fread(). If this also in the production version, it might be a problem
>> if the file is not accessible for any reason.
>>
>> R'
> Ralf, I already put that in this morning, re compiled the code and after
> an hour, it crashed again.
>
> The filename is built by scanning a directory so the filename must exist.

Maybe not applicable in this situation, but if something deleted
the file between the time of the scan and the time of the fopen
call, it might/would not exist.

--
Robert Riches
spamtrap42@jacob21819.net
(Yes, that is one of my email addresses.)

Re: Weird code crash

<wwvv8cbgaw9.fsf@LkoBDZeT.terraraq.uk>

https://www.novabbs.com/computers/article-flat.php?id=7168&group=comp.sys.raspberry-pi#7168

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 08:20:54 +0100
Organization: terraraq NNTP server
Message-ID: <wwvv8cbgaw9.fsf@LkoBDZeT.terraraq.uk>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me>
<op.2a9vj3yca3w0dxdave@hodgins.homeip.net>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="92867"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:TI/jDFNfEnSdrxLJe04TBmqe2jI=
X-Boydie: NO
 by: Richard Kettlewell - Fri, 15 Sep 2023 07:20 UTC

"David W. Hodgins" <dwhodgins@nomail.afraid.org> writes:
> The Natural Philosopher <tnp@invalid.invalid> wrote:
>> I am leaning towards possibly a cracked solder joint or board.

Again, I agree with Theo. Reported behavior is not really consistent
with a hardware fault.

> Have you run fsck on the file system since the power loss? Make sure the fstab
> entry does not have a zero in the sixth field for the file system(s) in use.
> If using systemd, run dracut -f after any fstab changes. Then reboot.

Reported behavior is also not consistent with a corrupt filesystem.

--
https://www.greenend.org.uk/rjk/

Re: Weird code crash

<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk>

https://www.novabbs.com/computers/article-flat.php?id=7169&group=comp.sys.raspberry-pi#7169

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 08:30:24 +0100
Organization: terraraq NNTP server
Message-ID: <wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="92867"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:Rkw2SAlLyUdIckCQr6/OjPygON8=
X-Boydie: NO
 by: Richard Kettlewell - Fri, 15 Sep 2023 07:30 UTC

The Natural Philosopher <tnp@invalid.invalid> writes:
> On 14/09/2023 09:23, Richard Kettlewell wrote:
>> Also:
>> * I would also have a look at the kernel log; if it’s a
>> kernel-generated signal then there’s usually a log message about it.
>>
> Nothing in kern.log after the boot process finishes.

Most likely a bug in your program then.

>> * Run the application under valgrind; depending what the issue is, that
>> will provide a backtrace and perhaps more detailed information. If it
>> is a memory corruption issue then it may identify where the corruption
>> happens, rather than the later point where malloc failed a consistency
>> check (or whatever it is).
>>
>> Using valgrind (and/or compiler sanitizer features) is a good idea
>> even before running into trouble, really.
>
> The strange thing is that it failed once after a minute, then I
> rebooted and it failed after 20 minutes, and its been running several
> days now with no issues at all.
>
> I am not sure valgrind would actually help unless it failed.

It’s extremely good at identifying memory corruption even in cases where
that doesn’t immediately lead to a crash; that’s what it’s for. But if
it doesn’t, you leave it running until the crash happens.

Up to you, of course, whether you use the tools available, or debug with
one hand tied behind your back.
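
For illustration only (a sketch, not anything from the program under discussion): SIGABRT is normally raised by the process itself through abort(), for example from a failed assert() or a C library heap consistency check, rather than by the kernel. That would be consistent with an empty kern.log. The helper name signal_from_abort() is hypothetical; it forks a child that calls abort() and reports which signal ended it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that calls abort() and return
 * the signal that terminated it, 0 if it exited normally, -1 on error. */
int signal_from_abort(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        abort();    /* the child raises SIGABRT on itself */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling signal_from_abort() should return SIGABRT, with no involvement from the hardware or the kernel log.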

--
https://www.greenend.org.uk/rjk/

Re: Weird code crash

<ygail8biyxm.fsf@akutech.de>

https://www.novabbs.com/computers/article-flat.php?id=7170&group=comp.sys.raspberry-pi#7170

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ralf...@gmx.de (Ralf Fassel)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 11:11:01 +0200
Lines: 42
Message-ID: <ygail8biyxm.fsf@akutech.de>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net QIEf7D5kxk0/+FJRNsXyzAc9JN8gWnwSlinQHknbxIp/MELSU=
Cancel-Lock: sha1:vuc5ZL9rsDaHG70wkTxhmml4TKs= sha1:u1RPg0uYbG6btBA0IL5PldPkG5M= sha256:IUo2s9HLYy4tEPn2edvK2ElT9VDE/uIN+yqPovSSA7k=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
 by: Ralf Fassel - Fri, 15 Sep 2023 09:11 UTC

* The Natural Philosopher <tnp@invalid.invalid>
| On 14/09/2023 16:29, Ralf Fassel wrote:
| > * The Natural Philosopher <tnp@invalid.invalid>
| > | One possibility is that it is opening and reading a file at the
| > | precise time another process is writing it...in both cases the read
| > | and write
| > | operations are atomic and done with C code.
| >>
| > | READ
| > | ====
| > | fp=fopen(fullname, "r");
| > | len=fread(filbuf,1,255,fp); // read entire file
| > Check for fp != NULL is missing here in this example code before
| > fread(). If this also in the production version, it might be a problem
| > if the file is not accessible for any reason.
| > R'
| Ralf, I already put that in this morning, re compiled the code and
| after an hour, it crashed again.
>
| The filename is built by scanning a directory so the filename must exist.

That assumption does not hold. Since scanning and opening are separated
by a time gap (albeit a 'small' one), there is a non-zero chance that
the file vanished between scan and open.

Further possibilities:
- how is 'filbuf' used after the fread()? If you use it as C-string, make
sure it is 0-terminated (fread() won't do that for you). Maybe use
fgets(3) instead?
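
A minimal sketch of both points, assuming a hypothetical helper read_small_file() that is not part of the posted program: it treats a failed fopen() as "file vanished after the scan", and it reads at most one byte less than the buffer size so that the terminating 0 always fits.

```c
#include <stdio.h>

/* Hypothetical helper, not the poster's code: read at most bufsize-1
 * bytes from path into buf and 0-terminate the result.
 * Returns the number of bytes read, or -1 if the file could not be
 * opened (for example because it vanished between scan and open). */
long read_small_file(const char *path, char *buf, size_t bufsize)
{
    FILE *fp;
    size_t len;

    if (bufsize == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    len = fread(buf, 1, bufsize - 1, fp);  /* leave room for the 0 byte */
    buf[len] = '\0';                       /* fread() does not terminate */
    fclose(fp);
    return (long)len;
}
```

With a 256-byte buffer this reads up to 255 bytes, so writing the terminator at buf[len] can never land outside the buffer.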

| I am leaning towards possibly a cracked solder joint or board.

Well, since the Raspi is cheap, that should be easily checked by simply
using another one. I bet 1 beer that it is *not* a cracked board, since
with a cracked board many more processes should run into trouble, not
only this particular one.

R' (.sig not from me .-)
--
echo '[ bottles of beer]sa[ bottle of beer]sb[ take one down, pass it around
]sd[ on the wall]sc[no more]se99snlc[lalnpsnPplalnp1-snpldPln1=ylnpsnPp[]pst
ln0<x]sx[salblnpsnPplblnpsnpldPleplaPlcpq]sylxx' | dc

Re: Weird code crash

<ue17bu$360b0$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7171&group=comp.sys.raspberry-pi#7171

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 10:15:41 +0100
Organization: A little, after lunch
Lines: 98
Message-ID: <ue17bu$360b0$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <op.2a9vj3yca3w0dxdave@hodgins.homeip.net>
<udvk6j$2nupi$2@dont-email.me> <op.2a9yq7lva3w0dxdave@hodgins.homeip.net>
<udvl30$2nupi$6@dont-email.me> <op.2a91pirva3w0dxdave@hodgins.homeip.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 09:15:42 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3342688"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+IISZI0L72CZCzcphWtJVYdGIu7CzQTGE="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:QCWbUR1ZBW+FlYKIq3XlxSPPpwQ=
In-Reply-To: <op.2a91pirva3w0dxdave@hodgins.homeip.net>
Content-Language: en-GB
 by: The Natural Philosopher - Fri, 15 Sep 2023 09:15 UTC

On 14/09/2023 20:57, David W. Hodgins wrote:
> On Thu, 14 Sep 2023 14:57:36 -0400, The Natural Philosopher
> <tnp@invalid.invalid> wrote:
>
>> On 14/09/2023 19:53, David W. Hodgins wrote:
>>> journalctl -b --no-h|grep fsck
>>
>> Sep 14 14:17:03 systemd[1]: Created slice system-systemd\x2dfsck.slice.
>> Sep 14 14:17:03 systemd[1]: Listening on fsck to fsckd communication
>> Socket.
>> Sep 14 14:17:04 systemd-fsck[109]: e2fsck 1.46.2 (28-Feb-2021)
>> Sep 14 14:17:04 systemd-fsck[109]: rootfs: clean, 51075/932256 files,
>> 460111/3822976 blocks
>> Sep 14 14:17:14 systemd-fsck[178]: fsck.fat 4.2 (2021-01-31)
>> Sep 14 14:17:14 systemd-fsck[178]: There are differences between boot
>> sector and its backup.
>> Sep 14 14:17:14 systemd-fsck[178]: This is mostly harmless. Differences:
>> (offset:original/backup)
>> Sep 14 14:17:14 systemd-fsck[178]:   65:01/00
>> Sep 14 14:17:14 systemd-fsck[178]:   Not automatically fixing this.
>> Sep 14 14:17:14 systemd-fsck[178]: Dirty bit is set. Fs was not properly
>> unmounted and some data may be corrupt.
>> Sep 14 14:17:14 systemd-fsck[178]:  Automatically removing dirty bit.
>> Sep 14 14:17:14 systemd-fsck[178]: *** Filesystem was changed ***
>> Sep 14 14:17:14 systemd-fsck[178]: Writing changes.
>> Sep 14 14:17:14 systemd-fsck[178]: /dev/mmcblk0p1: 330 files,
>> 25815/130554 clusters
>> Sep 14 14:30:12 systemd[1]: systemd-fsckd.service: Succeeded.
>
> If there are any corrupted files, diagnosing any problems they cause
> will be
> difficult. I strongly recommend re-installing.
>
> Regards, Dave Hodgins

If it persists I may do that, but it has now been rock steady for 20 hours.

The actual code has been replaced because I recompiled it anyway, but
the problem persisted after that.

Then I twisted the board a bit, and it hasn't failed since. No
guarantees, of course.

Does anyone else remember Tracy Kidder's 'Soul of a New Machine'* where
they had a wire wrapped backplane on the prototype and a strange
intermittent bug? And the director came in, twisted the backplane and
the bug instantly reappeared?

One of the more curious 'bugs' I encountered was early in my software
career, when code that I wrote suddenly went crazy, in a way that the
software as written could not possibly have caused. And only
on one machine, equipped with a custom video capture card. We removed
the card, but it made no difference.

I then compared the code on the machine with the code as compiled. Two
bytes were FFH.

I burned a new floppy and transferred the code again, and the code ran
correctly.

Then we reinstalled the video card. The code ran correctly. Then we
copied over the code again with the video card installed. The code again
was corrupted.

Then the hardware guys looked at the address decode in the video card.
It was a mass of gates one after the other. The total delay was well out
of spec. It dawned on us that what was happening was that the DMA
controller on the floppy was using bus addresses that were being decoded
by the card, and then the IO request came along to access the floppy and
those addresses were still on the bus as far as the sluglike video card
was concerned, so it grabbed the data bus and shoved FFH on it.

Hardware is not perfect. That is the lesson. And chasing software when
it's really hardware is a losing game.

Anyway, I have in reserve all the great techniques suggested, but for
now I am playing a wait and see game to see if any pattern emerges. My
experience suggests that the same code running a loop in the same memory
won't crash and burn unless there is a malloc/free mismatch, and that
happens fairly quickly and shows in 'top'.

This kind of weird, utterly asynchronous behaviour is often hardware.
And since I trod on the bloody PCB, I may simply get another one and
test that. It doesn't need to be installed till winter. There is time.
And my PCB design for the relay and PSU module isn't back from China yet...

*https://en.wikipedia.org/wiki/The_Soul_of_a_New_Machine . Definitely
recommended if you haven't read it.

--
"When one man dies it's a tragedy. When thousands die it's statistics."

Josef Stalin

Re: Weird code crash

<ue17dj$360b0$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7172&group=comp.sys.raspberry-pi#7172

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 10:16:35 +0100
Organization: A little, after lunch
Lines: 28
Message-ID: <ue17dj$360b0$2@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <op.2a9vj3yca3w0dxdave@hodgins.homeip.net>
<udvk6j$2nupi$2@dont-email.me> <udvnk8$2oip3$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 09:16:35 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3342688"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Tmuiz7GQ/lIipfflrqm4HK4PJmZW/JgA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:FuEiXaoUX5JzC/iA+XGF4VhmGME=
Content-Language: en-GB
In-Reply-To: <udvnk8$2oip3$1@dont-email.me>
 by: The Natural Philosopher - Fri, 15 Sep 2023 09:16 UTC

On 14/09/2023 20:40, candycanearter07 wrote:
> On 9/14/23 13:42, The Natural Philosopher wrote:
>> I assumed that the thing would have done its own fsck on every boot
>> anyway...isnt that a debian default?
>
> Pretty sure it's a standard, my arch install has it set.
>>
>> (The sixth fields are 2 and 1 respectively for the file systems)
>>
>>
>> PARTUUID=b8c9fbb7-01  /boot           vfat    defaults          0       2
>> PARTUUID=b8c9fbb7-02  /               ext4    defaults,noatime  0       1
>>
>
> 1 is fsck check for the root partition and 2 is for others, right
>
I looked it up; I think it merely specifies the order, so you are right
in practice.

--
"Corbyn talks about equality, justice, opportunity, health care, peace,
community, compassion, investment, security, housing...."
"What kind of person is not interested in those things?"

"Jeremy Corbyn?"

Re: Weird code crash

<ue17qs$360b0$3@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7173&group=comp.sys.raspberry-pi#7173

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 10:23:40 +0100
Organization: A little, after lunch
Lines: 46
Message-ID: <ue17qs$360b0$3@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <slrnug79vi.ni1.spamtrap42@one.localnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 09:23:40 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3342688"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19g6Wl+CvApWHJ5/8r3itS/CZlyzKPvOoc="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:XwTHyUSWPMwmsgKlUUpkxhfO9WE=
Content-Language: en-GB
In-Reply-To: <slrnug79vi.ni1.spamtrap42@one.localnet>
 by: The Natural Philosopher - Fri, 15 Sep 2023 09:23 UTC

On 15/09/2023 01:40, Robert Riches wrote:
> On 2023-09-14, The Natural Philosopher <tnp@invalid.invalid> wrote:
>> On 14/09/2023 16:29, Ralf Fassel wrote:
>>> * The Natural Philosopher <tnp@invalid.invalid>
>>> | One possibility is that it is opening and reading a file at the
>>> | precise time another process is writing it...in both cases the read
>>> | and write
>>> | operations are atomic and done with C code.
>>>>
>>> | READ
>>> | ====
>>> | fp=fopen(fullname, "r");
>>> | len=fread(filbuf,1,255,fp); // read entire file
>>>
>>> Check for fp != NULL is missing here in this example code before
>>> fread(). If this also in the production version, it might be a problem
>>> if the file is not accessible for any reason.
>>>
>>> R'
>> Ralf, I already put that in this morning, re compiled the code and after
>> an hour, it crashed again.
>>
>> The filename is built by scanning a directory so the filename must exist.
>
> Maybe not applicable in this situation, but if something deleted
> the file between the time of the scan and the time of the fopen
> call, it might/would not exist.
>

Exactly. That is a possibility, which I have now covered. It made no
difference.

In practice the write code that *replaces* the file is very simple. It is
fopen( "w") immediately followed by
fwrite()

Without knowing the exact code behind the fopen("w"), I can't say
whether that actually deletes the file and creates a new one, or merely
truncates it to zero length, or indeed just opens it and trims the
length *after* the new data is written.
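
For what it is worth, the C standard describes the "w" mode as "truncate to zero length or create text file for writing", so the truncation happens at fopen() time, and a reader that opens the file before the fwrite() data is flushed can see an empty or partial file. A common way around that, sketched here under assumptions, is to write a temporary file and rename() it over the real one; on POSIX systems rename() replaces the target atomically. The helper name replace_file_atomically() and the temporary path are hypothetical, and the temporary name should not match whatever pattern the reader scans for.

```c
#include <stdio.h>

/* Hypothetical helper: replace `path` so that readers see either the old
 * contents or the new contents, never an empty or half-written file.
 * Assumes `tmppath` is on the same filesystem as `path`.
 * Returns 0 on success, -1 on failure. */
int replace_file_atomically(const char *path, const char *tmppath,
                            const void *data, size_t len)
{
    FILE *fp = fopen(tmppath, "w");

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0) {              /* flushes the buffered data */
        remove(tmppath);
        return -1;
    }
    if (rename(tmppath, path) != 0) {   /* atomic replace on POSIX */
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

A writer could call it as replace_file_atomically("zone1.dat", "zone1.tmp", buf, n), where both file names are made up for the example.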

--
WOKE is an acronym... Without Originality, Knowledge or Education.

Re: Weird code crash

<ue181l$360b0$4@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7174&group=comp.sys.raspberry-pi#7174

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 10:27:17 +0100
Organization: A little, after lunch
Lines: 40
Message-ID: <ue181l$360b0$4@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <udvbkn$2mg4j$1@dont-email.me>
<udvjua$2nupi$1@dont-email.me> <SGs*uYnqz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 09:27:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3342688"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+5C3zVNCi6SIID3Ck9ZyTKGl+wUp8T2wE="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:djMW2ZOQLnFw2qjWEMfNEZ0eB5s=
Content-Language: en-GB
In-Reply-To: <SGs*uYnqz@news.chiark.greenend.org.uk>
 by: The Natural Philosopher - Fri, 15 Sep 2023 09:27 UTC

On 14/09/2023 21:51, Theo wrote:
> In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
>> both already done. Not closng it was the cause of a memory leak but I
>> fixed that a fortnight ago.
>>
>> I am beginning to wonder if I did more damage than just the power socket
>> when I trod on it.
>
> SIGABRT is a problem in your code.

Very definite.

Are you sure about that?

> If you aren't seeing stuff in the kernel
> log then it almost certainly isn't a hardware fault. It is a very special
> skill to have a hardware fault without spewing lots of stuff there.
>
Even a corrupted bit in a ram disk?

> Post the code somewhere and someone can take a look. Otherwise you need to
> use the development tools available to you to debug the problem.
>

I can post the code, but it may not help. You need the whole system,
including the peripherals that write to the daemon that writes the data
files that the crashing daemon reads.

At the moment it is behaving perfectly. Without a reproducible bug I can
see no point in using a debugger.

> Theo

--
There is nothing a fleet of dispatchable nuclear power plants cannot do
that cannot be done worse and more expensively and with higher carbon
emissions and more adverse environmental impact by adding intermittent
renewable energy.

Re: Weird code crash

<ue1966$36a0j$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7175&group=comp.sys.raspberry-pi#7175

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 10:46:45 +0100
Organization: A little, after lunch
Lines: 85
Message-ID: <ue1966$36a0j$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 09:46:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3352595"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18F8C8MV/jUHjc4deuOsQgF+RVwZGFosFg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:nzLoU4ck5VDxuGGwwW4qemABnJQ=
In-Reply-To: <wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk>
Content-Language: en-GB
 by: The Natural Philosopher - Fri, 15 Sep 2023 09:46 UTC

On 15/09/2023 08:30, Richard Kettlewell wrote:
> The Natural Philosopher <tnp@invalid.invalid> writes:
>> On 14/09/2023 09:23, Richard Kettlewell wrote:
>>> Also:
>>> * I would also have a look at the kernel log; if it’s a
>>> kernel-generated signal then there’s usually a log message about it.
>>>
>> Nothing in kern.log after the boot process finishes.
>
> Most likely a bug in your program then.
>
>>> * Run the application under valgrind; depending what the issue is, that
>>> will provide a backtrace and perhaps more detailed information. If it
>>> is a memory corruption issue then it may identify where the corruption
>>> happens, rather than the later point where malloc failed a consistency
>>> check (or whatever it is).
>>>
>>> Using valgrind (and/or compiler sanitizer features) is a good idea
>>> even before running into trouble, really.
>>
>> The strange thing is that it failed once after a minute, then I
>> rebooted and it failed after 20 minutes, and its been running several
>> days now with no issues at all.
>>
>> I am not sure valgrind would actually help unless it failed.
>
> It’s extremely good at identifying memory corruption even in cases where
> that doesn’t immediately lead to a crash; that’s what it’s for. But if
> it doesn’t, you leave it running until the crash happens.
>
Well that is an option for sure.

> Up to you, of course, whether you use the tools available, or debug with
> one hand tied behind your back.
>

Tell me in what way a corrupted - say - libc file, or a faulty bit of
memory would show up in the kernel logs?

The problem is that this thing is looping very frequently.
loop()
{
    while (1)
    {
        int i;
        readThermometers();
        readZones();
        readOverrides();
        readTimerData();
        setRelayState();
        setRelays();
        usleep (1120000);
    }
}

And that means thousands of faultless iterations in a day.

So this bug (if it is a bug) is one in a million or worse.

I suppose I could make the thing loop ten times a second (or even
faster) and see if it happens more often.

It's not as though it's chewing up CPU...

The problem I have is that these crashes only recently started
happening: prior to that the code ran for days. And two things happened,
a massive brownout, and then a full power cut, and I trod on it.

And I made systemd start it...

I see it crashed again last night, again with zero errors apart from
SIGABRT...

I will start it manually and cut systemd out.

--
The lifetime of any political organisation is about three years before
its been subverted by the people it tried to warn you about.

Anon.

Re: Weird code crash

<ue1b2v$36ji2$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7176&group=comp.sys.raspberry-pi#7176

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 11:19:11 +0100
Organization: A little, after lunch
Lines: 105
Message-ID: <ue1b2v$36ji2$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 10:19:11 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3362370"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+JCMrC/S+OlHywbgCaIdVTXl7NQ8cHgCg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:KQZwRpYJdUzcE+7NHxk/k+LnTCg=
In-Reply-To: <ygail8biyxm.fsf@akutech.de>
Content-Language: en-GB
 by: The Natural Philosopher - Fri, 15 Sep 2023 10:19 UTC

On 15/09/2023 10:11, Ralf Fassel wrote:
> * The Natural Philosopher <tnp@invalid.invalid>
> | On 14/09/2023 16:29, Ralf Fassel wrote:
> | > * The Natural Philosopher <tnp@invalid.invalid>
> | > | One possibility is that it is opening and reading a file at the
> | > | precise time another process is writing it...in both cases the read
> | > | and write
> | > | operations are atomic and done with C code.
> | >>
> | > | READ
> | > | ====
> | > | fp=fopen(fullname, "r");
> | > | len=fread(filbuf,1,255,fp); // read entire file
> | > Check for fp != NULL is missing here in this example code before
> | > fread(). If this also in the production version, it might be a problem
> | > if the file is not accessible for any reason.
> | > R'
> | Ralf, I already put that in this morning, recompiled the code and
> | after an hour, it crashed again.
>>
> | The filename is built by scanning a directory so the filename must exist.
>
> That assumption does not hold. Since scanning and opening are separated
> by a time gap (albeit a 'small' one), there is a non-zero chance that
> the file vanished between scan and open.
>
> Further possibilities:
> - how is 'filbuf' used after the fread()? If you use it as C-string, make
> sure it is 0-terminated (fread() won't do that for you). Maybe use
> fgets(3) instead?
>

dir = opendir(VOLATILE_DIR);

if(!dir)
    return;
while ((dp = readdir (dir)) != NULL)
{
    filename=dp->d_name;
    // skip known bollocks
    if(!strcmp(filename, "." ) || !strcmp(filename, ".." ) ||
       !strcmp(filename, "relays.dat" ))
        continue;
    // construct full path
    sprintf(fullname,"%s/%s",VOLATILE_DIR,filename);
    stat(fullname,&stats); // get file times
    if(time(NULL)-stats.st_ctime >1800) // skip files older than half an hour
        continue;
    len=strlen(filename);
    if(strncmp(filename+len-4, ".dat",4)) // .dat file but not relays.dat
        continue;
    fp=fopen(fullname, "r");
    if(fp==0) // file has disappeared?
        continue;
    len=fread(filbuf,1,255,fp);
    if(len==0) // file has zero length
        goto baddata;
    filbuf[len]=0;
    // supposed to reject a file whose contents do not start with ZONE
    if(len=strncmp(filbuf,"ZONE",4))
        goto baddata;

    // looking very much like a temperature file
    // this is our zone from "ZONE2" etc. 1-4 is zone but index is 0-3
    // so subtract '1'
    i=(int)filbuf[4] -'1';
    p=strstr(filbuf,"\n");
    if(p)
    {
        p++;
        if(q=strstr(p,"\n"))
        {
            *q++=0;
            // make a copy of the name and attach it to our thermometer structure
            thermometers[i].name=strdup(p);
            p=q;
        }
        else goto baddata;
        // now to fetch the temp data.
        if(q=strstr(p,"\n"))
        {
            *q++=0;
            thermometers[i].temp=atof(p);
            p=q;
        }
        else goto baddata;
        // what's left is the voltage. To hell with any crap after it
        thermometers[i].battery=atof(p);
    }
    baddata:fclose(fp);
} // end of directory scan loop
> | I am leaning towards possibly a cracked solder joint or board.
>
> Well, since the Raspi is cheap, that should be easily checked by simply
> using another one. I bet 1 beer that it is *not* a cracked board, since
> with that many more processes should run into trouble, not only this
> particular one.
>
> R' (.sig not from me .-)

--
There is something fascinating about science. One gets such wholesale
returns of conjecture out of such a trifling investment of fact.

Mark Twain

Re: Weird code crash

<wwvpm2jk8ji.fsf@LkoBDZeT.terraraq.uk>

https://www.novabbs.com/computers/article-flat.php?id=7177&group=comp.sys.raspberry-pi#7177
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 11:58:09 +0100
Organization: terraraq NNTP server
Message-ID: <wwvpm2jk8ji.fsf@LkoBDZeT.terraraq.uk>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="95884"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:+Z5W7UPZCxORbag6q0FwKzDzDRY=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Fri, 15 Sep 2023 10:58 UTC

The Natural Philosopher <tnp@invalid.invalid> writes:
> On 15/09/2023 08:30, Richard Kettlewell wrote:
>> The Natural Philosopher <tnp@invalid.invalid> writes:
>>> I am not sure valgrind would actually help unless it failed.
>> It’s extremely good at identifying memory corruption even in cases
>> where that doesn’t immediately lead to a crash; that’s what it’s for.
>> But if it doesn’t, you leave it running until the crash happens.
>
> Well that is an option for sure.
>
>> Up to you, of course, whether you use the tools available, or debug with
>> one hand tied behind your back.
>
> Tell me in what way a corrupted - say - libc file, or a faulty bit of
> memory would show up in the kernel logs?

Very dependent on the nature of the corruption. But you’ve already told
us there’s nothing in the kernel logs.

Anyway, not responsible for advice not taken.

--
https://www.greenend.org.uk/rjk/

Re: Weird code crash

<SGs*X4qqz@news.chiark.greenend.org.uk>

https://www.novabbs.com/computers/article-flat.php?id=7178&group=comp.sys.raspberry-pi#7178
Path: i2pn2.org!i2pn.org!news.chmurka.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: 15 Sep 2023 11:58:12 +0100 (BST)
Organization: University of Cambridge, England
Message-ID: <SGs*X4qqz@news.chiark.greenend.org.uk>
References: <udu5c4$2gutd$1@dont-email.me> <VGs*53kqz@news.chiark.greenend.org.uk> <wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me> <wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="8431"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Fri, 15 Sep 2023 10:58 UTC

In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
> Tell me in what way a corrupted - say - libc file, or a faulty bit of
> memory would show up in the kernel logs?

Well, it could be a cosmic ray. The Pi doesn't have ECC memory so it's
possible to bit-flip in RAM or storage without it noticing. I don't know
which part of the galaxy you inhabit, but cosmic rays are rare enough down
here that random bit flips like this don't happen often - ballpark once a
year for a server (which has a much greater surface area to absorb them than
a Pi).

It is also possible to be marginal on signal integrity for PCB interconnect,
but that would mostly be a design fault: either they all work or many of
them don't. Since we don't have a lot of people complaining of the same
problem, we can assume the design is not marginal in that respect.

If computers were that unreliable they would be failing all the time - and
we'd fit ECC to everything. That they aren't suggests bit-flip corruption
isn't a problem. In general random bit-flip errors are not a statistically
major source of crashes, unless you're running a hyper-redundant mainframe
and have eliminated all the other sources.

A well-known class of bugs, though, is concurrency/timing races and memory
safety violations, which is odds-on what's happening here, especially given
we've already picked up on potentially risky code like failing to check for
NULL from fopen().
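
One way to rule out the read-while-write race suspected earlier in the thread is for the writer to build the file under a temporary name and rename() it into place. On POSIX systems rename() within one filesystem replaces the target atomically, so a reader sees either the old contents or the new, never a half-written file. A minimal sketch, assuming the writer is also C; write_atomic and the ".tmp" suffix are hypothetical names, not taken from the posted code:

```c
#include <stdio.h>

/* Write data to path via a temporary file and rename(), so readers never
 * see a partial file. Returns 0 on success, -1 on any failure.
 * (Hypothetical helper; the ".tmp" suffix is an assumption.) */
int write_atomic(const char *path, const char *data)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                      /* temporary name would not fit */

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    if (fputs(data, fp) == EOF) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {              /* flush errors show up here */
        remove(tmp);
        return -1;
    }
    if (rename(tmp, path) != 0) {       /* atomic replace on POSIX, same filesystem */
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The temporary name ends in .tmp rather than .dat, so a reader that only opens names ending in .dat, as the posted scan loop does, would skip it.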

> And that means thousands of faultless iterations in a day.
>
> So this bug ( if it is a bug) is a one in a million or worse.
>
> I suppose I could make the thing loop ten times a second (or even
> faster) and see if it happens more often..

That would be a useful thing to try.

> It's not as though it's chewing up CPU...
>
> The problem I have is that these crashes only recently started
> happening: prior to that the code ran for days. And two things happened,
> a massive brownout, and then a full power cut, and I trod on it.

Most of those things would cause it to fail hard (i.e. not power up), rather
than have a very rare random fault.

> And I made systemd start it...

It is possible that being run from systemd changes the timing or environment
that provokes the fault in some way, but I doubt it would be the cause of
the fault.

Theo

Re: Weird code crash

<wwvjzsrk83f.fsf@LkoBDZeT.terraraq.uk>

https://www.novabbs.com/computers/article-flat.php?id=7179&group=comp.sys.raspberry-pi#7179
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 12:07:48 +0100
Organization: terraraq NNTP server
Message-ID: <wwvjzsrk83f.fsf@LkoBDZeT.terraraq.uk>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="95884"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:Xtf32VE24+++8JhcK2Fr2xNBa34=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Fri, 15 Sep 2023 11:07 UTC

The Natural Philosopher <tnp@invalid.invalid> writes:
> dir = opendir(VOLATILE_DIR);
>
> if(!dir)
> return;
> while ((dp = readdir (dir)) != NULL)
> {
> filename=dp->d_name;
> // skip known bollocks
> if(!strcmp(filename, "." ) || !strcmp(filename, ".." )
> || !strcmp(filename, "relays.dat" ))
> continue;
> // construct full path
> sprintf(fullname,"%s/%s",VOLATILE_DIR,filename);

Possible write overrun here.

> stat(fullname,&stats);// get tfile times
> if(time(NULL)-stats.st_ctime >1800) // skip files older than half an hour
> continue;
> len=strlen(filename);
> if(strncmp(filename+len-4, ".dat",4)) // .dat file but not relays.dat
> continue;

Possible read under-run here. (But if it crashes then you’d expect
SIGSEGV rather than SIGABRT, so that’s probably not the issue.)

> fp=fopen(fullname, "r");
> if(fp==0) //file has disappeared?
> continue;
> len=fread(filbuf,1,255,fp);

I don’t think the declaration of filbuf has been posted, so there’s a
possible write overrun if it’s less than 255 bytes.
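
The two path-handling hazards flagged above can be closed off cheaply. The following is only a sketch: snprintf() reports when the result would not fit, and the suffix test checks the length before stepping back from the end of the name. build_path and has_suffix are hypothetical helper names, not part of the posted code:

```c
#include <stdio.h>
#include <string.h>

/* Build "dir/name" into out. Returns 0 on success, -1 if the result
 * would not fit in outsz bytes (including the terminating NUL). */
int build_path(char *out, size_t outsz, const char *dir, const char *name)
{
    int n = snprintf(out, outsz, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Return non-zero if name ends with suffix. Safe for names shorter
 * than the suffix, so there is no read before the start of the string. */
int has_suffix(const char *name, const char *suffix)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);
    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}
```

In the scan loop these would stand in for the sprintf() call and the strncmp(filename+len-4, ...) test, with a continue whenever build_path() fails or has_suffix() is false.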

--
https://www.greenend.org.uk/rjk/

Re: Weird code crash

<ygaedizitb9.fsf@akutech.de>

https://www.novabbs.com/computers/article-flat.php?id=7180&group=comp.sys.raspberry-pi#7180
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ralf...@gmx.de (Ralf Fassel)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 13:12:26 +0200
Lines: 52
Message-ID: <ygaedizitb9.fsf@akutech.de>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net ScfeDELMasJ4PWQ/sO7bZgtWzrUK17BsdE/edzX33htp8qn10=
Cancel-Lock: sha1:yxi0ynLJiUF0UU9iEoJVoFG8lxs= sha1:M+dg6yCkA5GNEGpFKLosLwB3VTY= sha256:/5oSFO0C0H+UsS/mUVCpOvCYv3ly5hrbuj4Ht3SWx/c=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
 by: Ralf Fassel - Fri, 15 Sep 2023 11:12 UTC

You trust the contents of 'outside'-files very much, do you? ;-)
I don't know who can create files in the directory you're scanning, but
not *ensuring* that the input is what you expect is another possible
cause for problems...

* The Natural Philosopher <tnp@invalid.invalid>
| > Further possibilities:
| > - how is 'filbuf' used after the fread()? If you use it as C-string, make
| > sure it is 0-terminated (fread() won't do that for you). Maybe use
| > fgets(3) instead?
| >
| dir = opendir(VOLATILE_DIR);
>
| if(!dir)
| return;
| while ((dp = readdir (dir)) != NULL)
[looks good, error checks for stat() et al couldn't hurt]
--<snip-snip>--
| if(len=strncmp(filbuf,"ZONE",4)) //supposed to reject
| a file whose contents do not start with ZONE
| goto baddata;
|
| // looking very much like a temperature file
| i=(int)filbuf[4] -'1'; // this is our zone from
| "ZONE2" etc. 1-4 is zone but index is 0-3 so subtract
| '1'

The access of filbuf[4] is ok (since you checked that there are at least
4 characters in the file), but what if nothing follows after the 'ZONE',
or ZONE is followed by anything but [1-4]?
=> Assert that 'i' is in the valid index range here, before using it as
index into other arrays.
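
A range check of this kind might look like the sketch below. It assumes four zones, going by the "1-4 is zone" comment in the posted code; parse_zone_index and NUM_ZONES are hypothetical names:

```c
#include <string.h>

#define NUM_ZONES 4   /* assumption: the posted comment says zones run 1-4 */

/* Parse the header of a buffer that should start with "ZONEn".
 * Returns the 0-based zone index, or -1 if the header is missing,
 * too short, or names a zone outside 1..NUM_ZONES. */
int parse_zone_index(const char *buf, size_t len)
{
    if (len < 5 || strncmp(buf, "ZONE", 4) != 0)
        return -1;
    int i = buf[4] - '1';
    if (i < 0 || i >= NUM_ZONES)
        return -1;
    return i;
}
```

A negative return would take the same goto baddata path as the other rejections, so a malformed or truncated file can never index outside the thermometers array.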

| p=strstr(filbuf,"\n");
| if(p)
| {
| p++;
| if(q=strstr(p,"\n"))
| {
| *q++=0;
| thermometers[i].name=strdup(p); //
| make a copy of the name and attach it
| to our thermometer structure

Memory leak if thermometers[i].name already contains something.
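
If the old name is not already freed somewhere else in the program, one way to avoid the leak is to free it at the point of replacement. A minimal sketch; the struct layout and set_name are assumptions made for illustration, since the real declaration has not been posted:

```c
#include <stdlib.h>
#include <string.h>

struct thermometer {     /* hypothetical layout; only .name matters here */
    char *name;
    double temp;
    double battery;
};

/* Replace t->name with a heap copy of new_name.
 * Returns 0 on success, -1 if allocation fails (the old name is kept). */
int set_name(struct thermometer *t, const char *new_name)
{
    size_t n = strlen(new_name) + 1;
    char *copy = malloc(n);          /* same effect as strdup() */
    if (copy == NULL)
        return -1;
    memcpy(copy, new_name, n);
    free(t->name);                   /* free(NULL) is a no-op */
    t->name = copy;
    return 0;
}
```

This relies on every name pointer starting out as NULL, for example because the array has static storage or was zeroed at startup.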

Other than that, I really would have it running under a debugger or
valgrind, since then *if* it crashes, you *know* *where* in your code it
crashes.

Good luck hunting!
R'

Re: Weird code crash

<wwvedizk7us.fsf@LkoBDZeT.terraraq.uk>

https://www.novabbs.com/computers/article-flat.php?id=7181&group=comp.sys.raspberry-pi#7181
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!.POSTED.tunnel.sfere.anjou.terraraq.org.uk!not-for-mail
From: inva...@invalid.invalid (Richard Kettlewell)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 12:12:59 +0100
Organization: terraraq NNTP server
Message-ID: <wwvedizk7us.fsf@LkoBDZeT.terraraq.uk>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
<SGs*X4qqz@news.chiark.greenend.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: innmantic.terraraq.uk; posting-host="tunnel.sfere.anjou.terraraq.org.uk:172.17.207.6";
logging-data="95884"; mail-complaints-to="usenet@innmantic.terraraq.uk"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:5QxXZ3iHXWRoCkooy2WNiCCm4s4=
X-Face: h[Hh-7npe<<b4/eW[]sat,I3O`t8A`(ej.H!F4\8|;ih)`7{@:A~/j1}gTt4e7-n*F?.Rl^
F<\{jehn7.KrO{!7=:(@J~]<.[{>v9!1<qZY,{EJxg6?Er4Y7Ng2\Ft>Z&W?r\c.!4DXH5PWpga"ha
+r0NzP?vnz:e/knOY)PI-
X-Boydie: NO
 by: Richard Kettlewell - Fri, 15 Sep 2023 11:12 UTC

Theo <theom+news@chiark.greenend.org.uk> writes:
> The Natural Philosopher <tnp@invalid.invalid> wrote:
>> Tell me in what way a corrupted - say - libc file, or a faulty bit of
>> memory would show up in the kernel logs?
>
> Well, it could be a cosmic ray. The Pi doesn't have ECC memory so it's
> possible to bit-flip in RAM or storage without it noticing. I don't know
> which part of the galaxy you inhabit, but cosmic rays are rare enough down
> here that random bit flips like this don't happen often - ballpark once a
> year for a server (which has a much greater surface area to absorb them than
> a Pi).

I’ve seen one inarguable random bit flip in several decades. In that
case the behavior was deterministic - chiark’s /bin/ls had got a
single-bit error, and caching meant it crashed _every_ time anyone ran
it.

Maybe TNP has taken a trip to Sizewell?

--
https://www.greenend.org.uk/rjk/

Re: Weird code crash

<ue1eki$36hof$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7182&group=comp.sys.raspberry-pi#7182
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.J...@proton.me (Pancho)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 12:19:45 +0100
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <ue1eki$36hof$2@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 11:19:46 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="86b60095682714d3252d577ce95b8686";
logging-data="3360527"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+X44ce7q+icr26ei2CLVI249p06bgdcms="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:I21M1ifIuywYdNyw//CoJZVkXto=
In-Reply-To: <ue1966$36a0j$1@dont-email.me>
Content-Language: en-GB
 by: Pancho - Fri, 15 Sep 2023 11:19 UTC

On 15/09/2023 10:46, The Natural Philosopher wrote:
> On 15/09/2023 08:30, Richard Kettlewell wrote:
>> The Natural Philosopher <tnp@invalid.invalid> writes:
>>> On 14/09/2023 09:23, Richard Kettlewell wrote:
>>>> Also:
>>>> * I would also have a look at the kernel log; if it’s a
>>>>    kernel-generated signal then there’s usually a log message about it.
>>>>
>>> Nothing in kern.log after the boot process finishes.
>>
>> Most likely a bug in your program then.
>>
>>>> * Run the application under valgrind; depending what the issue is, that
>>>>     will provide a backtrace and perhaps more detailed information.
>>>> If it
>>>>     is a memory corruption issue then it may identify where the
>>>> corruption
>>>>     happens, rather than the later point where malloc failed a
>>>> consistency
>>>>     check (or whatever it is).
>>>>
>>>> Using valgrind (and/or compiler sanitizer features) is a good idea
>>>> even before running into trouble, really.
>>>
>>> The strange thing is that it failed once after a minute, then I
>>> rebooted and it failed after 20 minutes, and its been running several
>>> days now with no issues at all.
>>>
>>> I am not sure valgrind would actually help unless it failed.
>>
>> It’s extremely good at identifying memory corruption even in cases where
>> that doesn’t immediately lead to a crash; that’s what it’s for.  But if
>> it doesn’t, you leave it running until the crash happens.
>>
> Well that is an option for sure.
>

Valgrind seems to be a modern version of Purify, which was absolutely
essential when I programmed C 30 years ago.

Personally, I want to run with full debug, stack trace, logging,
exception handling, and bounds checking turned on all the time, even in
production. Which is why I generally use a modern language like C# or Java.

I'm with you on Python being rubbish, but have you considered something
like Rust? That gives you the benefit of a modern language, without
Garbage Collection pauses (if you care), or the need for a runtime
environment (like Python, C#, and Java).

Even using C++ would give you exception handling. C++ won't force you
to go too far if you don't want to.

Re: Weird code crash

<ue1eth$36hof$3@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7183&group=comp.sys.raspberry-pi#7183
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Pancho.J...@proton.me (Pancho)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 12:24:32 +0100
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <ue1eth$36hof$3@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
<SGs*X4qqz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 11:24:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="86b60095682714d3252d577ce95b8686";
logging-data="3360527"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19HLMI+JUn9XniGUR60QZC8a5DezcUov1k="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.14.0
Cancel-Lock: sha1:91+jqWbGEAH4QKJdD1lFOjamf9o=
In-Reply-To: <SGs*X4qqz@news.chiark.greenend.org.uk>
Content-Language: en-GB
 by: Pancho - Fri, 15 Sep 2023 11:24 UTC

On 15/09/2023 11:58, Theo wrote:
> In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
>> Tell me in what way a corrupted - say - libc file, or a faulty bit of
>> memory would show up in the kernel logs?
>
> Well, it could be a cosmic ray. The Pi doesn't have ECC memory so it's
> possible to bit-flip in RAM or storage without it noticing. I don't know
> which part of the galaxy you inhabit, but cosmic rays are rare enough down
> here that random bit flips like this don't happen often - ballpark once a
> year for a server (which has a much greater surface area to absorb them than
> a Pi).

Lol! I thought cosmic rays when I read this thread.

Decades of having my nose rubbed in the shit of my own stupidity, I
guess. :-)

Re: Weird code crash

<ue1h79$37oi5$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7184&group=comp.sys.raspberry-pi#7184
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 13:03:53 +0100
Organization: A little, after lunch
Lines: 15
Message-ID: <ue1h79$37oi5$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
<SGs*X4qqz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 12:03:53 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3400261"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18o/s7EEFbirfq+n0q50stveS0D33mLgME="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:WEjpF65scmtyAqwnaT64Ruylii8=
Content-Language: en-GB
In-Reply-To: <SGs*X4qqz@news.chiark.greenend.org.uk>
 by: The Natural Philosop - Fri, 15 Sep 2023 12:03 UTC

On 15/09/2023 11:58, Theo wrote:
> A well-known class of bugs, though, is concurrency/timing races and memory
> safety violations, which is odds-on what's happening here, especially given
> we've already picked up on potentially risky code like failing to check for
> NULL from fopen().

No, I do check it.

--
“It is dangerous to be right in matters on which the established
authorities are wrong.”

― Voltaire, The Age of Louis XIV

Re: Weird code crash

<ue1hbd$37oi5$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7185&group=comp.sys.raspberry-pi#7185
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 13:06:05 +0100
Organization: A little, after lunch
Lines: 36
Message-ID: <ue1hbd$37oi5$2@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me>
<VGs*53kqz@news.chiark.greenend.org.uk>
<wwv8r99b1ui.fsf@LkoBDZeT.terraraq.uk> <udusa0$2k1q0$1@dont-email.me>
<wwvpm2jgagf.fsf@LkoBDZeT.terraraq.uk> <ue1966$36a0j$1@dont-email.me>
<SGs*X4qqz@news.chiark.greenend.org.uk>
<wwvedizk7us.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 12:06:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3400261"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Sm/V7HHLIpf+wIym+iEDt8oYgjF1pyvg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:AyjzzhWTc4TEQRaNvup8nQvFn/Y=
In-Reply-To: <wwvedizk7us.fsf@LkoBDZeT.terraraq.uk>
Content-Language: en-GB
 by: The Natural Philosop - Fri, 15 Sep 2023 12:06 UTC

On 15/09/2023 12:12, Richard Kettlewell wrote:
> Theo <theom+news@chiark.greenend.org.uk> writes:
>> The Natural Philosopher <tnp@invalid.invalid> wrote:
>>> Tell me in what way a corrupted - say - libc file, or a faulty bit of
>>> memory would show up in the kernel logs?
>>
>> Well, it could be a cosmic ray. The Pi doesn't have ECC memory so it's
>> possible to bit-flip in RAM or storage without it noticing. I don't know
>> which part of the galaxy you inhabit, but cosmic rays are rare enough down
>> here that random bit flips like this don't happen often - ballpark once a
>> year for a server (which has a much greater surface area to absorb them than
>> a Pi).
>
> I’ve seen one inarguable random bit flip in several decades. In that
> case the behavior was deterministic - chiark’s /bin/ls had got a
> single-bit error, and caching meant it crashed _every_ time anyone ran
> it.
>
> Maybe TNP has taken a trip to Sizewell?
>

LOL!

Nope.

I am trying some stuff out to try and get it to fail *consistently*.

I don't feel it's hugely profitable to attempt to debug it when most of
the time it's not doing anything wrong.

--
“It is dangerous to be right in matters on which the established
authorities are wrong.”

― Voltaire, The Age of Louis XIV

Re: Weird code crash

<ue1i3g$37u15$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7186&group=comp.sys.raspberry-pi#7186
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 13:18:55 +0100
Organization: A little, after lunch
Lines: 82
Message-ID: <ue1i3g$37u15$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me> <wwvjzsrk83f.fsf@LkoBDZeT.terraraq.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 12:18:56 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3405861"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/4yak0+5Voj8nMVeWyioLZFAFHEXrzkZc="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:ybDsTYHiCYOpl4p0/C2VHQRr3QI=
Content-Language: en-GB
In-Reply-To: <wwvjzsrk83f.fsf@LkoBDZeT.terraraq.uk>
 by: The Natural Philosop - Fri, 15 Sep 2023 12:18 UTC

On 15/09/2023 12:07, Richard Kettlewell wrote:
> The Natural Philosopher <tnp@invalid.invalid> writes:
>> dir = opendir(VOLATILE_DIR);
>>
>> if(!dir)
>> return;
>> while ((dp = readdir (dir)) != NULL)
>> {
>> filename=dp->d_name;
>> // skip known bollocks
>> if(!strcmp(filename, "." ) || !strcmp(filename, ".." )
>> || !strcmp(filename, "relays.dat" ))
>> continue;
>> // construct full path
>> sprintf(fullname,"%s/%s",VOLATILE_DIR,filename);
>
> Possible write overrun here.
The filenames never change length.

>
>> stat(fullname,&stats);// get tfile times
>> if(time(NULL)-stats.st_ctime >1800) // skip files older than half an hour
>> continue;
>> len=strlen(filename);
>> if(strncmp(filename+len-4, ".dat",4)) // .dat file but not relays.dat
>> continue;
>
> Possible read under-run here. (But if it crashes then you’d expect
> SIGSEGV rather than SIGABRT, so that’s probably not the issue.)
>
>> fp=fopen(fullname, "r");
>> if(fp==0) //file has disappeared?
>> continue;
>> len=fread(filbuf,1,255,fp);
>
> I don’t think the declaration of filbuf has been posted, so there’s a
> possible write overrun if it’s less than 255 bytes.
>
>
char filbuf[256];
char fullname[256];

The fullname is of the form

/var/www/data/volatile/192.168.0.xx.dat

There are no other files apart from 'relays.dat' in that directory.

I mean you are all throwing noob bugs at me. Yes, in 1984 that's the
sort of shit I used to write. Not these days.

I have a drawer full of T shirts marked 'buffer overrun' 'alloc without
free' 'fopen without fclose'.

The fact is the memory footprint does not increase. So there are no
obvious or simple memory leaks.

I've absolutely covered every error case mentioned here in the one case
of the files that get written and read every few seconds.

It occurs to me that this behaviour started when I made it autoboot
under systemd as well.

Since the consensus seems to be it isn't hardware, or file corruption, I
am trying it launched manually to see if it crashes or not.

Systemd does seem to wrap things in resource limits, and start with a
slightly different ENV, although I can't see that any are being exceeded.

If it wasn't a daemon I would expect it to segfault and show that on
screen. I could run it without daemonising it as well.

So lots of options to try.

As well as soft debuggers.

--
“It is dangerous to be right in matters on which the established
authorities are wrong.”

― Voltaire, The Age of Louis XIV

Re: Weird code crash

<ue1idg$37u15$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7187&group=comp.sys.raspberry-pi#7187
Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 13:24:16 +0100
Organization: A little, after lunch
Lines: 85
Message-ID: <ue1idg$37u15$2@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me> <ygaedizitb9.fsf@akutech.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 12:24:16 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3405861"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eE0M8Z0O4sEyCJY6Ia8HwWBU7crnBQ4Y="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:FpZwNReRK5L4eUVaJshK6N2TQ7Q=
Content-Language: en-GB
In-Reply-To: <ygaedizitb9.fsf@akutech.de>
 by: The Natural Philosop - Fri, 15 Sep 2023 12:24 UTC

On 15/09/2023 12:12, Ralf Fassel wrote:
> You trust the contents of 'outside'-files very much, do you? ;-)
> I don't know who can create files in the directory you're scanning, but
> not *assuring* the input you expect is another possible cause for
> problems...
>
> * The Natural Philosopher <tnp@invalid.invalid>
> | > Further possibilities:
> | > - how is 'filbuf' used after the fread()? If you use it as C-string, make
> | > sure it is 0-terminated (fread() won't do that for you). Maybe use
> | > fgets(3) instead?
> | >
> | dir = opendir(VOLATILE_DIR);
>>
> | if(!dir)
> | return;
> | while ((dp = readdir (dir)) != NULL)
> [looks good, error checks for stat() et al couldn't hurt]
> --<snip-snip>--
> | if(len=strncmp(filbuf,"ZONE",4)) // supposed to reject a file whose contents do not start with ZONE
> |     goto baddata;
> |
> | // looking very much like a temperature file
> | i=(int)filbuf[4] -'1'; // this is our zone from "ZONE2" etc. 1-4 is zone but index is 0-3 so subtract '1'
>
> The access of filbuf[4] is ok (since you checked that there are at least
> 4 characters in the file), but what if nothing follows after the 'ZONE',
> or ZONE is followed by anything but [1-4]?

That cannot happen. It's hard-wired into the code that writes the file.

> => Assert that 'i' is in the valid index range here, before using it as
> index into other arrays.
>
> | p=strstr(filbuf,"\n");
> | if(p)
> | {
> | p++;
> | if(q=strstr(p,"\n"))
> | {
> | *q++=0;
> | thermometers[i].name=strdup(p); // make a copy of the name and attach it to our thermometer structure
>
> Memory leak if thermometers[i].name already contains something.
>
further up the line...

bzero(filbuf,sizeof(filbuf));
/** first thing to do is clean any allocated memory used to store
values. **/
for(i=0;i<NUMBER_RELAYS;i++)
free(thermometers[i].name);

> Other than that, I really would have it running under a debugger or
> valgrind, since then *if* it crashes, you *know* *where* in your code it
> crashes.
>
Last resort. I have to learn how to *use* those tools.
Right now I am working on other stuff and am content to change one thing
at a time to see if that makes any difference.

That is a low user time strategy.

> Good luck hunting!
> R'

Thank you. The input has been valuable. And I now have further
strategies in reserve.

As with all intermittent faults, the thing you need most is a reliable
way to make the fault occur.

--
"The great thing about Glasgow is that if there's a nuclear attack it'll
look exactly the same afterwards."

Billy Connolly

Re: Weird code crash

<ue1k47$383vo$2@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7188&group=comp.sys.raspberry-pi#7188

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: no...@thanks.net (candycanearter07)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 07:53:27 -0500
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <ue1k47$383vo$2@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <op.2a9vj3yca3w0dxdave@hodgins.homeip.net>
<udvk6j$2nupi$2@dont-email.me> <udvnk8$2oip3$1@dont-email.me>
<ue17dj$360b0$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 15 Sep 2023 12:53:27 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4c8dfdc039cb04121cfffa9b96c3520b";
logging-data="3411960"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18rGntWLU3Ys0eRESZfNeojdodlBuU6qTadMiy0TtRPgw=="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:C+mhpWh84hbweMcn2sW/hBAl3UI=
Content-Language: en-US
In-Reply-To: <ue17dj$360b0$2@dont-email.me>
 by: candycanearter07 - Fri, 15 Sep 2023 12:53 UTC

On 9/15/23 04:16, The Natural Philosopher wrote:
> On 14/09/2023 20:40, candycanearter07 wrote:
>> On 9/14/23 13:42, The Natural Philosopher wrote:
>>> I assumed that the thing would have done its own fsck on every boot
>>> anyway...isnt that a debian default?
>>
>> Pretty sure it's a standard, my arch install has it set.
>>>
>>> (The sixth fields are 2 and 1 respectively for the file systems)
>>>
>>>
>>> PARTUUID=b8c9fbb7-01  /boot           vfat    defaults
>>> 0       2
>>> PARTUUID=b8c9fbb7-02  /               ext4    defaults,noatime
>>> 0       1
>>>
>>
>> 1 is fsck check for the root partition and 2 is for others, right
>>
> I looked it up, it merely specifies the order I think, so you are right
> in practice.
>
>

Oh, the thing I learned was that you should always put root as 1 and
everything else as 2 ^^" but that makes more sense

--
--
user <candycane> is generated from /dev/urandom

Re: Weird code crash

<TGs*5Arqz@news.chiark.greenend.org.uk>

https://www.novabbs.com/computers/article-flat.php?id=7189&group=comp.sys.raspberry-pi#7189

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsfeed.xs3.de!callisto.xs3.de!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED.chiark.greenend.org.uk!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: 15 Sep 2023 14:23:49 +0100 (BST)
Organization: University of Cambridge, England
Message-ID: <TGs*5Arqz@news.chiark.greenend.org.uk>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de> <udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de> <ue1b2v$36ji2$1@dont-email.me> <ygaedizitb9.fsf@akutech.de> <ue1idg$37u15$2@dont-email.me>
Injection-Info: chiark.greenend.org.uk; posting-host="chiark.greenend.org.uk:212.13.197.229";
logging-data="14737"; mail-complaints-to="abuse@chiark.greenend.org.uk"
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/5.10.0-22-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo - Fri, 15 Sep 2023 13:23 UTC

In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
> On 15/09/2023 12:12, Ralf Fassel wrote:
> > | {
> > | *q++=0;
> > | thermometers[i].name=strdup(p); // make a copy of the name and attach it to our thermometer structure
> >
> > Memory leak if thermometers[i].name already contains something.
> >
> further up the line...
>
> bzero(filbuf,sizeof(filbuf));
> /** first thing to do is clean any allocated memory used to store
> values. **/
> for(i=0;i<NUMBER_RELAYS;i++)
> free(thermometers[i].name);

You could get a SIGABRT if you were trying to free something that was
already freed. Are you sure those are interlocked such that for each i you
call strdup() exactly once, and subsequently free() exactly once? If there
was some code path that was breaking out of the loop or similar you might
get such behaviour.

Theo

Re: Weird code crash

<ue1nq7$39033$1@dont-email.me>

https://www.novabbs.com/computers/article-flat.php?id=7190&group=comp.sys.raspberry-pi#7190

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tnp...@invalid.invalid (The Natural Philosopher)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 14:56:23 +0100
Organization: A little, after lunch
Lines: 43
Message-ID: <ue1nq7$39033$1@dont-email.me>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me> <ygaedizitb9.fsf@akutech.de>
<ue1idg$37u15$2@dont-email.me> <TGs*5Arqz@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 15 Sep 2023 13:56:23 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="45cc9fa6fbd082e7ff5e80bf84879cb5";
logging-data="3440739"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/lFIHAK20n46KDZlbiEZrfFCCZDmLf1V8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.15.1
Cancel-Lock: sha1:Aw6qGeAQQsyi13jBE6RsvjI78bk=
In-Reply-To: <TGs*5Arqz@news.chiark.greenend.org.uk>
Content-Language: en-GB
 by: The Natural Philosop - Fri, 15 Sep 2023 13:56 UTC

On 15/09/2023 14:23, Theo wrote:
> In comp.sys.raspberry-pi The Natural Philosopher <tnp@invalid.invalid> wrote:
>> On 15/09/2023 12:12, Ralf Fassel wrote:
>>> | {
>>> | *q++=0;
>>> | thermometers[i].name=strdup(p); // make a copy of the name and attach it to our thermometer structure
>>>
>>> Memory leak if thermometers[i].name already contains something.
>>>
>> further up the line...
>>
>> bzero(filbuf,sizeof(filbuf));
>> /** first thing to do is clean any allocated memory used to store
>> values. **/
>> for(i=0;i<NUMBER_RELAYS;i++)
>> free(thermometers[i].name);
>
> You could get a SIGABRT if you were trying to free something that was
> already freed. Are you sure those are interlocked such that for each i you
> call strdup() exactly once, and subsequently free() exactly once? If there
> was some code path that was breaking out of the loop or similar you might
> get such behaviour.
>
Hmm. I free the pointers even for relay zones that don't have
thermometers, whose pointers are 0. That isn't an issue.

But that might be a remotely possible issue. I don't zero the pointers
after freeing them as far as I can tell. The silly thing is that this
program doesn't use the name anyway.

It's used elsewhere.
Well, I don't think it's an issue, but I can zero the pointers anyway
after free()ing.

> Theo

--
"I guess a rattlesnake ain't risponsible fer bein' a rattlesnake, but ah
puts mah heel on um jess the same if'n I catches him around mah chillun".

Re: Weird code crash

<yga8r97ikyf.fsf@akutech.de>

https://www.novabbs.com/computers/article-flat.php?id=7191&group=comp.sys.raspberry-pi#7191

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: ralf...@gmx.de (Ralf Fassel)
Newsgroups: comp.os.linux.misc,comp.sys.raspberry-pi
Subject: Re: Weird code crash
Date: Fri, 15 Sep 2023 16:12:56 +0200
Lines: 35
Message-ID: <yga8r97ikyf.fsf@akutech.de>
References: <udu5c4$2gutd$1@dont-email.me> <ygamsxoixhx.fsf@akutech.de>
<udv982$2lppb$4@dont-email.me> <ygail8biyxm.fsf@akutech.de>
<ue1b2v$36ji2$1@dont-email.me> <ygaedizitb9.fsf@akutech.de>
<ue1idg$37u15$2@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net D/O9UAITYUW00Pr5LP3rPgcuecWBOl1gZrc3wxm5c6Id85e7g=
Cancel-Lock: sha1:Vz3GvqjJsBbhOwpz0fjERHUtlUg= sha1:UAYvchrrbGvxZp7j9+F/Clciqg0= sha256:rJdmX4SckbWLAOSOZGXZPm6pbi29ad/XBd8Yj4HUcj4=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
 by: Ralf Fassel - Fri, 15 Sep 2023 14:12 UTC

* The Natural Philosopher <tnp@invalid.invalid>
| > | if(len=strncmp(filbuf,"ZONE",4)) // supposed to reject a file whose contents do not start with ZONE
| > |     goto baddata;
| > |
| > | // looking very much like a temperature file
| > | i=(int)filbuf[4] -'1'; // this is our zone from "ZONE2" etc. 1-4 is zone but index is 0-3 so subtract '1'
| > The access of filbuf[4] is ok (since you checked that there are at
| > least 4 characters in the file), but what if nothing follows after
| > the 'ZONE', or ZONE is followed by anything but [1-4]?
>
| That cannot happen. It's hard-wired into the code that writes the file

Depending on the permissions of VOLATILE_DIR, it *might* be possible
that *anybody* can drop files in there. Save some "// skip known
bollocks", you just scan every file in VOLATILE_DIR. If I were an
attacker, I sure would try to use that vector, regardless whether the
program in question runs with elevated permissions or not ;-)
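
A range check on the zone index costs only a few lines. The following is a sketch under assumptions: parse_zone_index() and NUMBER_ZONES are hypothetical names, and len is taken to be the byte count actually returned by fread().

```c
#include <stddef.h>
#include <string.h>

#define NUMBER_ZONES 4   /* assumed: zones "ZONE1".."ZONE4" map to 0..3 */

/* Return the array index (0..3) for a buffer starting with "ZONE1".."ZONE4",
 * or -1 if the buffer is too short, has the wrong prefix, or names a zone
 * outside the valid range. len is the number of bytes actually read. */
int parse_zone_index(const char *buf, size_t len)
{
    int i;

    if (len < 5)                        /* need "ZONE" plus one digit */
        return -1;
    if (memcmp(buf, "ZONE", 4) != 0)
        return -1;

    i = buf[4] - '1';
    if (i < 0 || i >= NUMBER_ZONES)     /* reject "ZONE0", "ZONE9", "ZONEx" */
        return -1;
    return i;
}
```

The caller treats -1 the same way as the existing goto baddata path, so a stray or hostile file in VOLATILE_DIR can never produce an out-of-range index into thermometers[].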

| > Other than that, I really would have it running under a debugger or
| > valgrind, since then *if* it crashes, you *know* *where* in your code it
| > crashes.
| >
| Last resort. I have to learn how to *use* those tools.

With valgrind, it is as easy as putting 'valgrind' in front of the
commandline you use to start your program. With gdb, it is a tiny bit
more complicated, agreed. But since these tools are worth learning
anyway for any programmer, the time invested in learning them is not
wasted.

R'
