
Subject                     Author
* slurp/unslurp             muta...@gmail.com
+* Re: slurp/unslurp        Kaz Kylheku
|`* Re: slurp/unslurp       muta...@gmail.com
| `* Re: slurp/unslurp      Kaz Kylheku
|  +* Re: slurp/unslurp     muta...@gmail.com
|  |`- Re: slurp/unslurp    Bart
|  `- Re: slurp/unslurp     Chris M. Thomasson
+- Re: slurp/unslurp        Thiago Adams
`* Re: slurp/unslurp        Keith Thompson
 +* Re: slurp/unslurp       muta...@gmail.com
 |`- Re: slurp/unslurp      Keith Thompson
 `* Re: slurp/unslurp       Malcolm McLean
  +- Re: slurp/unslurp      Richard Damon
  `* Re: slurp/unslurp      Bart
   `* Re: slurp/unslurp     Ben Bacarisse
    +- Re: slurp/unslurp    Bart
    +- Re: slurp/unslurp    Malcolm McLean
    `- Re: slurp/unslurp    olcott

slurp/unslurp

<26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16063&group=comp.lang.c#16063

X-Received: by 2002:a05:620a:411:: with SMTP id 17mr24775191qkp.481.1620141738301;
Tue, 04 May 2021 08:22:18 -0700 (PDT)
X-Received: by 2002:a05:620a:1456:: with SMTP id i22mr24627263qkl.400.1620141738110;
Tue, 04 May 2021 08:22:18 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Tue, 4 May 2021 08:22:17 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=202.169.113.201; posting-account=CeHKkQoAAAAowY1GfiJYG55VVc0s1zaG
NNTP-Posting-Host: 202.169.113.201
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
Subject: slurp/unslurp
From: mutazi...@gmail.com (muta...@gmail.com)
Injection-Date: Tue, 04 May 2021 15:22:18 +0000
Content-Type: text/plain; charset="UTF-8"
 by: muta...@gmail.com - Tue, 4 May 2021 15:22 UTC

I was wondering if there is any problem with adding
a slurp() function to C90 which would do:

fopen()
fseek(SEEK_END)
malloc() of file size
fread of entire file
(not sure about fclose, probably don't do that)

And return a pointer to the malloc() region.
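
A minimal sketch of the steps listed above, written in C90 style. The name slurp() and its signature are hypothetical. It assumes an implementation where fseek(SEEK_END) followed by ftell() gives the byte size of a binary file, which ISO C does not guarantee, and unlike the proposal it closes the file before returning:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not an existing library function: read a whole
   file into a malloc()ed buffer. On success returns the buffer and
   stores its length in *len; on failure returns NULL. */
void *slurp(const char *name, size_t *len)
{
    FILE *f = fopen(name, "rb");
    long size;
    char *buf;

    if (f == NULL) return NULL;

    /* Assumption: SEEK_END works on this binary stream and ftell()
       then reports the file size in bytes. */
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0
        || fseek(f, 0L, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    buf = malloc(size > 0 ? (size_t)size : 1);  /* avoid malloc(0) */
    if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);  /* this sketch closes the file; the proposal is undecided */
    if (buf != NULL) *len = (size_t)size;
    return buf;
}
```

The caller owns the returned buffer and releases it with free().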

It would be unspecified what happens when two processes
do a slurp of the same file, with the intention that on some
implementations it would mean mmap() MAP_SHARED.

And no ability to extend the file or truncate the file.

And an unslurp to reverse the process, writing the memory
to disk, and again, in an mmap() implementation, just do
an munmap().

I guess slurp() should take "r+b" if you wish to unslurp,
and file is updated in-place rather than being rewritten.
A slurp() with "rb" would not cause a write to be done
on unslurp().
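
For the mmap() flavour described above, a rough POSIX-only sketch might look like this. The names slurp_map() and unslurp_map() are hypothetical, and the int flag stands in for the "rb" / "r+b" mode string, so this is an illustration of the idea rather than the proposed interface:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical names. Map a whole file with MAP_SHARED; writable != 0
   plays the role of "r+b", writable == 0 the role of "rb".
   Returns NULL on failure. */
void *slurp_map(const char *name, size_t *len, int writable)
{
    struct stat st;
    void *p;
    int fd = open(name, writable ? O_RDWR : O_RDONLY);

    if (fd < 0) return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);              /* an empty file cannot be mapped */
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size,
             writable ? PROT_READ | PROT_WRITE : PROT_READ,
             MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* Flush any changes and drop the mapping. Returns 0 on success. */
int unslurp_map(void *p, size_t len)
{
    int rc = msync(p, len, MS_SYNC);
    return munmap(p, len) != 0 ? -1 : rc;
}
```

Because the mapping has a fixed length, this version cannot extend or truncate the file, which matches the restriction stated earlier.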

It would be trivial to write a conforming implementation,
while then providing shared memory for systems that have
the required virtual memory smarts.

Perhaps this already exists?

BFN. Paul.

Re: slurp/unslurp

<20210504083041.263@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=16065&group=comp.lang.c#16065

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: 563-365-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Tue, 4 May 2021 16:30:56 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <20210504083041.263@kylheku.com>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
Injection-Date: Tue, 4 May 2021 16:30:56 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="1264d887281844d551afa21ff340f1ec";
logging-data="8045"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19G9gPmAO19JDjGkNaLH2GeI3TJPShnl3U="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:+CsrJ5A5Y4qUfumK2iwtAtjFUx4=
 by: Kaz Kylheku - Tue, 4 May 2021 16:30 UTC

On 2021-05-04, muta...@gmail.com <mutazilah@gmail.com> wrote:
> I was wondering if there is any problem with adding
> a slurp() function to C90 which would do:

Yes, namely that it is 2021. CX for any X <= 2021
is done and dusted.

> fopen()
> fseek(SEEK_END)
> malloc() of file size
> fread of entire file
> (not sure about fclose, probably don't do that)
>
> And return a pointer to the malloc() region.

The best way to ensure the existence of such a function so
that your program can rely on it today is to specify it in
code and add it to your program.
>
> It would be unspecified what happens when two processes
> do a slurp of the same file, with the intention that on some
> implementations it would mean mmap() MAP_SHARED.

It is dishonest to name such a view of the file "slurp", since it isn't
making a copy of the file.

> And no ability to extend the file or truncate the file.

This severely reduces the utility of slurp. The contents of the buffer
should have an independent duration from those of the file so that the
program can rewrite the file, replacing it with a datum which is
calculated from the contents of the slurp buffer.

> And an unslurp to reverse the process, writing the memory
> to disk, and again, in an mmap() implementation, just do
> an munmap().
>
> I guess slurp() should take "r+b" if you wish to unslurp,
> and file is updated in-place rather than being rewritten.
> A slurp() with "rb" would not cause a write to be done
> on unslurp().

This encourages implementing the file transformation by mutating the
buffer in place, while discouraging functional approaches such as
calculating a new buffer from the existing one. It places unreasonable
restrictions on the transformation, not permitting it to extend or
shrink the file. It places unreasonable restrictions on the source and
destination of the update, requiring them to be the same object. What if
the application needs to tweak the data and put it into a different file?

There are considerations of integrity: operations on a writable map are
automatically flushed to the file. If the operation does not complete
by reaching unslurp, the file can be left in a partially updated state,
with no backup. Generally you want to redirect the transformation to a
temporary file and then atomically (if possible) replace the original
file with the successfully produced temp.
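
The temp-file approach can be sketched roughly as follows. replace_file() and its parameters are hypothetical names, and the sketch assumes rename() replaces an existing destination, which POSIX specifies (atomically) but ISO C leaves implementation-defined:

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes to tmpname, then rename it over
   name. Returns 0 on success, -1 on failure. If writing fails, the
   original file is left untouched. */
int replace_file(const char *name, const char *tmpname,
                 const void *buf, size_t len)
{
    FILE *f = fopen(tmpname, "wb");

    if (f == NULL) return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        remove(tmpname);
        return -1;
    }
    if (fclose(f) != 0) {   /* fclose() flushes; failure means lost data */
        remove(tmpname);
        return -1;
    }
    /* Assumption: rename() replaces an existing file. ISO C makes that
       case implementation-defined; POSIX makes it atomic. */
    if (rename(tmpname, name) != 0) {
        remove(tmpname);
        return -1;
    }
    return 0;
}
```

The temporary file should live on the same filesystem as the target, since rename() generally cannot move files across filesystems.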

The unslurp function can instead take a name argument, and simply open
or create the file for writing, truncate it, if necessary, and then dump
the buffer.
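
A name-taking unslurp along those lines could be as small as this sketch. The signature is hypothetical; it relies only on the standard behaviour that fopen() with "wb" creates the file or truncates an existing one:

```c
#include <stdio.h>

/* Hypothetical signature: dump len bytes from buf into the named file,
   creating or truncating it. Returns 0 on success, -1 on failure. */
int unslurp(const char *name, const void *buf, size_t len)
{
    FILE *f = fopen(name, "wb");
    int ok;

    if (f == NULL) return -1;
    ok = fwrite(buf, 1, len, f) == len;
    if (fclose(f) != 0) ok = 0;  /* a failed flush counts as a failed write */
    return ok ? 0 : -1;
}
```

Since the destination is just a name, the buffer can be written to a different file from the one it was read from, and it may be longer or shorter than the original.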

> It would be trivial to write a conforming implementation,

Only to waste effort working around its bad requirements whenever
it is used.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: slurp/unslurp

<9c57ec0b-4222-4df3-a891-5ff3fe09f9a4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16066&group=comp.lang.c#16066

X-Received: by 2002:a05:620a:1121:: with SMTP id p1mr13556584qkk.299.1620155168670;
Tue, 04 May 2021 12:06:08 -0700 (PDT)
X-Received: by 2002:a37:8181:: with SMTP id c123mr26742315qkd.287.1620155168449;
Tue, 04 May 2021 12:06:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Tue, 4 May 2021 12:06:08 -0700 (PDT)
In-Reply-To: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=189.6.254.153; posting-account=xFcAQAoAAAAoWlfpQ6Hz2n-MU9fthxbY
NNTP-Posting-Host: 189.6.254.153
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <9c57ec0b-4222-4df3-a891-5ff3fe09f9a4n@googlegroups.com>
Subject: Re: slurp/unslurp
From: thiago.a...@gmail.com (Thiago Adams)
Injection-Date: Tue, 04 May 2021 19:06:08 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Thiago Adams - Tue, 4 May 2021 19:06 UTC

On Tuesday, May 4, 2021 at 12:22:24 PM UTC-3, muta...@gmail.com wrote:
> I was wondering if there is any problem with adding
> a slurp() function to C90 which would do:
>
> fopen()
> fseek(SEEK_END)
> malloc() of file size
> fread of entire file

I am using this:
http://thradams.com/readfile.htm

It uses stat() to get the size, and it skips the BOM if present.

Re: slurp/unslurp

<87h7ji9uhc.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=16067&group=comp.lang.c#16067

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Tue, 04 May 2021 12:20:15 -0700
Organization: None to speak of
Lines: 28
Message-ID: <87h7ji9uhc.fsf@nosuchdomain.example.com>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="822ef5e91db6b5077a2495b40917cd6a";
logging-data="21684"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18MKOssLelM/St79VS1akXr"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Cancel-Lock: sha1:JL2k1Ibg01XVkvC9ORT+PI37250=
sha1:7fzvnttqEXZHh3M+FOLYoPif0Gk=
 by: Keith Thompson - Tue, 4 May 2021 19:20 UTC

"muta...@gmail.com" <mutazilah@gmail.com> writes:
> I was wondering if there is any problem with adding
> a slurp() function to C90 which would do:
>
> fopen()
> fseek(SEEK_END)
> malloc() of file size
> fread of entire file
> (not sure about fclose, probably don't do that)
>
> And return a pointer to the malloc() region.

The biggest problem is that C90 is obsolete, replaced by newer editions
of the C standard. It will not be updated.

If you want to define your own specification based on C90 plus your
slurp() function, nobody will stop you. (There might be some copyright
issues; I won't get into that.)

But it's such a simple function, you can just write it and include it in
any program that needs it, or you can implement it in a library. I see
no point in adding it to the language standard, especially with so many
unresolved issues.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Re: slurp/unslurp

<c4e9d436-f9a5-4275-8235-2ca812ab1786n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16069&group=comp.lang.c#16069

X-Received: by 2002:a05:620a:411:: with SMTP id 17mr26399328qkp.481.1620163925188;
Tue, 04 May 2021 14:32:05 -0700 (PDT)
X-Received: by 2002:a05:622a:1754:: with SMTP id l20mr13033797qtk.120.1620163925048;
Tue, 04 May 2021 14:32:05 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Tue, 4 May 2021 14:32:04 -0700 (PDT)
In-Reply-To: <87h7ji9uhc.fsf@nosuchdomain.example.com>
Injection-Info: google-groups.googlegroups.com; posting-host=202.169.113.201; posting-account=CeHKkQoAAAAowY1GfiJYG55VVc0s1zaG
NNTP-Posting-Host: 202.169.113.201
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com> <87h7ji9uhc.fsf@nosuchdomain.example.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c4e9d436-f9a5-4275-8235-2ca812ab1786n@googlegroups.com>
Subject: Re: slurp/unslurp
From: mutazi...@gmail.com (muta...@gmail.com)
Injection-Date: Tue, 04 May 2021 21:32:05 +0000
Content-Type: text/plain; charset="UTF-8"
 by: muta...@gmail.com - Tue, 4 May 2021 21:32 UTC

On Wednesday, May 5, 2021 at 5:20:35 AM UTC+10, Keith Thompson wrote:

> If you want to define your own specification based on C90 plus your
> slurp() function, nobody will stop you. (There might be some copyright
> issues; I won't get into that.)

It would be interesting to see ISO take me to court.
I know companies take other companies to court,
but I've never heard of a standards organization
taking an individual (or another standards
organization) to court.

They will have a bit of a problem showing lost
revenue when they refuse to sell C90. They would
at least have to show that they lost revenue from
C21 or whatever because of C90+.

But C90+ would just use C90 as a reference, i.e. delete
gets() and add '\e', "\e" and slurp/unslurp.

> But it's such a simple function, you can just write it and include it in
> any program that needs it, or you can implement it in a library. I see
> no point in adding it to the language standard,

Well what do you think the criteria should be for
inclusion of a new facility in an ISO language
standard? How do you suggest people read files
into memory so that they can use pointers instead
of file operations?

If they do it manually (malloc/fread) as I have done
myself for 30+ years, it prevents the file from being
made shareable, independent of the application
program itself (it still needs support from the CRT).

> especially with so many unresolved issues.

Can you list them please?

Thanks. Paul.

Re: slurp/unslurp

<cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16071&group=comp.lang.c#16071

X-Received: by 2002:ae9:e90b:: with SMTP id x11mr27835570qkf.261.1620165254109;
Tue, 04 May 2021 14:54:14 -0700 (PDT)
X-Received: by 2002:a37:9903:: with SMTP id b3mr27126117qke.17.1620165253924;
Tue, 04 May 2021 14:54:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news.muarf.org!nntpfeed.proxad.net!feeder1-1.proxad.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Tue, 4 May 2021 14:54:13 -0700 (PDT)
In-Reply-To: <20210504083041.263@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=202.169.113.201; posting-account=CeHKkQoAAAAowY1GfiJYG55VVc0s1zaG
NNTP-Posting-Host: 202.169.113.201
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com> <20210504083041.263@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com>
Subject: Re: slurp/unslurp
From: mutazi...@gmail.com (muta...@gmail.com)
Injection-Date: Tue, 04 May 2021 21:54:14 +0000
Content-Type: text/plain; charset="UTF-8"
 by: muta...@gmail.com - Tue, 4 May 2021 21:54 UTC

On Wednesday, May 5, 2021 at 2:31:09 AM UTC+10, Kaz Kylheku wrote:

> > I was wondering if there is any problem with adding
> > a slurp() function to C90 which would do:

> Yes, namely that it is 2021. CX for any X <= 2021
> is done and dusted.

That's what GNU will tell you about GCC 3.2.3 too.
And Binutils 2.14a.

But I have picked both of the above up, and have been
banging them into C90 compliance and fixing bugs
and making enhancements. I forked Hercules 3.07 too,
circa 2011.

Instead of chasing the latest greatest version of things,
I'm trying to get something that actually works to my
satisfaction.

> > fopen()
> > fseek(SEEK_END)
> > malloc() of file size
> > fread of entire file
> > (not sure about fclose, probably don't do that)
> >
> > And return a pointer to the malloc() region.

> The best way to ensure the existence of such a function so
> that your program can rely on it today is to specify it in
> code and add it to your program.

Well, I have the option of adding it to PDPCLIB.
Currently PDPCLIB is pure C90 and I have refused
to diverge one iota from it. One person complained
bitterly about me treating C90 as a sacred text,
willing to throw everything else in the computer
world under the bus.

> > It would be unspecified what happens when two processes
> > do a slurp of the same file, with the intention that on some
> > implementations it would mean mmap() MAP_SHARED.

> It is dishonest to name such a view of the file "slurp", since it isn't
> making a copy of the file.

I'm not sure I agree with that, either part, but I am happy
to rename the function.

> > And no ability to extend the file or truncate the file.

> This severely reduces the utility of slurp. The contents of the buffer
> should have an independent duration from those of the file so that the
> program can rewrite the file, replacing it with a datum which is
> calculated from the contents of the slurp buffer.

I think I agree with this, and actually we need another
two functions to do what you want.

But what I described above, with its restrictions, was
deliberately designed to allow file sharing among
processes.

> > And an unslurp to reverse the process, writing the memory
> > to disk, and again, in an mmap() implementation, just do
> > an munmap().
> >
> > I guess slurp() should take "r+b" if you wish to unslurp,
> > and file is updated in-place rather than being rewritten.
> > A slurp() with "rb" would not cause a write to be done
> > on unslurp().

> This encourages implementing the file transformation by mutating the
> buffer in place, while discouraging functional approaches such as
> calculating a new buffer from the existing one. It places unreasonable
> restrictions on the transformation, not permitting it to extend or
> shrink the file. It places unreasonable restrictions on the source and
> destination of the update, requiring them to be the same object. What if
> the application needs to tweak the data and put it into a different file?

Yes, see above. You have a non-sharing, and possibly
more common, API requirement.

> There are considerations of integrity: operations on a writable map are
> automatically flushed to the file. If the operation does not complete
> by reaching unslurp, the file can be left in a partially updated state,
> with no backup.

This is up to the implementation. The OS could be
designed to flush the data on program termination.
It is the OS that owns this data at all times anyway
if it is mmap() based.

And the file may be read-only with no integrity issues.

> Generally you want to redirect the transformation to a
> temporary file and then atomically (if possible) replace the original
> file with the successfully produced temp.

Maybe this is something you can negotiate with a
C90+ author or an OS author, for higher integrity.
I don't see a problem with simply saying
"implementation-defined" with regard to that.

> The unslurp function can instead take a name argument, and simply open
> or create the file for writing, truncate it, if necessary, and then dump
> the buffer.

Yes, that's a good idea for the second set of operations.

> > It would be trivial to write a conforming implementation,

> Only to waste effort working around its bad requirements whenever
> it is used.

Sure. That's why slurp/unslurp exist in comp.lang.c not
Sourceforge/PDPCLIB.

Also note that I'm here because a real-world requirement
popped up that forced me to finally investigate memory
mapped files. And it is Windows, so the underlying API
seems to be MapViewOfFile(). But I don't want to code
that directly, even though Sqlite has. I want to move it
out of Sqlite and into PDPCLIB. Move mmap out of
Sqlite too.

BFN. Paul.

Re: slurp/unslurp

<87czu69m9b.fsf@nosuchdomain.example.com>

https://www.novabbs.com/devel/article-flat.php?id=16072&group=comp.lang.c#16072

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Tue, 04 May 2021 15:17:52 -0700
Organization: None to speak of
Lines: 16
Message-ID: <87czu69m9b.fsf@nosuchdomain.example.com>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com>
<c4e9d436-f9a5-4275-8235-2ca812ab1786n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="d1031b6d70822eb6be26b1f1c0530235";
logging-data="397"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Mq1kpId/FZzwt4je6wP1X"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Cancel-Lock: sha1:aZJNlg36pj3593j+q13otlCTIYg=
sha1:MrvV5/TXfpZYVqlYR+4hxxgPmOo=
 by: Keith Thompson - Tue, 4 May 2021 22:17 UTC

"muta...@gmail.com" <mutazilah@gmail.com> writes:
> On Wednesday, May 5, 2021 at 5:20:35 AM UTC+10, Keith Thompson wrote:
>> If you want to define your own specification based on C90 plus your
>> slurp() function, nobody will stop you. (There might be some copyright
>> issues; I won't get into that.)
>
> It would be interesting to see ISO take me to court.

Not to me.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Re: slurp/unslurp

<5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16078&group=comp.lang.c#16078

X-Received: by 2002:a05:620a:4092:: with SMTP id f18mr1121652qko.63.1620205872689;
Wed, 05 May 2021 02:11:12 -0700 (PDT)
X-Received: by 2002:a05:620a:1036:: with SMTP id a22mr17035653qkk.186.1620205872433;
Wed, 05 May 2021 02:11:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 5 May 2021 02:11:12 -0700 (PDT)
In-Reply-To: <87h7ji9uhc.fsf@nosuchdomain.example.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:fc32:afe8:789f:1a87;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:fc32:afe8:789f:1a87
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com> <87h7ji9uhc.fsf@nosuchdomain.example.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
Subject: Re: slurp/unslurp
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 05 May 2021 09:11:12 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Malcolm McLean - Wed, 5 May 2021 09:11 UTC

On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
> "muta...@gmail.com" <muta...@gmail.com> writes:
> > I was wondering if there is any problem with adding
> > a slurp() function to C90 which would do:
> >
> > fopen()
> > fseek(SEEK_END)
> > malloc() of file size
> > fread of entire file
> > (not sure about fclose, probably don't do that)
> >
> > And return a pointer to the malloc() region.
> The biggest problem is that C90 is obsolete, replaced by newer editions
> of the C standard. It will not be updated.
>
> If you want to define your own specification based on C90 plus your
> slurp() function, nobody will stop you. (There might be some copyright
> issues; I won't get into that.)
>
> But it's such a simple function, you can just write it and include it in
> any program that needs it, or you can implement it in a library. I see
> no point in adding it to the language standard, especially with so many
> unresolved issues.
>
There is a point. To implement fslurp() portably, you cannot fseek() to the file
end position to get the size, fseek() back, then do an fread(), which would
be the most efficient way of getting the input.
Instead my fslurp() uses an fgetc() loop and an expanding realloc() buffer.
Since efficiency isn't a major concern for most uses of fslurp(), this doesn't
matter too much. But by putting it into the standard, implementations could
use the optimal method.
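
For reference, the reason the seek-and-size approach is not portable is that ISO C says a binary stream need not meaningfully support fseek() with SEEK_END, and ftell() on a text stream is not a byte count. A middle ground between that and a byte-at-a-time loop, shown here as a sketch and not as the fslurp() described above, is to read with fread() in blocks while doubling a realloc() buffer. The name fslurp_blocks() and the 4096-byte starting size are arbitrary choices for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fslurp() variant: read the rest of stream f into a
   malloc()ed, NUL-terminated buffer. Stores the byte count (without the
   terminator) in *len. Returns NULL on read or allocation failure. */
char *fslurp_blocks(FILE *f, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    char *buf = malloc(cap), *tmp;

    if (buf == NULL) return NULL;
    /* Always leave one byte free for the terminator. */
    while ((got = fread(buf + used, 1, cap - used - 1, f)) > 0) {
        used += got;
        if (used == cap - 1) {      /* full: double the capacity */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); return NULL; }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

This uses only standard library calls and never needs to know the file size in advance, so it also works on streams that cannot seek.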

Re: slurp/unslurp

<fQukI.337591$DJ2.61372@fx42.iad>

https://www.novabbs.com/devel/article-flat.php?id=16079&group=comp.lang.c#16079

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fdcspool3.netnews.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer04.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx42.iad.POSTED!not-for-mail
Subject: Re: slurp/unslurp
Newsgroups: comp.lang.c
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com>
<5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
From: Rich...@Damon-Family.org (Richard Damon)
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
Gecko/20100101 Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Lines: 41
Message-ID: <fQukI.337591$DJ2.61372@fx42.iad>
X-Complaints-To: abuse@easynews.com
Organization: Forte - www.forteinc.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Wed, 5 May 2021 06:55:08 -0400
X-Received-Bytes: 2908
 by: Richard Damon - Wed, 5 May 2021 10:55 UTC

On 5/5/21 5:11 AM, Malcolm McLean wrote:
> On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
>> "muta...@gmail.com" <muta...@gmail.com> writes:
>>> I was wondering if there is any problem with adding
>>> a slurp() function to C90 which would do:
>>>
>>> fopen()
>>> fseek(SEEK_END)
>>> malloc() of file size
>>> fread of entire file
>>> (not sure about fclose, probably don't do that)
>>>
>>> And return a pointer to the malloc() region.
>> The biggest problem is that C90 is obsolete, replaced by newer editions
>> of the C standard. It will not be updated.
>>
>> If you want to define your own specification based on C90 plus your
>> slurp() function, nobody will stop you. (There might be some copyright
>> issues; I won't get into that.)
>>
>> But it's such a simple function, you can just write it and include it in
>> any program that needs it, or you can implement it in a library. I see
>> no point in adding it to the language standard, especially with so many
>> unresolved issues.
>>
> There is a point. To implement fslurp() portably, you cannot fseek() to the file
> end position to get the size, fseek() back, then do an fread(), which would
> be the most efficient way of getting the input.
> Instead my fslurp() uses an fgetc() loop and an expanding realloc() buffer.
> Since efficiency isn't a major concern for most uses of fslurp(), this doesn't
> matter too much. But by putting it into the standard, implementations could
> use the optimal method.
>

That says it wants to be in A Standard and implemented by the
implementation; it doesn't say it needs to be specified by the C
Standard. It could be in something like POSIX, which defines additional
capabilities for the system.

Particularly since you want it back-ported to older Standards, that is
the way to do it.

Re: slurp/unslurp

<saxkI.225469$tDk2.96581@fx06.ams4>

https://www.novabbs.com/devel/article-flat.php?id=16080&group=comp.lang.c#16080

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx06.ams4.POSTED!not-for-mail
Subject: Re: slurp/unslurp
Newsgroups: comp.lang.c
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com>
<5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
From: bc...@freeuk.com (Bart)
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 210505-0, 05/05/2021), Outbound message
X-Antivirus-Status: Clean
Lines: 92
Message-ID: <saxkI.225469$tDk2.96581@fx06.ams4>
X-Complaints-To: http://netreport.virginmedia.com
NNTP-Posting-Date: Wed, 05 May 2021 13:35:20 UTC
Organization: virginmedia.com
Date: Wed, 5 May 2021 14:35:16 +0100
X-Received-Bytes: 3715
 by: Bart - Wed, 5 May 2021 13:35 UTC

On 05/05/2021 10:11, Malcolm McLean wrote:
> On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
>> "muta...@gmail.com" <muta...@gmail.com> writes:
>>> I was wondering if there is any problem with adding
>>> a slurp() function to C90 which would do:
>>>
>>> fopen()
>>> fseek(SEEK_END)
>>> malloc() of file size
>>> fread of entire file
>>> (not sure about fclose, probably don't do that)
>>>
>>> And return a pointer to the malloc() region.
>> The biggest problem is that C90 is obsolete, replaced by newer editions
>> of the C standard. It will not be updated.
>>
>> If you want to define your own specification based on C90 plus your
>> slurp() function, nobody will stop you. (There might be some copyright
>> issues; I won't get into that.)
>>
>> But it's such a simple function, you can just write it and include it in
>> any program that needs it, or you can implement it in a library. I see
>> no point in adding it to the language standard, especially with so many
>> unresolved issues.
>>
> There is a point. To implement fslurp() portably, you cannot fseek() to the file
> end position to get the size, fseek() back, then do an fread(), which would
> be the most efficient way of getting the input.
> Instead my fslurp() uses an fgetc() loop and an expanding realloc() buffer.
> Since efficiency isn't a major concern for most uses of fslurp(), this doesn't
> matter too much. But by putting it into the standard, implementations could
> use the optimal method.
>

You always want it to be as efficient as possible. The program below,
when run on Windows (so it relies on msvcrt.dll), takes 450ms to read the
test file (sqlite3.c).

That is about the same time as it takes me to compile the whole thing
(sqlite3.c to sqlite3.obj).

Using my normal method of first determining the size then doing a single
fread, it takes about 8ms, over 50 times faster.

In both cases, the file data will have been cached by the OS.

----------------------------
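#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define filename "/cx/big/sqlite3.c"

char* pfiledata;
int pcapacity=1024;
int psize;

/* Allocate the initial buffer; storechar() grows it on demand. */
void newfilebuffer(void){
    pfiledata=malloc(pcapacity);
    if (pfiledata==NULL) exit(EXIT_FAILURE);
    psize=0;
}

/* Append one byte, doubling the buffer when it is full. */
void storechar(int c) {
    char* newfiledata;

    ++psize;
    if (psize>=pcapacity) {
        pcapacity*=2;
        newfiledata=realloc(pfiledata,pcapacity);
        if (newfiledata==NULL) exit(EXIT_FAILURE);
        pfiledata=newfiledata;
    }
    pfiledata[psize-1]=c;
}

int main(void) {
    FILE* f=fopen(filename,"rb");
    int c;

    if (f==NULL) exit(EXIT_FAILURE);

    newfilebuffer();

    /* Read one character at a time: the slow method being measured. */
    while (1) {
        c=fgetc(f);
        if (c==EOF) break;
        storechar(c);
    }
    storechar(0);   /* terminator; note that psize counts it */
    fclose(f);

    printf("P=%p size=%d\n", (void*)pfiledata, psize);
    return 0;
}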

Re: slurp/unslurp

<874kfhz2ih.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=16081&group=comp.lang.c#16081

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Wed, 05 May 2021 15:19:50 +0100
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <874kfhz2ih.fsf@bsb.me.uk>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com>
<5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
<saxkI.225469$tDk2.96581@fx06.ams4>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="b816ff399d8a49d6e80c009f584e409e";
logging-data="5636"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/jo9nLC7T+bVYkkV3fVq8ZoM9M75RIaow="
Cancel-Lock: sha1:ZDIRxcvJcXjmfb5w1cO5cbvT3lY=
sha1:EXeZZY3Ivzm/i51pc5dR8SQZteQ=
X-BSB-Auth: 1.d8576d296d215d0061a3.20210505151950BST.874kfhz2ih.fsf@bsb.me.uk
 by: Ben Bacarisse - Wed, 5 May 2021 14:19 UTC

Bart <bc@freeuk.com> writes:

> On 05/05/2021 10:11, Malcolm McLean wrote:
>> On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
>>> "muta...@gmail.com" <muta...@gmail.com> writes:
>>>> I was wondering if there is any problem with adding
>>>> a slurp() function to C90 which would do:
>>>>
>>>> fopen()
>>>> fseek(SEEK_END)
>>>> malloc() of file size
>>>> fread of entire file
>>>> (not sure about fclose, probably don't do that)
>>>>
>>>> And return a pointer to the malloc() region.
>>> The biggest problem is that C90 is obsolete, replaced by newer editions
>>> of the C standard. It will not be updated.
>>>
>>> If you want to define your own specification based on C90 plus your
>>> slurp() function, nobody will stop you. (There might be some copyright
>>> issues; I won't get into that.)
>>>
>>> But it's such a simple function, you can just write it and include it in
>>> any program that needs it, or you can implement it in a library. I see
>>> no point in adding it to the language standard, especially with so many
>>> unresolved issues.
>>>
>> ... To implement fslurp() portably, you cannot fseek()
>> to the file end position to get the size, fseek() back, then do an
>> fread(), which would be the most efficient way of getting the input.
>> Instead my fslurp() uses an fgetc() loop and an expanding realloc()
>> buffer.

The question I pose to Bart applies to this too.

>> Since efficiency isn't a major concern for most uses of
>> fslurp(), this doesn't matter too much. But by putting it into the
>> standard, implementations could use the optimal method.
>
> You always want it to be as efficient as possible.

What are you suggesting to make it efficient? I think you posted the
code below as an example of what you consider inefficient.

> while (1) {
> c=fgetc(f);
> if (c==EOF) break;
> storechar(c);
> }

Why not use fread and expand the size to read as the buffer grows? That
may not help much on systems with very good fgetc implementations, but
it can't hurt.
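
A minimal sketch of that suggestion, assuming nothing beyond standard C: each fread() asks for all the free space in the buffer, and the buffer doubles whenever it fills. The name slurp_growing is hypothetical and the starting capacity of 1024 is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the rest of stream f into a malloc'd,
   NUL-terminated buffer without knowing the size in advance.
   Stores the byte count in *len.  Returns NULL on failure. */
char *slurp_growing(FILE *f, size_t *len)
{
    size_t cap = 1024, used = 0, got;
    char *buf, *tmp;

    assert(f != NULL && len != NULL);
    buf = malloc(cap);
    if (buf == NULL) return NULL;

    /* Ask for all the free space, keeping one byte for the terminator.
       A zero count means end-of-file or an error. */
    while ((got = fread(buf + used, 1, cap - used - 1, f)) > 0) {
        used += got;
        if (used == cap - 1) {          /* full: double the capacity */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); return NULL; }

    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Because it never seeks, this version also works on streams such as pipes, where the size cannot be determined up front.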

BTW, Bart writes non-canonical C. I think most C programmers would
write

while ((c = fgetc(f)) != EOF) storechar(c);

(with layout etc. adjusted for preference).

--
Ben.

Re: slurp/unslurp

<aNykI.226279$tDk2.191702@fx06.ams4>

https://www.novabbs.com/devel/article-flat.php?id=16083&group=comp.lang.c#16083

Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx06.ams4.POSTED!not-for-mail
Subject: Re: slurp/unslurp
Newsgroups: comp.lang.c
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com>
<5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
<saxkI.225469$tDk2.96581@fx06.ams4> <874kfhz2ih.fsf@bsb.me.uk>
From: bc...@freeuk.com (Bart)
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <874kfhz2ih.fsf@bsb.me.uk>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 210505-0, 05/05/2021), Outbound message
X-Antivirus-Status: Clean
Lines: 42
Message-ID: <aNykI.226279$tDk2.191702@fx06.ams4>
X-Complaints-To: http://netreport.virginmedia.com
NNTP-Posting-Date: Wed, 05 May 2021 15:24:54 UTC
Organization: virginmedia.com
Date: Wed, 5 May 2021 16:24:50 +0100
X-Received-Bytes: 2448
 by: Bart - Wed, 5 May 2021 15:24 UTC

On 05/05/2021 15:19, Ben Bacarisse wrote:
> Bart <bc@freeuk.com> writes:
>
>> On 05/05/2021 10:11, Malcolm McLean wrote:

>>> Instead my fslurp() uses an fgetc() loop and an expanding realloc()
>>> buffer.

>> You always want it to be as efficient as possible.
>
> What are you suggesting to make it efficient? I think you posted the
> code below as an example of what you consider inefficient.

It uses Malcolm's suggestion of an fgetc loop.

>> while (1) {
>> c=fgetc(f);
>> if (c==EOF) break;
>> storechar(c);
>> }
>
> Why not use fread and expand the size to read as the buffer grows?

You mean, tentatively read as much as the buffer allows, and monitor the
number of bytes actually read?

I was going to try it, but realised it's going to be rather fiddly.
(However a quick test reading my test file as 7756 1KB blocks with
fread, into a fixed 8MB buffer, was nearly as fast as my single fread of
the whole file.)

The fact is, nearly every file I'm going to read in using my whole-file
method is going to be small. My test file that took 8ms here is over 100
times bigger than a typical input file, and still only takes up 0.1% of
my machine's memory.

It's a non-issue. If I was to encounter a file that is too big to fit
into memory, then it will be too big whichever method I use.

Re: slurp/unslurp

<20210505083459.437@kylheku.com>

https://www.novabbs.com/devel/article-flat.php?id=16084&group=comp.lang.c#16084

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: 563-365-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Wed, 5 May 2021 15:41:23 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <20210505083459.437@kylheku.com>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<20210504083041.263@kylheku.com>
<cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com>
Injection-Date: Wed, 5 May 2021 15:41:23 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c9c79706163449510796aaa69204f8c4";
logging-data="16052"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+geQAJHQ6LndRYQKUYjrn31K6/Xrgg5Mk="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:3+q2KCEEqo5HcZ1Ti1GNTdL3ziE=
 by: Kaz Kylheku - Wed, 5 May 2021 15:41 UTC

On 2021-05-04, muta...@gmail.com <mutazilah@gmail.com> wrote:
> Sure. That's why slurp/unslurp exist in comp.lang.c not
> Sourceforge/PDPCLIB.
>
> Also note that I'm here because a real-world requirement
> popped up that forced me to finally investigate memory
> mapped files. And it is Windows, so the underlying API
> seems to be MapViewOfFile(). But I don't want to code
> that directly, even though Sqlite has. I want to move it
> out of Sqlite and into PDPCLIB. Move mmap out of
> Sqlite too.

I think you're neglecting to consider the viewpoint that a group of
people already took the C standard and made a more extensive standard
which includes features like memory mapping of files. That standard is
POSIX.

There was POSIX in 1990, which extended C90 with useful functions.

To be fair, the ISO C people have themselves begun to neglect this view
and started reinventing POSIX wheels, such as adding threads to C that
are similar to POSIX threads, with different naming.

What should have happened is that C and POSIX people should have
negotiated to migrate parts of POSIX threads into C, with exactly the
same syntax. The same <pthread.h> header and pthread_mutex_lock
and whatever. All the more simple, easily portable aspects could be
encoded in C, so that POSIX then just describes extensions, like real
time signals or whatever is not put into C.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Re: slurp/unslurp

<5f8fcaba-03d0-40ae-8f61-29beb6d906d1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16085&group=comp.lang.c#16085

X-Received: by 2002:a37:e11:: with SMTP id 17mr24957564qko.499.1620229302355;
Wed, 05 May 2021 08:41:42 -0700 (PDT)
X-Received: by 2002:ad4:48c4:: with SMTP id v4mr21958226qvx.16.1620229302216;
Wed, 05 May 2021 08:41:42 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 5 May 2021 08:41:39 -0700 (PDT)
In-Reply-To: <874kfhz2ih.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=2a00:23a8:400a:5601:fc32:afe8:789f:1a87;
posting-account=Dz2zqgkAAADlK5MFu78bw3ab-BRFV4Qn
NNTP-Posting-Host: 2a00:23a8:400a:5601:fc32:afe8:789f:1a87
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<87h7ji9uhc.fsf@nosuchdomain.example.com> <5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com>
<saxkI.225469$tDk2.96581@fx06.ams4> <874kfhz2ih.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5f8fcaba-03d0-40ae-8f61-29beb6d906d1n@googlegroups.com>
Subject: Re: slurp/unslurp
From: malcolm....@gmail.com (Malcolm McLean)
Injection-Date: Wed, 05 May 2021 15:41:42 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Malcolm McLean - Wed, 5 May 2021 15:41 UTC

On Wednesday, 5 May 2021 at 15:20:01 UTC+1, Ben Bacarisse wrote:
> Bart <b...@freeuk.com> writes:
>
> > On 05/05/2021 10:11, Malcolm McLean wrote:
> >> On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
> >>> "muta...@gmail.com" <muta...@gmail.com> writes:
> >>>> I was wondering if there is any problem with adding
> >>>> a slurp() function to C90 which would do:
> >>>>
> >>>> fopen()
> >>>> fseek(SEEK_END)
> >>>> malloc() of file size
> >>>> fread of entire file
> >>>> (not sure about fclose, probably don't do that)
> >>>>
> >>>> And return a pointer to the malloc() region.
> >>> The biggest problem is that C90 is obsolete, replaced by newer editions
> >>> of the C standard. It will not be updated.
> >>>
> >>> If you want to define your own specification based on C90 plus your
> >>> slurp() function, nobody will stop you. (There might be some copyright
> >>> issues; I won't get into that.)
> >>>
> >>> But it's such a simple function, you can just write it and include it in
> >>> any program that needs it, or you can implement it in a library. I see
> >>> no point in adding it to the language standard, especially with so many
> >>> unresolved issues.
> >>>
> >> ... To implement fslurp() portably, you cannot fseek()
> >> to the file end position to get the size, fseek() back, then do an
> >> fread(), which would be the most efficient way of getting the input.
> >> Instead my fslurp() uses an fgetc() loop and an expanding realloc()
> >> buffer.
> The question I pose to Bart applies to this too.
> >> Since efficiency isn't a major concern for most uses of
> >> fslurp(), this doesn't matter too much. But by putting it into the
> >> standard, implementations could use the optimal method.
> >
> > You always want it to be as efficient as possible.
> What are you suggesting to make it efficient? I think you posted the
> code below as an example of what you consider inefficient.
> > while (1) {
> > c=fgetc(f);
> > if (c==EOF) break;
> > storechar(c);
> > }
> Why not use fread and expand the size to read as the buffer grows? That
> may not help much on systems with very good fgetc implementations, but
> it can't hurt.
>
There's a case for an optimal version which isn't portable, and there's a
case for a version which isn't optimised, without being catastrophic, because
the run time will be insignificant in relation to everything else. There's a
weaker case for a semi-optimised version.
>
> BTW, Bart writes non-canonical C. I think most C programmers would
> write
>
> while ((c = fgetc(f)) != EOF) storechar(c);
>
> (with layout etc. adjusted for preference).
>
You said it. People are used to the pattern of reading a stream
a character at a time, then processing it character by character
within the read loop.

Re: slurp/unslurp

<faa28c1f-d689-48e7-b42a-e7d243d774e9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=16086&group=comp.lang.c#16086

X-Received: by 2002:a37:270d:: with SMTP id n13mr1104397qkn.146.1620254412497; Wed, 05 May 2021 15:40:12 -0700 (PDT)
X-Received: by 2002:a0c:c447:: with SMTP id t7mr1079730qvi.60.1620254412335; Wed, 05 May 2021 15:40:12 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 5 May 2021 15:40:12 -0700 (PDT)
In-Reply-To: <20210505083459.437@kylheku.com>
Injection-Info: google-groups.googlegroups.com; posting-host=202.169.113.201; posting-account=CeHKkQoAAAAowY1GfiJYG55VVc0s1zaG
NNTP-Posting-Host: 202.169.113.201
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com> <20210504083041.263@kylheku.com> <cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com> <20210505083459.437@kylheku.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <faa28c1f-d689-48e7-b42a-e7d243d774e9n@googlegroups.com>
Subject: Re: slurp/unslurp
From: mutazi...@gmail.com (muta...@gmail.com)
Injection-Date: Wed, 05 May 2021 22:40:12 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 258
 by: muta...@gmail.com - Wed, 5 May 2021 22:40 UTC

On Thursday, May 6, 2021 at 1:41:34 AM UTC+10, Kaz Kylheku wrote:

> > Also note that I'm here because a real-world requirement
> > popped up that forced me to finally investigate memory
> > mapped files. And it is Windows, so the underlying API
> > seems to be MapViewOfFile(). But I don't want to code
> > that directly, even though Sqlite has. I want to move it
> > out of Sqlite and into PDPCLIB. Move mmap out of
> > Sqlite too.

> I think you're neglecting to consider the viewpoint that a group of
> people already took the C standard and made a more extensive standard
> which includes features like memory mapping of files. That standard is
> POSIX.
>
> There was POSIX in 1990, which extended C90 with useful functions.
>
> To be fair, the ISO C people have themselves begun to neglect this view
> and started reinventing POSIX wheels, such as adding threads to C that
> are similar to POSIX threads, with different naming.
>
> What should have happened is that C and POSIX people should have
> negotiated to migrate parts of POSIX threads into C, with exactly the
> same syntax. The same <pthread.h> header and pthread_mutex_lock
> and whatever. All the more simple, easily portable aspects could be
> encoded in C, so that POSIX then just describes extensions, like real
> time signals or whatever is not put into C.

Ok, that's a good point - why not just copy aspects
of POSIX. You can also ask why Windows didn't do
that too. Or since it's too late now, do the reverse -
why doesn't C copy Windows?

I've just taken another look at mmap, to see if it can
be used unchanged, no need for slurp().

https://man7.org/linux/man-pages/man2/mmap.2.html

#include <sys/mman.h>

Directories don't exist in MVS.

int fd

Files are not numbered in MVS. They do have an
associated DCB though, so if POSIX had said:

DCB *dcb

instead, MVS users may not have complained.

If addr is NULL

This is complicating the implementation. C90 seems
to be simple.

(page-aligned)

No such thing as pages in MSDOS.

offset must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE).

Yikes! That just made my brain freeze.

After the mmap() call has returned, the file descriptor, fd, can
be closed immediately without invalidating the mapping.

Sounds like a recipe for disaster to me. You need to
fopen() the file again to write it out, and hope it still
exists. What's the benefit of closing a file before you
have finished using it? After all the effort of opening
it.
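
For reference, the man-page behaviour quoted above can be sketched as follows. This is POSIX-only code, not C90, and the names map_file and unmap_file are hypothetical. The mapping remains usable after close() because it does not depend on the descriptor staying open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): map a whole file read-only.
   Returns NULL on failure or for an empty file. */
const char *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after this */
    if (p == MAP_FAILED) return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file(). */
void unmap_file(const char *p, size_t len)
{
    munmap((void *)p, len);
}
```

With a writable MAP_SHARED mapping, stores through the pointer are carried back to the file by the system, so the file does not have to be reopened to write the data out.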

The prot argument describes the desired memory protection of the mapping

No such thing as memory protection on 8086. This
would be the first thing added to C90 that introduced
the concept.

PROT_EXEC
Pages may be executed.

PROT_READ
Pages may be read.

PROT_WRITE
Pages may be written.

PROT_NONE
Pages may not be accessed.

Not only is the concept being introduced - it's coming
in by storm!

The flags argument
The flags argument determines whether updates to the mapping are
visible to other processes mapping the same region,

No such thing as processes in MSDOS. Another new
concept for C90.

MAP_SHARED_VALIDATE

Made my brain freeze about how anyone could even
conceive of something so complicated.

EOPNOTSUPP

A new errno not seen in C90.

MAP_PRIVATE
Create a private copy-on-write mapping.

You expect an MSDOS programmer trying to slurp a
file into memory, a conceptually simple thing, to come
to grips with whether they want a copy-on-write
mapping if they ever end up on a Unix system?

MAP_SHARED_VALIDATE is a Linux extension

There's no reason to stop there. You can incorporate
all the BSD extensions into C90+ too. And Windows.

MAP_32BIT (since Linux 2.4.20, 2.6)
Put the mapping into the first 2 Gigabytes of the process

C90 doesn't have much concept of 32-bit. There is
only one place - the minimum size of a long.

That's enough for now. If people want to create a
sophisticated, largely Unix-specific API, that's fine.
Windows has one too, and so does MVS. And
MSDOS does too. MSDOS even has an API that
allows you to mark a file as "hidden". I'm not sure
POSIX even has that concept.

But this is not suitable for C90 or the proposed C90+.

slurp(), or as someone else has renamed it (and I think
it is a good change), fslurp(), is the conceptually simple
task of reading a file into memory in a single, simple
operation that is easy to implement.

If it does a complicated mmap() behind the scenes,
on just Unix-like platforms, that's fine, and in fact,
that's the exact intention. But it is the nature of C90
to be discreet about why it has carefully worded
things. They may expose their thinking in the
Rationale though, and maybe they explain why they
wrote that you can rely on numbers being contiguous
code points, but not letters (because of EBCDIC).
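
The digits-versus-letters point can be illustrated with a small sketch. The function names are hypothetical; the only facts relied on are that the C standard requires '0' through '9' to be consecutive and makes no such promise for letters.

```c
#include <assert.h>
#include <string.h>

/* Value of a decimal digit character, or -1.  Portable because the
   C standard requires '0'..'9' to be consecutive. */
int digit_value(int c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Zero-based index of a lower-case letter, or -1.  Uses a lookup
   because 'a'..'z' need not be consecutive (they are not in EBCDIC). */
int letter_index(int c)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0') return -1;   /* strchr would match the terminator */
    p = strchr(letters, c);
    return p ? (int)(p - letters) : -1;
}
```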

I'm not sure how much of POSIX can actually be
incorporated into C, and especially not starting
from C90. POSIX seems to have been designed to
standardize Unix. That would be like if there were
5 competitors for Windows and they were trying
to standardize what they actually had. I've never
heard of anyone trying to put the Windows API
into C, even picking and choosing.

Maybe I can pose a different question - if your boss
told you you needed to code in C90 with 5 extensions,
what would those 5 extensions be?

I guess what I'm trying to do is take C90 - noting that
C90 isn't even universally implemented yet - and then
negotiate one change at a time, based on reasonable,
real-world (especially in 1990, with MSDOS all the
rage) need. But with the benefit of hindsight. So I'd
like to negotiate whether we need far pointers or not.
Maybe we don't need that, but what we instead need
is a keyword abs_addr. And you can set an absolute
address from an integer type, ie:

abs_addr x;

set_abs_addr(&x, 0xb8000UL);

But forget about that for now. That's just an example.

Do you agree that C90, even in 1990, or maybe 1991,
could be extended with an fslurp() so that one day
in the future, when a 4 GB machine is running 10
processes that all use the same 1.9 GB read-only
file, they can use pointers instead of file operations?
Sharing the exact same real memory, thanks to
virtual memory being available?

Is the concept of reading a file into memory for
efficient processing something that would have
been required in 1990? Maybe the scheme needs
to be extended, with some sort of API that wraps
the pointer access and falls back to file operations
if the file is too big to fit into memory.
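
One way such a wrapper might look, purely as a sketch: the type fdata, the functions fdata_open and fdata_get, and the max_mem limit are all hypothetical, and it assumes fseek()/ftell() report the size of a binary stream. Access goes through one function that uses the in-memory copy when there is one and seeks in the file otherwise.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handle: the bytes live either in memory or in the file. */
typedef struct {
    FILE *f;       /* kept open for the fallback path */
    char *mem;     /* NULL when the file was not loaded into memory */
    long size;
} fdata;

/* Load the file if it is no larger than max_mem bytes; otherwise (or if
   malloc/fread fails) leave mem NULL.  Returns 0 on success, -1 if the
   size cannot be determined. */
int fdata_open(fdata *d, FILE *f, long max_mem)
{
    assert(d != NULL && f != NULL);
    d->f = f;
    d->mem = NULL;
    if (fseek(f, 0L, SEEK_END) != 0) return -1;
    d->size = ftell(f);
    if (d->size < 0) return -1;
    if (d->size > 0 && d->size <= max_mem) {
        d->mem = malloc((size_t)d->size);
        if (d->mem != NULL) {
            rewind(f);
            if (fread(d->mem, 1, (size_t)d->size, f) != (size_t)d->size) {
                free(d->mem);
                d->mem = NULL;   /* fall back to file access */
            }
        }
    }
    return 0;
}

/* Byte at offset pos, or EOF if out of range or on a read error. */
int fdata_get(fdata *d, long pos)
{
    if (pos < 0 || pos >= d->size) return EOF;
    if (d->mem != NULL) return (unsigned char)d->mem[pos];
    if (fseek(d->f, pos, SEEK_SET) != 0) return EOF;
    return fgetc(d->f);
}
```

The cost of this design is a function call per byte, which gives up much of the speed that direct pointer access was meant to provide.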

But it seems to me that this concept of data being
in either memory or a file is pretty fundamental.
They are the only choices in C90. And there should
be some consideration given to switching between
the two for storage/processing. Yes, fread() exists,
but is that where things should end? Or is an fslurp()
logical?

And for that matter, why is a file size being given as
"long"? There was a size_t created for memory
buffers. Why not a corresponding type for files?
Who created size_t anyway? And in C90 they didn't
provide a way of printing out a size_t. So how are
you supposed to print out:

printf("your string is %xxxx bytes long\n", strlen(p));

What did the ISO committee expect people to do?
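
The usual C90 idiom is to convert the value to unsigned long and print it with %lu. In strict C90, unsigned long is the widest unsigned integer type, so the conversion should not lose information. On implementations where size_t is wider than unsigned long (64-bit Windows, for example) very large values could be truncated. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a size_t the C90 way: convert to unsigned long and use %lu.
   (C99 later added the %zu conversion for this purpose.)
   out must be large enough for the message. */
int format_length(char *out, const char *s)
{
    return sprintf(out, "your string is %lu bytes long\n",
                   (unsigned long)strlen(s));
}
```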

But never mind about ISO. I'd rather negotiate with
comp.lang.c about changes, probably minimal, or
at least, one at a time, to C90. Additions and
subtractions. With fslurp() on the agenda at the
moment.

I have a C90 library already (PDPCLIB), so I can add
fslurp() to it as soon as the comp.lang.c committee
agrees on an extension to C90. Based on their
knowledge in 2021 (which I am largely lacking - I'm
still largely in 1990 - I deliberately held myself back -
especially spending a lot of time on MVS 3.8J from
the 1980s).

I also have GCC 3.2.3 as a C90-compliant compiler,
so I can make language changes such as adding
'\e' for ESC, but that is more difficult than changing
PDPCLIB. Maybe it's better to add ESC_CHAR and
ESC_STRING to stddef.h. If a platform implements
them as '\e' and "\e", that should be allowed, as
that escape sequence can be "reserved for future
use as an ESC character".

I'm here to negotiate. Please. Someone from 2021
negotiate. Bridge the time gap. Help me prepare
for time travel in case I am forced to leave 1990.
I'll tell you what the situation is in 1990 in case
you've forgotten, or never knew. I've got a public
domain replacement for MSDOS, and the divide
by zero interrupt (INT 0) has just been implemented,
and prints MVS-style diagnostics when that occurs,
ie the registers, load point of the program etc. Not
sure why MSDOS itself didn't do that, when it has
MVS as a reference. I'm mulling the possibility of
an 8086-like environment/processor that offers
rudimentary memory protection so that the MSDOS
replacement (PDOS/86) can protect itself from
applications. Then you will get diagnostics not
just from divide by zero errors but also memory
access violations. And of course, the segment shift
will be flexible, set to either 4 bits or 16 bits.


Re: slurp/unslurp

<KoFkI.231238$c2cf.50846@fx28.ams4>

https://www.novabbs.com/devel/article-flat.php?id=16087&group=comp.lang.c#16087

Path: i2pn2.org!i2pn.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx28.ams4.POSTED!not-for-mail
Subject: Re: slurp/unslurp
Newsgroups: comp.lang.c
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<20210504083041.263@kylheku.com>
<cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com>
<20210505083459.437@kylheku.com>
<faa28c1f-d689-48e7-b42a-e7d243d774e9n@googlegroups.com>
From: bc...@freeuk.com (Bart)
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <faa28c1f-d689-48e7-b42a-e7d243d774e9n@googlegroups.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
X-Antivirus: AVG (VPS 210505-0, 05/05/2021), Outbound message
X-Antivirus-Status: Clean
Lines: 57
Message-ID: <KoFkI.231238$c2cf.50846@fx28.ams4>
X-Complaints-To: http://netreport.virginmedia.com
NNTP-Posting-Date: Wed, 05 May 2021 22:56:42 UTC
Organization: virginmedia.com
Date: Wed, 5 May 2021 23:56:35 +0100
X-Received-Bytes: 3470
 by: Bart - Wed, 5 May 2021 22:56 UTC

On 05/05/2021 23:40, muta...@gmail.com wrote:
> On Thursday, May 6, 2021 at 1:41:34 AM UTC+10, Kaz Kylheku wrote:
>
>>> Also note that I'm here because a real-world requirement
>>> popped up that forced me to finally investigate memory
>>> mapped files. And it is Windows, so the underlying API
>>> seems to be MapViewOfFile(). But I don't want to code
>>> that directly, even though Sqlite has. I want to move it
>>> out of Sqlite and into PDPCLIB. Move mmap out of
>>> Sqlite too.
>
>> I think you're neglecting to consider the viewpoint that a group of
>> people already took the C standard and made a more extensive standard
>> which includes features like memory mapping of files. That standard is
>> POSIX.
>>
>> There was POSIX in 1990, which extended C90 with useful functions.
>>
>> To be fair, the ISO C people have themselves begun to neglect this view
>> and started reinventing POSIX wheels, such as adding threads to C that
>> are similar to POSIX threads, with different naming.
>>
>> What should have happened is that C and POSIX people should have
>> negotiated to migrate parts of POSIX threads into C, with exactly the
>> same syntax. The same <pthread.h> header and pthread_mutex_lock
>> and whatever. All the more simple, easily portable aspects could be
>> encoded in C, so that POSIX then just describes extensions, like real
>> time signals or whatever is not put into C.
>
> Ok, that's a good point - why not just copy aspects
> of POSIX. You can also ask why Windows didn't do
> that too.

Why would Windows do that? It's got its own API which will work on every
version of Windows.

But there were a million different versions of Unix, and POSIX as I
understand it was to tie them all together. That's what the "IX" of
POSIX was for.

> Or since it's too late now, do the reverse -
> why doesn't C copy Windows?

I'm glad it didn't because the C standard API is a lot easier to use
than Windows' API.

(That's actually when I started using the standard C library, because it
was just a much simpler library to use than WinAPI. I didn't even
realise it was meant to be used from C; it was just another library that
came with Windows.)

POSIX itself I think is about 80 different headers; I wouldn't want to
get involved with that either. It's so annoying when you want to build
something on Windows, and C programs use POSIX headers instead of
standard C headers.

Re: slurp/unslurp

<s6v9ne$1lvj$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=16088&group=comp.lang.c#16088

Path: i2pn2.org!i2pn.org!aioe.org!NBiuIU74OKL7NpIOsbuNjQ.user.gioia.aioe.org.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: slurp/unslurp
Date: Wed, 5 May 2021 16:28:47 -0700
Organization: Aioe.org NNTP Server
Lines: 35
Message-ID: <s6v9ne$1lvj$1@gioia.aioe.org>
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com>
<20210504083041.263@kylheku.com>
<cd42cc91-b656-4f67-93e9-8a0a6e0d9afan@googlegroups.com>
<20210505083459.437@kylheku.com>
NNTP-Posting-Host: NBiuIU74OKL7NpIOsbuNjQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
 by: Chris M. Thomasson - Wed, 5 May 2021 23:28 UTC

On 5/5/2021 8:41 AM, Kaz Kylheku wrote:
> On 2021-05-04, muta...@gmail.com <mutazilah@gmail.com> wrote:
>> Sure. That's why slurp/unslurp exist in comp.lang.c not
>> Sourceforge/PDPCLIB.
>>
>> Also note that I'm here because a real-world requirement
>> popped up that forced me to finally investigate memory
>> mapped files. And it is Windows, so the underlying API
>> seems to be MapViewOfFile(). But I don't want to code
>> that directly, even though Sqlite has. I want to move it
>> out of Sqlite and into PDPCLIB. Move mmap out of
>> Sqlite too.
>
> I think you're neglecting to consider the viewpoint that a group of
> people already took the C standard and made a more extensive standard
> which includes features like memory mapping of files. That standard is
> POSIX.
>
> There was POSIX in 1990, which extended C90 with useful functions.
>
> To be fair, the ISO C people have themselves begun to neglect this view
> and started reinventing POSIX wheels, such as adding threads to C that
> are similar to POSIX threads, with different naming.
>
> What should have happened is that C and POSIX people should have
> negotiated to migrate parts of POSIX threads into C, with exactly the
> same syntax. The same <pthread.h> header and pthread_mutex_lock
> and whatever. All the more simple, easily portable aspects could be
> encoded in C, so that POSIX then just describes extensions, like real
> time signals or whatever is not put into C.
>

Last time I checked, POSIX has no means for portable fine-grained memory
barriers and atomic operations. I loved it when C/C++ finally supported
them. Still do. :^)
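
A minimal sketch of what the C11 facilities look like, using only <stdatomic.h>. The flag and function names are hypothetical. The release store and acquire load form the pair that makes the ordinary write to payload visible to a reader that sees the flag set.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int ready;      /* zero-initialised flag */
static int payload;           /* ordinary data published via the flag */

/* Writer: store the data, then publish it with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: returns 1 and fills *out once the writer has published,
   otherwise returns 0 and leaves *out untouched. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

In a real program publish() and try_consume() would run on different threads.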

Re: slurp/unslurp

<mtWdnXmcjPOHjQb9nZ2dnUU7-UednZ2d@giganews.com>

https://www.novabbs.com/devel/article-flat.php?id=16356&group=comp.lang.c#16356

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!buffer1.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Tue, 11 May 2021 18:15:38 -0500
Subject: Re: slurp/unslurp
Newsgroups: comp.lang.c
References: <26c15d3a-b341-44d0-bda4-dc0a8ec2fa20n@googlegroups.com> <87h7ji9uhc.fsf@nosuchdomain.example.com> <5e856231-86ed-46b9-8bfd-9f76ed37ea7cn@googlegroups.com> <saxkI.225469$tDk2.96581@fx06.ams4> <874kfhz2ih.fsf@bsb.me.uk>
From: NoO...@NoWhere.com (olcott)
Date: Tue, 11 May 2021 18:16:29 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1
MIME-Version: 1.0
In-Reply-To: <874kfhz2ih.fsf@bsb.me.uk>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Message-ID: <mtWdnXmcjPOHjQb9nZ2dnUU7-UednZ2d@giganews.com>
Lines: 70
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-Saf6bnBEPZagYWwhZmfOWy1FrHQa/Oa4oK+WThmLx0SN0H9KTFZxaZ5HIum5hmjJg1a1FhCbpVSAGG0!1+Ho0IsRuvHYsQ27M+Thpfce/Qh0QKdKZ4s+ZAK/bHZb9Cz4Y7XbKeHv8ZbFP4K4eYHuKHls3YdO!KQ==
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 3891
 by: olcott - Tue, 11 May 2021 23:16 UTC

On 5/5/2021 9:19 AM, Ben Bacarisse wrote:
> Bart <bc@freeuk.com> writes:
>
>> On 05/05/2021 10:11, Malcolm McLean wrote:
>>> On Tuesday, 4 May 2021 at 20:20:35 UTC+1, Keith Thompson wrote:
>>>> "muta...@gmail.com" <muta...@gmail.com> writes:
>>>>> I was wondering if there is any problem with adding
>>>>> a slurp() function to C90 which would do:
>>>>>
>>>>> fopen()
>>>>> fseek(SEEK_END)
>>>>> malloc() of file size
>>>>> fread of entire file
>>>>> (not sure about fclose, probably don't do that)
>>>>>
>>>>> And return a pointer to the malloc() region.
>>>> The biggest problem is that C90 is obsolete, replaced by newer editions
>>>> of the C standard. It will not be updated.
>>>>
>>>> If you want to define your own specification based on C90 plus your
>>>> slurp() function, nobody will stop you. (There might be some copyright
>>>> issues; I won't get into that.)
>>>>
>>>> But it's such a simple function, you can just write it and include it in
>>>> any program that needs it, or you can implement it in a library. I see
>>>> no point in adding it to the language standard, especially with so many
>>>> unresolved issues.
>>>>
>>> ... To implement fslurp() portably, you cannot fseek()
>>> to the file end position to get the size, fseek() back, then do an
>>> fread(), which would be the most efficient way of getting the input.
>>> Instead my fslurp() uses an fgetc() loop and an expanding realloc()
>>> buffer.
>
> The question I pose to Bart applies to this too.
>
>>> Since efficiency isn't a major concern for most uses of
>>> fslurp(), this doesn't matter too much. But by putting it into the
>>> standard, implementations could use the optimal method.
>>
>> You always want it to be as efficient as possible.
>
> What are you suggesting to make it efficient? I think you posted the
> code below as an example of what you consider inefficient.
>
>> while (1) {
>> c=fgetc(f);
>> if (c==EOF) break;
>> storechar(c);
>> }
>
> Why not use fread and expand the size to read as the buffer grows? That
> may not help much on systems with very good fgetc implementations, but
> it can't hurt.
>
> BTW, Bart writes non-canonical C. I think most C programmers would
> write
>
> while ((c = fgetc(f)) != EOF) storechar(c);
>
> (with layout etc. adjusted for preference).
>

technical competence.

--
Copyright 2021 Pete Olcott

"Great spirits have always encountered violent opposition from mediocre
minds." Einstein
