devel / comp.arch / Re: Paper about ISO C

Subject / Author
* Paper about ISO Cclamky
+- Re: Paper about ISO CBGB
+* Re: Paper about ISO CDavid Brown
|+* Re: Paper about ISO CBGB
||+* Re: Paper about ISO CMitchAlsup
|||+* Re: Paper about ISO CMitchAlsup
||||`* Re: Paper about ISO CBranimir Maksimovic
|||| `* Re: Paper about ISO CGeorge Neuner
||||  +- Re: Paper about ISO CBranimir Maksimovic
||||  `* Re: Paper about ISO CEricP
||||   `* Re: Paper about ISO CIvan Godard
||||    `* Re: Paper about ISO CEricP
||||     `* Re: Paper about ISO CMitchAlsup
||||      +* Re: Paper about ISO CBGB
||||      |`* Re: Paper about ISO CStephen Fuld
||||      | +* Re: Paper about ISO CIvan Godard
||||      | |+* Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | ||+* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |||+- Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | |||+* Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | ||||`- Re: addressing and protection, was Paper about ISO CQuadibloc
||||      | |||`- Re: addressing and protection, was Paper about ISO CBill Findlay
||||      | ||`* Re: addressing and protection, was Paper about ISO CQuadibloc
||||      | || +- Re: addressing and protection, was Paper about ISO CStephen Fuld
||||      | || `- Re: addressing and protection, was Paper about ISO CBranimir Maksimovic
||||      | |`* Re: addressing and protection, was Paper about ISO CEricP
||||      | | +* Re: addressing and protection, was Paper about ISO CStefan Monnier
||||      | | |`- Re: addressing and protection, was Paper about ISO CEricP
||||      | | `* Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | |  +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  |+* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  ||`* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  || `* Re: addressing and protection, was Paper about ISO CJohn Dallman
||||      | |  ||  +* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  ||  |+- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  ||  |+* Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  ||  ||`* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  ||  || `* Re: addressing and protection, was Paper about ISO CAnton Ertl
||||      | |  ||  ||  `- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  ||  |`* Re: addressing and protection, was Paper about ISO CQuadibloc
||||      | |  ||  | +- Re: addressing and protection, was Paper about ISO CQuadibloc
||||      | |  ||  | `- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  ||  `- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  |`* Re: addressing and protection, was Paper about ISO CJohn Dallman
||||      | |  | +* Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |`* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | | `* Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |  +* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |  |+* Re: addressing and protection, was Paper about ISO CStefan Monnier
||||      | |  | |  ||+- Re: addressing and protection, was Paper about ISO CChris M. Thomasson
||||      | |  | |  ||+* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |  |||`- Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |  ||+* Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |  |||`* Re: addressing and protection, was Paper about ISO CEricP
||||      | |  | |  ||| `- Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |  ||`- Re: addressing and protection, was Paper about ISO CJohn Dallman
||||      | |  | |  |`- Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |  `* Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | |  | |   +- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |   `* Re: addressing and protection, was Paper about ISO CStephen Fuld
||||      | |  | |    +* Address space consumption (was: addressing and protection, was Paper about ISO CStefan Monnier
||||      | |  | |    |`- Re: Address space consumption (was: addressing and protection, wasMitchAlsup
||||      | |  | |    +* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |    |`- Re: addressing and protection, was Paper about ISO CThomas Koenig
||||      | |  | |    `* Re: addressing and protection, was Paper about ISO CGeorge Neuner
||||      | |  | |     `* Re: addressing and protection, was Paper about ISO CStephen Fuld
||||      | |  | |      +* Re: addressing and protection, was Paper about ISO CMichael S
||||      | |  | |      |`* Re: addressing and protection, was Paper about ISO CTerje Mathisen
||||      | |  | |      | `- Re: addressing and protection, was Paper about ISO CMichael S
||||      | |  | |      `* Re: addressing and protection, was Paper about ISO CStefan Monnier
||||      | |  | |       +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       |`* Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | |  | |       | +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |`* Re: addressing and protection, was Paper about ISO CStephen Fuld
||||      | |  | |       | | `* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |  `* Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | |  | |       | |   +- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |   +- Re: addressing and protection, was Paper about ISO Cclamky
||||      | |  | |       | |   `* Re: addressing and protection, was Paper about ISO CGeorge Neuner
||||      | |  | |       | |    +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    |`* Re: addressing and protection, was Paper about ISO CGeorge Neuner
||||      | |  | |       | |    | +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | |+* Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | |  | |       | |    | ||+* Re: educational computation, was addressing and protection, was Paper about ISO John Levine
||||      | |  | |       | |    | |||`* Re: educational computation, was addressing and protection, was PaperIvan Godard
||||      | |  | |       | |    | ||| `- Re: educational computation, was addressing and protection, was PaperTerje Mathisen
||||      | |  | |       | |    | ||`* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | || `* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |       | |    | ||  +- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | ||  `- Re: addressing and protection, was Paper about ISO CDavid Brown
||||      | |  | |       | |    | |+- Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | |`* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |       | |    | | +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | | |+- Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | |  | |       | |    | | |`* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |       | |    | | | +* Re: addressing and protection, was Paper about ISO CIvan Godard
||||      | |  | |       | |    | | | |+* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |       | |    | | | ||`- Re: addressing and protection, was Paper about ISO CJimBrakefield
||||      | |  | |       | |    | | | |`* Re: addressing and protection, was Paper about ISO CTerje Mathisen
||||      | |  | |       | |    | | | | `* Re: addressing and protection, was Paper about ISO CTim Rentsch
||||      | |  | |       | |    | | | |  `- Re: addressing and protection, was Paper about ISO CTerje Mathisen
||||      | |  | |       | |    | | | +* Re: addressing and protection, was Paper about ISO CBGB
||||      | |  | |       | |    | | | `- Re: addressing and protection, was Paper about ISO CTerje Mathisen
||||      | |  | |       | |    | | `- Re: addressing and protection, was Paper about ISO CAnne & Lynn Wheeler
||||      | |  | |       | |    | `- Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | |       | |    `* Re: what is cheap these days, addressing and protection, was Paper about ISO CJohn Levine
||||      | |  | |       | +* Re: addressing and protection, was Paper about ISO CStephen Fuld
||||      | |  | |       | +* Re: addressing and protection, was Paper about ISO CMichael S
||||      | |  | |       | `* Re: addressing and protection, was Paper about ISO CMichael S
||||      | |  | |       +* RAM size (was: addressing and protection, was Paper about ISO C)Anton Ertl
||||      | |  | |       `- Re: addressing and protection, was Paper about ISO CTerje Mathisen
||||      | |  | +* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  | `* Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | |  +* Re: addressing and protection, was Paper about ISO CMitchAlsup
||||      | |  `* Re: addressing and protection, was Paper about ISO CJohn Levine
||||      | `* Re: Paper about ISO CBGB
||||      `- Re: Paper about ISO CEricP
|||+* Re: Paper about ISO CBranimir Maksimovic
|||+* Re: Paper about ISO CThomas Koenig
|||+* Re: Paper about ISO Cantispam
|||`- Re: Paper about ISO CQuadibloc
||+* Re: Paper about ISO CThomas Koenig
||`* Re: Paper about ISO CDavid Brown
|+* Re: Paper about ISO CThomas Koenig
|`* Re: Paper about ISO CVictor Yodaiken
`* Re: Paper about ISO CKent Dickey

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21955&group=comp.arch#21955
Newsgroups: comp.arch
Date: Thu, 11 Nov 2021 13:20:41 -0800 (PST)
In-Reply-To: <5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com>
References: <87fstdumxd.fsf@hotmail.com> <7c7468f3-0efc-4de0-9b14-53c6283f7a39n@googlegroups.com>
<sm9ktd$e58$1@dont-email.me> <8fe79286-c374-4129-b2a2-cb93099e9448n@googlegroups.com>
<5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com>
Message-ID: <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>
Subject: Re: Paper about ISO C
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Thu, 11 Nov 2021 21:20 UTC

On Thursday, November 11, 2021 at 11:04:07 AM UTC-6, Kent Dickey wrote:
> In article <8fe79286-c374-4129...@googlegroups.com>,
> MitchAlsup <Mitch...@aol.com> wrote:
> >It is not the terms that make it safer, it is the prevention of Rindex<<scale
> >from changing the HoBs that make it safer, while allowing Rbase+immed
> >to still create any bit pattern appropriate. You cannot index over the
> >boundary of the VaS you are attempting to access. You can still point
> >anywhere !
> If I understand what you are saying, for a load/store using addressing
> of the form:
>
> [Rbase + Rindex << scale + imm]
>
> You calculate this in two parts. First, Rbase+imm is calculated and the
> high-order bits are looked at. These bits determine which page tables to
> use. Then, "Rindex << scale" is added in. Adding in Rindex must not change
> the high-order bits of the VA, and if they do, you will trap (or something).
>
> I'm going to suggest this is not a good idea. Let instructions form the
> VA any way they want, then look at the high-order bits of the final VA
> result and use that for any purpose you want.
>
> A complex enough pointer calculation may not have a clear "base" register,
> and if the compiler chooses incorrectly, the resulting code will not work.
> Maybe the compiler has a reason to transform:
>
> a[i] = 0;
>
> which in C is the same as:
>
> *(a + i) = 0;
>
> STR zero,[i + (a << 0) + 0)]
>
> User code could be dumbly written,
<
Is it not time to start getting rid of simply DUMB coding practices ??
<
> where they've casted pointers to an
> integer type, and in forming a pointer again, the code casts some offset to
> be a pointer, and adds in the real pointer as an integer. This could be
> happening in the compiler as well for jump tables, or for dynamic library
> linking, etc.
<
I am going to go on record as stating that whenever the compiler cannot
tell a pointer from an integer, the proper course of action is to raise
a compile-time error. You certainly would not be allowed to make such an
error in Ada or FORTRAN, and it is time to start squishing
out these C-isms from the source.
<
That is:: address arithmetic is not integer arithmetic.
<
*(a+i) is as different from *(i+a)
as
x>>y is from y>>x !
>
> But with your checking that Rbase must have the proper high-order bits, this
> will not always work. And, it adds complexity to code generation for what
> I suspect is minimal hardware savings. In fact, it may be more complex to
> rely on the Rbase+imm to calculate the high-order bits, since then you have
> to do the AGEN in a certain way to get that answer.
<
The gate cost is 7-gates (in addition to the 2031 gates a 64-bit carry select
adder takes.)
>
> Kent
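
To make the mechanism concrete, here is a minimal C sketch of the check as Kent
describes it above, assuming the VaS boundary sits at bit 48; the function name
and the raise_trap() hook are illustrative placeholders, not part of the actual design:

#include <stdint.h>

extern void raise_trap(void);                    /* placeholder for the protection trap */

uint64_t agen_checked(uint64_t rbase, int64_t imm, uint64_t rindex, unsigned scale)
{
    uint64_t part = rbase + (uint64_t)imm;       /* Rbase + imm may form any bit pattern   */
    uint64_t va   = part + (rindex << scale);    /* adding Rindex << scale, however, ...   */
    if ((va >> 48) != (part >> 48))              /* ... must not change the high-order bits */
        raise_trap();                            /* otherwise the access is refused        */
    return va;
}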

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21956&group=comp.arch#21956
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
References: <87fstdumxd.fsf@hotmail.com> <8fe79286-c374-4129-b2a2-cb93099e9448n@googlegroups.com> <5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com> <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>
Organization: provalid.com
From: keg...@provalid.com (Kent Dickey)
Message-ID: <_LSdnePfso4DDxD8nZ2dnUU7-KOdnZ2d@giganews.com>
Date: Thu, 11 Nov 2021 16:01:02 -0600
 by: Kent Dickey - Thu, 11 Nov 2021 22:01 UTC

In article <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>,
MitchAlsup <MitchAlsup@aol.com> wrote:
>On Thursday, November 11, 2021 at 11:04:07 AM UTC-6, Kent Dickey wrote:
>> In article <8fe79286-c374-4129...@googlegroups.com>,
>> MitchAlsup <Mitch...@aol.com> wrote:
>> >It is not the terms that make it safer, it is the prevention of
>Rindex<<scale
>> >from changing the HoBs that make it safer, while allowing Rbase+immed
>> >to still create any bit pattern appropriate. You cannot index over the
>> >boundary of the VaS you are attempting to access. You can still point
>> >anywhere !
>> If I understand what you are saying, for a load/store using addressing
>> of the form:
>>
>> [Rbase + Rindex << scale + imm]
>>
>> You calculate this in two parts. First, Rbase+imm is calculated and the
>> high-order bits are looked at. These bits determine which page tables to
>> use. Then, "Rindex << scale" is added in. Adding in Rindex must not change
>> the high-order bits of the VA, and if they do, you will trap (or something).
>>
>> I'm going to suggest this is not a good idea. Let instructions form the
>> VA any way they want, then look at the high-order bits of the final VA
>> result and use that for any purpose you want.
>>
>> A complex enough pointer calculation may not have a clear "base" register,
>> and if the compiler chooses incorrectly, the resulting code will not work.
>> Maybe the compiler has a reason to transform:
>>
>> a[i] = 0;
>>
>> which in C is the same as:
>>
>> *(a + i) = 0;
>>
>> STR zero,[i + (a << 0) + 0)]
>>
>> User code could be dumbly written,
><
>Is it not time to start getting rid of simply DUMB coding practices ??
><
>> where
>they've casted pointers to an
>> integer type, and in forming a pointer again, the code casts some offset to
>> be a pointer, and adds in the real pointer as an integer. This could be
>> happening in the compiler as well for jump tables, or for dynamic library
>> linking, etc.
><
>I am going to go on record as stating that when ever the compiler cannot
>tell a pointer from an integer, that the proper course of action is to raise
>a compile time error. You certainly would not be allowed to make such an
>error in ADA, nor FORTRAN........................and it is time to start
>squishing
>out these C-isms from the source.
><
>That is:: address arithmetic is not integer arithmetic.
><
>*(a+i) is as different from *(i+a)
>as
>x>>y is from y>>x !
>>
>> But with your checking that Rbase must have the proper high-order bits, this
>> will not always work. And, it adds complexity to code generation for what
>> I suspect is minimal hardware savings. In fact, it may be more complex to
>> rely on the Rbase+imm to calculate the high-order bits, since then you have
>> to do the AGEN in a certain way to get that answer.
><
>The gate cost is 7-gates (in addition to the 2031 gates a 64-bit carry select
>adder takes.)
>>
>> Kent

Can you explain the hardware savings?

The Rbase+imm64 looks like it needs a 64-bit adder to get the high bits right.
Then, to add in Rindex (with a shift) is another 64-bit adder after that.
But if you did the adds together, I think it's a lot less logic to do the
partial adds, and then just do one full carry propagation.

Kent

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21958&group=comp.arch#21958
Newsgroups: comp.arch
Date: Thu, 11 Nov 2021 17:25:02 -0800 (PST)
In-Reply-To: <_LSdnePfso4DDxD8nZ2dnUU7-KOdnZ2d@giganews.com>
References: <87fstdumxd.fsf@hotmail.com> <8fe79286-c374-4129-b2a2-cb93099e9448n@googlegroups.com>
<5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com> <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>
<_LSdnePfso4DDxD8nZ2dnUU7-KOdnZ2d@giganews.com>
Message-ID: <01e5899d-0aa8-4e79-8f43-423942d0c94en@googlegroups.com>
Subject: Re: Paper about ISO C
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 12 Nov 2021 01:25 UTC

On Thursday, November 11, 2021 at 4:01:11 PM UTC-6, Kent Dickey wrote:
> In article <9be5a768-5d12-4f3b...@googlegroups.com>,
> MitchAlsup <Mitch...@aol.com> wrote:
> >On Thursday, November 11, 2021 at 11:04:07 AM UTC-6, Kent Dickey wrote:
> >> In article <8fe79286-c374-4129...@googlegroups.com>,
> >> MitchAlsup <Mitch...@aol.com> wrote:
> >> >It is not the terms that make it safer, it is the prevention of
> >Rindex<<scale
> >> >from changing the HoBs that make it safer, while allowing Rbase+immed
> >> >to still create any bit pattern appropriate. You cannot index over the
> >> >boundary of the VaS you are attempting to access. You can still point
> >> >anywhere !
> >> If I understand what you are saying, for a load/store using addressing
> >> of the form:
> >>
> >> [Rbase + Rindex << scale + imm]
> >>
> >> You calculate this in two parts. First, Rbase+imm is calculated and the
> >> high-order bits are looked at. These bits determine which page tables to
> >> use. Then, "Rindex << scale" is added in. Adding in Rindex must not change
> >> the high-order bits of the VA, and if they do, you will trap (or something).
> >>
> >> I'm going to suggest this is not a good idea. Let instructions form the
> >> VA any way they want, then look at the high-order bits of the final VA
> >> result and use that for any purpose you want.
> >>
> >> A complex enough pointer calculation may not have a clear "base" register,
> >> and if the compiler chooses incorrectly, the resulting code will not work.
> >> Maybe the compiler has a reason to transform:
> >>
> >> a[i] = 0;
> >>
> >> which in C is the same as:
> >>
> >> *(a + i) = 0;
> >>
> >> STR zero,[i + (a << 0) + 0)]
> >>
> >> User code could be dumbly written,
> ><
> >Is it not time to start getting rid of simply DUMB coding practices ??
> ><
> >> where
> >they've casted pointers to an
> >> integer type, and in forming a pointer again, the code casts some offset to
> >> be a pointer, and adds in the real pointer as an integer. This could be
> >> happening in the compiler as well for jump tables, or for dynamic library
> >> linking, etc.
> ><
> >I am going to go on record as stating that when ever the compiler cannot
> >tell a pointer from an integer, that the proper course of action is to raise
> >a compile time error. You certainly would not be allowed to make such an
> >error in ADA, nor FORTRAN........................and it is time to start
> >squishing
> >out these C-isms from the source.
> ><
> >That is:: address arithmetic is not integer arithmetic.
> ><
> >*(a+i) is as different from *(i+a)
> >as
> >x>>y is from y>>x !
> >>
> >> But with your checking that Rbase must have the proper high-order bits, this
> >> will not always work. And, it adds complexity to code generation for what
> >> I suspect is minimal hardware savings. In fact, it may be more complex to
> >> rely on the Rbase+imm to calculate the high-order bits, since then you have
> >> to do the AGEN in a certain way to get that answer.
> ><
> >The gate cost is 7-gates (in addition to the 2031 gates a 64-bit carry select
> >adder takes.)
> >>
> >> Kent
> Can you explain the hardware savings?
<
There is no savings; there is a very tiny cost. 2031 is smaller than 2038!
>
> The Rbase+imm64 looks like it needs a 64-bit adder to get the high bits right.
> Then, to add in Rindex (with a shift) is another 64-bit adder after that.
> But if you did the adds together, I think it's a lot less logic to do the
> partial adds, and then just do one full carry propagation.
<
You use a 3-2 compressor to compress the 3 operands into a sum and a carry
(1 gate of delay longer than a 2-input 64-bit adder).
Then you use a regular carry-whatever adder to generate the 64-bit result.
<
I use carry select adders for speed.
<
The detection of alteration of the top bits takes 7 gates after all of that
rig-a-ma-role.
<
There are very few patterns one needs to look for in detecting this.
>
> Kent
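
For reference, a minimal C model of the 3-2 compression step described above,
applied bitwise across all 64 lanes; the final + stands in for the carry-select
adder, and the 7-gate top-bit check is not modeled:

#include <stdint.h>

uint64_t agen3(uint64_t rbase, uint64_t imm, uint64_t rindex_scaled)
{
    uint64_t sum   = rbase ^ imm ^ rindex_scaled;              /* per-bit sum, no carries yet */
    uint64_t carry = ((rbase & imm) | (rbase & rindex_scaled)
                                    | (imm & rindex_scaled)) << 1;  /* per-bit carry, shifted up */
    return sum + carry;                                        /* one carry-propagating add    */
}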

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21963&group=comp.arch#21963
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
Date: Fri, 12 Nov 2021 13:00:04 +0100
Organization: Aioe.org NNTP Server
Message-ID: <smll01$6t1$1@gioia.aioe.org>
References: <87fstdumxd.fsf@hotmail.com>
<2021Nov3.092521@mips.complang.tuwien.ac.at> <sluebv$icv$1@dont-email.me>
<sm0050$109j$1@gioia.aioe.org>
<8JadnZEDxaucyRD8nZ2dnUU7-XvNnZ2d@giganews.com>
 by: Terje Mathisen - Fri, 12 Nov 2021 12:00 UTC

Kent Dickey wrote:
> In article <sm0050$109j$1@gioia.aioe.org>,
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> [ snip ]
>> Yes, compilers do know a lot more about the nooks & crannies of cpu
>> performance than the vast majority of programmers, but OTOH, the type of
>> optimizations you have to employ to handle micro-architectural balance
>> shifts can very often be far larger than anything a compiler would be
>> allowed to do.
>>
>> A case in point: I have done several conference presentations and a
>> university guest lecture on a single program/algorithm (unix wc - word
>> count) which I've optimized for most generations of x86 cpus from 8088
>> to Pentium and PentiumPro.
>>
>> The type of optimizations needed consisted of total rewrites, using new
>> algorithms, but still achieving the same end result.
>
> I am curious what wc algorithms you used, and I wasn't able to find your
> presentation online with a little bit of Googling. If it is available,
> can you point me to it?

This code predates ubiquitous web storage/access, but if you drop me a
mail msg I'll send you the source code for the current version.

>
> I wanted a way to test the speed of some algorithms (and SSD
> performance), and wc seemed like a good choice, where counting spaces
> and lines would give me a "checksum" of sorts as well. But wc on a Mac
> is just stupidly slow, so I wrote:
>
> while(1) {
>     len = reliable_read(in_fd, &(g_inbuf[0]), INBUF_SIZE);
>     if(len == 0) {
>         break;
>     }
>     inptr = &g_inbuf[0];
>     for(i = 0; i < len; i++) {
>         c = *inptr++;
>         type = g_chartype[c];
>         dcount_lines += (type & 2);
>         is_new_nonspace = last_space & type;
>         dcount_words += is_new_nonspace;
>         last_space = (~type) & 1;
>     }
>     dcount_chars += len;
> }
>
> where dcount_lines count is 2x the proper value. (Yes, this code will
> fail on 2^63 newlines). reliable_read is just read() which always gets
> the required size unless EOF is hit. The definition of "words" in wc
> was not easy for me to understand, so I have probably done it
> incorrectly.
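
One plausible way the g_chartype[] table above could be filled, judging only from
how the loop uses it (bit 0 marks a word character, bit 1 marks a newline); this is
a reconstruction, not the actual initialization code:

static unsigned char g_chartype[256];

static void init_chartype(void)
{
    for (int c = 0; c < 256; c++)
        g_chartype[c] = 1;        /* bit 0: a "word" (non-space) character            */
    g_chartype[' ']  = 0;         /* separators: neither bit set                      */
    g_chartype['\t'] = 0;
    g_chartype['\r'] = 0;
    g_chartype['\n'] = 2;         /* bit 1: newline; counted twice, hence the 2x note */
    /* last_space must start out as 1 so a word at the very start of the input counts */
}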

This is more or less equivalent to my first 8088 asm version, except
that I hardcoded CR/LF/TAB/Space as the only word separators, and CRLF
or LF as valid line separators.

The first speedup came from using blocked file reads instead of
single-byte OS calls; then the first algorithm improvement looked more
or less like this, as C pseudocode, disregarding block start/end handling:

while (inp < bufend) {
    c = *inp++;
    if (c > ' ') {               // Start of a word?
        wordcnt++;
        do {
            c = *inp++;
        } while (c > ' ');
    }
    if (c == ' ') continue;
    if (c == '\r') {
        if ((c = *inp) != '\n') continue;
        inp++;
    }
    if (c == '\n') {
        linecnt++;
        continue;
    }
}

The boundary conditions were handled by having a guard (space char) past
the end of the buffer space; if this was hit, I would adjust the
wordcnt after loading the next block if the word actually continued into
it.

I also handled the case where the last line wasn't LF-terminated, by
counting it anyway.

The faster versions all use tables to classify characters, so that
things like '.,-_#:;%&/' etc. can be used as word separators; this is
configurable at runtime.

If you move to a word/non-word classification bitmap, then the number of
words is of course the same as the number of times that bitmap flips
from 0 to 1, or (using your idea) half the total number of flips, plus a
possible adjustment for words starting or ending at the file ends.
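
With 64-byte chunks that flip counting is only a few operations; a small sketch,
assuming bit i of bits marks byte i as a word character, and using the GCC/Clang
popcount builtin:

#include <stdint.h>

/* bits: 1-bits mark word characters for 64 consecutive bytes;
   *prev holds the classification of the byte just before this chunk. */
static unsigned count_word_starts(uint64_t bits, uint64_t *prev)
{
    uint64_t starts = bits & ~((bits << 1) | *prev);   /* 0 -> 1 transitions             */
    *prev = bits >> 63;                                /* carry last class to next chunk */
    return (unsigned)__builtin_popcountll(starts);     /* GCC/Clang builtin              */
}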

My fastest current code uses some form of either 3-way
(LF/separator/word) or 4-way (CR/LF/separator/word) classification of
individual bytes, paired into a 9- or 16-way bundle.

At least one character's worth of data from the previous pair is joined
with the current pair classification, and this (typically 5-bit) bundle
indexes into a combined increment table: the top 16 bits are the line
increment and the bottom 16 bits are the word increment.

This gets added to a 32-bit combined counter which is guaranteed to be
able to handle at least 65535 bytes of input data before the combined
accumulator has to be split into separate parts and used to increment
the actual word and line counters.
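
A rough per-byte C sketch of that packing trick; the real code classifies byte
pairs, so the table layout, the names, and the per-byte loop here are
simplifications rather than the actual implementation:

#include <stdint.h>
#include <stddef.h>

static uint32_t inc_table[2][256];              /* [was inside a word][next byte] */

static void init_inc_table(void)
{
    for (int c = 0; c < 256; c++) {
        uint32_t line = (c == '\n') ? (1u << 16) : 0;   /* line increment in the top half */
        inc_table[0][c] = line + (c > ' ' ? 1 : 0);     /* a word starts here: bottom half */
        inc_table[1][c] = line;                         /* already inside a word           */
    }
}

static void count_block(const unsigned char *buf, size_t len,
                        uint64_t *linecnt, uint64_t *wordcnt, int *in_word)
{
    uint32_t combined = 0;                      /* lines in bits 31..16, words in bits 15..0 */
    for (size_t i = 0; i < len; i++) {
        combined += inc_table[*in_word][buf[i]];
        *in_word = buf[i] > ' ';
        if ((i & 0x7FFF) == 0x7FFF) {           /* drain well before the halves can overflow */
            *linecnt += combined >> 16;
            *wordcnt += combined & 0xFFFF;
            combined = 0;
        }
    }
    *linecnt += combined >> 16;
    *wordcnt += combined & 0xFFFF;
}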

In Pentium asm this logic translated into 4 instructions (load pair,
lookup pair and combine with previous state, load increment, add
increment to combined counter) which for latency/scheduling reasons were
inverted so that the load-use separation was maximized:

combined_counter += counter_increments;
counter_increments = increment_table[bx];
bl (bottom half of BX) = pair_classification_table[pair];
pair = inp_pairs[OFFS];

combined_counter += counter_increments;
counter_increments = increment_table[bx+16];
bh (top half of BX) = pair_classification_table[pair];
pair = inp_pairs[OFFS+2];

The two groups above were replicated (asm macro) 64 times, so that
256 input bytes were processed as a single branchless block.

Since the code was 16-bit I needed a segment override to access the 64
KB pair_classification[] table; this increased the runtime from 1 to 1.5
clock cycles/byte processed. I did in fact get 40 MB/s on a 60 MHz Pentium.

Today I'd love to find a way to use GPGPU methods, i.e. all those table
lookups are very similar to degenerate texture accesses, so it should
easily run at full RAM speed. I would also like better utf8 support
since that is trivially easy to add by decrementing the character
counter for each intermediate utf8 byte value, effectively skipping
them. It would be _much_ harder to support arbitrary utf8 characters as
word separators!
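
The utf8 adjustment described above amounts to something like this, assuming
charcnt starts out as the raw byte count:

for (size_t i = 0; i < len; i++)
    if ((buf[i] & 0xC0) == 0x80)   /* 10xxxxxx: UTF-8 continuation byte          */
        charcnt--;                 /* not a new character, so take it back out   */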

BTW, I initially discarded the 64KB pair lookup table idea because I
believed it would cause too many cache misses (8 or 16 KB $L1 at the
time), but in reality it turned out that typical text is _very_ far from
using all possible byte pairs, or even all 7-bit pairs, so the hit rate
was close to 100%.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21964&group=comp.arch#21964
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
Date: Fri, 12 Nov 2021 13:08:49 +0100
Organization: Aioe.org NNTP Server
Message-ID: <smllgf$e77$1@gioia.aioe.org>
References: <87fstdumxd.fsf@hotmail.com>
<8fe79286-c374-4129-b2a2-cb93099e9448n@googlegroups.com>
<5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com>
<9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>
<_LSdnePfso4DDxD8nZ2dnUU7-KOdnZ2d@giganews.com>
 by: Terje Mathisen - Fri, 12 Nov 2021 12:08 UTC

Kent Dickey wrote:
> In article <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>,
> MitchAlsup <MitchAlsup@aol.com> wrote:
>>> But with your checking that Rbase must have the proper high-order bits, this
>>> will not always work. And, it adds complexity to code generation for what
>>> I suspect is minimal hardware savings. In fact, it may be more complex to
>>> rely on the Rbase+imm to calculate the high-order bits, since then you have
>>> to do the AGEN in a certain way to get that answer.
>> <
>> The gate cost is 7-gates (in addition to the 2031 gates a 64-bit carry select
>> adder takes.)
>>>
>>> Kent
>
> Can you explain the hardware savings?
>
> The Rbase+imm64 looks like it needs a 64-bit adder to get the high bits right.
> Then, to add in Rindex (with a shift) is another 64-bit adder after that.
> But if you did the adds together, I think it's a lot less logic to do the
> partial adds, and then just do one full carry propagation.

That is how you do it, similar to (but easier than) the adder network
that follows the many partial multiplier results in a MUL.

I.e. mov rax, table[rbx + rcx*8] takes three inputs to the adder; a
single set of full adders will reduce this to an array of (carry, sum)
pairs, so (according to Mitch) this is just 2 gate delays.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21966&group=comp.arch#21966
Newsgroups: comp.arch
Date: Fri, 12 Nov 2021 04:41:16 -0800 (PST)
In-Reply-To: <smll01$6t1$1@gioia.aioe.org>
References: <87fstdumxd.fsf@hotmail.com> <2021Nov3.092521@mips.complang.tuwien.ac.at>
<sluebv$icv$1@dont-email.me> <sm0050$109j$1@gioia.aioe.org>
<8JadnZEDxaucyRD8nZ2dnUU7-XvNnZ2d@giganews.com> <smll01$6t1$1@gioia.aioe.org>
Message-ID: <8cf6e488-d907-4189-9d67-2b9d008bf808n@googlegroups.com>
Subject: Re: Paper about ISO C
From: already5...@yahoo.com (Michael S)
 by: Michael S - Fri, 12 Nov 2021 12:41 UTC

On Friday, November 12, 2021 at 2:00:05 PM UTC+2, Terje Mathisen wrote:
> Kent Dickey wrote:
> > In article <sm0050$109j$1...@gioia.aioe.org>,
> > Terje Mathisen <terje.m...@tmsw.no> wrote:
> > [ snip ]
> >> Yes, compilers do know a lot more about the nooks & crannies of cpu
> >> performance than the vast majority of programmers, but OTOH, the type of
> >> optimizations you have to employ to handle micro-architectural balance
> >> shifts can very often be far larger than anything a compiler would be
> >> allowed to do.
> >>
> >> A case in point: I have done several conference presentations and a
> >> university guest lecture on a single program/algorithm (unix wc - word
> >> count) which I've optimized for most generations of x86 cpus from 8088
> >> to Pentium and PentiumPro.
> >>
> >> The type of optimizations needed consisted of total rewrites, using new
> >> algorithms, but still achieving the same end result.
> >
> > I am curious what wc algorithms you used, and I wasn't able to find your
> > presentation online with a little bit of Googling. If it is available,
> > can you point me to it?
> This code predates ubiquitous web storage/access, but if you drop me a
> mail msg I'll send you the source code for the current version.

Is there a good reason not to store the source code in a public github account?
Or on one of the alternatives, if github is ideologically unacceptable?

> >
> > I wanted a way to test the speed of some algorithms (and SSD
> > performance), and wc seemed like a good choice, where counting spaces
> > and lines would give me a "checksum" of sorts as well. But wc on a Mac
> > is just stupidly slow, so I wrote:
> >
> > while(1) {
> > len = reliable_read(in_fd, &(g_inbuf[0]), INBUF_SIZE);
> > if(len == 0) {
> > break;
> > }
> > inptr = &g_inbuf[0];
> > for(i = 0; i < len; i++) {
> > c = *inptr++;
> > type = g_chartype[c];
> > dcount_lines += (type & 2);
> > is_new_nonspace = last_space & type;
> > dcount_words += is_new_nonspace;
> > last_space = (~type) & 1;
> > }
> > dcount_chars += len;
> > }
> >
> > where dcount_lines count is 2x the proper value. (Yes, this code will
> > fail on 2^63 newlines). reliable_read is just read() which always gets
> > the required size unless EOF is hit. The definition of "words" in wc
> > was not easy for me to understand, so I have probably done it
> > incorrectly.
> This is more or less equivalent to my first 8088 asm version, except
> that I hardcoded CR/LF/TAB/Space as the only word separators, and CRLF
> or LF as valid line separators.
>
> The first speedup came from using blocked file reads instead of
> single-byte OS calls, then the first algorithm improvement looked more
> or less like this, as C pseudocode, disregarding block start/end handling:
>
> while (inp < bufend) {
> c = *inp++;
> if (c > ' ') { // Start of a word?
> wordcnt++;
> do {
> c = *inp++;
> } while (c > ' ');
> }
> if (c == ' ') continue;
> if (c == '\r') {
> if (c = *inp != '\n') continue;
> inp++;
> }
> if (c == '\n') {
> linecnt++;
> continue;
> }
> }
>
> The boundary conditions were handled by having a guard (space char) past
> the end of the buffer space, if this was hit then I would adjust the
> wordcnt after loading the next block if the word actually continued into
> this.
>
> I also handled the case where the last line wasn't LF-terminated, by
> counting it anyway.
>
> The faster versions all use tables to classify characters, so that
> things like '.,-_#:;%&/' etc can be used as word separators, this is
> configurable at runtime.
>
> If you move to a word/non-word classification bitmap, then the number of
> words is of course the same as the number of times that bitmap flips
> from 0 to 1, or (using your idea) half the total number of flips, plus a
> possible adjustment for words starting or ending at the file ends.
>
> My fastest current code use some form of either 3-way
> (LF/separator/word) or 4-way (CR/LF/separator/word) classification of
> individual bytes, paired into 9 or 16-way bundle.
>
> At least one character worth of data from the previous pair is joined
> with the current pair classification, and this (typically 5-bit) bundle
> indexes into a combined increment table: Top 16 bits is the line count
> and the bottom 16 bits is the word increment.
>
> This gets added to a 32-bit combined counter which is guaranteed to be
> able to handle at least 65535 bytes of input data before the combined
> accumulator has to be split into separate parts and used to increment
> the actual word and line counters.
>
> In Pentium asm this logic translated into 4 instructions (load pair,
> lookup pair and combine with previous state, load increment, add
> increment to combined counter) which for latency/scheduling reasons were
> inverted so that the load-use separation was maximized:
>
> combined_counter += counter_increments;
> counter_increments = increment_table[bx];
> bl (bottom half of BX) = pair_classification_table[pair];
> pair = inp_pairs[OFFS];
>
> combined_counter += counter_increments;
> counter_increments = increment_table[bx+16];
> bh (top half of BX) = pair_classification_table[pair];
> pair = inp_pairs[OFFS+2];
>
> The two groups above were replicated (asm macro) 64 times so that
> resulted in 256 input bytes processed as a single branchless block.
>
> Since the code was 16-bit I needed a segment override to access the 64
> KB pair_classification[] table, this increased the runtime from 1 to 1.5
> clock cycles/byte processed, I did in fact get 40 MB/s on a 60 MHz Pentium.
>
> Today I'd love to find a way to use GPGPU methods, i.e. all those table
> lookups are very similar to degenerate texture accesses, so it should
> easily run at full RAM speed. I would also like better utf8 support
> since that is trivially easy to add by decrementing the character
> counter for each intermediate utf8 byte value, effectively skipping
> them. It would be _much_ harder to support arbitrary utf8 characters as
> word separators!
>
> BTW, I initially discarded the 64KB pair lookup table idea because I
> believed it would cause too many cache misses (8 or 16 KB $L1 at the
> time), but in reality it turned out that typical text is _very_ far from
> using all possible byte pairs, or even all 7-bit pairs, so the hit rate
> was close to 100%.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21967&group=comp.arch#21967
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
Date: Fri, 12 Nov 2021 14:29:33 +0100
Organization: Aioe.org NNTP Server
Message-ID: <smlq7q$gbb$1@gioia.aioe.org>
References: <87fstdumxd.fsf@hotmail.com>
<2021Nov3.092521@mips.complang.tuwien.ac.at> <sluebv$icv$1@dont-email.me>
<sm0050$109j$1@gioia.aioe.org>
<8JadnZEDxaucyRD8nZ2dnUU7-XvNnZ2d@giganews.com> <smll01$6t1$1@gioia.aioe.org>
<8cf6e488-d907-4189-9d67-2b9d008bf808n@googlegroups.com>
 by: Terje Mathisen - Fri, 12 Nov 2021 13:29 UTC

Michael S wrote:
> On Friday, November 12, 2021 at 2:00:05 PM UTC+2, Terje Mathisen wrote:
>> Kent Dickey wrote:
>>> In article <sm0050$109j$1...@gioia.aioe.org>,
>>> Terje Mathisen <terje.m...@tmsw.no> wrote:
>>> [ snip ]
>>>> Yes, compilers do know a lot more about the nooks & crannies of cpu
>>>> performance than the vast majority of programmers, but OTOH, the type of
>>>> optimizations you have to employ to handle micro-architectural balance
>>>> shifts can very often be far larger than anything a compiler would be
>>>> allowed to do.
>>>>
>>>> A case in point: I have done several conference presentations and a
>>>> university guest lecture on a single program/algorithm (unix wc - word
>>>> count) which I've optimized for most generations of x86 cpus from 8088
>>>> to Pentium and PentiumPro.
>>>>
>>>> The type of optimizations needed consisted of total rewrites, using new
>>>> algorithms, but still achieving the same end result.
>>>
>>> I am curious what wc algorithms you used, and I wasn't able to find your
>>> presentation online with a little bit of Googling. If it is available,
>>> can you point me to it?
>> This code predates ubiquitous web storage/access, but if you drop me a
>> mail msg I'll send you the source code for the current version.
>
> Is there a good reason to not store source code on public github account?

Not today, no.

However, by around 1987 I had already written 18 MB of Pascal, 3 MB of
asm and 1 MB of C, and I just went on from there.

Trying to locate all potentially interesting program source files and
creating a github location for all of them is just a _lot_ of work. :-(

All my current professional programming uses a company github repository.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21972&group=comp.arch#21972
Newsgroups: comp.arch
Date: Fri, 12 Nov 2021 09:50:30 -0800 (PST)
In-Reply-To: <smllgf$e77$1@gioia.aioe.org>
References: <87fstdumxd.fsf@hotmail.com> <8fe79286-c374-4129-b2a2-cb93099e9448n@googlegroups.com>
<5NidnT3IZ9Ji0RD8nZ2dnUU7-T_NnZ2d@giganews.com> <9be5a768-5d12-4f3b-9daa-3360c08543dbn@googlegroups.com>
<_LSdnePfso4DDxD8nZ2dnUU7-KOdnZ2d@giganews.com> <smllgf$e77$1@gioia.aioe.org>
Message-ID: <462c6d6a-31eb-4302-902a-5c1b47a951a5n@googlegroups.com>
Subject: Re: Paper about ISO C
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Fri, 12 Nov 2021 17:50 UTC

On Friday, November 12, 2021 at 6:08:49 AM UTC-6, Terje Mathisen wrote:
> Kent Dickey wrote:
> > In article <9be5a768-5d12-4f3b...@googlegroups.com>,
> > MitchAlsup <Mitch...@aol.com> wrote:
> >>> But with your checking that Rbase must have the proper high-order bits, this
> >>> will not always work. And, it adds complexity to code generation for what
> >>> I suspect is minimal hardware savings. In fact, it may be more complex to
> >>> rely on the Rbase+imm to calculate the high-order bits, since then you have
> >>> to do the AGEN in a certain way to get that answer.
> >> <
> >> The gate cost is 7-gates (in addition to the 2031 gates a 64-bit carry select
> >> adder takes.)
> >>>
> >>> Kent
> >
> > Can you explain the hardware savings?
> >
> > The Rbase+imm64 looks like it needs a 64-bit adder to get the high bits right.
> > Then, to add in Rindex (with a shift) is another 64-bit adder after that.
> > But if you did the adds together, I think it's a lot less logic to do the
> > partial adds, and then just do one full carry propagation.
> That is how you do it, similar to (but easier than) the adder network
> that follows the many partial multiplier results in a MUL.
>
> I.e. mov rax, table[rbx + rcx*8] takes three inputs to the adder, a
> single set of full adders will reduce this to an array of (carry, sum)
> pairs, so (according to Mitch) this is just 2 gate delays.
<
1 gate per full adder
2 gates per 4-2 compressor.
<
That is, you get 3 inputs for a single gate of additional delay,
and you get 4 inputs for 2 extra gates of delay.
From here it goes:
6 inputs: 3 gates
8 inputs: 4 gates
12 inputs: 5 gates
16 inputs: 6 gates
(and here we are at ½ the delay of the carry-performing adder.)
<
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21975&group=comp.arch#21975
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
Date: Sat, 13 Nov 2021 07:56:29 -0800
Organization: A noiseless patient Spider
Message-ID: <86pmr49hhu.fsf@linuxsc.com>
References: <87fstdumxd.fsf@hotmail.com> <sjugcv$jio$1@dont-email.me> <cb6bbb41-398f-4e2a-9a19-08bc4582b291n@googlegroups.com> <sk437c$672$1@dont-email.me> <jwvee8qgxk0.fsf-monnier+comp.arch@gnu.org> <2021Oct12.185057@mips.complang.tuwien.ac.at> <jwvzgre88y4.fsf-monnier+comp.arch@gnu.org> <5f97b29e-e958-49e2-bb1c-c0e9870f9c2bn@googlegroups.com> <sku3dr$1hb$2@dont-email.me> <5d25afd4-0e2c-4e98-a457-a04be5ae88dbn@googlegroups.com> <sl3c2g$n4b$1@dont-email.me> <2021Oct25.195829@mips.complang.tuwien.ac.at> <sl79bl$jei$1@dont-email.me> <itpou1Fa8stU1@mid.individual.net> <86zgqvfe7h.fsf@linuxsc.com> <itsb76FpbmnU1@mid.individual.net>
 by: Tim Rentsch - Sat, 13 Nov 2021 15:56 UTC

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> On 2021-10-27 4:28, Tim Rentsch wrote:
>
>> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>>
>>>> [.. volatile ..]
>>>
>>> These discussions of volatile, and execution order wrt timing,
>>> gave me an idea: perhaps C (and other languages) should allow
>>> marking functions (subprograms) as "volatile", with the meaning
>>> that all of the effects of a call of that function (including use
>>> of processor time) should be ordered as volatile accesses are
>>> ordered, with respect to other volatile accesses.
>>>
>>> For example, if x and y are volatile variables, and foo is a
>>> volatile function, then in this code
>>>
>>> x = 1;
>>> foo ();
>>> y = 1;
>>>
>>> we would be sure that all effects and all dynamic resource usage
>>> of the foo() call would occur between the assignments to x and to
>>> y.
>>>
>>> A more flexible approach would be to mark selected function calls
>>> as volatile, in the same way that C allows in-line use of
>>> pointer-to-volatile to force a volatile access to an object that
>>> is not itself marked volatile. Something like:
>>>
>>> x = 1;
>>> (volatile) foo ();
>>> y = 1;
>>>
>>> Are volatile functions and/or volatile function calls a good idea?
>>
>> Let me propose a simpler mechanism that I believe does a better
>> job of what (I think) it is you want to do. By way of example:
>>
>> * (_Volatile int *) &x = 1;
>> foo();
>> * (_Volatile int *) &y = 1;
>>
>> The semantics of the new _Volatile qualifier, speaking
>> informally, is that it imposes a sequence point in the actual
>> machine, not just in the abstract machine. So all logically
>> previous evaluations must be finished before a volatile access,
>> and after a volatile access all logically subsequent evaluations
>> must not yet be started. To say that another way, no expression
>> evaluation (including side effects) may be "moved across" a
>> read or write to a _Volatile object.
>>
>> Note that foo() is a call to an ordinary function, and expressions
>> in foo() may be re-ordered in all the usual ways, except that they
>> must not be "moved across" the assignment to x or the assignment
>> to y.
>
> That is exactly the semantics I intended, as I described in my
> response to David Brown. So we are creating the same functionality
> but using different source-code mechanisms - my suggestion marks the
> call (or the function), and interacts with the existing "volatile"
> mechanism, while your suggestion defines a new and stronger
> "_Volatile" access.

Part of what motivated my proposal is that I think _Volatile is useful
all by itself. When people use volatile, I think in many cases
what they expect, and also what is actually needed, is something
closer to _Volatile than it is to volatile. If indeed _Volatile
is useful in its own right, then adding _Volatile to the language
gets both benefits, so that seems like a win.
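
For comparison only, and not part of either proposal: in current GCC/Clang
practice the compiler-side half of this effect is commonly approximated with an
empty asm statement carrying a "memory" clobber, which keeps the compiler from
moving memory accesses across the point but, unlike the proposed _Volatile, says
nothing about the machine actually having finished them:

#define BARRIER() __asm__ __volatile__("" ::: "memory")

    x = 1;
    BARRIER();     /* the compiler may not move memory accesses across this point */
    foo();
    BARRIER();
    y = 1;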

> I suppose only compiler implementors can tell us which of the
> two is easier to implement.

I've been doing some thinking on this question, and I'm pretty
sure that what it takes to implement them is basically the same
for the two approaches.

> Your suggestion has the benefit that the "immovable" code is not
> necessarily a function call, as in my proposal. However, I had in
> mind an extension to let any code block be defined as immovable in
> this sense, perhaps as
>
> (volatile) { some code ... };
>
> which would let the programmer use a local encapsulation of the
> immovable code, without defining a function just for that purpose.

I can't help feeling that introducing a special form just for
this purpose is a red flag that we are going down a bad path.
But let me ask a question. I'm not sure what problem motivated
your original suggestion. What problem is it that you want to
solve, where having some mechanism like the ones we have been
discussing would help solve it? Or is it just an idea that
occurred to you, without any particular use case where it would
be helpful?

Tangential note re: comments by David Brown. For several years
now I've been following a policy of not reading postings from
David Brown. Back when I was reading them I found his comments
were mostly thoughtless and arrogant. I did glance over your
reply to his posting; I confess though I didn't give it much
attention after seeing what the proposed alternative was.

Re: Paper about ISO C

https://www.novabbs.com/devel/article-flat.php?id=21978&group=comp.arch#21978
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch
Subject: Re: Paper about ISO C
Date: Sat, 13 Nov 2021 20:38:20 +0200
Organization: Tidorum Ltd
Message-ID: <ivaf0tFkrrnU1@mid.individual.net>
References: <87fstdumxd.fsf@hotmail.com> <sjugcv$jio$1@dont-email.me>
<cb6bbb41-398f-4e2a-9a19-08bc4582b291n@googlegroups.com>
<sk437c$672$1@dont-email.me> <jwvee8qgxk0.fsf-monnier+comp.arch@gnu.org>
<2021Oct12.185057@mips.complang.tuwien.ac.at>
<jwvzgre88y4.fsf-monnier+comp.arch@gnu.org>
<5f97b29e-e958-49e2-bb1c-c0e9870f9c2bn@googlegroups.com>
<sku3dr$1hb$2@dont-email.me>
<5d25afd4-0e2c-4e98-a457-a04be5ae88dbn@googlegroups.com>
<sl3c2g$n4b$1@dont-email.me> <2021Oct25.195829@mips.complang.tuwien.ac.at>
<sl79bl$jei$1@dont-email.me> <itpou1Fa8stU1@mid.individual.net>
<86zgqvfe7h.fsf@linuxsc.com> <itsb76FpbmnU1@mid.individual.net>
<86pmr49hhu.fsf@linuxsc.com>
In-Reply-To: <86pmr49hhu.fsf@linuxsc.com>
 by: Niklas Holsti - Sat, 13 Nov 2021 18:38 UTC

(I'm keeping a lot of the context because it has been quite a while
since the discussion that Tim resumed.)

On 2021-11-13 17:56, Tim Rentsch wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>
>> On 2021-10-27 4:28, Tim Rentsch wrote:
>>
>>> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>>>
>>>>> [.. volatile ..]
>>>>
>>>> These discussions of volatile, and execution order wrt timing,
>>>> gave me an idea: perhaps C (and other languages) should allow
>>>> marking functions (subprograms) as "volatile", with the meaning
>>>> that all of the effects of a call of that function (including use
>>>> of processor time) should be ordered as volatile accesses are
>>>> ordered, with respect to other volatile accesses.
>>>>
>>>> For example, if x and y are volatile variables, and foo is a
>>>> volatile function, then in this code
>>>>
>>>> x = 1;
>>>> foo ();
>>>> y = 1;
>>>>
>>>> we would be sure that all effects and all dynamic resource usage
>>>> of the foo() call would occur between the assignments to x and to
>>>> y.
>>>>
>>>> A more flexible approach would be to mark selected function calls
>>>> as volatile, in the same way that C allows in-line use of
>>>> pointer-to-volatile to force a volatile access to an object that
>>>> is not itself marked volatile. Something like:
>>>>
>>>> x = 1;
>>>> (volatile) foo ();
>>>> y = 1;
>>>>
>>>> Are volatile functions and/or volatile function calls a good idea?
>>>
>>> Let me propose a simpler mechanism that I believe does a better
>>> job of what (I think) it is you want to do. By way of example:
>>>
>>> * (_Volatile int *) &x = 1;
>>> foo();
>>> * (_Volatile int *) &y = 1;
>>>
>>> The semantics of the new _Volatile qualifier, speaking
>>> informally, is that it imposes a sequence point in the actual
>>> machine, not just in the abstract machine. So all logically
>>> previous evaluations must be finished before a volatile access,
>>> and after a volatile access all logically subsequent evaluations
>>> must not yet be started. To say that another way, no expression
>>> evaluation (including side effects) may be "moved across" a
>>> read or write to a _Volatile object.
>>>
>>> Note that foo() is a call to an ordinary function, and expressions
>>> in foo() may be re-ordered in all the usual ways, except that they
>>> must not be "moved across" the assignment to x or the assignment
>>> to y.
>>
>> That is exactly the semantics I intended, as I described in my
>> response to David Brown. So we are creating the same functionality
>> but using different source-code mechanisms - my suggestion marks the
>> call (or the function), and interacts with the existing "volatile"
>> mechanism, while your suggestion defines a new and stronger
>> "_Volatile" access.
>
> Part of what motivated my proposal is I think _Volatile is useful
> all by itself. When people use volatile, I think in many cases
> what they expect, and also what is actually needed, is something
> closer to _Volatile than it is to volatile. If indeed _Volatile
> is useful in its own right, then adding _Volatile to the language
> gets both benefits, so that seems like a win.
>
>> I suppose only compiler implementors can tell us which of the
>> two is easier to implement.
>
> I've been doing some thinking on this question, and I'm pretty
> sure that what it takes to implement them is basically the same
> for the two approaches.
>
>> Your suggestion has the benefit that the "immovable" code is not
>> necessarily a function call, as in my proposal. However, I had in
>> mind an extension to let any code block be defined as immovable in
>> this sense, perhaps as
>>
>> (volatile) { some code ... };
>>
>> which would let the programmer use a local encapsulation of the
>> immovable code, without defining a function just for that purpose.
>
> I can't help feeling that introducing a special form just for
> this purpose is a red flag that we are going down a bad path.

Well, blocks { ... } exist already, and if the preceding suggestion of
"(volatile) foo()" is implemented, the block form does not seem to be
much of an extension.

> But let me ask a question. I'm not sure what problem motivated
> your original suggestion. What problem is it that you want to
> solve, where having some mechanism like the ones we have been
> discussing would help solve it?

My suggestion was a response to some posts about controlling
(minimizing) the time elapsed between two volatile accesses. Something
like the following, where x and y are volatile, but z and q not:

z = (some long computation);
q = x;
y = z + q;

The problem was that some compiler moved the long computation after the
reading of x, thus greatly increasing the delay between the reading of x
and the assignment to y. That code movement is of course now allowed,
because the computation is not "volatile", nor is the assignment to z.

A work-around for the above example would be to make z volatile too, but
that is not as direct as saying that the computation shall not be moved
past volatile accesses.
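
To make that concrete, here is a minimal compilable sketch (the register
addresses, and the names REG_X, REG_Y and long_computation, are invented
for illustration, and the computation is assumed to have no volatile side
effects of its own). The first function is the code as written; the
second shows the work-around of making z volatile:

#include <stdint.h>

/* Invented register addresses standing in for the volatile x and y. */
#define REG_X (*(volatile uint32_t *)0x40000000u)
#define REG_Y (*(volatile uint32_t *)0x40000004u)

extern uint32_t long_computation(void);   /* hypothetical */

void as_written(void)
{
    uint32_t z, q;              /* not volatile */

    z = long_computation();     /* may legally be sunk below the read of REG_X */
    q = REG_X;                  /* volatile read of "x" */
    REG_Y = z + q;              /* volatile write of "y" */
}

void with_workaround(void)
{
    volatile uint32_t z;        /* the work-around: z made volatile */
    uint32_t q;

    z = long_computation();     /* volatile store: stays above the read of REG_X */
    q = REG_X;
    REG_Y = z + q;              /* re-reads z (volatile), then writes "y" */
}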

> Tangential note re: comments by David Brown. For several years
> now I've been following a policy of not reading postings from
> David Brown. Back when I was reading them I found his comments
> were mostly thoughtless and arrogant.

I don't find them so. David does usually give his opinions strongly and
without softening, but they are usually valid opinions (meaning that I
mostly agree with them).

Re: Paper about ISO C

<d5b22f64-83b1-4a6e-9638-0c7171c01dccn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21980&group=comp.arch#21980

 by: MitchAlsup - Sat, 13 Nov 2021 19:24 UTC

On Saturday, November 13, 2021 at 12:38:24 PM UTC-6, Niklas Holsti wrote:
> (I'm keeping a lot of the context because it has been quite a while
> since the discussion that Tim resumed.)
> On 2021-11-13 17:56, Tim Rentsch wrote:
> > Niklas Holsti <niklas...@tidorum.invalid> writes:
> >
> >> On 2021-10-27 4:28, Tim Rentsch wrote:
> >>
> >>> Niklas Holsti <niklas...@tidorum.invalid> writes:
> >>>
> >>>>> [.. volatile ..]
> >>>>
> >>>> These discussions of volatile, and execution order wrt timing,
> >>>> gave me an idea: perhaps C (and other languages) should allow
> >>>> marking functions (subprograms) as "volatile", with the meaning
> >>>> that all of the effects of a call of that function (including use
> >>>> of processor time) should be ordered as volatile accesses are
> >>>> ordered, with respect to other volatile accesses.
> >>>>
> >>>> For example, if x and y are volatile variables, and foo is a
> >>>> volatile function, then in this code
> >>>>
> >>>> x = 1;
> >>>> foo ();
> >>>> y = 1;
> >>>>
> >>>> we would be sure that all effects and all dynamic resource usage
> >>>> of the foo() call would occur between the assignments to x and to
> >>>> y.
> >>>>
> >>>> A more flexible approach would be to mark selected function calls
> >>>> as volatile, in the same way that C allows in-line use of
> >>>> pointer-to-volatile to force a volatile access to an object that
> >>>> is not itself marked volatile. Something like:
> >>>>
> >>>> x = 1;
> >>>> (volatile) foo ();
> >>>> y = 1;
> >>>>
> >>>> Are volatile functions and/or volatile function calls a good idea?
> >>>
> >>> Let me propose a simpler mechanism that I believe does a better
> >>> job of what (I think) it is you want to do. By way of example:
> >>>
> >>> * (_Volatile int *) &x = 1;
> >>> foo();
> >>> * (_Volatile int *) &y = 1;
> >>>
> >>> The semantics of the new _Volatile qualifier, speaking
> >>> informally, is that it imposes a sequence point in the actual
> >>> machine, not just in the abstract machine. So all logically
> >>> previous evaluations must be finished before a volatile access,
> >>> and after a volatile access all logically subsequent evaluations
> >>> must not yet be started. To say that another way, no expression
> >>> evaluation (including side effects) may be "moved across" a
> >>> read or write to a _Volatile object.
> >>>
> >>> Note that foo() is a call to an ordinary function, and expressions
> >>> in foo() may be re-ordered in all the usual ways, except that they
> >>> must not be "moved across" the assignment to x or the assignment
> >>> to y.
> >>
> >> That is exactly the semantics I intended, as I described in my
> >> response to David Brown. So we are creating the same functionality
> >> but using different source-code mechanisms - my suggestion marks the
> >> call (or the function), and interacts with the existing "volatile"
> >> mechanism, while your suggestion defines a new and stronger
> >> "_Volatile" access.
> >
> > Part of what motivated my proposal is I think _Volatile is useful
> > all by itself. When people use volatile, I think in many cases
> > what they expect, and also what is actually needed, is something
> > closer to _Volatile than it is to volatile. If indeed _Volatile
> > is useful in its own right, then adding _Volatile to the language
> > gets both benefits, so that seems like a win.
> >
> >> I suppose only compiler implementors can tell us which of the
> >> two is easier to implement.
> >
> > I've been doing some thinking on this question, and I'm pretty
> > sure that what it takes to implement them is basically the same
> > for the two approaches.
> >
> >> Your suggestion has the benefit that the "immovable" code is not
> >> necessarily a function call, as in my proposal. However, I had in
> >> mind an extension to let any code block be defined as immovable in
> >> this sense, perhaps as
> >>
> >> (volatile) { some code ... };
> >>
> >> which would let the programmer use a local encapsulation of the
> >> immovable code, without defining a function just for that purpose.
> >
> > I can't help feeling that introducing a special form just for
> > this purpose is a red flag that we are going down a bad path.
> Well, blocks { ... } exist already, and if the preceding suggestion of
> "(volatile) foo()" is implemented, the block form does not seem to be
> much of an extension.
<
My question is:: in a (volatile) { block } does EVERY memory access
take on the volatile moniker ? does the { block } take on the moniker
and if so exactly what does that mean ? is no memory reordering
allowed whatsoever ?
<
{I am pretty sure I have no idea as to what (volatile) foo(); means;
especially if foo() has already been compiled and placed in a library.
Do you reach into the library and then attach volatile to every access?
But perhaps one simply means that one cannot put a volatile in a
location other than its resting memory location while the foo() is
active? !?!}
<
> > But let me ask a question. I'm not sure what problem motivated
> > your original suggestion. What problem is it that you want to
> > solve, where having some mechanism like the ones we have been
> > discussing would help solve it?
<
> My suggestion was a response to some posts about controlling
> (minimizing) the time elapsed between two volatile accesses. Something
> like the following, where x and y are volatile, but z and q not:
>
> z = (some long computation);
> q = x;
> y = z + q;
>
> The problem was that some compiler moved the long computation after the
> reading of x, thus greatly increasing the delay between the reading of x
> and the assignment to y. That code movement is of course now allowed,
> because the computation is not "volatile", nor is the assignment to z.
<
Volatile was invented for those variables (like memory mapped I/O registers)
that have to be read/written exactly the same number of times they are mentioned
in the source code. These kinds of memory locations change their values upon
being read, and do not necessarily have the last value written.
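<
A minimal sketch of that classic use (the register address and status bit are
invented for illustration): without volatile the compiler could hoist the load
out of the loop and spin on a stale copy, whereas with volatile every
evaluation of the register performs a real read.

#include <stdint.h>

#define UART_STATUS (*(volatile uint32_t *)0x4000A000u)  /* invented address */
#define RX_READY    0x1u                                 /* invented status bit */

uint32_t wait_until_ready(void)
{
    /* UART_STATUS is volatile, so each evaluation below is a real load
       from the device; the compiler may not cache the first value. */
    while ((UART_STATUS & RX_READY) == 0u)
        ;                       /* busy-wait for the device */
    return UART_STATUS;         /* a further, separate volatile read */
}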
<
Now, it looks like you are suggesting that volatile has assumed another purpose:
that of scheduling access to potentially shared variables in shared memory.
<
But in any event:: why was the code NOT written::
z = (some long computation);
y = z + x;
<
Or even:
y = (some long calculation) + x;
<
The assignment of x into q, it seems to me, indicates the programmer wanted to
separate the volatile access from the calculation. And thus, the compiler should
have the mentioned freedom!
<
>
> A work-around for the above example would be to make z volatile too, but
> that is not as direct as saying that the computation shall not be moved
> past volatile accesses.
<
Making z volatile requires that its assigned location be memory and not a register.
A register retains the value last written and returns that same value when read,
and thus has none of the properties that motivate attaching the volatile moniker
in the first place.
<
Not sure you want to go that far.
<
> > Tangential note re: comments by David Brown. For several years
> > now I've been following a policy of not reading postings from
> > David Brown. Back when I was reading them I found his comments
> > were mostly thoughtless and arrogant.
> I don't find them so. David does usually give his opinions strongly and
> without softening, but they are usually valid opinions (meaning that I
> mostly agree with them).

Re: Paper about ISO C

<jwvr1bj6dfl.fsf-monnier+comp.arch@gnu.org>

https://www.novabbs.com/devel/article-flat.php?id=21983&group=comp.arch#21983

 by: Stefan Monnier - Sat, 13 Nov 2021 19:56 UTC

> My suggestion was a response to some posts about controlling (minimizing)
> the time elapsed between two volatile accesses. Something like the
> following, where x and y are volatile, but z and q not:

I think the fact that it accessed volatile vars was rather accidental,
and (ab)using a notion of volatility for that would probably not be
a good idea.

I think for the original problem, what the programmer wants is to be
able to label a chunk of the code (e.g. a block) as being "performance
sensitive", which would tell the compiler not to move code *into* it.

Stefan

Re: Paper about ISO C

<ivaju8Flp1uU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=21984&group=comp.arch#21984

 by: Niklas Holsti - Sat, 13 Nov 2021 20:02 UTC

On 2021-11-13 21:24, MitchAlsup wrote:
> On Saturday, November 13, 2021 at 12:38:24 PM UTC-6, Niklas Holsti wrote:

[snip]

>>>>> Niklas Holsti <niklas...@tidorum.invalid> writes:
>>>>>
>>>>>>> [.. volatile ..]
>>>>>>
>>>>>> These discussions of volatile, and execution order wrt timing,
>>>>>> gave me an idea: perhaps C (and other languages) should allow
>>>>>> marking functions (subprograms) as "volatile", with the meaning
>>>>>> that all of the effects of a call of that function (including use
>>>>>> of processor time) should be ordered as volatile accesses are
>>>>>> ordered, with respect to other volatile accesses.
>>>>>>
>>>>>> For example, if x and y are volatile variables, and foo is a
>>>>>> volatile function, then in this code
>>>>>>
>>>>>> x = 1;
>>>>>> foo ();
>>>>>> y = 1;
>>>>>>
>>>>>> we would be sure that all effects and all dynamic resource usage
>>>>>> of the foo() call would occur between the assignments to x and to
>>>>>> y.
>>>>>>
>>>>>> A more flexible approach would be to mark selected function calls
>>>>>> as volatile, in the same way that C allows in-line use of
>>>>>> pointer-to-volatile to force a volatile access to an object that
>>>>>> is not itself marked volatile. Something like:
>>>>>>
>>>>>> x = 1;
>>>>>> (volatile) foo ();
>>>>>> y = 1;

[snip]

>>>> [or also] to let any code block be defined as immovable in
>>>> this sense, perhaps as
>>>>
>>>> (volatile) { some code ... };
>>>>
>>>> which would let the programmer use a local encapsulation of the
>>>> immovable code, without defining a function just for that purpose.

>
> My question is:: in a (volatile) { block } does EVERY memory access
> take on the volatile moniker ? does the { block } take on the moniker
> and if so exactly what does that mean ? is no memory reordering
> allowed whatsoever ?

The intent was that all of the processing, whether memory accesses or
not, from the volatile function (call) or volatile block would not be
movable over any volatile access.

David Brown's suggestion of just marking an "execution barrier" in the
code is more direct, and more powerful in the sense that it would allow
even less code movement.
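
No such barrier statement exists in ISO C today. A rough approximation that
GCC and Clang already accept is an empty asm with a "memory" clobber: it
emits no instructions and keeps memory accesses from being moved across it,
though it does not pin register-only arithmetic and is not a hardware fence,
so it only approximates the proposed statement. A sketch, again with
invented register names and a hypothetical long_computation:

#include <stdint.h>

#define REG_X (*(volatile uint32_t *)0x40000000u)   /* invented registers */
#define REG_Y (*(volatile uint32_t *)0x40000004u)

/* Compiler-only barrier (GCC/Clang extension): emits no instructions, but
   memory accesses may not be moved across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

extern uint32_t long_computation(void);             /* hypothetical */

void timed_section(void)
{
    uint32_t z, q;

    z = long_computation();
    COMPILER_BARRIER();     /* keep the computation's memory traffic above here */
    q = REG_X;              /* volatile read of "x" */
    REG_Y = z + q;          /* volatile write of "y" */
}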

> Volatile was invented for those variables (like memory mapped I/O
> registers) that have to be read/written exactly the same number of
> times they are mentioned in the source code. These kinds of memory
> locations change their values upon being read, and do not necessarily
> have the last value written.

Yes, but the timing of accesses to volatile variables is often important
too. The example problem that I described in my preceding posting was
apparently such a case, where the compiler's code movement increased the
timing jitter and/or delay between two volatile accesses.

> Now, it looks like you are suggesting that volatile has assumed
> another purpose: that of scheduling access to potentially shared
> variables in shared memory.

Not at all. I don't see how that misunderstanding arose, but of course I
apologize if I was unclear.

However, if the program is multi-threaded, the suggested volatile
functions, volatile calls and volatile blocks are probably not useful in
any thread that can be preempted for significant durations. In such
systems, the suggested mechanisms would be useful only in interrupt
handlers and regions that are non-preemptible or where the maximum
duration of a preemption is short and can be tolerated in the timing of
volatile accesses.

> <
> But in any event:: why was the code NOT written::
> z = (some long computation);
> y = z + x;
> <
> Or even:
> y = (some long calculation) + x;

The same undesired code reordering could happen in either of those
forms, so they are equivalent to the form I gave.

Re: Paper about ISO C

<03c20ef7-aad9-41a5-9318-7fe5e127e80en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21985&group=comp.arch#21985

 by: Michael S - Sat, 13 Nov 2021 20:07 UTC

On Saturday, November 13, 2021 at 9:24:09 PM UTC+2, MitchAlsup wrote:
> On Saturday, November 13, 2021 at 12:38:24 PM UTC-6, Niklas Holsti wrote:
> > (I'm keeping a lot of the context because it has been quite a while
> > since the discussion that Tim resumed.)
> > On 2021-11-13 17:56, Tim Rentsch wrote:
> > > Niklas Holsti <niklas...@tidorum.invalid> writes:
> > >
> > >> On 2021-10-27 4:28, Tim Rentsch wrote:
> > >>
> > >>> Niklas Holsti <niklas...@tidorum.invalid> writes:
> > >>>
> > >>>>> [.. volatile ..]
> > >>>>
> > >>>> These discussions of volatile, and execution order wrt timing,
> > >>>> gave me an idea: perhaps C (and other languages) should allow
> > >>>> marking functions (subprograms) as "volatile", with the meaning
> > >>>> that all of the effects of a call of that function (including use
> > >>>> of processor time) should be ordered as volatile accesses are
> > >>>> ordered, with respect to other volatile accesses.
> > >>>>
> > >>>> For example, if x and y are volatile variables, and foo is a
> > >>>> volatile function, then in this code
> > >>>>
> > >>>> x = 1;
> > >>>> foo ();
> > >>>> y = 1;
> > >>>>
> > >>>> we would be sure that all effects and all dynamic resource usage
> > >>>> of the foo() call would occur between the assignments to x and to
> > >>>> y.
> > >>>>
> > >>>> A more flexible approach would be to mark selected function calls
> > >>>> as volatile, in the same way that C allows in-line use of
> > >>>> pointer-to-volatile to force a volatile access to an object that
> > >>>> is not itself marked volatile. Something like:
> > >>>>
> > >>>> x = 1;
> > >>>> (volatile) foo ();
> > >>>> y = 1;
> > >>>>
> > >>>> Are volatile functions and/or volatile function calls a good idea?
> > >>>
> > >>> Let me propose a simpler mechanism that I believe does a better
> > >>> job of what (I think) it is you want to do. By way of example:
> > >>>
> > >>> * (_Volatile int *) &x = 1;
> > >>> foo();
> > >>> * (_Volatile int *) &y = 1;
> > >>>
> > >>> The semantics of the new _Volatile qualifier, speaking
> > >>> informally, is that it imposes a sequence point in the actual
> > >>> machine, not just in the abstract machine. So all logically
> > >>> previous evaluations must be finished before a volatile access,
> > >>> and after a volatile access all logically subsequent evaluations
> > >>> must not yet be started. To say that another way, no expression
> > >>> evaluation (including side effects) may be "moved across" a
> > >>> read or write to a _Volatile object.
> > >>>
> > >>> Note that foo() is a call to an ordinary function, and expressions
> > >>> in foo() may be re-ordered in all the usual ways, except that they
> > >>> must not be "moved across" the assignment to x or the assignment
> > >>> to y.
> > >>
> > >> That is exactly the semantics I intended, as I described in my
> > >> response to David Brown. So we are creating the same functionality
> > >> but using different source-code mechanisms - my suggestion marks the
> > >> call (or the function), and interacts with the existing "volatile"
> > >> mechanism, while your suggestion defines a new and stronger
> > >> "_Volatile" access.
> > >
> > > Part of what motivated my proposal is I think _Volatile is useful
> > > all by itself. When people use volatile, I think in many cases
> > > what they expect, and also what is actually needed, is something
> > > closer to _Volatile than it is to volatile. If indeed _Volatile
> > > is useful in its own right, then adding _Volatile to the language
> > > gets both benefits, so that seems like a win.
> > >
> > >> I suppose only compiler implementors can tell us which of the
> > >> two is easier to implement.
> > >
> > > I've been doing some thinking on this question, and I'm pretty
> > > sure that what it takes to implement them is basically the same
> > > for the two approaches.
> > >
> > >> Your suggestion has the benefit that the "immovable" code is not
> > >> necessarily a function call, as in my proposal. However, I had in
> > >> mind an extension to let any code block be defined as immovable in
> > >> this sense, perhaps as
> > >>
> > >> (volatile) { some code ... };
> > >>
> > >> which would let the programmer use a local encapsulation of the
> > >> immovable code, without defining a function just for that purpose.
> > >
> > > I can't help feeling that introducing a special form just for
> > > this purpose is a red flag that we are going down a bad path.
> > Well, blocks { ... } exist already, and if the preceding suggestion of
> > "(volatile) foo()" is implemented, the block form does not seem to be
> > much of an extension.
> <
> My question is:: in a (volatile) { block } does EVERY memory access
> take on the volatile moniker ? does the { block } take on the moniker
> and if so exactly what does that mean ? is no memory reordering
> allowed whatsoever ?
> <
> {I am pretty sure I have no idea as to what (volatile) foo(); means;
> especially if foo() has already been compiled and placed in a library.
> Do you reach into the library and then attach volatile to every access?
> But perhaps one simply means that one cannot put a volatile in a
> location other than its resting memory location while the foo() is
> active? !?!}
> <
> > > But let me ask a question. I'm not sure what problem motivated
> > > your original suggestion. What problem is it that you want to
> > > solve, where having some mechanism like the ones we have been
> > > discussing would help solve it?
> <
> > My suggestion was a response to some posts about controlling
> > (minimizing) the time elapsed between two volatile accesses. Something
> > like the following, where x and y are volatile, but z and q not:
> >
> > z = (some long computation);
> > q = x;
> > y = z + q;
> >
> > The problem was that some compiler moved the long computation after the
> > reading of x, thus greatly increasing the delay between the reading of x
> > and the assignment to y. That code movement is of course now allowed,
> > because the computation is not "volatile", nor is the assignment to z.
> <
> Volatile was invented for those variables (like memory mapped I/O registers)
> that have to be read/written exactly the same number of times they are mentioned
> in the source code. These kinds of memory locations change their values upon
> being read, and do not necessarily have the last value written.
> <
> Now, it looks like you are suggesting that volatile has assumed another purpose:
> that of scheduling access to potentially shared variables in shared memory.
> <
> But in any event:: why was the code NOT written::
> z = (some long computation);
> y = z + x;
> <
> Or even:
> y = (some long calculation) + x;
> <
> The assignment of x into q, it seems to me, indicates the programmer wanted to
> separate the volatile access from the calculation. And thus, the compiler should
> have the mentioned freedom!

Here are the relevant messages:
https://groups.google.com/g/comp.arch/c/HMgFkk6BBqE/m/ncfgGNyIAgAJ
https://groups.google.com/g/comp.arch/c/HMgFkk6BBqE/m/z72ie5SYAgAJ
https://groups.google.com/g/comp.arch/c/HMgFkk6BBqE/m/glqoZoqaAgAJ
https://groups.google.com/g/comp.arch/c/HMgFkk6BBqE/m/SweM0XzHAgAJ

> <
> >
> > A work-around for the above example would be to make z volatile too, but
> > that is not as direct as saying that the computation shall not be moved
> > past volatile accesses.
> <
> Making z volatile requires its assigned location be memory and not be allocated
> to a register. A register retains the value that was last written and returns same
> upon being read; and thus has none of the properties one wants to coin to cause
> the volatile moniker to be attached.
> <
> Not sure you want to go that far.
> <
> > > Tangential note re: comments by David Brown. For several years
> > > now I've been following a policy of not reading postings from
> > > David Brown. Back when I was reading them I found his comments
> > > were mostly thoughtless and arrogant.
> > I don't find them so. David does usually give his opinions strongly and
> > without softening, but they are usually valid opinions (meaning that I
> > mostly agree with them).


Re: Paper about ISO C

<ival9gFm1nnU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=21988&group=comp.arch#21988

 by: Tim Rentsch - Sat, 13 Nov 2021 21:06 UTC

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:

> On 2021-11-13 17:56, Tim Rentsch wrote:

[...]

>> But let me ask a question. I'm not sure what problem motivated
>> your original suggestion. What problem is it that you want to
>> solve, where having some mechanism like the ones we have been
>> discussing would help solve it?
>
> My suggestion was a response to some posts about controlling
> (minimizing) the time elapsed between two volatile accesses. Something
> like the following, where x and y are volatile, but z and q not:
>
> z = (some long computation);
> q = x;
> y = z + q;
>
> The problem was that some compiler moved the long computation after
> the reading of x, thus greatly increasing the delay between the
> reading of x and the assignment to y. That code movement is of course
> now allowed, because the computation is not "volatile", nor is the
> assignment to z.
>
> A work-around for the above example would be to make z volatile too,
> [...]

We can leave z as it is, and still force the needed ordering
(taking the type of z to be int):

z = *(volatile int[]){0} = (some long computation);
q = x;
y = z + q;

This method makes it obvious that some ordering is being imposed,
using only a local change, and doesn't need any new language
mechanisms. Without more compelling use cases, I'm inclined to
think adding some new language construct weighs more on the minus
side than the plus side.
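
For concreteness, a compilable version of that idiom (the register
addresses and long_computation are invented for illustration):

#define REG_X (*(volatile int *)0x40000000)   /* invented registers */
#define REG_Y (*(volatile int *)0x40000004)

extern int long_computation(void);            /* hypothetical */

void ordered(void)
{
    int z, q;                                 /* both still non-volatile */

    /* The compound literal is an anonymous volatile int; storing into it
       is a volatile write, so the computation cannot be sunk past the
       volatile read of REG_X that follows. */
    z = *(volatile int[]){0} = long_computation();
    q = REG_X;
    REG_Y = z + q;
}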
