Minor idea for indirect target predictor

Re: Minor idea for indirect target predictor

<sbk7i3$r2v$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18284&group=comp.arch#18284

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 13:05:39 +0200
Organization: A noiseless patient Spider
Lines: 63
Message-ID: <sbk7i3$r2v$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 1 Jul 2021 11:05:39 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c8f8f43462db8e38d0e8cadf1c3650f9";
logging-data="27743"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Ssh+f6/N2YTcmbML8aQzJEOZcEZZ0vrk="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:6G7GxsYbIJlhZFSCpkjR2IRzswo=
In-Reply-To: <sbi8d7$3sa$1@newsreader4.netcologne.de>
Content-Language: en-GB

by: David Brown - Thu, 1 Jul 2021 11:05 UTC

On 30/06/2021 19:07, Thomas Koenig wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 30/06/2021 17:42, Thomas Koenig wrote:
>>> David Brown <david.brown@hesbynett.no> wrote:
>>>
>>> (ctype.h)
>>>
>>>> However, range testing the value is so simple and obvious that I would
>>>> not expect any C library implementation to have problems here.
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78155 has the non-nice
>>> example
>>>
>>> $ cat foo.c
>>> int main (void)
>>> {
>>> __builtin_printf ("%i\n", __builtin_isalpha (999999));
>>> }
>>> $ gcc foo.c
>>> $ ./a.out
>>> Segmentation fault (core dumped)
>>>
>>
>> These builtins are a different matter. These are undocumented
>> functions, used by libraries to implement the documented ones.
>
> This isn't any better:
>
> $ cat foo.c
> #include <ctype.h>
> #include <stdio.h>
>
> int main (void)
> {
> printf ("%i\n", isalpha (999999));
> }
> $ gcc foo.c
> $ ./a.out
> Segmentation fault (core dumped)
>
> This just serves to illustrate that the arguments to isalpha() and
> friends can lead to undefined behavior, and these functions should
> not be called on unsanitized data.
>

Just as the documentation for the function says.

(Section 7.4 of the C standard, or
<https://en.cppreference.com/w/c/string/byte/isalpha> for those that
feel the C standards are not particularly easy reads!)

It is the responsibility of every programmer calling a function to know
what values are valid, and to ensure that the function is only called
with valid values.

It is /helpful/ for library programmers to check their inputs for basic
sanity, when it can be done without significant efficiency costs.

It surprises me a little that this is not done here (using the
particular compiler and library combination you have), but I suppose
someone felt that the documentation of the standard function was clear
enough, and the relative efficiency costs for checking were too high.

Re: Minor idea for indirect target predictor

<sbk7pm$v60$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18285&group=comp.arch#18285

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 13:09:42 +0200
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <sbk7pm$v60$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
<sbjm60$32l$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 1 Jul 2021 11:09:42 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c8f8f43462db8e38d0e8cadf1c3650f9";
logging-data="31936"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+mnERZOv0niPHu0P0qnIz8LiH8xFjsyAg="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:vQB/RfjAZDlBxvBeOtuQwMT2cns=
In-Reply-To: <sbjm60$32l$1@newsreader4.netcologne.de>
Content-Language: en-GB

by: David Brown - Thu, 1 Jul 2021 11:09 UTC

On 01/07/2021 08:09, Thomas Koenig wrote:
> MitchAlsup <MitchAlsup@aol.com> wrote:
>
>> Why does isalpha NOT check its input argument range ?????
>
> Another of those features of C that did not age well. It made
> sense in the 1970s, less so in the 2020s.
>
> A better alternative is to use
>
> #include <safe-ctype.h>
>
> and then ISALPHA etc. This is a gcc extension, not sure
> if LLVM has it or not.
>

Alternatively, you could stick to passing unsigned chars to isalpha.
The function only makes sense for ASCII, so you could also use
"isalpha(c & 0x7f)" rather than "isalpha(c)" - if the masking makes a
difference, your code is wrong anyway.

Or you could just write code that does what you want it to do - sanitize
all input from untrusted sources, and don't call functions that might
not be defined for the input you give them.

Re: Minor idea for indirect target predictor

<sbkg7l$jt6$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18287&group=comp.arch#18287

Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 13:33:41 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbkg7l$jt6$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
<sbjm60$32l$1@newsreader4.netcologne.de> <sbk7pm$v60$1@dont-email.me>
Injection-Date: Thu, 1 Jul 2021 13:33:41 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="20390"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Thu, 1 Jul 2021 13:33 UTC

David Brown <david.brown@hesbynett.no> wrote:
> On 01/07/2021 08:09, Thomas Koenig wrote:
>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>
>>> Why does isalpha NOT check its input argument range ?????
>>
>> Another of those features of C that did not age well. It made
>> sense in the 1970s, less so in the 2020s.
>>
>> A better alternative is to use
>>
>> #include <safe-ctype.h>
>>
>> and then ISALPHA etc. This is a gcc extension, not sure
>> if LLVM has it or not.
>>
>
> Alternatively, you could stick to passing unsigned chars to isalpha.
> The function only makes sense for ASCII,

isalpha() and friends are locale-specific, so they need to be able
to deal with eight-bit characters.

Re: Minor idea for indirect target predictor

<sbkhu4$lcr$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18288&group=comp.arch#18288

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 16:02:44 +0200
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <sbkhu4$lcr$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
<sbjm60$32l$1@newsreader4.netcologne.de> <sbk7pm$v60$1@dont-email.me>
<sbkg7l$jt6$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 1 Jul 2021 14:02:44 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c8f8f43462db8e38d0e8cadf1c3650f9";
logging-data="21915"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19AnG+tx07U6p2JQsVF+h5JxfI3bMJ+xNA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:V+MTqQST2IBvQnvleQ4yd0K+zq8=
In-Reply-To: <sbkg7l$jt6$1@newsreader4.netcologne.de>
Content-Language: en-GB

by: David Brown - Thu, 1 Jul 2021 14:02 UTC

On 01/07/2021 15:33, Thomas Koenig wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 01/07/2021 08:09, Thomas Koenig wrote:
>>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>>
>>>> Why does isalpha NOT check its input argument range ?????
>>>
>>> Another of those features of C that did not age well. It made
>>> sense in the 1970s, less so in the 2020s.
>>>
>>> A better alternative is to use
>>>
>>> #include <safe-ctype.h>
>>>
>>> and then ISALPHA etc. This is a gcc extension, not sure
>>> if LLVM has it or not.
>>>
>>
>> Alternatively, you could stick to passing unsigned chars to isalpha.
>> The function only makes sense for ASCII,
>
> isalpha() and friends are locale-specific, so they need to be able
> to deal with eight-bit characters.
>

In the old days, when people used locales with 8-bit character sets,
that was the case. (And isalpha() is happy with any value that can be
represented as an unsigned char - i.e., 8-bit characters.) But for a
long time, on *nix systems, locales have been mostly either "C"
(basically US English with 7-bit ASCII only), or utf-8.

Re: Minor idea for indirect target predictor

<sbki2c$lff$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18289&group=comp.arch#18289

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 14:05:00 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbki2c$lff$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi3df$gg$1@newsreader4.netcologne.de>
<sbi7n8$7g9$1@dont-email.me> <sbi8d7$3sa$1@newsreader4.netcologne.de>
<578ebc77-5c53-4def-92af-945b20488248n@googlegroups.com>
<sbjm60$32l$1@newsreader4.netcologne.de> <sbk7pm$v60$1@dont-email.me>
<sbkg7l$jt6$1@newsreader4.netcologne.de> <sbkhu4$lcr$1@dont-email.me>
Injection-Date: Thu, 1 Jul 2021 14:05:00 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:51bc:0:7285:c2ff:fe6c:992d";
logging-data="21999"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Thu, 1 Jul 2021 14:05 UTC

David Brown <david.brown@hesbynett.no> wrote:
> On 01/07/2021 15:33, Thomas Koenig wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>>> On 01/07/2021 08:09, Thomas Koenig wrote:
>>>> MitchAlsup <MitchAlsup@aol.com> wrote:
>>>>
>>>>> Why does isalpha NOT check its input argument range ?????
>>>>
>>>> Another of those features of C that did not age well. It made
>>>> sense in the 1970s, less so in the 2020s.
>>>>
>>>> A better alternative is to use
>>>>
>>>> #include <safe-ctype.h>
>>>>
>>>> and then ISALPHA etc. This is a gcc extension, not sure
>>>> if LLVM has it or not.
>>>>
>>>
>>> Alternatively, you could stick to passing unsigned chars to isalpha.
>>> The function only makes sense for ASCII,
>>
>> isalpha() and friends are locale-specific, so they need to be able
>> to deal with eight-bit characters.
>>
>
> In the old days, when people used locales with 8-bit character sets,
> that was the case. (And isalpha() is happy with any value that can be
> represented as an unsigned char - i.e., 8-bit characters.) But for a
> long time, on *nix systems, locales have been mostly either "C"
> (basically US English with 7-bit ASCII only), or utf-8.

Still needs to be supported, though.

Re: sparse switch (was Minor idea for indirect target predictor)

<sbkoim$nbh$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18290&group=comp.arch#18290

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: bage...@gmail.com (Brian G. Lucas)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Thu, 1 Jul 2021 10:56:04 -0500
Organization: A noiseless patient Spider
Lines: 176
Message-ID: <sbkoim$nbh$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 1 Jul 2021 15:56:06 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de612192e78e942d4a1b07291c91d823";
logging-data="23921"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/6PHf5gwWPkxAQbLLmqGAi"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:gfotTN7mDhY89QI3aevOib51ViU=
In-Reply-To: <504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
Content-Language: en-US

by: Brian G. Lucas - Thu, 1 Jul 2021 15:56 UTC

On 6/29/21 6:49 PM, MitchAlsup wrote:
> On Tuesday, June 29, 2021 at 4:05:13 PM UTC-5, Ivan Godard wrote:
>> On 6/29/2021 1:33 PM, MitchAlsup wrote:
>>> On Tuesday, June 29, 2021 at 2:27:46 PM UTC-5, Ivan Godard wrote:
>>>> On 6/29/2021 9:01 AM, BGB wrote:
>>>>> On 6/28/2021 10:45 PM, Andy wrote:
>>>>>> On 27/06/21 7:31 am, MitchAlsup wrote:
>>>>>>
>>>>>>> There are five cases: Switches, Returns, Method calls, Tabulated
>>>>>>> Calls, and Longjump.
>>>>>>> <
>>>>>>> Switches tend to be unpredictable much of the time.
>>>>>>> Returns are easily highly predictable
>>>>>>> Method calls tend to use few methods more than others
>>>>>>> Tabulated calls tend to be indexed tables of entry points
>>>>>>> And longjump is a category all by itself.
>>>>>>
>>>>>> Ummm, so no love for Call Return Tables?, wouldn't they go some way to
>>>>>> reducing the number of conditional instructions spent checking for
>>>>>> error return codes and having otherwise unneeded instructions situated
>>>>>> right after the original call.
>>>>>>
>>>>>
>>>>> A dedicated call-return table probably only really makes sense for
>>>>> something like x86, which loads the return address from memory at the
>>>>> same time as branching to it.
>>>>>
>>>>> If you use a link register, or return via some other special register
>>>>> (such as treating R1 as a secondary link register), then all one needs
>>>>> to do is check that this register hasn't been modified within the
>>>>> pipeline, and if so use the value held in this register as the predicted
>>>>> branch target.
>>>>>
>>>>> In this case, typically one can reload the LR or "Secondary LR" several
>>>>> cycles before its associated RTS or JMP. If one reloads the link
>>>>> register before the others in a function epilog, then (typically) the
>>>>> reloaded register will have left the pipeline before the CPU reaches the
>>>>> return instruction.
>>>>>
>>>>>
>>>>> Meanwhile, switch faces a problem that typically one calculates the
>>>>> switch target immediately before branching to it. If one could somehow
>>>>> do something like:
>>>>> Calculate where the "switch()" will go;
>>>>> Put some prior unrelated logic here;
>>>>> Do the branch to the switch target.
>>>>>
>>>>> Then "switch()" could potentially also be made predictable.
>>>> You can do exactly that: figure index; index-load from a table in
>>>> memory; do other stuff; indirect branch. Of course, that puts the table
>>>> in dmem, which Mitch doesn't like, or requires a load instruction from imem.
>>> <
>>> Switch is common enough that it should be performed in an instruction
>>> (the multiway branch). I added limit checking to this due to how it
>>> "worked out" in my ISA.
>>> <
>>> Also note: the tables in my switches remain program relative, so you fetch
>>> from ICache and then add this to the IP for the target address. This saves
>>> table size as the majority of all tables fit in 16-bit offsets, and retains
>>> program relativity.
>>> <
>>>>> For small switch one might instead use linear probing or binary
>>>>> subdivision or similar, in which case it is predictable if the
>>>>> individual branches are predictable.
>>>>>
>>>>> Say (number of case labels):
>>>>> 1-3: Linear Probe
>>>>> 4-11: Binary Subdivide
>>>>> 12-511 (and sufficiently dense): Branch Table
>>>>> Else: Binary Subdivide (1)
>>>>>
>>>>> *1: Binary subdivide happens until either it reaches the conditions for
>>>>> a branch-table, or the number of targets is small enough that it falls
>>>>> back to a linear probe.
>>>>>
>>>>> ...
>>>> What matters is the distribution of indices, not the number of targets.
>>>> Think a typical lexer: you want to isolate all alphas with one compare,
>>>> not a four way binary to a particular alpha, because all the alphas go
>>>> to a single label. Once you do that, normal ifconversion will get rid of
>>>> half the internal branches of the decision tree.
>>> <
>>> In any event, any HW is going to be able to sort through the situation
>>> faster than any string of SW instructions.
>>>
>> Arguable.
>>
>> Label density matters for the tables too:
>> switch (<32 bit>) {
>> <cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>
>>
>> Do you really want 100k labels in your table, or a 3-deep compare tree?
> <
> That is considered "not dense", and gets composed from ::
> <
> if( value==11111)
> {}
> else if( value ==22222 )
> {}
> .....
>>
>> Out of curiosity, what happens when you throw that at Brian's compiler?
> <
> Don't know:: Brian ??
>
int foo(int x)
{
    switch (x) {
    case 11111: return 1;
    case 22222: return 2;
    case 33333: return 3;
    case 44444: return 4;
    case 55555: return 5;
    case 66666: return 6;
    case 77777: return 7;
    case 88888: return 8;
    case 99999: return 9;
    default: return 0;
    }
}

foo:                            ; @foo
        srl r2,r1,<32:0>
        cmp r1,r2,#55554
        ble r1,.LBB0_1
        cmp r1,r2,#77776
        ble r1,.LBB0_7
        cmp r1,r2,#77777
        beq r1,.LBB0_17
        cmp r1,r2,#88888
        beq r1,.LBB0_18
        cmp r1,r2,#99999
        bne r1,.LBB0_20
        mov r1,#9
.LBB0_21:
        ret
.LBB0_1:
        cmp r1,r2,#33332
        bgt r1,.LBB0_4
        mov r1,#1
        cmp r3,r2,#11111
        beq r3,.LBB0_21
        cmp r1,r2,#22222
        bne r1,.LBB0_20
        mov r1,#2
        ret
.LBB0_7:
        cmp r1,r2,#55555
        beq r1,.LBB0_15
        cmp r1,r2,#66666
        bne r1,.LBB0_20
        mov r1,#6
        ret
.LBB0_4:
        cmp r1,r2,#33333
        beq r1,.LBB0_13
        cmp r1,r2,#44444
        bne r1,.LBB0_20
        mov r1,#4
        ret
.LBB0_17:
        mov r1,#7
        ret
.LBB0_18:
        mov r1,#8
        ret
.LBB0_15:
        mov r1,#5
        ret
.LBB0_13:
        mov r1,#3
        ret
.LBB0_20:
        mov r1,#0
        ret

Re: sparse switch (was Minor idea for indirect target predictor)

<KFmDI.10362$835.8905@fx36.iad>

https://www.novabbs.com/devel/article-flat.php?id=18292&group=comp.arch#18292

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!feeder.usenetexpress.com!tr1.eu1.usenetexpress.com!nntp.speedium.network!feeder01!81.171.65.14.MISMATCH!peer02.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx36.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com> <sbg1u7$v63$1@dont-email.me> <504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com> <sbkoim$nbh$1@dont-email.me>
In-Reply-To: <sbkoim$nbh$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 129
Message-ID: <KFmDI.10362$835.8905@fx36.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Thu, 01 Jul 2021 17:10:02 UTC
Date: Thu, 01 Jul 2021 13:09:13 -0400
X-Received-Bytes: 3872

by: EricP - Thu, 1 Jul 2021 17:09 UTC

Brian G. Lucas wrote:
> On 6/29/21 6:49 PM, MitchAlsup wrote:
>> On Tuesday, June 29, 2021 at 4:05:13 PM UTC-5, Ivan Godard wrote:
>>>
>>> Label density matters for the tables too:
>>> switch (<32 bit>) {
>>> <cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>
>>>
>>> Do you really want 100k labels in your table, or a 3-deep compare tree?
>> <
>> That is considered "not dense", and gets composed from ::
>> <
>> if( value==11111)
>> {}
>> else if( value ==22222 )
>> {}
>> .....
>>>
>>> Out of curiosity, what happens when you throw that at Brian's compiler?
>> <
>> Don't know:: Brian ??
>>
> int foo(int x)
> {
> switch (x) {
> case 11111: return 1;
> case 22222: return 2;
> case 33333: return 3;
> case 44444: return 4;
> case 55555: return 5;
> case 66666: return 6;
> case 77777: return 7;
> case 88888: return 8;
> case 99999: return 9;
> default: return 0;
> }
> }
>
> foo: ; @foo
> srl r2,r1,<32:0>
> cmp r1,r2,#55554
> ble r1,.LBB0_1
> cmp r1,r2,#77776
> ble r1,.LBB0_7
> cmp r1,r2,#77777
> beq r1,.LBB0_17
> cmp r1,r2,#88888
> beq r1,.LBB0_18
> cmp r1,r2,#99999
> bne r1,.LBB0_20
> mov r1,#9
> .LBB0_21:
> ret
> .LBB0_1:
> cmp r1,r2,#33332
> bgt r1,.LBB0_4
> mov r1,#1
> cmp r3,r2,#11111
> beq r3,.LBB0_21
> cmp r1,r2,#22222
> bne r1,.LBB0_20
> mov r1,#2
> ret
> .LBB0_7:
> cmp r1,r2,#55555
> beq r1,.LBB0_15
> cmp r1,r2,#66666
> bne r1,.LBB0_20
> mov r1,#6
> ret
> .LBB0_4:
> cmp r1,r2,#33333
> beq r1,.LBB0_13
> cmp r1,r2,#44444
> bne r1,.LBB0_20
> mov r1,#4
> ret
> .LBB0_17:
> mov r1,#7
> ret
> .LBB0_18:
> mov r1,#8
> ret
> .LBB0_15:
> mov r1,#5
> ret
> .LBB0_13:
> mov r1,#3
> ret
> .LBB0_20:
> mov r1,#0
> ret
>

Doesn't the My66000 CMP instruction set multiple flags in r1?
If so, it would be more efficient to structure the binary search
around those flags and do a three-way split at each node:

cmp r1,r2,#55555
blt r1, L1
bgt r1, L2
mov r1,#5
ret
L1:
cmp r1,r2,#22222
blt r1,L3
bgt r1,L4
mov r1,#2
ret
L3:
cmp r1,r2,#11111
bne r1,Default
mov r1,#1
ret
....
....
Default:
mov r1,#0
ret

Most compilers won't do this because they don't really model
condition codes, so they don't realize that in

if (i < j) ...
else if (i > j) ...
else ...

the flags from the single comparison can be reused.

Re: sparse switch (was Minor idea for indirect target predictor)

<sbku26$tc4$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18293&group=comp.arch#18293

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-51bc-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Thu, 1 Jul 2021 17:29:42 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbku26$tc4$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>

by: Thomas Koenig - Thu, 1 Jul 2021 17:29 UTC

EricP <ThatWouldBeTelling@thevillage.com> wrote:

>> int foo(int x)
>> {
>> switch (x) {
>> case 11111: return 1;
>> case 22222: return 2;
>> case 33333: return 3;
>> case 44444: return 4;
>> case 55555: return 5;
>> case 66666: return 6;
>> case 77777: return 7;
>> case 88888: return 8;
>> case 99999: return 9;
>> default: return 0;
>> }
>> }

[...]

> Doesn't My66000 CMP instruction set multiple flags into r1?
> If so, it would be more efficient to decompose the binary search
> to use the flags to do a 3 way split:
>
> cmp r1,r2,#55555
> blt r1, L1
> bgt r1, L2
> mov r1,#5
> ret

[...]

> Most compilers won't do this because they don't
> really know about condition codes and don't realize that in
>
> if (i < j) ...
> else if (i > j) ...
> else
>
> the flag values can be reused.

Here is what gcc does on POWER:

0000000000000000 <foo>:
0: 00 00 20 39 li r9,0
4: 03 d9 29 61 ori r9,r9,55555
8: 00 48 03 7c cmpw r3,r9
c: d4 00 82 41 beq e0 <foo+0xe0>
10: 40 00 81 40 ble 50 <foo+0x50>
14: 01 00 40 3d lis r10,1
18: 08 00 20 39 li r9,8
1c: 38 5b 4a 61 ori r10,r10,23352
20: 00 50 03 7c cmpw r3,r10
24: 18 00 82 41 beq 3c <foo+0x3c>
28: 58 00 81 40 ble 80 <foo+0x80>
2c: fe ff 69 6c xoris r9,r3,65534
30: 9f 86 09 2c cmpwi r9,-31073
34: 9c 00 82 40 bne d0 <foo+0xd0>
38: 09 00 20 39 li r9,9
3c: b4 07 23 7d extsw r3,r9
40: 20 00 80 4e blr

[...]

So the handling of the constants does not look pretty, because the
fixed 32-bit instruction format only allows 16-bit immediates, but the
flag handling looks OK: a single cmpw feeds both the beq and the ble.

Re: sparse switch (was Minor idea for indirect target predictor)

<sbkvok$n8c$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18295&group=comp.arch#18295


From: bage...@gmail.com (Brian G. Lucas)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Thu, 1 Jul 2021 12:58:44 -0500
Organization: A noiseless patient Spider
Message-ID: <sbkvok$n8c$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
In-Reply-To: <KFmDI.10362$835.8905@fx36.iad>

by: Brian G. Lucas - Thu, 1 Jul 2021 17:58 UTC

On 7/1/21 12:09 PM, EricP wrote:
> Brian G. Lucas wrote:
>> On 6/29/21 6:49 PM, MitchAlsup wrote:
>>> On Tuesday, June 29, 2021 at 4:05:13 PM UTC-5, Ivan Godard wrote:
>>>>
>>>> Label density matters for the tables too:
>>>> switch (<32 bit>) {
>>>> <cases: 11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999>
>>>>
>>>> Do you really want 100k labels in your table, or a 3-deep compare tree?
>>> <
>>> That is considered "not dense", and gets composed from ::
>>> <
>>>            if( value==11111)
>>>            {}
>>>            else if( value ==22222 )
>>>            {}
>>> .....
>>>>
>>>> Out of curiosity, what happens when you throw that at Brian's compiler?
>>> <
>>> Don't know:: Brian ??
>>>
>> int foo(int x)
>> {
>>     switch (x) {
>>      case 11111: return 1;
>>      case 22222: return 2;
>>      case 33333: return 3;
>>      case 44444: return 4;
>>      case 55555: return 5;
>>      case 66666: return 6;
>>      case 77777: return 7;
>>      case 88888: return 8;
>>      case 99999: return 9;
>>      default: return 0;
>>     }
>> }
>>
>> foo:                                    ; @foo
>>     srl    r2,r1,<32:0>
>>     cmp    r1,r2,#55554
>>     ble    r1,.LBB0_1
>>     cmp    r1,r2,#77776
>>     ble    r1,.LBB0_7
>>     cmp    r1,r2,#77777
>>     beq    r1,.LBB0_17
>>     cmp    r1,r2,#88888
>>     beq    r1,.LBB0_18
>>     cmp    r1,r2,#99999
>>     bne    r1,.LBB0_20
>>     mov    r1,#9
>> .LBB0_21:
>>     ret
>> .LBB0_1:
>>     cmp    r1,r2,#33332
>>     bgt    r1,.LBB0_4
>>     mov    r1,#1
>>     cmp    r3,r2,#11111
>>     beq    r3,.LBB0_21
>>     cmp    r1,r2,#22222
>>     bne    r1,.LBB0_20
>>     mov    r1,#2
>>     ret
>> .LBB0_7:
>>     cmp    r1,r2,#55555
>>     beq    r1,.LBB0_15
>>     cmp    r1,r2,#66666
>>     bne    r1,.LBB0_20
>>     mov    r1,#6
>>     ret
>> .LBB0_4:
>>     cmp    r1,r2,#33333
>>     beq    r1,.LBB0_13
>>     cmp    r1,r2,#44444
>>     bne    r1,.LBB0_20
>>     mov    r1,#4
>>     ret
>> .LBB0_17:
>>     mov    r1,#7
>>     ret
>> .LBB0_18:
>>     mov    r1,#8
>>     ret
>> .LBB0_15:
>>     mov    r1,#5
>>     ret
>> .LBB0_13:
>>     mov    r1,#3
>>     ret
>> .LBB0_20:
>>     mov    r1,#0
>>     ret
>>
>
> Doesn't My66000 CMP instruction set multiple flags into r1?
> If so, it would be more efficient to decompose the binary search
> to use the flags to do a 3 way split:
>
> cmp r1,r2,#55555
> blt r1, L1
> bgt r1, L2
> mov r1,#5
> ret
> L1:
> cmp r1,r2,#22222
> blt r1,L3
> bgt r1,L4
> mov r1,#2
> ret
> L3:
> cmp r1,r2,#11111
> bne r1,Default
> mov r1,#1
> ret
> ...
> ...
> Default:
> mov r1,#0
> ret
>
> Most compilers won't do this because they don't
> really know about condition codes and don't realize that in
>
> if (i < j) ...
> else if (i > j) ...
> else
>
> the flag values can be reused.
>
Alas, the poor compiler writer hasn't figured out how to do that yet.
brian

Re: sparse switch (was Minor idea for indirect target predictor)

<sbl306$1v8$1@newsreader4.netcologne.de>


https://www.novabbs.com/devel/article-flat.php?id=18302&group=comp.arch#18302


From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Thu, 1 Jul 2021 18:53:58 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbl306$1v8$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
<sbkvok$n8c$1@dont-email.me>

by: Thomas Koenig - Thu, 1 Jul 2021 18:53 UTC

Brian G. Lucas <bagel99@gmail.com> wrote:
> On 7/1/21 12:09 PM, EricP wrote:

>> Doesn't My66000 CMP instruction set multiple flags into r1?
>> If so, it would be more efficient to decompose the binary search
>> to use the flags to do a 3 way split:

[...]

>> Most compilers won't do this because they don't
>> really know about condition codes and don't realize that in
>>
>> if (i < j) ...
>> else if (i > j) ...
>> else
>>
>> the flag values can be reused.
>>
> Alas, the poor compiler writer hasn't figured out how to do that yet.

That could be a limitation of LLVM: unlike gcc for POWER, for
which this works (see my recent post), clang for POWER does not
do it either https://godbolt.org/z/Pb1aYs6r6 .

Re: sparse switch (was Minor idea for indirect target predictor)

<53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18305&group=comp.arch#18305


Newsgroups: comp.arch
Date: Thu, 1 Jul 2021 13:03:23 -0700 (PDT)
In-Reply-To: <KFmDI.10362$835.8905@fx36.iad>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com> <sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me> <sbfs7g$o15$1@dont-email.me> <3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com> <sbg1u7$v63$1@dont-email.me> <504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com> <sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
Message-ID: <53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Thu, 1 Jul 2021 20:03 UTC

On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
<sigh>
> Most compilers won't do this because they don't
> really know about condition codes and don't realize that in
>
> if (i < j) ...
> else if (i > j) ...
> else
>
> the flag values can be reused.
<
Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........

{How much have we forgotten by making so much of an advance..........}
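For reference, FORTRAN's arithmetic IF names one expression and three statement labels and branches on whether the expression is negative, zero or positive. A minimal C emulation follows; arith_if() is a hypothetical name, and it simply reports which label would be taken.

```c
/* Emulates the FORTRAN arithmetic IF
 *       IF (X - Y) 10, 20, 30
 * which transfers control to statement 10, 20 or 30 when X - Y is
 * negative, zero or positive.  Returning the label number is only
 * for illustration. */
int arith_if(long x, long y)
{
    long d = x - y;            /* assumes x - y does not overflow */

    if (d < 0)
        goto label10;
    if (d == 0)
        goto label20;
    goto label30;

label10: return 10;
label20: return 20;
label30: return 30;
}
```

One computed value drives all three outcomes, which is why a single compare (or, on the 704, a test of the accumulator) suffices.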

Re: Minor idea for indirect target predictor

<sbl8hi$q3f$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18307&group=comp.arch#18307


From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Thu, 1 Jul 2021 15:26:05 -0500
Organization: A noiseless patient Spider
Message-ID: <sbl8hi$q3f$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me>
In-Reply-To: <sbjush$2gp$1@dont-email.me>

by: BGB - Thu, 1 Jul 2021 20:26 UTC

On 7/1/2021 3:37 AM, Marcus wrote:
> On 2021-06-30, BGB wrote:
>> On 6/30/2021 9:21 AM, David Brown wrote:
>>> On 30/06/2021 14:36, Thomas Koenig wrote:
>>>> Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
>>>>> Stephen Fuld wrote:
>>>>>> On 6/29/2021 12:27 PM, Ivan Godard wrote:
>>>>>> I am not a compiler guy, so forgive my ignorance. What would a
>>>>>> compiler do in such a situation? Would it first test for something
>>>>>> like if char >= "a" and <= "z", etc.?
>>>>>>
>>>>>> Just thinking about how I might do it, I think I would have a
>>>>>> 256-byte array, indexed by the character value, with each entry
>>>>>> having an indicator value, say 1 for lower case letters, 2 for
>>>>>> upper case letters, 3 for numbers, 4 for white space characters,
>>>>>> etc. Looking up the value shouldn't be costly, as most of the table
>>>>>> will shortly be in D-cache. Then use the indicator value to index a
>>>>>> jump or call table (which I guess is fairly predictable, as most
>>>>>> characters are lower case letters).
>>>>>>
>>>>>> But what do I know? :-(
>>>>>
>>>>> About the same as the standard C lib?
>>>>>
>>>>> I.e. the classic character classification macros use just such a table.
>>>>>
>>>>> As I noted previously, it tends to break down a bit with utf8, but you
>>>>> can at least handle all 7-bit ascii very quickly.
>>>
>>> It breaks down a /lot/ with UTF-8 - the traditional <ctype.h>
>>> classifications such as "isdigit" or "tolower" collapse pretty quickly
>>> in the face of non-ASCII character sets. Locales can help a bit, but
>>> you don't have to get very exotic before the answer is "it's
>>> complicated".
>>>
>>
>> If one is feeding raw byte or char values to these functions, and
>> using UTF-8 or similar, then as soon as one gets outside ASCII range
>> the answer is basically useless.
>>
>> Simple case is to ignore everything outside the ASCII range.
>>
>> Secondary option is to treat everything outside the ASCII range as if
>> it were Codepage-1252 or Latin-1.
>>
>>
>>
>> For wider characters, one would likely need some other functions, like
>> those from 'wctype.h', but then it depends on locale rather than "just
>> assume it is UTF-16 or something".
>>
>
> There's a clear difference between string encoding (ASCII, Latin-1,
> CP-1252, UTF-8, UTF-16, ...) and character encoding. The latter can
> usually be safely represented as a 32-bit integer representing a
> single Unicode code point.
>
> In my opinion, any function dealing with a single character at a time
> should use Unicode code points as input/output (rather than
> locale-dependent encodings, which are just a big source of bugs).
>

Some parts of the C library depend on the locale, for example the
multibyte string conversion functions, ...

Likewise, the "ctype.h" functions generally also assume locale-dependent behavior.
The simple case though is to just hard-wire these to raw ASCII or CP1252
or similar.

Or, say:
"C" / "POSIX" / "EN-US":
The 'ctype' functions assume plain ASCII;
Most other stuff can assume CP1252.
Other (non-UTF-8):
Assume CP1252.
UTF8 (".UTF8"/"EN-US.UTF8"/...):
Most functions assume UTF-8;
The ctype functions fall back to plain ASCII.
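Hard-wiring the ctype-style classification to plain ASCII, as in the list above, fits the 256-entry lookup table proposed in the quoted text. A minimal sketch, assuming an ASCII execution character set; the class numbers follow that proposal, and the names class_table, init_class_table and char_class are hypothetical. Bytes 0x80..0xFF are classified as "other" rather than interpreted as CP1252 or UTF-8.

```c
/* Character classes; the numbering follows the quoted proposal. */
enum cclass { C_OTHER = 0, C_LOWER = 1, C_UPPER = 2, C_DIGIT = 3, C_SPACE = 4 };

/* One entry per byte value. */
static unsigned char class_table[256];

/* Build the table once at startup.  Assumes an ASCII execution
 * character set; bytes 0x80..0xFF are left as C_OTHER. */
static void init_class_table(void)
{
    for (int c = 0; c < 256; c++) {
        if (c >= 'a' && c <= 'z')
            class_table[c] = C_LOWER;
        else if (c >= 'A' && c <= 'Z')
            class_table[c] = C_UPPER;
        else if (c >= '0' && c <= '9')
            class_table[c] = C_DIGIT;
        else if (c == ' ' || (c >= '\t' && c <= '\r'))
            class_table[c] = C_SPACE;   /* space, \t \n \v \f \r */
        else
            class_table[c] = C_OTHER;
    }
}

/* Classify with a single load; the cast keeps negative (signed char)
 * values from indexing outside the table. */
static int char_class(char ch)
{
    return class_table[(unsigned char)ch];
}
```

The class value could then index a jump or call table, as the quoted proposal suggests.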

I recently went looking into some of this, trying to figure out how
character encoding and locale were supposed to interact. Apparently,
rather than doing anything "sensible", compilers were mostly just
"winging it" back in the codepage days, and the closest interpretation
to "sensible" seems to be to treat the normal string type as a raw
collection of bytes with an implicit NUL terminator on the end.

Nowadays, though, we have gone from mostly codepage source files to
mostly UTF-8 source files, sometimes with a BOM, and some editors (such
as Visual Studio) write out UTF-16 files when used for editing (if any
non-ASCII characters are used).

The compiler can normalize everything to UTF-8 internally, but this
leaves some ambiguity about what happens with string literals, which is
what led to some of the funky handling I ended up with (just loosely
crushing everything down into byte range in the default case).

Alternately, one would need to set a "default codepage" for the compiler
to assume for its string conversions, though this does still leave some
ambiguity (and we still probably need to be able to deal with the "code
which likes to encode binary data in strings" issue, but could
potentially warn about conversions for any other codepoints which can't
be represented in the currently selected codepage). Though, it is likely
the selection would still be fairly limited, and more exotic encodings
would remain unlikely.

According to the MSVC documentation, for example, some multibyte
string functions depend on locale, whereas others are hard-wired to
assume UTF-8.

In my case, it ends up similar, though sometimes diverging in different
cases.

I ended up working some on the locale stuff, and have basically ended up
with two major locales: "C"/"POSIX"/"EN-US"/... just assume ASCII+CP1252
or similar, while "C.UTF8"/".UTF8"/"EN-US.UTF8"/... assume UTF-8.

The C library I am using ignores pretty much everything else about
locale settings, and I will probably keep up this pattern. For general
use by multi-language programs, the original C locale system seems a bit
broken.
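A hypothetical sketch of that two-locale dispatch: a locale name whose part after the last '.' spells "UTF8" (case-insensitive, hyphens ignored) selects UTF-8, and everything else falls back to ASCII+CP1252. The enum, the function name and the matching rule are assumptions for illustration, not the actual library code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical modes for the two locales described above. */
enum text_mode { MODE_ASCII_CP1252, MODE_UTF8 };

/* UTF-8 if the part after the last '.' spells "UTF8" (case-insensitive,
 * hyphens ignored, so ".utf-8" also matches); otherwise ASCII+CP1252. */
static enum text_mode pick_mode(const char *locale_name)
{
    const char *dot = strrchr(locale_name, '.');
    const char *want = "UTF8";

    if (dot == NULL)
        return MODE_ASCII_CP1252;         /* "C", "POSIX", "EN-US", ... */
    for (const char *s = dot + 1; *s != '\0'; s++) {
        if (*s == '-')
            continue;
        if (*want == '\0' || toupper((unsigned char)*s) != *want)
            return MODE_ASCII_CP1252;
        want++;
    }
    return *want == '\0' ? MODE_UTF8 : MODE_ASCII_CP1252;
}
```

Everything else about the locale name is ignored, mirroring the approach of ignoring most locale settings.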

> For instance in the GLFW text input API, all input events carry an
> unsigned int representing the Unicode code point for a single character:
>
> https://www.glfw.org/docs/latest/input_guide.html#input_char
>
> ...whereas functions that deal with text strings use UTF-8 encoding,
> like glfwSetWindowTitle():
>
> https://www.glfw.org/docs/latest/window_guide.html#window_title
>
> My personal preference for C/C++ applications is to use UTF-8 strings
> everywhere internally, and whenever an external API (e.g. the Win32 API)
> is used - do the necessary string translations in conjunction with the
> API calls. On POSIX systems you usually do not have to do any
> translations since most APIs accept UTF-8 strings.
>
> I successfully used this design philosophy for BuildCache, for instance.
>
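Keeping UTF-8 for strings and a 32-bit code point for single characters, as Marcus suggests, needs a decoder at the boundary. A minimal strict sketch in C; utf8_decode is a hypothetical helper, and its rejection of overlong forms, surrogates and values above U+10FFFF follows the UTF-8 well-formedness rules of the Unicode Standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes used, or 0 if the input is truncated or
 * malformed (bad lead or continuation byte, overlong form, surrogate,
 * or a value above U+10FFFF). */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t v, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                    /* ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; v = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; min = 0x10000; }
    else
        return 0;                         /* continuation byte or F8..FF */
    if (n < len)
        return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;
    *cp = v;
    return len;
}
```

A caller walks a string by advancing by the returned length, treating 0 as an error.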

As can be noted, my implementation of most filesystem API calls assume
UTF-8 by default.

One can't really make UTF-8 or similar the default encoding for string
literals though, mostly because there is a lot of software which tries
to do things like stick raw binary data into string literals, and will
break in catastrophic ways if this behavior is not preserved. This
includes the ability to encode strings with embedded NUL characters
without the literal being implicitly truncated by the compiler, ...

I suspect this may be part of why newer versions of C added things like:
char *s;
s = u8"Whatever\U0001F346\U0001F602"; //explicitly UTF-8

Though, one can debate whether it is better to encode these directly, or
split them up into surrogate pairs encoded via UTF-8, ...

Likewise:
wchar_t *ws;
ws=L"\U0001F346\U0001F602";
This could end up as two code units (UTF-32 wchar_t, as on Linux) or four (UTF-16 wchar_t, as on Windows).

Whereas char16_t/char32_t are more explicit:
char16_t *ws1;
char32_t *ws2;
ws1=u"\U0001F346\U0001F602"; //four
ws2=U"\U0001F346\U0001F602"; //two
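The unit counts in the comments above can be checked with sizeof on the literals themselves. A minimal C11 sketch (the enum constant names are made up); the wchar_t count depends on the platform: 2 where wchar_t is 32 bits, as on Linux, and 4 where it is 16 bits, as on Windows.

```c
#include <stddef.h>   /* wchar_t */
#include <uchar.h>    /* char16_t, char32_t (C11) */

/* Code units per literal, not counting the terminating NUL.  Both
 * characters lie outside the BMP: 4 UTF-8 bytes each, a UTF-16
 * surrogate pair each, and one UTF-32 unit each. */
enum {
    U8_UNITS    = sizeof(u8"\U0001F346\U0001F602") - 1,
    U16_UNITS   = sizeof(u"\U0001F346\U0001F602") / sizeof(char16_t) - 1,
    U32_UNITS   = sizeof(U"\U0001F346\U0001F602") / sizeof(char32_t) - 1,
    WCHAR_UNITS = sizeof(L"\U0001F346\U0001F602") / sizeof(wchar_t) - 1
};
```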

Looking into it (partly based on the MSVC/MSDN descriptions), it
appears that for "proper" C string semantics in plain (raw) string
literals, I may actually end up needing to double-encode them:
The string literal uses UTF-8 to encode byte values in the range 00..FF;
Codepoints from the C source (or encoded via '\uXXXX') will need to be
encoded as a series of bytes representing the UTF-8 codepoint, each of
those bytes itself encoded as above.
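One reading of that double encoding, as a minimal C sketch: every byte value 00..FF is stored as its UTF-8 form (the same mapping as Latin-1 to UTF-8), and a source code point is first turned into its UTF-8 bytes, each of which is then stored that way. The helper names are hypothetical, this is not the compiler's actual code, and only code points below U+10000 are handled. For example, U+00E9 has the UTF-8 bytes C3 A9, which are stored as C3 83 C2 A9.

```c
#include <stddef.h>

/* Store one byte value 0x00..0xFF as UTF-8 (same as Latin-1 -> UTF-8). */
static size_t byte_to_utf8(unsigned char b, unsigned char *out)
{
    if (b < 0x80) {
        out[0] = b;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (b >> 6));
    out[1] = (unsigned char)(0x80 | (b & 0x3F));
    return 2;
}

/* Double-encode a source code point cp < 0x10000 for a plain "..."
 * literal: take its UTF-8 bytes, then store each byte as above.
 * out must have room for 6 bytes. */
static size_t double_encode(unsigned cp, unsigned char *out)
{
    unsigned char u[3];
    size_t n, len = 0;

    if (cp < 0x80) {
        u[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        u[0] = (unsigned char)(0xC0 | (cp >> 6));
        u[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else {
        u[0] = (unsigned char)(0xE0 | (cp >> 12));
        u[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        u[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    }
    for (size_t i = 0; i < n; i++)
        len += byte_to_utf8(u[i], out + len);
    return len;
}
```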

It looks like 'L' and 'u' literals would still need to be encoded in a
more traditional M-UTF-8 (encoding 0000..FFFF directly, with
10000..10FFFF encoded as surrogate pairs, each encoded via UTF-8).
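The surrogate-pair form described for 'L' and 'u' literals is essentially what is usually called CESU-8 (Java's Modified UTF-8 is the same idea but also writes U+0000 as C0 80). A minimal C sketch with hypothetical names: U+1F346, for instance, becomes the surrogates D83C DF46 and then the six bytes ED A0 BC ED BD 86.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit value (a BMP code point or a surrogate half)
 * as 1, 2 or 3 UTF-8-style bytes. */
static size_t put_unit(uint32_t u, unsigned char *out)
{
    if (u < 0x80) {
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* 0000..FFFF are written directly; 10000..10FFFF are split into a
 * UTF-16 surrogate pair and each half is written as 3 bytes (6 total).
 * out must have room for 6 bytes. */
static size_t encode_cesu8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x10000)
        return put_unit(cp, out);
    cp -= 0x10000;
    size_t n = put_unit(0xD800 | (cp >> 10), out);
    return n + put_unit(0xDC00 | (cp & 0x3FF), out + n);
}
```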

Likewise, 'U' literals would represent the whole codepoint range
directly via UTF-8 codepoints. Similarly, the valid range for '\x'
character escapes also seems to depend on the string literal type.


Re: sparse switch (was Minor idea for indirect target predictor)

<sbma1j$r7o$1@newsreader4.netcologne.de>


https://www.novabbs.com/devel/article-flat.php?id=18319&group=comp.arch#18319


From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Fri, 2 Jul 2021 06:00:19 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbma1j$r7o$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
<53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>

by: Thomas Koenig - Fri, 2 Jul 2021 06:00 UTC

MitchAlsup <MitchAlsup@aol.com> wrote:
> On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
><sigh>
>> Most compilers won't do this because they don't
>> really know about condition codes and don't realize that in
>>
>> if (i < j) ...
>> else if (i > j) ...
>> else
>>
>> the flag values can be reused.
><
> Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........

Of course, there it was in the primary language already,
so it just needed a straightforward translation.

> {How much have we forgotten by making so much of an advance..........}

This is one place where the transformation to SSA, helpful as it
is, can have its drawbacks. You need your backend to recognize
the pattern and emit the corresponding code.

gcc gets this right, at least on POWER and x86_64; LLVM doesn't,
so you will just have to wait until this gets fixed in LLVM.

Re: sparse switch (was Minor idea for indirect target predictor)

<sbmpg7$1hav$1@gioia.aioe.org>


https://www.novabbs.com/devel/article-flat.php?id=18320&group=comp.arch#18320


From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Fri, 2 Jul 2021 12:24:07 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sbmpg7$1hav$1@gioia.aioe.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me>
<3d9264ba-2014-4f33-811c-36b89f103855n@googlegroups.com>
<sbg1u7$v63$1@dont-email.me>
<504cb750-7e93-4def-9e6c-0d40c2e7832dn@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
<53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>

by: Terje Mathisen - Fri, 2 Jul 2021 10:24 UTC

MitchAlsup wrote:
> On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
> <sigh>
>> Most compilers won't do this because they don't
>> really know about condition codes and don't realize that in
>>
>> if (i < j) ...
>> else if (i > j) ...
>> else
>>
>> the flag values can be reused.
> <
> Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........
>
> {How much have we forgotten by making so much of an advance..........}
>
It is also how you want your comparison operator to work, i.e. <-1,0,1>
or just <negative,zero,positive>

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
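In C, the <-1,0,1> form Terje mentions is conventionally written as the difference of two comparisons, which avoids the overflow that returning a - b can hit; qsort() comparators use the same negative/zero/positive contract. A small sketch with hypothetical names cmp3 and cmp_int:

```c
#include <stdlib.h>

/* Returns -1, 0 or +1 for a < b, a == b, a > b.  Unlike a - b,
 * this cannot overflow. */
static int cmp3(int a, int b)
{
    return (a > b) - (a < b);
}

/* qsort() comparator built on cmp3(). */
static int cmp_int(const void *pa, const void *pb)
{
    return cmp3(*(const int *)pa, *(const int *)pb);
}
```

Usage: qsort(v, n, sizeof v[0], cmp_int) sorts an int array ascending.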

Re: sparse switch (was Minor idea for indirect target predictor)

<sbo29n$31j0$1@gal.iecc.com>


https://www.novabbs.com/devel/article-flat.php?id=18330&group=comp.arch#18330


From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Fri, 2 Jul 2021 22:00:23 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <sbo29n$31j0$1@gal.iecc.com>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad> <53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>
In-Reply-To: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com> <sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad> <53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>
Cleverness: some
Originator: johnl@iecc.com (John Levine)

by: John Levine - Fri, 2 Jul 2021 22:00 UTC

According to MitchAlsup <MitchAlsup@aol.com>:
>On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
><sigh>
>> Most compilers won't do this because they don't
>> really know about condition codes and don't realize that in
>>
>> if (i < j) ...
>> else if (i > j) ...
>> else
>>
>> the flag values can be reused.
><
>Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........

Not really, since the 704 didn't have condition codes. It did have an accumulator
and instructions to branch on the AC being greater, less, or equal to zero.

I recall seeing case statements turned into branch trees a long time ago but
it was a high level special case. If the cases were dense it did an indexed
branch, or if there were just a few it did a sequence of tests, otherwise
it made a branch tree.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
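A minimal sketch of the policy John Levine recalls (dense: indexed branch; few: sequence of tests; otherwise: branch tree). The thresholds and names are hypothetical; real compilers tune such heuristics in their own ways. Under these particular thresholds, the 11111..99999 example from this thread gets a branch tree.

```c
#include <stddef.h>

enum lowering { LOWER_JUMP_TABLE, LOWER_LINEAR, LOWER_TREE };

/* Hypothetical thresholds, chosen only for illustration. */
#define MIN_DENSITY_PCT 40  /* % of table slots that must hold a case */
#define FEW_CASES        4  /* at most this many: compare one by one  */

/* Pick a lowering for n >= 1 case values v[], sorted ascending,
 * in the order described above: dense -> indexed branch,
 * few -> sequence of tests, otherwise -> branch tree. */
static enum lowering choose_lowering(const int *v, size_t n)
{
    long long span = (long long)v[n - 1] - v[0] + 1;  /* table slots */

    if ((long long)n * 100 >= span * MIN_DENSITY_PCT)
        return LOWER_JUMP_TABLE;
    if (n <= FEW_CASES)
        return LOWER_LINEAR;
    return LOWER_TREE;
}
```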

Re: sparse switch (was Minor idea for indirect target predictor)

<c352b592-9a42-4661-b423-2bd56151a842n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18332&group=comp.arch#18332


Newsgroups: comp.arch
Date: Fri, 2 Jul 2021 16:29:50 -0700 (PDT)
In-Reply-To: <sbo29n$31j0$1@gal.iecc.com>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
<53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com> <sbo29n$31j0$1@gal.iecc.com>
Message-ID: <c352b592-9a42-4661-b423-2bd56151a842n@googlegroups.com>
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
From: MitchAl...@aol.com (MitchAlsup)

by: MitchAlsup - Fri, 2 Jul 2021 23:29 UTC

On Friday, July 2, 2021 at 5:00:25 PM UTC-5, John Levine wrote:
> According to MitchAlsup <Mitch...@aol.com>:
> >On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
> ><sigh>
> >> Most compilers won't do this because they don't
> >> really know about condition codes and don't realize that in
> >>
> >> if (i < j) ...
> >> else if (i > j) ...
> >> else
> >>
> >> the flag values can be reused.
> ><
> >Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........
<
> Not really, since the 704 didn't have condition codes. It did have an accumulator
> and instructions to branch on the AC being greater, less, or equal to zero.
<
What you describe of 704 is exactly what FORTRAN desired. So, yes, really !
>
> I recall seeing case statements turned into branch trees a long time ago but
> it was a high level special case. If the cases were dense it did an indexed
> branch, or if there were just a few it did a sequence of tests, otherwise
> it made a branch tree.

Re: Minor idea for indirect target predictor

<sbp4me$ruh$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=18341&group=comp.arch#18341


From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sat, 3 Jul 2021 09:47:25 +0200
Organization: A noiseless patient Spider
Message-ID: <sbp4me$ruh$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
In-Reply-To: <sbl8hi$q3f$1@dont-email.me>
Content-Language: en-US

by: Marcus - Sat, 3 Jul 2021 07:47 UTC

On 2021-07-01, BGB wrote:
> On 7/1/2021 3:37 AM, Marcus wrote:

[snip]

>>
>> There's a clear difference between string encoding (ASCII, Latin-1,
>> CP-1252, UTF-8, UTF-16, ...) and character encoding. The latter can
>> usually be safely represented as a 32-bit integer representing a
>> single Unicode code point.
>>
>> In my opinion, any function dealing with a single character at a time
>> should use Unicode code points as input/output (rather than
>> locale-dependent encodings, which are just a big source of bugs).
>>
>
> The behavior of some parts of the C library assumes a locale, for example
> the multibyte string conversion functions, ...
>
> Likewise, "ctype.h" generally also assumes locale-dependent behavior.
> The simple case though is to just hard-wire these to raw ASCII or CP1252
> or similar.

I realize that, but in the context of instruction set architectures
(which is what I'm thinking about here) I find it fairly uninteresting
to optimize for legacy C string libraries. True, they are used all over,
but OTOH they are quickly falling out of fashion. Any application that
cares about performance and/or internationalization will avoid most of
the standard C library string functions.

More interesting (IMO) is to optimize for modern languages and string
types, and possibly things like ICU (http://site.icu-project.org/).

/Marcus

Re: sparse switch (was Minor idea for indirect target predictor)

<sbp8mn$pp2$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18342&group=comp.arch#18342

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: sparse switch (was Minor idea for indirect target predictor)
Date: Sat, 3 Jul 2021 08:55:51 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbp8mn$pp2$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<sbkoim$nbh$1@dont-email.me> <KFmDI.10362$835.8905@fx36.iad>
<53a2e7dc-af76-4b75-9507-d0ac991a0d0fn@googlegroups.com>
<sbo29n$31j0$1@gal.iecc.com>
<c352b592-9a42-4661-b423-2bd56151a842n@googlegroups.com>
Injection-Date: Sat, 3 Jul 2021 08:55:51 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:30c8:0:7285:c2ff:fe6c:992d";
logging-data="26402"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sat, 3 Jul 2021 08:55 UTC

MitchAlsup <MitchAlsup@aol.com> schrieb:
> On Friday, July 2, 2021 at 5:00:25 PM UTC-5, John Levine wrote:
>> According to MitchAlsup <Mitch...@aol.com>:
>> >On Thursday, July 1, 2021 at 12:10:05 PM UTC-5, EricP wrote:
>> ><sigh>
>> >> Most compilers won't do this because they don't
>> >> really know about condition codes and don't realize that in
>> >>
>> >> if (i < j) ...
>> >> else if (i > j) ...
>> >> else
>> >>
>> >> the flag values can be reused.
>> ><
>> >Which, by the way, is how FORTRAN II's if statement worked ..........sigh..........
><
>> Not really, since the 704 didn't have condition codes. It did have an accumulator
>> and instructions to branch on the AC being greater, less, or equal to zero.
><
> What you describe of 704 is exactly what FORTRAN desired. So, yes, really !

In this case, the high-level language came later, so chances are
that Backus' team modeled the arithmetic IF on the capabilities
of the hardware.

Remember, they were making it up as they went along, and in their
minds, FORTRAN was something IBM 704-specific. Little did they
know what they created...

The machine influenced the language in other ways, for example the
six-letter variable names (to fit six 6-bit characters into a 36-bit
word) or the 72-character effective line length, because the I/O used
the Accumulator and the MQ register to read in a row of holes from a
punched card.

Re: Minor idea for indirect target predictor

<sbqrr5$880$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18352&group=comp.arch#18352

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sat, 3 Jul 2021 18:26:07 -0500
Organization: A noiseless patient Spider
Lines: 124
Message-ID: <sbqrr5$880$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 3 Jul 2021 23:28:37 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="70a49ea9e7c621ca917ccf00da97be9e";
logging-data="8448"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//vAvyFQ4JjpdB2mi6r4W7"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:YPxBGNY9S/HxFF35yzv6tjuzQGQ=
In-Reply-To: <sbp4me$ruh$1@dont-email.me>
Content-Language: en-US

by: BGB - Sat, 3 Jul 2021 23:26 UTC

On 7/3/2021 2:47 AM, Marcus wrote:
> On 2021-07-01, BGB wrote:
>> On 7/1/2021 3:37 AM, Marcus wrote:
>
> [snip]
>
>>>
>>> There's a clear difference between string encoding (ASCII, Latin-1,
>>> CP-1252, UTF-8, UTF-16, ...) and character encoding. The latter can
>>> usually be safely represented as a 32-bit integer representing a
>>> single Unicode code point.
>>>
>>> In my opinion, any function dealing with a single character at a time
>>> should use Unicode code points as input/output (rather than
>>> locale-dependent encodings, which are just a big source of bugs).
>>>
>>
>> The behavior of some parts of the C library assumes a locale, for example
>> the multibyte string conversion functions, ...
>>
>> Likewise, "ctype.h" generally also assumes locale-dependent behavior.
>> The simple case though is to just hard-wire these to raw ASCII or
>> CP1252 or similar.
>
> I realize that, but in the context of instruction set architectures
> (which is what I'm thinking about here) I find it fairly uninteresting
> to optimize for legacy C string libraries. True, they are used all over,
> but OTOH they are quickly falling out of fashion. Any application that
> cares about performance and/or internationalization will avoid most of
> the standard C library string functions.
>

Granted, but the CPU itself doesn't need to care much about character
encoding in this case; the issue is more about getting the correct
behavior from the C compiler and standard library.

Though, on this front, I recently realized that I had misunderstood a
few things about how C strings are supposed to behave regarding
encoding and locale, so I made an effort to fix it.

By default, C implementations tend to use ASCII as the default encoding.

Similarly, CP-1252 is basically the default character encoding for
Windows and MSVC, and is "nearly identical" to 8859-1 / Latin-1, just
with most of the C1 control characters (80..9F) replaced with printable
characters, several of which (such as the euro sign) also appear in
8859-15 (Latin-9).

While one could argue for using UTF-8 as the default character encoding
instead (as "making sense"), this is not ideal in terms of running
legacy code which may assume 8859-1 or 1252 or similar.

It is possible to override this using '--locale=utf8' or '#pragma
setlocale("utf8")' or similar though.

Though, the "string semantics fix" does have the funky effect that
narrow strings using the UTF-8 encoding are now effectively
double-encoded (since the UTF-8 encoding is also being used to encode
bytes 00..FF).

If it were too big of an issue though, it could be possible to use a
hack to "re-encode" UTF-8 back into UTF-8 in a way which still preserved
the expected C semantics (and could transparently deal with arbitrary
character encodings and byte sequences), but with slightly reduced
expansion vs the current scheme.

Say:
0000..007F: ASCII Range (1x)
0080..00FF: Byte (2x)
0100..06FF: UTF-8, Fall-Through (1x)
0700..077F: Byte Pair (ASCII + Zero Suffix, *1, 1x)
0780..07FF: UTF-8, Latin-1, Remapped from 0080..00FF (1x)
0800..7FFF: UTF-8, Fall-Through (1x)
8000..FFFF: Byte-Pair (1.5x)

....

*1: This could potentially be useful for strings of the form:
"F\0o\0o\0 \0B\0a\0r\0\r\0\n\0\0\0"
That is, basically, the legacy Win32 way of expressing UTF-16 or UCS-2 strings.
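The expansion factors in the table follow from standard UTF-8 lengths: 1 byte for 0000..007F, 2 bytes for 0080..07FF, and 3 bytes for 0800..FFFF. The sketch below classifies a code point by the proposed ranges and reports its UTF-8 length and the raw payload each code point carries. The enum and function names are hypothetical. Length divided by payload gives the listed factors, e.g. 2/1 = 2x for an escaped byte and 3/2 = 1.5x for a byte pair. The post does not say how a pair is packed into 8000..FFFF, so the sketch does not attempt the packing.

```c
#include <stdint.h>

/* Range categories from the proposed table (names are hypothetical). */
typedef enum {
    R_ASCII,         /* 0000..007F: ASCII, unescaped                */
    R_BYTE,          /* 0080..00FF: one escaped raw byte            */
    R_UTF8_PASS,     /* 0100..06FF, 0800..7FFF: ordinary characters */
    R_ASCII_ZERO,    /* 0700..077F: ASCII byte followed by 00 (*1)  */
    R_LATIN1_REMAP,  /* 0780..07FF: real U+0080..U+00FF, moved here */
    R_BYTE_PAIR,     /* 8000..FFFF: a pair of raw bytes             */
    R_OUTSIDE        /* above FFFF: not covered by the table        */
} range_kind;

range_kind classify(uint32_t cp)
{
    if (cp <= 0x007F) return R_ASCII;
    if (cp <= 0x00FF) return R_BYTE;
    if (cp <= 0x06FF) return R_UTF8_PASS;
    if (cp <= 0x077F) return R_ASCII_ZERO;
    if (cp <= 0x07FF) return R_LATIN1_REMAP;
    if (cp <= 0x7FFF) return R_UTF8_PASS;
    if (cp <= 0xFFFF) return R_BYTE_PAIR;
    return R_OUTSIDE;
}

/* Bytes that standard UTF-8 spends on one code point. */
int utf8_len(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Raw payload bytes carried by one code point; 0 for ranges that
   carry ordinary text rather than raw bytes. */
int payload_bytes(range_kind k)
{
    switch (k) {
    case R_ASCII:      return 1;   /* 1/1 = 1x   */
    case R_BYTE:       return 1;   /* 2/1 = 2x   */
    case R_ASCII_ZERO: return 2;   /* 2/2 = 1x   */
    case R_BYTE_PAIR:  return 2;   /* 3/2 = 1.5x */
    default:           return 0;
    }
}
```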

> More interesting (IMO) is to optimize for modern languages and string
> types, and possibly things like ICU (http://site.icu-project.org/).
>

FWIW: BGBCC does primarily use UTF-8 and Unicode internally.

A "sibling language" (sorta) to C in this project is BS2 (BGBScript2),
which is essentially: "What if a Java- or C#-like language were built
on top of a C-like backend?"

It handles strings a little differently than C: it uses a dedicated
"string" type, and treats strings mostly as opaque values. Though, as in
C, the "string" type is still represented using a bare pointer.

This is built around the observation that it is generally cheaper to
intern strings by default and treat them as immutable than to use
dynamically managed storage for them (99% of strings are effectively
constants).

Strings in BS2 come in one of 3 major encodings:
* ASCII / CP-1252;
* UTF-16;
* UTF-8.

Unlike C though, the default "char" type is 16 bits, and the C 'char'
type is mostly replaced with byte/sbyte/ubyte.

The difference between string types is mostly hidden at the language
level, and the language pretends as if the strings were UTF-16 (if one
tries to access characters within the string).

The representation of BS2 strings is effectively a pointer to the first
character (like in C), but with a few preceding bytes encoding a length
and encoding/type tag.
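Here is a sketch of the "pointer to the first character, with metadata just before it" layout described above. The header used here is an assumption for illustration: a fixed 32-bit byte length plus a one-byte encoding tag. The post only says "a few preceding bytes", and the actual BS2 byte format is not reproduced. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the characters. */
typedef struct {
    uint32_t length;  /* length in bytes, excluding the terminator   */
    uint8_t  tag;     /* encoding tag, e.g. 0=CP-1252 1=UTF-16 2=UTF-8 */
} str_hdr;

/* Allocate a string and return a pointer to its first character, so it
   can still be handed to C code expecting a NUL-terminated pointer. */
char *str_new(const char *bytes, uint32_t len, uint8_t tag)
{
    unsigned char *block = malloc(sizeof(str_hdr) + len + 1);
    if (!block) return NULL;
    str_hdr h = { len, tag };
    memcpy(block, &h, sizeof h);            /* header first */
    memcpy(block + sizeof h, bytes, len);   /* then the characters */
    block[sizeof h + len] = '\0';
    return (char *)(block + sizeof h);
}

/* Read the header at a negative offset from the string pointer;
   memcpy avoids any misaligned access. */
static str_hdr str_header(const char *s)
{
    str_hdr h;
    memcpy(&h, s - sizeof h, sizeof h);
    return h;
}

uint32_t str_length(const char *s) { return str_header(s).length; }
uint8_t  str_tag(const char *s)    { return str_header(s).tag; }

void str_free(char *s) { free(s - sizeof(str_hdr)); }
```

Length lookups become O(1) without changing how the pointer itself looks to C code.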

Re: Minor idea for indirect target predictor

<sbrv45$d0s$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18355&group=comp.arch#18355

Path: i2pn2.org!i2pn.org!aioe.org!nGb8eeSGMNR1Kqz1vCOYug.user.gioia.aioe.org.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 4 Jul 2021 11:30:45 +0200
Organization: Aioe.org NNTP Server
Lines: 44
Message-ID: <sbrv45$d0s$1@gioia.aioe.org>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me> <sbqrr5$880$1@dont-email.me>
NNTP-Posting-Host: nGb8eeSGMNR1Kqz1vCOYug.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Sun, 4 Jul 2021 09:30 UTC

BGB wrote:
> By default, C implementations tend to use ASCII as the default encoding.
>
> Similarly, CP-1252 is basically the default character encoding for
> Windows and MSVC, and is "nearly identical" to 8859-1 / Latin-1, just
> with most of the C1 control characters (80..9F) replaced with printable
> characters, several of which (such as the euro sign) also appear in
> 8859-15 (Latin-9).
>
> While one could argue for using UTF-8 as the default character encoding
> instead (as "making sense"), this is not ideal in terms of running
> legacy code which may assume 8859-1 or 1252 or similar.
>
> It is possible to override this using '--locale=utf8' or '#pragma
> setlocale("utf8")' or similar though.
>
> Though, the "string semantics fix" does have the funky effect that
> narrow strings using the UTF-8 encoding are now effectively
> double-encoded (since the UTF-8 encoding is also being used to encode
> bytes 00..FF).
>
> If it were too big of an issue though, it could be possible to use a
> hack to "re-encode" UTF-8 back into UTF-8 in a way which still preserved
> the expected C semantics (and could transparently deal with arbitrary
> character encodings and byte sequences), but with slightly reduced
> expansion vs the current scheme.
>
> Say:
> 0000..007F: ASCII Range (1x)
> 0080..00FF: Byte (2x)
> 0100..06FF: UTF-8, Fall-Through (1x)
> 0700..077F: Byte Pair (ASCII + Zero Suffix, *1, 1x)
> 0780..07FF: UTF-8, Latin-1, Remapped from 0080..00FF (1x)
> 0800..7FFF: UTF-8, Fall-Through (1x)
> 8000..FFFF: Byte-Pair (1.5x)

I am almost certain any hack like this is explicitly banned by the utf8
rules!

I.e. an implementation _shall not_ generate non-canonical encodings, right?
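Terje's point matches RFC 3629. Overlong forms (such as `C0 80` for NUL) and encoded surrogates (U+D800..U+DFFF) are ill-formed UTF-8, and a strict decoder must reject them. Below is a minimal C sketch of such a validator (the function name is hypothetical). Internal schemes like the one quoted above would fail it, which only matters if that data ever leaves the compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Strict UTF-8 check: rejects overlong forms (e.g. C0 80), encoded
   surrogates, values above U+10FFFF, and malformed/truncated input. */
bool utf8_is_strict(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        uint32_t cp, min;
        if (b < 0x80) { i++; continue; }          /* plain ASCII */
        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;      /* stray continuation byte, or F8..FF */
        if (n - i < len) return false;            /* truncated */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                      /* overlong */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        if (cp > 0x10FFFF) return false;                 /* out of range */
        i += len;
    }
    return true;
}
```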

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Minor idea for indirect target predictor

<sbsqiq$2mv$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18359&group=comp.arch#18359

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 4 Jul 2021 12:16:52 -0500
Organization: A noiseless patient Spider
Lines: 129
Message-ID: <sbsqiq$2mv$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me> <sbqrr5$880$1@dont-email.me>
<sbrv45$d0s$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 4 Jul 2021 17:19:23 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="70a49ea9e7c621ca917ccf00da97be9e";
logging-data="2783"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+vyk3W49Cox1NFf+HizBrE"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:61Kejm9NFpiOd6RBNE7QcVrw4Wk=
In-Reply-To: <sbrv45$d0s$1@gioia.aioe.org>
Content-Language: en-US

by: BGB - Sun, 4 Jul 2021 17:16 UTC

On 7/4/2021 4:30 AM, Terje Mathisen wrote:
> BGB wrote:
>> By default, C implementations tend to use ASCII as the default encoding.
>>
>> Similarly, CP-1252 is basically the default character encoding for
>> Windows and MSVC, and is "nearly identical" to 8859-1 / Latin-1, just
>> with most of the C1 control characters (80..9F) replaced with printable
>> characters, several of which (such as the euro sign) also appear in
>> 8859-15 (Latin-9).
>>
>> While one could argue for using UTF-8 as the default character
>> encoding instead (as "making sense"), this is not ideal in terms of
>> running legacy code which may assume 8859-1 or 1252 or similar.
>>
>> It is possible to override this using '--locale=utf8' or '#pragma
>> setlocale("utf8")' or similar though.
>>
>> Though, the "string semantics fix" does have the funky effect that
>> narrow strings using the UTF-8 encoding are now effectively
>> double-encoded (since the UTF-8 encoding is also being used to encode
>> bytes 00..FF).
>>
>> If it were too big of an issue though, it could be possible to use a
>> hack to "re-encode" UTF-8 back into UTF-8 in a way which still
>> preserved the expected C semantics (and could transparently deal with
>> arbitrary character encodings and byte sequences), but with slightly
>> reduced expansion vs the current scheme.
>>
>> Say:
>>    0000..007F: ASCII Range (1x)
>>    0080..00FF: Byte (2x)
>>    0100..06FF: UTF-8, Fall-Through (1x)
>>    0700..077F: Byte Pair (ASCII + Zero Suffix, *1, 1x)
>>    0780..07FF: UTF-8, Latin-1, Remapped from 0080..00FF (1x)
>>    0800..7FFF: UTF-8, Fall-Through (1x)
>>    8000..FFFF: Byte-Pair (1.5x)
>
> I am almost certain any hack like this is explicitly banned by the utf8
> rules!
>
> I.e. an implementation _shall not_ generate non-canonical encodings, right?
>

Probably, if used for text interchange, but in this case it is mostly
kept internal to the compiler (though may also appear in RIL3 IR /
bytecode), and the main alternative would be changing how string
literals are managed by the compiler (IOW, not treating them like ASCIZ
strings; but instead using either Pascal-style strings or length-limited
arrays or similar).

In the IR stage, strings may also be used for passing binary data blobs,
blobs of inline ASM, ...

Though, one could question how much storage size matters in an IR
format, but anyway.

Either way, it was already non-canonical due to things like overlong
encodings ('C0 80' to encode embedded NUL bytes and similar), and the
potential encoding of non-BMP characters as surrogate pairs, with each
half encoded as UTF-8.

Both of these had precedent in the Java VM and .NET CLR and similar.
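The Java precedent is the "modified UTF-8" used by `java.io.DataOutput.writeUTF` and JNI. NUL is written as the overlong pair `C0 80`, and characters above U+FFFF are written as a UTF-16 surrogate pair, with each half encoded as 3 bytes. Here is a sketch of that encoder in C. The function names are hypothetical, not taken from BGBCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit unit using 1-3 UTF-8 style bytes. U+0000 is forced
   into the 2-byte form, giving the overlong C0 80. */
static size_t put_unit(uint32_t u, unsigned char *out)
{
    if (u != 0 && u < 0x80) { out[0] = (unsigned char)u; return 1; }
    if (u < 0x800) {
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Encode one code point in modified UTF-8. Code points above U+FFFF
   are split into a surrogate pair first. out needs up to 6 bytes.
   Returns the number of bytes written. */
size_t mutf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp <= 0xFFFF)
        return put_unit(cp, out);
    cp -= 0x10000;
    size_t n = put_unit(0xD800 + (cp >> 10), out);      /* high half */
    return n + put_unit(0xDC00 + (cp & 0x3FF), out + n); /* low half */
}
```

For U+1F600 this yields `ED A0 BD ED B8 80` (6 bytes), where standard UTF-8 would use `F0 9F 98 80`.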

Typically the string is converted into its actual target encoding (plain
UTF-8, CP1252, ...) once the final binary is generated.

Previously, I had used a scheme where strings were stored in a more
generic M-UTF-8 variant and then converted to the target encoding in the
backend. Looking around, though, I realized this was inconsistent with C
semantics in a few areas and led to some encoding ambiguities. Getting
the specified semantics more or less requires dealing with the
text-encoding conversions in the tokenizer / lexer.

Though, within the generated binaries, there is still the possible
funkiness that, for BS2 string literals, the compiler generates the
literal preceded by backwards-encoded code points for length prefixes
and similar.

I don't think that UTF-8 encoding restrictions necessarily apply much
to things which fall too far outside of using it for normal text
interchange.

Granted, I may have partly based some of this on the MSDN docs as a
reference; otherwise I would need to figure it out from the C standard,
but the C standard seems to take a stance more like "text encodings are
a thing that exist" and doesn't expand much on text encoding semantics
beyond this.

As-is, the other string literal formats (L, u, U, and u8), still use
plain M-UTF-8 though, since in this case these semantics don't lead to
ambiguity.

For L and u strings:
* \xXXXX, limited to 4 digits;
* \uXXXX, same as usual.
* \UXXXXXXXX, produces a surrogate pair.
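For the 16-bit L and u literals, a \U escape above U+FFFF has to become two code units. Below is a sketch of that step. It assumes the escape value has already been parsed into an integer. The function name, and the choice to reject surrogates and values above U+10FFFF, are assumptions rather than BGBCC's documented behavior.

```c
#include <stdint.h>

/* Emit a parsed \UXXXXXXXX value into a 16-bit string literal.
   Returns the number of code units written (1 or 2), or 0 if the value
   is not a Unicode scalar value (the caller would issue a diagnostic). */
int utf16_from_escape(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
    return 2;
}
```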

And, for U and u8:
* \xXXXXXXXX, limited to 8 digits;
* \uXXXX, same as usual.
* \UXXXXXXXX, encoded directly.

As for why \U uses 8 digits when the Unicode code space only needs 6,
dunno.

Whereas, for narrow strings (no prefix):
* \xXX, limited to 2 digits;
* \uXXXX, encoded as a character in the current locale:
    For CP-1252, it is mapped to a character byte.
    If that fails, it is mapped to 8F and a warning may be issued.
    The locale may be specified manually via the setlocale pragma,
    or the default may be modified via a command-line option.

My interpretation of u8 strings does differ slightly from MSDN's
description, given I left them essentially as plain UTF-8 strings,
whereas the online description implied they would behave more like
narrow strings with the locale set to UTF-8 for that particular literal.
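Here is a sketch of the narrow-string \uXXXX path for the CP-1252 locale. It uses the standard CP-1252 table for bytes 80..9F and falls back to 8F (one of CP-1252's undefined positions) when no byte matches. The function name is hypothetical, and treating U+0080..U+009F as unmappable is a simplifying assumption.

```c
#include <stdint.h>

/* Unicode code points for CP-1252 bytes 80..9F; 0 marks the five
   positions (81, 8D, 8F, 90, 9D) that CP-1252 leaves undefined. */
static const uint16_t cp1252_hi[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map a \uXXXX code point to a CP-1252 byte. On failure, sets *ok to 0
   (so the caller can warn) and returns the fallback byte 0x8F. */
unsigned char to_cp1252(uint32_t cp, int *ok)
{
    *ok = 1;
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return (unsigned char)cp;       /* same as Latin-1 here */
    for (int i = 0; i < 32; i++)        /* zero entries never match: cp >= 0x80 */
        if (cp1252_hi[i] == cp)
            return (unsigned char)(0x80 + i);
    *ok = 0;
    return 0x8F;
}
```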

....

Re: Minor idea for indirect target predictor

<sbsrmv$7rm$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18360&group=comp.arch#18360

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 4 Jul 2021 17:38:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbsrmv$7rm$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me> <sbqrr5$880$1@dont-email.me>
<sbrv45$d0s$1@gioia.aioe.org> <sbsqiq$2mv$1@dont-email.me>
Injection-Date: Sun, 4 Jul 2021 17:38:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:30c8:0:7285:c2ff:fe6c:992d";
logging-data="8054"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 4 Jul 2021 17:38 UTC

BGB <cr88192@gmail.com> schrieb:

> Probably, if used for text interchange, but in this case it is mostly
> kept internal to the compiler (though may also appear in RIL3 IR /
> bytecode), and the main alternative would be changing how string
> literals are managed by the compiler (IOW, not treating them like ASCIZ
> strings; but instead using either Pascal-style strings or length-limited
> arrays or similar).

The gfortran front end just uses an unsigned 32-bit integer
as its internal character data type. Anything else would just add
unnecessary complexity and run-time overhead. String lengths
are given as a ssize_t integer.

If this is just for literals which occur in programs, I would
be extremely surprised if this turned out to be a bottleneck
for any reasonable program.

The best way is probably to translate to that type on input, do
all stuff that needs to be done on that simple data type and then
write it to the desired format on output.
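Below is a minimal sketch of convert-on-input / convert-on-output with 32-bit characters, as suggested above. It assumes well-formed UTF-8 input (a real front end would validate first), and it is not gfortran's actual code. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert-on-input: decode UTF-8 into one 32-bit code point per
   character, so later passes index and compare plain integers.
   out must hold at least n elements; returns the character count. */
size_t utf8_to_u32(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t i = 0, count = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        uint32_t cp = len == 1 ? b : b & (0x7F >> len);  /* lead bits */
        for (size_t k = 1; k < len && i + k < n; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);          /* payload */
        out[count++] = cp;
        i += len;
    }
    return count;
}

/* Convert-on-output: write one code point back as standard UTF-8.
   out needs room for 4 bytes; returns the number written. */
size_t u32_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```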

It is more profitable to spend your time optimizing other things :-)

Re: Minor idea for indirect target predictor

<sbt9lt$bsb$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18365&group=comp.arch#18365

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 4 Jul 2021 16:34:30 -0500
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <sbt9lt$bsb$1@dont-email.me>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me> <sbqrr5$880$1@dont-email.me>
<sbrv45$d0s$1@gioia.aioe.org> <sbsqiq$2mv$1@dont-email.me>
<sbsrmv$7rm$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 4 Jul 2021 21:37:01 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="70a49ea9e7c621ca917ccf00da97be9e";
logging-data="12171"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Qezq7oImiw7aTIOrG2Opt"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:ubysz9nkN5vfXF8o9i4jOaPGZ7s=
In-Reply-To: <sbsrmv$7rm$1@newsreader4.netcologne.de>
Content-Language: en-US

by: BGB - Sun, 4 Jul 2021 21:34 UTC

On 7/4/2021 12:38 PM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>
>> Probably, if used for text interchange, but in this case it is mostly
>> kept internal to the compiler (though may also appear in RIL3 IR /
>> bytecode), and the main alternative would be changing how string
>> literals are managed by the compiler (IOW, not treating them like ASCIZ
>> strings; but instead using either Pascal-style strings or length-limited
>> arrays or similar).
>
> The gfortran front end just uses an unsigned 32-bit integer
> as its internal character data type. Anything else would just add
> unnecessary complexity and run-time overhead. String lengths
> are given as a ssize_t integer.
>
> If this is just for literals which occur in programs, I would
> be extremely surprised if this turned out to be a bottleneck
> for any reasonable program.
>

Performance bottleneck, no, unlikely.

Memory space overhead, you might be surprised...

For a compiler and VM, apart from things like AST node-like structures
and similar, string data can end up being a significant chunk of the
memory footprint.

One doesn't just have literals, but may also have:
* any symbols (variable / function name) within the program;
* the names of any visible function prototypes, typedefs, ...;
* Every signature string for every variable or function;
* ...

So, having a space-efficient representation for string data can be
relevant, as can interning any short/recurring strings.
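Here is a minimal interning sketch. Equal strings are stored once, so recurring identifiers and signature strings cost their bytes only once and can be compared by pointer. It uses a fixed-size open-addressing table with the djb2 hash. The names, the table size, and the lack of resizing or freeing are simplifications for illustration, not BGBCC's design.

```c
#include <stdlib.h>
#include <string.h>

#define INTERN_SLOTS 1024   /* must be a power of two */

static const char *intern_tab[INTERN_SLOTS];

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;                 /* djb2 */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the single shared copy of s, inserting it if new.
   Returns NULL if the table is full or allocation fails. */
const char *intern(const char *s)
{
    size_t i = hash_str(s) & (INTERN_SLOTS - 1);
    for (size_t probes = 0; probes < INTERN_SLOTS; probes++) {
        if (!intern_tab[i]) {               /* empty slot: store a copy */
            size_t len = strlen(s) + 1;
            char *copy = malloc(len);
            if (!copy) return NULL;
            memcpy(copy, s, len);
            intern_tab[i] = copy;
            return copy;
        }
        if (strcmp(intern_tab[i], s) == 0)  /* already interned */
            return intern_tab[i];
        i = (i + 1) & (INTERN_SLOTS - 1);   /* linear probing */
    }
    return NULL;
}
```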

As noted, in my compiler, inline ASM blobs, and also entire ASM modules
(post preprocessor) may also be passed through the IR stage via string
literals, as well as some binary data blobs, such as "Resource WAD"
lumps, ...

Essentially, pretty much the entire front-end of the compiler is fed
through something that is essentially comparable to a binary-coded
version of PostScript, and stored temporarily as ".ril" (RIL3) files.

When the final binary is to be compiled, these RIL3 files are
decoded/interpreted, which essentially builds all of the structures used
by the backend (and is also translated into 3AC and basic-blocks via
this interpretation process).

RIL3 files could possibly also be used as input to an AOT compiler,
but the main debate is whether the memory overhead would be low enough
to be "acceptable" (its design would basically depend on the ability
to keep everything in RAM during the AOT compilation process, though
possible workarounds may exist).

Ideally, one would want the ability to have a single-pass AOT compiler
which can fit in a memory footprint of, say, under 4MB or so, and where
its memory footprint doesn't increase drastically when compiling a
larger or more complex program (ignoring the space needed to store the
intermediate program sections or "object code").

> The best way is probably to translate to that type on input, do
> all stuff that needs to be done on that simple data type and then
> write it to the desired format on output.
>

Could be.

As noted, convert-on-output was how it worked previously, but I figured
it needed to change, given that doing it this way led to some ambiguities
regarding the parsing and handling of C string literals.

Luckily, the literals were already capable of handling arbitrary binary
data, so this part wasn't a huge leap.

The new encoding idea could be used to reduce the expansion of binary
data and such in a few other cases.

> It is more profitable to spend your time optimizing other things :-)
>

I have spent a lot more time writing about it on here than this took to
"actually implement".

Though, the main thing I changed from how it was handling binary data
previously was to allow some "UTF-8 pass through" ranges, and to add
rules for encoding 2-byte pairs as a single code-point, rather than
using multiple code-points.

In the prior encoding, it simply escapes all bytes outside the 01..7F
range by encoding them using a 2-byte encoding.

But two escaped bytes cost 2x2 = 4 bytes, whereas encoding the same pair
as one code point in 8000..FFFF costs 3 bytes, saving 1 byte in this case.

The 01..7F range can be partly ignored because these are already encoded
in an unescaped form.

....

Re: Minor idea for indirect target predictor

<sbtaqn$iia$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=18367&group=comp.arch#18367

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Minor idea for indirect target predictor
Date: Sun, 4 Jul 2021 21:56:39 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sbtaqn$iia$1@newsreader4.netcologne.de>
References: <bbbfd05b-e065-4d1a-85c9-4afdc0905722n@googlegroups.com>
<17f5bdce-25a9-4d68-8eca-c1554947b143n@googlegroups.com>
<sbe511$src$1@gioia.aioe.org> <sbfg9g$371$1@dont-email.me>
<sbfs7g$o15$1@dont-email.me> <sbg6bf$rrm$1@dont-email.me>
<sbhng6$17b$1@gioia.aioe.org> <sbhofu$o28$1@newsreader4.netcologne.de>
<sbhukl$4a3$1@dont-email.me> <sbi11k$m95$1@dont-email.me>
<sbjush$2gp$1@dont-email.me> <sbl8hi$q3f$1@dont-email.me>
<sbp4me$ruh$1@dont-email.me> <sbqrr5$880$1@dont-email.me>
<sbrv45$d0s$1@gioia.aioe.org> <sbsqiq$2mv$1@dont-email.me>
<sbsrmv$7rm$1@newsreader4.netcologne.de> <sbt9lt$bsb$1@dont-email.me>
Injection-Date: Sun, 4 Jul 2021 21:56:39 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-30c8-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:30c8:0:7285:c2ff:fe6c:992d";
logging-data="19018"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Sun, 4 Jul 2021 21:56 UTC

BGB <cr88192@gmail.com> schrieb:

> One doesn't just have literals, but may also have:
> * any symbols (variable / function name) within the program;
> * the names of any visible function prototypes, typedefs, ...;
> * Every signature string for every variable or function;
> * ...

Any of those should be ASCII, right?

> So, having a space-efficient representation for string data can be
> relevant, as can interning any short/recurring strings.

Hmm... ok...

>
>
> As noted, in my compiler, inline ASM blobs, and also entire ASM modules
> (post preprocessor) may also be passed through the IR stage via string
> literals, as well as some binary data blobs, such as "Resource WAD"
> lumps, ...

I would think that there is a difference between user strings
(which may be UTF-8 or whatever) and internal representation of
variable names. Those are two rather different things, which I
would probably put in separate code paths.

> Essentially, pretty much the entire front-end of the compiler is fed
> through something that is essentially comparable to a binary-coded
> version of PostScript, and stored temporarily as ".ril" (RIL3) files.

Hm, interesting. Is there a particular reason why you use files instead
of passing things through data structures in memory?

(This is one of the things I really do not like about standard UNIX
compilation, writing out the code in assembler and then reading
it back and parsing it again seems like a waste.)

> When the final binary is to be compiled, these RIL3 files are
> decoded/interpreted, which essentially builds all of the structures used
> by the backend (and is also translated into 3AC and basic-blocks via
> this interpretation process).

OK.

> It is possible that RIL3 files could also be considered as input to an
> AOT compiler, but the main debate is whether the memory overhead would
> be low enough to be "acceptable" (its design would basically depend on
> being able to keep everything in RAM during the AOT compilation
> process, though possible workarounds may exist).

I can understand that you want your compiler to be able to run on
your own hardware, but...

> Ideally, one would want the ability to have a single-pass AOT compiler
> which can fit in a memory footprint of, say, under 4MB or so, and where
> its memory footprint doesn't increase drastically when compiling a
> larger or more complex program (ignoring the space needed to store the
> intermediate program sections or "object code").

...that seems to be a rather harsh requirement these days. I am
all for conserving resources, but I personally would probably
rather go for cross-compilation than try to fit a compiler
into 4MB. (Then again, I work in the gcc framework, where such
restrictions do not exist.)

>
>
>> The best way is probably to translate to that type on input, do
>> all stuff that needs to be done on that simple data type and then
>> write it to the desired format on output.
>>
>
> Could be.
>
> As noted, convert-on-output was how it worked previously, but I figured
> it needed to be changed, given that doing it this way led to some
> ambiguities regarding the parsing and handling of C string literals.
>
> Luckily, the literals were already capable of handling arbitrary binary
> data, so this part wasn't a huge leap.

Including null bytes? That's good design, then.
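The "translate on input, convert on output" approach, with literals that can carry arbitrary bytes including NULs, can be sketched as a length-counted byte buffer. This is a minimal sketch that assumes only a subset of C escapes (`\n`, `\t`, `\\`, `\"`, `\'`, octal, `\xHH`). The names `ByteStr`, `decode_c_literal`, and `encode_c_literal` are hypothetical and are not taken from BGB's RIL3 format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length-counted byte string: embedded NUL bytes are ordinary data. */
typedef struct {
    unsigned char *data;
    size_t len;
} ByteStr;

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode the body of a C string literal (without the quotes) once, on input.
   Returns 0 on success, -1 on a malformed escape or allocation failure. */
int decode_c_literal(const char *src, size_t n, ByteStr *out)
{
    unsigned char *buf = malloc(n + 1);  /* decoded form is never longer than source */
    size_t len = 0, i = 0;
    if (buf == NULL) return -1;
    while (i < n) {
        unsigned char c = (unsigned char)src[i++];
        if (c != '\\') { buf[len++] = c; continue; }
        if (i >= n) goto bad;                 /* trailing lone backslash */
        c = (unsigned char)src[i++];
        switch (c) {
        case 'n': buf[len++] = '\n'; break;
        case 't': buf[len++] = '\t'; break;
        case '\\': case '"': case '\'': buf[len++] = c; break;
        case 'x': {                           /* \x consumes every following hex digit */
            int v = 0, digits = 0, h;
            while (i < n && (h = hexval((unsigned char)src[i])) >= 0) {
                v = (v * 16 + h) & 0xFF;      /* lenient: keep only the low byte */
                i++; digits++;
            }
            if (digits == 0) goto bad;
            buf[len++] = (unsigned char)v;
            break;
        }
        default:
            if (c >= '0' && c <= '7') {       /* octal: at most three digits */
                int v = c - '0', k = 1;
                while (k < 3 && i < n && src[i] >= '0' && src[i] <= '7') {
                    v = v * 8 + (src[i++] - '0');
                    k++;
                }
                buf[len++] = (unsigned char)(v & 0xFF);
                break;
            }
            goto bad;                         /* unsupported escape */
        }
    }
    out->data = buf;
    out->len = len;
    return 0;
bad:
    free(buf);
    return -1;
}

/* Convert back to C literal syntax only on output. Every non-printable byte
   becomes a 3-digit octal escape, so the next character is never absorbed. */
void encode_c_literal(const ByteStr *s, FILE *f)
{
    fputc('"', f);
    for (size_t i = 0; i < s->len; i++) {
        unsigned char c = s->data[i];
        if (c == '"' || c == '\\') fprintf(f, "\\%c", c);
        else if (c >= 0x20 && c < 0x7F) fputc(c, f);
        else fprintf(f, "\\%03o", (unsigned)c);
    }
    fputc('"', f);
}
```

Because the decoded buffer carries its own length, `a\0b` stays three bytes instead of being truncated at the NUL. The output side uses fixed-width octal rather than `\x`, since a `\x` escape would swallow any hex digit that happens to follow it.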

[...]

Re: Minor idea for indirect target predictor

by: MitchAlsup - Sun, 4 Jul 2021 22:21 UTC

On Sunday, July 4, 2021 at 4:56:41 PM UTC-5, Thomas Koenig wrote:
> Hm, interesting. Is there a particular reason why you use files instead
> of passing things through data structures in memory?
>
> (This is one of the things I really do not like about standard UNIX
> compilation, writing out the code in assembler and then reading
> it back and parsing it again seems like a waste.)
<
It was not a waste when there was only 64KB of memory for your
application and you wanted to compile files larger than would fit
in memory.
