
Concertina II Progress



Re: Concertina II Progress

<um0oh9$vu3u$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35862&group=comp.arch#35862

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadib...@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 21 Dec 2023 07:12:41 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <um0oh9$vu3u$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<uij9lt$3054t$1@newsreader4.netcologne.de> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uire3v$7li2$1@dont-email.me> <uk7rik$tu34$1@dont-email.me>
<ulr336$3t479$2@dont-email.me> <ulrgb2$3unbg$1@dont-email.me>
<ulskvc$4teb$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Dec 2023 07:12:41 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="65c215f1a00e01545e16b1cda1629e81";
logging-data="1046654"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/PZeRzD9XC6NlyGhpib957ztqD7/NSegE="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:ifSYqvz7QOaT48mjGrWkKpP8DYY=

by: Quadibloc - Thu, 21 Dec 2023 07:12 UTC

Now I've simplified the format of the Composed Instructions, which
allow instructions longer than 32 bits to appear in code without
block headers.

This freed up just enough opcode space that I could, just barely,
add back into the instruction set a header format that reserves part
of a block for pseudo-immediates with essentially zero overhead.

I felt this feature was needed to make immediate values feel like
a real part of the instruction set; if they always required a full
32-bit header as overhead, there would be reluctance to use them.

John Savard

Re: Concertina II Progress

<um0r6i$10a4s$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35864&group=comp.arch#35864

Path: i2pn2.org!i2pn.org!nntp.comgw.net!paganini.bofh.team!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 21 Dec 2023 08:58:10 +0100
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <um0r6i$10a4s$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<2023Dec3.153637@mips.complang.tuwien.ac.at> <ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiqdb$2dg6p$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me>
<92c62fa38dc8740877d08dcb26704212@news.novabbs.com>
<ul44h3$khbu$1@newsreader4.netcologne.de> <ulcft4$t0bq$1@dont-email.me>
<0177880ee05c2861f55b0609989a6af6@news.novabbs.com>
<wuEeN.26394$PuZ9.5742@fx11.iad> <ulgfe0$1oc9k$4@dont-email.me>
<ulkfm8$2f2qa$1@dont-email.me> <um09oo$tu5q$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Dec 2023 07:58:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="886bdbbdb47f52bca9bcfd243becfcfa";
logging-data="1058972"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+0a+MhaCZwCKN6smKIBC0RvgwVFFQ4f0k="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
Thunderbird/102.11.0
Cancel-Lock: sha1:TmiW5BYmNxFzxG2FudWytxoCoH8=
Content-Language: en-GB
In-Reply-To: <um09oo$tu5q$2@dont-email.me>

by: David Brown - Thu, 21 Dec 2023 07:58 UTC

On 21/12/2023 04:00, Chris M. Thomasson wrote:
> On 12/16/2023 7:28 AM, David Brown wrote:
>> On 15/12/2023 03:59, Chris M. Thomasson wrote:
>>> On 12/14/2023 6:41 AM, Scott Lurndal wrote:
>>>> mitchalsup@aol.com (MitchAlsup) writes:

>>>> I ask for PDF's. I have no ability to read windows office formats
>>>> of any type without using star/open/libre office, and I detest WYSIWYG
>>>> word processors of all stripes.
>>>
>>> Try to stay far away from windows office docs, they can be filled
>>> with interesting macros, well back in the day! I do remember a lot of
>>> print to PDF programs. Mock up a printer device, print, produce a file.
>>
>> They are only a problem if you use MS Office. LibreOffice, and its
>> predecessors, disable the macros by default.
>>
>> PDF also supports dangerous links and Javascript.
>
> Indeed!
>
>
>> It's not a problem if you use a decent pdf viewer, but if you use
>> Adobe Acrobat on Windows, you can definitely be at risk.
>>
>
> Well, just make sure the PDF reader has javascript turned off all
> around. Trust in it.

"Trust in it" ?

Some readers /are/ trustworthy. Adobe's are not - Acrobat reader has
endless lists of security holes. I haven't had it installed on a PC for
many years, so things may have changed, but in comparison to any other
reader it was huge, slow, and required continuous upgrading to deal with
vulnerabilities, requiring a reboot of Windows each time. Horrible
software.

On Linux, common readers like evince don't support javascript - you can
trust them!

Re: Concertina II Progress

<um1akp$12lv4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35865&group=comp.arch#35865

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 21 Dec 2023 13:21:45 +0100
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <um1akp$12lv4$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uire3v$7li2$1@dont-email.me> <uk7rik$tu34$1@dont-email.me>
<ukact0$1e539$1@dont-email.me> <ukc34t$1po20$1@dont-email.me>
<1949acd069b7c93db910f3c0357a0298@news.novabbs.com>
<2023Dec3.153637@mips.complang.tuwien.ac.at> <ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiuts$320sm$1@dont-email.me> <2023Dec5.120709@mips.complang.tuwien.ac.at>
<ulskg7$4q6g$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Dec 2023 12:21:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="7438c4f507dea2cfeae135832a9734ea";
logging-data="1136612"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180OZJ/kxqJlArDaJhnO+Y2eK0BVfywXn4Qlp0gDKed+g=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.18
Cancel-Lock: sha1:et613Nlex+WFrDOouaFJk7t/sAg=
In-Reply-To: <ulskg7$4q6g$2@dont-email.me>

by: Terje Mathisen - Thu, 21 Dec 2023 12:21 UTC

Quadibloc wrote:
> On Tue, 05 Dec 2023 11:07:09 +0000, Anton Ertl wrote:
>
>> Quadibloc <quadibloc@servername.invalid> writes:
>> [IBM Model 195]
>>> Its microarchitecture ended up being, in general terms, copied by the
>>> Pentium Pro and the Pentium II.
>>
>> Not really. The Models 91 and 195 only have OoO for FP, not for
>> integers.
>
> As do the Pentium Pro and the Pentium II. (The Motorola 68050 did it

Huh???

I'm sure Andy Glew would disagree re the PPro!

Terje
> the other way around, only having OoO for integers, and not for FP,
> figuring, I guess, that integers are used the most, so this would
> create better performance numbers.)
>
> John Savard
>

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Superior architecture styles?

<20231221150808.00007ab4@yahoo.com>

https://www.novabbs.com/devel/article-flat.php?id=35868&group=comp.arch#35868

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Thu, 21 Dec 2023 15:08:08 +0200
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <20231221150808.00007ab4@yahoo.com>
References: <uigus7$1pteb$1@dont-email.me>
<2023Dec3.153637@mips.complang.tuwien.ac.at>
<ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiqdb$2dg6p$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me>
<ukk0m0$3bbkb$1@dont-email.me>
<jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org>
<ukm32p$3p2jh$1@dont-email.me>
<YUGbN.204677$Ee89.140988@fx17.iad>
<2023Dec6.085407@mips.complang.tuwien.ac.at>
<ulqu0t$3oouk$1@dont-email.me>
<2023Dec19.094918@mips.complang.tuwien.ac.at>
<jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="ad61e48ed7f6a2e90dee9db879d36192";
logging-data="1140154"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9GdfHHWkN/3Mx28lvedmyBdA9ogaJRGM="
Cancel-Lock: sha1:cTe55qscPajeFdcoE6SiPl7/+uE=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)

by: Michael S - Thu, 21 Dec 2023 13:08 UTC

On Wed, 20 Dec 2023 09:26:09 -0500
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > Isn't that as it should be? Hardware provides some simple,
> > possibly inconvenient interface, and software abstracts the
> > inconvenience away; e.g., RISCs provide only a load/store
> > interface, and compilers translate the stuff programmers write into
> > this interface. The difference is that with RISCs the result is at
> > least as fast as architectures that tried to cater more to the
> > programming language, while I am convinced that hardware where the
> > designers design for efficient sequential consistency will let
> > programmers write software that handily outperforms using libraries
> > or system calls on hardware designed for weaker memory models.
>
> While I agree that CPUs should provide sequential consistency (it
> would make life easier for debugging, formalization, and reasoning,
> and I can't think of any reason why it should have a significant
> cost),

The same reason TSO-like models have been invented and re-invented since
the first S/370 multiprocessor: the unconstrained ability to feed loads
from the local store queue. If anything, I would think that this ability
is even more advantageous with the big store queues and deep OoO windows
of today than it was back in the early S/370 days.

> I'd be surprised if it "handily outperforms" the status quo:
> concurrent programming is hard even with sequential consistency, so
> most normal code would still continue using the exact same libraries.
>
>
> Stefan

I agree.
What is more, even for "normal abnormal" code, the x86 model, where TSO is
augmented by implied memory barriers in all synchronization
instructions, does not appear inferior to SC.
Now, some people want to write "abnormal abnormal" code, where
threads communicate with regular loads and stores, without any use of
special synchronization instructions, but these people are a very small
minority.
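
A minimal sketch of the store-queue point above, in C. It is a single-threaded toy model, not a description of any real implementation: the `Core` type, the queue capacity, and every function name are assumptions made for illustration. It runs the classic store-buffering litmus test twice: once with loads fed while the stores still sit in the local store queue (permitted by TSO), and once with each store drained to memory before the following load (one SC-legal interleaving).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not a real CPU: two cores, two shared words, and a
   per-core FIFO store queue. All names here are hypothetical. */
enum { X = 0, Y = 1, NVARS = 2, QCAP = 4 };

typedef struct { int addr; int val; } Store;

typedef struct {
    Store q[QCAP];   /* pending stores, oldest first */
    size_t n;        /* number of pending stores */
} Core;

typedef struct { int r0; int r1; } SbResult;

static int memory[NVARS];

/* Buffer a store locally; it is not yet visible to the other core. */
static void core_store(Core *c, int addr, int val) {
    assert(c->n < QCAP);
    c->q[c->n++] = (Store){ addr, val };
}

/* A load first searches the core's own store queue, youngest entry
   first (store-to-load forwarding), then falls back to memory. */
static int core_load(const Core *c, int addr) {
    for (size_t i = c->n; i-- > 0; )
        if (c->q[i].addr == addr)
            return c->q[i].val;
    return memory[addr];
}

/* Drain the queue to memory in FIFO order, as TSO requires. */
static void core_drain(Core *c) {
    for (size_t i = 0; i < c->n; i++)
        memory[c->q[i].addr] = c->q[i].val;
    c->n = 0;
}

/* Store-buffering litmus test:
     core 0: X = 1; r0 = Y        core 1: Y = 1; r1 = X
   If drain_before_load is true, each store reaches memory before the
   loads run (an SC-legal interleaving); otherwise the loads run while
   the stores are still queued (allowed under TSO). */
static SbResult run_sb(bool drain_before_load) {
    Core c0 = { .n = 0 }, c1 = { .n = 0 };
    SbResult res;
    memory[X] = memory[Y] = 0;

    core_store(&c0, X, 1);
    if (drain_before_load) core_drain(&c0);
    core_store(&c1, Y, 1);
    if (drain_before_load) core_drain(&c1);

    res.r0 = core_load(&c0, Y);
    res.r1 = core_load(&c1, X);

    core_drain(&c0);
    core_drain(&c1);
    return res;
}
```

In this model `run_sb(false)` yields r0 = 0 and r1 = 0, an outcome that SC forbids for this test, while `run_sb(true)` yields r0 = 1 and r1 = 1.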

Re: Concertina II Progress

<5iYgN.93173$Wp_8.63957@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=35872&group=comp.arch#35872

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Concertina II Progress
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com> <ukiqdb$2dg6p$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <92c62fa38dc8740877d08dcb26704212@news.novabbs.com> <ul44h3$khbu$1@newsreader4.netcologne.de> <ulcft4$t0bq$1@dont-email.me> <0177880ee05c2861f55b0609989a6af6@news.novabbs.com> <wuEeN.26394$PuZ9.5742@fx11.iad> <ulgfe0$1oc9k$4@dont-email.me> <ulkfm8$2f2qa$1@dont-email.me> <um09oo$tu5q$2@dont-email.me>
Lines: 71
Message-ID: <5iYgN.93173$Wp_8.63957@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Thu, 21 Dec 2023 14:51:45 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Thu, 21 Dec 2023 14:51:45 GMT
X-Received-Bytes: 4047

by: Scott Lurndal - Thu, 21 Dec 2023 14:51 UTC

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>On 12/16/2023 7:28 AM, David Brown wrote:
>> On 15/12/2023 03:59, Chris M. Thomasson wrote:
>>> On 12/14/2023 6:41 AM, Scott Lurndal wrote:
>>>> mitchalsup@aol.com (MitchAlsup) writes:
>>>>> David Brown wrote:
>>>>>
>>>>>> On 10/12/2023 11:39, Thomas Koenig wrote:
>>>>>>> MitchAlsup <mitchalsup@aol.com> schrieb:
>>>>>>>
>>>>>>>> Question (to everyone):: Has your word processor or spreadsheet
>>>>>>>> added anything USEFUL TO YOU since 2000 ??
>>>>>>>
>>>>>>> In my case: Yes.
>>>>>>>
>>>>>>> Besides making many things worse, the new formula editor (since
>>>>>>> 2010?) in Word is reasonable to work with, especially since it is
>>>>>>> possible to use LaTeX notation now (and thus it is now possible
>>>>>>> to paste from Maple).
>>>>>>>
>>>>>
>>>>>> If I want to write something serious with formula, I use LaTeX.
>>>>>
>>>>>>> Previously, I actually wrote some reports in LaTeX, going to some
>>>>>>> trouble to make them appear visually like the Word template du jour
>>>>>>> (but the formulas gave it away, they looked too nice for Word).
>>>>>>>
>>>>>
>>>>>> What a strange thing to do - that sounds completely backwards to me!
>>>>>
>>>>>> I was happy when I had made a template for LibreOffice (it might have
>>>>>> been one of the forks of OpenOffice, pre-LibreOffice) that looked
>>>>>> similar to what I have for LaTeX. Then I could make
>>>>>> reasonable-looking
>>>>>> documents for customers that insisted on having docx format instead
>>>>>> of pdf.
>>>>>
>>>>>> I don't think there has been much exciting or important (to me)
>>>>>> added to
>>>>>> word processors for decades. Direct pdf generation was one, which
>>>>>> probably existed in Star Office (the ancestor of OpenOffice /
>>>>>
>>>>> *.pdf arrives in Word ~2000 (maybe before).
>>>>
>>>> Are you sure about that? IIRC it was a decade later before
>>>> adobe wasn't required.
>>>>
>>>> <snip>
>>>>
>>>>> I still require people sending me *.docx to convert it back to
>>>>> WORD2003 format *.doc and retransmitting it. It is surprising how
>>>>> many people don't know how to do that.
>>>>
>>>> I ask for PDF's. I have no ability to read windows office formats
>>>> of any type without using star/open/libre office, and I detest WYSIWYG
>>>> word processors of all stripes.
>>>
>>> Try to stay far away from windows office docs, they can be filled with
>>> interesting macros, well back in the day! I do remember a lot of print
>>> to PDF programs. Mock up a printer device, print, produce a file.
>>
>> They are only a problem if you use MS Office. LibreOffice, and its
>> predecessors, disable the macros by default.
>>
>> PDF also supports dangerous links and Javascript.
>
>Indeed!

Although my PDF reader (xpdf) ignores links and Javascript,
and I've yet to encounter a PDF that xpdf cannot read.

Re: Concertina II Progress

<QiYgN.93174$Wp_8.66810@fx17.iad>

https://www.novabbs.com/devel/article-flat.php?id=35873&group=comp.arch#35873

Path: i2pn2.org!i2pn.org!news.swapon.de!news.in-chemnitz.de!3.eu.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Concertina II Progress
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <ukiqdb$2dg6p$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <92c62fa38dc8740877d08dcb26704212@news.novabbs.com> <ul44h3$khbu$1@newsreader4.netcologne.de> <ulcft4$t0bq$1@dont-email.me> <0177880ee05c2861f55b0609989a6af6@news.novabbs.com> <wuEeN.26394$PuZ9.5742@fx11.iad> <ulgfe0$1oc9k$4@dont-email.me> <ulkfm8$2f2qa$1@dont-email.me> <um09oo$tu5q$2@dont-email.me> <um0r6i$10a4s$1@dont-email.me>
Lines: 44
Message-ID: <QiYgN.93174$Wp_8.66810@fx17.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Thu, 21 Dec 2023 14:52:32 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Thu, 21 Dec 2023 14:52:32 GMT
X-Received-Bytes: 2730

by: Scott Lurndal - Thu, 21 Dec 2023 14:52 UTC

David Brown <david.brown@hesbynett.no> writes:
>On 21/12/2023 04:00, Chris M. Thomasson wrote:
>> On 12/16/2023 7:28 AM, David Brown wrote:
>>> On 15/12/2023 03:59, Chris M. Thomasson wrote:
>>>> On 12/14/2023 6:41 AM, Scott Lurndal wrote:
>>>>> mitchalsup@aol.com (MitchAlsup) writes:
>
>>>>> I ask for PDF's. I have no ability to read windows office formats
>>>>> of any type without using star/open/libre office, and I detest WYSIWYG
>>>>> word processors of all stripes.
>>>>
>>>> Try to stay far away from windows office docs, they can be filled
>>>> with interesting macros, well back in the day! I do remember a lot of
>>>> print to PDF programs. Mock up a printer device, print, produce a file.
>>>
>>> They are only a problem if you use MS Office. LibreOffice, and its
>>> predecessors, disable the macros by default.
>>>
>>> PDF also supports dangerous links and Javascript.
>>
>> Indeed!
>>
>>
>>> It's not a problem if you use a decent pdf viewer, but if you use
>>> Adobe Acrobat on Windows, you can definitely be at risk.
>>>
>>
>> Well, just make sure the PDF reader has javascript turned off all
>> around. Trust in it.
>
>"Trust in it" ?
>
>Some readers /are/ trustworthy. Adobe's are not - Acrobat reader has
>endless lists of security holes. I haven't had it installed on a PC for
>many years, so things may have changed, but in comparison to any other
>reader it was huge, slow, and required continuous upgrading to deal with
>vulnerabilities, requiring a reboot of Windows each time. Horrible
>software.
>
>On Linux, common readers like evince don't support javascript - you can
>trust them!

Although the evince UI is crap. I prefer xpdf.

Re: Concertina II Progress

<um1oun$14rv2$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35878&group=comp.arch#35878

Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: quadib...@servername.invalid (Quadibloc)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 21 Dec 2023 16:25:59 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 14
Message-ID: <um1oun$14rv2$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <uijjcd$2d9sp$1@dont-email.me>
<uijk93$2dc2i$2@dont-email.me> <uijr5g$2ep8o$1@dont-email.me>
<uire3v$7li2$1@dont-email.me> <uk7rik$tu34$1@dont-email.me>
<ukact0$1e539$1@dont-email.me> <ukc34t$1po20$1@dont-email.me>
<1949acd069b7c93db910f3c0357a0298@news.novabbs.com>
<2023Dec3.153637@mips.complang.tuwien.ac.at>
<ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiuts$320sm$1@dont-email.me>
<2023Dec5.120709@mips.complang.tuwien.ac.at> <ulskg7$4q6g$2@dont-email.me>
<um1akp$12lv4$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Dec 2023 16:25:59 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="65c215f1a00e01545e16b1cda1629e81";
logging-data="1208290"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18tGa+BD7NbFqf7sZgxDZnN9ILW7cY/4mw="
User-Agent: Pan/0.146 (Hic habitat felicitas; d7a48b4
gitlab.gnome.org/GNOME/pan.git)
Cancel-Lock: sha1:5MVnIXsJyza6vUxpThHtWXpvSxE=

by: Quadibloc - Thu, 21 Dec 2023 16:25 UTC

On Thu, 21 Dec 2023 13:21:45 +0100, Terje Mathisen wrote:

> Huh???
>
> I'm sure Andy Glew would disagree re the PPro!

I distinctly remember reading somewhere about the Pentium Pro, II, and
the 68060, but Wikipedia doesn't back me up, so it's entirely possible
that the one place where I read this - which I can't identify, not
remembering what it was - was in error. Since this was the same as the
360/91, naturally it was memorable to me, so I remembered that, and forgot
anything contradicting it I might have read elsewhere.

John Savard

Re: Superior architecture styles?

<2023Dec21.184624@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=35883&group=comp.arch#35883

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Thu, 21 Dec 2023 17:46:24 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 34
Message-ID: <2023Dec21.184624@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <ukiqdb$2dg6p$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org>
Injection-Info: dont-email.me; posting-host="22d67c57d4c94525ca046727884919fe";
logging-data="1242774"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+7SOWueFzKvi+9R0yJEazk"
Cancel-Lock: sha1:KwL0/pIG6bE+po1nM9WNBgACdUk=
X-newsreader: xrn 10.11

by: Anton Ertl - Thu, 21 Dec 2023 17:46 UTC

Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Isn't that as it should be? Hardware provides some simple, possibly
>> inconvenient interface, and software abstracts the inconvenience
>> away; e.g., RISCs provide only a load/store interface, and compilers
>> translate the stuff programmers write into this interface. The
>> difference is that with RISCs the result is at least as fast as
>> architectures that tried to cater more to the programming language,
>> while I am convinced that hardware where the designers design for
>> efficient sequential consistency will let programmers write software
>> that handily outperforms using libraries or system calls on hardware
>> designed for weaker memory models.
>
>While I agree that CPUs should provide sequential consistency (it would
>make life easier for debugging, formalization, and reasoning, and
>I can't think of any reason why it should have a significant cost), I'd
>be surprised if it "handily outperforms" the status quo: concurrent
>programming is hard even with sequential consistency, so most normal
>code would still continue using the exact same libraries.

Yes, most programmers will still write sequential programs or use
libraries or system calls, but if the number of people who write
lock-free concurrent code increases by a factor of 100, I expect that
there will be more efficient libraries, so even those who use
libraries will benefit.

I actually expect even the existing libraries to perform better on
hardware that efficiently implements sequential consistency, because
the barriers become no-ops instead of being super-expensive (as they
are on weakly ordered hardware).
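
To make the barrier-cost point concrete, here is a hedged sketch of the message-passing pattern that such libraries rely on, written with C11 `<stdatomic.h>`. The one-slot mailbox, `publish`, and `try_consume` are hypothetical names, not taken from any library discussed in the thread. The two fences mark where a weakly ordered machine typically has to execute a barrier instruction; on x86, compilers typically emit no instruction for release and acquire fences and only restrict their own reordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox; names are illustrative only. */
static int payload;          /* plain (non-atomic) data */
static atomic_bool ready;    /* publication flag, initially false */

/* Producer: write the data, then publish the flag. The release fence
   keeps the payload store from being reordered after the flag store. */
static void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: if the flag is set, the acquire fence keeps the payload
   load from being reordered before the flag load. */
static bool try_consume(int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

On hardware that provided sequential consistency efficiently, both fences would have nothing left to do at run time, which is the sense in which the barriers "become no-ops".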

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Superior architecture styles?

<2023Dec21.191326@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=35887&group=comp.arch#35887

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Thu, 21 Dec 2023 18:13:26 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 54
Message-ID: <2023Dec21.191326@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org> <ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com>
Injection-Info: dont-email.me; posting-host="22d67c57d4c94525ca046727884919fe";
logging-data="1251551"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+dXt06ddWPulAnw7Is1l51"
Cancel-Lock: sha1:tHwcQ6tOgklTQdcI8wbFvFU0Z4A=
X-newsreader: xrn 10.11

by: Anton Ertl - Thu, 21 Dec 2023 18:13 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>As to costs::
>
> l = p[ i / 123456789 ];
> r[7] = s;
>
>The LD from p is delayed 20+odd cycles by the IDIV. Under SC or TSO
>the unaliased r[7] ST cannot become visible until the LD has become
>visible; yet since they cannot alias, this thread is not sensitive
>to the order in which they are performed

What do you mean by "visible"? You present a single-threaded
program here, why talk about memory ordering at all?

And why would I want to perform the store early? The more typical
case is that you have a store architecturally before a load, and you
want to perform the load early.
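
The quoted two-line fragment can be expanded into a small single-threaded check. This is only a sketch: the quoted code does not declare `p`, `r`, `i`, or `s`, so the `long` types and both function names below are assumptions. It shows that when `p` and `r` do not overlap, performing the store before or after the load gives the same result within one thread, and that the order matters only when they alias.

```c
#include <assert.h>

/* Program order from the quoted fragment: load via p, then store to r[7].
   Types are assumed; the original fragment does not declare them. */
static long load_then_store(const long *p, long *r, long i, long s) {
    assert(i >= 0);
    long l = p[i / 123456789];
    r[7] = s;
    return l;
}

/* Reordered version: the store is performed before the load. */
static long store_then_load(const long *p, long *r, long i, long s) {
    assert(i >= 0);
    r[7] = s;
    long l = p[i / 123456789];
    return l;
}
```

With disjoint arrays both functions return the same loaded value. If `p` and `r` point at the same array and the index works out to 7, `load_then_store` returns the old element while `store_then_load` returns `s`.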

What current CPUs do is to ask a predictor whether two accesses alias,
and if the predictor says that they don't, it speculates that they can
move past each other. If the speculation turns out to be wrong, the
speculative state is dropped, just as in case of a branch
misprediction.

And for multi-threaded accesses to memory, sequential consistency can
be implemented in a similar way: Have a predictor that tells you what
memory is contended, and for non-contended memory, speculatively do
the fast thing, and if another thread turns out to access the same
memory, recover and do it in the architectural order.
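
The predict, speculate, and recover loop described above can be sketched as a toy in C. Everything here is an assumption for illustration: the table size, the 2-bit counters, and all names are invented, and real memory-dependence predictors are considerably more elaborate. A load is allowed to bypass an older store when its counter is low; if the addresses turn out to match, the load counts as squashed and the counter is trained so that later instances wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy alias predictor: one saturating counter per load PC slot. */
enum { TABLE_SIZE = 64, THRESHOLD = 2, COUNTER_MAX = 3 };

typedef struct { unsigned char counter[TABLE_SIZE]; } AliasPredictor;

static size_t slot(unsigned long load_pc) { return load_pc % TABLE_SIZE; }

/* Predict whether the load at load_pc may bypass an older store. */
static bool predict_no_alias(const AliasPredictor *ap, unsigned long load_pc) {
    return ap->counter[slot(load_pc)] < THRESHOLD;
}

/* Train with the actual outcome once both addresses are known. */
static void train(AliasPredictor *ap, unsigned long load_pc, bool aliased) {
    unsigned char *c = &ap->counter[slot(load_pc)];
    if (aliased) { if (*c < COUNTER_MAX) (*c)++; }
    else         { if (*c > 0) (*c)--; }
}

/* Execute one load that follows an older store. Returns true if the
   load was speculated early and had to be squashed and replayed. */
static bool execute_load(AliasPredictor *ap, unsigned long load_pc,
                         unsigned long load_addr, unsigned long store_addr) {
    assert(ap != NULL);
    bool speculated = predict_no_alias(ap, load_pc);
    bool aliased = (load_addr == store_addr);
    train(ap, load_pc, aliased);
    return speculated && aliased;   /* misspeculation: redo in order */
}

/* Run the same load/store pair n times and count the squashes. */
static int count_squashes(AliasPredictor *ap, unsigned long load_pc,
                          unsigned long load_addr, unsigned long store_addr,
                          int n) {
    int squashes = 0;
    for (int k = 0; k < n; k++)
        squashes += execute_load(ap, load_pc, load_addr, store_addr);
    return squashes;
}
```

In this model a load that always aliases is squashed only until its counter reaches the threshold, and a load that never aliases is always speculated and never squashed. The same shape would apply to the contention predictor suggested for sequential consistency, with "another thread touched this line" taking the place of "the addresses matched".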

>--and certain compilers with
>certain flags will move the ST above the LD

That is irrelevant for the question of how an architecture should
perform. It's the other way round: programming languages pass on the
bad programming models that bad (i.e., weakly ordered) architectures
provide. What should they do? Provide sequential consistency by
inserting a memory barrier between any two memory accesses? On weakly
ordered hardware where memory barriers are extremely slow?

If, OTOH, architectures implemented efficient sequential consistency,
programming languages would provide sequential consistency. And those
that don't would be abandoned by everybody but Chris M. Thomasson.

>--certain HW memory queues
>will also allow ST to become visible before the LD.

Yes, we know it's possible to design bad hardware. The idea here is
to tell the hardware designers that weak ordering is not good enough.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Superior architecture styles?

<6e2b1bc342f0f94e3ff6dbef0daf0051@news.novabbs.com>

https://www.novabbs.com/devel/article-flat.php?id=35889&group=comp.arch#35889

Date: Thu, 21 Dec 2023 19:51:50 +0000
Subject: Re: Superior architecture styles?
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
From: mitchal...@aol.com (MitchAlsup)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$iPRnOQD/EqRmBgZd1JhSw.ph2.5Qy3yDLM.DKYnSy/liJ2kK6T5ny
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org> <ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com> <2023Dec21.191326@mips.complang.tuwien.ac.at>
Organization: novaBBS
Message-ID: <6e2b1bc342f0f94e3ff6dbef0daf0051@news.novabbs.com>

by: MitchAlsup - Thu, 21 Dec 2023 19:51 UTC

Anton Ertl wrote:

> mitchalsup@aol.com (MitchAlsup) writes:
>>As to costs::
>>
>> l = p[ i / 123456789 ];
>> r[7] = s;
>>
>>The LD from p is delayed 20+odd cycles by the IDIV. Under SC or TSO
>>the unaliased r[7] ST cannot become visible until the LD has become
>>visible; yet since they cannot alias, this thread is not sensitive
>>to the order in which they are performed

> What do you mean by "visible"?

It is the same language Lamport used in his seminal paper. It means
the point at which the address has been broadcast (on the bus) but neither
the read nor the write has been performed. In CPUs with caches, it means
the point at which the cache has been accessed but the hit has not yet been granted.

> You present a single-threaded
> program here, why talk about memory ordering at all?

Somebody asked for a case where SC reduced performance. TSO also has
reduced performance in this case, whereas Cache, causal, and weak
consistencies do not.

> And why would I want to perform the store early? The more typical
> case is that you have a store architecturally before a load, and you
> want to perform the load early.

Is not the general paradigm of OoO processors to do everything as early
as possible ?? And you want both LDs and STs to pass each other freely
AFTER you know the virtual address (after AGEN) when independent, but
remain in processor order when dependent.

Also note:: One can perform a younger LD early in the face of a
ST with unknown address, if the pipeline can reRun the LD again
if the older ST happens to conflict with the value LDed. When one
demands SC, you cannot do this unless you are willing to back all
the way up to the ST with the conflict.

> What current CPUs do is to ask a predictor whether two accesses alias,
> and if the predictor says that they don't, it speculates that they can
> move past each other. If the speculation turns out to be wrong, the
> speculative state is dropped, just as in case of a branch
> misprediction.

Yes, this is backing up to the ST; causal only has to reRun the LD,
which is a lot less expensive (in performance).

> And for multi-threaded accesses to memory, sequential consistency can
> be implemented in a similar way: Have a predictor that tells you what
> memory is contended, and for non-contended memory, speculatively do
> the fast thing, and if another thread turns out to access the same
> memory, recover and do it in the architectural order.

I agree when doing cross-threaded accesses SC is the best model for
the programmer (least complicated) which is why My 66000 reverts to
SC when doing ATOMIC stuff and reverts back to causal when this is
no longer necessary. In order to be in a position to do this, both
the LD and ST of the ATOMIC event have to be special in some way to
the pipeline. I mark them with a Lock-bit.

Mixing causal and SC at the boundaries of cross-threading accesses
is done without a predictor (although one could be used) by restricting
when cells in a memory dependence matrix (MDM) are allowed to relax (and allow
dependent memory references to become visible). A predictor makes recovery
of the MDM position significantly harder.

>>--and certain compilers with
>>certain flags will move the ST above the LD

> That is irrelevant for the question of how an architecture should
> perform.

But is entirely relevant to how a µArchitecture is allowed to perform.
When the µArchitecture is allowed to move LDs around STs when there
is no overlap in the containers accessed, performance is improved.
But when this is allowed, one needs to know if one is performing an
ATOMIC event (or not) so as to prevent that.

> It's the other way round: programming languages pass on the
> bad programming models that bad (i.e., weakly ordered) architectures
> provide. What should they do? Provide sequential consistency by
> inserting a memory barrier between any two memory accesses? On weakly
> ordered hardware where memory barriers are extremely slow?

Use processors that fix the problem for you with essentially no effort
on your part.

> If, OTOH, architectures implemented efficient sequential consistency,
> programming languages would provide sequential consistency. And those
> that don't would be abandoned by everybody but Chris M. Thomasson.

As illustrated above, SC-only harms performance, not by a lot, but by enough
to convert a winning µArchitecture into a Ho-Hum competitor.

>>--certain HW memory queues
>>will also allow ST to become visible before the LD.

> Yes, we know it's possible to design bad hardware. The idea here is
> to tell the hardware designers that weak ordering is not good enough.

I am not now advocating, nor have I for the past 10 years advocated, weak
memory models when cross-thread accesses are involved. Here, I am advocating
for weakened models that happen to allow cross-thread accesses to
stabilize to the position where all interested parties agree on what
happened and in what order. This is causal consistency. {{I do not
believe that memory models weaker than causal bring enough performance
to the table to justify the added SW complexity.}}

I am also not advocating, and have not advocated in the past 10 years, for
compilers to make assumptions about memory aliasing allowing them to move
LDs and STs across each other. Here, I am advocating for HW to allow LD<->ST
reordering when accesses can be proven never to conflict outside of cross-thread
memory accesses.

And B: HW designers do not understand memory ordering requirements
sufficiently that they CAN design HW to the specifications you request.
It is not in their DNA....They work with languages that provide the
illusion of everything that can be in parallel already is in parallel
and they did not have to do anything to cause that to be.

> - anton

Re: Concertina II Progress

<um2aj6$17ium$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35891&group=comp.arch#35891

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Concertina II Progress
Date: Thu, 21 Dec 2023 13:27:01 -0800
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <um2aj6$17ium$2@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<2023Dec3.153637@mips.complang.tuwien.ac.at> <ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiqdb$2dg6p$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me>
<92c62fa38dc8740877d08dcb26704212@news.novabbs.com>
<ul44h3$khbu$1@newsreader4.netcologne.de> <ulcft4$t0bq$1@dont-email.me>
<0177880ee05c2861f55b0609989a6af6@news.novabbs.com>
<wuEeN.26394$PuZ9.5742@fx11.iad> <ulgfe0$1oc9k$4@dont-email.me>
<ulkfm8$2f2qa$1@dont-email.me> <um09oo$tu5q$2@dont-email.me>
<um0r6i$10a4s$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Dec 2023 21:27:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d5d40264675499f3691087d5c7ec9ace";
logging-data="1297366"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/xznPBP1C5knkz97zjYG3egLBCWQo8fPI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:B3+B3jySIFdnmR2hp9Ssnrw73XU=
In-Reply-To: <um0r6i$10a4s$1@dont-email.me>
Content-Language: en-US

by: Chris M. Thomasson - Thu, 21 Dec 2023 21:27 UTC

On 12/20/2023 11:58 PM, David Brown wrote:
> On 21/12/2023 04:00, Chris M. Thomasson wrote:
>> On 12/16/2023 7:28 AM, David Brown wrote:
>>> On 15/12/2023 03:59, Chris M. Thomasson wrote:
>>>> On 12/14/2023 6:41 AM, Scott Lurndal wrote:
>>>>> mitchalsup@aol.com (MitchAlsup) writes:
>
>>>>> I ask for PDF's. I have no ability to read windows office formats
>>>>> of any type without using star/open/libre office, and I detest WYSIWYG
>>>>> word processors of all stripes.
>>>>
>>>> Try to stay far away from windows office docs, they can be filled
>>>> with interesting macros, well back in the day! I do remember a lot
>>>> of print to PDF programs. Mock up a printer device, print, produce a
>>>> file.
>>>
>>> They are only a problem if you use MS Office. LibreOffice, and its
>>> predecessors, disable the macros by default.
>>>
>>> PDF also supports dangerous links and Javascript.
>>
>> Indeed!
>>
>>
>>> It's not a problem if you use a decent pdf viewer, but if you use
>>> Adobe Acrobat on Windows, you can definitely be at risk.
>>>
>>
>> Well, just make sure the PDF reader has javascript turned off all
>> around. Trust in it.
>
> "Trust in it" ?

I meant that basically as a sarcastic response. Well, shit can happen.
The JavaScript installed a virus anyway. A person says: But, we had
JavaScript turned off with a little checkbox in the GUI... ;^o

Face palm! ;^o

>
> Some readers /are/ trustworthy. Adobe's are not - Acrobat reader has
> endless lists of security holes. I haven't had it installed on a PC for
> many years, so things may have changed, but in comparison to any other
> reader it was huge, slow, and required continuous upgrading to deal with
> vulnerabilities, requiring a reboot of Windows each time. Horrible
> software.
>
> On Linux, common readers like evince don't support javascript - you can
> trust them!
>

Re: Superior architecture styles?

<um2b9s$17p02$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35892&group=comp.arch#35892

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Thu, 21 Dec 2023 13:39:05 -0800
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <um2b9s$17p02$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me>
<jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me>
<YUGbN.204677$Ee89.140988@fx17.iad>
<2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me>
<2023Dec19.094918@mips.complang.tuwien.ac.at>
<jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org>
<ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com>
<2023Dec21.191326@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 21 Dec 2023 21:39:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d5d40264675499f3691087d5c7ec9ace";
logging-data="1303554"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+At7gJTwtLeeS1ZD/v201sthV6MalrjB4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ELnlcLaH6Uf0soOATPo4cgG8Wmc=
Content-Language: en-US
In-Reply-To: <2023Dec21.191326@mips.complang.tuwien.ac.at>

by: Chris M. Thomasson - Thu, 21 Dec 2023 21:39 UTC

On 12/21/2023 10:13 AM, Anton Ertl wrote:
> mitchalsup@aol.com (MitchAlsup) writes:
>> As to costs::
>>
>> l = p[ i / 123456789 ];
>> r[7] = s;
>>
>> The LD from p is delayed 20+odd cycles by the IDIV. Under SC or TSO
>> the unaliased r[7] ST cannot become visible until the LD has become
>> visible; yet since they cannot alias, this thread is not sensitive
>> to the order in which they are performed
>
> What do you mean with "visible"? You present a single-threaded
> program here, why talk about memory ordering at all?
>
> And why would I want to perform the store early? The more typical
> case is that you have a store architecturally before a load, and you
> want to perform the load early.
>
> What current CPUs do is to ask a predictor whether two accesses alias,
> and if the predictor says that they don't, it speculates that they can
> move past each other. If the speculation turns out to be wrong, the
> speculative state is dropped, just as in case of a branch
> misprediction.
>
> And for multi-threaded accesses to memory, sequential consistency can
> be implemented in a similar way: Have a predictor that tells you what
> memory is contended, and for non-contended memory, speculatively do
> the fast thing, and if another thread turns out to access the same
> memory, recover and do it in the architectural order.
>
>> --and certain compilers with
>> certain flags will move the ST above the LD
>
> That is irrelevant for the question of how an architecture should
> perform. It's the other way round: programming languages pass on the
> bad programming models that bad (i.e., weakly ordered) architectures
> provide. What should they do? Provide sequential consistency by
> inserting a memory barrier between any two memory accesses? On weakly
> ordered hardware where memory barriers are extremely slow?
>
> If, OTOH, architectures implemented efficient sequential consistency,
> programming languages would provide sequential consistency. And those
> that don't would be abandoned by everybody but Chris M. Thomasson.

Nah. I would want to see a new arch that is 100% seq_cst beat the shit
out of my existing finely tuned algorithms that use highly relaxed
memory barriers that work on existing archs. To borrow a quote from one
of my friends, Joe Seigh: I can't remember it verbatim right now, but it
went something like this:
______________
When you get lemons, you make lemonade.
When you get hardware, you make software.
______________

iirc, it was his sig for Usenet messages.

>
>> --certain HW memory queues
>> will also allow ST to become visible before the LD.
>
> Yes, we know it's possible to design bad hardware. The idea here is
> to tell the hardware designers that weak ordering is not good enough.
>
> - anton

Re: Superior architecture styles?

<um2bsh$17p02$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35893&group=comp.arch#35893

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Thu, 21 Dec 2023 13:49:03 -0800
Organization: A noiseless patient Spider
Lines: 30
Message-ID: <um2bsh$17p02$2@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<2023Dec3.153637@mips.complang.tuwien.ac.at> <ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiqdb$2dg6p$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me>
<jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me>
<YUGbN.204677$Ee89.140988@fx17.iad>
<2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me>
<2023Dec19.094918@mips.complang.tuwien.ac.at>
<jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org>
<ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 21 Dec 2023 21:49:05 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d5d40264675499f3691087d5c7ec9ace";
logging-data="1303554"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19vTKJIu+MfITdPelU3J23eyOYK8xBi6Q4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Q1Tfkoi5zUP+jQRVd0puU3eHNwQ=
In-Reply-To: <ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com>
Content-Language: en-US

by: Chris M. Thomasson - Thu, 21 Dec 2023 21:49 UTC

On 12/20/2023 8:29 AM, MitchAlsup wrote:
> Stefan Monnier wrote:
>
>>> Isn't that as it should be? Hardware provides some simple, possibly
>>> inconvenient interface, and software abstracts the inconvenience
>>> away; e.g., RISCs provide only a load/store interface, and compilers
>>> translate the stuff programmers write into this interface. The
>>> difference is that with RISCs the result is at least as fast as
>>> architectures that tried to cater more to the programming language,
>>> while I am convinced that hardware where the designers design for
>>> efficient sequential consistency will let programmers write software
>>> that handily outperforms using libraries or system calls on hardware
>>> designed for weaker memory models.
>
>> While I agree that CPUs should provide sequential consistency

It's nice that they already do. The fun part is creating algorithms that
try to avoid it. Not only avoid seq_cst, but avoid acquire and release
membars, and the fun acq_rel, acquire and release all in one... ;^) Now,
there are certain algorithms that, as-is, cannot avoid it (seq_cst).
Original SMR and even Dekker's algorithm require seq_cst...
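
The core of why Dekker-style mutual exclusion needs seq_cst can be
sketched as a store-buffering litmus test in C++. This is only a
sketch, not Dekker's full algorithm, and the function names are made up
for illustration. Each thread stores to its own flag and then loads the
other's. With seq_cst on all four operations, the outcome where both
loads return 0 is forbidden; with release stores and acquire loads it is
allowed, which would let both threads into the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the litmus test once. Returns true if both threads read 0,
// which the C++ memory model forbids when every access is seq_cst.
inline bool bothSawZero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);   // "I want to enter"
        r1 = y.load(std::memory_order_seq_cst);  // "does the other?"
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Usage: repeat the test; the forbidden outcome must never show up.
inline void demoStoreBuffering() {
    for (int i = 0; i < 100; ++i) {
        assert(!bothSawZero());
    }
}
```

On x86 the seq_cst store is where the cost shows up (a locked
instruction or a trailing MFENCE), which is exactly the StoreLoad
ordering these algorithms cannot do without.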

>> (it would
>> make life easier for debugging, formalization, and reasoning, and
>> I can't think of any reason why it should have a significant cost), I'd
>> be surprised if it "handily outperforms" the status quo: concurrent
>> programming is hard even with sequential consistency, so most normal
>> code would still continue using the exact same libraries.
[...]

Sequential Consistency (wa: Superior architecture ...)

<2023Dec22.182153@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=35904&group=comp.arch#35904

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Sequential Consistency (wa: Superior architecture ...)
Date: Fri, 22 Dec 2023 17:21:53 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 98
Message-ID: <2023Dec22.182153@mips.complang.tuwien.ac.at>
References: <uigus7$1pteb$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org> <ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com> <2023Dec21.191326@mips.complang.tuwien.ac.at> <6e2b1bc342f0f94e3ff6dbef0daf0051@news.novabbs.com>
Injection-Info: dont-email.me; posting-host="3ece815b28275e3a07ba112b681d5e5c";
logging-data="1763831"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18RWhPfAQGjNAEnfyQ4huRt"
Cancel-Lock: sha1:sELWJSd09YFt8344964cFKaTFEE=
X-newsreader: xrn 10.11

by: Anton Ertl - Fri, 22 Dec 2023 17:21 UTC

mitchalsup@aol.com (MitchAlsup) writes:
>Anton Ertl wrote:
>> You present a single-threaded
>> program here, why talk about memory ordering at all?
>
>Somebody asked for a case where SC reduced performance. TSO also has
>reduced performance in this case, whereas Cache, causal, and weak
>consistencies do not.

In cases where there is no behavioural difference between sequential
consistency and weaker memory orderings, there is no fundamental
reason why there should be a performance difference.

You are apparently thinking about a particular implementation (my
guess is: MY66000). What may hold for this implementation does not
necessarily hold in general.

>Also note:: One can perform a younger LD early in the face of a
>ST with unknown address, if the pipeline can reRun the LD again
>if the older ST happens to conflict with the value LDed. When one
>demands SC, you cannot do this unless you are willing to back all
>the way up to the ST with the conflict.

I would just check whether the result of the load has changed from the
speculatively executed version, and if so, restart from the load. Why
would you restart from the earlier store?
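
A toy model of that check, in C++. The names and the use of a flat
vector as "memory" are assumptions for illustration, and a real pipeline
would do this in the load queue rather than in software:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What the pipeline remembers about a load it executed early.
struct SpeculativeLoad {
    std::size_t address;
    int value;  // value observed at speculative execution time
};

// Execute the load speculatively and record what it saw.
inline SpeculativeLoad issueEarly(const std::vector<int>& memory,
                                  std::size_t address) {
    return SpeculativeLoad{address, memory[address]};
}

// At commit time, re-read the location. Only a changed value forces
// a restart, and the restart point is the load itself.
inline bool mustReplay(const std::vector<int>& memory,
                       const SpeculativeLoad& load) {
    return memory[load.address] != load.value;
}

// Usage: a write elsewhere is harmless, a write to the loaded
// location that changes the value triggers a replay.
inline void demoValueCheck() {
    std::vector<int> memory{10, 20};
    SpeculativeLoad load = issueEarly(memory, 1);
    memory[0] = 5;
    assert(!mustReplay(memory, load));
    memory[1] = 99;
    assert(mustReplay(memory, load));
}
```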

>>>--and certain compilers with
>>>certain flags will move the ST above the LD
>
>> That is irrelevant for the question of how an architecture should
>> perform.
>
>But is entirely relevant to how a µArchitecture is allowed to perform.

What a microarchitecture is allowed to do is entirely limited by the
architecture. If the architecture specifies sequential consistency,
the microarchitecture has to implement sequential consistency. If the
architecture specifies TSO, the microarchitecture has to implement TSO
or better. What some compiler does is irrelevant.

>When the µArchitecture is allowed to move LDs around STs when there
>is no overlap in the containers accessed, performance is improved.

containers?

>> If, OTOH, architectures implemented efficient sequential consistency,
>> programming languages would provide sequential consistency. And those
>> that don't would be abandoned by everybody but Chris M. Thomasson.
>
>As illustrated above, SC-only harms performance, not by a lot, but by enough
>to convert a winning µArchitecture into a Ho-Hum competitor.

I have not found any such illustration in what you wrote, only claims
(probably based on a MY66000 microarchitecture).

In particular, given that for single-threaded programs the behaviour
is the same between SC and weak ordering, it is possible to implement
a microarchitecture that provides SC and has the same performance for
single-threaded programs as a weakly-ordered competitor.

Now, consider access to shared memory in multi-threaded programs.
With a microarchitecture that is optimized for weak ordering, every
access to shared memory will be slow (by having to use slow atomic
operations, memory barriers and suchlike). By contrast, with a
microarchitecture that is optimized for sequential consistency,
accesses to shared memory will be fast in the uncontended case and
slow only in the contended case (then the reruns are needed). Bottom
line: If the programmer made contention rare, the microarchitecture
designed for sequential consistency wins.

Compare with precise exceptions: The Alpha guys told us that precise
exceptions are too slow, and gave us imprecise exceptions with a slow
trap barrier instruction (TRAPB) for limiting the imprecision; and
programmers should avoid TRAPB at all costs. Then they implemented
the 21264, and lo and behold, TRAPB was as cheap as a NOP; it probably
was a NOP, and the 21264 implemented precise exceptions, because it
comes for free when you do an OoO microarchitecture with speculation.
And the 21264 was faster than the 21164a with its imprecise
exceptions.

>And B: HW designers do not understand memory ordering requirements
>sufficiently that they CAN design HW to the specifications you request.
>It is not in their DNA....They work with languages that provide the
>illusion of everything that can be in parallel already is in parallel
>and they did not have to do anything to cause that to be.

Whatever DNA they may have, they have managed to implement the
sequential semantics (including precise exceptions) of instruction
sets, and I am sure that they will manage to implement sequential
consistency efficiently if they are tasked to do that. But we have to
tell them that weak ordering (which is easier to implement) is not
good enough.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Sequential Consistency (wa: Superior architecture ...)

<um4ogv$1md2p$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35906&group=comp.arch#35906

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Sequential Consistency (wa: Superior architecture ...)
Date: Fri, 22 Dec 2023 11:37:03 -0800
Organization: A noiseless patient Spider
Lines: 111
Message-ID: <um4ogv$1md2p$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me>
<jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me>
<YUGbN.204677$Ee89.140988@fx17.iad>
<2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me>
<2023Dec19.094918@mips.complang.tuwien.ac.at>
<jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org>
<ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com>
<2023Dec21.191326@mips.complang.tuwien.ac.at>
<6e2b1bc342f0f94e3ff6dbef0daf0051@news.novabbs.com>
<2023Dec22.182153@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 22 Dec 2023 19:37:03 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="976eed046f8a4bb8ca380aeef2438635";
logging-data="1782873"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/expHUvm7p4gBBixPZ7m7EpBLK8nN3RX4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4GwCiDrXNZmFgENbq+obcJrUJTY=
In-Reply-To: <2023Dec22.182153@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: Chris M. Thomasson - Fri, 22 Dec 2023 19:37 UTC

On 12/22/2023 9:21 AM, Anton Ertl wrote:
> mitchalsup@aol.com (MitchAlsup) writes:
>> Anton Ertl wrote:
>>> You present a single-threaded
>>> program here, why talk about memory ordering at all?
>>
>> Somebody asked for a case where SC reduced performance. TSO also has
>> reduced performance in this case, whereas Cache, causal, and weak
>> consistencies do not.

SC radically reduces performance on existing archs for sure. Think of
iterating a linked list. Wrt RCU, the readers can iterate the list
without using any memory barriers (DEC Alpha aside for a moment). This
is FAR superior when compared to having to use SC-style membars for
every iteration! Aka, MFENCE, LOCK RMW on Intel, #StoreLoad on SPARC,
etc... Keep in mind that this is for existing archs! Not a new one that
might be created in the future.
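
A rough C++ sketch of what the reader side looks like. This is not real
RCU: there is no grace period or reclamation, a single writer is
assumed, and the names are made up for illustration. The point is that
the traversal uses no seq_cst operation and no read-modify-write. The
loads are written as acquire here; the closer match to rcu_dereference
would be memory_order_consume, which current mainstream compilers treat
as acquire anyway.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

// Writer side (single writer assumed): link the node first, then
// publish it with a release store so readers see it fully built.
inline void pushFront(std::atomic<Node*>& head, Node* node) {
    node->next.store(head.load(std::memory_order_relaxed),
                     std::memory_order_relaxed);
    head.store(node, std::memory_order_release);
}

// Reader side: plain pointer chasing, no StoreLoad barrier per node.
inline int sumList(const std::atomic<Node*>& head) {
    int total = 0;
    for (Node* p = head.load(std::memory_order_acquire); p != nullptr;
         p = p->next.load(std::memory_order_acquire)) {
        total += p->value;
    }
    return total;
}

// Usage: publish three nodes, then read them back.
inline void demoReaders() {
    std::atomic<Node*> head{nullptr};
    Node a(1), b(2), c(3);
    pushFront(head, &a);
    pushFront(head, &b);
    pushFront(head, &c);
    assert(sumList(head) == 6);
}
```

On x86 an acquire load compiles to an ordinary MOV, so the reader loop
carries no fence instructions at all.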

:^D

>
> In cases where there is no behavioural difference between sequential
> consistency and weaker memory orderings, there is no fundamental
> reason why there should be a performance difference.
>
> You are apparently thinking about a particular implementation (my
> guess is: MY66000). What may hold for this implementation does not
> necessarily hold in general.
>
>> Also note:: One can perform a younger LD early in the face of a
>> ST with unknown address, if the pipeline can reRun the LD again
>> if the older ST happens to conflict with the value LDed. When one
>> demands SC, you cannot do this unless you are willing to back all
>> the way up to the ST with the conflict.
>
> I would just check whether the result of the load has changed from the
> speculatively executed version, and if so, restart from load. Why
> would you restart from the earlier store?
>
>>>> --and certain compilers with
>>>> certain flags will move the ST above the LD
>>
>>> That is irrelevant for the question of how an architecture should
>>> perform.
>>
>> But is entirely relevant to how a µArchitecture is allowed to perform.
>
> What a microarchitecture is allowed to do is entirely limited by the
> architecture. If the architecture specifies sequential consistency,
> the microarchitecture has to implement sequential consistency. If the
> architecture specifies TSO, the microarchitecture has to implement TSO
> or better. What some compiler does is irrelevant.
>
>> When the µArchitecture is allowed to move LDs around STs when there
>> is no overlap in the containers accessed, performance is improved.
>
> containers?
>
>>> If, OTOH, architectures implemented efficient sequential consistency,
>>> programming languages would provide sequential consistency. And those
>>> that don't would be abandoned by everybody but Chris M. Thomasson.
>>
>> As illustrated above, SC-only harms performance, not by a lot, but by enough
>> to convert a winning µArchitecture into a Ho-Hum competitor.
>
> I have not found any such illustration in what you wrote, only claims
> (probably based on a MY66000 microarchitecture).
>
> In particular, given that for single-threaded programs the behaviour
> is the same between SC and weak ordering, it is possible to implement
> a microarchitecture that provides SC and has the same performance for
> single-threaded programs as a weakly-ordered competitor.
>
> Now, consider access to shared memory in multi-threaded programs.
> With a microarchitecture that is optimized for weak ordering, every
> access to shared memory will be slow (by having to use slow atomic
> operations, memory barriers and suchlike). By contrast, with a
> microarchitecture that is optimized for sequential consistency,
> accesses to shared memory will be fast in the uncontended case and
> slow only in the contended case (then the reruns are needed). Bottom
> line: If the programmer made contention rare, the microarchitecture
> designed for sequential consistency wins.
>
> Compare with precise exceptions: The Alpha guys told us that precise
> exceptions are too slow, and gave us imprecise exceptions with a slow
> trap barrier instruction (TRAPB) for limiting the imprecision; and
> programmers should avoid TRAPB at all costs. Then they implemented
> the 21264, and lo and behold, TRAPB was as cheap as a NOP; it probably
> was a NOP, and the 21264 implemented precise exceptions, because it
> comes for free when you do an OoO microarchitecture with speculation.
> And the 21264 was faster than the 21164a with its imprecise
> exceptions.
>
>> And B: HW designers do not understand memory ordering requirements
>> sufficiently that they CAN design HW to the specifications you request.
>> It is not in their DNA....They work with languages that provide the
>> illusion of everything that can be in parallel already is in parallel
>> and they did not have to do anything to cause that to be.
>
> Whatever DNA they may have, they have managed to implement the
> sequential semantics (including precise exceptions) of instruction
> sets, and I am sure that they will manage to implement sequential
> consistency efficiently if they are tasked to do that. But we have to
> tell them that weak ordering (which is easier to implement) is not
> good enough.
>
> - anton

Re: Sequential Consistency (wa: Superior architecture ...)

<099e012f0d65c52c1fa92914eb3ca618@news.novabbs.com>

https://www.novabbs.com/devel/article-flat.php?id=35907&group=comp.arch#35907

Date: Fri, 22 Dec 2023 19:46:51 +0000
Subject: Re: Sequential Consistency (wa: Superior architecture ...)
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
From: mitchal...@aol.com (MitchAlsup)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$8ByVXPnOaESC1VpT3ZUVyOBN3Gdr8O4jnWfrBKlGUbT0h/6xvpP9G
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <uigus7$1pteb$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <jwvfrzxdkfa.fsf-monnier+comp.arch@gnu.org> <ebc9593b790afee0d722c6828f7db5ec@news.novabbs.com> <2023Dec21.191326@mips.complang.tuwien.ac.at> <6e2b1bc342f0f94e3ff6dbef0daf0051@news.novabbs.com> <2023Dec22.182153@mips.complang.tuwien.ac.at>
Organization: novaBBS
Message-ID: <099e012f0d65c52c1fa92914eb3ca618@news.novabbs.com>

by: MitchAlsup - Fri, 22 Dec 2023 19:46 UTC

Anton Ertl wrote:

> mitchalsup@aol.com (MitchAlsup) writes:
>>Anton Ertl wrote:
>>> You present a single-threaded
>>> program here, why talk about memory ordering at all?
>>
>>Somebody asked for a case where SC reduced performance. TSO also has
>>reduced performance in this case, whereas Cache, causal, and weak
>>consistencies do not.

> In cases where there is no behavioural difference between sequential
> consistency and weaker memory orderings, there is no fundamental
> reason why there should be a performance difference.

> You are apparently thinking about a particular implementation (my
> guess is: MY66000). What may hold for this implementation does not
> necessarily hold in general.

Granted, I convert everything into the architecture I know best and
in most detail. Wouldn't you ??

>>Also note:: One can perform a younger LD early in the face of a
>>ST with unknown address, if the pipeline can reRun the LD again
>>if the older ST happens to conflict with the value LDed. When one
>>demands SC, you cannot do this unless you are willing to back all
>>the way up to the ST with the conflict.

> I would just check whether the result of the load has changed from the
> speculatively executed version, and if so, restart from load. Why
> would you restart from the earlier store?

To make everyone agree on the ordering of memory events.
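
The replay idea discussed above can be sketched as a toy model. This is only an illustration under stated assumptions: the function name `run`, the dictionary used as memory, and the value-based conflict check are all hypothetical, and the sketch models a single thread only, so it says nothing about how other processors observe the ordering (which is the concern raised in the reply).

```python
# Toy model: a load issued before an older store whose address is still unknown.
# All names here are illustrative, not taken from any real pipeline.

def run(memory, store_addr, store_val, load_addr):
    """Return (loaded_value, replayed) for: ST [store_addr]=store_val ; LD [load_addr]."""
    # 1. The load executes early, before the older store's address resolves.
    speculative = memory.get(load_addr, 0)
    # 2. The store's address resolves and the store is performed in program order.
    memory[store_addr] = store_val
    # 3. Check: did the older store change the value the load already read?
    if store_addr == load_addr and memory[load_addr] != speculative:
        # Conflict: rerun only the load (in hardware, also its dependents).
        return memory[load_addr], True
    return speculative, False

no_conflict = run({0x10: 7}, store_addr=0x20, store_val=9, load_addr=0x10)
conflict = run({0x10: 7}, store_addr=0x10, store_val=9, load_addr=0x10)
silent = run({0x10: 7}, store_addr=0x10, store_val=7, load_addr=0x10)
```

In this model the load is replayed only in the `conflict` case; a store to the same address that writes the same value (`silent`) needs no replay under a value-based check.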

>>>>--and certain compilers with
>>>>certain flags will move the ST above the LD
>>
>>> That is irrelevant for the question of how an architecture should
>>> perform.
>>
>>But is entirely relevant to how a µArchitecture is allowed to perform.

> What a microarchitecture is allowed to do is entirely limited by the
> architecture. If the architecture specifies sequential consistency,
> the microarchitecture has to implement sequential consistency. If the
> architecture specifies TSO, the microarchitecture has to implement TSO
> or better. What some compiler does is irrelevant.

Most µArchitectures operate under the "as if" rule, and if SC is the
Architectural rule, µArchitecture can do things in SC order or can do
things in WO and, if an SC violation is detected, back up and make it
look like SC was in play.

>>When the µArchitecture is allowed to move LDs around STs when there
>>is no overlap in the containers accessed, performance is improved.

> containers?

Unit of memory access {byte, half (except on x86) word, double, quad
(except on everyone other than x86), bigger than 64-bit things};
aligned or not, line crossing or not, page crossing or not, MTRR
crossing or not.

>>> If, OTOH, architectures implemented efficient sequential consistency,
>>> programming languages would provide sequential consistency. And those
>>> that don't would be abandoned by everybody but Chris M. Thomasson.
>>
>>As illustrated above SC-only harms performance, not by a lot, but by enough
>>to convert a winning µArchitecture into a Ho-Hum competitor.

> I have not found any such illustration in what you wrote, only claims
> (probably based on a MY66000 microarchitecture).

> In particular, given that for single-threaded programs the behaviour
> is the same between SC and weak ordering, it is possible to implement
> a microarchitecture that provides SC and has the same performance for
> single-threaded programs as a weakly-ordered competitor.

I assert without proof:: no.
I assert that the loss may not be significant in many µArchitectures.
I assert that the loss is significant on a few of the biggest baddest
µArchitectures.
Where significance is >5% and <10%.

> Now, consider access to shared memory in multi-threaded programs.
> With a microarchitecture that is optimized for weak ordering, every
> access to shared memory will be slow (by having to use slow atomic
> operations, memory barriers and somesuch). By contrast, with a
> microarchitecture that is optimized for sequential consistency,
> accesses to shared memory will be fast in the uncontended case and
> slow only in the contended case (then the reruns are needed). Bottom
> line: If the programmer made contention rare, the microarchitecture
> designed for sequential consistency wins.

No, but it might break even.
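
For reference, the behavioural difference between the two models can be shown with the classic store-buffering litmus test. The sketch below is a minimal enumeration, not a model of any real machine: the thread programs `T0` and `T1`, the helper names, and the choice to model weak ordering as "a thread's store may be performed after its own later load" are all assumptions made for illustration. It says nothing about performance, only about which outcomes each model allows.

```python
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
T0 = [("st", "x", 1), ("ld", "y", "r0")]
T1 = [("st", "y", 1), ("ld", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def outcomes(t0_orders, t1_orders):
    """Collect every (r0, r1) result over all allowed per-thread orders."""
    results = set()
    for a in t0_orders:
        for b in t1_orders:
            for schedule in interleavings(a, b):
                mem = {"x": 0, "y": 0}
                regs = {}
                for op, loc, arg in schedule:
                    if op == "st":
                        mem[loc] = arg
                    else:
                        regs[arg] = mem[loc]
                results.add((regs["r0"], regs["r1"]))
    return results

# Sequential consistency: program order is the only allowed per-thread order.
sc = outcomes([T0], [T1])
# Relaxed: the load may also be performed before the thread's own earlier store.
relaxed = outcomes([T0, T0[::-1]], [T1, T1[::-1]])
```

Under these assumptions `sc` never contains `(0, 0)`, while `relaxed` does; that outcome is the one software has to exclude with barriers or atomics on a weakly ordered machine.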

> Compare with precise exceptions: The Alpha guys told us that precise
> exceptions are too slow, and gave us imprecise exceptions with a slow
> trap barrier instruction (TRAPB) for limiting the imprecision; and
> programmers should avoid TRAPB at all costs. Then they implemented
> the 21264, and lo and behold, TRAPB was as cheap as a NOP; it probably
> was a NOP, and the 21264 implemented precise exceptions, because it
> comes for free when you do an OoO microarchitecture with speculation.
> And the 21264 was faster than the 21164a with its imprecise
> exceptions.

264 was wider issue and OoO, too.

>>And B: HW designers do not understand memory ordering requirements
>>sufficiently that they CAN design HW to the specifications you request.
>>It is not in their DNA....They work with languages that provide the
>>illusion of everything that can be in parallel already is in parallel
>>and they did not have to do anything to cause that to be.

> Whatever DNA they may have, they have managed to implement the
> sequential semantics (including precise exceptions) of instruction
> sets, and I am sure that they will manage to implement sequential
> consistency efficiently if they are tasked to do that. But we have to
> tell them that weak ordering (which is easier to implement) is not
> good enough.

> - anton

Re: Superior architecture styles? (was: Concertina II Progress)

<2023Dec23.114911@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=35914&group=comp.arch#35914

Newsgroups: comp.arch

by: Anton Ertl - Sat, 23 Dec 2023 10:49 UTC

jgd@cix.co.uk (John Dallman) writes:
>In article <2023Dec19.180834@mips.complang.tuwien.ac.at>,
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> Do you mean the speculative loads that would return NaTs if the
>> memory access is invalid? If so, why does that not happen when
>> running under a debugger?
>
>No. What I found was much, much worse. IA-64 had load-advance and
>load-check instructions. You were supposed to issue the load-advance as
>soon as you knew the address you were going to need and it happened
>asynchronously. The load-check was quick if the value had arrived, but
>otherwise stalled the pipeline until it did arrive. If the hardware had
>discarded an advance load, as it was permitted to do, this meant a full
>wait for memory/cache.

Ah, yes, this one. In particular, the ld.c would repeat the load if a
store in between stored to that address.

>The problem came when you were fetching into floating-point registers,
>which were not windowed, although the way they could move around for
>modulo-scheduled loops confused people on that front.
>
>The sequence of events that caused problems was:
>
>1. Advance load into floating-point register fx.
>2. Call subroutine, which wants to use fx.
>3. Subroutine pushes fx to stack, and starts using it.
>4. Advance load arrives, overwriting fx.
>5. Subroutine may well go wrong, given its register has been corrupted.
>6. If subroutine doesn't notice, it finishes its work.
>7. Subroutine pops fx from the stack.
>8. Subroutine returns.
>9. Check load is executed. The ALAT says the load has happened.
>
>The value in fx, nonetheless is what was there /before/ you issued the
>advance load.
>
>If you have breakpoints anywhere along this path, they cause pipeline
>flushes and the problem does not (usually) reproduce.

Ouch. This clearly is a compiler bug. It either should abstain from
moving a load across the call, or perform a ld.c before the call.
Which compiler was this?
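
The nine-step sequence quoted above, and the second of the two suggested fixes, can be traced with a toy model. This is a sketch of the scenario as described in the post, not an IA-64 simulator: the function `scenario`, its flag `check_before_call`, and the concrete register values are hypothetical, and the ALAT is reduced to a single boolean.

```python
def scenario(check_before_call):
    """Trace register fx through the call; return its value after the check load."""
    memory_value = 42.0      # what the advance load is fetching
    fx = 1.5                 # old contents of the floating-point register
    alat_valid = True        # step 1: advance load issued, ALAT entry allocated
    load_in_flight = True

    if check_before_call:
        # Suggested fix: complete the load (ld.c) before making the call.
        fx, load_in_flight = memory_value, False

    saved = fx               # step 3: callee spills fx and starts using it
    fx = -1.0                # callee's own scratch value
    if load_in_flight:
        fx = memory_value    # step 4: late advance load clobbers the callee's fx
        load_in_flight = False
    fx = saved               # step 7: callee restores fx from the stack
    if not alat_valid:
        fx = memory_value    # step 9: the check load reloads only on an ALAT miss
    return fx

stale = scenario(check_before_call=False)
fixed = scenario(check_before_call=True)
```

Without the check before the call, the caller ends up with the value that was in `fx` before the advance load was issued, because the ALAT entry still looks valid at step 9.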

>IA-64 did not prove that all its ideas were worthless, only that the way
>they'd been put together was counterproductive. Nonetheless, anyone else
>who tries to launch an EPIC processor is going to face some very hostile
>questioning, and it's perfectly understandable that nobody wants to try.

I have not had any such problems when I compiled my code on IA-64
(with gcc), the result was just slow:

Gforth 0.7 results (lower is better):
sieve  bubble  matrix    fib
1.000   1.120   0.710  1.680  0.7.0; Itanium II 900MHz (HP rx2600); gcc-4.1.1
0.245   0.287   0.156  0.376  0.7.0; Pentium 4 Northwood 2.26GHz; gcc-2.95.4 20011002 (Debian prerelease)
0.670   1.070   0.932  0.968  0.7.0; Alpha 21264B 800MHz; gcc-4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

Both Intel CPUs are from 2002, the 21264B from 2001 (based on the
21264 first released in 1998); and this is the Itanium running IA-64
code, not IA-32 code.

I have discussed earlier (<2022Feb22.112450@mips.complang.tuwien.ac.at>
and others) why EPIC lost against OoO, and it's not just bad execution
by Intel and HP.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Superior architecture styles? (was: Concertina II Progress)

<um7gmd$274oh$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35917&group=comp.arch#35917

Newsgroups: comp.arch

by: Chris M. Thomasson - Sat, 23 Dec 2023 20:41 UTC

On 12/23/2023 6:37 AM, John Dallman wrote:
> In article <2023Dec23.114911@mips.complang.tuwien.ac.at>,
> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>> Which compiler was this?
>
> Microsoft. Our primary demand for Itanium, when things were starting up
> for it, was Windows, because there was no 64-bit Windows at the time and
> some large customers really wanted that. That demand evaporated like dry
> ice in Death Valley when Intel announced their x86-64.

Do you remember the rather odd instruction for the Itanium called
cmp8xchg16?

>
> I started work on Windows using Intel's and Microsoft's compilers in
> parallel, expecting to discard one when it became clear which was better.
> After a while, we were at a point when we could get through the most
> basic level of testing in an optimised build with Microsoft, and couldn't
> in a deoptimised build with Intel.
>
>> Ouch. This clearly is a compiler bug. It either should abstain from
>> moving a load across the call, or perform a ld.c before the call.
>
> The problem comes when loads and calls are mixed together very closely,
> such that there's obviously no time for a load to complete between the
> address being known and the next necessary call. Microsoft's tactic of
> re-issuing loads got rid of most of the problems, and we'd become far
> more interested in AMD64.
>
>> I have discussed earlier
>> (<2022Feb22.112450@mips.complang.tuwien.ac.at>
>> and others) why EPIC lost against OoO, and it's not just bad
>> execution by Intel and HP.
>
> Indeed.
>
> John

Re: Superior architecture styles?

<63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>

https://www.novabbs.com/devel/article-flat.php?id=35919&group=comp.arch#35919

Newsgroups: comp.arch

Date: Sat, 23 Dec 2023 21:44:22 +0000
Subject: Re: Superior architecture styles?
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
From: mitchal...@aol.com (MitchAlsup)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$ciC/n8vnNihmPCXSnH29iuCbQF.JHClYt6U0ArTFwQHG15aT74PRS
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <2023Dec23.114911@mips.complang.tuwien.ac.at> <memo.20231223143751.13424d@jgd.cix.co.uk> <um7gmd$274oh$2@dont-email.me>
Organization: novaBBS
Message-ID: <63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>

by: MitchAlsup - Sat, 23 Dec 2023 21:44 UTC

Chris M. Thomasson wrote:

> On 12/23/2023 6:37 AM, John Dallman wrote:
>> In article <2023Dec23.114911@mips.complang.tuwien.ac.at>,
>> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>
>>> Which compiler was this?
>>
>> Microsoft. Our primary demand for Itanium, when things were starting up
>> for it, was Windows, because there was no 64-bit Windows at the time and
>> some large customers really wanted that. That demand evaporated like dry
>> ice in Death Valley when Intel announced their x86-64.

> Do you remember the rather odd instruction for the Itanium called
> cmp8xchg16?

Yes, at that point in time, MS was also asking for Compare 2 Swap 2 over
on the x86 side of things (DCAS in IBM parlance), which is the impetus for
my inventing ASF which later morphed into ESM. My point was and still is
that one can add 1 to 3 ATOMIC instructions every product rev, or one can
provide the primitives such that SW can invent and use whatever primitive
they can figure out how to program and how to use. {{There is no 3rd choice}}

Re: Superior architecture styles?

<um7m3a$27v82$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35920&group=comp.arch#35920

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Sat, 23 Dec 2023 14:14:01 -0800
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <um7m3a$27v82$1@dont-email.me>
References: <2023Dec23.114911@mips.complang.tuwien.ac.at>
<memo.20231223143751.13424d@jgd.cix.co.uk> <um7gmd$274oh$2@dont-email.me>
<63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 23 Dec 2023 22:14:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ec15cd74988e47dd581b59265d205ee1";
logging-data="2358530"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18X1toffeehc2/xjRvbormUjq8MWqviMyc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FPUT1zDv2Jv4ETAfRfW2PkoC7ME=
In-Reply-To: <63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>
Content-Language: en-US

by: Chris M. Thomasson - Sat, 23 Dec 2023 22:14 UTC

On 12/23/2023 1:44 PM, MitchAlsup wrote:
> Chris M. Thomasson wrote:
>
>> On 12/23/2023 6:37 AM, John Dallman wrote:
>>> In article <2023Dec23.114911@mips.complang.tuwien.ac.at>,
>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>>
>>>> Which compiler was this?
>>>
>>> Microsoft. Our primary demand for Itanium, when things were starting up
>>> for it, was Windows, because there was no 64-bit Windows at the time and
>>> some large customers really wanted that. That demand evaporated like dry
>>> ice in Death Valley when Intel announced their x86-64.
>
>> Do you remember the rather odd instruction for the Itanium called
>> cmp8xchg16?
>
> Yes, at that point in time, MS was also asking for Compare 2 Swap 2 over
> on the x86 side of things (DCAS in IBM parlance), which is the impetus
> for my inventing ASF which later morphed into ESM. My point was and
> still is
> that one can add 1 to 3 ATOMIC instructions every product rev, or one can
> provide the primitives such that SW can invent and use whatever primitive
> they can figure out how to program and how to use. {{There is no 3rd
> choice}}

Well the thing about cmp8xchg16 and friends is that all the words need
to be adjacent to one another. Then there is cmpxchg16b that allows for
true 128 bit CAS on a 64 bit system, again wrt adjacent words...
cmp8xchg16 was always strange to me.
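
What makes cmp8xchg16 look odd next to a full double-width CAS can be shown with a small model. This is a sketch under assumptions: the semantics follow my understanding of the instruction (compare 8 bytes, and on a match store 16 bytes to the enclosing 16-byte-aligned block), memory is a plain dictionary of 8-byte words keyed by byte address, and nothing here is actually atomic. The function names and argument order are illustrative, not the real operand syntax.

```python
def cmp8xchg16(mem, addr, compare, new_lo, new_hi):
    """Compare 8 bytes at addr; on a match, write 16 bytes to the aligned pair."""
    base = addr & ~0xF           # 16-byte-aligned block containing addr
    old = mem[addr]
    if old == compare:
        mem[base] = new_lo
        mem[base + 8] = new_hi
    return old                   # caller compares this with `compare`

def cas128(mem, base, expect_lo, expect_hi, new_lo, new_hi):
    """Full-width CAS: both 8-byte halves must match before the swap."""
    if (mem[base], mem[base + 8]) == (expect_lo, expect_hi):
        mem[base], mem[base + 8] = new_lo, new_hi
        return True
    return False

# The high word has changed from 77 to 99 behind the caller's back.
mem_a = {0x100: 5, 0x108: 99}
old = cmp8xchg16(mem_a, 0x100, 5, 6, 88)

mem_b = {0x100: 5, 0x108: 99}
swapped = cas128(mem_b, 0x100, 5, 77, 6, 88)
```

In this model the 8-byte compare succeeds and overwrites both words even though the second word changed, whereas the full-width CAS notices the change and fails. Both forms still require the two words to be adjacent.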

Now, speaking of IBM, iirc, are you familiar with the PLO locking
scheme? I think it was called PLO. Iirc, it was in the same appendix as
the lock-free IBM free pool manipulation?

Re: Superior architecture styles?

<2aa5eaadee1c7517e64689a0f77e5772@news.novabbs.com>

https://www.novabbs.com/devel/article-flat.php?id=35922&group=comp.arch#35922

Newsgroups: comp.arch

Date: Sat, 23 Dec 2023 23:12:20 +0000
Subject: Re: Superior architecture styles?
X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on novalink.us
From: mitchal...@aol.com (MitchAlsup)
Newsgroups: comp.arch
X-Rslight-Site: $2y$10$ynytAy9/J/l9PYNFLGaNjO0hp6RXGvYIJlUvD8Kt.nhiqDx6GkG6q
X-Rslight-Posting-User: 7e9c45bcd6d4757c5904fbe9a694742e6f8aa949
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light
References: <2023Dec23.114911@mips.complang.tuwien.ac.at> <memo.20231223143751.13424d@jgd.cix.co.uk> <um7gmd$274oh$2@dont-email.me> <63ec25fc0a23e7cede83035cf470323f@news.novabbs.com> <um7m3a$27v82$1@dont-email.me>
Organization: novaBBS
Message-ID: <2aa5eaadee1c7517e64689a0f77e5772@news.novabbs.com>

by: MitchAlsup - Sat, 23 Dec 2023 23:12 UTC

Chris M. Thomasson wrote:

> On 12/23/2023 1:44 PM, MitchAlsup wrote:
>> Chris M. Thomasson wrote:
>>
>>> On 12/23/2023 6:37 AM, John Dallman wrote:
>>>> In article <2023Dec23.114911@mips.complang.tuwien.ac.at>,
>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>>>>
>>>>> Which compiler was this?
>>>>
>>>> Microsoft. Our primary demand for Itanium, when things were starting up
>>>> for it, was Windows, because there was no 64-bit Windows at the time and
>>>> some large customers really wanted that. That demand evaporated like dry
>>>> ice in Death Valley when Intel announced their x86-64.
>>
>>> Do you remember the rather odd instruction for the Itanium called
>>> cmp8xchg16?
>>
>> Yes, at that point in time, MS was also asking for Compare 2 Swap 2 over
>> on the x86 side of things (DCAS in IBM parlance), which is the impetus
>> for my inventing ASF which later morphed into ESM. My point was and
>> still is
>> that one can add 1 to 3 ATOMIC instructions every product rev, or one can
>> provide the primitives such that SW can invent and use whatever primitive
>> they can figure out how to program and how to use. {{There is no 3rd
>> choice}}

> Well the thing about cmp8xchg16 and friends is that all the words need
> to be adjacent to one another.

That really limits its utility.

> Then there is cmpxchg16b that allows for
> true 128 bit CAS on a 64 bit system, again wrt adjacent words...
> cmp8xchg16 was always strange to me.

ESM has none of those limitations, and you can perform up to CMP8SWAP8 if
that floats your boat, and/or attach time stamps along with the pointer
manipulations,.......

> Now, speaking of IBM, iirc, are you familiar with the PLO locking
> scheme? I think it was called PLO. Iirc, it was in the same appendix as
> the lock-free IBM free pool manipulation?

Re: Superior architecture styles?

<um8613$2e2q9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35925&group=comp.arch#35925

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaroncl...@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Sat, 23 Dec 2023 21:45:53 -0500
Organization: A noiseless patient Spider
Lines: 62
Message-ID: <um8613$2e2q9$1@dont-email.me>
References: <2023Dec23.114911@mips.complang.tuwien.ac.at>
<memo.20231223143751.13424d@jgd.cix.co.uk> <um7gmd$274oh$2@dont-email.me>
<63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Dec 2023 02:45:55 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="603864286bf53c814d49f2bd1161af0e";
logging-data="2558793"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Fg0bLkdrEaf/JzQdQcySAR87Kxz7PqjU="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:Ni/DkT7mqxlz8uhFOrgloJaOaP0=
In-Reply-To: <63ec25fc0a23e7cede83035cf470323f@news.novabbs.com>

by: Paul A. Clayton - Sun, 24 Dec 2023 02:45 UTC

On 12/23/23 4:44 PM, MitchAlsup wrote:
> Chris M. Thomasson wrote:
[snip]
>> Do you remember the rather odd instruction for the Itanium
>> called cmp8xchg16?
>
> Yes, at that point in time, MS was also asking for Compare 2 Swap
> 2 over on the x86 side of things (DCAS in IBM parlance), which is the
> impetus for my inventing ASF which later morphed into ESM. My
> point was and still is that one can add 1 to 3 ATOMIC instructions
> every product rev, or one can provide the primitives such that SW can
> invent and use whatever primitive they can figure out how to program
> and how to use. {{There is no 3rd choice}}

Third choice: do both (likely poorly).

ESM could atomically operate on up to 512 bytes in up to 8 64-
byte-aligned locations. The proposed atomics seem to be limited to
a single cache block; a modestly extended LL/SC that allowed
ordinary loads and stores within the reserved cache block would
seem to provide that functionality. ESM is much more flexible, but
forward progress is more difficult with more reservations. (Yes,
ESM has mechanisms to handle queues and such, but I doubt even
Mitch Alsup has discovered a universal solution.)

(A proposed extension for RISC-V — Zacas — provides double-width
CAS. I do not buy the argument that "the CAS atomic instructions
scale better to highly parallel systems than LR/SC"; I think that
is simply an implementation choice. I also see little difficulty
in extending LL/SC active regions to include additional loads and
stores within the single reservation. However, I would not be
surprised if there was an advantage to easier idiom recognition
for atomic operations that would readily benefit from some
optimizations. The code density disadvantage would be difficult
to remove (a sequence of primitives will typically be longer in
bytes than a complex operation), but greater flexibility might
pay for lower code density especially for less frequent
operations.)
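
The relationship between the two styles of primitive can be sketched as follows. This is a toy, single-threaded model with hypothetical names (`Cell`, `load_linked`, `store_conditional`, `plain_store`, `cas`), assuming one reservation per word and that any intervening write clears it; real LL/SC or LR/SC implementations differ in reservation granularity and in what is permitted between the two instructions.

```python
class Cell:
    """One memory word with a single reservation slot, LL/SC style."""
    def __init__(self, value):
        self.value = value
        self.reserved_by = None

    def load_linked(self, cpu):
        self.reserved_by = cpu
        return self.value

    def store_conditional(self, cpu, value):
        if self.reserved_by != cpu:
            return False          # reservation lost, store not performed
        self.value = value
        self.reserved_by = None
        return True

    def plain_store(self, value):
        self.value = value
        self.reserved_by = None   # any other write kills the reservation

def cas(cell, cpu, expected, new):
    """Compare-and-swap expressed as one LL/SC attempt (no retry loop)."""
    if cell.load_linked(cpu) != expected:
        return False
    return cell.store_conditional(cpu, new)

# Uncontended case: the LL/SC pair behaves like a successful CAS.
c = Cell(1)
ok = cas(c, "A", 1, 2)

# A-B-A case: the value returns to 1, but the reservation is gone.
d = Cell(1)
seen = d.load_linked("A")
d.plain_store(3)
d.plain_store(1)
aba_store = d.store_conditional("A", 9)
```

In this model a value-comparing CAS would succeed in the second case because the word again holds 1, while the store-conditional fails because the reservation tracks writes rather than values.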

(Reserving a single "page" would also seem to make ordering easier
at the cost of more false sharing. Spatially dense atomics are
also less interesting.)

Methods applied for locks might help in some cases for hardware-
monitored reservations, but designing an interface seems difficult
(general enough to be useful yet also high enough performance to
be useful) and dangerous (new techniques and use cases can change
the two difficulty factors).

Non-composing hardware lock elision seems to have the advantage of
including an exclusive name. I imagine that one could use that
name to establish a single order of atomic operations when there
is conflict, but even then maximizing throughput seems difficult.
If one falls back to all users of the lock name wait in line, one
likely loses a lot of potential concurrency while the lock name is
in "queuing mode". I suspect there are some tricks that would
allow some queue skipping at moderate complexity/area/power/etc.
cost (possibly involving versioned memory).

I have not studied software concurrency (just "overhearing" a few
things from time to time), but even a sub-novice knows that
locking is hard (deadlock, livelock, etc.).

Re: Superior architecture styles?

<um8am8$2ekt7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=35926&group=comp.arch#35926

Newsgroups: comp.arch

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: paaroncl...@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Re: Superior architecture styles?
Date: Sat, 23 Dec 2023 23:05:25 -0500
Organization: A noiseless patient Spider
Lines: 245
Message-ID: <um8am8$2ekt7$1@dont-email.me>
References: <uigus7$1pteb$1@dont-email.me>
<2023Dec3.153637@mips.complang.tuwien.ac.at> <ukilt3$303sv$1@dont-email.me>
<71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com>
<ukiqdb$2dg6p$1@dont-email.me>
<41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com>
<ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me>
<jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me>
<YUGbN.204677$Ee89.140988@fx17.iad>
<2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me>
<2023Dec19.094918@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 24 Dec 2023 04:05:28 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="603864286bf53c814d49f2bd1161af0e";
logging-data="2577319"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX184BXsqOmQiNXnL2iZANSPz8e0Dgoccz7o="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.0
Cancel-Lock: sha1:UKNNaF5Xxi8n3YTt6Slrq0vdnXs=
In-Reply-To: <2023Dec19.094918@mips.complang.tuwien.ac.at>

by: Paul A. Clayton - Sun, 24 Dec 2023 04:05 UTC

On 12/19/23 3:49 AM, Anton Ertl wrote:
> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
[snip]
>> I suspect that a superior interface could be designed which
>> exploits diverse locality (i.e., data might naturally be closer to
>> some computational resources than to others) and communication
>> (and storage) costs and budgets (budgets being related to urgency
>> and importance).
>
> There have been a number of attempts to design architectures with more
> explicit hardware-oriented features, as well as attempts to design
> architectures with more software-oriented features.

Communication of the (absolute and relative) costs and usefulness
in both directions between hardware and software is a significant
factor. Tossing difficult things to the other side of the
interface just because they are difficult is problematic, but the
relative difficulty can justify the placement of work.

Compilation (and libraries) cache work at the cost of storage.
Hardware can also cache work (e.g., µop caches, optimized trace
caches, branch predictors), but such cached data/work is not
considered worth persisting. Decoding again can easily be less
work than storing extra data and transferring that data to and
from a secondary storage location (L2 µop cache).

> On the hardware-oriented side we have seen:
>
> * Explicit fast/small memories that leaves the problem of locality to
> software rather than adding the hardware complexity of cache tags
> and letting the hardware guess what memory will be accessed in the
> future. The only such memories that have succeeded in the long term
> are registers; my explanation for the latter is that registers are
> useful for scalar variables in programs (and vector/SIMD registers
> for temporary storage of pieces of aggregated data).
>
> My explanation for the lack of success of explicit local memory is
> that it is too hard for software to manage; dealing with it would be
> a cross-cutting concern that would make the software significantly
> more complex. Better write software as if there was no cache or
> local memory, and let hardware attempt to get good performance out
> of it.

We really need those magic compilers.☺

I suspect there is _some_ room for more diverse storage. There are
already commonly two kinds of registers (FP/SIMD and GPR) and
there is a modest code density incentive for x86-64 to avoid REX
prefixes (so lower registers are slightly special).

> * Software prefetch instructions. Ok, we have a cache instead of
> explicit local memories, but can't we let software improve the
>   cache's performance with prefetch hints? I have read some reports
> that the experience is that adding such instructions showed
> disappointing speedups, and slowdowns also happened. While many
> architectures have this feature, it seems to be used little.
> Hardware prefetchers seem to do ok, though.

It seems that the accesses which are easy for compilers to insert
(strided accesses where distant use is certain) are easy for
hardware to handle. Having the software/hardware interface make
pointers more easily recognizable might help hardware prefetch
from pointers (though it seems that some hardware does track
pointers for prefetching).
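
Why strided accesses are the easy case for hardware can be illustrated with a minimal stride detector. This is a sketch only: the function name `stride_prefetches`, the `degree` parameter, and the rule "prefetch once the same non-zero stride is seen twice in a row" are assumptions for illustration, not a description of any shipping prefetcher.

```python
def stride_prefetches(addresses, degree=2):
    """Return the prefetch addresses a simple stride detector would issue."""
    issued = []
    last = None
    stride = None
    for addr in addresses:
        if last is not None:
            new_stride = addr - last
            if new_stride != 0 and new_stride == stride:
                # Same stride seen twice in a row: prefetch `degree` blocks ahead.
                issued.extend(addr + new_stride * k for k in range(1, degree + 1))
            stride = new_stride
        last = addr
    return issued

# Array walk with a constant 64-byte stride.
strided = stride_prefetches([0, 64, 128, 192], degree=1)
# Pointer chase with no regular stride.
chased = stride_prefetches([0x1000, 0x8F40, 0x2310, 0x77A8], degree=1)
```

The strided walk triggers prefetches after two matching deltas, while the pointer chase triggers none, which is the case where recognizable pointers or software hints would have to supply the missing information.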

Prefetching is wasteful for hits and unused cache blocks. Software
is less aware of the cost/benefit than hardware.

I suspect that software prefetching also runs into the difficulty
of a limited amount of miss handling. If hardware provides an
abundance of miss handling, it will commonly be underutilized —
which is wasteful — but if software prefetches are taking away
resources from demand accesses performance could easily be hurt.

Prefetching also has aspects of timeliness (both late and early)
and confidence. In terms of storage, prefetching to an outer level
of cache may be preferred for lower confidence and/or lower
criticality prefetches.

Hardware has better access to resource (bandwidth and storage)
costs, but it seems conceivable that software could place a bid
on a prefetch. However, such bids seem likely to be bloated.

With out-of-order execution, criticality is not as simple as
"how many instruction fetches ahead of use must the prefetch be
scheduled". Loads correcting a mispredicted branch or constraining
memory-level parallelism are more critical.

> * Distributed memory. This avoids all the problems of cache
> consistency, but it requires the programmer to avoid all sharing
> (including read-only data and code), and, most importantly, to fit
> within its slice of machine. This is acceptable for supercomputing,
> but does not fly in general-purpose computing. And even in game
>   computing, which one might consider to be more supercomputer-like,
> the Cell architecture was abandoned in the next generation.

I suspect some degree of localization is desirable. The any-to-any
model for cache coherence is wasteful. Directories and snoop
filters can reduce this waste, but sometimes software knows that
communication will never happen or that it is limited to a single
cluster (one IBM POWER implementation had a coherence state
indicating local to the chip/module, but something like LPARs
could further reduce communication).

I suspect that communication patterns could also be exploited
with cooperation between hardware and software. E.g., a layout
of nodes with some connections being crossbar and others ring-
based might justify allocation preference not merely for
proximity but for connection type. This seems somewhat "wild
and wooly" but has some surface plausibility.

> * Weak memory models. Ok, we (hardware designers) have to do cache
> consistency, but let's do a shoddy job and shift the problems over
> to the software people whom we task with inserting architecturally
> meaningless barriers at select places. The great thing about this
> is that we can always blame the software people: if the performance
> sucks, they have inserted too many barriers; if the program breaks,
> they have inserted too few; and if the performance sucks while the
> program breaks, well, we can blame them for both, or just shrug and
> say that the problem is hard. In any case, the blame is deflected
> from us.
>
> One can argue that this particular architectural misfeature has
> succeeded: After all, in shared-memory multiprocessors you can buy
> today, sequential consistency is nonexistent, and weaker models such
> as TSO or even weaker ones dominate.
>
> OTOH, very few programmers program to these weak models. They write
> sequential programs, or use libraries/system calls that hide the
> horrors of weak consistency behind easier-to-understand but much
> slower abstractions.

I would guess that the "much slower abstractions" can be
'flattened' sometimes by compilers and even by hardware.

> Isn't that as it should be? Hardware provides some simple, possibly
> inconvenient interface, and software abstracts the inconvenience
> away; e.g., RISCs provide only a load/store interface, and compilers
> translate the stuff programmers write into this interface. The
> difference is that with RISCs the result is at least as fast as
> architectures that tried to cater more to the programming language,
> while I am convinced that hardware where the designers design for
> efficient sequential consistency will let programmers write software
> that handily outperforms using libraries or system calls on hardware
> designed for weaker memory models.

>
>> They also (as I
>> understand) lacked value prediction whereas OoO effectively uses
>> value prediction for branches.
>
> Branch prediction is branch prediction, value prediction predicts the
> values of registers or memory locations;

I view branch direction prediction as value prediction of a one-
bit value.

> I have seen research about
> value prediction maybe 20 years ago, but I am not aware that it has
> been productized. IIRC when I asked someone who presented a paper on
> value prediction why there should be a significant predictability for
> values, the answer was that the values would come from executing along
> a mispredicted path: E.g., when an IF is mispredicted, the values of
> the code after the ENDIF might still compute to the same values.
> However, given the rareness of mispredictions these days, this source
> of value prediction probably provides too little benefit to justify
> the cost (not to mention Spectre considerations, which came in later).

Stride-based value prediction is really just early computation but
has the advantage of not having to propagate the carry redundantly
for each addition.
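
As a rough illustration of what "early computation" means here, a
stride predictor can be sketched as a small state machine: remember the
last value and the last delta, and once the delta has repeated, guess
last + delta. The struct name, the confidence threshold of 2 and the
saturation at 3 are assumptions chosen for this sketch, not taken from
any particular design.

```cpp
#include <cstdint>

// Hypothetical single-entry stride value predictor (illustrative only).
struct StridePredictor {
    std::uint64_t last = 0;    // most recent actual value
    std::int64_t stride = 0;   // most recent observed delta
    unsigned confidence = 0;   // how often the delta has repeated
    bool seen = false;         // false until the first value arrives

    static constexpr unsigned kConfident = 2;  // assumed threshold
    static constexpr unsigned kMax = 3;        // assumed saturation

    // Writes the guess to 'out' and returns true only when confident.
    bool predict(std::uint64_t& out) const {
        if (confidence < kConfident) return false;
        out = last + static_cast<std::uint64_t>(stride);
        return true;
    }

    // Called with the real value once it is known.
    void train(std::uint64_t actual) {
        if (seen) {
            const std::int64_t observed =
                static_cast<std::int64_t>(actual - last);
            if (observed == stride) {
                if (confidence < kMax) ++confidence;
            } else {
                stride = observed;  // new stride, start over
                confidence = 0;
            }
        }
        last = actual;
        seen = true;
    }
};
```

Feeding it 100, 108, 116, 124 makes it confident enough to guess 132
for the next value; a value that breaks the pattern resets the
confidence.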

>> While the current interface does allow for significant improvement
>> exploiting common behaviors and dynamic microarchitectural
>> information, it does not seem (to me) an ideal interface.
>
> I think it is quite remarkable that a sequence of instructions, with
> branches and RAM and registers, is what we already have in S/360 (59
> years ago) and (I think) in the IBM 704 (69 years ago); I am not
> familiar enough with earlier machines or the analytical engine to
> comment on them, but this style might be even older. And while the
> hardware capabilities have exploded since that time, and performance
> characteristics have changed, this style of interface is still doing
> well. Even specific instruction sets are doing well: AMD64 with its
> legacy going back to 8086 (45 years ago) and the Datapoint 2200 (53
> years ago) competes well with recent architectures such as ARM A64 and
> RISC-V. S390x goes back to S/360 (59 years ago); I don't know how
> fast its current implementation is on programs I care about, but I
> expect that in an alternative reality where the PC market was based on
> S390x, the offerings of Intel and AMD (or whoever would take their
> place) would offer similar performance on software I care about as
> their AMD64 offerings offer in this reality.
>
> And of course the recent ARM A64 and RISC-V also follow the style
> outlined above. RISC-V is even very similar to MIPS (37 years ago),
> despite the R2000 having only ~100,000 transistors, and current CPUs
> having billions.

Re: Superior architecture styles?

<_V_hN.90589$c3Ea.3627@fx10.iad>

https://www.novabbs.com/devel/article-flat.php?id=35928&group=comp.arch#35928

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!tncsrv06.tnetconsulting.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx10.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Superior architecture styles?
Newsgroups: comp.arch
References: <2023Dec23.114911@mips.complang.tuwien.ac.at> <memo.20231223143751.13424d@jgd.cix.co.uk> <um7gmd$274oh$2@dont-email.me> <63ec25fc0a23e7cede83035cf470323f@news.novabbs.com> <um8613$2e2q9$1@dont-email.me>
Lines: 109
Message-ID: <_V_hN.90589$c3Ea.3627@fx10.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 24 Dec 2023 18:39:54 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 24 Dec 2023 18:39:54 GMT
X-Received-Bytes: 5720

by: Scott Lurndal - Sun, 24 Dec 2023 18:39 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>On 12/23/23 4:44 PM, MitchAlsup wrote:
>> Chris M. Thomasson wrote:
>[snip]
>>> Do you remember the rather odd instruction for the Itanium
>>> called cmp8xchg16?
>>
>> Yes, at that point in time, MS was also asking for Compare 2 Swap
>> 2 over on the x86 side of things (DCAS in IBM parlance), which is the
>> impetus for my inventing ASF which later morphed into ESM. My
>> point was and still is that one can add 1 to 3 ATOMIC instructions
>> every product rev, or one can provide the primitives such that SW can
>> invent and use whatever primitive they can figure out how to program
>> and how to use. {{There is no 3rd choice}}
>
>Third choice: do both (likely poorly).
>
>ESM could atomically operate on up to 512 bytes in up to 8 64-
>byte-aligned locations. The proposed atomics seem to be limited to
>a single cache block; a modestly extended LL/SC that allowed
>ordinary loads and stores within the reserved cache block would
>seem to provide that functionality. ESM is much more flexible, but
>forward progress is more difficult with more reservations. (Yes,
>ESM has mechanisms to handle queues and such, but I doubt even
>Mitch Alsup has discovered a universal solution.)
>
>(A proposed extension for RISC-V — Zacas — provides double-width
>CAS. I do not buy the argument that "the CAS atomic instructions
>scale better to highly parallel systems than LR/SC"; I think that
>is simply an implementation choice.

In my experience with large scale systems, it has been shown
that LL/SC cannot scale so long as it must acquire the cache
line to complete the transaction.

The advantage of atomics (particularly load+add) is that the
core doesn't need to acquire exclusive access to the cache line
before completing the operation - the atomic can be easily delegated
to a higher level (e.g. LLC) of cache, passed to the cache
that owns the line, or even to the memory controller
(or a PCI express device).
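
Seen from C++, the contrast might look like the sketch below. The
function names are made up for illustration. Whether fetch_add really
becomes a single atomic that can be shipped to the LLC or memory
controller depends on the target and the implementation (ARMv8.1 LSE,
for instance, has LDADD), and compare_exchange_weak is typically
lowered to an LL/SC loop on machines without a native CAS.

```cpp
#include <atomic>
#include <cstdint>

// One read-modify-write request: the hardware is free to perform the
// add wherever the line currently lives. Returns the old value.
inline std::uint64_t add_with_atomic(std::atomic<std::uint64_t>& counter,
                                     std::uint64_t n) {
    return counter.fetch_add(n, std::memory_order_relaxed);
}

// Retry loop: the core reads the value, computes locally, and then
// tries to publish the result, retrying if another agent got there
// first. Returns the old value that the successful update replaced.
inline std::uint64_t add_with_retry_loop(std::atomic<std::uint64_t>& counter,
                                         std::uint64_t n) {
    std::uint64_t seen = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(seen, seen + n,
                                          std::memory_order_relaxed,
                                          std::memory_order_relaxed)) {
        // On failure 'seen' has been refreshed with the current value.
    }
    return seen;
}
```

Both functions produce the same result; the difference is only in
where the work can be done and how much line ownership traffic it
takes under contention.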

> I also see little difficulty
>in extending LL/SC active regions to include additional loads and
>stores within the single reservation.

ARM has specified that any store by the processor _may_ cause
the SC (Store Exclusive) to fail.

>Methods applied for locks might help in some cases for hardware-
>monitored reservations, but designing an interface seems difficult
>(general enough to be useful yet also high enough performance to
>be useful) and dangerous (new techniques and use cases can change
>the two difficulty factors).

Back in the early 1980's, we added hardware support for locks
(and events - e.g. modern condition variables) directly to
the instruction set. An instruction to acquire a lock,
one to release a lock (operating like a mutex), one to
test if the lock is available and acquire if so, but not
block if owned by another thread. Instructions to
wait for an event and signal an event.

The lock was defined by a data structure that included
a 'canonical lock number', and a field containing
the current lock owner task number, and a link to the
first thread waiting for the lock, if any.

The MCP(OS) thread data structure contained a field that
recorded the current canonical lock number (CLN) owned by the
thread and the lock instruction would fail (set condition codes)
if the lock being locked had a CLN less than or equal to
the currently owned CLN. The unlock instruction would fault
if the thread attempted to unlock a lock where the CLN didn't
match the current CLN. These rules completely prevented A-B
deadlocks.

If the lock was already owned the lock instruction would trap
to a microkernel (highest privilege level) which would place
the thread on a waiting list and dispatch the next highest
priority thread.

If there was a waiter recorded in the lock structure, the
unlock instruction would trap to the microkernel which would
adjust the list and, if the newly runnable thread was highest
priority, dispatch it.
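
A minimal software model of the CLN ordering rule, to show why it
rules out A-B deadlocks. Everything here is a sketch under stated
assumptions: the type and member names are invented, CLN 0 is taken to
mean "no lock held", a per-task stack stands in for however the real
thread structure restored the previous CLN on unlock, and an owned
lock simply reports Busy instead of trapping to a microkernel.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical lock record: canonical lock number plus owner task.
struct Lock {
    std::uint32_t cln;  // canonical lock number, assumed >= 1
    int owner;          // owning task number, -1 when free
    explicit Lock(std::uint32_t n) : cln(n), owner(-1) {}
};

enum class LockResult { Acquired, OrderViolation, Busy };

struct Task {
    int id;
    std::vector<std::uint32_t> held;  // assumption: stack of held CLNs
    explicit Task(int task_id) : id(task_id) {}

    std::uint32_t current_cln() const {
        return held.empty() ? 0 : held.back();
    }

    // Models the "test and acquire" form: never blocks.
    LockResult try_lock(Lock& l) {
        // Locks must be taken in strictly increasing CLN order.
        if (l.cln <= current_cln()) return LockResult::OrderViolation;
        if (l.owner != -1) return LockResult::Busy;
        l.owner = id;
        held.push_back(l.cln);
        return LockResult::Acquired;
    }

    // Models the fault on unlocking anything but the current CLN.
    void unlock(Lock& l) {
        if (held.empty() || l.cln != held.back() || l.owner != id) {
            throw std::logic_error("unlock does not match current CLN");
        }
        l.owner = -1;
        held.pop_back();
    }
};
```

With locks A (CLN 1) and B (CLN 2), a task holding B gets
OrderViolation when it asks for A, so two tasks can never each hold
one lock while waiting for the other.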

>
>Non-composing hardware lock elision seems to have the advantage of
>including an exclusive name. I imagine that one could use that
>name to establish a single order of atomic operations when there
>is conflict, but even then maximizing throughput seems difficult.
>If one falls back to having all users of the lock name wait in line,
>one likely loses a lot of potential concurrency while the lock name
>is in "queuing mode". I suspect there are some tricks that would
>allow some queue skipping at moderate complexity/area/power/etc.
>cost (possibly involving versioned memory).

Sounds much like transactions. Most hardware implementations so
far have suffered from various restrictions in the size of the
transaction and haven't found much traction yet.

>
>I have not studied software concurrency (just "overhearing" a few
>things from time to time), but even a sub-novice knows that
>locking is hard (deadlock, livelock, etc.).

Re: Superior architecture styles?

<r8%hN.81467$PuZ9.48313@fx11.iad>

https://www.novabbs.com/devel/article-flat.php?id=35929&group=comp.arch#35929

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx11.iad.POSTED!not-for-mail
X-newsreader: xrn 9.03-beta-14-64bit
Sender: scott@dragon.sl.home (Scott Lurndal)
From: sco...@slp53.sl.home (Scott Lurndal)
Reply-To: slp53@pacbell.net
Subject: Re: Superior architecture styles?
Newsgroups: comp.arch
References: <uigus7$1pteb$1@dont-email.me> <71910e37d784192f7adce00f4d3b3f3e@news.novabbs.com> <ukiqdb$2dg6p$1@dont-email.me> <41f4c1184e960c9e97fcff43ab68b17c@news.novabbs.com> <ukj2pi$32gt4$1@dont-email.me> <ukk0m0$3bbkb$1@dont-email.me> <jwvr0k1k41o.fsf-monnier+comp.arch@gnu.org> <ukm32p$3p2jh$1@dont-email.me> <YUGbN.204677$Ee89.140988@fx17.iad> <2023Dec6.085407@mips.complang.tuwien.ac.at> <ulqu0t$3oouk$1@dont-email.me> <2023Dec19.094918@mips.complang.tuwien.ac.at> <um8am8$2ekt7$1@dont-email.me>
Lines: 46
Message-ID: <r8%hN.81467$PuZ9.48313@fx11.iad>
X-Complaints-To: abuse@usenetserver.com
NNTP-Posting-Date: Sun, 24 Dec 2023 18:55:19 UTC
Organization: UsenetServer - www.usenetserver.com
Date: Sun, 24 Dec 2023 18:55:19 GMT
X-Received-Bytes: 3271

by: Scott Lurndal - Sun, 24 Dec 2023 18:55 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>On 12/19/23 3:49 AM, Anton Ertl wrote:
>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>[snip]
>>> I suspect that a superior interface could be designed which
>>> exploits diverse locality (i.e., data might naturally be closer to
>>> some computational resources than to others) and communication
>>> (and storage) costs and budgets (budgets being related to urgency
>>> and importance).
>>
>> There have been a number of attempts to design architectures with more
>> explicit hardware-oriented features, as well as attempts to design
>> architectures with more software-oriented features.
>
>Communication of the (absolute and relative) costs and usefulness
>in both directions between hardware and software is a significant
>factor. Tossing difficult things to the other side of the
>interface just because they are difficult is problematic, but the
>relative difficulty can justify the placement of work.

There are, though, many things that hardware can do much
more efficiently, perhaps at a cost, such as Content
Addressable Memory, which is useful in many workloads,
particularly in the networking space.

I'd love to have some form of CAM available directly to a
C++ programmer on standard hardware - I often see cases
where it could be useful for performance, for small
lookup tables. Perhaps a CAM instruction with a set of
CAM banks (one per hardware thread) and support in the kernel
to save/restore them on context switch (lazily, like FPR).
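
For comparison, this is roughly what one has to write today to get
CAM-like behaviour for a small table in software. It is only a sketch:
the class name, the slot-addressed write and the 64-bit key/value
widths are assumptions, and the loop does sequentially what a real CAM
bank would do as one parallel compare across all entries.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical software stand-in for a small CAM bank with N entries.
template <std::size_t N>
class SmallCam {
public:
    // Load a key/value pair into a given slot; false if out of range.
    bool write(std::size_t slot, std::uint64_t key, std::uint64_t value) {
        if (slot >= N) return false;
        entries_[slot].key = key;
        entries_[slot].value = value;
        entries_[slot].valid = true;
        return true;
    }

    // Mark a slot empty; out-of-range slots are ignored.
    void invalidate(std::size_t slot) {
        if (slot < N) entries_[slot].valid = false;
    }

    // Compare 'key' against every valid entry; on a hit, store the
    // associated value in 'value' and return true.
    bool match(std::uint64_t key, std::uint64_t& value) const {
        for (const Entry& e : entries_) {
            if (e.valid && e.key == key) {
                value = e.value;
                return true;
            }
        }
        return false;
    }

private:
    struct Entry {
        std::uint64_t key = 0;
        std::uint64_t value = 0;
        bool valid = false;
    };
    std::array<Entry, N> entries_{};
};
```

The per-thread state here is just the entries_ array, which is the
part a kernel would have to save and restore on a context switch.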

>> I think it is quite remarkable that a sequence of instructions, with
>> branches and RAM and registers, is what we already have in S/360 (59
>> years ago) and (I think) in the IBM 704 (69 years ago);

The Datatron first shipped the same year as the 704. It was
announced in '52. There was a 2-digit order code in a 'command'
(what we'd call an instruction). It included both multiplication
and division (10 digit operands, 20 digit product) as well
as shift (digit, not bit) orders (peephole optimizations
for multiply/division by powers of 10). Load (Clear and Add) and Store
orders. Block transfer order copied 20 consecutive main storage
cells to a 'quick access loop'.
