novaBBS - comp.arch - Re: Approximate reciprocals

Re: lock me up, was IBM Mainframe market

<t3edbq$pv0$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24797&group=comp.arch#24797

Path: i2pn2.org!i2pn.org!aioe.org!rd9pRsUZyxkRLAEK7e/Uzw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Sat, 16 Apr 2022 14:38:08 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t3edbq$pv0$1@gioia.aioe.org>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com>
<t3alc7$12tg$1@gal.iecc.com> <87ilrbuk6f.fsf@localhost>
<87pmli8dja.fsf@localhost> <t3d0uq$2h1r$2@gal.iecc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="26592"; posting-host="rd9pRsUZyxkRLAEK7e/Uzw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Sat, 16 Apr 2022 12:38 UTC

John Levine wrote:
> According to Anne & Lynn Wheeler <lynn@garlic.com>:
>> Two months later, came back with routes reimplemented in "C" from
>> scratch, demo'ed on rs/6000 320 where they put it through paces ...
>> different complex origin/destination combinations that the existing
>> implementations couldn't handle. workload profiles showed that ten (rack
>> mount) rs/6000 990s could handle all route requests for all airlines in
>> the world.
>
> Sounds a lot like what ITA did. Since route lookups are read-only, you
> can run a lot of stuff in parallel without locks.

Right, that's not even hard. :-)

The company I worked for at one point in time had the contract from the
Norwegian IRS to run the national social security number database.
(Please note that this DB was in fact owned by the IRS and not by the
obvious (?) social security department.)

Anyway, up to the point when I was asked to look into our bid for the
next 5-year period, these contracts had effectively required running on
an IBM mainframe since several SNA/3270 protocols/APIs were required, but
the next one would not have this limitation so I was free to see what I
could do for a DB of maximum 7M records, each with not more than 1 KB of
data, so less than 8GB of primary data.

A pair of 64-bit workstation-class machines with 32 GB of RAM would
suffice to keep the entire DB, including all secondary tables and
indices, in memory. All updates were specified to happen via a daily
(i.e. nightly) bulk process so maintaining consistency was also trivial.
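
As a rough check of the sizing above, here is a minimal sketch (not the system as it was actually bid). The 7M-record and 1 KB limits come from the post; the key format, the record contents, and the names `load_snapshot` and `lookup` are made up for illustration.

```python
# Back-of-envelope sizing plus a read-only in-memory lookup table.
# The figures (7M records, at most 1 KB each) are from the post; the
# names, keys, and record contents are hypothetical.

MAX_RECORDS = 7_000_000
MAX_RECORD_BYTES = 1024

primary_bytes = MAX_RECORDS * MAX_RECORD_BYTES
primary_gib = primary_bytes / 2**30   # about 6.7 GiB, under the 8 GB bound


def load_snapshot(rows):
    """Build the lookup table from one nightly bulk load.

    The table is built once per load and never mutated afterwards,
    so concurrent readers need no locks.
    """
    return {key: record for key, record in rows}


def lookup(table, key):
    """Single-record lookup; returns None when the key is unknown."""
    return table.get(key)


# Two made-up records standing in for the nightly load.
table = load_snapshot([("00000000001", b"record A"),
                       ("00000000002", b"record B")])
```

Because updates arrive only through the nightly load, a new table can be built on the side and swapped in whole, which is why consistency stays simple in this kind of design.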

Except for the police, who have a custom wildcard search interface
available (but with extra logging so that it could be verified that each
search happened due to an active investigation), every request would be a
single-record lookup.

Result: The pair (or more, geographically distributed?) of redundant
servers would be able to handle a couple of orders of magnitude more
traffic than the old mainframe DB solution, with sub-microsecond processing time,
i.e. they would run at wire speed of the fastest network available.

Total cost for the required hw and for sw development could be recovered
in a few months compared to the old contract.

Terje

PS. Today I would probably have specified using FoundationDB as the
repository instead of a totally custom setup.

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: lock me up, was IBM Mainframe market

<t3ede2$pv0$2@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24798&group=comp.arch#24798

Path: i2pn2.org!i2pn.org!aioe.org!rd9pRsUZyxkRLAEK7e/Uzw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Sat, 16 Apr 2022 14:39:20 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t3ede2$pv0$2@gioia.aioe.org>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com>
<t3big8$sem$1@newsreader4.netcologne.de>
<3e96e0bc-d48d-48a1-b0cc-8adca1442e31n@googlegroups.com>
<t3bpmp$lr$1@newsreader4.netcologne.de> <t3d16e$2h1r$3@gal.iecc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="26592"; posting-host="rd9pRsUZyxkRLAEK7e/Uzw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Sat, 16 Apr 2022 12:39 UTC

John Levine wrote:
> According to Thomas Koenig <tkoenig@netcologne.de>:
>>> it was that SAP they were referring to, for which Odoo was pitched
>>> as a replacement.
>>
>> ERP software is not a small market. SAP had an annual turnover
>> of 27.8e9 Euros in 2021 (around 30 Gigadollars).
>
> According to Wikipedia, SAP is the #4 software company in the world after
> Microsoft, IBM, and Oracle. They have long been the biggest European
> computer company.
>

Didn't they use to be #2 for a while, i.e. following Microsoft?

(Back when IBM was a HW company and Oracle sold nothing but Oracle DB)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: lock me up, was IBM Mainframe market

<2022Apr16.142426@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24799&group=comp.arch#24799

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Sat, 16 Apr 2022 12:24:26 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 73
Message-ID: <2022Apr16.142426@mips.complang.tuwien.ac.at>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <t39nve$15rv$2@gal.iecc.com> <2022Apr14.224455@mips.complang.tuwien.ac.at> <t3a5b0$8b3$1@dont-email.me> <t3aie4$qiv$1@gal.iecc.com>
Injection-Info: reader02.eternal-september.org; posting-host="8ed3d4a44f83cdf3a7553622df5b4999";
logging-data="9637"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18f5f7WVB/uLYwPciP4iEoh"
Cancel-Lock: sha1:wrHEweaWVP0o2S/qSMapQ7iTBXI=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sat, 16 Apr 2022 12:24 UTC

John Levine <johnl@taugh.com> writes:
>There are new banks and new airlines. If they could do their banking and
>flying on cheap racks of PCs, don't you think they would?

Not necessarily. For each individual company, it can be cheaper to
rent a mainframe and start with established software that handles all
the tax, money laundering etc. regulations rather than starting from
scratch, with lots of companies (IBM, the software companies, and
competing banks) around that will do their utmost to crucify you for
every mistake; and even if you don't make a mistake, everybody else
will tell their customers about your new bank that cannot even afford
proper banking computers. There are probably also network effects at
work that make it hard to break out of the established system.

>I can't find the reference now, but in the database biz people know
>that ten computers of speed X are not equivalent to one of speed 10X

Everybody knows that. It's Amdahl's law. But we are not talking
about ten computers of speed X and one with speed 10X here, we are
talking about several computers with up to 128 cores (if we take a
dual-socket EPYC system; but if you get the distributed software
right, it may be cheaper to use more smaller systems) against one
computer with up to about 240 cores that costs maybe 10x as much; the
cores themselves have little speed difference.
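
For reference, Amdahl's law gives the speedup on n processors as 1 / ((1 - p) + p/n), where p is the fraction of the work that can run in parallel. A small sketch follows; the 95% figure is an illustrative assumption, not something measured in this thread.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Speedup on n processors when only parallel_fraction of the
    work can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)


# Hypothetical workload that is 95% parallelizable.
ten_slow = amdahl_speedup(0.95, 10)   # ten machines of speed X
limit = 1.0 / (1.0 - 0.95)            # ceiling as n grows without bound
```

Under that assumption ten machines of speed X deliver a bit under 7X rather than 10X, and no number of machines gets past 20X.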

And since you mention per-bank systems, the whole stuff is distributed
anyway: the ATM is situated at one bank (or belongs to one ATM
company), the user's account often at a different one, they have to do
transactions on each bank's computers.

Credit card processing may be more centralized, because there are a
few credit card companies (or, in Europe, basically one), and each
credit card holder and (I presume) each dealer that accepts the card
has an account there, so this can be done within a single system. But
given that you do it in a distributed way for ATM transactions and
money transfers, you can also do it in a distributed way for credit
card payments, if you want to.

>exactly because of the lock contention.

But that's the point I am trying to get across: For the kind of
transactions mentioned here, lock contention is no problem:

1) Nearly all accounts don't do thousands of transactions per second;
and for those that do (say, Amazon's account at Visa or somesuch), you
can split it into multiple accounts if necessary (but I doubt that it
is necessary even for Amazon's volume if the software is written
accordingly).

2) Also, you don't need to hold locks on both accounts involved. The
classic way (from before computing) of money transfer is that the
sending account's bank is taking the money off the one account, sends
it over to the receiving account's bank, and the receiving bank puts
the money into the receiving account. [In the old days there were
IIRC three days when the money was on the way. The banks kept this
timespan up far into the computer age, because it was in their balance
during that time (and produced interest while interest rates were >0),
and it took some regulations for bringing that down to IIRC one day
relatively recently.] You can use the same principle for distributed
computerized credit-card processing or ATM processing.
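
The debit-then-credit principle in point 2 can be sketched as follows. This is a single-process illustration under stated assumptions: the `Account` class and the `debit`, `credit`, and `transfer` helpers are hypothetical, and a real inter-bank system would need a durable message for the money in transit, which is omitted here.

```python
import threading


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self.lock = threading.Lock()   # guards this one account only


def debit(account: Account, amount: int) -> bool:
    """Take money off one account; fails if funds are insufficient."""
    with account.lock:
        if account.balance < amount:
            return False
        account.balance -= amount
        return True


def credit(account: Account, amount: int) -> None:
    """Put money into one account."""
    with account.lock:
        account.balance += amount


def transfer(src: Account, dst: Account, amount: int) -> bool:
    # Step 1: the sending side debits; only src.lock is held.
    if not debit(src, amount):
        return False
    # The money is now "on the way"; no lock is held at this point.
    # Step 2: the receiving side credits; only dst.lock is held.
    credit(dst, amount)
    return True
```

At no point are two account locks held together, so there is no lock-ordering problem, and transfers between disjoint pairs of accounts never contend with each other.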

>You can parallelize a lot, and
>they do parallelize a lot, but at some point, there are atomic
>transactions which means everyone looking at the data the transactions
>touch needs to use the same locks.

A transfer from account A to B and a transfer from account E to F
don't need to use the same locks.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: IBM Mainframe market was Re: Approximate reciprocals

<GSB6K.86844$Kdf.36873@fx96.iad>

https://www.novabbs.com/devel/article-flat.php?id=24801&group=comp.arch#24801

Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx96.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <P3M5K.419982$iK66.339487@fx46.iad> <pNh6K.167452$8V_7.154516@fx04.iad> <t3ca2a$dpk$1@newsreader4.netcologne.de> <t3d0ql$2h1r$1@gal.iecc.com>
In-Reply-To: <t3d0ql$2h1r$1@gal.iecc.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 37
Message-ID: <GSB6K.86844$Kdf.36873@fx96.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 16 Apr 2022 16:09:10 UTC
Date: Sat, 16 Apr 2022 12:07:43 -0400
X-Received-Bytes: 2405

by: EricP - Sat, 16 Apr 2022 16:07 UTC

John Levine wrote:
> According to Thomas Koenig <tkoenig@netcologne.de>:
>>> Calling Micro/370 a re-microcoded 68000 is
>>> like saying a Z80 is a re-microcoded 8080.
>>> It just doesn't do justice to all the work that went into it.
>> I have removed the reference to the 68000-360 from the German
>> Wikipedia accordingly. At least that reference led to an
>> interesting discussion here :-)
>
> You probably should put it back. The Micro/370 was an experimental chip that never shipped in a product.
>
> The chips in the XT/370 card really were reprogrammed 68K and 8087. See recent messages here for
> some references.

Oh right. The project time frames don't line up.
In Tredennick's book the Micro-370 project starts in 1980; the book says it
might have been done in about 2 years, but the project is under-resourced
and they get the first chip in 1985.

The IBM paper says they had a 370 emulated entirely in software written
for a Motorola MC68000 in 1982, and the XT/370 is announced in Oct-83.

So they must have been running parallel projects.

In Apr-1988 IBM announced the IBM 7437 VM/SP Technical Workstation,
which later became the Personal/370, aka P/370: a complete
32-bit 370 on a Micro Channel Architecture (MCA) card for PS/2 or RS/6000.
It was only available to OEMs.
That sounds like it might have been the Micro-370 chip.

https://en.wikipedia.org/wiki/PC-based_IBM_mainframe-compatible_systems#Personal/370

I found a picture of a P/370
https://www.ardent-tool.com/CPU/P_370.html

Re: IBM Mainframe market was Re: Approximate reciprocals

<uMC6K.212365$jxu4.203096@fx02.iad>

https://www.novabbs.com/devel/article-flat.php?id=24802&group=comp.arch#24802

Path: i2pn2.org!i2pn.org!news.swapon.de!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <P3M5K.419982$iK66.339487@fx46.iad> <pNh6K.167452$8V_7.154516@fx04.iad> <t3ca2a$dpk$1@newsreader4.netcologne.de> <t3d0ql$2h1r$1@gal.iecc.com> <t3dq0j$cr2$1@newsreader4.netcologne.de>
In-Reply-To: <t3dq0j$cr2$1@newsreader4.netcologne.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 56
Message-ID: <uMC6K.212365$jxu4.203096@fx02.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sat, 16 Apr 2022 17:10:50 UTC
Date: Sat, 16 Apr 2022 13:10:22 -0400
X-Received-Bytes: 2923

by: EricP - Sat, 16 Apr 2022 17:10 UTC

Thomas Koenig wrote:
> John Levine <johnl@taugh.com> schrieb:
>> According to Thomas Koenig <tkoenig@netcologne.de>:
>>>> Calling Micro/370 a re-microcoded 68000 is
>>>> like saying a Z80 is a re-microcoded 8080.
>>>> It just doesn't do justice to all the work that went into it.
>>> I have removed the reference to the 68000-360 from the German
>>> Wikipedia accordingly. At least that reference led to an
>>> interesting discussion here :-)
>> You probably should put it back.
>
> It would not have made sense to put back what was there because it was
> wrong on several counts, and had no reference.
>
> It read (deepl translation, I'm lazy)
>
> This led to such interesting variants as the 68000-360, which could
> execute a ''stripped down'' version of the [[IBM]][[System/360]]
> instruction set directly on the chip and was used for a small 360
> model by IBM.
>
> - It wasn't called 68000-360
> - It wasn't /360 instructions, it was /370
> - It wasn't used for a "small 360"
>
> I replaced it with (again, deepl translation)
>
> In the PC XT/370 plug-in card, which enabled a System/370 on a
> PC/XT, a 68000 with modified microcode was used to execute some of
> the System/370 instructions directly. The remaining instructions
> were emulated on another 68000.
>
> which is, I believe, accurate.

Sorry about that, they had overlapping projects.

I found what looks like the patent for the XT/370 processor
but it still doesn't say how they did the decoder.
There don't seem to be any other technical details on it.

Methods for partitioning mainframe instruction sets to
implement microprocessor based emulation thereof, 1982
https://patents.google.com/patent/US4514803

which is basically a copy of the published paper

[paywalled]
Microprocessor Implementation of Mainframe Processors by
Means of Architecture Partitioning, IBM J. RES. DEVELOP 1982
https://ieeexplore.ieee.org/abstract/document/5390522/

Re: IBM Mainframe market was Re: Approximate reciprocals

<87r15wzzdu.fsf@localhost>

https://www.novabbs.com/devel/article-flat.php?id=24803&group=comp.arch#24803

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lyn...@garlic.com (Anne & Lynn Wheeler)
Newsgroups: comp.arch
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
Date: Sat, 16 Apr 2022 07:55:41 -1000
Organization: Wheeler&Wheeler
Lines: 77
Message-ID: <87r15wzzdu.fsf@localhost>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com>
<P3M5K.419982$iK66.339487@fx46.iad>
<pNh6K.167452$8V_7.154516@fx04.iad>
<t3ca2a$dpk$1@newsreader4.netcologne.de> <t3d0ql$2h1r$1@gal.iecc.com>
<GSB6K.86844$Kdf.36873@fx96.iad>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="ac7204ec9c0b4d7fddd5ca5e2cd19dd0";
logging-data="32381"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/71xlVcTC+/Tn5g0UNopVNaSjikF7Cofg="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:ZTZu+2E8fC7QUBxmLLGTg9afHzI=
sha1:3AhCNLeohoPpXLOG64f33hIshfI=

by: Anne & Lynn Wheeler - Sat, 16 Apr 2022 17:55 UTC

EricP <ThatWouldBeTelling@thevillage.com> writes:
> In Apr-1988 IBM announced the IBM 7437 VM/SP Technical Workstation
> which later became the Personal/370, aka P/370, is a complete
> 32-bit 370 on a micro-channel adapter (MCA) card for PS/2 or RS/6000.
> It was only available to OEM's.
> That sounds like it might have been the Micro-370 chip.
>
> https://en.wikipedia.org/wiki/PC-based_IBM_mainframe-compatible_systems#Personal/370

upthread I posted doing some work for A74 (which gets announced as 7437
.... with email reference from '85 & '86) ... running at 350KIPS (better
than the xt/370 100KIPS).
http://www.garlic.com/~lynn/2015d.html#email850503
http://www.garlic.com/~lynn/2000e.html#email880622

other XT/370 (AT/370) email
http://www.garlic.com/~lynn/2015e.html#email850617

other a74 email
http://www.garlic.com/~lynn/2015d.html#email85020
http://www.garlic.com/~lynn/2015d.html#email85020b
http://www.garlic.com/~lynn/2015d.html#email85020c
http://www.garlic.com/~lynn/2015d.html#email85020d

mid-80s, IBM Boeblingen (Germany) also did ("ROMAN") 3-chip 370 running
at 3MIPS ... I was looking at how many I could cram into a rack. trivia:
There were monthly meetings at (Stanford) SLAC where I would see a lot of
people in Silicon valley ... including many of the Amdahl people. One of
them on business trip to Nixdorf in Germany found they had a bootlegged
copy of the ROMAN specification. He confiscated it and mailed it to me
for mailing back to Boeblingen.

'85 ROMAN email
http://www.garlic.com/~lynn/2007c.html#email850712

'85 email mentioning HSDT & supposed to do interconnect for the
NSF supercomputing centers, as well as A74 and ROMAN work
http://www.garlic.com/~lynn/2011c.html#email850425b
same archived post with email mentioning HSDT pitch for Berkeley, NSF,
NCAR, etc
http://www.garlic.com/~lynn/2011c.html#email850426

posts also reference was supposed to do pitch to NSF Director on NSF
supercomputer interconnect at same time (IBM) yorktown wanting me for a
week to talk about how many processor chips could be crammed into a
rack (and in addition to all ROMAN 370 chips, how many Blue Iliad,
1st 32bit 801/risc chip) ... misc other
http://www.garlic.com/~lynn/2021g.html#email850312
http://www.garlic.com/~lynn/2021g.html#email850313
http://www.garlic.com/~lynn/2021g.html#email850314
http://www.garlic.com/~lynn/2007d.html#email850315

more HSDT/NSF topic drift: congress cuts the (NSF) budget, some other
things happen, and eventually an RFP is released ... preliminary
announce (Mar1986)
http://www.garlic.com/~lynn/2002k.html#12
The OASC has initiated three programs: The Supercomputer Centers Program
to provide Supercomputer cycles; the New Technologies Program to foster
new supercomputer software and hardware developments; and the Networking
Program to build a National Supercomputer Access Network - NSFnet.

.... internal IBM politics prevent us from bidding on the RFP. the NSF
director tries to help by writing the company a letter (with support
from other agencies), but that just makes the internal politics worse
(as did claims that what we already had running was at least 5yrs ahead
of the winning bid). The winning bid doesn't even install T1 links
called for ... they are 440kbit/sec links ... but apparently to make it
look like it's meeting the requirements, they install telco multiplexors
with T1 trunks (running multiple links/trunk). We periodically ridicule
them, asking why they don't call it a T5 network (because some of those T1
trunks would in turn be multiplexed over T3 or even T5 trunks). as
regional networks connect in, it becomes the NSFNET backbone, precursor
to modern internet
https://www.technologyreview.com/s/401444/grid-computing/

--
virtualization experience starting Jan1968, online at home since Mar1970

Re: lock me up, was IBM Mainframe market

<t3f005$2f11$1@gal.iecc.com>

https://www.novabbs.com/devel/article-flat.php?id=24804&group=comp.arch#24804

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!not-for-mail
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Sat, 16 Apr 2022 17:56:21 -0000 (UTC)
Organization: Taughannock Networks
Message-ID: <t3f005$2f11$1@gal.iecc.com>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <t3a5b0$8b3$1@dont-email.me> <t3aie4$qiv$1@gal.iecc.com> <2022Apr16.142426@mips.complang.tuwien.ac.at>
Injection-Date: Sat, 16 Apr 2022 17:56:21 -0000 (UTC)
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="80929"; mail-complaints-to="abuse@iecc.com"
In-Reply-To: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <t3a5b0$8b3$1@dont-email.me> <t3aie4$qiv$1@gal.iecc.com> <2022Apr16.142426@mips.complang.tuwien.ac.at>
Cleverness: some
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: johnl@iecc.com (John Levine)

by: John Levine - Sat, 16 Apr 2022 17:56 UTC

According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>Everybody knows that. It's Amdahl's law. But we are not talking
>about ten computers of speed X and one with speed 10X here, we are
>talking about several computers with up to 128 cores (if we take a
>dual-socket EPYC system; but if you get the distributed software
>right, it may be cheaper to use more smaller systems) against one
>computer with up to about 240 cores that costs maybe 10x as much; the
>cores themselves have little speed difference.

To reiterate once again the point I'm trying to make, we all know that
you can get a lot more nominal MIPS/$ with a loosely coupled distributed
system. The problem is that some database stuff is inherently serial,
you can't just wave it away, and that's what keeps mainframes relevant.

>And since you mention per-bank systems, the whole stuff is distributed
>anyway: the ATM is situated at one bank (or belongs to one ATM
>company), the user's account often at a different one, they have to do
>transactions on each bank's computers.

Of course, but if my wife, my daughter, and I all try to take $100 out
of our bank account at the same time from three different places and
there's only $200 in the account, one of us has to lose. Some stuff is
inherently serial.
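
The three-withdrawals case can be sketched in a single process as follows. This is only an illustration: the `Account` class and `run_withdrawals` helper are hypothetical, and threads in one process stand in for three ATMs. The check of the balance and the update have to happen as one atomic step on that account.

```python
import threading


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        # Check and update under one lock, so two withdrawals can never
        # both see the same "sufficient" balance.
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True


def run_withdrawals(account: Account, amounts):
    """Attempt all withdrawals concurrently; return success flags."""
    results = [None] * len(amounts)

    def worker(i, amount):
        results[i] = account.withdraw(amount)

    threads = [threading.Thread(target=worker, args=(i, amt))
               for i, amt in enumerate(amounts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


joint = Account(200)
outcomes = run_withdrawals(joint, [100, 100, 100])
```

Exactly two of the three attempts succeed, though which two depends on scheduling. The serialization here is scoped to the one account being drawn on.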

>>exactly because of the lock contention.
>
>But that's the point I am trying to get across: For the kind of
>transactions mentioned here, lock contention is no problem:

Sorry, I don't know how to explain this any more clearly. You can make
a lot of locks go away, but you can't make all the locks go away, and
if you have to deal with locks, one fast machine does it a lot better
than multiple slow machines.

>2) Also, you don't need to hold locks on both accounts involved. The
>classic way (from before computing) of money transfer is that the
>sending account's bank is taking the money off the one account, sends
>it over to the receiving account's bank, and the receiving bank puts
>the money into the receiving account.

I know, we still have paper checks in the US. But the era when you had
three days to bounce a check is long gone. The US Federal Reserve is
developing a scheme called FedNow which is supposed to do realtime P2P
bank payments next year, similar to what the UK and other countries
already have. I wouldn't dump my IBM stock just yet.

R's,
John
--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Re: lock me up, was IBM Mainframe market

<fa5e14d4-5b73-41c2-9a49-55a2907eeaffn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24805&group=comp.arch#24805

X-Received: by 2002:a05:622a:1453:b0:2f1:d20c:63f8 with SMTP id v19-20020a05622a145300b002f1d20c63f8mr3014011qtx.670.1650138690974;
Sat, 16 Apr 2022 12:51:30 -0700 (PDT)
X-Received: by 2002:a05:6808:2219:b0:322:759c:a394 with SMTP id
bd25-20020a056808221900b00322759ca394mr572449oib.54.1650138690704; Sat, 16
Apr 2022 12:51:30 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 16 Apr 2022 12:51:30 -0700 (PDT)
In-Reply-To: <t3f005$2f11$1@gal.iecc.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <t3a5b0$8b3$1@dont-email.me>
<t3aie4$qiv$1@gal.iecc.com> <2022Apr16.142426@mips.complang.tuwien.ac.at> <t3f005$2f11$1@gal.iecc.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fa5e14d4-5b73-41c2-9a49-55a2907eeaffn@googlegroups.com>
Subject: Re: lock me up, was IBM Mainframe market
From: already5...@yahoo.com (Michael S)
Injection-Date: Sat, 16 Apr 2022 19:51:30 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 86

by: Michael S - Sat, 16 Apr 2022 19:51 UTC

On Saturday, April 16, 2022 at 8:56:24 PM UTC+3, John Levine wrote:
> According to Anton Ertl <an...@mips.complang.tuwien.ac.at>:
> >Everybody knows that. It's Amdahl's law. But we are not talking
> >about ten computers of speed X and one with speed 10X here, we are
> >talking about several computers with up to 128 cores (if we take a
> >dual-socket EPYC system; but if you get the distributed software
> >right, it may be cheaper to use more smaller systems) against one
> >computer with up to about 240 cores that costs maybe 10x as much; the
> >cores themselves have little speed difference.
> To reiterate once again the point I'm trying to make, we all know that
> you can get a lot more nominal MIPS/$ with a loosely coupled distributed
> system. The problem is that some database stuff is inherently serial,
> you can't just wave it away, and that's what keeps mainframes relevant.

Except that mainframes are relative newcomers to Big-SMP.
The wave started with Sun Fire.
The lead in the area of big SMP was split between HPC-oriented SGI machines,
which were very big but whose worst-case latency of remote memory access was also
relatively big, and business-oriented IBM Power (p690, p5 595, p6 595, p795) and HP
Superdome, which were smaller than the top SGI machines but featured faster remote memory
access, at least on an idle or near-idle machine.

It's not that during this period S/360 derivatives (i.e. z series/system z) didn't offer SMP
models. They did. But top z tended to be 2-3 times smaller than top POWER or Superdome.

But around the start of the 2010s SGI died; HP had outgrown IBM as a business, but practically
died as a leading-edge R&D organization. And IBM's Power group lost enthusiasm for big
SMP. They still released E880 (in 2016?), but it was a little smaller than the biggest p795.

All this allowed Big Iron to catch up with Big Tin and eventually to top it.
It seems the latest z still does not have as many cores as the biggest p795, but its cores are
much newer and supposedly faster.
I say "supposedly" because no benchmarks have been published.

> >And since you mention per-bank systems, the whole stuff is distributed
> >anyway: the ATM is situated at one bank (or belongs to one ATM
> >company), the user's account often at a different one, they have to do
> >transactions on each bank's computers.
> Of course, but if my wife, my daughter, and I all try to take $100 out
> of our bank account at the same time from three different places and
> there's only $200 in the account, one of us has to lose. Some stuff is
> inherently serial.
> >>exactly because of the lock contention.
> >
> >But that's the point I am trying to get across: For the kind of
> >transactions mentioned here, lock contention is no problem:
> Sorry, I don't know how to explain this any more clearly. You can make
> a lot of locks go away, but you can't make all the locks go away, and
> if you have to deal with locks, one fast machine does it a lot better
> than multiple slow machines.

The speed of communication between cores depends mostly on the physical layers
of the COMM and on the number of hops.
Communication within SMP is not inherently faster than communication in a cluster
with equally good links. More like the opposite, at least for some sorts of communication
patterns. IBM itself demonstrated it to the world 20 years ago, when their BlueGene
clusters were winning ping-pong latency benchmarks over SGI's contemporary big-SMP.

Back to 2022, the number of hops in a 240-core z and in a small cluster of EPYC3
machines that has approximately the same # of cores, or even two times more,
is more or less the same.
In Intel-based systems the # of hops would be smaller, because Intel has more
cores per die. But right now Intel is not competitive for other reasons. Specifically,
some of their products (Cascade Lake) are based on very old cores and others
(Ice Lake server) are screwed by struggling silicon tech.
But with Sapphire Rapids Intel should take a significant lead in the field of speed of
remote memory access. That is, unless they screw something up again.

> >2) Also, you don't need to hold locks on both accounts involved. The
> >classic way (from before computing) of money transfer is that the
> >sending account's bank is taking the money off the one account, sends
> >it over to the receiving account's bank, and the receiving bank puts
> >the money into the receiving account.
> I know, we still have paper checks in the US. But the era when you had
> three days to bounce a check is long gone. The US Federal Reserve is
> developing a scheme called FedNow which is supposed to do realtime P2P
> bank payments next year, similar to what the UK and other countries
> already have. I wouldn't dump my IBM stock just yet.
>
>
> R's,
> John
> --
> Regards,
> John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly

Re: IBM Mainframe market was Re: Approximate reciprocals

<87wnfoyate.fsf@localhost>

https://www.novabbs.com/devel/article-flat.php?id=24806&group=comp.arch#24806

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: lyn...@garlic.com (Anne & Lynn Wheeler)
Newsgroups: comp.arch
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
Date: Sat, 16 Apr 2022 11:31:41 -1000
Organization: Wheeler&Wheeler
Lines: 15
Message-ID: <87wnfoyate.fsf@localhost>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com>
<P3M5K.419982$iK66.339487@fx46.iad>
<pNh6K.167452$8V_7.154516@fx04.iad>
<t3ca2a$dpk$1@newsreader4.netcologne.de> <t3d0ql$2h1r$1@gal.iecc.com>
<GSB6K.86844$Kdf.36873@fx96.iad> <87r15wzzdu.fsf@localhost>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="ac7204ec9c0b4d7fddd5ca5e2cd19dd0";
logging-data="27530"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/rDiejTwrIAH1d5KrGSbe1q68Wgl6WIyg="
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:AvNZdL+L1bKVlHUQcUbU+T/NVjY=
sha1:Zdnj9lJRzxQ+qmyMyoCmVjLGi2A=

by: Anne & Lynn Whee - Sat, 16 Apr 2022 21:31 UTC

Anne & Lynn Wheeler <lynn@garlic.com> writes:
> other a74 email
> http://www.garlic.com/~lynn/2015d.html#email85020
> http://www.garlic.com/~lynn/2015d.html#email85020b
> http://www.garlic.com/~lynn/2015d.html#email85020c
> http://www.garlic.com/~lynn/2015d.html#email85020d

typo, dropped a digit
http://www.garlic.com/~lynn/2015d.html#email850520
http://www.garlic.com/~lynn/2015d.html#email850520b
http://www.garlic.com/~lynn/2015d.html#email850520c
http://www.garlic.com/~lynn/2015d.html#email850520d

--
virtualization experience starting Jan1968, online at home since Mar1970

Re: IBM Mainframe market was Re: Approximate reciprocals

<3758f6fb-477c-459b-95e1-50f550844a6an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24807&group=comp.arch#24807

X-Received: by 2002:a05:620a:2552:b0:67b:32e2:2400 with SMTP id s18-20020a05620a255200b0067b32e22400mr3328983qko.768.1650168579743;
Sat, 16 Apr 2022 21:09:39 -0700 (PDT)
X-Received: by 2002:a05:6808:2185:b0:2d9:ebf0:fb66 with SMTP id
be5-20020a056808218500b002d9ebf0fb66mr2498944oib.69.1650168579507; Sat, 16
Apr 2022 21:09:39 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sat, 16 Apr 2022 21:09:39 -0700 (PDT)
In-Reply-To: <t3dqov$dmi$1@newsreader4.netcologne.de>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:6947:3c86:73e1:a64e;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:6947:3c86:73e1:a64e
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <P3M5K.419982$iK66.339487@fx46.iad>
<pNh6K.167452$8V_7.154516@fx04.iad> <t3ca2a$dpk$1@newsreader4.netcologne.de>
<t3d0ql$2h1r$1@gal.iecc.com> <t3dqov$dmi$1@newsreader4.netcologne.de>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3758f6fb-477c-459b-95e1-50f550844a6an@googlegroups.com>
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 17 Apr 2022 04:09:39 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 22

by: Quadibloc - Sun, 17 Apr 2022 04:09 UTC

On Saturday, April 16, 2022 at 1:21:06 AM UTC-6, Thomas Koenig wrote:

> I now replaced it with (again, deepl translation)
> In the [[PC XT/370]] plug-in card, which enabled a [[System/370]]
> on an [[IBM Personal Computer XT|PC/XT]], a 68000 with modified
> microcode was used to execute some of the System/370 instructions
> directly. The remaining instructions were emulated on another
> 68000 or executed on the [[Intel 8087|8087]] used as [[floating
> point unit|FPU]].
>
> which should be accurate.

I went and checked; the Wikipedia article gave a reference to an IBM
disclosure on the Prior Art Database, an archived copy of the site was referenced,
from which I could read an abbreviated version of the text - and the description
matches that. The second 68000 was slightly modified, not by changing its
instruction set, but by adding an address pin to facilitate intercommunication with
the master 68000 which executed the basic 370 subset.

However, the 8087 was also modified; remember, the 370 wasn't using BFP yet
at that time.

John Savard

Re: IBM Mainframe market was Re: Approximate reciprocals

<F%V6K.92585$n41.21057@fx35.iad>

https://www.novabbs.com/devel/article-flat.php?id=24809&group=comp.arch#24809

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx35.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: IBM Mainframe market was Re: Approximate reciprocals
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <P3M5K.419982$iK66.339487@fx46.iad> <pNh6K.167452$8V_7.154516@fx04.iad> <t3ca2a$dpk$1@newsreader4.netcologne.de> <t3d0ql$2h1r$1@gal.iecc.com> <GSB6K.86844$Kdf.36873@fx96.iad> <87r15wzzdu.fsf@localhost>
In-Reply-To: <87r15wzzdu.fsf@localhost>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 24
Message-ID: <F%V6K.92585$n41.21057@fx35.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Sun, 17 Apr 2022 15:04:05 UTC
Date: Sun, 17 Apr 2022 11:03:32 -0400
X-Received-Bytes: 1967

by: EricP - Sun, 17 Apr 2022 15:03 UTC

Anne & Lynn Wheeler wrote:
> EricP <ThatWouldBeTelling@thevillage.com> writes:
>> In Apr-1988 IBM announced the IBM 7437 VM/SP Technical Workstation
>> which later became the Personal/370, aka P/370, is a complete
>> 32-bit 370 on a micro-channel adapter (MCA) card for PS/2 or RS/6000.
>> It was only available to OEM's.
>> That sounds like it might have been the Micro-370 chip.
>>
>> https://en.wikipedia.org/wiki/PC-based_IBM_mainframe-compatible_systems#Personal/370
>
> upthread I posted doing some work for A74 (which gets announced as 7437
> .... with email reference from '85 & '86) ... running at 350KIPS (better
> than the xt/370 100KIPS).

That 350 KIPS performance matches what the Micro-370 documents say,
that it was 250 KIPS at 10 MHz and early tests were up to 16 MHz,
which would be ~400 KIPS. Possibly they later got to 800 KIPS.
Call the Micro-370 a second generation chip.

That would probably make the 3.5 MIPS Personal/370
on the Wikipedia page a different, third-generation 370 chip.

Re: lock me up, was IBM Mainframe market

<2022Apr18.184748@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=24810&group=comp.arch#24810

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Mon, 18 Apr 2022 16:47:48 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 106
Message-ID: <2022Apr18.184748@mips.complang.tuwien.ac.at>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com> <t3a5b0$8b3$1@dont-email.me> <t3aie4$qiv$1@gal.iecc.com> <2022Apr16.142426@mips.complang.tuwien.ac.at> <t3f005$2f11$1@gal.iecc.com>
Injection-Info: reader02.eternal-september.org; posting-host="879918baff984b89c43eb70ef59c4892";
logging-data="25229"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180ORAdWGyf00+QMbXMi5ke"
Cancel-Lock: sha1:7Aw4lkVZqwYMXZKUGVnArjA8L4Y=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Mon, 18 Apr 2022 16:47 UTC

John Levine <johnl@taugh.com> writes:
>According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>Everybody knows that. It's Amdahl's law. But we are not talking
>>about ten computers of speed X and one with speed 10X here, we are
>>talking about several computers with up to 128 cores (if we take a
>>dual-socket EPYC system; but if you get the distributed software
>>right, it may be cheaper to use more smaller systems) against one
>>computer with up to about 240 cores that costs maybe 10x as much; the
>>cores themselves have little speed difference.
>
>To reiterate once again the point I'm trying to make, we all know that
>you can get a lot more nominal MIPS/$ with a loosely coupled distributed
>system. The problem is that some database stuff is inherently serial,
>you can't just wave it away, and that's what keeps mainframes relevant.

Yes, there is database stuff that requires shared memory, and that
keeps big systems around. For the inherently serial ones, a
single-core system is enough, but maybe the RAM or I/O requirements
require some big system.

When discussing these requirements, give examples where shared memory
is necessary/that are inherently serial. ATM processing is not such
an example; indeed it requires distributed processing. Credit card
processing is not such an example; while it can be performed on a
shared-memory machine, it can also be performed in a distributed way.

Certainly the three different ATMs will send their requests to the
computer holding your account, and that computer will process them one
after the other, and the last request will be denied. But for
processing this account of yours a single-core computer with meagre
memory is sufficient; given the resources of a modern PC, it can
process a large number of accounts (not because the problem requires
it, but because it's much cheaper than the PC-per-account setup).
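
A minimal sketch of that serial processing, in Python. The function name,
the ATM labels and the amounts (three requests of 100 against a balance of
200) are made up for illustration; the point is only that one core working
through one account's requests in arrival order needs no shared memory.

```python
from collections import deque

def process_withdrawals(balance, requests):
    # Handle the requests for ONE account strictly in arrival order.
    queue = deque(requests)
    results = []
    while queue:
        atm, amount = queue.popleft()
        if amount <= balance:
            balance -= amount
            results.append((atm, "approved"))
        else:
            results.append((atm, "denied"))
    return balance, results

# Hypothetical example: three ATMs, 100 each, only 200 in the account.
balance, results = process_withdrawals(
    200, [("ATM-A", 100), ("ATM-B", 100), ("ATM-C", 100)]
)
print(balance, results)
```

The first two requests are approved and the last one is denied, whichever
ATM happens to arrive last.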

>>But that's the point I am trying to get across: For the kind of
>>transactions mentioned here, lock contention is no problem:
>
>Sorry, I don't know how to explain this any more clearly. You can make
>a lot of locks go away, but you can't make all the locks go away,

I did not claim that there would be no locks, just no lock
contention, except maybe if Amazon has all its transactions on one
account.

Your family members rarely try to take money out of your account at
the same time, and if they do, the locks on your account are probably
contended a few microseconds or (if the code has not been optimized
for holding the account lock for a short time) maybe milliseconds,
which is no problem for a transaction that typically takes many
seconds of the user's time.

>and
>if you have to deal with locks, one fast machine does it a lot better
>than multiple slow machines.

I don't think an IBM mainframe is any faster when dealing with a
single contended lock than, e.g., a PC.

A shared-memory system is probably faster at locking than a
distributed-locking protocol, but then you need an example that would
require distributed locking.

I was actually thinking about money transfers (what we do a lot in the
EU), not checks.

>But the era when you had
>three days to bounce a check is long gone.

And now they also only have one day to hold your money in their
balance on a money transfer. And the actual transfer could happen in
seconds, thanks to modern networking. My point is that money
transfers have never needed to lock both accounts, and the difference
in the times when the transaction happens on the accounts shows this
nicely. And ATM and credit card transactions can be implemented in
the same way (I don't know if they are), of course with much faster
transmission needed in the ATM case (but that's not a problem).
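
A rough sketch of such a transfer, in Python, assuming a made-up `Bank`
class where a queue stands in for the interbank network. Each bank only
ever takes its own lock, and never while the other bank's lock is held.
Message loss, retries and bookkeeping of money in transit are left out.

```python
import queue
import threading

class Bank:
    def __init__(self, accounts):
        self.accounts = dict(accounts)
        self.lock = threading.Lock()   # one lock per bank keeps the sketch short
        self.inbox = queue.Queue()     # stands in for the interbank network

    def send(self, src, amount, other_bank, dst):
        # Step 1: debit the sending account under the sending bank's lock only.
        with self.lock:
            if self.accounts[src] < amount:
                return False
            self.accounts[src] -= amount
        # Lock released: the money is now in transit, held by neither account.
        other_bank.inbox.put((dst, amount))
        return True

    def receive_all(self):
        # Step 2, possibly much later: credit under the receiving bank's lock.
        while not self.inbox.empty():
            dst, amount = self.inbox.get()
            with self.lock:
                self.accounts[dst] += amount

bank_a = Bank({"alice": 150})
bank_b = Bank({"bob": 20})
bank_a.send("alice", 100, bank_b, "bob")
print(bank_a.accounts, bank_b.accounts)   # debit done, credit still pending
bank_b.receive_all()
print(bank_a.accounts, bank_b.accounts)   # credit applied
```

The gap between the two printouts corresponds to the difference in booking
times on the two accounts.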

> I wouldn't dump my IBM stock just yet.

I have no IBM stock. I think that the mainframe business gives them
nice profits, but it's not the main source of revenue for IBM these
days, so even if all mainframe customers switched to something based
on PC technology, I don't expect a big dent in their stock.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Approximate reciprocals

<t3k9lo$af7$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24811&group=comp.arch#24811

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Mon, 18 Apr 2022 20:12:07 +0200
Organization: A noiseless patient Spider
Lines: 70
Message-ID: <t3k9lo$af7$1@dont-email.me>
References: <t1c154$j5t$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 18 Apr 2022 18:12:08 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7073742fa75404c478393c819c811ca8";
logging-data="10727"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jtlDzqTKAECMWuH9mdJ5Zk4Nrpy5cGU8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:bWuj6DZAk8semt5p3n2whS239fA=
In-Reply-To: <t1c154$j5t$1@dont-email.me>
Content-Language: en-US

by: Marcus - Mon, 18 Apr 2022 18:12 UTC

On 2022-03-22, Marcus wrote:
> Hello group!
>
> A class of instructions that is very tempting to include in an ISA is
> approximate floating-point reciprocals (1/x & 1/sqrt(x)), and
> possibly specialized instructions for improving precision
> (a Newton-Raphson step).
>
> Many ISAs have such instructions:
>
> CRAY:
> * 070ijx - Floating reciprocal approximation
> * 067ijk - Reciprocal iteration
>
> ARM:
> * FRECPE - Floating-point reciprocal estimate
> * FRECPS - Floating-point reciprocal step
> * FRSQRTE - Floating-point reciprocal square root estimate
> * FRSQRTS - Floating-point reciprocal square root step
>
> POWER:
> * FRES - Floating reciprocal estimate single
> * FRSQRTE - Floating reciprocal square root estimate
>
> x86:
> * RSQRTSS - Approximate reciprocal square root
>
> TI C67x:
> * RCPSP - Floating-Point reciprocal approximation
> * RSQRSP - Floating-Point reciprocal square root approximation
>
> ...and there are probably others.
>
> What are your feelings towards including such instructions in an ISA?
>
> My own feelings are mixed.
>
> Pros:
> * Easy to implement in hardware.
> * Can provide a significant speedup for certain workloads, especially
> if limited accuracy is acceptable.
>
> Cons:
> * Hard to specify exact operation.
> * Likely a source of poor portability (borderline undefined behavior).
> * The "step/iteration" instructions can usually be replaced by FMA.
>
> /Marcus

One more thing that I discovered about a week ago:

In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
*need* a special Newton-Raphson step instruction that handles the
special case where x = 0 and y = Inf.

Consider the approximation of y = 1 / x.

A single NR step then becomes: y' = y * (2 - x * y)

When x = 0, the approximation would return Inf, and thus we would end up
calculating x * y = 0 * Inf = NaN according to an IEEE 754 compliant
implementation.

The ARM instruction FRECPS (Floating-point reciprocal step) implements
2 - x * y, but handles the special case x == 0 && y == Inf (or vice
versa) and returns 2 instead of NaN. Thus the complete step would be
Inf * 2 = Inf, which is correct.
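
A small Python sketch of the difference (Python floats are IEEE 754
binary64 on common platforms). The names `nr_step_plain` and
`nr_step_special` are made up for illustration, and the second one only
mimics the 0 * Inf -> 2 rule described above; it is not a model of the
complete ARM instruction.

```python
import math

def nr_step_plain(x, y):
    # Plain IEEE 754 arithmetic: 0 * Inf yields NaN, which then propagates.
    return y * (2.0 - x * y)

def nr_step_special(x, y):
    # Hypothetical helper: use 2 for the 0 * Inf case (in either order),
    # otherwise compute 2 - x * y as usual.
    if (x == 0.0 and math.isinf(y)) or (math.isinf(x) and y == 0.0):
        step = 2.0
    else:
        step = 2.0 - x * y
    return y * step

x, y = 0.0, math.inf               # y stands in for the estimate of 1/0
print(nr_step_plain(x, y))         # nan
print(nr_step_special(x, y))       # inf
print(nr_step_special(4.0, 0.2))   # ordinary case, moves towards 0.25
```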

/Marcus

Re: lock me up, was IBM Mainframe market

<t3kckj$22h$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24812&group=comp.arch#24812

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: lock me up, was IBM Mainframe market
Date: Mon, 18 Apr 2022 12:02:42 -0700
Organization: A noiseless patient Spider
Lines: 115
Message-ID: <t3kckj$22h$1@dont-email.me>
References: <16su4hdjofh949len5eha1ncb73r4av8oe@4ax.com>
<t3a5b0$8b3$1@dont-email.me> <t3aie4$qiv$1@gal.iecc.com>
<2022Apr16.142426@mips.complang.tuwien.ac.at> <t3f005$2f11$1@gal.iecc.com>
<2022Apr18.184748@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 18 Apr 2022 19:02:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="1fc4e32a84b6e222c1b12f13633302bc";
logging-data="2129"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18TSI3UVZo4dQs7k56kMO76"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Cancel-Lock: sha1:Y3J3F2agSvUMBkMh9HZZDb9ge6I=
In-Reply-To: <2022Apr18.184748@mips.complang.tuwien.ac.at>
Content-Language: en-US

by: Ivan Godard - Mon, 18 Apr 2022 19:02 UTC

On 4/18/2022 9:47 AM, Anton Ertl wrote:
> John Levine <johnl@taugh.com> writes:
>> According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
>>> Everybody knows that. It's Amdahl's law. But we are not talking
>>> about ten computers of speed X and one with speed 10X here, we are
>>> talking about several computers with up to 128 cores (if we take a
>>> dual-socket EPYC system; but if you get the distributed software
>>> right, it may be cheaper to use more smaller systems) against one
>>> computer with up to about 240 cores that costs maybe 10x as much; the
>>> cores themselves have little speed difference.
>>
>> To reiterate once again the point I'm trying to make, we all know that
>> you can get a lot more nominal MIPS/$ with a loosely coupled distributed
>> system. The problem is that some database stuff is inherently serial,
>> you can't just wave it away, and that's what keeps mainframes relevant.
>
> Yes, there is database stuff that requires shared memory, and that
> keeps big systems around. For the inherently serial ones, a
> single-core system is enough, but maybe the RAM or I/O requirements
> require some big system.
>
> When discussing these requirements, give examples where shared memory
> is necessary/that are inherently serial. ATM processing is not such
> an example; indeed it requires distributed processing. Credit card
> processing is not such an example; while it can be performed on a
> shared-memory machine, it can also be performed in a distributed way.
>
>>> And since you mention per-bank systems, the whole stuff is distributed
>>> anyway: the ATM is situated at one bank (or belongs to one ATM
>>> company), the user's account often at a different one, they have to do
>>> transactions on each bank's computers.
>>
>> Of course, but if my wife, my daughter, and I all try to take $100 out
>> of our bank account at the same time from three different places and
>> there's only $200 in the account, one of us has to lose. Some stuff is
>> inherently serial.
>
> Certainly the three different ATMs will send their requests to the
> computer holding your account, and that computer will process them one
> after the other, and the last request will be denied. But for
> processing this account of yours a single-core computer with meagre
> memory is sufficient; given the resources of a modern PC, it can
> process a large number of accounts (not because the problem requires
> it, but because it's much cheaper than the PC-per-account setup).
>
>>> But that's the point I am trying to get across: For the kind of
>>> transactions mentioned here, lock contention is no problem:
>>
>> Sorry, I don't know how to explain this any more clearly. You can make
>> a lot of locks go away, but you can't make all the locks go away,
>
> I did not claim that there would be no locks, just no lock
> contention, except maybe if Amazon has all its transactions on one
> account.
>
> Your family members rarely try to take money out of your account at
> the same time, and if they do, the locks on your account are probably
> contended a few microseconds or (if the code has not been optimized
> for holding the account lock for a short time) maybe milliseconds,
> which is no problem for a transaction that typically takes many
> seconds of the user's time.
>
>> and
>> if you have to deal with locks, one fast machine does it a lot better
>> than multiple slow machines.
>
> I don't think an IBM mainframe is any faster when dealing with a
> single contended lock than, e.g., a PC.
>
> A shared-memory system is probably faster at locking than a
> distributed-locking protocol, but then you need an example that would
> require distributed locking.
>
>>> 2) Also, you don't need to hold locks on both accounts involved. The
>>> classic way (from before computing) of money transfer is that the
>>> sending account's bank is taking the money off the one account, sends
>>> it over to the receiving account's bank, and the receiving bank puts
>>> the money into the receiving account.
>>
>> I know, we still have paper checks in the US.
>
> I was actually thinking about money transfers (what we do a lot in the
> EU), not checks.
>
>> But the era when you had
>> three days to bounce a check is long gone.
>
> And now they also only have one day to hold your money in their
> balance on a money transfer. And the actual transfer could happen in
> seconds, thanks to modern networking. My point is that money
> transfers have never needed to lock both accounts, and the difference
> in the times when the transaction happens on the accounts shows this
> nicely. And ATM and credit card transactions can be implemented in
> the same way (I don't know if they are), of course with much faster
> transmission needed in the ATM case (but that's not a problem).
>
>> I wouldn't dump my IBM stock just yet.
>
> I have no IBM stock. I think that the mainframe business gives them
> nice profits, but it's not the main source of revenue for IBM these
> days, so even if all mainframe customers switched to something based
> on PC technology, I don't expect a big dent in their stock.
>
> - anton

That's not actually the way distributed accounting works (or at least
worked). There are no locks. If there were locks, then if the
connection went down after acquiring the lock the parties could be
locked out of the account indefinitely. The naive way to deal with that
is to have the transaction time out, but that's the Two Generals'
Problem. (https://en.wikipedia.org/wiki/Two_Generals%27_Problem)

The way it used to be done (back when I worked in the area) was a
distributed two-phase commit with a voting algorithm to deal with loss
of the commit node(s). How it's done now I have no idea.
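
For readers who have not seen it, a bare-bones sketch of the two-phase
commit idea in Python. The `Participant` class and `two_phase_commit`
function are hypothetical names for illustration. The sketch runs in one
process and leaves out exactly the hard parts mentioned above: lost
connections, logging, and the voting algorithm that replaces a lost commit
node.

```python
class Participant:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None

    def prepare(self, delta):
        # Phase 1: vote yes only if the change could be applied.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the prepared change permanent.
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def abort(self):
        # Phase 2 (someone voted no): drop the prepared change.
        self.pending = None

def two_phase_commit(changes):
    # changes: list of (participant, delta) pairs forming one transaction.
    votes = [p.prepare(delta) for p, delta in changes]
    if all(votes):
        for p, _ in changes:
            p.commit()
        return True
    for p, _ in changes:
        p.abort()
    return False

src = Participant("src", 200)
dst = Participant("dst", 0)
print(two_phase_commit([(src, -100), (dst, +100)]), src.balance, dst.balance)
print(two_phase_commit([(src, -500), (dst, +500)]), src.balance, dst.balance)
```

The second transaction is rejected in the prepare phase, so neither balance
changes.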

Re: Approximate reciprocals

<28ead543-b6e3-4c6d-a346-aa3e5a171811n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24813&group=comp.arch#24813

X-Received: by 2002:ac8:110a:0:b0:2f1:ea84:b84 with SMTP id c10-20020ac8110a000000b002f1ea840b84mr8046001qtj.463.1650308855239;
Mon, 18 Apr 2022 12:07:35 -0700 (PDT)
X-Received: by 2002:a05:6870:204c:b0:da:b3f:2b86 with SMTP id
l12-20020a056870204c00b000da0b3f2b86mr6485850oad.293.1650308854962; Mon, 18
Apr 2022 12:07:34 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 18 Apr 2022 12:07:34 -0700 (PDT)
In-Reply-To: <t3k9lo$af7$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5d5:4aad:8adc:41a3;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5d5:4aad:8adc:41a3
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <28ead543-b6e3-4c6d-a346-aa3e5a171811n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 18 Apr 2022 19:07:35 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 90

by: MitchAlsup - Mon, 18 Apr 2022 19:07 UTC

On Monday, April 18, 2022 at 1:12:11 PM UTC-5, Marcus wrote:
> On 2022-03-22, Marcus wrote:
> > Hello group!
> >
> > A class of instructions that is very tempting to include in an ISA is
> > approximate floating-point reciprocals (1/x & 1/sqrt(x)), and
> > possibly specialized instructions for improving precision
> > (a Newton-Raphson step).
> >
> > Many ISAs have such instructions:
> >
> > CRAY:
> > * 070ijx - Floating reciprocal approximation
> > * 067ijk - Reciprocal iteration
> >
> > ARM:
> > * FRECPE - Floating-point reciprocal estimate
> > * FRECPS - Floating-point reciprocal step
> > * FRSQRTE - Floating-point reciprocal square root estimate
> > * FRSQRTS - Floating-point reciprocal square root step
> >
> > POWER:
> > * FRES - Floating reciprocal estimate single
> > * FRSQRTE - Floating reciprocal square root estimate
> >
> > x86:
> > * RSQRTSS - Approximate reciprocal square root
> >
> > TI C67x:
> > * RCPSP - Floating-Point reciprocal approximation
> > * RSQRSP - Floating-Point reciprocal square root approximation
> >
> > ...and there are probably others.
> >
> > What are your feelings towards including such instructions in an ISA?
> >
> > My own feelings are mixed.
> >
> > Pros:
> > * Easy to implement in hardware.
> > * Can provide a significant speedup for certain workloads, especially
> > if limited accuracy is acceptable.
> >
> > Cons:
> > * Hard to specify exact operation.
> > * Likely a source of poor portability (borderline undefined behavior).
> > * The "step/iteration" instructions can usually be replaced by FMA.
> >
> > /Marcus
>
>
> One more thing that I discovered about a week ago:
>
> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> *need* a special Newton-Raphson step instruction that handles the
> special case where x = 0 and y = Inf.
>
> Consider the approximation of y = 1 / x.
>
> A single NR step then becomes: y' = y * (2 - x * y)
>
> When x = 0, the approximation would return Inf, and thus we would end up
> calculating x * y = 0 * Inf = NaN according to an IEEE 754 compliant
> implementation.
<
I am going to complain about the word implementation.
<
How one performs RCP and RSQRT is a choice and there is a fairly rich set
of algorithms; Newton-Raphson being only 1 of those.
<
One of the properties of N-R iterations is (given a reasonable starting point)
each iteration doubles the number of bits of accuracy. Here is a situation
where an N-R iteration is performed where the rules of IEEE math cause a
wrong answer to lose all accuracy rather than doubling the accuracy.
<
N-R iterations also have the property that they are self correcting (a small
error in the starting point is reduced quadratically with iteration).
<
Neither of which happens when x=0 and y=Inf. I blame this on the IEEE
specification, not on any arithmetic properties of N-R iterations.
<
<
But this also serves to illustrate why this should be done in HW where all
the special checks can be performed at 0 cycle cost.
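
To put numbers on the doubling: with e = 1 - x*y as the relative error of
the estimate, one step y' = y * (2 - x*y) gives 1 - x*y' = (1 - x*y)^2, so
the error is squared each time. A short Python sketch, where the function
name and the starting values (x = 3, first guess 0.3) are arbitrary choices
for illustration:

```python
def nr_reciprocal_errors(x, y0, steps):
    # Record the relative error e = 1 - x*y before and after each N-R step.
    errors = []
    y = y0
    for _ in range(steps):
        errors.append(1.0 - x * y)
        y = y * (2.0 - x * y)
    errors.append(1.0 - x * y)
    return y, errors

y, errors = nr_reciprocal_errors(3.0, 0.3, 4)
for e in errors:
    print(f"{e:.3e}")
print(y)
```

The printed errors go roughly 1e-1, 1e-2, 1e-4, 1e-8 and then hit the
rounding limit of double precision.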
>
> The ARM instruction FRECPS (Floating-point reciprocal step) implements
> 2 - x * y, but handles the special case x == 0 && y == Inf (or vice
> versa) and returns 2 instead of NaN. Thus the complete step would be
> Inf * 2 = Inf, which is correct.
>
> /Marcus

Marcus wrote:
> One more thing that I discovered about a week ago:
>
> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> *need* a special Newton-Raphson step instruction that handles the
> special case where x = 0 and y = Inf.

The obvious way to handle this, if you have a little bit of HW help in
the form of an input number(s) classifier, is to take out all the
special cases and merge these results with what you get from the normal
algorithm.

>
> Consider the approximation of y = 1 / x.
>
> A single NR step then becomes: y' = y * (2 - x * y)
>
> When x = 0, the approximation would return Inf, and thus we would end up
> calculating x * y = 0 * Inf = NaN according to an IEEE 754 compliant
> implementation.
>
> The ARM instruction FRECPS (Floating-point reciprocal step) implements
> 2 - x * y, but handles the special case x == 0 && y == Inf (or vice
> versa) and returns 2 instead of NaN. Thus the complete step would be
> Inf * 2 = Inf, which is correct.

The alternative is as you show here, which needs HW help in the form of
that special reciprocal step op.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24815&group=comp.arch#24815

X-Received: by 2002:a05:6214:d48:b0:446:3a33:2dba with SMTP id 8-20020a0562140d4800b004463a332dbamr12785147qvr.78.1650397326996;
Tue, 19 Apr 2022 12:42:06 -0700 (PDT)
X-Received: by 2002:a05:6830:1d93:b0:605:42d1:d911 with SMTP id
y19-20020a0568301d9300b0060542d1d911mr6068897oti.158.1650397326763; Tue, 19
Apr 2022 12:42:06 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 19 Apr 2022 12:42:06 -0700 (PDT)
In-Reply-To: <t3lqpf$nea$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5d49:f811:81d8:6fed;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5d49:f811:81d8:6fed
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me> <t3lqpf$nea$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 19 Apr 2022 19:42:06 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 43

by: MitchAlsup - Tue, 19 Apr 2022 19:42 UTC

On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> Marcus wrote:
> > One more thing that I discovered about a week ago:
> >
> > In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> > *need* a special Newton-Raphson step instruction that handles the
> > special case where x = 0 and y = Inf.
> The obvious way to handle this, if you have a little bit of HW help in
> the form of an input number(s) classifier, is to take out all the
> special cases and merge these results with what you get from the normal
> algorithm.
> >
> > Consider the approximation of y = 1 / x.
> >
> > A single NR step then becomes: y' = y * (2 - x * y)
<
Note: y ~= 1/x -- thus algebraically x×y ~= 1
Here: 0×Inf should be 1 not NaN.
In fact LIM[x->0]( x×~1/x ) -> ~1
<
Here, IEEE simply missed the boat.
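
A quick numeric check of that limit in Python (IEEE 754 binary64 floats);
the helper name and the sample values of x are arbitrary choices for
illustration. For small nonzero x the product stays within rounding of 1,
while the exact 0 × Inf case is defined by IEEE 754 as an invalid
operation and gives NaN.

```python
import math

def product_with_reciprocal(x):
    # x * (1/x) for a nonzero x; mathematically 1, up to rounding.
    return x * (1.0 / x)

for x in (1e-10, 1e-100, 1e-300):
    print(x, product_with_reciprocal(x))

print(0.0 * math.inf)   # nan
```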
> >
> > When x = 0, the approximation would return Inf, and thus we would end up
> > calculating x * y = 0 * Inf = NaN according to an IEEE 754 compliant
> > implementation.
> >
> > The ARM instruction FRECPS (Floating-point reciprocal step) implements
> > 2 - x * y, but handles the special case x == 0 && y == Inf (or vice
> > versa) and returns 2 instead of NaN. Thus the complete step would be
> > Inf * 2 = Inf, which is correct.
> The alternative is as you show here, which needs HW help in the form of
> that special reciprocal step op.
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<t3n67c$b6o$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24816&group=comp.arch#24816

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Tue, 19 Apr 2022 15:31:39 -0500
Organization: A noiseless patient Spider
Lines: 116
Message-ID: <t3n67c$b6o$1@dont-email.me>
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 19 Apr 2022 20:31:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9de2e4cdbe3f23543f0c3033a00a7a33";
logging-data="11480"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18zCcWU7TJG2kS11awQ1X9Z"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.8.0
Cancel-Lock: sha1:M43mAfIJhdEQyeE1mONtyN292lc=
In-Reply-To: <t3lqpf$nea$1@gioia.aioe.org>
Content-Language: en-US

by: BGB - Tue, 19 Apr 2022 20:31 UTC

On 4/19/2022 3:10 AM, Terje Mathisen wrote:
> Marcus wrote:
>> One more thing that I discovered about a week ago:
>>
>> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
>> *need* a special Newton-Raphson step instruction that handles the
>> special case where x = 0 and y = Inf.
>
> The obvious way to handle this, if you have a little bit of HW help in
> the form of an input number(s) classifier, is to take out all the
> special cases and merge these results with what you get from the normal
> algorithm.
>

Yeah. For example, exponents of 000 and 7FF trigger special cases,
everything else can pass through as normal.
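A minimal sketch of that exponent test, assuming IEEE 754 binary64 and using a hypothetical helper name `classify` (not from any post in this thread). It only shows which bit patterns would take the special-case path:

```python
import struct

def classify(x: float) -> str:
    """Classify a binary64 value by its exponent and fraction fields."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF       # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)     # 52-bit fraction
    if exponent == 0x000:                 # zero or subnormal
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0x7FF:                 # infinity or NaN
        return "inf" if fraction == 0 else "nan"
    return "normal"                       # everything else passes through

for value in (0.0, 5e-324, 1.5, float("inf"), float("nan")):
    print(value, classify(value))
```

Whether subnormals are routed to the special path or flushed to zero is a design choice that the sketch leaves open.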

>>
>> Consider the approximation of y = 1 / x.
>>
>> A single NR step then becomes: y' = y * (2 - x * y)
>>
>> When x = 0, the approximation would return Inf, and thus we would end up
>> calculating x * y = 0 * Inf = NaN according to an IEEE 754 compliant
>> implementation.
>>
>> The ARM instruction FRECPS (Floating-point reciprocal step) implements
>> 2 - x * y, but handles the special case x == 0 && y == Inf (or vice
>> versa) and returns 2 instead of NaN. Thus the complete step would be
>> Inf * 2 = Inf, which is correct.
>
> The alternative is as you show here, which needs HW help in the form of
> that special reciprocal step op.
>

Though, absent something special in the hardware, such a helper op
would offer little real advantage over using normal FPU instructions
(FMUL/FADD/FMAC/...).

The operation is effectively latency-bound rather than
throughput-bound, and normal FPU instructions can (presumably) already
run the FPU at its full speed (in terms of latency).

For example, I recently re-implemented an FDIV instruction by plumbing
the existing units together so that they perform a crude approximation
of N-R, with no actual stepping or flow-control on the values. As-is, it
converges relatively slowly. The FPU simply stalls for a while and waits
for the values to converge (the value needs to be shifted right a few
bits, or else it diverges).

In this case, I took the "hopefully cheap" route rather than an
"actually good" route.

I initially went with "wait until difference drops to zero", but then
switched over to "stall for N cycles", because:
  It is easier to reason about costs;
  It is marginally cheaper;
  It will not deadlock the CPU if a pathological edge case happens.
For now, N is 240 cycles for FDIV and 252 cycles for FSQRT.

So, while it can technically divide numbers, and perform square root,
as-is its performance "leaves something to be desired". Current testing
shows that it is also somewhat slower than the current software options.

The advantage of software in this case is that it allows "coordinating"
the individual steps, and gives the flexibility of doing N-R properly
with only a single FMUL and FADD unit.

Also, in software it is possible to have alternate versions depending on
what is needed (so, for example, for much of Quake's renderer I can use
a chopped down version with only 2 N-R steps, which is "mostly good
enough", and around 1/3 as many clock-cycles as the FDIV op).
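As a rough illustration of how the step count trades accuracy for time, here is a sketch of a software reciprocal with a configurable number of N-R steps. The seed (`seed_reciprocal`) is an assumption chosen for illustration, a plain integer subtraction on the bit pattern, and is not the routine used in the Quake port; it only handles positive, normal inputs whose reciprocal is also normal.

```python
import struct

def seed_reciprocal(x: float) -> float:
    """Crude seed for 1/x (assumed: x positive and normal)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # 0x7FE0... is twice the bit pattern of 1.0; subtracting the input
    # negates the exponent and roughly inverts the mantissa.
    guess = 0x7FE0000000000000 - bits
    return struct.unpack("<d", struct.pack("<Q", guess))[0]

def reciprocal(x: float, steps: int) -> float:
    """Approximate 1/x with a fixed number of Newton-Raphson steps."""
    y = seed_reciprocal(x)
    for _ in range(steps):
        y = y * (2.0 - x * y)   # each step roughly squares the error
    return y

for steps in (0, 1, 2, 3, 5):
    y = reciprocal(3.0, steps)
    print(steps, y, abs(3.0 * y - 1.0))
```

With this seed the initial relative error is at most about 12.5%, so two steps already land within roughly three decimal digits, which matches the "mostly good enough" use case.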

Theoretically, better coordination could be possible by fiddling with
pipeline-delays such that the values move through the pipelines in a
sensible way. Though, initial attempts at this made the stability and
convergence "actually worse" than just going at "full-speed".

Initial guess for FDIV was basically:
{ Rs[63]^Rt[63], Rs[62:40]-Rt[62:40]+23'h3FF000, 40'h00 }
With the low-order bits cut off because:
They are basically already wrong anyways;
They add cost and latency.
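The expression above can be modelled in software as follows. This is a sketch under the assumption of normal, finite binary64 inputs; the helper names (`to_bits`, `from_bits`, `fdiv_guess`) are made up for the example, and the masking only imitates a 23-bit adder, not any actual RTL.

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fdiv_guess(rs: float, rt: float) -> float:
    """Initial estimate of rs / rt from the sign and bits 62:40 only."""
    a, b = to_bits(rs), to_bits(rt)
    sign = (a ^ b) >> 63                         # Rs[63] ^ Rt[63]
    hi_a = (a >> 40) & 0x7FFFFF                  # Rs[62:40]
    hi_b = (b >> 40) & 0x7FFFFF                  # Rt[62:40]
    hi = (hi_a - hi_b + 0x3FF000) & 0x7FFFFF     # subtract, then re-add the bias
    return from_bits((sign << 63) | (hi << 40))  # low 40 bits forced to zero

print(fdiv_guess(6.0, 2.0))   # 3.0 (exact, the mantissas cancel cleanly)
print(fdiv_guess(1.0, 3.0))   # 0.375 (true value is 0.333...)
```

The second case shows the kind of error the N-R iterations then have to remove.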

Couldn't find a better approximation that would not depend on large
lookup tables, and attempts to break it down into small (6-bit) lookups
were unsuccessful.

Though, a single 6-bit lookup for the high-bits of the mantissa would be
at least slightly more accurate than a bare subtract.
{ Rs[63]^Rt[63], Rs[62:40] +
{ 11'h3FE-Rt[62:52], RtApxRcpFra },
40'h00 }
Where one uses Rt[51:46] to look up an approximation of the reciprocal.

....

At present, FDIV via this method takes around 240 clock cycles to
converge...

Also have noted that the low-order bits of the result are basically
garbage in this implementation.

It is also basically a boat anchor if one tries to use it in Quake or
similar.

....

Re: Approximate reciprocals

<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24817&group=comp.arch#24817

X-Received: by 2002:ad4:5947:0:b0:446:4c6a:32d0 with SMTP id eo7-20020ad45947000000b004464c6a32d0mr11388599qvb.131.1650405927106;
Tue, 19 Apr 2022 15:05:27 -0700 (PDT)
X-Received: by 2002:a05:6870:f2a9:b0:e5:8106:4486 with SMTP id
u41-20020a056870f2a900b000e581064486mr313935oap.109.1650405926741; Tue, 19
Apr 2022 15:05:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 19 Apr 2022 15:05:26 -0700 (PDT)
In-Reply-To: <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org> <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 19 Apr 2022 22:05:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 26

by: Michael S - Tue, 19 Apr 2022 22:05 UTC

On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
> On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> > Marcus wrote:
> > > One more thing that I discovered about a week ago:
> > >
> > > In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> > > *need* a special Newton-Raphson step instruction that handles the
> > > special case where x = 0 and y = Inf.
> > The obvious way to handle this, if you have a little bit of HW help in
> > the form of an input number(s) classifier, is to take out all the
> > special cases and merge these results with what you get from the normal
> > algorithm.
> > >
> > > Consider the approximation of y = 1 / x.
> > >
> > > A single NR step then becomes: y' = y * (2 - x * y)
> <
> Note: y ~= 1/x -- thus algebraically x×y ~= 1
> Here: 0×Inf should be 1 not NaN.
> In fact LIM[x->0]( x×~1/x ) -> ~1

0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
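A small sketch of why these forms are treated as "unknown", using Python floats (which are IEEE 754 binary64 on common platforms). The helper name `forms_near_zero` is made up for the example. Three expressions that all head toward "0 * Inf" as x shrinks end up at 1, at 0, and at a huge value respectively, so no single finite answer fits the general case:

```python
import math

print(0.0 * math.inf)        # nan
print(math.inf / math.inf)   # nan
print(math.inf - math.inf)   # nan

def forms_near_zero(x: float):
    """Three products of a 'small' and a 'large' factor for tiny x."""
    return (x * (1.0 / x),          # stays near 1
            (x * x) * (1.0 / x),    # shrinks like x
            x * (1.0 / (x * x)))    # grows like 1/x

for x in (1e-10, 1e-50, 1e-100):
    print(x, forms_near_zero(x))
```

The first of the three is the case that arises inside the N-R reciprocal step, which is why that specific context can justify a different answer.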

Re: Approximate reciprocals

<3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24818&group=comp.arch#24818

X-Received: by 2002:a05:620a:318d:b0:69e:6f03:7950 with SMTP id bi13-20020a05620a318d00b0069e6f037950mr10957828qkb.493.1650407629260;
Tue, 19 Apr 2022 15:33:49 -0700 (PDT)
X-Received: by 2002:a9d:67c6:0:b0:605:4538:74c4 with SMTP id
c6-20020a9d67c6000000b00605453874c4mr6044957otn.304.1650407628923; Tue, 19
Apr 2022 15:33:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 19 Apr 2022 15:33:48 -0700 (PDT)
In-Reply-To: <b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5d49:f811:81d8:6fed;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5d49:f811:81d8:6fed
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org> <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 19 Apr 2022 22:33:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 32

by: MitchAlsup - Tue, 19 Apr 2022 22:33 UTC

On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
> On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
> > On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> > > Marcus wrote:
> > > > One more thing that I discovered about a week ago:
> > > >
> > > > In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> > > > *need* a special Newton-Raphson step instruction that handles the
> > > > special case where x = 0 and y = Inf.
> > > The obvious way to handle this, if you have a little bit of HW help in
> > > the form of an input number(s) classifier, is to take out all the
> > > special cases and merge these results with what you get from the normal
> > > algorithm.
> > > >
> > > > Consider the approximation of y = 1 / x.
> > > >
> > > > A single NR step then becomes: y' = y * (2 - x * y)
> > <
> > Note: y ~= 1/x -- thus algebraically x×y ~= 1
> > Here: 0×Inf should be 1 not NaN.
> > In fact LIM[x->0]( x×~1/x ) -> ~1
> 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
> In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
<
They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.
<

Re: Approximate reciprocals

<821e3a4c-ae5a-4880-8ade-7ef3d21b6b56n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24820&group=comp.arch#24820

X-Received: by 2002:ac8:5a84:0:b0:2f1:f687:df63 with SMTP id c4-20020ac85a84000000b002f1f687df63mr11007771qtc.307.1650420219705;
Tue, 19 Apr 2022 19:03:39 -0700 (PDT)
X-Received: by 2002:a05:6871:54c:b0:e5:8e95:d081 with SMTP id
t12-20020a056871054c00b000e58e95d081mr606435oal.103.1650420219474; Tue, 19
Apr 2022 19:03:39 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 19 Apr 2022 19:03:39 -0700 (PDT)
In-Reply-To: <3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org> <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com> <3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <821e3a4c-ae5a-4880-8ade-7ef3d21b6b56n@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 20 Apr 2022 02:03:39 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 48

by: Michael S - Wed, 20 Apr 2022 02:03 UTC

On Wednesday, April 20, 2022 at 1:33:50 AM UTC+3, MitchAlsup wrote:
> On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
> > On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
> > > On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> > > > Marcus wrote:
> > > > > One more thing that I discovered about a week ago:
> > > > >
> > > > > In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> > > > > *need* a special Newton-Raphson step instruction that handles the
> > > > > special case where x = 0 and y = Inf.
> > > > The obvious way to handle this, if you have a little bit of HW help in
> > > > the form of an input number(s) classifier, is to take out all the
> > > > special cases and merge these results with what you get from the normal
> > > > algorithm.
> > > > >
> > > > > Consider the approximation of y = 1 / x.
> > > > >
> > > > > A single NR step then becomes: y' = y * (2 - x * y)
> > > <
> > > Note: y ~= 1/x -- thus algebraically x×y ~= 1
> > > Here: 0×Inf should be 1 not NaN.
> > > In fact LIM[x->0]( x×~1/x ) -> ~1
> > 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
> > In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
> <
> They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.
> <

The rules are "in general". They were not created to satisfy the needs of one particular, not extremely important algorithm.
BTW, after thinking a bit about it, I feel that the existing inf rules are rather "too practical", sacrificing mathematical rigor
in favor of unproven practicality.
In particular, being in a rigorous mood today (tonight), I'd prefer (+inf - Finite_Positive_Number) to be NaN rather than +inf,
because we don't know whether the result is bigger than MAX_DBL or maybe smaller.
The same goes for (inf / Finite_Number_Above_One).
Etc...

Re: Approximate reciprocals

<1c169747-51f6-4508-bc19-fac4f31d134en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24822&group=comp.arch#24822

X-Received: by 2002:a05:620a:13fc:b0:69e:90a3:e1bc with SMTP id h28-20020a05620a13fc00b0069e90a3e1bcmr8078100qkl.645.1650421051545;
Tue, 19 Apr 2022 19:17:31 -0700 (PDT)
X-Received: by 2002:a05:6830:104f:b0:605:4e00:8ac6 with SMTP id
b15-20020a056830104f00b006054e008ac6mr4274606otp.381.1650421051247; Tue, 19
Apr 2022 19:17:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 19 Apr 2022 19:17:31 -0700 (PDT)
In-Reply-To: <821e3a4c-ae5a-4880-8ade-7ef3d21b6b56n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5d49:f811:81d8:6fed;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5d49:f811:81d8:6fed
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org> <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com> <3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
<821e3a4c-ae5a-4880-8ade-7ef3d21b6b56n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <1c169747-51f6-4508-bc19-fac4f31d134en@googlegroups.com>
Subject: Re: Approximate reciprocals
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Wed, 20 Apr 2022 02:17:31 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 51

by: MitchAlsup - Wed, 20 Apr 2022 02:17 UTC

On Tuesday, April 19, 2022 at 9:03:41 PM UTC-5, Michael S wrote:
> On Wednesday, April 20, 2022 at 1:33:50 AM UTC+3, MitchAlsup wrote:
> > On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
> > > On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
> > > > On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> > > > > Marcus wrote:
> > > > > > One more thing that I discovered about a week ago:
> > > > > >
> > > > > > In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> > > > > > *need* a special Newton-Raphson step instruction that handles the
> > > > > > special case where x = 0 and y = Inf.
> > > > > The obvious way to handle this, if you have a little bit of HW help in
> > > > > the form of an input number(s) classifier, is to take out all the
> > > > > special cases and merge these results with what you get from the normal
> > > > > algorithm.
> > > > > >
> > > > > > Consider the approximation of y = 1 / x.
> > > > > >
> > > > > > A single NR step then becomes: y' = y * (2 - x * y)
> > > > <
> > > > Note: y ~= 1/x -- thus algebraically x×y ~= 1
> > > > Here: 0×Inf should be 1 not NaN.
> > > > In fact LIM[x->0]( x×~1/x ) -> ~1
> > > 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
> > > In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
> > <
> > They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.
> > <
> The rules are "in general". They were not created to satisfy the needs of one particular, not extremely important algorithm.
> BTW, after thinking a bit about it, I feel that the existing inf rules are rather "too practical", sacrificing mathematical rigor
> in favor of unproven practicality.
<
And that is exactly why floating point is not "like" real numbers.
<
Perhaps posits are our only real hope.
<
> In particular, being in a rigorous mood today (tonight), I'd prefer (+inf - Finite_Positive_Number) to be NaN rather than +inf,
> because we don't know whether the result is bigger than MAX_DBL or maybe smaller.
> The same goes for (inf / Finite_Number_Above_One).
> Etc...

Re: Approximate reciprocals

<t3oen8$1qkb$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24823&group=comp.arch#24823

Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 20 Apr 2022 10:02:49 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t3oen8$1qkb$1@gioia.aioe.org>
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org>
<8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>
<3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="60043"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Wed, 20 Apr 2022 08:02 UTC

MitchAlsup wrote:
> On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
>> On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
>>> On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
>>>> Marcus wrote:
>>>>> One more thing that I discovered about a week ago:
>>>>>
>>>>> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
>>>>> *need* a special Newton-Raphson step instruction that handles the
>>>>> special case where x = 0 and y = Inf.
>>>> The obvious way to handle this, if you have a little bit of HW help in
>>>> the form of an input number(s) classifier, is to take out all the
>>>> special cases and merge these results with what you get from the normal
>>>> algorithm.
>>>>>
>>>>> Consider the approximation of y = 1 / x.
>>>>>
>>>>> A single NR step then becomes: y' = y * (2 - x * y)
>>> <
>>> Note: y ~= 1/x -- thus algebraically x×y ~= 1
>>> Here: 0×Inf should be 1 not NaN.
>>> In fact LIM[x->0]( x×~1/x ) -> ~1
>> 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
>> In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
> <
> They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.

Which is why you cannot use normal (individual/standard) FP ops to do
this; you need something higher level which knows about the special
rules in effect right here, i.e. something like that ARM reciprocal
step opcode.
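A sketch of the difference, assuming Python floats behave as IEEE 754 binary64. `recps_like` only models the behaviour attributed to FRECPS in this thread (returning 2 for 0 times Inf); it is not a bit-exact model of the ARM instruction and ignores details such as fused rounding.

```python
import math

def plain_step(x: float, y: float) -> float:
    """One N-R step built from ordinary multiply and subtract."""
    return y * (2.0 - x * y)

def recps_like(x: float, y: float) -> float:
    """2 - x*y, except that 0 * Inf (either order) yields 2 instead of NaN."""
    if (x == 0.0 and math.isinf(y)) or (math.isinf(x) and y == 0.0):
        return 2.0
    return 2.0 - x * y

def special_step(x: float, y: float) -> float:
    """One N-R step using the special-cased helper."""
    return y * recps_like(x, y)

print(plain_step(0.0, math.inf))     # nan
print(special_step(0.0, math.inf))   # inf
print(special_step(math.inf, 0.0))   # 0.0
```

For ordinary finite inputs the two versions compute the same value, so the helper only matters at the x = 0 and x = Inf corners.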

Alternatively, and the solution I prefer, you handle all special inputs
in parallel with the normal algorithm and merge at the end.

I.e. if the divisor is 0 then the 754 standard specifies exactly what
the output should be for all the various dividend/divisor combinations:

non-zero/zero -> inf with sign according to normal (XOR) rules.

zero/zero -> NaN
inf/zero -> NaN
nan/zero -> NaN

You have similar rows for (x)/inf and (x)/NaN, i.e. this is a simple
square lookup table based on the types of the two operands where only
the number/number entry specifies that the normal NR result should be
propagated.

You do of course have to detect overflow/underflow as part of that
normal algorithm!
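The classify-and-merge approach can be sketched for the simpler one-operand case 1/x. Everything here is an illustrative assumption: the function names, the fixed seed of 1.5 and the six N-R steps are chosen for the example only. In hardware the special-case selection and the normal path would run side by side; in software the special path simply returns early.

```python
import math

def nr_reciprocal(x: float, steps: int = 6) -> float:
    """Normal path: N-R on the mantissa, exponent handled separately."""
    m, e = math.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    y = math.copysign(1.5, m)        # fixed seed for 1/m, which lies in (1, 2]
    for _ in range(steps):
        y = y * (2.0 - m * y)
    try:
        return math.ldexp(y, -e)     # underflow quietly goes subnormal or zero
    except OverflowError:            # overflow has to be detected explicitly
        return math.copysign(math.inf, x)

def reciprocal(x: float) -> float:
    """Merge the special-case results with the normal N-R result."""
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        return math.copysign(0.0, x)        # 1/Inf -> 0 with the same sign
    if x == 0.0:
        return math.copysign(math.inf, x)   # 1/0 -> Inf with the same sign
    return nr_reciprocal(x)                 # only number inputs use N-R

for value in (4.0, -8.0, 0.0, -0.0, math.inf, math.nan, 5e-324):
    print(value, reciprocal(value))
```

The two-operand division case extends the same idea to a table indexed by the class of both operands.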

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<t3of9k$40m$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24824&group=comp.arch#24824

Path: i2pn2.org!i2pn.org!aioe.org!EhtdJS5E9ITDZpJm3Uerlg.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Approximate reciprocals
Date: Wed, 20 Apr 2022 10:12:38 +0200
Organization: Aioe.org NNTP Server
Message-ID: <t3of9k$40m$1@gioia.aioe.org>
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org>
<8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com>
<3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
<821e3a4c-ae5a-4880-8ade-7ef3d21b6b56n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gioia.aioe.org; logging-data="4118"; posting-host="EhtdJS5E9ITDZpJm3Uerlg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.11.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Wed, 20 Apr 2022 08:12 UTC

Michael S wrote:
> On Wednesday, April 20, 2022 at 1:33:50 AM UTC+3, MitchAlsup wrote:
>> On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
>>> On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
>>>> On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
>>>>> Marcus wrote:
>>>>>> One more thing that I discovered about a week ago:
>>>>>>
>>>>>> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
>>>>>> *need* a special Newton-Raphson step instruction that handles the
>>>>>> special case where x = 0 and y = Inf.
>>>>> The obvious way to handle this, if you have a little bit of HW help in
>>>>> the form of an input number(s) classifier, is to take out all the
>>>>> special cases and merge these results with what you get from the normal
>>>>> algorithm.
>>>>>>
>>>>>> Consider the approximation of y = 1 / x.
>>>>>>
>>>>>> A single NR step then becomes: y' = y * (2 - x * y)
>>>> <
>>>> Note: y ~= 1/x -- thus algebraically x×y ~= 1
>>>> Here: 0×Inf should be 1 not NaN.
>>>> In fact LIM[x->0]( x×~1/x ) -> ~1
>>> 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
>>> In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
>> <
>> They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.
>> <
>
> The rules are "in general". They were not created to satisfy the needs of one particular, not extremely important algorithm.
> BTW, after thinking a bit about it, I feel that the existing inf rules are rather "too practical", sacrificing mathematical rigor
> in favor of unproven practicality.
> In particular, being in a rigorous mood today (tonight), I'd prefer (+inf - Finite_Positive_Number) to be NaN rather than +inf,
> because we don't know whether the result is bigger than MAX_DBL or maybe smaller.
> The same goes for (inf / Finite_Number_Above_One).
> Etc...

Sorry, but this is simply wrong:

What you suggest here is to abandon knowledge we do have in favor of
making NaN even more sticky than it currently is.

If you have an algorithm which can spuriously overflow in the middle,
but which with a different evaluation order, or higher intermediate
precision/exponent range, would be OK, then giving Inf as the result
still provides important information: This calculation overflowed
somewhere in the middle.

Always returning NaN instead means that you have lost this info: the
NaN could now have been due to many different errors, including 0/0 etc.
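A small sketch of such a case, assuming Python floats are IEEE 754 binary64. The function names are made up for the example. The naive Euclidean norm overflows in the middle and reports Inf, while a rescaled evaluation order gives the finite answer:

```python
import math

def naive_norm(a: float, b: float) -> float:
    """sqrt(a*a + b*b); the squares can overflow even if the result fits."""
    return math.sqrt(a * a + b * b)

def scaled_norm(a: float, b: float) -> float:
    """Same quantity, evaluated after dividing by the larger magnitude."""
    m = max(abs(a), abs(b))
    if m == 0.0:
        return 0.0
    return m * math.sqrt((a / m) ** 2 + (b / m) ** 2)

print(naive_norm(1e200, 1e200))    # inf
print(scaled_norm(1e200, 1e200))
```

Seeing Inf from the first version is a usable hint that the evaluation overflowed and a rescaled formulation is worth trying, which a NaN would not convey.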

All the 754 ops have been defined to return the best possible answer,
when taken in isolation.

The original (1978) idea was to handle exponent overflow (or underflow)
via a trap handler that would save the current exponent, bias it enough
to bring the current operation into legal range, and then inspect and
correct the final result based on any such bias operations performed
during the processing.
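The flavour of that idea can be imitated in software by carrying the exponent in a separate integer. This is only an analogy, not a trap handler, and the function name `scaled_quotient` is made up for the sketch; inputs are assumed to be finite and non-zero.

```python
import math

def scaled_quotient(numerators, denominators):
    """Product of numerators over product of denominators, with the
    exponent kept in a separate integer so intermediates stay in range."""
    mant, exp = 1.0, 0
    terms = [(v, +1) for v in numerators] + [(v, -1) for v in denominators]
    for value, sign in terms:
        m, e = math.frexp(value)         # value = m * 2**e, 0.5 <= |m| < 1
        mant = mant * m if sign > 0 else mant / m
        exp += sign * e
        mant, shift = math.frexp(mant)   # pull the mantissa back into range
        exp += shift
    return math.ldexp(mant, exp)         # apply the saved exponent once

print(1e200 * 1e200 / 1e300)                       # inf
print(scaled_quotient([1e200, 1e200], [1e300]))
```

If the true final result is itself out of range, `math.ldexp` raises `OverflowError` here, so the final correction still has to be checked.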

Afaik, nobody is seriously doing this for production code.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Approximate reciprocals

<b3d11498-5ea4-461d-af69-ff1a804d98fcn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24825&group=comp.arch#24825

X-Received: by 2002:ad4:5947:0:b0:446:4c6a:32d0 with SMTP id eo7-20020ad45947000000b004464c6a32d0mr12698843qvb.131.1650447255731;
Wed, 20 Apr 2022 02:34:15 -0700 (PDT)
X-Received: by 2002:a05:6871:54c:b0:e5:8e95:d081 with SMTP id
t12-20020a056871054c00b000e58e95d081mr1078897oal.103.1650447255478; Wed, 20
Apr 2022 02:34:15 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 20 Apr 2022 02:34:15 -0700 (PDT)
In-Reply-To: <t3oen8$1qkb$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7;
posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 2a0d:6fc2:55b0:ca00:5cdc:b922:6737:3cc7
References: <t1c154$j5t$1@dont-email.me> <t3k9lo$af7$1@dont-email.me>
<t3lqpf$nea$1@gioia.aioe.org> <8ed2b18a-4815-48c0-8055-cf91614fe0a4n@googlegroups.com>
<b7766fa1-f75e-4486-b477-6a2ec44efc75n@googlegroups.com> <3ae4ba74-1397-436d-a43b-fa854fcce949n@googlegroups.com>
<t3oen8$1qkb$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b3d11498-5ea4-461d-af69-ff1a804d98fcn@googlegroups.com>
Subject: Re: Approximate reciprocals
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 20 Apr 2022 09:34:15 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 70

by: Michael S - Wed, 20 Apr 2022 09:34 UTC

On Wednesday, April 20, 2022 at 11:02:51 AM UTC+3, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Tuesday, April 19, 2022 at 5:05:28 PM UTC-5, Michael S wrote:
> >> On Tuesday, April 19, 2022 at 10:42:08 PM UTC+3, MitchAlsup wrote:
> >>> On Tuesday, April 19, 2022 at 3:10:26 AM UTC-5, Terje Mathisen wrote:
> >>>> Marcus wrote:
> >>>>> One more thing that I discovered about a week ago:
> >>>>>
> >>>>> In order to support the IEEE 754 semantics of 1 / 0 == Inf, you actually
> >>>>> *need* a special Newton-Raphson step instruction that handles the
> >>>>> special case where x = 0 and y = Inf.
> >>>> The obvious way to handle this, if you have a little bit of HW help in
> >>>> the form of an input number(s) classifier, is to take out all the
> >>>> special cases and merge these results with what you get from the normal
> >>>> algorithm.
> >>>>>
> >>>>> Consider the approximation of y = 1 / x.
> >>>>>
> >>>>> A single NR step then becomes: y' = y * (2 - x * y)
> >>> <
> >>> Note: y ~= 1/x -- thus algebraically x×y ~= 1
> >>> Here: 0×Inf should be 1 not NaN.
> >>> In fact LIM[x->0]( x×~1/x ) -> ~1
> >> 0*Inf = NaN for the same reason 0/0=NaN and Inf/Inf=NaN.
> >> In all 3 cases the results are "unknown" and FP numbers have no better encoding for "unknown" than NaN.
> > <
> > They may be unknown "in general" but in this particular N-R iteration the LIM is known to be 1.000000000000000000000000000.
> Which is why you cannot use normal (individual/standard) FP ops to do
> this, you need something higher level which knows about the special
> rules in effect right here. I.e. like that ARM reciprocal step opcode.
>
> Alternatively, and the solution I prefer, you handle all special inputs
> in parallel with the normal algorithm and merge at the end.
>
> I.e. if the divisor is 0 then the 754 standard specifies exactly what
> the output should be for all the various dividend/divisor combinations:
>
> non-zero/zero -> inf with sign according to normal (XOR) rules.
>
> zero/zero -> NaN
> inf/zero -> NaN

Are you sure about that?
I haven't looked at the IEEE docs, but both my intuition and the gcc compiler disagree.
Both say:
inf/zero -> inf with sign according to normal (XOR) rules.

> nan/zero -> NaN
>
> You have similar rows for (x)/inf and (x)/NaN, i.e. this is a simple
> square lookup table based on the types of the two operands where only
> the number/number entry specifies that the normal NR result should be
> propagated.
>
> You do of course have to detect overflow/underflow as part of that
> normal algorithm!
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"


Re: Useful floating point instructions	Terje Mathisen
Re: Useful floating point instructions	Terje Mathisen
Re: Useful floating point instructions	Stefan Monnier
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	George Neuner
Re: Approximate reciprocals	Anton Ertl
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Anton Ertl
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	George Neuner
Re: Approximate reciprocals	Anton Ertl
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Terje Mathisen
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Terje Mathisen
Re: Approximate reciprocals	MitchAlsup
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	John Dallman
Re: Approximate reciprocals	MitchAlsup
Re: Approximate reciprocals	George Neuner
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	EricP
Re: Approximate reciprocals	Anton Ertl
Re: Approximate reciprocals	Anton Ertl
Re: Approximate reciprocals	John Dallman
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Michael S
Re: Approximate reciprocals	Terje Mathisen
Re: Approximate reciprocals	Elijah Stone
Re: Approximate reciprocals	Marcus
Re: Approximate reciprocals	Marcus