novaBBS - comp.arch - Another code compression idea

Another code compression idea

<2021Dec26.185955@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22463&group=comp.arch#22463

From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Another code compression idea
Date: Sun, 26 Dec 2021 17:59:55 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 72
Message-ID: <2021Dec26.185955@mips.complang.tuwien.ac.at>

by: Anton Ertl - Sun, 26 Dec 2021 17:59 UTC

The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
by turning three-address instructions into two-address instructions,
reducing literal fields, and/or reducing the number of addressable
registers.

Here's another idea: How about compressing by using the destination
register of the previous instruction as one of the sources?

+ Smaller code.

+ You directly know that you can forward from the previous instruction
to this one. And extending from this, you can directly encode stuff
for an ILDP architecture [kim&smith02].

- The main disadvantage I see is that the meaning of such an
instruction is not encoded in the instruction alone. This is a problem
when jumping to such an instruction (just don't allow that, i.e., raise
an illegal instruction exception if you jump to such an instruction). A
bigger problem is when an interrupt or exception returns to such an
instruction; a way to deal with that may be to allow this encoding
only for instructions that cannot cause exceptions, and to delay
interrupts until the next self-contained instruction.
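
The interrupt-delay rule above can be sketched in a few lines. This is only an illustration under assumed names (`uses_prev`, `next_interrupt_point` are hypothetical); it models a code fragment as a list of flags saying whether each instruction takes a source from its predecessor's destination.

```python
def next_interrupt_point(uses_prev, pc):
    """Return the first index >= pc whose instruction is self-contained.

    uses_prev[i] is True when instruction i takes a source from the
    destination of instruction i-1 (the compressed form discussed above).
    An interrupt requested at pc is held off until that index, so the
    handler never returns into an instruction that needs decoder state.
    """
    i = pc
    while i < len(uses_prev) and uses_prev[i]:
        i += 1
    return i  # may equal len(uses_prev): past the end of this fragment


# Hypothetical five-instruction fragment: 1, 2 and 4 depend on their predecessor.
fragment = [False, True, True, False, True]
print(next_interrupt_point(fragment, 1))  # 3
print(next_interrupt_point(fragment, 3))  # 3
```

The longer the run of dependent instructions, the longer the interrupt is held off, so a real encoding would presumably want to bound that run length.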

@InProceedings{kim&smith02,
author = {Ho-Seop Kim and James E. Smith},
title = {An Instruction Set and Microarchitecture for
Instruction Level Distributed Processing},
crossref = {isca02},
pages = {71--81},
url = {http://www.ece.wisc.edu/~hskim/papers/kimh_ildp.pdf},
annote = {This paper addresses the problems of wide
superscalars with communication across the chip and
the number of write ports in the register file. The
authors propose an architecture (ILDP) with
general-purpose registers and with accumulators
(with instructions only accessing one accumulator
(read and/or write) and one register (read or
write)); for the accumulators their death is
specified explicitly in the instructions. The
microarchitecture builds \emph{strands} from
instructions working on an accumulator; a strand
starts with an instruction writing to an accumulator
without reading from it, continues with instructions
reading from (and possibly writing to) the
accumulator and ends with an instruction that kills
the accumulator. Strands are allocated to one out of
eight processing elements (PEs) dynamically (i.e.,
accumulators are renamed). A PE consists of
mainly one ALU data path (but also a copy of the
GPRs and an L1 cache). They evaluated this
architecture by translating Alpha binaries into it,
and comparing their architecture to a 4-wide or
8-wide Alpha implementation; their architecture has
a lower L1 cache latency, though. The performance of
ILDP in clock cycles is competitive, and one can
expect faster clocks for ILDP. The paper also
presents data for other stuff, e.g. general-purpose
register writes, which have to be promoted between
strands and which are relatively few.}
}

@Proceedings{isca02,
title = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
booktitle = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
year = "2002",
key = "ISCA 29",
}

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Another code compression idea

<8fe6da66-1359-4f80-b0dc-b8a6e0d138een@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22465&group=comp.arch#22465

by: MitchAlsup - Sun, 26 Dec 2021 20:00 UTC

On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
> by turning three-address instructions into two-address instructions,
> reducing literal fields, and/or reducing the number of addressable
> registers.
<
With no disrespect to Anton:
<
I wish we stopped using this terminology. Is 3-address {2 operands and 1
result}, or 3 operands with an implied result (accumulator)?
<
A few years ago, I started to use the terminology x-operand (implying
1 result).
<
But I digress.
>
> Here's another idea: How about compressing by using the destination
> register of the previous instruction as one of the sources?
<
Works about 70% of the time, but only when code has not been scheduled.
>
> + Smaller code.
<
27-bits instead of 32-bits ? One still has a lot of pruning to do to get to 16-bits.
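
The 27-bit figure follows if one assumes 5-bit register specifiers (32 addressable registers), as in typical 32-bit RISC encodings. That field width is an assumption here, not something stated in the thread. A quick check of the arithmetic:

```python
INSN_BITS = 32       # uncompressed instruction width
REG_FIELD_BITS = 5   # assumed: 32 addressable registers


def bits_after_dropping(fields_dropped, insn_bits=INSN_BITS, reg_bits=REG_FIELD_BITS):
    """Instruction width left after removing whole register fields."""
    return insn_bits - fields_dropped * reg_bits


print(bits_after_dropping(1))       # 27
print(bits_after_dropping(1) - 16)  # 11 more bits to prune for a 16-bit form
```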
>
> + You directly know that you can forward from the previous instruction
> to this one. And extending from this, you can directly encode stuff
> for an ILDP architecture [kim&smith02].
<
Yes, you know, you also know that the pipeline will be stalled 30% of the
time (LDs) plus however often one has multicycle calculate instructions.
>
> - The main disadvantage I see is that the meaning of the instructions
> is not encoded just in itself. This is a problem when jumping to
> such an instruction (just don't allow that, i.e., have an illegal
> instruction exception if you jump to such an instruction).
<
How do you tell a-priori?
<
> A bigger
> problem is when there is an interrupt or exception returns to such an
> instruction; a way to deal with that may be to allow this encoding
> only for instructions that cannot cause exceptions, and to delay
> interrupts until the next self-contained instruction.
<
There are other problems, too.

Re: Another code compression idea

<0001HW.27791403009DAC15700006D5D38F@news.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=22469&group=comp.arch#22469

by: Bill Findlay - Sun, 26 Dec 2021 21:20 UTC

On 26 Dec 2021, Anton Ertl wrote
(in article <2021Dec26.185955@mips.complang.tuwien.ac.at>):

> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
> by turning three-address instructions into two-address instructions,
> reducing literal fields, and/or reducing the number of addressable
> registers.
>
> Here's another idea: How about compressing by using the destination
> register of the previous instruction as one of the sources?

The NCR Century did something similar; the *destination* of a 1-address
instruction was taken to be the destination of the most recently
executed 2-address instruction.

--
Bill Findlay

Re: Another code compression idea

<2021Dec26.235511@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22471&group=comp.arch#22471

by: Anton Ertl - Sun, 26 Dec 2021 22:55 UTC

MitchAlsup <MitchAlsup@aol.com> writes:
>On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
>> Here's another idea: How about compressing by using the destination
>> register of the previous instruction as one of the sources?
><
>Works about 70% of the time, but only when code has not been scheduled.

Yes, you would not use compiler instruction scheduling if you want to
make use of this compression idea.
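
A toy illustration of why scheduling and this encoding pull in opposite directions: the sketch below counts instructions that have a source equal to the previous instruction's destination, first for two dependence chains laid out one after the other, then for the same chains interleaved as a scheduler might do. The register names and the `(dest, src1, src2)` tuple format are made up for the example.

```python
def count_eligible(program):
    """Count instructions with a source equal to the previous destination.

    program is a list of (dest, src1, src2) register-name tuples.
    """
    count = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] in cur[1:]:
            count += 1
    return count


# Two independent dependence chains (hypothetical registers).
chain_a = [("r1", "r8", "r9"), ("r2", "r1", "r9"), ("r3", "r2", "r9")]
chain_b = [("r4", "r10", "r11"), ("r5", "r4", "r11"), ("r6", "r5", "r11")]

unscheduled = chain_a + chain_b
scheduled = [insn for pair in zip(chain_a, chain_b) for insn in pair]

print(count_eligible(unscheduled))  # 4
print(count_eligible(scheduled))    # 0
```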

>> + Smaller code.
><
>27-bits instead of 32-bits ? One still has a lot of pruning to do to get to 16-bits.

Sure, by the same means that other compressed instruction subsets
use: shorter literals, only a subset of the operations, etc.
Basically, we would do something like replace src1=dest (in existing
compressed instruction subsets) with src1=olddest.

>> + You directly know that you can forward from the previous instruction
>> to this one. And extending from this, you can directly encode stuff
>> for an ILDP architecture [kim&smith02].
><
>Yes, you know, you also know that the pipeline will be stalled 30% of the
>time (LDs) plus however often one has multicycle calculate instructions.

For compiler-scheduled code, this compression simply will not work
much of the time. But for OoO, this kind of encoding might be more
useful than the src1=dest encoding.

>> - The main disadvantage I see is that the meaning of the instructions
>> is not encoded just in itself. This is a problem when jumping to
>> such an instruction (just don't allow that, i.e., have an illegal
>> instruction exception if you jump to such an instruction).
><
>How do you tell a-priori?

The hardware does not need to tell a priori. The decoder keeps a "last
destination" register that is set to invalid on any control-flow
transfer; when it encounters an instruction that refers to the last
destination while that register is invalid, it raises an illegal
instruction exception.
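
A minimal software model of that decoder state, as a sketch only: the class and method names are hypothetical, instructions are `(dest, src1, src2)` tuples, and the string "prev" stands in for the compressed "use last destination" encoding. Instructions without a destination are not modelled.

```python
class IllegalInstruction(Exception):
    """Raised when "prev" is used without a valid last destination."""


class Decoder:
    def __init__(self):
        self.last_dest = None  # None means "invalid"

    def control_flow(self):
        """Call on any control-flow transfer (branch, jump, exception return)."""
        self.last_dest = None

    def decode(self, dest, src1, src2):
        """Expand src1 == "prev" to the previous destination register."""
        if src1 == "prev":
            if self.last_dest is None:
                raise IllegalInstruction("no valid last destination")
            src1 = self.last_dest
        self.last_dest = dest
        return (dest, src1, src2)


d = Decoder()
print(d.decode("r1", "r2", "r3"))    # ('r1', 'r2', 'r3')
print(d.decode("r4", "prev", "r5"))  # ('r4', 'r1', 'r5')
d.control_flow()                     # e.g. a jump landed here
try:
    d.decode("r6", "prev", "r7")
except IllegalInstruction as e:
    print("trap:", e)
```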

The assembler sees if there is a label before the instruction. If
there is, the instruction is encoded not to refer to the last
destination.
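
The assembler side of the rule might look like this sketch. The input format and the "prev" marker are assumptions for illustration; a real assembler would also have to reset its state after branch instructions, which are left out here.

```python
def assemble(lines):
    """Pick the compressed form where the label rule allows it.

    lines is a list of ("label", name) or ("insn", dest, src1, src2).
    Returns (dest, src1, src2) tuples, with src1 replaced by "prev"
    when it equals the previous destination and no label intervenes.
    """
    out = []
    prev_dest = None  # None at the start and after every label
    for line in lines:
        if line[0] == "label":
            prev_dest = None  # a jump may land here: no known predecessor
            continue
        _, dest, src1, src2 = line
        if prev_dest is not None and src1 == prev_dest:
            src1 = "prev"
        out.append((dest, src1, src2))
        prev_dest = dest
    return out


source = [
    ("insn", "r1", "r2", "r3"),
    ("insn", "r4", "r1", "r5"),   # r1 was just written: compressible
    ("label", "loop"),
    ("insn", "r6", "r4", "r7"),   # follows a label: stays self-contained
    ("insn", "r8", "r6", "r4"),   # r6 was just written: compressible
]
for insn in assemble(source):
    print(insn)
```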

>There are other problems, too.

Please elaborate.

- anton

Re: Another code compression idea

<sqatnm$d4k$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22472&group=comp.arch#22472

by: Ivan Godard - Sun, 26 Dec 2021 23:26 UTC

On 12/26/2021 9:59 AM, Anton Ertl wrote:
> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
> by turning three-address instructions into two-address instructions,
> reducing literal fields, and/or reducing the number of addressable
> registers.
>
> Here's another idea: How about compressing by using the destination
> register of the previous instruction as one of the sources?
>
> + Smaller code.
>
> + You directly know that you can forward from the previous instruction
> to this one. And extending from this, you can directly encode stuff
> for an ILDP architecture [kim&smith02].
>
> - The main disadvantage I see is that the meaning of the instructions
> is not encoded just in itself. This is a problem when jumping to
> such an instruction (just don't allow that, i.e., have an illegal
> instruction exception if you jump to such an instruction). A bigger
> problem is when there is an interrupt or exception returns to such an
> instruction; a way to deal with that may be to allow this encoding
> only for instructions that cannot cause exceptions, and to delay
> interrupts until the next self-contained instruction.

Bill Wulf had a scheme where each instruction had three input registers
and one output register, plus two opcodes, and computed (X a Y) b C.
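
Read literally, (X a Y) b C is a two-operation instruction whose first result feeds the second operation directly. The sketch below only illustrates that shape; the operation set, register names, and `execute` signature are invented here and say nothing about the actual encoding of that scheme.

```python
import operator

# A small, made-up operation set for the two opcode slots.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
}


def execute(regs, op_a, op_b, x, y, c, dest):
    """Compute regs[dest] = (regs[x] op_a regs[y]) op_b regs[c]."""
    intermediate = OPS[op_a](regs[x], regs[y])
    regs[dest] = OPS[op_b](intermediate, regs[c])


regs = {"r1": 6, "r2": 7, "r3": 2, "r4": 0}
execute(regs, "mul", "sub", "r1", "r2", "r3", "r4")
print(regs["r4"])  # 40, i.e. (6 * 7) - 2
```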

Re: Another code compression idea

<sqattp$d4k$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22473&group=comp.arch#22473

by: Ivan Godard - Sun, 26 Dec 2021 23:30 UTC

On 12/26/2021 2:55 PM, Anton Ertl wrote:
> MitchAlsup <MitchAlsup@aol.com> writes:
>> On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
>>> Here's another idea: How about compressing by using the destination
>>> register of the previous instruction as one of the sources?
>> <
>> Works about 70% of the time, but only when code has not been scheduled.
>
> Yes, you would not use compiler instruction scheduling if you want to
> make use of this compression idea.
>
>>> + Smaller code.
>> <
>> 27-bits instead of 32-bits ? One still has a lot of pruning to do to get to 16-bits.
>
> Sure, like all the other ways other compressed instruction subsets
> use: shorter literals, only a subset of the operations, etc.
> Basically, we would do something like replace src1=dest (in existing
> compressed instruction subsets) with src1=olddest.

Sounds like a belt of length one?

Re: Another code compression idea

<2021Dec27.082302@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22475&group=comp.arch#22475

by: Anton Ertl - Mon, 27 Dec 2021 07:23 UTC

Ivan Godard <ivan@millcomputing.com> writes:
>On 12/26/2021 2:55 PM, Anton Ertl wrote:
>> MitchAlsup <MitchAlsup@aol.com> writes:
>>> On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
>>>> Here's another idea: How about compressing by using the destination
>>>> register of the previous instruction as one of the sources?
>>> <
>>> Works about 70% of the time, but only when code has not been scheduled.
>>
>> Yes, you would not use compiler instruction scheduling if you want to
>> make use of this compression idea.
>>
>>>> + Smaller code.
>>> <
>>> 27-bits instead of 32-bits ? One still has a lot of pruning to do to get to 16-bits.
>>
>> Sure, like all the other ways other compressed instruction subsets
>> use: shorter literals, only a subset of the operations, etc.
>> Basically, we would do something like replace src1=dest (in existing
>> compressed instruction subsets) with src1=olddest.
>
>
>Sounds like a belt of length one?

I thought about whether to mention Mill's belt, but knew that you
would do it for me:-)

Yes, it can be viewed like that, but:

* The previous instruction still writes its result in the register set
(at least architecturally).

* The belt works across control-flow joins (with appropriate
complications); my suggestion is to not allow that, to avoid the
complication. But of course you are free to explore whether the
benefits of working across control-flow joins are worth the costs.

- anton

Re: Another code compression idea

<24832cce-e4f2-4fe9-87c1-f50ded067c33n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22480&group=comp.arch#22480

by: Quadibloc - Mon, 27 Dec 2021 10:07 UTC

On Sunday, December 26, 2021 at 11:33:30 AM UTC-7, Anton Ertl wrote:
> A bigger
> problem is when there is an interrupt or exception returns to such an
> instruction; a way to deal with that may be to allow this encoding
> only for instructions that cannot cause exceptions, and to delay
> interrupts until the next self-contained instruction.

I'd tend to be inclined to deal with it by regarding such instructions,
along with the instruction they follow, as a _single_ instruction for
such purposes.
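
One way to picture this: an exception in a dependent instruction restarts at the head of its group, the nearest earlier self-contained instruction. The sketch below uses the same hypothetical flag-list model of a code fragment (True means "uses the previous destination"). It assumes that re-executing the head of the group is harmless, which a real design would have to guarantee.

```python
def restart_point(uses_prev, pc):
    """Return the index of the self-contained instruction that heads
    the group containing instruction pc."""
    i = pc
    while i > 0 and uses_prev[i]:
        i -= 1
    return i


# Hypothetical fragment: 1, 2 and 4 depend on their predecessor.
fragment = [False, True, True, False, True]
print(restart_point(fragment, 2))  # 0
print(restart_point(fragment, 4))  # 3
```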

John Savard

Re: Another code compression idea

<sqc88e$ib4$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22481&group=comp.arch#22481

by: Ivan Godard - Mon, 27 Dec 2021 11:32 UTC

On 12/26/2021 11:23 PM, Anton Ertl wrote:
> Ivan Godard <ivan@millcomputing.com> writes:
>> On 12/26/2021 2:55 PM, Anton Ertl wrote:
>>> MitchAlsup <MitchAlsup@aol.com> writes:
>>>> On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
>>>>> Here's another idea: How about compressing by using the destination
>>>>> register of the previous instruction as one of the sources?
>>>> <
>>>> Works about 70% of the time, but only when code has not been scheduled.
>>>
>>> Yes, you would not use compiler instruction scheduling if you want to
>>> make use of this compression idea.
>>>
>>>>> + Smaller code.
>>>> <
>>>> 27-bits instead of 32-bits ? One still has a lot of pruning to do to get to 16-bits.
>>>
>>> Sure, like all the other ways other compressed instruction subsets
>>> use: shorter literals, only a subset of the operations, etc.
>>> Basically, we would do something like replace src1=dest (in existing
>>> compressed instruction subsets) with src1=olddest.
>>
>>
>> Sounds like a belt of length one?
>
> I thought about whether to mention Mill's belt, but knew that you
> would do it for me:-)
>
> Yes, it can be viewed like that, but
>
> * The previous instruction still writes its result in the register set
> (at least architecturally).
>
> * The belt works across control-flow joins (with appropriate
> complications); my suggestion is to not allow that, to avoid the
> complication. But of course you are free to explore whether the
> benefits of working across control-flow joins are worth the costs.
>
> - anton

Every architecture has a problem with control flow joins. All live data
must be congruent (whatever that means in the architecture) after the
join. It may be more or less easy to extend that congruence back over
the control flow, for example by using a location that is global over
all control to hold the value. However, the available locations are
limited, and the number of live values is not.

The usual solution in positional addressing (i.e. registers) is to use
global allocation, with a graph coloring allocation algorithm and spill
(to effectively unbounded memory), with fill after the join to achieve
congruence. The amount of spill depends on the global union of live
sets, not on the size of the live set at the join. More registers
reduce the coloring pressure.
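
For readers who have not seen it, here is a deliberately simplified sketch of coloring with spill. It is a greedy coloring over an interference graph, not the Chaitin-style allocator a production compiler would use, and the graph and names are invented for the example.

```python
def color_registers(interference, num_regs):
    """Greedy register assignment over an interference graph.

    interference maps each value to the set of values live at the same
    time. Returns (assignment, spilled): a register number per colored
    value, and the list of values that did not fit.
    """
    assignment, spilled = {}, []
    # Visit high-degree values first (a simple heuristic), ties by name.
    for v in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        used = {assignment[n] for n in interference[v] if n in assignment}
        free = [r for r in range(num_regs) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)  # would go to memory, with a fill later
    return assignment, spilled


graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(color_registers(graph, 2))  # ({'a': 0, 'b': 1, 'd': 1}, ['c'])
print(color_registers(graph, 3))  # ({'a': 0, 'b': 1, 'c': 2, 'd': 1}, [])
```

With two registers one value spills; with three, none does, which is the "more registers reduce the coloring pressure" point in miniature.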

The Mill solution for temporal addressing is to rename the addresses of
the live values on each path into the join to some congruent naming. The
amount of rename depends on the size of the live set at the join, not on
the sizes of the sets elsewhere even if the sets have values in common.
That is why positional addressing needs a bigger regfile than temporal
addressing needs belt. Rename is cheaper than spill, but wastes entropy
when positional addressing would not have needed a spill.
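
As a toy model only (this is not a description of the actual Mill mechanism): treat each incoming path's belt as a list with the newest value at position 0, and let the join expect its live values in one agreed order. The rename work per path is then one entry per live value, however long the belt is. All names below are hypothetical.

```python
def congruence_plan(belt, live_at_join):
    """Return the belt positions to copy, in order, so that the live
    values arrive at the join in live_at_join order."""
    return [belt.index(v) for v in live_at_join]


path1 = ["t3", "x", "t2", "y", "t1"]  # position 0 is the newest value
path2 = ["y", "u1", "x"]
live = ["x", "y"]                     # what the code after the join needs

print(congruence_plan(path1, live))   # [1, 3]
print(congruence_plan(path2, live))   # [2, 0]
```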

Re: Another code compression idea

<cade959f-9859-4a85-ac19-486e71e1919dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22487&group=comp.arch#22487

by: Quadibloc - Mon, 27 Dec 2021 17:04 UTC

On Monday, December 27, 2021 at 4:32:34 AM UTC-7, Ivan Godard wrote:

> Every architecture has a problem with control flow joins. All live data
> must be congruent (whatever that means in the architecture) after the
> join.

I must say that it's even hard to understand the language you're
speaking here.

Later on, you mention register rename in conventional register-based
architectures, as opposed to the Mill.

So, it's taken some time, but I *think* I've figured out what you're
saying.

In order to allow a branch into a code sequence on the Mill,
you have to save and restore the Belt, something like how
registers are saved and restored on a subroutine call. As
this involves memory, or at least cache, it's expensive, but
it's only done on the occasional times when it's needed.

You were comparing the Mill to an *out-of-order* implementation
of a conventional architecture.

Said out-of-order implementation is always renaming
registers, during all code everywhere, so that a mapping
between the actual hardware registers that contain data
and the registers in the architectural model is maintained.

This lets one branch into an existing code sequence,
because the code following the branch target refers to the
architectural registers, so it will work with the different
rename pointers from either preceding sequence.

So that already means that I have an issue with the
*first* sentence I quoted. An *in-order* implementation
of a conventional architecture - that wretchedly slow
obsolete design people used in the old days when
they used core memory and discrete transistors, or
tiny little integrated circuits that weren't much better -
does *not* have a problem with control flow joins.

And, of course, that has been the paradigm for
computer architecture design all along. Out-of-order
implementations then go to great lengths to make
a modern processor pretend, in order to conform
to its ISA so designed, to be an in-order processor
from the good old days.

Which, of course, is in itself a very strong argument
for the Mill. Or for RISC designs which, because they
use 32 registers, can be in-order with the compiler
doing the rename stuff... at least that _was_ true
20 years (or more?) ago; now the RISC designs have
to be out-of-order to give competitive performance
as well.

It is good that you've escaped the constraint of
still thinking you live in the good old days. But
forgetting they ever happened may make it hard
for other people to understand you.

John Savard

Re: Another code compression idea

<sqd3e4$a50$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22499&group=comp.arch#22499

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Mon, 27 Dec 2021 13:16:18 -0600
Organization: A noiseless patient Spider
Lines: 96
Message-ID: <sqd3e4$a50$1@dont-email.me>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<8fe6da66-1359-4f80-b0dc-b8a6e0d138een@googlegroups.com>
<2021Dec26.235511@mips.complang.tuwien.ac.at> <sqattp$d4k$2@dont-email.me>
<2021Dec27.082302@mips.complang.tuwien.ac.at> <sqc88e$ib4$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 27 Dec 2021 19:16:20 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8aa7ed5a85a0d9a3d655c87683959d4a";
logging-data="10400"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1++I2SKnqyHZlcXSQwrzjNP"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:ofo9uQVm9p0by8zcNVkU8smkez4=
In-Reply-To: <sqc88e$ib4$1@dont-email.me>
Content-Language: en-US

by: BGB - Mon, 27 Dec 2021 19:16 UTC

On 12/27/2021 5:32 AM, Ivan Godard wrote:
> On 12/26/2021 11:23 PM, Anton Ertl wrote:
>> Ivan Godard <ivan@millcomputing.com> writes:
>>> On 12/26/2021 2:55 PM, Anton Ertl wrote:
>>>> MitchAlsup <MitchAlsup@aol.com> writes:
>>>>> On Sunday, December 26, 2021 at 12:33:30 PM UTC-6, Anton Ertl wrote:
>>>>>> Here's another idea: How about compressing by using the destination
>>>>>> register of the previous instruction as one of the sources?
>>>>> <
>>>>> Works about 70% of the time, but only when code has not been
>>>>> scheduled.
>>>>
>>>> Yes, you would not use compiler instruction scheduling if you want to
>>>> make use of this compression idea.
>>>>
>>>>>> + Smaller code.
>>>>> <
>>>>> 27-bits instead of 32-bits ? One still has a lot of pruning to do
>>>>> to get to 16-bits.
>>>>
>>>> Sure, like all the other ways other compressed instruction subsets
>>>> use: shorter literals, only a subset of the operations, etc.
>>>> Basically, we would do something like replace src1=dest (in existing
>>>> compressed instruction subsets) with src1=olddest.
>>>
>>>
>>> Sounds like a belt of length one?
>>
>> I thought about whether to mention Mill's belt, but knew that you
>> would do it for me:-)
>>
>> Yes, it can be viewed like that, but
>>
>> * The previous instruction still writes its result in the register set
>>    (at least architecturally).
>>
>> * The belt works across control-flow joins (with appropriate
>>    complications), my suggestion is to not allow that to avoid the
>>    complication; but of course you are free to explore whether the
>>    benefits of working across control-flow joins are worth the costs.
>>
>> - anton
>
> Every architecture has a problem with control flow joins. All live data
> must be congruent (whatever that means in the architecture) after the
> join. It may be more or less easy to extend that congruence back over
> the control flow, for example by using a location that is global over
> all control to hold the value. However, the available locations are
> limited, and the amount of live values is not.
>
> The usual solution in positional addressing (i.e. registers) is to use
> global allocation, with a graph coloring allocation algorithm and spill
> (to effectively unbounded memory), with fill after the join to achieve
> congruence. The amount of spill depends on the global union of live
> sets, not on the size of the live set at the join. More registers
> reduces the coloring pressure.
>

This is one area where having lots of registers, and then statically
assigning all the variables to registers within the function, seems to
come in handy...

To a lesser degree, predicated instructions and paths can also help
here, because this allows the "if()" branch to be compiled in a way
which does not necessarily require spill and reload. If the register
spill and reload operations are unconditional, then the end result will
be the same regardless of whether the "then" or "else" branch was taken.

Though, as can be noted, there is still the usual tradeoff for
predication that it is primarily a benefit when the "then" or "else"
blocks are smaller than the cost of a branch mispredict.

Some people claim that predication is useless if one has a branch
predictor, but this ignores the cost of spill/reload, or that in many of
the cases where predication is likely to come up in the first place, one
is likely dealing with unpredictable branches.

> The Mill solution for temporal addressing is to rename the addresses of
> the live values on each path into the join to some congruent naming. The
> amount of rename depends on the size of the live set at the join, not on
> the sizes of the sets elsewhere even if the sets have values in common.
> That is why positional needs a bigger regfile than temporal needs belt.
> Rename is cheaper than spill, but wastes entropy when a spill is not
> needed by positional addressing.
>

Possible...

Lots of potential cost tradeoffs here.

I guess it is also likely that in "actual hardware", the relative cost
of LUTRAMs may not be as cheap as FPGAs make them seem.

Re: Another code compression idea

<sqdajp$sij$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=22504&group=comp.arch#22504

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.de!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Mon, 27 Dec 2021 21:18:49 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sqdajp$sij$1@newsreader4.netcologne.de>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<8fe6da66-1359-4f80-b0dc-b8a6e0d138een@googlegroups.com>
<2021Dec26.235511@mips.complang.tuwien.ac.at> <sqattp$d4k$2@dont-email.me>
<2021Dec27.082302@mips.complang.tuwien.ac.at> <sqc88e$ib4$1@dont-email.me>
<cade959f-9859-4a85-ac19-486e71e1919dn@googlegroups.com>
Injection-Date: Mon, 27 Dec 2021 21:18:49 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:eb03:0:7285:c2ff:fe6c:992d";
logging-data="29267"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Mon, 27 Dec 2021 21:18 UTC

Quadibloc <jsavard@ecn.ab.ca> schrieb:
> On Monday, December 27, 2021 at 4:32:34 AM UTC-7, Ivan Godard wrote:
>
>> Every architecture has a problem with control flow joins. All live data
>> must be congruent (whatever that means in the architecture) after the
>> join.
>
> I must say that it's even hard to understand the language you're
> speaking here.
>
> Later on, you mention register rename in conventional register-based
> architectures, as opposed to the Mill.
>
> So, it's taken some time, but I *think* I've figured out what you're
> saying.
>
> In order to allow a branch into a code sequence on the Mill,
> you have to save and restore the Belt, something like how
> registers are saved and restored on a subroutine call.

Think of an if/else statement for SSA, the intermediate form
which is state of the art for compilers. It has only a single
point where each variable is defined, which makes things like
constant propagation very easy.

So,

a = 32
// use a
a = 44
// use a

is translated to

a.1 = 32
// use a.1
a.2 = 44
// use a.2

The question then is: What do you do with

if (condition) then
  a = 32
else
  a = 44
endif ?

The answer is the so-called "phi nodes", functions that take
their values from the respective branches, so it could look something
like

if (condition) then
  a.1 = 32
else
  a.2 = 44
endif

a.3 = PHI(a.1,a.2)

where the PHI selects the value from whatever branch was taken.

You translate the PHI nodes by putting the right value into the
right register (or memory location). For the Mill, it has to
involve setting the variables into the right slots on the belt,
which I can imagine is quite complex.
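
To make the "putting the right value into the right register" step concrete, here is a minimal Python sketch of the textbook way to lower a PHI: append a copy to the end of each predecessor block. The block names, the tuple layout and the helper names (lower_phis, run) are made up for illustration; real compilers do this on a proper control-flow graph.

```python
# Minimal sketch: lower PHI nodes by inserting copies at the end of each
# predecessor block. Data layout and names are illustrative assumptions.

def lower_phis(blocks, phis):
    """blocks: dict block_name -> list of (dest, value) assignments.
    phis: list of (dest, {pred_block: source_name}) sitting at the join.
    Returns new blocks with 'dest = source' appended to each predecessor."""
    lowered = {name: list(body) for name, body in blocks.items()}
    for dest, sources in phis:
        for pred, src in sources.items():
            lowered[pred].append((dest, src))
    return lowered

def run(block, env=None):
    """Execute one straight-line block; a value is an int or an earlier name."""
    env = dict(env or {})
    for dest, value in block:
        env[dest] = env[value] if isinstance(value, str) else value
    return env

# The if/else example from above: a.3 = PHI(a.1, a.2)
blocks = {"then": [("a.1", 32)], "else": [("a.2", 44)]}
phis = [("a.3", {"then": "a.1", "else": "a.2"})]
lowered = lower_phis(blocks, phis)

for condition in (True, False):
    env = run(lowered["then" if condition else "else"])
    print(condition, env["a.3"])
```

With a single PHI this naive version is enough. With several PHIs at one join the copies have to be treated as happening in parallel, otherwise one copy can clobber a value another copy still needs; the sketch does not handle that.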

Re: Another code compression idea

<sqdksm$lrq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22518&group=comp.arch#22518

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Mon, 27 Dec 2021 16:14:15 -0800
Organization: A noiseless patient Spider
Lines: 17
Message-ID: <sqdksm$lrq$1@dont-email.me>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<8fe6da66-1359-4f80-b0dc-b8a6e0d138een@googlegroups.com>
<2021Dec26.235511@mips.complang.tuwien.ac.at> <sqattp$d4k$2@dont-email.me>
<2021Dec27.082302@mips.complang.tuwien.ac.at> <sqc88e$ib4$1@dont-email.me>
<cade959f-9859-4a85-ac19-486e71e1919dn@googlegroups.com>
<sqdajp$sij$1@newsreader4.netcologne.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Dec 2021 00:14:15 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="ca315eee59e58cc38e256dff4567bd4c";
logging-data="22394"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX181z25ZhJBlmvzRaUoQ/Plm"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:635r2CaLuNMAcyxxdEZpxZFBLcw=
In-Reply-To: <sqdajp$sij$1@newsreader4.netcologne.de>
Content-Language: en-US

by: Ivan Godard - Tue, 28 Dec 2021 00:14 UTC

On 12/27/2021 1:18 PM, Thomas Koenig wrote:

> You translate the PHI nodes by putting the right value into the
> right register (or memory location). For the Mill, it has to
> involve setting the variables into the right slots on the belt,
> which I can imagine is quite complex.

Not very complex. The logical-to-physical mapping can be thought of as
in effect a hardware array of physical numbers indexed by belt number,
say 16x6 bits on a Silver. Replace the array at a join (or call, or
return, etc.)

That's a logical picture; it doesn't have to actually work that way -
rename can be done in several different ways, as Mitch has explained at
length.
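
The logical picture above can be modelled in a few lines of Python. This is only a sketch of the "array indexed by belt number" view, assuming 16 belt positions and 6-bit physical numbers from the "16x6 bits" figure; the class and method names (Belt, drop, rename_at_join) and the round-robin allocator are invented here, and as the post says real hardware need not work this way.

```python
BELT_LENGTH = 16   # logical belt positions (assumed from "16x6 bits")
PHYS_BITS = 6      # each entry names one of 2**6 physical locations

class Belt:
    def __init__(self):
        self.mapping = [0] * BELT_LENGTH            # belt position -> physical number
        self.physical = [None] * (1 << PHYS_BITS)   # the actual value storage
        self.next_free = 0                          # naive round-robin allocator (assumption)

    def drop(self, value):
        """A new result enters at position 0; older entries move back by one."""
        phys = self.next_free
        self.next_free = (self.next_free + 1) % len(self.physical)
        self.physical[phys] = value
        self.mapping = [phys] + self.mapping[:-1]

    def read(self, pos):
        return self.physical[self.mapping[pos]]

    def rename_at_join(self, wanted):
        """Replace the mapping: new position i refers to old position wanted[i].
        Positions not listed are treated as dead and left as they were."""
        old = self.mapping
        new = [old[p] for p in wanted]
        self.mapping = new + old[len(new):]

# Two paths reach the join with the live value at different belt positions.
path_a = Belt(); path_a.drop(7); path_a.drop(99)   # live value 7 is at position 1
path_b = Belt(); path_b.drop(7)                    # live value 7 is at position 0

path_a.rename_at_join([1])   # each path renames so the value ends up at position 0
path_b.rename_at_join([0])
print(path_a.read(0), path_b.read(0))
```

Only the small mapping array is rewritten at the join; no value is moved, which is the point of doing it by rename.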

Re: Another code compression idea

<sqe61f$mid$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22521&group=comp.arch#22521

Path: i2pn2.org!rocksolid2!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 05:06:55 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <sqe61f$mid$1@dont-email.me>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 28 Dec 2021 05:06:55 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="edd13ec09436710e54c0baf1729f65b2";
logging-data="23117"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18iwgGhNfxmQNdLjjhrqyGy"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:dqplqHJmxydWRSMPXrYFb9K4Cuo=
sha1:rYCRggkIWsRG3lUdYzKuQFC1rKQ=

by: Brett - Tue, 28 Dec 2021 05:06 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
> by turning three-address instructions into two-address instructions,
> reducing literal fields, and/or reducing the number of addressable
> registers.
>
> Here's another idea: How about compressing by using the destination
> register of the previous instruction as one of the sources?
>
> + Smaller code.
>
> + You directly know that you can forward from the previous instruction
> to this one. And extending from this, you can directly encode stuff
> for an ILDP architecture [kim&smith02].

You just described a register plus accumulator design, the most well known
of which is the 8086, though you can argue the intent. Very successful but
not enough registers due to tech reasons.

The biggest downside is you need byte sized instructions to get that good
code density, which hardware guys hate.

The next step is deciding if you want a belt in front of your accumulator
so you can push two loads to ACC and then do an operation, a very common
occurrence. You save on destination register specifiers for the loads and
on source register specifiers for the integer operation.

This is a mix of push and pull ACC ops; you can go pure push or pure pull
as well.

Then you can add preferred registers for ops like X86-64 has for more
compression. Say four address registers and four integer registers for hot
data, and extension opcodes to get at the other registers. You can get near
50% compression compared to current standards, with no performance loss.

> - The main disadvantage I see is that the meaning of the instructions
> is not encoded just in itself. This is a problem when jumping to
> such an instruction (just don't allow that, i.e., have an illegal
> instruction exception if you jump to such an instruction). A bigger
> problem is when an interrupt or exception returns to such an
> instruction; a way to deal with that may be to allow this encoding
> only for instructions that cannot cause exceptions, and to delay
> interrupts until the next self-contained instruction.
>
> @InProceedings{kim&smith02,
> author = {Ho-Seop Kim and James E. Smith},
> title = {An Instruction Set and Microarchitecture for
> Instruction Level Distributed Processing},
> crossref = {isca02},
> pages = {71--81},
> url = {http://www.ece.wisc.edu/~hskim/papers/kimh_ildp.pdf},
> annote = {This paper addresses the problems of wide
> superscalars with communication across the chip and
> the number of write ports in the register file. The
> authors propose an architecture (ILDP) with
> general-purpose registers and with accumulators
> (with instructions only accessing one accumulator
> (read and/or write) and one register (read or
> write); for the accumulators their death is
> specified explicitly in the instructions. The
> microarchitecture builds \emph{strands} from
> instructions working on an accumulator; a strand
> starts with an instruction writing to an accumulator
> without reading from it, continues with instructions
> reading from (and possibly writing to) the
> accumulator and ends with an instruction that kills
> the accumulator. Strands are allocated to one out of
> eight processing elements (PEs) dynamically (i.e.,
> accumulators are renamed). A PE consists of
> mainly one ALU data path (but also a copy of the
> GPRs and an L1 cache). They evaluated this
> architecture by translating Alpha binaries into it,
> and comparing their architecture to a 4-wide or
> 8-wide Alpha implementation; their architecture has
> a lower L1 cache latency, though. The performance of
> ILDP in clock cycles is competitive, and one can
> expect faster clocks for ILDP. The paper also
> presents data for other stuff, e.g. general-purpose
> register writes, which have to be promoted between
> strands and which are relatively few.}
> }
>
> @Proceedings{isca02,
> title = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
> booktitle = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
> year = "2002",
> key = "ISCA 29",
> }
>
> - anton

Re: Another code compression idea

<sqegh3$kun$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=22522&group=comp.arch#22522

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!.POSTED.2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de!not-for-mail
From: tkoe...@netcologne.de (Thomas Koenig)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 08:05:55 -0000 (UTC)
Organization: news.netcologne.de
Distribution: world
Message-ID: <sqegh3$kun$1@newsreader4.netcologne.de>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<sqe61f$mid$1@dont-email.me>
Injection-Date: Tue, 28 Dec 2021 08:05:55 -0000 (UTC)
Injection-Info: newsreader4.netcologne.de; posting-host="2001-4dd7-eb03-0-7285-c2ff-fe6c-992d.ipv6dyn.netcologne.de:2001:4dd7:eb03:0:7285:c2ff:fe6c:992d";
logging-data="21463"; mail-complaints-to="abuse@netcologne.de"
User-Agent: slrn/1.0.3 (Linux)

by: Thomas Koenig - Tue, 28 Dec 2021 08:05 UTC

Brett <ggtgp@yahoo.com> schrieb:
> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
>> by turning three-address instructions into two-address instructions,
>> reducing literal fields, and/or reducing the number of addressable
>> registers.
>>
>> Here's another idea: How about compressing by using the destination
>> register of the previous instruction as one of the sources?
>>
>> + Smaller code.
>>
>> + You directly know that you can forward from the previous instruction
>> to this one. And extending from this, you can directly encode stuff
>> for an ILDP architecture [kim&smith02].
>
> You just described a register plus accumulator design, the most well known
> of which is the 8086, though you can argue the intent. Very successful but
> not enough registers due to tech reasons.

Does not have to be - the value of the accumulator has to be
retained after an operation. What Anton described can have a
temporary value only, which only lives in store forwarding.

> The biggest downside is you need byte sized instructions to get that good
> code density, which hardware guys hate.

Even a reduction from 32 to 16 bits could make a lot of sense.

Re: Another code compression idea

<2021Dec28.115512@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22526&group=comp.arch#22526

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 10:55:12 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 49
Message-ID: <2021Dec28.115512@mips.complang.tuwien.ac.at>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="601e837a5ee8f3e908f93612dbc27cca";
logging-data="9061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19rqNgFHY0Ntz8TtdicmWT/"
Cancel-Lock: sha1:SAZ+E/HsaHVBOQOn+1uI63GOYnM=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Tue, 28 Dec 2021 10:55 UTC

Brett <ggtgp@yahoo.com> writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
>> by turning three-address instructions into two-address instructions,
>> reducing literal fields, and/or reducing the number of addressable
>> registers.
>>
>> Here's another idea: How about compressing by using the destination
>> register of the previous instruction as one of the sources?
>>
>> + Smaller code.
>>
>> + You directly know that you can forward from the previous instruction
>> to this one. And extending from this, you can directly encode stuff
>> for an ILDP architecture [kim&smith02].
>
>You just described a register plus accumulator design, the most well known
>of which is the 8086

Not quite. Architecturally the previous instruction still writes to
any register (not just a specific one like on the 8086), and the
current instruction reads from that register. Microarchitecturally,
on a simple single-pipeline implementation you can see the forwarding
as using an accumulator (the latch on the forwarding path), but you
also get that when the current instruction names the register and it
happens to be the same as the destination register of the previous
instruction.

>The biggest downside is you need byte sized instructions to get that good
>code density, which hardware guys hate.

It depends on what else you encode and how many registers you have. I
was thinking about 16-bit instructions, as a replacement for RISC-V's
compressed instructions. The benefit over those would be that you
would have a free hand at selecting the other operand and the target;
the cost would be that you could use this compression method only if
the previous instruction writes to a source register of a current
instruction.
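
A rough Python sketch of when the short form would apply, under assumptions that go beyond the text: instructions are modelled as (op, dest, src1, src2) tuples, the full form is 32 bits and the short form 16 bits, a commutative operation may swap its sources to match, and a branch target may never use the short form. The names Insn, can_use_olddest and code_size_bits are hypothetical, and a real 16-bit format would restrict other fields too.

```python
from collections import namedtuple

# is_branch_target defaults to False; the field is an illustrative assumption.
Insn = namedtuple("Insn", "op dest src1 src2 is_branch_target", defaults=(False,))

COMMUTATIVE = {"add", "and", "or", "xor", "mul"}

def can_use_olddest(prev, cur):
    """True if cur can name the previous instruction's destination implicitly."""
    if prev is None or cur.is_branch_target:
        return False                      # no predecessor, or jumps may land here
    if cur.src1 == prev.dest:
        return True
    # For commutative ops the sources can be swapped to make src1 the old dest.
    return cur.op in COMMUTATIVE and cur.src2 == prev.dest

def code_size_bits(program, full=32, short=16):
    total, prev = 0, None
    for insn in program:
        total += short if can_use_olddest(prev, insn) else full
        prev = insn
    return total

program = [
    Insn("add", "r1", "r2", "r3"),
    Insn("sub", "r4", "r1", "r5"),                          # src1 is the old dest: short
    Insn("xor", "r6", "r7", "r4"),                          # commutative, src2 matches: short
    Insn("sub", "r8", "r2", "r6", is_branch_target=True),   # jump target: full form
]
print(code_size_bits(program), "bits instead of", 32 * len(program))
```

Running a check like this over real compiler output, with and without instruction scheduling, would be one way to measure how often the short form applies.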

>The next step is deciding if you want a belt in front of your accumulator
>so you can push two loads to ACC and then do an operation, a very common
>occurrence.

How common? I don't think it's common in most code I have written.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Another code compression idea

<2021Dec28.120651@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22527&group=comp.arch#22527

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 11:06:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 16
Distribution: world
Message-ID: <2021Dec28.120651@mips.complang.tuwien.ac.at>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me> <sqegh3$kun$1@newsreader4.netcologne.de>
Injection-Info: reader02.eternal-september.org; posting-host="601e837a5ee8f3e908f93612dbc27cca";
logging-data="9061"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9m8t6078lBDIQ88Y0fPHB"
Cancel-Lock: sha1:sx3wIHdUzKmYV0CjxQlOXtoBSl8=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Tue, 28 Dec 2021 11:06 UTC

Thomas Koenig <tkoenig@netcologne.de> writes:
>Does not have to be - the value of the accumulator has to be
>retained after an operation. What Anton described can have a
>temporary value only, which only lives in store forwarding.

No, the value lives on in the destination register of the previous
instruction. Of course, if the current instruction overwrites this, a
sufficiently smart microarchitecture could skip the writeback of the
previous instruction, but that's not specific to my compression idea;
and there are pitfalls wrt. interrupts and exceptions to take into
account.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Another code compression idea

<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22530&group=comp.arch#22530

X-Received: by 2002:ac8:57c2:: with SMTP id w2mr18763539qta.54.1640705410147;
Tue, 28 Dec 2021 07:30:10 -0800 (PST)
X-Received: by 2002:a05:6830:2019:: with SMTP id e25mr16140025otp.96.1640705409903;
Tue, 28 Dec 2021 07:30:09 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 28 Dec 2021 07:30:09 -0800 (PST)
In-Reply-To: <sqe61f$mid$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fb70:6300:845c:6f36:c98d:6e6f;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fb70:6300:845c:6f36:c98d:6e6f
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
Subject: Re: Another code compression idea
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Tue, 28 Dec 2021 15:30:10 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 14

by: Quadibloc - Tue, 28 Dec 2021 15:30 UTC

On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:

> You just described a register plus accumulator design, the most well known
> of which is the 8086, though you can argue the intent. Very successful but
> not enough registers due to tech reasons.

Hmm. I had never thought of the 8086 as a "register plus accumulator"
architecture; but when I look up its architecture, I see it doesn't just have
four 16-bit registers or eight 8-bit registers; the 16-bit registers aren't
just A, B, C, and D, but Accumulator, Base, Count, and Data.

As if it was designed to have automatically looping instructions to
process strings or something.

John Savard

Re: Another code compression idea

<2021Dec28.184742@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=22532&group=comp.arch#22532

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 17:47:42 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 36
Message-ID: <2021Dec28.184742@mips.complang.tuwien.ac.at>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me> <070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="601e837a5ee8f3e908f93612dbc27cca";
logging-data="26177"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18jPpJE8CX+u8orEwY/B5Di"
Cancel-Lock: sha1:OTIwadWtiCxK8D04iloAJBEc9po=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Tue, 28 Dec 2021 17:47 UTC

Quadibloc <jsavard@ecn.ab.ca> writes:
>On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
>
>> You just described a register plus accumulator design, the most well known
>> of which is the 8086, though you can argue the intent. Very successful but
>> not enough registers due to tech reasons.
>
>Hmm. I had never thought of the 8086 as a "register plus accumulator"
>architecture;

For the programming side it mostly isn't. But on the encoding side
there are shorter encodings for certain instructions when working on
AX (or AL), somewhat like the 16-bit encodings in RISC-V C, which are
just additional encodings for the more general stuff. The 8086
architecture also has a number of instructions with implicit
registers, and AX or AL tend to be used by these instructions (e.g.,
LODSW loads into AX).
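
One concrete case of such a shorter encoding, sketched in Python: ADD with a 16-bit immediate. The general form is opcode 81 /0 followed by a ModRM byte and the immediate, while the AX-only form is the single opcode 05 followed by the immediate. The helper names are made up, and only register-direct operands in 16-bit mode are modelled.

```python
import struct

# 16-bit register numbers as used in the ModRM r/m field.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def add_imm16_general(reg, imm):
    """ADD r/m16, imm16: opcode 81 /0, ModRM with mod=11 (register direct)."""
    modrm = 0xC0 | (0 << 3) | REG16[reg]   # mod=11, reg field=/0, r/m=register
    return bytes([0x81, modrm]) + struct.pack("<H", imm)

def add_imm16_short(imm):
    """ADD AX, imm16: opcode 05, the destination register is implicit."""
    return bytes([0x05]) + struct.pack("<H", imm)

general = add_imm16_general("ax", 0x1234)
short = add_imm16_short(0x1234)
print(general.hex(), len(general))   # 81c03412 4
print(short.hex(), len(short))       # 053412 3
```

Both byte sequences perform the same operation on AX; the short one just drops the ModRM byte.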

>but when I look up its architecture, I see it doesn't just have
>four 16-bit registers or eight 8-bit registers;

8 16-bit registers, and a number of instructions treat them alike.
However, you can use only some of them for addressing (fixed in
IA-32), so they are not general-purpose registers.

>As if it was designed to have automatically looping instructions to
>process strings or something.

The implicit-register instructions certainly were designed for
specific patterns; if your usage fits the pattern, it's cool, if not,
it's nasty.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: Another code compression idea

<sqfmps$ml$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22535&group=comp.arch#22535

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 12:59:06 -0600
Organization: A noiseless patient Spider
Lines: 37
Message-ID: <sqfmps$ml$1@dont-email.me>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<sqe61f$mid$1@dont-email.me>
<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Dec 2021 18:59:08 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7a192c0c96e1f18167785330ec8f0ab7";
logging-data="725"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18E3MsDO/BcmdvQxmpBLLrt"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.4.1
Cancel-Lock: sha1:wylgIlGnxSCUiVRAKbmcJM0SsEA=
In-Reply-To: <070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
Content-Language: en-US

by: BGB - Tue, 28 Dec 2021 18:59 UTC

On 12/28/2021 9:30 AM, Quadibloc wrote:
> On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
>
>> You just described a register plus accumulator design, the most well known
>> of which is the 8086, though you can argue the intent. Very successful but
>> not enough registers due to tech reasons.
>
> Hmm. I had never thought of the 8086 as a "register plus accumulator"
> architecture; but when I look up its architecture, I see it doesn't just have
> four 16-bit registers or eight 8-bit registers; the 16-bit registers aren't
> just A, B, C, and D, but Accumulator, Base, Count, and Data.
>
> As if it was designed to have automatically looping instructions to
> process strings or something.
>

Not to forget that SI and DI were "Source Index" and "Destination Index".

This just leaves SP and BP as Stack Pointer and Base Pointer...

It seems almost like things like "REP MOVSB" and similar were considered
to be fairly significant design features.

Though, I guess it is possible that a more modern CPU with such an
instruction could have it move 8 or 16 bytes at a time.

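Say (in hardware):
while(RCX!=0) {   // test first, so RCX==0 copies nothing (as with REP)
  if(RCX>=8)      // at least 8 bytes left: move a quadword
    { Q=[RSI]; [RDI]=Q; RSI+=8; RDI+=8; RCX-=8; }
  else            // tail: move one byte at a time
    { B=[RSI]; [RDI]=B; RSI++; RDI++; RCX--; }
}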
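
The loop above can be modelled in software roughly as follows. This is only a sketch, not how any real CPU implements REP MOVSB: the function name wide_rep_movsb is made up for illustration, dst/src/count stand in for RDI/RSI/RCX, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of a "wide" REP MOVSB.
   dst, src and count play the roles of RDI, RSI and RCX.
   Assumption: the two buffers do not overlap.
   Returns the number of move operations performed. */
size_t wide_rep_movsb(uint8_t *dst, const uint8_t *src, size_t count)
{
    size_t steps = 0;
    assert(count == 0 || (dst != NULL && src != NULL));
    while (count != 0) {
        if (count >= 8) {
            uint64_t q;
            memcpy(&q, src, 8);   /* Q = [RSI] (alignment-safe load)  */
            memcpy(dst, &q, 8);   /* [RDI] = Q                        */
            src += 8; dst += 8; count -= 8;
        } else {
            *dst++ = *src++;      /* B = [RSI]; [RDI] = B             */
            count--;
        }
        steps++;
    }
    return steps;
}
```

With this model a 19-byte copy takes two quadword moves plus three byte moves, i.e. 5 steps instead of 19. Note that copying 8 bytes at a time would not behave like a byte-by-byte copy if source and destination overlap, which is why the sketch rules that case out.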

> John Savard

Re: Another code compression idea

<sqfooi$1jll$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=22536&group=comp.arch#22536

Path: i2pn2.org!i2pn.org!aioe.org!rd9pRsUZyxkRLAEK7e/Uzw.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Tue, 28 Dec 2021 20:32:36 +0100
Organization: Aioe.org NNTP Server
Message-ID: <sqfooi$1jll$1@gioia.aioe.org>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<sqe61f$mid$1@dont-email.me>
<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
<2021Dec28.184742@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="52917"; posting-host="rd9pRsUZyxkRLAEK7e/Uzw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
Firefox/68.0 SeaMonkey/2.53.10.1
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Tue, 28 Dec 2021 19:32 UTC

Anton Ertl wrote:
> Quadibloc <jsavard@ecn.ab.ca> writes:
>> On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
>>
>>> You just described a register plus accumulator design, the most well known
>>> of which is the 8086, though you can argue the intent. Very successful but
>>> not enough registers due to tech reasons.
>>
>> Hmm. I had never thought of the 8086 as a "register plus accumulator"
>> architecture;
>
> For the programming side it mostly isn't. But on the encoding side
> there are shorter encodings for certain instructions when working on
> AX (or AL), somewhat like the 16-bit encodings in RISC-V C, which are
> just additional encodings for the more general stuff. The 8086
> architecture also has a number of instructions with implicit
> registers, and AX or AL tend to be used by these instructions (e.g.,
> LODSW loads into AX).
>
>> but when I look up its architecture, I see it doesn't just have
>> four 16-bit registers or eight 8-bit registers;
>
> 8 16-bit registers, and a number of instructions treat them alike.
> However, you can use only some of them for addressing (fixed in
> IA-32), so they are not general-purpose registers.
>
>> As if it was designed to have automatically looping instructions to
>> process strings or something.
>
> The implicit-register instructions certainly were designed for
> specific patterns; if your usage fits the pattern, it's cool, if not,
> it's nasty.

Therefore, all asm was written from the inside out, starting with the
innermost loop which got to use AX/BX/CX/DX/SI/DI as Intel intended,
then everything else had to make do with global or stack variables
unless that inner loop left anything unused. Later on, when we got
SP-relative addressing, (E)BP could be used as a 7th regular register.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Another code compression idea

<sqgdia$1qk$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=22545&group=comp.arch#22545

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ggt...@yahoo.com (Brett)
Newsgroups: comp.arch
Subject: Re: Another code compression idea
Date: Wed, 29 Dec 2021 01:27:39 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <sqgdia$1qk$1@dont-email.me>
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
<sqe61f$mid$1@dont-email.me>
<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 29 Dec 2021 01:27:39 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="d10d0421d9c55962dfc5333f73ae7ade";
logging-data="1876"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18ADT1BrOxmfSHVRthuPd71"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:+VjWz/bPafiiexPueT1rdHqj7mw=
sha1:DdVnt7rm+bXR+d8gwWkA+ojmhHA=

by: Brett - Wed, 29 Dec 2021 01:27 UTC

Quadibloc <jsavard@ecn.ab.ca> wrote:
> On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
>
>> You just described a register plus accumulator design, the most well known
>> of which is the 8086, though you can argue the intent. Very successful but
>> not enough registers due to tech reasons.
>
> Hmm. I had never thought of the 8086 as a "register plus accumulator"
> architecture; but when I look up its architecture, I see it doesn't just have
> four 16-bit registers or eight 8-bit registers; the 16-bit registers aren't
> just A, B, C, and D, but Accumulator, Base, Count, and Data.

Yes, then Intel got sucked into the RISC philosophy.

It would be interesting to see how much better the design would have become
if Intel went down the implied register road instead, for better code
density.

The difference between four different loads for four registers versus more
bits for a load register specifier is not what I am talking about, as many
ops will use ACC and its shorter opcodes.

I would support 64 registers, which none of the RISCs do in the base ISA.
Best code density plus most registers plus load/store pair is the holy
trinity needed to crack a market. And you can support a fixed 32 bit
instruction mode as well, as moronic as that is.

For constant extenders you could set the high bit of each extension byte;
this wastes some entropy but makes decoding instruction boundaries near
trivial, to make the hardware guys happier. Problem is you only get 128 base
opcodes, but you don’t have to have this limit on the second or third byte
if your encoding makes it easy to decide the base opcode length.
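
A minimal sketch of why this makes boundary detection cheap, assuming the simplest reading of the scheme: the first byte of every instruction has bit 7 clear, and every byte after it (all treated as extension bytes here) has bit 7 set. The function names are made up for illustration, and the relaxed variant where later opcode bytes are not restricted is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: bit 7 clear marks the first byte of an
   instruction, bit 7 set marks an extension byte. */
int is_instruction_start(uint8_t byte)
{
    return (byte & 0x80) == 0;
}

/* Length in bytes of the instruction that starts at code[pos]:
   the start byte plus all directly following extension bytes. */
size_t instruction_length(const uint8_t *code, size_t size, size_t pos)
{
    size_t len = 1;
    while (pos + len < size && !is_instruction_start(code[pos + len]))
        len++;
    return len;
}

/* Every byte can be classified on its own, so counting (or marking)
   instruction starts needs no sequential decode of earlier bytes. */
size_t count_instructions(const uint8_t *code, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        if (is_instruction_start(code[i]))
            n++;
    return n;
}
```

The point of the design is in count_instructions: each byte is classified by looking at that byte alone, so hardware could mark all instruction starts in a fetch block in parallel. The cost is the one that the post names: one bit per byte is spent on framing.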

> As if it was designed to have automatically looping instructions to
> process strings or something.
>
> John Savard
>

Re: Another code compression idea

<cba87d50-8996-40c0-acab-92207f94d2d8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22722&group=comp.arch#22722

X-Received: by 2002:a05:6214:27c9:: with SMTP id ge9mr44088385qvb.61.1641255206219;
Mon, 03 Jan 2022 16:13:26 -0800 (PST)
X-Received: by 2002:a05:6830:1445:: with SMTP id w5mr34590239otp.112.1641255205918;
Mon, 03 Jan 2022 16:13:25 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 3 Jan 2022 16:13:25 -0800 (PST)
In-Reply-To: <2021Dec28.184742@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=73.188.126.34; posting-account=ujX_IwoAAACu0_cef9hMHeR8g0ZYDNHh
NNTP-Posting-Host: 73.188.126.34
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me>
<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com> <2021Dec28.184742@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <cba87d50-8996-40c0-acab-92207f94d2d8n@googlegroups.com>
Subject: Re: Another code compression idea
From: timcaff...@aol.com (Timothy McCaffrey)
Injection-Date: Tue, 04 Jan 2022 00:13:26 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 45

by: Timothy McCaffrey - Tue, 4 Jan 2022 00:13 UTC

On Tuesday, December 28, 2021 at 12:58:19 PM UTC-5, Anton Ertl wrote:
> Quadibloc <jsa...@ecn.ab.ca> writes:
> >On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
> >
> >> You just described a register plus accumulator design, the most well known
> >> of which is the 8086, though you can argue the intent. Very successful but
> >> not enough registers due to tech reasons.
> >
> >Hmm. I had never thought of the 8086 as a "register plus accumulator"
> >architecture;
> For the programming side it mostly isn't. But on the encoding side
> there are shorter encodings for certain instructions when working on
> AX (or AL), somewhat like the 16-bit encodings in RISC-V C, which are
> just additional encodings for the more general stuff. The 8086
> architecture also has a number of instructions with implicit
> registers, and AX or AL tend to be used by these instructions (e.g.,
> LODSW loads into AX).
> >but when I look up its architecture, I see it doesn't just have
> >four 16-bit registers or eight 8-bit registers;
> 8 16-bit registers, and a number of instructions treat them alike.
> However, you can use only some of them for addressing (fixed in
> IA-32), so they are not general-purpose registers.
> >As if it was designed to have automatically looping instructions to
> >process strings or something.
> The implicit-register instructions certainly were designed for
> specific patterns; if your usage fits the pattern, it's cool, if not,
> it's nasty.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

The 8086 was designed as an 8080 replacement and as a Z-80 competitor.
(Hence the string instructions, and SI/DI).
It was also a stop-gap until the 432 was ready (stop laughing!)

I think most of the accumulator orientation of the 8086 was because it was
designed to run 8080 code fast. (Oh look, we have lots of instructions that
do stuff with A! We better make that efficient in the 8086!)
I think support for modern programming languages (like Pascal) was kind of
an afterthought.
Interviews at the time said they used the PDP-11 as inspiration (everybody
said that about their arch.); I think that was mostly marketing. I don't think
they really understood why the PDP-11 was designed like it was.

- Tim

Re: Another code compression idea

<ce81d71b-2edd-4b5a-915e-0954c6c0fdc3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22724&group=comp.arch#22724

X-Received: by 2002:a05:620a:100c:: with SMTP id z12mr33972790qkj.680.1641255703656;
Mon, 03 Jan 2022 16:21:43 -0800 (PST)
X-Received: by 2002:a9d:75d4:: with SMTP id c20mr34292843otl.85.1641255703367;
Mon, 03 Jan 2022 16:21:43 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 3 Jan 2022 16:21:43 -0800 (PST)
In-Reply-To: <2021Dec26.185955@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=73.188.126.34; posting-account=ujX_IwoAAACu0_cef9hMHeR8g0ZYDNHh
NNTP-Posting-Host: 73.188.126.34
References: <2021Dec26.185955@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ce81d71b-2edd-4b5a-915e-0954c6c0fdc3n@googlegroups.com>
Subject: Re: Another code compression idea
From: timcaff...@aol.com (Timothy McCaffrey)
Injection-Date: Tue, 04 Jan 2022 00:21:43 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 92

by: Timothy McCaffrey - Tue, 4 Jan 2022 00:21 UTC

On Sunday, December 26, 2021 at 1:33:30 PM UTC-5, Anton Ertl wrote:
> The RISC-V C (compressed) extension (and AFAIK Thumb and MIPS-16) work
> by turning three-address instructions into two-address instructions,
> reducing literal fields, and/or reducing the number of addressable
> registers.
>
> Here's another idea: How about compressing by using the destination
> register of the previous instruction as one of the sources?
>
> + Smaller code.
>
> + You directly know that you can forward from the previous instruction
> to this one. And extending from this, you can directly encode stuff
> for an ILDP architecture [kim&smith02].
>
> - The main disadvantage I see is that the meaning of the instructions
> is not encoded just in itself. This is a problem when jumping to
> such an instruction (just don't allow that, i.e., have an illegal
> instruction exception if you jump to such an instruction). A bigger
> problem is when there an interrupt or exception returns to such an
> instruction; a way to deal with that may be to allow this encoding
> only for instructions that cannot cause exceptions, and to delay
> interrupts until the next self-contained instruction.
>
> @InProceedings{kim&smith02,
> author = {Ho-Seop Kim and James E. Smith},
> title = {An Instruction Set and Microarchitecture for
> Instruction Level Distributed Processing},
> crossref = {isca02},
> pages = {71--81},
> url = {http://www.ece.wisc.edu/~hskim/papers/kimh_ildp.pdf},
> annote = {This paper addresses the problems of wide
> superscalars with communication across the chip and
> the number of write ports in the register file. The
> authors propose an architecture (ILDP) with
> general-purpose registers and with accumulators
> (with instructions only accessing one accumulator
> (read and/or write) and one register (read or
> write); for the accumulators their death is
> specified explicitly in the instructions. The
> microarchitecture builds \emph{strands} from
> instructions working on an accumulator; a strand
> starts with an instruction writing to an accumulator
> without reading from it, continues with instructions
> reading from (and possibly writing to) the
> accumulator and ends with an instruction that kills
> the accumulator. Strands are allocated to one out of
> eight processing elements (PEs) dynamically (i.e.,
> accumulators are renamed). A PE consists of
> mainly one ALU data path (but also a copy of the
> GPRs and an L1 cache). They evaluated this
> architecture by translating Alpha binaries into it,
> and comparing their architecture to a 4-wide or
> 8-wide Alpha implementation; their architecture has
> a lower L1 cache latency, though. The performance of
> ILDP in clock cycles is competetive, and one can
> expect faster clocks for ILDP. The paper also
> presents data for other stuff, e.g. general-purpose
> register writes, which have to be promoted between
> strands and which are relatively few.}
> }
>
> @Proceedings{isca02,
> title = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
> booktitle = "$29^\textit{th}$ Annual International Symposium on Computer Architecture",
> year = "2002",
> key = "ISCA 29",
> }
>
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

You could just implement an expression stack. Keep the stack in your registers, have another register
that points to "top of stack" in your register file. Normal stack expressions rarely exceed 8 levels.
Let's see, then you could do:
Z = A*Y + B*X
load R1 <- A
load R2 <- B
load R3 <- X
load R4 <- Y
set tos to R30
push R1 ;R30 = A
MUL R4 ;R30 = A*Y
push R2 ;R29 = B
MUL R3 ;R29 = B*X
ADD (tos, tos+1) ;R30 = A*Y+B*X
store R30, Z

This is an off the cuff answer, it seems like there are more efficient encodings than what I just came up with.
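
For what it's worth, the scheme can be simulated in a few lines of C. This is a sketch under assumptions that the post does not spell out: tos holds the index of the register currently on top of the stack, push pre-decrements it (so the stack grows downward from the first slot), a one-operand MUL multiplies the top of stack by a named register, and ADD folds the top two slots into the lower one and pops. All names (Machine, set_tos, mul_tos, add_pop, eval) are invented for the example.

```c
#include <stdint.h>

enum { NUM_REGS = 32 };

typedef struct {
    int64_t r[NUM_REGS];  /* register file; part of it doubles as the stack */
    int tos;              /* index of the register holding top of stack    */
} Machine;

/* Arrange for the first push to land in register first_slot. */
void set_tos(Machine *m, int first_slot) { m->tos = first_slot + 1; }

/* push Rsrc: allocate the next lower register and copy Rsrc into it. */
void push(Machine *m, int src) { m->tos--; m->r[m->tos] = m->r[src]; }

/* MUL Rsrc: top of stack *= Rsrc (only one register field is needed). */
void mul_tos(Machine *m, int src) { m->r[m->tos] *= m->r[src]; }

/* ADD (tos, tos+1): sum the top two slots into tos+1, then pop one. */
void add_pop(Machine *m) { m->r[m->tos + 1] += m->r[m->tos]; m->tos++; }

/* Z = A*Y + B*X, following the instruction sequence from the post. */
int64_t eval(int64_t a, int64_t b, int64_t x, int64_t y)
{
    Machine m = {0};
    m.r[1] = a; m.r[2] = b; m.r[3] = x; m.r[4] = y;  /* the four loads */
    set_tos(&m, 30);
    push(&m, 1);     /* stack: A        */
    mul_tos(&m, 4);  /* stack: A*Y      */
    push(&m, 2);     /* stack: A*Y, B   */
    mul_tos(&m, 3);  /* stack: A*Y, B*X */
    add_pop(&m);     /* stack: A*Y+B*X  */
    return m.r[m.tos];
}
```

Under these assumptions the final sum ends up back in the first stack slot, and every arithmetic instruction names at most one register explicitly, which is where the encoding savings would come from.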

- Tim

Re: Another code compression idea

<68dad966-0492-42ae-acf8-b84745da7fe4n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=22725&group=comp.arch#22725

X-Received: by 2002:a05:6214:2a88:: with SMTP id jr8mr43738640qvb.118.1641256045786;
Mon, 03 Jan 2022 16:27:25 -0800 (PST)
X-Received: by 2002:a9d:206a:: with SMTP id n97mr36398639ota.142.1641256045497;
Mon, 03 Jan 2022 16:27:25 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 3 Jan 2022 16:27:25 -0800 (PST)
In-Reply-To: <cba87d50-8996-40c0-acab-92207f94d2d8n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:5959:7534:4159:ef20;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:5959:7534:4159:ef20
References: <2021Dec26.185955@mips.complang.tuwien.ac.at> <sqe61f$mid$1@dont-email.me>
<070dfaa4-6571-44cf-929a-f95aa84d10e9n@googlegroups.com> <2021Dec28.184742@mips.complang.tuwien.ac.at>
<cba87d50-8996-40c0-acab-92207f94d2d8n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <68dad966-0492-42ae-acf8-b84745da7fe4n@googlegroups.com>
Subject: Re: Another code compression idea
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Tue, 04 Jan 2022 00:27:25 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 63

by: MitchAlsup - Tue, 4 Jan 2022 00:27 UTC

On Monday, January 3, 2022 at 6:13:27 PM UTC-6, timca...@aol.com wrote:
> On Tuesday, December 28, 2021 at 12:58:19 PM UTC-5, Anton Ertl wrote:
> > Quadibloc <jsa...@ecn.ab.ca> writes:
> > >On Monday, December 27, 2021 at 10:06:58 PM UTC-7, gg...@yahoo.com wrote:
> > >
> > >> You just described a register plus accumulator design, the most well known
> > >> of which is the 8086, though you can argue the intent. Very successful but
> > >> not enough registers due to tech reasons.
> > >
> > >Hmm. I had never thought of the 8086 as a "register plus accumulator"
> > >architecture;
> > For the programming side it mostly isn't. But on the encoding side
> > there are shorter encodings for certain instructions when working on
> > AX (or AL), somewhat like the 16-bit encodings in RISC-V C, which are
> > just additional encodings for the more general stuff. The 8086
> > architecture also has a number of instructions with implicit
> > registers, and AX or AL tend to be used by these instructions (e.g.,
> > LODSW loads into AX).
> > >but when I look up its architecture, I see it doesn't just have
> > >four 16-bit registers or eight 8-bit registers;
> > 8 16-bit registers, and a number of instructions treat them alike.
> > However, you can use only some of them for addressing (fixed in
> > IA-32), so they are not general-purpose registers.
> > >As if it was designed to have automatically looping instructions to
> > >process strings or something.
> > The implicit-register instructions certainly were designed for
> > specific patterns; if your usage fits the pattern, it's cool, if not,
> > it's nasty.
> > - anton
> > --
> > 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> > Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>
> The 8086 was designed as an 8080 replacement and as a Z-80 competitor.
> (Hence the string instructions, and SI/DI).
> It was also a stop-gap until the 432 was ready (stop laughing!)
<
CMU report indicates the 432 could have been 2× faster if they added a single wire
between the 2 chips. But I digress.
>
> I think most of the accumulator orientation of the 8086 was because it was
> designed to run 8080 codes fast. (oh look, we have lots of instructions that
> do stuff with A! We better make that efficient in the 8086!)
> I think support for modern programming
> languages (like Pascal) was kind of an after thought.
<
> Interviews at the time said they used the PDP-11 as inspiration (everybody
> said that about their arch.), I think that was mostly marketing. I don't think
> they really understood why the PDP-11 was designed like it was.
<
The PDP-11 was the inspiration for the 68000 and the 32032.
>
> - Tim

