Rocksolid Light



devel / comp.arch / Re: Verification

Subject (Author)
* Developing a hardware intuition (Paul A. Clayton)
+* Re: Developing a hardware intuition (Marcus)
|`* Re: Developing a hardware intuition (BGB)
| `* Re: Developing a hardware intuition (MitchAlsup)
|  `- Re: Developing a hardware intuition (BGB)
`* Re: Developing a hardware intuition (Theo Markettos)
 `* Re: Developing a hardware intuition (Stephen Fuld)
  `* Re: Developing a hardware intuition (Theo Markettos)
   +* Verification (was: Developing a hardware intuition) (Stefan Monnier)
   |`* Re: Verification (was: Developing a hardware intuition) (MitchAlsup)
   | +- Re: Verification (Stefan Monnier)
   | +* Re: Verification (Terje Mathisen)
   | |`- Re: Verification (MitchAlsup)
   | `- Re: Verification (was: Developing a hardware intuition) (Niklas Holsti)
   `* Re: Developing a hardware intuition (MitchAlsup)
    `- Re: Developing a hardware intuition (Ivan Godard)

Developing a hardware intuition

<t249ur$gtd$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=24550&group=comp.arch#24550

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: paaroncl...@gmail.com (Paul A. Clayton)
Newsgroups: comp.arch
Subject: Developing a hardware intuition
Date: Thu, 31 Mar 2022 09:22:34 -0400
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <t249ur$gtd$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 31 Mar 2022 13:22:35 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="fb72bd2640dbc43fe20e54085f33c5f0";
logging-data="17325"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18e/PmL2moAxZTQQFpsNlopSiAy/LhyzBs="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.0
Cancel-Lock: sha1:TvrbvnPdOAfwli25YQLN8iL/c3k=
X-Mozilla-News-Host: news://news.eternal-september.org:119
 by: Paul A. Clayton - Thu, 31 Mar 2022 13:22 UTC

A couple of weeks ago, someone with 20+ years of software
experience emailed me asking how to develop an "intuition"
or "way of thinking" in computer architecture to "come up
with elegant solutions that are implementable and scalable
in hardware", "jumping the chasm from absorbing and
understanding to being able to contribute and come up with
elegant architecture".

My response (which I will append at the end of this post)
was very weak; perhaps some more useful advice might be
offered here on comp.arch.

============== My email response ===========================

I myself am only a hobbyist — and not even a hobbyist who has
the skills to develop hardware — so I am not certain how much
useful advice I can offer. I will present a few thoughts.
Even if I were a professional, you would be wise to seek
others for advice; any creative field will have different
methods that work for different people and having the same
thing explained from multiple perspectives often aids
understanding.

It is commonly said that computer hardware differs most
fundamentally from software in parallelism, but I think
spatial reasoning is more significant. The structural support
of a bridge is parallel, but I doubt the engineers designing
a bridge think of each cable individually as acting in
parallel. Giving less attention to the aspect of time may
offer a more humanly natural means of conceiving of
the parallel (in space) operation; I suspect most people are
more familiar with spatial relationships (the toaster is to
the right of the sink; the conventional oven is to the left
of the sink; the microwave is above the oven) than parallel
action even when the functionality is specialized. (Each
piece of kitchen equipment can operate in parallel — and the
specialization of function may help convert "oven bakes,
microwave heats, toaster toasts" to "oven operates, microwave
operates, toaster operates" — but I suspect "is occupied" is
a more common conception.)

(Obviously, a bridge designer cannot ignore time; forces are
not constant over time. When preparing food one pays
attention to cooking time, equipment occupancy, and dependency
chains and not so much the locations of the equipment, but
visualizing the equipment spatially may make "structural
hazards" more obvious and this hazard detection seems to be
second nature.)

Familiarity with some rules of thumb, special techniques, and
trade-off axes would probably help in developing a mindset
for working in computer architecture. Smaller/closer is
faster/lower power is a commonly cited rule of thumb. The
trade-off axis of specialization (with efficiency) and
utilization is basic (as is the generalized application of
Amdahl's Law to fractional improvements). The trade-off axis
of caching results and redoing work (with communication
considered work) is also basic. (One might naively imagine
that one should always cache results and schedule work as
early as possible, but timely reuse is a speculation and even
if all inputs are available at an earlier time a later start
may be better — e.g., to reduce buffering if consumers of a
result will not be ready earlier.)

These do not seem especially alien to software design. (I
suspect software is more biased to favor utilization over
specialization; the total complexity of software seems
higher — as one would expect higher up an interface stack —
and the temporal and workload breadth of software use seems
higher; both factors discourage specialization, which adds
complexity and conflicts with breadth of use. Being lower
on the interface stack, hardware design may be more able to
justify effort for correctness and fitness of purpose
[usually performance].)

Speculation and doing more work to produce a result faster
may be less common in software, but even software is not
typically designed to minimize variation from worst case
execution time. E.g., a simple hash table look-up "speculates"
that the primary bucket will have the matching key.
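The hash-table point can be sketched as follows (my illustration, with Python standing in for the compiled code; the structure and names are mine, not from the thread):

```python
# A chained hash table whose lookup "speculates" that the first entry
# in the home bucket is the match: the common case takes the fast path,
# and only a miss pays for walking the chain.
def lookup(table, key):
    bucket = table[hash(key) % len(table)]
    if bucket and bucket[0][0] == key:   # speculative fast path
        return bucket[0][1]
    for k, v in bucket:                  # slow path: walk the chain
        if k == key:
            return v
    raise KeyError(key)
```
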

I suspect there is a rough equivalent of "data structures and
algorithms" for hardware development, where a familiarity
with a modest number of principles of the trade-offs is
sufficient and one would gradually gain awareness of
variations and use cases. Looking at good examples is also a
common learning technique, but the availability of technical
information about professional hardware designs seems to have
decreased significantly (and rationales are even less
commonly presented).

I fear little of the above has been helpful, but you are very
welcome to send a follow-up email.

If it would be helpful, I could try to walk through a couple
of textbook/exam-style exercises to present how I would work
on them.

Re: Developing a hardware intuition

<t24e5g$k2c$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=24551&group=comp.arch#24551

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: Thu, 31 Mar 2022 16:34:23 +0200
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <t24e5g$k2c$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 31 Mar 2022 14:34:24 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="cae53c3b276e2d68a1ba992af5f086d8";
logging-data="20556"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18vKXM7L3tZVCAIjUZvuIjbypZC6gRzjQY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:2oVbjjRxvmZ8ZBGFPD+2yTckVRM=
In-Reply-To: <t249ur$gtd$1@dont-email.me>
Content-Language: en-US
 by: Marcus - Thu, 31 Mar 2022 14:34 UTC

On 2022-03-31, Paul A. Clayton wrote:
> A couple of weeks ago, someone with 20+ years of software
> experience emailed me asking how to develop an "intuition"
> or "way of thinking" in computer architecture to "come up
> with elegant solutions that are implementable and scalable
> in hardware", "jumping the chasm from absorbing and
> understanding to being able to contribute and come up with
> elegant architecture".
>
> My response (which I will append at the end of this post)
> was very weak; perhaps some more useful advice might be
> offered here on comp.arch.
>
> ============== My email response ===========================
> [snip - sorry]

One thing that I have noticed is that one major difference in mindset
between software and hardware development is that of resource reuse.

In software development it is often beneficial to make many different
specialized solutions. If you identify that you can do less work in
certain cases by skipping operations or replacing them with cheaper
operations, it's almost always a win to do so in software (mainly
because software size is rarely an issue, and the L1I cache quickly
absorbs new/different variants).

In hardware development, on the other hand, it's almost always
beneficial to make heavy use of a few different resources. For instance
it is usually more beneficial to use a single full-width multiplier for
many different purposes than to have many different width multipliers
for different purposes (unless they can all be used concurrently, that
is). This is because die area (or LUTs or whatever your measure is) is
a limited resource.

A similar discussion in this forum concerned the differences in hardware
and software implementations of reciprocals. E.g. in software it can
be beneficial to start off with a narrow word width and work your way
up to wider (and more costly) operations as the precision increases
after a few Newton-Raphson iterations, whereas in hardware it can be
better to just use full width all the way.

Thus it is important to construct your hardware design around powerful
primitives. A Fused Multiply-Add unit is a good example of a primitive
that can be used for many different purposes.

/Marcus

Re: Developing a hardware intuition

<awE*PpxKy@news.chiark.greenend.org.uk>


https://www.novabbs.com/devel/article-flat.php?id=24552&group=comp.arch#24552

Path: i2pn2.org!i2pn.org!aioe.org!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo Markettos)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: 31 Mar 2022 16:58:28 +0100 (BST)
Organization: University of Cambridge, England
Lines: 50
Message-ID: <awE*PpxKy@news.chiark.greenend.org.uk>
References: <t249ur$gtd$1@dont-email.me>
NNTP-Posting-Host: chiark.greenend.org.uk
X-Trace: chiark.greenend.org.uk 1648742310 25903 212.13.197.229 (31 Mar 2022 15:58:30 GMT)
X-Complaints-To: abuse@chiark.greenend.org.uk
NNTP-Posting-Date: Thu, 31 Mar 2022 15:58:30 +0000 (UTC)
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/3.16.0-11-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo Markettos - Thu, 31 Mar 2022 15:58 UTC

Paul A. Clayton <paaronclayton@gmail.com> wrote:
> A couple of weeks ago, someone with 20+ years of software
> experience emailed me asking how to develop an "intuition"
> or "way of thinking" in computer architecture to "come up
> with elegant solutions that are implementable and scalable
> in hardware", "jumping the chasm from absorbing and
> understanding to being able to contribute and come up with
> elegant architecture".
>
> My response (which I will append at the end of this post)
> was very weak; perhaps some more useful advice might be
> offered here on comp.arch.

I think a difference is that in software your foundation is a toolbox
provided by the architecture. You have *these* tools, and you can achieve
your desired result if you use tool A then tool B then tool C. There is
then a lot of scheming that takes place to achieve the desired result based
on the combination of tools that happen to be in the toolbox.

E.g. there are lots of virtual memory tricks (demand paging, page compression,
copy on write) that rely on non-obvious combinations of the MMU page table,
something cunning in the page fault handler and instruction resumption after
page faults. Lots of people come up with schemes that fit the same basic
pattern, because the basic pattern is set up by the architecture.

In hardware you can do *almost anything*, but that anything has an
implementation cost in performance, area, power, complexity and
compatibility. You don't need to come up with a cunning combination of
tools from the toolbox, because you can just add a new one. The problem is
that it's too expensive to do that for every idea unless the thing you're
offering has a really good justification or relatively minor costs.
So you end up needing to make a more generic tool to add to the toolbox than
a special purpose one for a specific problem.

> It is commonly said that computer hardware differs most
> fundamentally from software in parallelism, but I think
> spatial reasoning is more significant.

Software is increasingly spatial too, of course (threads/cores/servers/etc).

I think a flaw in hardware design is people taking spatiality too literally:
they have an algorithm with N nodes, so they map each node to a LUT/ALU/core
and then wire them in a 2/3/4D mesh, and... they end up with an unroutable
mess of wires that is very inefficient. It may be better to
have central core(s) and pump data through them, with memory addresses
representing dimensions. Parallelism is useful, but too much can be
problematic, just like overly serialising things in software can be
problematic.

Theo

Re: Developing a hardware intuition

<t25990$kr7$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=24553&group=comp.arch#24553

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: Thu, 31 Mar 2022 17:17:01 -0500
Organization: A noiseless patient Spider
Lines: 221
Message-ID: <t25990$kr7$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me> <t24e5g$k2c$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 31 Mar 2022 22:17:04 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="43bfa1bfb452b6928bd9a432502122e4";
logging-data="21351"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19nNY1x71zOafd+B9CYRjg8"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:CgClT8xWvHVzdWtAFic7ysjAdKw=
In-Reply-To: <t24e5g$k2c$1@dont-email.me>
Content-Language: en-US
 by: BGB - Thu, 31 Mar 2022 22:17 UTC

On 3/31/2022 9:34 AM, Marcus wrote:
> On 2022-03-31, Paul A. Clayton wrote:
>> A couple of weeks ago, someone with 20+ years of software
>> experience emailed me asking how to develop an "intuition"
>> or "way of thinking" in computer architecture to "come up
>> with elegant solutions that are implementable and scalable
>> in hardware", "jumping the chasm from absorbing and
>> understanding to being able to contribute and come up with
>> elegant architecture".
>>
>> My response (which I will append at the end of this post)
>> was very weak; perhaps some more useful advice might be
>> offered here on comp.arch.
>>
>> ============== My email response ===========================
>> [snip - sorry]
>
> One thing that I have noticed is that one major difference in mindset
> between software and hardware development is that of resource reuse.
>
> In software development it is often beneficial to make many different
> specialized solutions. If you identify that you can do less work in
> certain cases by skipping operations or replacing them with cheaper
> operations, it's almost always a win to do so in software (mainly
> because software size is rarely an issue, and the L1I cache quickly
> absorbs new/different variants).
>
> In hardware development, on the other hand, it's almost always
> beneficial to make heavy use of a few different resources. For instance
> it is usually more beneficial to use a single full-width multiplier for
> many different purposes than to have many different width multipliers
> for different purposes (unless they can all be used concurrently, that
> is). This is because die area (or LUTs or whatever your measure is) is
> a limited resource.
>

Pros/Cons:
FPGA has far more DSP48's than one actually needs, whereas doing a
single big multiplier is slow/expensive.

So, instead of say, a big 64-bit multiplier, one might end up with
something that does, say:
One 32-bit widening multiply;
Two 32-bit narrow multiplies (SIMD);
Four 16-bit multiplies (SIMD).

Packed integer SIMD might just be normal ALU ops with carry propagation
disabled.

But, yeah, it is still true that one might end up with a few big units
that most everything else ends up being routed through, with different
sub-features being enabled or disabled.

Elsewhere, recently, an argument came up over scaled-index addressing
(as a hypothetical RISC-V extension), where the assumption seemed to be
that if an ISA offered scaled-index at all, then all operations would
necessarily need to have 3 GPR ports at all times.

This isn't really how it works in BJX2 though: with the 6R3W
regfile, the Lane 3 ports usually sit idle, so I can leverage
them for the extra ports needed by Scaled-Index and similar.

So, really, it would likely only be a significant burden for a 1-wide
core, which might need a 3R/1W regfile rather than a 2R1W regfile. But,
then again, as I see it, about as soon as the core is much larger than a
microcontroller, this whole debate becomes kinda moot (and if it were an
extension, a microcontroller could choose to omit it).

The index-scale + add doesn't really offer much cost otherwise over
doing the same thing with an immediate (and the AGU doesn't need to care
whether the displacement came from a register or immediate).

Or at least not enough to where I will be like, "Yeah, it doesn't matter
that these 15% or so of Load/Store ops will now need to be expressed as
multi-instruction sequences". This falls a bit outside my "small enough
that it doesn't matter" heuristic.

This is, admittedly, one of the bigger disagreements I had with RISC-V;
well, along with requiring a full-width multiplier (and a divide
instruction) for 'M' (as opposed to a narrower widening multiplier).

Say, for example, if we could have 'Mlite' instead, which provides a
half-width widening multiply (with no divide instruction). The widening
multiply could be either Signed+Unsigned, or Unsigned-Only (since one
mostly uses unsigned-multiply to compose a larger multiply).
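Composing the larger multiply can be sketched as follows (my illustration, using Python integers as executable pseudocode; the decomposition itself is the standard one, not something specific to BJX2 or RISC-V):

```python
# Composing a 64x64 -> 128-bit unsigned multiply from four
# 32x32 -> 64-bit widening multiplies -- the kind of decomposition that
# lets a narrow widening multiplier stand in for a full-width one.
MASK32 = (1 << 32) - 1

def mul64(a, b):
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    # four partial products, each a 32x32 widening multiply
    p0 = a_lo * b_lo
    p1 = a_lo * b_hi
    p2 = a_hi * b_lo
    p3 = a_hi * b_hi
    return p0 + ((p1 + p2) << 32) + (p3 << 64)
```
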

But, why would I argue for scaled-index but against full-width 64-bit
multiply?:
Scaled index is cheap and has a visible effect on performance;
64-bit multiply is slow, expensive, and rarely used.

A hardware divide instruction could probably help with Dhrystone score,
but is less relevant in general (Dhrystone seems to significantly
over-represent the use of integer divide vs normal code).

One could (more affordably) provide full-width multiply (and divide) as
dedicated shift-check-add / shift-check-subtract logic (stalling the
pipeline until done), but these would be fairly slow.
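The shift-check-add scheme can be sketched as follows (my illustration, Python ints as pseudocode; one loop iteration standing in for one stall cycle of the sequential unit):

```python
# One check/add/shift step per multiplier bit: roughly the 64-cycle
# sequential multiply described above.  Multiplier bits beyond 'bits'
# are ignored, as narrow hardware would ignore them.
def shift_add_mul(a, b, bits=64):
    acc = 0
    for _ in range(bits):
        if b & 1:        # check: low bit decides whether to add
            acc += a
        a <<= 1          # shift the multiplicand up
        b >>= 1          # shift the multiplier down
    return acc
```
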

I am half wondering if this is what the 'M' extension assumed (as
opposed to people being able to fit them into 1..3 pipeline stages).

OTOH: If you have a 3-cycle half-width multiply, or a 64..68-cycle full
width multiply, faking it in software is faster (well, unless you use an
emulation trap ISR, then 68 cycles is "almost free" in comparison).

(This algo could likely implement MUL as a hacked variant of the DIV
sequence, switching the input and output registers, using the "would be"
output of the DIV as the MUL input, with the accumulator initialized to
Zero, checking the MSB of the DIV output for whether or not to ADD
the right-shift register to the accumulator, and then using the accumulator
as the MUL output).

I guess although it would be slow, it could allow for the 'M' extension
within a reasonable cost window.

....

Well, there was also the 'A' extension, which seems to be built under
the assumption that one has some sort of multi-core cache-coherency
protocol (could be faked, but would not behave as expected without this;
but is mostly N/A for single core).

....

> A similar discussion in this forum concerned the differences in hardware
> and software implementations of reciprocals. E.g. in software it can
> be beneficial to start off with a narrow word width and work your way
> up to wider (and more costly) operations as the precision increases
> after a few Newton-Raphson iterations, whereas in hardware it can be
> better to just use full width all the way.
>
> Thus it is important to construct your hardware design around powerful
> primitives. A Fused Multiply-Add unit is a good example of a primitive
> that can be used for many different purposes.
>

Granted.

I had considered switching to an FMAC design (in my BJX2 core), but cost
and timing tradeoffs favored sticking with an FMUL and FADD unit; which
may be internally "glued" to mimic an FMAC operation, albeit with some
numerical restrictions (a true FMAC would implement some calculations
which can't be emulated with split units because of "bits falling off
the bottom").

But, getting these properties with an FMAC unit would require a
mantissa of ~108 bits to be used internally (and/or split units which
internally operate on full-precision Binary128).

But, even S.E15.F80 (Float96, *) was seriously pushing it...

*: This format was (in storage) the same as Binary128, except that the
low-order 32 bits were ignored on inputs (treated as if they only
contained zeroes), and were filled with zeroes on output. This being
because the relative cost difference between an 80-bit mantissa and
112-bit mantissa (for FMUL purposes) is quite significant.

Granted, if one had such an FMAC unit, they could leverage it for a
64-bit integer multiplier or integer MAC (feeding integers as FP inputs
isn't that hard to pull off, and the integer-output logic likely also
exists if the FADD mechanism is being leveraged for FP->Int conversions).

The intermediate option being, say, to multiply two F80 inputs producing
a F144 intermediate output (vs F160), using the F144 to produce a
128-bit integer MUL/MAC output (for integer MAC, it would make sense to
support the 3rd input as either Float96 or Int128).

....

In theory, if FDIV exists, IDIV could be routed through the same logic
in this case (as with the above). Doesn't provide a good way to provide
for an integer Modulo/Remainder instruction though.

....

Elsewhere (I am still debating it), is whether my BJX2 project needs a
"real" OS port, or whether I am better off (for now) still hacking
features onto TestKern and going in an "OS like" direction with it.

I had looked into it, and realized that porting the kernel would
actually be the lesser of the issues, and that the "hard part" would be
trying to port over the userland (whether I went for either the GNU or
BSD userland).

More so, it would likely also require either modifying GCC to be able to
target BJX2, or making BGBCC able to mimic the external interface
expected for cross-compilers, ...

Eg:
'bjx2-testkern-cc' (likewise for as, 'ld', ...);
Producing '.o' files;
Accepting '.o' and '.a' files;
Ability to 'ar' multiple '.o' files into a '.a' file;
...


Re: Developing a hardware intuition

<t27nf8$p7d$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=24557&group=comp.arch#24557

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: Fri, 1 Apr 2022 13:31:34 -0700
Organization: A noiseless patient Spider
Lines: 45
Message-ID: <t27nf8$p7d$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me>
<awE*PpxKy@news.chiark.greenend.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 1 Apr 2022 20:31:36 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="cf5bad4bbed79fbd2faaefc93cc3f02c";
logging-data="25837"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/IJTb9PNiWh/9TLGPQ59HIASfh6MConVo="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.7.0
Cancel-Lock: sha1:Jvn1pRbr+CO58ztQ8rRW2Xi5pWs=
In-Reply-To: <awE*PpxKy@news.chiark.greenend.org.uk>
Content-Language: en-US
 by: Stephen Fuld - Fri, 1 Apr 2022 20:31 UTC

On 3/31/2022 8:58 AM, Theo Markettos wrote:
> Paul A. Clayton <paaronclayton@gmail.com> wrote:
>> A couple of weeks ago, someone with 20+ years of software
>> experience emailed me asking how to develop an "intuition"
>> or "way of thinking" in computer architecture to "come up
>> with elegant solutions that are implementable and scalable
>> in hardware", "jumping the chasm from absorbing and
>> understanding to being able to contribute and come up with
>> elegant architecture".
>>
>> My response (which I will append at the end of this post)
>> was very weak; perhaps some more useful advice might be
>> offered here on comp.arch.
>
> I think a difference is that in software your foundation is a toolbox
> provided by the architecture. You have *these* tools, and you can achieve
> your desired result if you use tool A then tool B then tool C. There is
> then a lot of scheming that takes place to achieve the desired result based
> on the combination of tools that happen to be in the toolbox.
>
> eg there are lots of virtual memory tricks (demand paging, page compression,
> copy on write) that rely on non-obvious combinations of the MMU page table,
> something cunning in the page fault handler and instruction resumption after
> page faults. Lots of people come up with schemes that fit the same basic
> pattern, because the basic pattern is set up by the architecture.
>
> In hardware you can do *almost anything*, but that anything has an
> implementation cost in performance, area, power, complexity and
> compatibility. You don't need to come up with a cunning combination of
> tools from the toolbox, because you can just add a new one. The problem is
> that it's too expensive to do that for every idea unless the thing you're
> offering has a really good justification or relatively minor costs.
> So you end up needing to make a more generic tool to add to the toolbox than
> a special purpose one for a specific problem.

I am not sure about your answer. The hardware guys have a set of
"tools" as well, but theirs are different types of gates, storage cells,
interconnect characteristics, etc. They can be combined to create new
combinations, i.e. instructions, similarly to the way software combines
various instructions to create new "primitives", i.e. functions.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: Developing a hardware intuition

<bwE*m5DKy@news.chiark.greenend.org.uk>


https://www.novabbs.com/devel/article-flat.php?id=24558&group=comp.arch#24558

Path: i2pn2.org!i2pn.org!aioe.org!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!.POSTED!not-for-mail
From: theom+n...@chiark.greenend.org.uk (Theo Markettos)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: 01 Apr 2022 23:14:09 +0100 (BST)
Organization: University of Cambridge, England
Lines: 36
Message-ID: <bwE*m5DKy@news.chiark.greenend.org.uk>
References: <t249ur$gtd$1@dont-email.me> <awE*PpxKy@news.chiark.greenend.org.uk> <t27nf8$p7d$1@dont-email.me>
NNTP-Posting-Host: chiark.greenend.org.uk
X-Trace: chiark.greenend.org.uk 1648851251 6237 212.13.197.229 (1 Apr 2022 22:14:11 GMT)
X-Complaints-To: abuse@chiark.greenend.org.uk
NNTP-Posting-Date: Fri, 1 Apr 2022 22:14:11 +0000 (UTC)
User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (Linux/3.16.0-11-amd64 (x86_64))
Originator: theom@chiark.greenend.org.uk ([212.13.197.229])
 by: Theo Markettos - Fri, 1 Apr 2022 22:14 UTC

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
> I am not sure about your answer. The hardware guys have a set of
> "tools" but they are different types of gates, storage cells,
> interconnect characteristics, etc. They can be combined to create new
> combinations i.e. instructions, similarly to the way software combines
> various instructions to create new "primitives" i.e. functions.

Of course, but you are typically designing hardware at a much higher level
than that. You're writing HDL which gets synthesised down to hardware, and
you typically don't know what kind of hardware they made (unless you take a
deep look).

I suppose the analogy for that would be someone who writes Python:
they're a software developer, but they're nowhere near the compiler so they
don't know what kind of instructions are being generated.

Hardware designers are not choosing what kind of transistor or NOR gate to
use, although they are deciding where to put an SRAM and how wide to make a
bus. If not quite Python, it's more like writing portable C or C++ without
targeting any particular OS or CPU.

The people who worry about transistors are the people who push backend tools
- they take the HDL from the hardware designer and make a chip come
out, worrying about things like timing margin, clock distribution and power
supplies. They are generally not architects; they are more like human
versions of your C compiler.

(There is also a difference that there is a much stronger focus on
verification, because getting something wrong is very expensive.)

If we are talking about architecture, there are many more degrees of freedom
afforded to the hardware designer on, say, an FPGA, than there are to the
programmer on a CPU. So you are not constrained nearly as much as you are
by an ISA or the architectural features afforded by the current CPU.

Theo

Verification (was: Developing a hardware intuition)

<jwv8rsomoud.fsf-monnier+comp.arch@gnu.org>


https://www.novabbs.com/devel/article-flat.php?id=24559&group=comp.arch#24559

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Verification (was: Developing a hardware intuition)
Date: Fri, 01 Apr 2022 18:19:18 -0400
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <jwv8rsomoud.fsf-monnier+comp.arch@gnu.org>
References: <t249ur$gtd$1@dont-email.me>
<awE*PpxKy@news.chiark.greenend.org.uk> <t27nf8$p7d$1@dont-email.me>
<bwE*m5DKy@news.chiark.greenend.org.uk>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="3a5604d31e6d603cd153e068835f67a2";
logging-data="22058"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19StMnbvj9Dr3PcLgbz8Ard"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Cancel-Lock: sha1:uw+R/bL2tu+GjW1BHa+jwWj2d5E=
sha1:MefQlnGQvtmpGsy2E1qJ3Q6wc+c=
 by: Stefan Monnier - Fri, 1 Apr 2022 22:19 UTC

> (There is also a difference that there is a much stronger focus on
> verification, because getting something wrong is very expensive.)

Tho, IIUC "verification" in the hardware world refers usually to what
the software world calls "testing". In the software world,
"verification" means the use of exhaustive tools such as theorem provers
or model checkers.

Stefan

Re: Developing a hardware intuition

Newsgroups: comp.arch
Date: Mon, 4 Apr 2022 15:49:42 -0700 (PDT)
In-Reply-To: <t25990$kr7$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me> <t24e5g$k2c$1@dont-email.me> <t25990$kr7$1@dont-email.me>
Message-ID: <5ca19e7c-8ab8-4a43-bcdb-d78179f7ae1cn@googlegroups.com>
Subject: Re: Developing a hardware intuition
From: MitchAl...@aol.com (MitchAlsup)
Lines: 259
 by: MitchAlsup - Mon, 4 Apr 2022 22:49 UTC

On Thursday, March 31, 2022 at 5:17:08 PM UTC-5, BGB wrote:
> On 3/31/2022 9:34 AM, Marcus wrote:
> > On 2022-03-31, Paul A. Clayton wrote:

> Pros/Cons:
> FPGA has far more DSP48's than one actually needs, whereas doing a
> single big multiplier is slow/expensive.
>
> So, instead of say, a big 64-bit multiplier, one might end up with
> something that does, say:
> One 32-bit widening multiply;
> Two 32-bit narrow multiplies (SIMD);
> Four 16-bit multiplies (SIMD).
<
A computer architect understands that one can segment a multiplier tree
by preventing carries from one sub-tree to another. This architect is smart
enough to understand that clipping the carries does not even add gates of
delay to the multiplier tree's calculations. So all 3 above multipliers can be
built from one incarnation of a SIMD sliding width multiplier.
>
> Packed integer SIMD might just be normal ALU ops with carry propagation
> disabled.
>
You also have to have a means to clip the carries in the adder that inevitably
follows the tree.
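The lane-carry clipping being discussed here can be modeled in plain C with the classic SWAR trick; this is a sketch of the idea, not anyone's actual RTL (the function name and the 16-bit lane width are my choices):

```c
#include <stdint.h>

/* Four 16-bit lane additions in one 64-bit add, with inter-lane
 * carries clipped.  The MSB of each lane is masked off so a carry
 * out of one lane can never ripple into the next; the lane MSBs
 * are then patched back in with an XOR (add modulo 2). */
uint64_t simd_add16x4(uint64_t a, uint64_t b) {
    const uint64_t LOW = 0x7FFF7FFF7FFF7FFFull;  /* low 15 bits of each lane */
    uint64_t sum = (a & LOW) + (b & LOW);        /* carries stay inside lanes */
    return sum ^ ((a ^ b) & ~LOW);               /* restore lane sign bits */
}
```

With a = 0x0000FFFF00010002 and b = 0x00000001000F0003, the second lane wraps to 0 without disturbing its neighbors, giving 0x0000000000100005.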
>
> But, yeah, it is still true that one might end up with a few big units
> that most everything else ends up being routed through, with different
> sub-features being enabled or disabled.
>
>
>
> Elsewhere, recently, an argument came up over scaled-index addressing
> (as a hypothetical RISC-V extension), where the assumption seemed to be
> that for an ISA to offer scaled-index at all, then all operations would
> necessarily need to have 3 GPR ports at all times.
<
That is a poor argument,.......proposed by somebody over there.
>
> This isn't really how it works in BJX2 though, since for the 6R3W
> regfile, I usually have the Lane 3 ports sitting around not doing
> anything else, I can leverage these for the ports needed for
> Scaled-Index and similar.
<
Well, that and the index is almost always available on the forwarding path ....
>
> So, really, it would likely only be a significant burden for a 1-wide
> core, which might need a 3R/1W regfile rather than a 2R1W regfile. But,
> then again, as I see it, about as soon as the core is much larger than a
> microcontroller, this whole debate becomes kinda moot (and if it were an
> extension, a microcontroller could choose to omit it).
>
BTW I am getting 1.3-wide issue from a 3R1W RF in My66130 by taking
fuller than standard advantage of other necessities in My 66000.
>
> The index-scale + add doesn't really offer much cost otherwise over
> doing the same thing with an immediate (and the AGU doesn't need to care
> whether the displacement came from a register or immediate).
<
I would argue it is free to add (or an insignificant burden)
>
> Or at least not enough to where I will be like, "Yeah, it doesn't matter
> that these 15% or so of Load/Store ops will now need to be expressed as
> multi-instruction sequences". This falls a bit outside my "small enough
> that it doesn't matter" heuristic.
>
Even if the gain is only 2% it is still a win, and it is brain dead easy to AGEN
And after AGEN there is no damage to the rest of the whole cache pipeline.
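The AGEN step being discussed is just a small shift folded into the displacement adder; a hedged C sketch (names and signature mine, not any particular ISA's):

```c
#include <stdint.h>

/* Scaled-index address generation: base + (index << scale) + disp.
 * The 0..3-bit scale shift feeds the same adder that already handles
 * a plain displacement, which is why it adds essentially no cost. */
uint64_t agen(uint64_t base, uint64_t index, unsigned scale, int64_t disp) {
    return base + (index << (scale & 3)) + (uint64_t)disp;
}
```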
>
> This is, admittedly, one of the bigger disagreements I had with RISC-V;
> well, along with requiring a full-width multiplier (and a divide
> instruction) for 'M' (as opposed to a narrower widening multiplier).
>
> Say, for example, if we could have 'Mlite' instead, which provides a
> half-width widening multiply (with no divide instruction). The widening
> multiply could be either Signed+Unsigned, or Unsigned-Only (since one
> mostly uses unsigned-multiply to compose a larger multiply).
>
>
> But, why would I argue for scaled-index but against full-width 64-bit
> multiply?:
> Scaled index is cheap and has a visible effect on performance;
> 64-bit multiply is slow, expensive, and rarely used.
<
My 66100 (the smallest possible pipelined implementation) has the
full sized multiplier 64×64 so as to do i× in 4 cycles as well as FMAC
in 4 cycles. The trick, here, is that I run the multiplier pipeline at DIV-2.
When the multiplier is not busy, it can accept an instruction on any
clock, when it is busy multiplying it can accept a multiply every other
clock. This gets rid of either 2048 or 4096 flip-flops in the multiplier
tree (Athlon+Opteron) without losing performance because the
rest of the implementation cannot feed the multiplier continuously.
<
The flip-flops 2 :: 3-input adder cells above the end of the tree
were nearly as large as the multiplier tree itself !!!
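The 'Mlite' idea quoted above--composing a wide multiply from narrow widening multiplies--looks like this in software (a sketch; only the low 64 bits of the product are kept, so the high*high partial product is never needed):

```c
#include <stdint.h>

/* 64x64 -> low-64 multiply built from 32x32 -> 64 widening unsigned
 * multiplies, the way a core without a full-width multiplier would
 * compose it in software. */
uint64_t mul64_via_mul32(uint64_t a, uint64_t b) {
    uint64_t al = (uint32_t)a, ah = a >> 32;
    uint64_t bl = (uint32_t)b, bh = b >> 32;
    uint64_t lo  = al * bl;            /* 32x32 -> 64 widening multiply */
    uint64_t mid = al * bh + ah * bl;  /* cross terms; only low 32 bits land */
    return lo + (mid << 32);           /* ah*bh would only affect bits >= 64 */
}
```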
>
> A hardware divide instruction could probably help with Dhrystone score,
> but is less relevant in general (Dhrystone seems to significantly
> over-represent the use of integer divide vs normal code).
>
>
> One could (more affordably) provide full-width multiply (and divide) as
> dedicated shift-check-add / shift-check-subtract logic (stalling the
> pipeline until done), but these would be fairly slow.
<
This is standard practice today.
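The shift-check-add multiplier mentioned above, as a C model of the sequencer (one bit per "cycle", which is where the ~64-cycle latency comes from):

```c
#include <stdint.h>

/* Shift-and-add multiply: each iteration models one cycle of the
 * shift-check-add sequencer, so a 64-bit multiply takes ~64 steps. */
uint64_t shift_add_mul(uint64_t a, uint64_t b) {
    uint64_t acc = 0;
    for (int i = 0; i < 64; i++) {
        if (b & 1)          /* check: does this bit contribute? */
            acc += a;       /* add the shifted multiplicand */
        a <<= 1;            /* shift for the next bit position */
        b >>= 1;
    }
    return acc;
}
```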
>
> I am half wondering if this is what the 'M' extension assumed (as
> opposed to people being able to fit them into 1..3 pipeline stages).
>
> OTOH: If you have a 3-cycle half-width multiply, or a 64..68-cycle full
> width multiply, faking it in software is faster (well, unless you use an
> emulation trap ISR, then 68 cycles is "almost free" in comparison).
>
> (This algo could likely implement MUL as a hacked variant of the DIV
> sequence, switching the input and output registers, using the "would be"
> output of the DIV as the MUL input, with the accumulator initialized to
> Zero, checking the MSB of the DIV output for whether or not to ADD
> right-shift register to the accumulator, and then using the accumulator
> as the MUL output).
>
> I guess although it would be slow, it could allow for the 'M' extension
> within a reasonable cost window.
>
> ...
>
>
> Well, there was also the 'A' extension, which seems to be built under
> the assumption that one has some sort of multi-core cache-coherency
> protocol (could be faked, but would not behave as expected without this;
> but is mostly N/A for single core).
>
> ...
> > A similar discussion in this forum concerned the differences in hardware
> > and software implementations of reciprocals. E.g. in software it can
> > be beneficial to start off with a narrow word width and work your way
> > up to wider (and more costly) operations as the precision increases
> > after a few Newton-Raphson iterations, whereas in hardware it can be
> > better to just use full width all the way.
> >
> > Thus it is important to construct your hardware design around powerful
> > primitives. A Fused Multiply-Add unit is a good example of a primitive
> > that can be used for many different purposes.
> >
> Granted.
>
>
> I had considered switching to an FMAC design (in my BJX2 core), but cost
> and timing tradeoffs favored sticking with an FMUL and FADD unit; which
> may be internally "glued" to mimic an FMAC operation, albeit with some
> numerical restrictions (a true FMAC would implement some calculations
> which can't be emulated with split units because of "bits falling off
> the bottom").
>
> But, to get these properties with an FMAC unit would require an ~ 108
> bit or so mantissa to be used internally (and/or split units which
> internally operate on full-precision Binary128).
>
> But, even S.E15.F80 (Float96, *) was seriously pushing it...
>
> *: This format was (in storage) the same as Binary128, except that the
> low-order 32 bits were ignored on inputs (treated as if they only
> contained zeroes), and were filled with zeroes on output. This being
> because the relative cost difference between an 80-bit mantissa and
> 112-bit mantissa (for FMUL purposes) is quite significant.
>
>
> Granted, if one had such an FMAC unit, they could leverage it for a
> 64-bit integer multiplier or integer MAC (feeding integers as FP inputs
> isn't that hard to pull off, and the integer-output logic likely also
> exists if the FADD mechanism is being leveraged for FP->Int conversions).
>
> The intermediate option being, say, to multiply two F80 inputs producing
> a F144 intermediate output (vs F160), using the F144 to produce a
> 128-bit integer MUL/MAC output (for integer MAC, it would make sense to
> support the 3rd input as either Float96 or Int128).
>
> ...
>
>
> In theory, if FDIV exists, IDIV could be routed through the same logic
> in this case (as with the above). Doesn't provide a good way to provide
> for an integer Modulo/Remainder instruction though.
>
> ...
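On the modulo point above: even without a remainder instruction, software can recover the remainder from the quotient with one multiply-subtract (a generic sketch, not BJX2 specifics; the plain `/` here stands in for whatever produced the quotient):

```c
#include <stdint.h>

/* Recover a remainder from a quotient: r = a - q*b.
 * The division stands in for a quotient routed through the
 * divide (or FDIV-based) path. */
uint64_t rem_from_quot(uint64_t a, uint64_t b) {
    uint64_t q = a / b;     /* quotient from the divide path */
    return a - q * b;       /* one mul + sub yields the remainder */
}
```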
>
>
>
>
> Elsewhere (I am still debating it), is whether my BJX2 project needs a
> "real" OS port, or whether I am better off (for now) still hacking
> features onto TestKern and going in an "OS like" direction with it.
>
> I had looked into it, and realized that porting the kernel would
> actually be the lesser of the issues, and that the "hard part" would be
> trying to port over the userland (whether I went for either the GNU or
> BSD userland).
>
>
> More so, it would likely also require either modifying GCC to be able to
> target BJX2, or making BGBCC able to mimic the external interface
> expected for cross-compilers, ...
>
> Eg:
> 'bjx2-testkern-cc' (likewise for as, 'ld', ...);
> Producing '.o' files;
> Accepting '.o' and '.a' files;
> Ability to 'ar' multiple '.o' files into a '.a' file;
> ...
>
> Since this is what things like 'autoconf' and similar expect (well, as
> opposed to a tool which uses a CLI more like that used by MSVC...).
>
> Luckily, at least, these tools don't really care that much what the '.o'
> files contain (so, eg, it can be RIL rather than a COFF or ELF object).
>
> ...


Re: Verification (was: Developing a hardware intuition)

Newsgroups: comp.arch
Date: Mon, 4 Apr 2022 15:52:48 -0700 (PDT)
In-Reply-To: <jwv8rsomoud.fsf-monnier+comp.arch@gnu.org>
References: <t249ur$gtd$1@dont-email.me> <awE*PpxKy@news.chiark.greenend.org.uk>
<t27nf8$p7d$1@dont-email.me> <bwE*m5DKy@news.chiark.greenend.org.uk> <jwv8rsomoud.fsf-monnier+comp.arch@gnu.org>
Message-ID: <099b0bd0-62a4-4fdd-9d18-a86126aec6c3n@googlegroups.com>
Subject: Re: Verification (was: Developing a hardware intuition)
From: MitchAl...@aol.com (MitchAlsup)
 by: MitchAlsup - Mon, 4 Apr 2022 22:52 UTC

On Friday, April 1, 2022 at 5:19:21 PM UTC-5, Stefan Monnier wrote:
> > (There is also a difference that there is a much stronger focus on
> > verification, because getting something wrong is very expensive.)
> Tho, IIUC "verification" in the hardware world refers usually to what
> the software world calls "testing". In the software world,
> "verification" means the use of exhaustive tools such as theorem provers
> or model checkers.
<
Err, no. HW verification is a suite of tests upon HW carried out with the
hope that there is never a need for a field update (Pentium FDIV bug)
<
Essentially no SW ever gets tested to this degree of correctness.
>
>
> Stefan

Re: Developing a hardware intuition

Newsgroups: comp.arch
Date: Mon, 4 Apr 2022 16:05:28 -0700 (PDT)
In-Reply-To: <bwE*m5DKy@news.chiark.greenend.org.uk>
References: <t249ur$gtd$1@dont-email.me> <awE*PpxKy@news.chiark.greenend.org.uk>
<t27nf8$p7d$1@dont-email.me> <bwE*m5DKy@news.chiark.greenend.org.uk>
Message-ID: <48bc5b42-917b-471e-9406-3b136b20c8c6n@googlegroups.com>
Subject: Re: Developing a hardware intuition
From: MitchAl...@aol.com (MitchAlsup)
Lines: 78
 by: MitchAlsup - Mon, 4 Apr 2022 23:05 UTC

On Friday, April 1, 2022 at 5:14:14 PM UTC-5, Theo Markettos wrote:
> Stephen Fuld <sf...@alumni.cmu.edu.invalid> wrote:
> > I am not sure about your answer. The hardware guys have a set of
> > "tools" but they are different types of gates, storage cells,
> > interconnect characteristics, etc. They can be combined to create new
> > combinations i.e. instructions, similarly to the way software combines
> > various instructions to create new "primitives" i.e. functions.
> Of course, but you are typically designing hardware at a much higher level
> than that. You're writing HDL which gets synthesised down to hardware, and
> you typically don't know what kind of hardware they made (unless you take a
> deep look).
>
> I suppose the analogy for that would be someone who writes Python:
> they're a software developer, but they're nowhere near the compiler so they
> don't know what kind of instructions are being generated.
>
> Hardware designers are not choosing what kind of transistor or NOR gate to
> use, although they are deciding where to put an SRAM and how wide to make a
> bus. If not quite Python, it's more like writing portable C or C++ without
> targeting any particular OS or CPU.
>
> The people who worry about transistors are the people who push backend tools
> - they're taking the HDL from the hardware designer and making a chip come
> out, worry about things like timing margin, clock distribution and power
> supplies. They are generally not architects, they are more like human
> versions of your C compiler.
<
I submit that a "Real Computer Architect" must have an understanding that
traverses the field from::
Spice level modeling
Layout,
Gate design
Bus design
Select line design
State machine design
Calculation unit design,
Block design
Integration
Verification
Block management
Group management
Project management
Documentation design
Documentation update
HDL design
HDL coding
HDL integration
Software modeling
Compiler design
Reading of assembler code from a variety of compilers
<
And once they have spent 1 year writing code generation inside a compiler
and have been exposed to all of the list above--then they are in a position
to start being an apprentice to a computer architect.
<
The computer architect must be able to understand his design well enough
such that when [s]he enters a meeting and there is a problem between the gate
designer and a std-cell designer, he is in a position to understand the arguments,
and make suggestive comments to drive the design forward.
<
The computer architect does NOT throw a design over the cubicle wall
and expect good things to happen. He has to own every nuance and aspect
of the design, he may find himself in a position where some aspect of what
was proposed simply cannot work, and he has to fix the bad concept OVERNIGHT
and get the team back working the next day--even if he has to stay up all night
getting it fixed, documented, complete with drawings of what needs to change.
>
> (There is also a difference that there is a much stronger focus on
> verification, because getting something wrong is very expensive.)
>
> If we are talking about architecture, there are many more degrees of freedom
> afforded to the hardware designer on, say, an FPGA, than there are to the
> programmer on a CPU. So you are not constrained nearly as much as you are
> by an ISA or the architectural features afforded by the current CPU.
>
> Theo
<
But I also admit that much of what Theo wrote is dead accurate--much to the
detriment of the state of computers today.

Re: Verification

From: monn...@iro.umontreal.ca (Stefan Monnier)
Newsgroups: comp.arch
Subject: Re: Verification
Date: Mon, 04 Apr 2022 19:15:41 -0400
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <jwvtub88n1w.fsf-monnier+comp.arch@gnu.org>
References: <t249ur$gtd$1@dont-email.me>
<awE*PpxKy@news.chiark.greenend.org.uk> <t27nf8$p7d$1@dont-email.me>
<bwE*m5DKy@news.chiark.greenend.org.uk>
<jwv8rsomoud.fsf-monnier+comp.arch@gnu.org>
<099b0bd0-62a4-4fdd-9d18-a86126aec6c3n@googlegroups.com>
 by: Stefan Monnier - Mon, 4 Apr 2022 23:15 UTC

>> > (There is also a difference that there is a much stronger focus on
>> > verification, because getting something wrong is very expensive.)
>> Tho, IIUC "verification" in the hardware world refers usually to what
>> the software world calls "testing". In the software world,
>> "verification" means the use of exhaustive tools such as theorem provers
>> or model checkers.
> Err, no. HW verification is a suite of tests upon HW carried out with the
> hope that there is never a need for a field update (Pentium FDIV bug)
> Essentially no SW ever gets tested to this degree of correctness.

AFAIK it's not exhaustive, so in software terms it falls in the camp of
"testing" rather than "verification".

I know it's typically a much more extensive test suite than what is used
for software, and I also understand that "exhaustive" can make sense
only if you limit it by making assumptions about your hardware
(typically that starts by pretending the world is not analog).

It's not a criticism, rather a clarification about the kind of
tools used since verification (in the software-sense of the word) is
done using fundamentally very different tools than testing.

Stefan

Re: Developing a hardware intuition

From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: Mon, 4 Apr 2022 18:15:48 -0700
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <t2g584$vil$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me>
<awE*PpxKy@news.chiark.greenend.org.uk> <t27nf8$p7d$1@dont-email.me>
<bwE*m5DKy@news.chiark.greenend.org.uk>
<48bc5b42-917b-471e-9406-3b136b20c8c6n@googlegroups.com>
 by: Ivan Godard - Tue, 5 Apr 2022 01:15 UTC

On 4/4/2022 4:05 PM, MitchAlsup wrote:
> On Friday, April 1, 2022 at 5:14:14 PM UTC-5, Theo Markettos wrote:
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> wrote:
>>> I am not sure about your answer. The hardware guys have a set of
>>> "tools" but they are different types of gates, storage cells,
>>> interconnect characteristics, etc. They can be combined to create new
>>> combinations i.e. instructions, similarly to the way software combines
>>> various instructions to create new "primitives" i.e. functions.
>> Of course, but you are typically designing hardware at a much higher level
>> than that. You're writing HDL which gets synthesised down to hardware, and
>> you typically don't know what kind of hardware they made (unless you take a
>> deep look).
>>
>> I suppose the analogy for that would be someone who writes Python:
>> they're a software developer, but they're nowhere near the compiler so they
>> don't know what kind of instructions are being generated.
>>
>> Hardware designers are not choosing what kind of transistor or NOR gate to
>> use, although they are deciding where to put an SRAM and how wide to make a
>> bus. If not quite Python, it's more like writing portable C or C++ without
>> targeting any particular OS or CPU.
>>
>> The people who worry about transistors are the people who push backend tools
>> - they're taking the HDL from the hardware designer and making a chip come
>> out, worry about things like timing margin, clock distribution and power
>> supplies. They are generally not architects, they are more like human
>> versions of your C compiler.
> <
> I submit that a "Real Computer Architect" must have an understanding that
> traverses the field from::
> Spice level modeling
> Layout,
> Gate design
> Bus design
> Select line design
> State machine design
> Calculation unit design,
> Block design
> Integration
> Verification
> Block management
> Group management
> Project management
> Documentation design
> Documentation update
> HDL design
> HDL coding
> HDL integration
> Software modeling
> Compiler design
> Reading of assembler code from a variety of compilers

By this list I am not a Real Computer Architect (IANACA). Which I am
happy to concede.

However, the definition uses the singular article "a", implying that
the quality of Computer Architect is necessarily embodied in a single
person. That is false: Computer Architect may be a composite
formed of two or more persons, joined at the hip, who together present
your list - *and can interoperate well enough to fill each other's lacunae*.

> And once they have spent 1 year writing code generation inside a compiler
> and have been exposed to all of the list above--then they are in a position
> to start being an apprentice to a computer architect.
> <
> The computer architect must be able to understand his design well enough
> such that when [s]he enters a meeting and there is a problem between the gate
> designer and a std-cell designer, he is in a position to understand the arguments,
> and make suggestive comments to drive the design forward.
> <
> The computer architect does NOT throw a design over the cubical wall
> and expect good things to happen. He has to own every nuance and aspect
> of the design, he may find himself in a position where some aspect of what
> was proposed simply cannot work, and he has to fix the bad concept OVERNIGHT
> and get the team back working the next day--even if he has to stay up all night
> getting it fixed, documented, complete with drawings of what needs to change.
>>
>> (There is also a difference that there is a much stronger focus on
>> verification, because getting something wrong is very expensive.)
>>
>> If we are talking about architecture, there are many more degrees of freedom
>> afforded to the hardware designer on, say, an FPGA, than there are to the
>> programmer on a CPU. So you are not constrained nearly as much as you are
>> by an ISA or the architectural features afforded by the current CPU.
>>
>> Theo
> <
> But I also admit that much of what Theo wrote is dead accurate--much to the
> detriment of the state of computers today.

Re: Developing a hardware intuition

From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Developing a hardware intuition
Date: Mon, 4 Apr 2022 23:24:20 -0500
Organization: A noiseless patient Spider
Lines: 241
Message-ID: <t2gg9p$6fe$1@dont-email.me>
References: <t249ur$gtd$1@dont-email.me> <t24e5g$k2c$1@dont-email.me>
<t25990$kr7$1@dont-email.me>
<5ca19e7c-8ab8-4a43-bcdb-d78179f7ae1cn@googlegroups.com>
 by: BGB - Tue, 5 Apr 2022 04:24 UTC

On 4/4/2022 5:49 PM, MitchAlsup wrote:
> On Thursday, March 31, 2022 at 5:17:08 PM UTC-5, BGB wrote:
>> On 3/31/2022 9:34 AM, Marcus wrote:
>>> On 2022-03-31, Paul A. Clayton wrote:
>
>> Pros/Cons:
>> FPGA has far more DSP48's than one actually needs, whereas doing a
>> single big multiplier is slow/expensive.
>>
>> So, instead of say, a big 64-bit multiplier, one might end up with
>> something that does, say:
>> One 32-bit widening multiply;
>> Two 32-bit narrow multiplies (SIMD);
>> Four 16-bit multiplies (SIMD).
> <
> A computer architect understands that one can segment a multiplier tree
> by preventing carries from one sub-tree to another. This architect is smart
> enough to understand that clipping the carries does not even add gates of
> delay to the multiplier tree's calculations. So all 3 above multipliers can be
> built from one incarnation of a SIMD sliding width multiplier.

Yeah. These can be done on an FPGA mostly by reorganizing inputs/outputs
to the DSP's and skipping adders.

Never mind that the FPGA provides way more DSPs than I actually need, so
there isn't a huge incentive to conserve them. The big limiting factor is
more the timing and the latency of the adder trees.

>>
>> Packed integer SIMD might just be normal ALU ops with carry propagation
>> disabled.
>>
> You also have to have a means to clip the carries in the adder that inevitably
> follows the tree.

For ALU, I did carry select, with SIMD following behaving as if the
carries were all 0 (or 1 for subtract).
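The subtract case of that carry-select behaviour can also be modeled in C (again a SWAR sketch with names of my choosing, not BJX2's actual logic):

```c
#include <stdint.h>

/* Four independent 16-bit lane subtractions in one 64-bit subtract.
 * Setting each lane's MSB on the minuend guarantees no borrow ever
 * crosses a lane boundary; the lane MSBs are then fixed up with XOR. */
uint64_t simd_sub16x4(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8000800080008000ull;   /* lane sign bits */
    uint64_t d = (a | H) - (b & ~H);            /* borrows stay in-lane */
    return d ^ ((a ^ ~b) & H);                  /* patch lane sign bits */
}
```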

>>
>> But, yeah, it is still true that one might end up with a few big units
>> that most everything else ends up being routed through, with different
>> sub-features being enabled or disabled.
>>
>>
>>
>> Elsewhere, recently, an argument came up over scaled-index addressing
>> (as a hypothetical RISC-V extension), where the assumption seemed to be
>> that for an ISA to offer scaled-index at all, then all operations would
>> necessarily need to have 3 GPR ports at all times.
> <
> That is a poor argument,.......proposed by somebody over there.

Pretty much.

One drawback of RISC-V is that many of its supporters seemingly get a
bit cultish, and part of their religion is apparently believing that:
  Scaled index is expensive
    (it is not, more so if compared with other stuff in the ISA);
  It offers no advantage over faking it using ALU ops
    (I also disagree, from my own experience).

>>
>> This isn't really how it works in BJX2 though, since for the 6R3W
>> regfile, I usually have the Lane 3 ports sitting around not doing
>> anything else, I can leverage these for the ports needed for
>> Scaled-Index and similar.
> <
> Well, that and the index is almost always available on the forwarding path ...

Possibly.

>>
>> So, really, it would likely only be a significant burden for a 1-wide
>> core, which might need a 3R/1W regfile rather than a 2R1W regfile. But,
>> then again, as I see it, about as soon as the core is much larger than a
>> microcontroller, this whole debate becomes kinda moot (and if it were an
>> extension, a microcontroller could choose to omit it).
>>
> BTW I am getting 1.3-wide issue from a 3R1W RF in My66130 by taking
> fuller than standard advantage of other necessities in My 66000.

This is harder if using hard-wired port assignments.

In my case, the shared ports are effectively duplicated across the
relevant lanes, and where the register number/... comes from is left up
to the instruction decoder.

So, say:
1-wide: 3R1W
2-wide: 4R2W | 5R2W
3-wide: 6R3W

1-wide is cheap-ish, 2-wide a bit more, and 3-wide slightly more than
2-wide.

However, trying to go wider than 3, the situation seems to become "no
longer favorable".

Ironically, the Control Registers are a lot simpler (~ 1R1W), but due to
being mostly implemented using flip-flops and routed all over the place,
they seem to be a fair bit more expensive.

If I were doing it again, I might try to come up with a way to reduce
the number of CRs and SPRs. But, the limitation is mostly how best to
deal with registers which don't really behave like normal GPRs (such as
those which are directly affected by, or used to control, architectural
state).

Though, the main alternative, used by SH and MSP430, namely putting a
lot of this stuff into MMIO, is not all that likely to be much better on
this front (I never really felt that CPU-internal state belongs in
MMIO).

>>
>> The index-scale + add doesn't really offer much cost otherwise over
>> doing the same thing with an immediate (and the AGU doesn't need to care
>> whether the displacement came from a register or immediate).
> <
> I would argue it is free to add (or insignificant burden)

Yeah.

The main relative difference here is a "simple adder" vs. an "adder with
the ability to shift the index by 0..3 bits".
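
To make the comparison concrete, here is a sketch in C of what each AGU
variant computes (illustrative only, not taken from any actual RTL):

```c
#include <stdint.h>

/* Plain base+displacement AGU: a single adder. */
uint64_t agu_disp(uint64_t base, int64_t disp) {
    return base + (uint64_t)disp;
}

/* Scaled-index AGU: the same adder, fed by a 0..3-bit left
   shift of the index (for 1/2/4/8-byte elements). */
uint64_t agu_scaled(uint64_t base, uint64_t index, unsigned scale_log2) {
    return base + (index << (scale_log2 & 3));
}
```

Without the addressing mode, that shift and add have to be issued as
separate ALU instructions ahead of each such load or store.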

>>
>> Or at least not enough to where I will be like, "Yeah, it doesn't matter
>> that these 15% or so of Load/Store ops will now need to be expressed as
>> multi-instruction sequences". This falls a bit outside my "small enough
>> that it doesn't matter" heuristic.
>>
> Even if the gain is only 2% it is still a win, and it is brain dead easy to AGEN
> And after AGEN there is no damage to the rest of the whole cache pipeline.

Pretty much.

This seems like too big a hit to take for the sake of philosophy or
aesthetics (or for keeping the instruction listing "as small as
possible").

It maybe makes sense for a microcontroller, but by the time one has an
FPU and similar, omitting scaled-index no longer really makes much sense
IMO.

>>
>> This is, admittedly, one of the bigger disagreements I had with RISC-V;
>> well, along with requiring a full-width multiplier (and a divide
>> instruction) for 'M' (as opposed to a narrower widening multiplier).
>>
>> Say, for example, if we could have 'Mlite' instead, which provides a
>> half-width widening multiply (with no divide instruction). The widening
>> multiply could be either Signed+Unsigned, or Unsigned-Only (since one
>> mostly uses unsigned-multiply to compose a larger multiply).
>>
>>
>> But, why would I argue for scaled-index but against full-width 64-bit
>> multiply?:
>> Scaled index is cheap and has a visible effect on performance;
>> 64-bit multiply is slow, expensive, and rarely used.
> <
> My 66100 (the smallest possible pipelined implementation) has the
> full-sized 64×64 multiplier so as to do integer multiply in 4 cycles,
> as well as FMAC in 4 cycles. The trick, here, is that I run the
> multiplier pipeline at DIV-2. When the multiplier is not busy, it can
> accept an instruction on any clock; when it is busy multiplying, it can
> accept a multiply every other clock. This gets rid of either 2048 or
> 4096 flip-flops in the multiplier tree (Athlon+Opteron) without losing
> performance, because the rest of the implementation cannot feed the
> multiplier continuously.
> <
> The flip-flops and 3-input adder cells above the end of the tree
> were nearly as large as the multiplier tree itself !!!

OK.
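
As I read the DIV-2 scheme: when idle, the multiplier can start an op on
any clock; once started, it samples a new one only every other clock. A
toy C model of just that acceptance pattern (names invented for
illustration):

```c
#include <stdbool.h>

/* Toy model of a half-rate multiplier front-end:
   idle -> accepts on any clock; busy -> every other clock. */
typedef struct { int hold; } HalfRateMul;

/* Advance one clock; returns true if a new multiply is accepted. */
bool mul_accept(HalfRateMul *m, bool issue) {
    bool accept = issue && (m->hold == 0);
    if (accept)
        m->hold = 1;        /* blocked for the following clock */
    else if (m->hold > 0)
        m->hold--;
    return accept;
}
```

Fed continuously, this sustains one multiply per two clocks, matching
the claim that the rest of a small implementation cannot feed it any
faster anyway.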

>>
>> A hardware divide instruction could probably help with Dhrystone score,
>> but is less relevant in general (Dhrystone seems to significantly
>> over-represent the use of integer divide vs normal code).
>>
>>
>> One could (more affordably) provide full-width multiply (and divide) as
>> dedicated shift-check-add / shift-check-subtract logic (stalling the
>> pipeline until done), but these would be fairly slow.
> <
> This is standard practice today.

Apparently, some people are providing a pipelined small multiplier
(MULW), while using a Shift-Add/Shift-Sub unit for 64-bit MUL/DIV.
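
For reference, the shift-check-add / shift-check-subtract scheme is just
one adder plus a conditional step per bit, which is why such a unit
stalls for up to 64 cycles (a generic sketch, not any specific core's
logic):

```c
#include <stdint.h>

/* Shift-and-add multiply: one conditional add per multiplier bit. */
uint64_t shift_add_mul(uint64_t a, uint64_t b) {
    uint64_t acc = 0;
    while (b != 0) {
        if (b & 1)          /* check: add the shifted multiplicand */
            acc += a;
        a <<= 1;            /* shift */
        b >>= 1;
    }
    return acc;
}

/* Restoring shift-and-subtract divide (d must be nonzero). */
uint64_t shift_sub_div(uint64_t n, uint64_t d, uint64_t *rem) {
    uint64_t q = 0, r = 0;
    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* shift in next dividend bit */
        q <<= 1;
        if (r >= d) {       /* check: subtract when the divisor fits */
            r -= d;
            q |= 1;
        }
    }
    *rem = r;
    return q;
}
```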

I guess, comparing ISAs here (RISC-V vs BJX2):
  RISC-V | BJX2    | Notes
  MUL    | MULS.Q  | (UI,1)
  NE     | MULU.Q  | (UI,1)
  MULH   | NE,1    | (64x64->64 High-Bits, Signed)
  MULHU  | NE,1    | (64x64->64 High-Bits, Unsigned)
  MULW   | MULS.L  | (32*32->32, Sign-Extend)
  NE     | MULU.L  | (32*32->32, Zero-Extend)
  NE     | DMULS.L | (32*32->64, Signed)
  NE     | DMULU.L | (32*32->64, Unsigned)

  DIV    | NE,1    |
  DIVU   | NE,1    |
  DIVW   | NE,1    |
  DIVUW  | NE,1    |

NE: Non-Encodable / Not Defined.
UI: Encodings defined, but unimplemented.

1: These cases could be mapped to a slow Shift-Add/Sub unit.
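
As context for why a 32*32->64 widening multiply (DMULU.L above) can
stand in for the full-width case: a 64x64 product composes from four of
them plus some adds. A generic C sketch (not any particular ISA's
instruction sequence):

```c
#include <stdint.h>

/* 64x64->128 unsigned multiply built from 32x32->64
   widening multiplies, the only primitive assumed here. */
void umul64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* bits  0..63  */
    uint64_t p1 = a_lo * b_hi;   /* bits 32..95  */
    uint64_t p2 = a_hi * b_lo;   /* bits 32..95  */
    uint64_t p3 = a_hi * b_hi;   /* bits 64..127 */

    /* Fold the middle partial products, keeping the carries. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```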

But, I guess maybe it could make sense to add a slow MUL/DIV unit (could
also allow the dual-ISA mode to support the 'M' extension).

The 'F' and 'D' extensions would still be a problem given the BJX2 FPU
is significantly different from the RISC-V FPU.


Re: Verification

<t2gpug$1snm$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=24592&group=comp.arch#24592

Newsgroups: comp.arch
From: terje.ma...@tmsw.no (Terje Mathisen)
Subject: Re: Verification
Date: Tue, 5 Apr 2022 09:09:07 +0200
Organization: Aioe.org NNTP Server
 by: Terje Mathisen - Tue, 5 Apr 2022 07:09 UTC

MitchAlsup wrote:
> On Friday, April 1, 2022 at 5:19:21 PM UTC-5, Stefan Monnier wrote:
>>> (There is also a difference that there is a much stronger focus on
>>> verification, because getting something wrong is very expensive.)
>> Tho, IIUC "verification" in the hardware world refers usually to what
>> the software world calls "testing". In the software world,
>> "verification" means the use of exhaustive tools such as theorem provers
>> or model checkers.
> <
> Err, no. HW verification is a suite of tests upon HW carried out with the
> hope that there is never a need for a field update (Pentium FDIV bug)
> <
> Essentially no SW ever gets tested to this degree of correctness.

And we can still get effects like the RowHammer attack where a
sufficiently non-random pattern of writes to a DRAM array can spill over
into neighboring cells!

I.e., it is probably impossible to detect all possible analog
interactions of this type (such as signal-propagation patterns acting as
a GHz-scale radio transmitter), even without using DRAM for a last-level
cache.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Verification (was: Developing a hardware intuition)

<jb32ciF5dc2U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=24597&group=comp.arch#24597

Newsgroups: comp.arch
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Subject: Re: Verification (was: Developing a hardware intuition)
Date: Tue, 5 Apr 2022 17:34:58 +0300
Organization: Tidorum Ltd
 by: Niklas Holsti - Tue, 5 Apr 2022 14:34 UTC

On 2022-04-05 1:52, MitchAlsup wrote:
> On Friday, April 1, 2022 at 5:19:21 PM UTC-5, Stefan Monnier wrote:
>>> (There is also a difference that there is a much stronger focus on
>>> verification, because getting something wrong is very expensive.)
>> Tho, IIUC "verification" in the hardware world refers usually to what
>> the software world calls "testing". In the software world,
>> "verification" means the use of exhaustive tools such as theorem provers
>> or model checkers.
> <
> Err, no. HW verification is a suite of tests upon HW carried out with the
> hope that there is never a need for a field update (Pentium FDIV bug)
> <

Perhaps you see "verification" as different from "formal verification",
which is what model-checkers and theorem-provers do?

Intel and others have applied formal verification to HW since the FDIV
bug, according to Wikipedia:

"In the aftermath of the bug and subsequent recall, there was a marked
increase in the use of formal verification of hardware floating point
operations across the semiconductor industry. Prompted by the discovery
of the bug, a technique applicable to the SRT algorithm called
"word-level model checking" was developed in 1996. Intel went on to use
formal verification extensively in the development of later CPU
architectures. In the development of the Pentium 4, symbolic trajectory
evaluation and theorem proving were used to find a number of bugs that
could have led to a similar recall incident had they gone undetected.
The first Intel microarchitecture to use formal verification as the
primary method of validation was Nehalem, developed in 2008."

(https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

> Essentially no SW ever gets tested to this degree of correctness.

Certainly some SW is so tested and verified -- say, SW for nuclear
reactor control. And AIUI, formal verification of SW is increasing in
all high-reliability domains, thanks in large part to the slow but
continuing improvement of SW tools and of the computing power available
for, e.g., exhaustive model checking.

Re: Verification

<808ad3b7-f8cc-4f61-be7e-332aed41a5a8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24599&group=comp.arch#24599

Newsgroups: comp.arch
From: MitchAl...@aol.com (MitchAlsup)
Subject: Re: Verification
Date: Tue, 5 Apr 2022 08:56:56 -0700 (PDT)
 by: MitchAlsup - Tue, 5 Apr 2022 15:56 UTC

On Tuesday, April 5, 2022 at 2:09:07 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Friday, April 1, 2022 at 5:19:21 PM UTC-5, Stefan Monnier wrote:
> >>> (There is also a difference that there is a much stronger focus on
> >>> verification, because getting something wrong is very expensive.)
> >> Tho, IIUC "verification" in the hardware world refers usually to what
> >> the software world calls "testing". In the software world,
> >> "verification" means the use of exhaustive tools such as theorem provers
> >> or model checkers.
> > <
> > Err, no. HW verification is a suite of tests upon HW carried out with the
> > hope that there is never a need for a field update (Pentium FDIV bug)
> > <
> > Essentially no SW ever gets tested to this degree of correctness.
> And we can still get effects like the RowHammer attack where a
> sufficiently non-random pattern of writes to a DRAM array can spill over
> into neighboring cells!
<
Yes, but I have a solution to this that I can't talk about just yet.
>
> I.e. it is probably impossible to detect all possible analog
> interactions of this type, using signal propagation patterns as a GHz
> scale radio transmitter, even without using DRAM for last level cache.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"
