novaBBS - comp.arch - Re: The Tera MTA

Re: The Tera MTA

<ee9dfec5-ca8c-439f-80e3-e922f35d20aen@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18898&group=comp.arch#18898

X-Received: by 2002:ad4:5d62:: with SMTP id fn2mr21695513qvb.61.1626642031509;
Sun, 18 Jul 2021 14:00:31 -0700 (PDT)
X-Received: by 2002:aca:4946:: with SMTP id w67mr13251183oia.155.1626642031327;
Sun, 18 Jul 2021 14:00:31 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!4.us.feeder.erje.net!2.eu.feeder.erje.net!feeder.erje.net!fdn.fr!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 18 Jul 2021 14:00:31 -0700 (PDT)
In-Reply-To: <2021Jul18.184756@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:6437:2f3c:fd50:a217;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:6437:2f3c:fd50:a217
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me> <2021Jul18.184756@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ee9dfec5-ca8c-439f-80e3-e922f35d20aen@googlegroups.com>
Subject: Re: The Tera MTA
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 18 Jul 2021 21:00:31 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Sun, 18 Jul 2021 21:00 UTC

On Sunday, July 18, 2021 at 11:44:39 AM UTC-6, Anton Ertl wrote:

> It seems to me that Oracle was quite patient with SPARC, which had
> been losing ground to Intel and AMD for a long time.

And, from the article cited, while it has reassured its customers that
they won't have to switch from SPARC until at least 2034, it looks as
if Oracle is going to do a Unisys - future SPARC mainframes of theirs
will just use x86 processors using advanced JIT emulation to allow
legacy workloads to persist.

John Savard

Re: The Tera MTA

<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18899&group=comp.arch#18899

X-Received: by 2002:a05:620a:2f5:: with SMTP id a21mr12106118qko.36.1626642349346; Sun, 18 Jul 2021 14:05:49 -0700 (PDT)
X-Received: by 2002:a9d:7353:: with SMTP id l19mr13344421otk.76.1626642349112; Sun, 18 Jul 2021 14:05:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 18 Jul 2021 14:05:48 -0700 (PDT)
In-Reply-To: <sd1hps$2es$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:6437:2f3c:fd50:a217; posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:6437:2f3c:fd50:a217
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com> <127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me> <1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me> <05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com> <sd1hps$2es$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
Subject: Re: The Tera MTA
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Sun, 18 Jul 2021 21:05:49 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 28

by: Quadibloc - Sun, 18 Jul 2021 21:05 UTC

On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
> On 7/18/2021 7:53 AM, Quadibloc wrote:

> > Wouldn't saving all the registers to memory on each clock cycle
> > defeat the purpose of hiding memory latency?

> Perhaps you are misunderstanding how the Tera MTA worked.

That wasn't the problem.

The point I was making was that the use of the term "context switch"
was extremely confusing. A context switch is what happens on a
computer when there's an interrupt or a subroutine is called: the
registers all get saved to memory.

What happens not only on the Tera MTA, but on any Intel processor
with Hyper-Threading, is quite different; the processor moves from
one thread to another thread, switching to the alternate set of registers
allocated to that thread.

The old thread retains its context for the next time the processor
goes back to it.

So even if, technically, the switching between register sets in a
multithreaded processor could be considered a _form_ of
context switch, using that phrase to describe it will create an
erroneous picture in people's minds, causing confusion.

John Savard

Re: The Tera MTA

<5bf369a7-0fa1-4e13-8d80-ba07f05adbd6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18901&group=comp.arch#18901

X-Received: by 2002:ac8:5991:: with SMTP id e17mr28849qte.265.1626642526173; Sun, 18 Jul 2021 14:08:46 -0700 (PDT)
X-Received: by 2002:a05:6808:1455:: with SMTP id x21mr20191154oiv.51.1626642526007; Sun, 18 Jul 2021 14:08:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!tr3.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 18 Jul 2021 14:08:45 -0700 (PDT)
In-Reply-To: <54348333-d4b7-482e-9804-fda9777808a7n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c414:6046:92b1:a701; posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c414:6046:92b1:a701
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com> <127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me> <1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me> <05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com> <06a1b3d0-4830-46c0-b555-ae0c41ee55f5n@googlegroups.com> <54348333-d4b7-482e-9804-fda9777808a7n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5bf369a7-0fa1-4e13-8d80-ba07f05adbd6n@googlegroups.com>
Subject: Re: The Tera MTA
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 18 Jul 2021 21:08:46 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 15

by: MitchAlsup - Sun, 18 Jul 2021 21:08 UTC

On Sunday, July 18, 2021 at 3:58:26 PM UTC-5, Quadibloc wrote:
> On Sunday, July 18, 2021 at 10:49:32 AM UTC-6, MitchAlsup wrote:
> > On Sunday, July 18, 2021 at 9:53:37 AM UTC-5, Quadibloc wrote:
>
> > > Wouldn't saving all the registers to memory on each clock cycle
> > > defeat the purpose of hiding memory latency?
>
> > It has room for all threads in the register file, so no saving is necessary.
> Yes, which means that whatever it's doing every cycle isn't a
> "context switch" in the sense with which I am familiar.
<
It is an entirely different thread, with different privilege, using different
MMU pointers and tables, and running in its own isolated environment.
What more does a context switch need to fit your model?
>
> John Savard

Re: The Tera MTA

<5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18903&group=comp.arch#18903

X-Received: by 2002:a37:2d04:: with SMTP id t4mr21814882qkh.160.1626642820936;
Sun, 18 Jul 2021 14:13:40 -0700 (PDT)
X-Received: by 2002:a9d:3b0:: with SMTP id f45mr17281243otf.5.1626642820756;
Sun, 18 Jul 2021 14:13:40 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 18 Jul 2021 14:13:40 -0700 (PDT)
In-Reply-To: <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:c414:6046:92b1:a701;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:c414:6046:92b1:a701
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <scvp1v$dgh$1@dont-email.me>
<sd0kkg$jcf$1@dont-email.me> <05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me> <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
Subject: Re: The Tera MTA
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sun, 18 Jul 2021 21:13:40 +0000
Content-Type: text/plain; charset="UTF-8"

by: MitchAlsup - Sun, 18 Jul 2021 21:13 UTC

On Sunday, July 18, 2021 at 4:05:50 PM UTC-5, Quadibloc wrote:
> On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
> > On 7/18/2021 7:53 AM, Quadibloc wrote:
>
> > > Wouldn't saving all the registers to memory on each clock cycle
> > > defeat the purpose of hiding memory latency?
>
> > Perhaps you are misunderstanding how the Tera MTA worked.
> That wasn't the problem.
>
> The point I was making was that the use of the term "context switch"
> was extremely confusing. A context switch is what happens on a
> computer when there's an interrupt or a subroutine is called: the
> registers all get saved to memory.
<
I am going to go with "Errrrrrr nooooo"
<
A context is an application program, a memory area, protected from
itself (privilege) and others (memory isolation and translation) that
has open files, sockets, .... and there are multiple unique and independent
ones.
<
A context switch is a change of control from within one context to
within another context. This DOES NOT NECESSARILY require
{Interrupts, context save, scheduling, context restore}. Those are
vestiges of single-threaded mono-processors of the past and
wholly unnecessary going forward.
>

>
> John Savard

Re: The Tera MTA

<sd32ui$qss$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18909&group=comp.arch#18909

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 07:35:13 +0200
Organization: A noiseless patient Spider
Lines: 51
Message-ID: <sd32ui$qss$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jul 2021 05:35:14 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7adfa040958caed31940ea1d97f73a2a";
logging-data="27548"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Lin77qugnVWdm3A+SKh/YJ9YrB+TFlSk="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:WTd4SZMvssLEM/aAUY092YxeuhY=
In-Reply-To: <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
Content-Language: en-US

by: Marcus - Mon, 19 Jul 2021 05:35 UTC

On 2021-07-18, Quadibloc wrote:
> On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
>> On 7/18/2021 7:53 AM, Quadibloc wrote:
>
>>> Wouldn't saving all the registers to memory on each clock cycle
>>> defeat the purpose of hiding memory latency?
>
>> Perhaps you are misunderstanding how the Tera MTA worked.
>
> That wasn't the problem.
>
> The point I was making was that the use of the term "context switch"
> was extremely confusing. A context switch is what happens on a
> computer when there's an interrupt or a subroutine is called: the
> registers all get saved to memory.

In the Tera MTA the act of saving/restoring registers is "hardware
accelerated" so that it can switch threads on every clock cycle. From
the user program execution context perspective it's the exact same thing
as interrupt + save-state-A + restore-state-B + return-from-interrupt,
except it's handled in hardware instead of in software, so no interrupts
are needed.
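
A minimal Python sketch of that idea, under loud assumptions: the names `HwThread` and `run_barrel` are made up for illustration, and strict round-robin selection is a simplification (nothing in this thread describes the MTA's actual selection policy). The only point it shows is that a "switch" selects a different resident register bank and copies nothing to memory.

```python
from dataclasses import dataclass, field


@dataclass
class HwThread:
    """One hardware thread slot: its own program counter and register bank."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)


def run_barrel(threads, cycles):
    """Issue one instruction per cycle, rotating through resident threads.

    No state is saved or restored on a switch: choosing a thread is
    just indexing into the list of per-thread register banks.
    """
    trace = []
    for cycle in range(cycles):
        tid = cycle % len(threads)   # assumed round-robin thread select
        thread = threads[tid]
        thread.regs[1] += 1          # stand-in for "execute one instruction"
        thread.pc += 1
        trace.append(tid)
    return trace


threads = [HwThread() for _ in range(4)]
trace = run_barrel(threads, 8)
print(trace)
```

Each thread keeps its own `pc` and `regs` between turns, which is what makes the result look, from the program's side, like a save and restore that costs nothing.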

>
> What happens not only on the Tera MTA, but on any Intel processor
> with Hyper-Threading, is quite different; the processor moves from
> one thread to another thread, switching to the alternate set of registers
> allocated to that thread.
>

I may be wrong, but I was under the impression that Hyper-Threading /
SMT is slightly different: Two threads are executed /simultaneously/.
In other words: The two thread execution contexts (i.e. their register
files, program counters etc) run concurrently during the same clock
cycle, with the intention of maximizing EU usage in a wide issue
machine.

Switching threads in a typical x86 SMT CPU is still done in software
(via kernel interrupts), as on a non-SMT, single-threaded CPU.

> The old thread retains its context for the next time the processor
> goes back to it.
>
> So even if, technically, the switching between register sets in a
> multithreaded processor could be considered a _form_ of
> context switch, using that phrase to describe it will create an
> erroneous picture in people's minds, causing confusion.
>
> John Savard
>

Re: The Tera MTA

<sd3353$6gf$3@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18910&group=comp.arch#18910

Path: i2pn2.org!i2pn.org!aioe.org!ux6ld97kLXxG8kVFFLnoWg.user.46.165.242.75.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Sun, 18 Jul 2021 22:38:43 -0700
Organization: Aioe.org NNTP Server
Message-ID: <sd3353$6gf$3@gioia.aioe.org>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<sd32ui$qss$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="6671"; posting-host="ux6ld97kLXxG8kVFFLnoWg.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2

by: Chris M. Thomasson - Mon, 19 Jul 2021 05:38 UTC

On 7/18/2021 10:35 PM, Marcus wrote:
> On 2021-07-18, Quadibloc wrote:
>> On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
>>> On 7/18/2021 7:53 AM, Quadibloc wrote:
[...]
> I may be wrong, but I was under the impression that Hyper-Threading /
> SMT is slightly different: Two threads are executed /simultaneously/.
> In other words: The two thread execution contexts (i.e. their register
> files, program counters etc) run concurrently during the same clock
> cycle, with the intention of maximizing EU usage in a wide issue
> machine.
[...]

Argh! This brings back memories of the 64k aliasing issue in early
HyperThreading from good ol' Intel. It could cause massive false sharing
issues...

Re: The Tera MTA

<sd35p0$t2f$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18913&group=comp.arch#18913

Path: i2pn2.org!i2pn.org!news.niel.me!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 01:23:25 -0500
Organization: A noiseless patient Spider
Lines: 86
Message-ID: <sd35p0$t2f$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<sd32ui$qss$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jul 2021 06:23:28 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="df64e08478e7800b4276a4f1f9ce507d";
logging-data="29775"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19MtXD0m971REcPTSExMv0W"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:ICcUUKrVNE7aqyzBH/XrUXkqtx8=
In-Reply-To: <sd32ui$qss$1@dont-email.me>
Content-Language: en-US

by: BGB - Mon, 19 Jul 2021 06:23 UTC

On 7/19/2021 12:35 AM, Marcus wrote:
> On 2021-07-18, Quadibloc wrote:
>> On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
>>> On 7/18/2021 7:53 AM, Quadibloc wrote:
>>
>>>> Wouldn't saving all the registers to memory on each clock cycle
>>>> defeat the purpose of hiding memory latency?
>>
>>> Perhaps you are misunderstanding how the Tera MTA worked.
>>
>> That wasn't the problem.
>>
>> The point I was making was that the use of the term "context switch"
>> was extremely confusing. A context switch is what happens on a
>> computer when there's an interrupt or a subroutine is called: the
>> registers all get saved to memory.
>
> In the Tera MTA the act of saving/restoring registers is "hardware
> accelerated" so that it can switch threads on every clock cycle. From
> the user program execution context perspective it's the exact same thing
> as interrupt + save-state-A + restore-state-B + return-from-interrupt,
> except it's handled in hardware instead of in software, so no interrupts
> are needed.
>

Seems to make sense.

Limiting factor though is likely that with rapid cycling (barrel
processing), presumably the CPU needs to have a register file big enough
to hold all of the active threads.

The access patterns would not be viable for use as a conventional cache
(the bandwidth required for streaming the register banks to/from memory
would be a few orders of magnitude beyond what I would consider
"plausible").

This would in effect put a limit on the number of threads for which it
is viable to use hardware threading (and/or limit each thread to a
relatively small number of registers).

Though, I can note that I did use a vaguely similar approach to this for
implementing FM synthesis in my case, where as opposed to having lots of
parallel logic for each MIDI channel, it just sort of cycles between all
of them (and completes a full rotation 64k times per second). Though,
there are only about 256 bits of state per channel in this case.

>>
>> What happens not only on the Tera MTA, but on any Intel processor
>> with Hyper-Threading, is quite different; the processor moves from
>> one thread to another thread, switching to the alternate set of registers
>> allocated to that thread.
>>
>
> I may be wrong, but I was under the impression that Hyper-Threading /
> SMT is slightly different: Two threads are executed /simultaneously/.
> In other words: The two thread execution contexts (i.e. their register
> files, program counters etc) run concurrently during the same clock
> cycle, with the intention of maximizing EU usage in a wide issue
> machine.
>
> Switching threads in a typical x86 SMT CPU is still done in software
> (via kernel interrupts), as on a non-SMT, single-threaded CPU.
>

Elsewhere, I went and looked it up, and noted that apparently, hardware
multithreading which involves rapidly cycling between a group of threads
is referred to as TMT (Temporal Multi-Threading), where SMT refers more
specifically to implementations which execute both threads in parallel
within the same clock cycle (more like Hyperthreading does AFAIK).
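
To make the TMT/SMT distinction concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the function names, the 4-wide issue width, and the idea of describing each thread by a count of instructions it could issue this cycle. It does not model any real Intel or Tera pipeline.

```python
def tmt_issue(ready, width, cycle):
    """Temporal MT: every issue slot in a given cycle belongs to one thread."""
    tid = cycle % len(ready)
    return {tid: min(ready[tid], width)}


def smt_issue(ready, width):
    """Simultaneous MT: slots in one cycle may be filled from several threads."""
    issued = {}
    free = width
    for tid, count in enumerate(ready):
        take = min(count, free)
        if take:
            issued[tid] = take
            free -= take
    return issued


# Independent instructions each of two threads could issue this cycle.
ready = [2, 3]
tmt_result = tmt_issue(ready, width=4, cycle=0)
smt_result = smt_issue(ready, width=4)
print(tmt_result, smt_result)
```

In this toy case the temporal scheme leaves two of the four slots empty, while the simultaneous scheme fills them from the second thread, which is the "maximizing EU usage" effect described in the quoted text.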

>> The old thread retains its context for the next time the processor
>> goes back to it.
>>
>> So even if, technically, the switching between register sets in a
>> multithreaded processor could be considered a _form_ of
>> context switch, using that phrase to describe it will create an
>> erroneous picture in people's minds, causing confusion.
>>
>> John Savard
>>
>

Re: The Tera MTA

<sd37h1$j0j$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18916&group=comp.arch#18916

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: m.del...@this.bitsnbites.eu (Marcus)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 08:53:21 +0200
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <sd37h1$j0j$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<sd32ui$qss$1@dont-email.me> <sd35p0$t2f$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jul 2021 06:53:21 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7adfa040958caed31940ea1d97f73a2a";
logging-data="19475"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/HT1ulxsYdaz/IgB7O5q3UJig2ziSQzbo="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:DxwigOtuFrByHhkHak+hdtwqYEE=
In-Reply-To: <sd35p0$t2f$1@dont-email.me>
Content-Language: en-US

by: Marcus - Mon, 19 Jul 2021 06:53 UTC

On 2021-07-19 08:23, BGB wrote:
> On 7/19/2021 12:35 AM, Marcus wrote:
>> On 2021-07-18, Quadibloc wrote:
>>> On Sunday, July 18, 2021 at 9:36:30 AM UTC-6, Stephen Fuld wrote:
>>>> On 7/18/2021 7:53 AM, Quadibloc wrote:
>>>
>>>>> Wouldn't saving all the registers to memory on each clock cycle
>>>>> defeat the purpose of hiding memory latency?
>>>
>>>> Perhaps you are misunderstanding how the Tera MTA worked.
>>>
>>> That wasn't the problem.
>>>
>>> The point I was making was that the use of the term "context switch"
>>> was extremely confusing. A context switch is what happens on a
>>> computer when there's an interrupt or a subroutine is called: the
>>> registers all get saved to memory.
>>
>> In the Tera MTA the act of saving/restoring registers is "hardware
>> accelerated" so that it can switch threads on every clock cycle. From
>> the user program execution context perspective it's the exact same thing
>> as interrupt + save-state-A + restore-state-B + return-from-interrupt,
>> except it's handled in hardware instead of in software, so no interrupts
>> are needed.
>>
>
> Seems to make sense.
>
>
> Limiting factor though is likely that with rapid cycling (barrel
> processing), presumably the CPU needs to have a register file big enough
> to hold all of the active threads.
>

Yes. You need one register file slice per HW thread, so for 128 threads
with 32 64-bit GPRs each (as in the Threadstorm), that's a whopping 32 KiB
of register state (counting only the GPRs). That should even be
possible on an FPGA (especially if you aim slightly lower).
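
The arithmetic behind that figure, as a small Python check. The function name is hypothetical; the thread count, register count and register width are the ones quoted above.

```python
def register_file_bytes(hw_threads, regs_per_thread, bits_per_reg):
    """Total architectural register state, in bytes, for all HW threads."""
    return hw_threads * regs_per_thread * bits_per_reg // 8


size = register_file_bytes(hw_threads=128, regs_per_thread=32, bits_per_reg=64)
print(size, "bytes =", size // 1024, "KiB")
```

That is 128 x 32 x 8 bytes = 32768 bytes, counting GPRs only.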

I think that one of the consequences of using that many HW threads is
that cache locality suffers, and the whole idea of this particular
design is to hide long latency memory accesses (due to cache misses)
anyway, so the cache is probably not as important - thus it makes sense
to trade some cache area for register file area instead.

The really cool part is that you can do random memory access and random
branches - "for free" - without having advanced branch predictors and
deep cache hierarchies - /and/ you can make your pipeline almost
arbitrarily long.
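
A back-of-the-envelope Python model of why this works, with assumptions stated up front: each thread issues one instruction and then cannot issue again for `latency` cycles (standing in for a memory access or a long pipeline), and the core issues at most one instruction per cycle. The function name and the numbers are illustrative, not measurements of any real machine.

```python
def issue_utilization(n_threads, latency, cycles):
    """Fraction of cycles in which some thread was ready to issue."""
    ready_at = [0] * n_threads   # first cycle at which each thread may issue
    issued = 0
    for cycle in range(cycles):
        for tid in range(n_threads):
            if ready_at[tid] <= cycle:
                ready_at[tid] = cycle + latency   # thread now waits out the latency
                issued += 1
                break                             # one issue slot per cycle
    return issued / cycles


for n in (4, 8, 16):
    print(n, issue_utilization(n, latency=8, cycles=800))
```

Under these assumptions the issue slot stays full once the number of resident threads reaches the latency in cycles, so no individual thread ever needs a branch predictor or a cache hit to keep the pipeline busy.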

> The access patterns would not be viable for use as a conventional cache
> (the bandwidth required for streaming the register banks to/from memory
> would be a few orders of magnitude beyond what I would consider
> "plausible").
>

Precisely.

>
> This would in effect put a limit on the number of threads for which it
> is viable to use hardware threading (and/or limit each thread to a
> relatively small number of registers).
>
> Though, I can note that I did use a vaguely similar approach to this for
> implementing FM synthesis in my case, where as opposed to having lots of
> parallel logic for each MIDI channel, it just sort of cycles between all
> of them (and completes a full rotation 64k times per second). Though,
> there are only about 256 bits of state per channel in this case.
>
>
>>>
>>> What happens not only on the Tera MTA, but on any Intel processor
>>> with Hyper-Threading, is quite different; the processor moves from
>>> one thread to another thread, switching to the alternate set of
>>> registers
>>> allocated to that thread.
>>>
>>
>> I may be wrong, but I was under the impression that Hyper-Threading /
>> SMT is slightly different: Two threads are executed /simultaneously/.
>> In other words: The two thread execution contexts (i.e. their register
>> files, program counters etc) run concurrently during the same clock
>> cycle, with the intention of maximizing EU usage in a wide issue
>> machine.
>>
>> Switching threads in a typical x86 SMT CPU is still done in software
>> (via kernel interrupts), as on a non-SMT, single-threaded CPU.
>>
>
> Elsewhere, I went and looked it up, and noted that apparently, hardware
> multithreading which involves rapidly cycling between a group of threads
> is referred to as TMT (Temporal Multi-Threading), where SMT refers more
> specifically to implementations which execute both threads in parallel
> within the same clock cycle (more like Hyperthreading does AFAIK).
>
>
>>> The old thread retains its context for the next time the processor
>>> goes back to it.
>>>
>>> So even if, technically, the switching between register sets in a
>>> multithreaded processor could be considered a _form_ of
>>> context switch, using that phrase to describe it will create an
>>> erroneous picture in people's minds, causing confusion.
>>>
>>> John Savard
>>>
>>
>

Re: The Tera MTA

<b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18917&group=comp.arch#18917

X-Received: by 2002:ad4:4690:: with SMTP id bq16mr23180617qvb.23.1626679039597;
Mon, 19 Jul 2021 00:17:19 -0700 (PDT)
X-Received: by 2002:a4a:df02:: with SMTP id i2mr9308770oou.14.1626679039344;
Mon, 19 Jul 2021 00:17:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Mon, 19 Jul 2021 00:17:19 -0700 (PDT)
In-Reply-To: <5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:6437:2f3c:fd50:a217;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:6437:2f3c:fd50:a217
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <scvp1v$dgh$1@dont-email.me>
<sd0kkg$jcf$1@dont-email.me> <05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me> <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
Subject: Re: The Tera MTA
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Mon, 19 Jul 2021 07:17:19 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Mon, 19 Jul 2021 07:17 UTC

On Sunday, July 18, 2021 at 3:13:41 PM UTC-6, MitchAlsup wrote:

> A context switch is a change of control from within one context to
> within another context. This DOES NOT NECESSARILY require
> {Interrupts, context save, scheduling, context restore}. Those are
> vestiges of single-threaded mono-processors of the past and
> wholly unnecessary going forward.

I can point to some *really old* computers that had one set of
registers for application programs, and a second set for the
operating system.

It is certainly true that switching to another set of registers
is a simpler way to let an interrupt routine get going faster;
but the term "context switch" at least makes me think of what
a single thread does to change to another context - the threads
in multi-threaded processors are so completely separate that
switching the context takes place outside the code.

John Savard

Re: The Tera MTA

<sd3coj$9m6$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18922&group=comp.arch#18922

Path: i2pn2.org!i2pn.org!aioe.org!uNkxFD/dgvFUE+WUQcvYbA.user.46.165.242.91.POSTED!not-for-mail
From: terje.ma...@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 10:22:42 +0200
Organization: Aioe.org NNTP Server
Message-ID: <sd3coj$9m6$1@gioia.aioe.org>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
<b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="9926"; posting-host="uNkxFD/dgvFUE+WUQcvYbA.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.8
X-Notice: Filtered by postfilter v. 0.9.2

by: Terje Mathisen - Mon, 19 Jul 2021 08:22 UTC

Quadibloc wrote:
> On Sunday, July 18, 2021 at 3:13:41 PM UTC-6, MitchAlsup wrote:
>
>> A context switch is a change of control from within one context to
>> within another context. This DOES NOT NECESSARILY require
>> {Interrupts, context save, scheduling, context restore}. Those are
>> vestiges of single-threaded mono-processors of the past and
>> wholly unnecessary going forward.
>
> I can point to some *really old* computers that had one set of
> registers for application programs, and a second set for the
> operating system.
>
> It is certainly true that switching to another set of registers
> is a simpler way to let an interrupt routine get going faster;
> but the term "context switch" at least makes me think of what
> a single thread does to change to another context - the threads
> in multi-threaded processors are so completely separate that
> switching the context takes place outside the code.

The first machines I personally used with this feature were the
Norsk Data ND10 and ND100; they had 15 interrupt levels on top of the
user state (level 0), each with a full complement of registers.

This meant that they could take a HW interrupt from some kind of
peripheral (like a process monitoring sensor) and switch to the
corresponding task in a single (or even zero?) cycle(s), do whatever was
required and then switch back.
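
As a rough illustration of why banked registers make the switch so cheap, here is a minimal Python sketch. It is a toy model, not ND-100 code: the class name, the method names and the bank size of 8 registers are all assumptions made for the example. Only the count of 16 levels (user level 0 plus 15 interrupt levels) comes from the description above.

```python
# Hypothetical model: one private register bank per priority level.
NUM_LEVELS = 16        # level 0 (user state) plus 15 interrupt levels, per the post
REGS_PER_LEVEL = 8     # assumed bank size, for illustration only

class BankedCpu:
    def __init__(self):
        # Every level owns its own bank, so nothing is ever saved or restored.
        self.banks = [[0] * REGS_PER_LEVEL for _ in range(NUM_LEVELS)]
        self.pending = {0}      # level 0 is always runnable
        self.level = 0

    @property
    def regs(self):
        # The "current" registers are simply the bank of the active level.
        return self.banks[self.level]

    def _dispatch(self):
        # Switching context is just selecting the highest pending level's bank.
        self.level = max(self.pending)

    def raise_interrupt(self, level):
        self.pending.add(level)
        self._dispatch()

    def finish_interrupt(self):
        if self.level != 0:
            self.pending.discard(self.level)
        self._dispatch()

cpu = BankedCpu()
cpu.regs[0] = 42              # user program state lives in level 0's bank
cpu.raise_interrupt(10)       # sensor interrupt: bank 10 becomes active
cpu.regs[0] = 7               # the handler writes only to its own bank
cpu.finish_interrupt()        # back to level 0
user_value = cpu.regs[0]      # untouched by the handler
```

The point of the sketch is that `raise_interrupt` and `finish_interrupt` copy no register contents at all; the only state that changes is the index of the active bank.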

This worked wonderfully for the initial target, which was process
monitoring & control; they sold a bunch of these machines to CERN.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: The Tera MTA

<sd3d03$m6l$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18923&group=comp.arch#18923

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 01:26:43 -0700
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <sd3d03$m6l$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
<b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
<sd3coj$9m6$1@gioia.aioe.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jul 2021 08:26:43 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="52a8f687a8e4b9031a9a7ae958c358d9";
logging-data="22741"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/EmLnLRTJZJ8YsjYRva2Zr"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:LSlbErGi4uPYyCPI2N7IqC18pJo=
In-Reply-To: <sd3coj$9m6$1@gioia.aioe.org>
Content-Language: en-US

by: Ivan Godard - Mon, 19 Jul 2021 08:26 UTC

On 7/19/2021 1:22 AM, Terje Mathisen wrote:
> Quadibloc wrote:
>> On Sunday, July 18, 2021 at 3:13:41 PM UTC-6, MitchAlsup wrote:
>>
>>> A context switch is a change of control from within one context to
>>> within another context. This DOES NOT NECESSARILY require
>>> {Interrupts, context save, scheduling, context restore} Those are
>>> vestiges of single threaded mono-processors of the past and
>>> wholly unnecessary going forward.
>>
>> I can point to some *really old* computers that had one set of
>> registers for application programs, and a second set for the
>> operating system.
>>
>> It is certainly true that switching to another set of registers
>> is a simpler way to let an interrupt routine get going faster;
>> but the term "context switch" at least makes me think of what
>> a single thread does to change to another context - the threads
>> in multi-threaded processors are so completely separate that
>> switching the context takes place outside the code.
>
> The first machines I have personally used with this feature was the
> Norsk Data ND10 and ND100, they had 15 interrupt levels on top of the
> user state (level 0), each with a full complement of registers.
>
> This meant that they could take a HW interrupt from some kind of
> peripheral (like a process monitoring sensor) and switch to the
> corresponding task in a single (or even zero?) cycle(s), do whatever was
> required and then switch back.
>
> This worked wonderfully for the initial target which was process
> monitoring & control, they sold a bunch of these machines to CERN.
>
> Terje
>
>

For a while they also ran the telephone system of the PRC.

Re: The Tera MTA

<sd3hf9$h69$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18924&group=comp.arch#18924

Path: i2pn2.org!i2pn.org!news.niel.me!news.gegeweb.eu!gegeweb.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 02:43:03 -0700
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <sd3hf9$h69$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me>
<05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com>
<sd1hps$2es$1@dont-email.me>
<84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com>
<5478aeac-f497-45db-b985-27efe22becban@googlegroups.com>
<b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 19 Jul 2021 09:43:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="4cf99953499a8d509a226f2e4dd2e5e5";
logging-data="17609"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+OKYuyqLTbaQ3tWkWk/hN+Tb9ut/17ww="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:J/p3eaHFpE0D6S8UkLDXOPcYqn4=
In-Reply-To: <b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
Content-Language: en-US

by: Stephen Fuld - Mon, 19 Jul 2021 09:43 UTC

On 7/19/2021 12:17 AM, Quadibloc wrote:
> On Sunday, July 18, 2021 at 3:13:41 PM UTC-6, MitchAlsup wrote:
>
>> A context switch is a change of control from within one context to
>> within another context. This DOES NOT NECESSARILY require
>> {Interrupts, context save, scheduling, context restore} Those are
>> vestiges of single threaded mono-processors of the past and
>> wholly unnecessary going forward.
>
> I can point to some *really old* computers that had one set of
> registers for application programs, and a second set for the
> operating system.

Of course, the Univac 1100 had, and still has, exactly that: another full
set of registers for the OS, but they were used only for fast interrupt
handling. Most of the OS ran in the user set (so most of the OS could
be interrupted for higher priority work).

And my original ARM Architecture Reference shows a duplicate set of
registers R8-14 for Fast Interrupt mode.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The Tera MTA

<0001HW.26A5DF18028FEE79700002F9A38F@news.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=18932&group=comp.arch#18932

Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: findlayb...@blueyonder.co.uk (Bill Findlay)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Mon, 19 Jul 2021 17:26:00 +0100
Organization: none
Lines: 24
Message-ID: <0001HW.26A5DF18028FEE79700002F9A38F@news.individual.net>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com> <127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me> <1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <scvp1v$dgh$1@dont-email.me> <sd0kkg$jcf$1@dont-email.me> <05e6b305-5cae-4419-b33b-52b5c080621fn@googlegroups.com> <sd1hps$2es$1@dont-email.me> <84d4d0e4-e1cd-4826-8370-e08c9e3546fdn@googlegroups.com> <5478aeac-f497-45db-b985-27efe22becban@googlegroups.com> <b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>
Reply-To: findlaybill@blueyonder.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 7clKRkUW8yumpg6nvUTrWwXy8FE67onJODd0c0PDcBcizWq6X0
X-Orig-Path: not-for-mail
Cancel-Lock: sha1:H/vHQQT9sijJBXzPP7WG3mYFsPQ=
User-Agent: Hogwasher/5.24

by: Bill Findlay - Mon, 19 Jul 2021 16:26 UTC

On 19 Jul 2021, Quadibloc wrote
(in article<b114b534-ea0a-4399-b601-26263e323a25n@googlegroups.com>):

> On Sunday, July 18, 2021 at 3:13:41 PM UTC-6, MitchAlsup wrote:
>
> > A context switch is a change of control from within one context to
> > within another context. This DOES NOT NECESSARILY require
> > {Interrupts, context save, scheduling, context restore} Those are
> > vestiges of single threaded mono-processors of the past and
> > wholly unnecessary going forward.
>
> I can point to some *really old* computers that had one set of
> registers for application programs, and a second set for the
> operating system.

I can point to a really old computer that had a separate set of
48 registers for each of the 4 apps it could run concurrently:

<http://www.findlayw.plus.com/KDF9/The%20Hardware%20of%20the%20KDF9.pdf>

--
Bill Findlay

Re: The Tera MTA

<b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18951&group=comp.arch#18951

X-Received: by 2002:a37:b6c5:: with SMTP id g188mr29448416qkf.92.1626807426137;
Tue, 20 Jul 2021 11:57:06 -0700 (PDT)
X-Received: by 2002:a9d:6c8e:: with SMTP id c14mr3280679otr.5.1626807425870;
Tue, 20 Jul 2021 11:57:05 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Jul 2021 11:57:05 -0700 (PDT)
In-Reply-To: <1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
Subject: Re: The Tera MTA
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Tue, 20 Jul 2021 18:57:06 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: Paul A. Clayton - Tue, 20 Jul 2021 18:57 UTC

On Saturday, July 17, 2021 at 3:39:16 PM UTC-4, MitchAlsup wrote:
[snip]
> The register file size is multiplied by the degree of SMT-ness.

Even for SMT, this is not necessarily entirely true. POWER7's SMT4
seems to reasonably count as 4-way SMT but requires no
more register storage than SMT2. It uses the replication
of the register file for extra read ports to provide twice as
many threads, with the restriction that each pair of threads is limited
by the number of read ports in a cluster. (This might not quite
count as doubling SMT thread count since it is similar to
splitting the inside of the core into two half-internal-core parts,
i.e., the execution units are coupled with the register files. Aside:
This coupling *might* facilitate some power saving from having
fewer active forwarding paths.)

With additional registers for renaming not scaling with thread
count (one of the advantages of MT is hiding latencies), the
actual increase in register count will typically not scale with
the number of threads. (With a clustered register file, it might
also be practical to store some dead-if-speculation-is-correct
values in only one part, possibly doing moves during periods
of low port use. I am skeptical such would provide much
benefit.)
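
The scaling argument in the paragraph above can be made concrete with a small Python calculation. This is only a sketch: the 32 architectural registers per thread and the fixed 96-entry rename pool are assumed round numbers, not figures for POWER7 or any other real core.

```python
def physical_registers(threads, arch_regs, rename_regs):
    # Each thread needs its own architectural state,
    # but the rename pool is shared and assumed not to grow with thread count.
    return threads * arch_regs + rename_regs

ARCH_REGS = 32     # assumed architectural registers per thread
RENAME_REGS = 96   # assumed shared rename pool, held constant across thread counts

one = physical_registers(1, ARCH_REGS, RENAME_REGS)    # 128
two = physical_registers(2, ARCH_REGS, RENAME_REGS)    # 160
four = physical_registers(4, ARCH_REGS, RENAME_REGS)   # 224
growth = four / one                                    # 1.75, not 4.0
```

Under these assumptions, going from one thread to four grows the physical register count by a factor of 1.75 rather than 4, because only the architectural portion is replicated.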

It might also be practical to reduce register file port count via
banking under MT. More ILP/latency tolerance implies more
choice in when to read from a centralized register file (probably
with some added buffering).

With fine-grained or SoEMT (not SMT) multithreading, the 3D register file technique
(Marc Tremblay et al., "A Three Dimensional Register File For
Superscalar Processors", 1995; 30% array area overhead
claimed for 8 'temporal banks' (Andy Glew's term) for a 7-read,
3-write ported file) used by Itanium 2 Montecito (Eric S. Fetzer
et al., "The Multi-Threaded, Parity-Protected 128-Word Register
Files on a Dual-Core Itanium-Family Processor", 2005) largely
hides the storage area overhead, given the wiring constraints of highly
ported register files. With cycle-granular temporal banking
combined with traditional (spatial) banking, thread B could read
from spatial bank 0 while thread A read from bank 1, so the
area savings might be applied to an SMT design (though
requiring all the ports used by a thread to be in the same bank/
group of banks, given a large number of ports [to save significant
area], seems problematic).

These seem reasonable footnotes to apply to the statement
"register file size is multiplied by the degree of SMT-ness".

Re: The Tera MTA

<19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18953&group=comp.arch#18953

X-Received: by 2002:a05:620a:90c:: with SMTP id v12mr30894375qkv.190.1626809459727;
Tue, 20 Jul 2021 12:30:59 -0700 (PDT)
X-Received: by 2002:a05:6808:2d2:: with SMTP id a18mr2994485oid.84.1626809459430;
Tue, 20 Jul 2021 12:30:59 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!usenet.pasdenom.info!usenet-fr.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Jul 2021 12:30:59 -0700 (PDT)
In-Reply-To: <02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me> <0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com>
<51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com>
<02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
Subject: Re: The Tera MTA
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Tue, 20 Jul 2021 19:30:59 +0000
Content-Type: text/plain; charset="UTF-8"

by: Paul A. Clayton - Tue, 20 Jul 2021 19:30 UTC

On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:
> On Saturday, July 17, 2021 at 4:06:50 PM UTC-5, Quadibloc wrote:
[snip]
>> https://www.nextplatform.com/2017/09/18/m8-last-hurrah-oracle-sparc/
[snip]
> Notice that they doubled the ICache and kept the DCache the same.
> Servers need larger I relative to D while application processors go the
> other direction.

Well, the L2 Icache (shared by four cores) remained the same size.
There is presumably some optimum L1 Icache size based on inter-thread
code sharing, temporal locality at a given code 'working set', L2 contention
avoidance, etc. While not as critical under multithreading (though M8 also
sought higher single-threaded performance), data cache access latency
tends, I think, to be more critical (BTBs can provide accurate fetch
prediction but load address prediction is harder).

> Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.

Really (in 2021)? Obviously, it depends on volume (NRE is significant)
and chip size (dicing overhead/pad limits), less scaling transistors (I/O),
wiring constraints (SRAMs might be relatively cheaper, i.e., scaling better),
and probably other factors, but I thought I had read that Intel claimed
its 14nm was cheaper per transistor than its 20nm (not that Intel lacked
motivation to fudge figures).

Are power and transistors per die the only benefits then of 14nm?

Re: The Tera MTA

<2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18971&group=comp.arch#18971

X-Received: by 2002:ae9:f447:: with SMTP id z7mr26107437qkl.453.1626848821614;
Tue, 20 Jul 2021 23:27:01 -0700 (PDT)
X-Received: by 2002:a9d:7f14:: with SMTP id j20mr23901523otq.82.1626848821405;
Tue, 20 Jul 2021 23:27:01 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Jul 2021 23:27:01 -0700 (PDT)
In-Reply-To: <b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:412d:1136:693:fe90;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:412d:1136:693:fe90
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com>
Subject: Re: The Tera MTA
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 21 Jul 2021 06:27:01 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Wed, 21 Jul 2021 06:27 UTC

On Tuesday, July 20, 2021 at 12:57:07 PM UTC-6, Paul A. Clayton wrote:
> On Saturday, July 17, 2021 at 3:39:16 PM UTC-4, MitchAlsup wrote:
> [snip]
> > The register file size is multiplied by the degree of SMT-ness.

> Even for SMT, this is not necessarily entirely true. POWER7's SMT4
> seems to reasonably count as 4-way SMT but requires no
> additional register storage than SMT2. It uses the replication
> of the register file for extra read ports to provide twice as
> many threads with the limit that each pair of threads is limited
> by the number of read ports in a cluster.

So if you use the 4-way SMT, you do without the extra read
ports...

Still, you do need four copies of the register file, so this is only
an exception because of a technicality: they found some use
for the extra copies when SMT was not in effect.

John Savard

Re: The Tera MTA

<2aa1ef60-b33d-49f5-899c-c017b7a867acn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18972&group=comp.arch#18972

X-Received: by 2002:a05:620a:1465:: with SMTP id j5mr24929691qkl.63.1626849193206;
Tue, 20 Jul 2021 23:33:13 -0700 (PDT)
X-Received: by 2002:aca:dac5:: with SMTP id r188mr24303379oig.78.1626849193022;
Tue, 20 Jul 2021 23:33:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Tue, 20 Jul 2021 23:33:12 -0700 (PDT)
In-Reply-To: <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:56a:fa3c:a000:412d:1136:693:fe90;
posting-account=1nOeKQkAAABD2jxp4Pzmx9Hx5g9miO8y
NNTP-Posting-Host: 2001:56a:fa3c:a000:412d:1136:693:fe90
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me> <0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com>
<51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com>
<02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com> <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <2aa1ef60-b33d-49f5-899c-c017b7a867acn@googlegroups.com>
Subject: Re: The Tera MTA
From: jsav...@ecn.ab.ca (Quadibloc)
Injection-Date: Wed, 21 Jul 2021 06:33:13 +0000
Content-Type: text/plain; charset="UTF-8"

by: Quadibloc - Wed, 21 Jul 2021 06:33 UTC

On Tuesday, July 20, 2021 at 1:31:00 PM UTC-6, Paul A. Clayton wrote:
> On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:

> > Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.
>
> Really (in 2021)?

Well, the year that matters is 2015, since that's when the SPARC M7 came out.

Even if the economics of manufacture have changed since then, the SPARC M8
could still be on 20nm for a couple of reasons:

- Oracle seems to have lost interest in the SPARC architecture. It promises its
customers they'll have until 2034 to change, but it's likely that future SPARC
systems will be based on current x86 hardware with JIT SPARC emulation
software.

Therefore, they didn't want to invest the time and effort in changing over their design
for the M7 to a 14nm process; instead, they may have just taken the existing design,
added some baked-in microcode for database operations and a little extra cache.

- Due to the current chip shortage that's made it hard to get one's hands on the
latest graphics cards and so on, Oracle found it easier to get some 20nm foundry
capacity than any 14nm foundry capacity.

John Savard

Re: The Tera MTA

<sd8kqb$4i0$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=18974&group=comp.arch#18974

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: iva...@millcomputing.com (Ivan Godard)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Wed, 21 Jul 2021 01:10:52 -0700
Organization: A noiseless patient Spider
Lines: 1
Message-ID: <sd8kqb$4i0$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me>
<0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com>
<51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com>
<aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com>
<02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com>
<19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
<2aa1ef60-b33d-49f5-899c-c017b7a867acn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 21 Jul 2021 08:10:52 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="38e0bf959633f73a7114884d648647e8";
logging-data="4672"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Q3LSjlY8ubDHzQ8wWELpA"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:/bYBtoQ0qNtveX9tN49AkLJkQfk=
In-Reply-To: <2aa1ef60-b33d-49f5-899c-c017b7a867acn@googlegroups.com>
Content-Language: en-US

by: Ivan Godard - Wed, 21 Jul 2021 08:10 UTC

https://www.youtube.com/watch?v=MbtkL5_f6-4

Re: The Tera MTA

<7482c2f6-9d84-4bdb-a6e5-bebfc45b33ecn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18985&group=comp.arch#18985

X-Received: by 2002:a05:622a:489:: with SMTP id p9mr5063832qtx.256.1626874388521;
Wed, 21 Jul 2021 06:33:08 -0700 (PDT)
X-Received: by 2002:a05:6830:1658:: with SMTP id h24mr15487021otr.182.1626874388254;
Wed, 21 Jul 2021 06:33:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 21 Jul 2021 06:33:07 -0700 (PDT)
In-Reply-To: <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me> <0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com>
<51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com>
<02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com> <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7482c2f6-9d84-4bdb-a6e5-bebfc45b33ecn@googlegroups.com>
Subject: Re: The Tera MTA
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Wed, 21 Jul 2021 13:33:08 +0000
Content-Type: text/plain; charset="UTF-8"

by: Paul A. Clayton - Wed, 21 Jul 2021 13:33 UTC

On Tuesday, July 20, 2021 at 3:31:00 PM UTC-4, Paul A. Clayton wrote:
> On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:
[snip]
>> Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.
>
> Really (in 2021)? Obviously, it depends on volume (NRE is significant)
> and chip size (dicing overhead/pad limits), less scaling transistors (I/O),
> wiring constraints (SRAMs might be relatively cheaper, i.e., scaling better),
> and probably other factors, but I thought I had read that Intel claimed
> its 14nm was cheaper per transistor than its 20nm (not that Intel lacked
> motivation to fudge figures).
>
> Are power and transistors per die the only benefits then of 14nm?

Doh. There are obviously also some direct performance benefits
(switching speed and less wire delay?). SRAM price per transistor
would also benefit from the lower cost of redundancy compared to
"random logic", so a higher defect rate of a smaller process would
increase "random logic" cost per transistor more than the cost for
SRAM.
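
A toy yield model may help show the direction of this effect. The Python sketch below uses the textbook Poisson yield formula; the block area and the two defect densities are invented for illustration, and the assumption that redundancy repairs every SRAM defect is a deliberate oversimplification.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    # Poisson yield model: probability that a block contains zero defects.
    return math.exp(-area_cm2 * defects_per_cm2)

def relative_cost(area_cm2, defects_per_cm2, repairable):
    # Cost per good block, relative to a defect-free process.
    # Simplifying assumption: redundancy repairs every defect in a repairable block.
    y = 1.0 if repairable else poisson_yield(area_cm2, defects_per_cm2)
    return 1.0 / y

AREA = 1.0                 # cm^2, hypothetical block size
MATURE, NEW = 0.1, 0.5     # defects per cm^2, hypothetical defect densities

# How much more a good block costs on the newer, more defect-prone process.
logic_penalty = relative_cost(AREA, NEW, False) / relative_cost(AREA, MATURE, False)
sram_penalty = relative_cost(AREA, NEW, True) / relative_cost(AREA, MATURE, True)
```

With these made-up numbers the "random logic" block becomes roughly 1.5 times as expensive per good unit (a factor of exp(0.4)), while the fully repairable SRAM block is unaffected. Real redundancy has limits and its own area cost, so the true gap would be smaller.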

Re: The Tera MTA

<2021Jul21.162714@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=18994&group=comp.arch#18994

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Wed, 21 Jul 2021 14:27:14 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 54
Message-ID: <2021Jul21.162714@mips.complang.tuwien.ac.at>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com> <scuq5c$h7h$1@dont-email.me> <0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com> <51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com> <02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com> <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com> <7482c2f6-9d84-4bdb-a6e5-bebfc45b33ecn@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="b49d7fcff2fb833c3a4d66103d083e8a";
logging-data="18157"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18S7LVf6aF0pV/u+LYeC2wX"
Cancel-Lock: sha1:qqORfrvMlL1OplQnOxjRsHf6bOA=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Wed, 21 Jul 2021 14:27 UTC

"Paul A. Clayton" <paaronclayton@gmail.com> writes:
>On Tuesday, July 20, 2021 at 3:31:00 PM UTC-4, Paul A. Clayton wrote:
>> On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:
>[snip]
>>> Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.
>>
>> Really (in 2021)? Obviously, it depends on volume (NRE is significant)
>> and chip size (dicing overhead/pad limits), less scaling transistors (I/O),
>> wiring constraints (SRAMs might be relatively cheaper, i.e., scaling better),
>> and probably other factors, but I thought I had read that Intel claimed
>> its 14nm was cheaper per transistor than its 20nm (not that Intel lacked
>> motivation to fudge figures).
>>
>> Are power and transistors per die the only benefits then of 14nm?

If you look at Intel, they had their difficulties with the 14nm
processes, but they have produced their mainstream CPUs on 14nm since
~2016 (Skylake was still scarce in 2015, so I guess they did not
produce that many at that time); they produced (server) CPUs with more
transistors in 22nm, so the reason for mainstream 14nm was not
transistors per die.

Both the 4770 (22nm, 65W TDP) and 6700 (14nm, 65W TDP) have 3.4GHz
base clock, so they are probably comparable in power consumption for a
given clock rate (although admittedly the Skylake has more IPC and
more transistors, so it's not an apples-to-apples comparison; Haswell
vs. Broadwell would be better, but Broadwell seemed to suffer from
Intel's 14nm woes, and the 5775C (14nm, 65W TDP) has 3.3GHz base
clock, lower than either). So it's not power, either, at least at
first.
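
The comparison above boils down to a crude watts-per-GHz figure. A minimal Python sketch, using only the TDP and base clock values quoted in this post (TDP is a rating, not measured power, so this is at best a rough proxy):

```python
# (name, process, TDP in watts, base clock in GHz), all as quoted in the post above
parts = [
    ("4770", "22nm", 65, 3.4),
    ("5775C", "14nm", 65, 3.3),
    ("6700", "14nm", 65, 3.4),
]

# Rated watts per GHz of base clock for each part.
watts_per_ghz = {name: tdp / ghz for name, _, tdp, ghz in parts}

for name, value in watts_per_ghz.items():
    print(f"{name}: {value:.1f} W/GHz")
```

By this measure the 4770 and 6700 come out identical at about 19.1 W/GHz and the 5775C slightly worse at about 19.7 W/GHz, which is the basis for the "not power, either" conclusion.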

And looking at the maximum clock rates (4790K: 4400MHz, 6700K:
4200MHz), it was not the clock rate, either.

So I guess it's the cost per transistor. If 14nm was not at least
competitive in cost, Intel would have produced year after year of
Haswell respins (and maybe a Skylake backport to 22nm), like they did
with Skylake when 10nm caused even more problems than 14nm (and like
Rocket Lake is a 14nm backport of Ice Lake).

>Doh. There are obviously also some direct performance benefits
>(switching speed and less wire delay?).

An avid reader of this newsgroup knows that switching speed has become
more and more irrelevant since 90nm or so (wire delay dominates), and
that wire delay does not improve; what you win in shorter wires, you
lose in slower signals. What newer processes do provide is more
layers. Makes me wonder how well older processes with more layers
would work.
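
The "what you win in shorter wires, you lose in slower signals" point can be sketched with a first-order RC model in Python. The assumptions are idealized: a uniform shrink by a factor s, resistance per unit length rising as 1/s^2 because the wire cross-section shrinks in both dimensions, and capacitance per unit length staying roughly constant. The function names and unit values are made up for the example.

```python
def wire_delay(r_per_len, c_per_len, length):
    # Distributed RC line: delay grows with the square of the length.
    return 0.5 * r_per_len * c_per_len * length ** 2

def shrink(r_per_len, c_per_len, length, s):
    # Ideal shrink by factor s < 1: the cross-section scales by s*s, so
    # resistance per unit length rises by 1/s^2; capacitance per unit length
    # is assumed unchanged; the wire itself becomes s times shorter.
    return r_per_len / s**2, c_per_len, length * s

old = wire_delay(1.0, 1.0, 1.0)
new = wire_delay(*shrink(1.0, 1.0, 1.0, 0.7))
```

In this model `new` equals `old` up to rounding: the shorter length is cancelled exactly by the higher resistance, so a wire that shrinks along with the logic gets no faster, while any wire that has to keep its old length gets slower.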

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The Tera MTA

<446fed80-1f97-42f3-bf0c-96c25c075bd1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19024&group=comp.arch#19024

X-Received: by 2002:a0c:ff01:: with SMTP id w1mr38735680qvt.28.1626907897175;
Wed, 21 Jul 2021 15:51:37 -0700 (PDT)
X-Received: by 2002:a54:448a:: with SMTP id v10mr4100095oiv.44.1626907896977;
Wed, 21 Jul 2021 15:51:36 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 21 Jul 2021 15:51:36 -0700 (PDT)
In-Reply-To: <2021Jul21.162714@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<scuq5c$h7h$1@dont-email.me> <0afc2a0a-b3e6-4525-9fa3-a33bc287a6d5n@googlegroups.com>
<51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com>
<02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com> <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com>
<7482c2f6-9d84-4bdb-a6e5-bebfc45b33ecn@googlegroups.com> <2021Jul21.162714@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <446fed80-1f97-42f3-bf0c-96c25c075bd1n@googlegroups.com>
Subject: Re: The Tera MTA
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 21 Jul 2021 22:51:37 +0000
Content-Type: text/plain; charset="UTF-8"

by: Michael S - Wed, 21 Jul 2021 22:51 UTC

On Wednesday, July 21, 2021 at 6:06:57 PM UTC+3, Anton Ertl wrote:
> "Paul A. Clayton" <paaron...@gmail.com> writes:
> >On Tuesday, July 20, 2021 at 3:31:00 PM UTC-4, Paul A. Clayton wrote:
> >> On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:
> >[snip]
> >>> Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.
> >>
> >> Really (in 2021)? Obviously, it depends on volume (NRE is significant)
> >> and chip size (dicing overhead/pad limits), less scaling transistors (I/O),
> >> wiring constraints (SRAMs might be relatively cheaper, i.e., scaling better),
> >> and probably other factors, but I thought I had read that Intel claimed
> >> its 14nm was cheaper per transistor than its 20nm (not that Intel lacked
> >> motivation to fudge figures).
> >>
> >> Are power and transistors per die the only benefits then of 14nm?
> If you look at Intel, they had their difficulties with the 14nm
> processes, but they have produced their mainstream CPUs on 14nm since
> ~2016 (Skylake was still scarce in 2015, so I guess they did not
> produce that many at that time); they produced (server) CPUs with more
> transistors in 22nm, so the reason for mainstream 14nm was not
> transistors per die.
>
> Both the 4770 (22nm, 65W TDP) and 6700 (14nm, 65W TDP) have 3.4GHz
> base clock, so they are probably comparable in power consumption for a
> given clock rate (although admittedly the Skylake has more IPC and
> more transistors, so it's not an apples-to-apples comparison; Haswell
> vs. Broadwell would be better, but Broadwell seemed to suffer from
> Intel's 14nm woes, and the 5775C (14nm, 65W TDP) has 3.3GHz base
> clock, lower than either). So it's not power, either, at least at
> first.
>

D-1541 (Broadwell) vs. E5-1428L v3 (Haswell):
Broadwell runs both faster and cooler.
It's not exactly apples-to-apples, because the E5 is a higher-end product with 3 memory channels vs. 2 and 20MB LLC vs. 12MB,
but still, the difference in watts per GHz is rather bigger than can be attributed to these factors.

> And looking at the maximum clock rates (4790K: 4400MHz, 6700K:
> 4200MHz), it was not the clock rate, either.
>
> So I guess it's the cost per transistor. If 14nm was not at least
> competitive in cost, Intel would have produced year after year of
> Haswell respins (and maybe a Skylake backport to 22nm), like they did
> with Skylake when 10nm made even more problems than 14nm (and like
> Rocket Lake is a 14nm backport of Ice Lake).
> >Doh. There are obviously also some direct performance benefits
> >(switching speed and less wire delay?).
> An avid reader of this newsgroup knows that switching speed has become
> more and more irrelevant since 90nm or so (wire delay dominates), and
> that wire delay does not improve; what you win in shorter wires, you
> lose in slower signals. What newer processes do provide is more
> layers. Makes me wonder how well older processes with more layers
> would work.
> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: The Tera MTA

<2021Jul22.104200@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=19039&group=comp.arch#19039

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Thu, 22 Jul 2021 08:42:00 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 56
Message-ID: <2021Jul22.104200@mips.complang.tuwien.ac.at>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com> <51548316-5543-41d9-8ec6-822b861ea6afn@googlegroups.com> <aeab88d8-c9c6-49aa-9738-c778ac4174den@googlegroups.com> <02416980-bf6b-4094-8cea-34bb69937472n@googlegroups.com> <19d3513e-ceb0-4e4a-821b-c699fd542981n@googlegroups.com> <7482c2f6-9d84-4bdb-a6e5-bebfc45b33ecn@googlegroups.com> <2021Jul21.162714@mips.complang.tuwien.ac.at> <446fed80-1f97-42f3-bf0c-96c25c075bd1n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="f2c0daf5a9bbce338cd102657521e003";
logging-data="22622"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/NXR659kC7hGxHcQffq341"
Cancel-Lock: sha1:tUG/4qP1b7hVBozlQ7XsM63LSck=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Thu, 22 Jul 2021 08:42 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Wednesday, July 21, 2021 at 6:06:57 PM UTC+3, Anton Ertl wrote:
>> "Paul A. Clayton" <paaron...@gmail.com> writes:
>> >On Tuesday, July 20, 2021 at 3:31:00 PM UTC-4, Paul A. Clayton wrote:
>> >> On Saturday, July 17, 2021 at 5:34:02 PM UTC-4, MitchAlsup wrote:
>> >[snip]
>> >>> Secondly, 20nm still is cheaper per transistor than 14nm, 10nm, 7nm or 5nm.
>> >>
>> >> Really (in 2021)? Obviously, it depends on volume (NRE is significant)
>> >> and chip size (dicing overhead/pad limits), less scaling transistors (I/O),
>> >> wiring constraints (SRAMs might be relatively cheaper, i.e., scaling better),
>> >> and probably other factors, but I thought I had read that Intel claimed
>> >> its 14nm was cheaper per transistor than its 20nm (not that Intel lacked
>> >> motivation to fudge figures).
>> >>
>> >> Are power and transistors per die the only benefits then of 14nm?
>> If you look at Intel, they had their difficulties with the 14nm
>> processes, but they have produced their mainstream CPUs on 14nm since
>> ~2016 (Skylake was still scarce in 2015, so I guess they did not
>> produce that many at that time); they produced (server) CPUs with more
>> transistors in 22nm, so the reason for mainstream 14nm was not
>> transistors per die.
>>
>> Both the 4770 (22nm, 65W TDP) and 6700 (14nm, 65W TDP) have 3.4GHz
>> base clock, so they are probably comparable in power consumption for a
>> given clock rate (although admittedly the Skylake has more IPC and
>> more transistors, so it's not an apples-to-apples comparison; Haswell
>> vs. Broadwell would be better, but Broadwell seemed to suffer from
>> Intel's 14nm woes, and the 5775C (14nm, 65W TDP) has 3.3GHz base
>> clock, lower than either). So it's not power, either, at least at
>> first.
>>
>
>D-1541 (Broadwell) vs. E5-1428L v3 (Haswell):
>Broadwell runs both faster and cooler.
>It's not exactly apples-to-apples, because the E5 is a higher-end product with 3 memory channels vs. 2 and 20MB LLC vs. 12MB,
>but still, the difference in watts per GHz is rather bigger than can be attributed to these factors.

But then, when I look at the E5-2609 v4 with 20MB cache and 8 cores, I
see that it has a higher TDP and a lower clock rate than the E5-1428L
v3. And when I look at the E5-2608L v4 (also with 20MB), it has a
slightly lower TDP (50W vs. 60W), but a roughly proportionally lower
clock rate (1.6GHz base, 1.7GHz Turbo). And there's the E5-2630L v3
with 55W TDP, 1.8GHz base and 2.9GHz Turbo, while the E5-2428L v3 has
the same TDP and only 1.8GHz base without turbo.
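
To put numbers on that, here is a minimal sketch that computes TDP per GHz of
base clock, using only the figures quoted in this post. TDP is a rating, not
measured power, so this is only a rough indicator; the table layout and the
function name are made up for illustration.

```python
# TDP (W) and base clock (GHz) exactly as quoted in the post above.
PARTS = {
    "E5-2608L v4": (50, 1.6),
    "E5-2630L v3": (55, 1.8),
    "E5-2428L v3": (55, 1.8),
}


def watts_per_ghz(tdp_w, base_ghz):
    """Rated TDP divided by base clock; a crude efficiency proxy."""
    return tdp_w / base_ghz


if __name__ == "__main__":
    for name, (tdp_w, base_ghz) in PARTS.items():
        print(f"{name}: {watts_per_ghz(tdp_w, base_ghz):.2f} W/GHz")
```

By this crude measure the v3 and v4 parts land close together at base clock,
which is why picking a single pair of SKUs says little about the process.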

I guess if we want to learn something about Intel processes, we need
to look at the top-end CPUs for a given TDP. Looking at CPUs like the
E5-1428L v3 and E5-2428L v3 is misleading. And even that should be
taken with a grain of salt, because for some TDPs, Intel may not offer
the maximum that's technically possible.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: The Tera MTA

<8b75c15f-e926-4391-a6f9-05ca1ef62709n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19047&group=comp.arch#19047

X-Received: by 2002:a37:9244:: with SMTP id u65mr365848qkd.46.1626968980721;
Thu, 22 Jul 2021 08:49:40 -0700 (PDT)
X-Received: by 2002:a05:6820:61d:: with SMTP id e29mr83534oow.69.1626968977283;
Thu, 22 Jul 2021 08:49:37 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!4.us.feeder.erje.net!2.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 22 Jul 2021 08:49:37 -0700 (PDT)
In-Reply-To: <2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
<2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8b75c15f-e926-4391-a6f9-05ca1ef62709n@googlegroups.com>
Subject: Re: The Tera MTA
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Thu, 22 Jul 2021 15:49:40 +0000
Content-Type: text/plain; charset="UTF-8"

by: Paul A. Clayton - Thu, 22 Jul 2021 15:49 UTC

On Wednesday, July 21, 2021 at 2:27:02 AM UTC-4, Quadibloc wrote:
> On Tuesday, July 20, 2021 at 12:57:07 PM UTC-6, Paul A. Clayton wrote:
[snip]
>> Even for SMT, this is not necessarily entirely true. POWER7's SMT4
>> seems to reasonably count as 4-way SMT but requires no
>> additional register storage than SMT2. It uses the replication
>> of the register file for extra read ports to provide twice as
>> many threads with the limit that each pair of threads is limited
>> by the number of read ports in a cluster.
> So if you use the 4-way SMT, you do without the extra read
> ports...

(Extra read ports *per thread*. In theory, since the demand on
write ports would be split in SMT4 mode between clusters,
available port count might be increased. If half of the write
ports could be made into read/write ports at reasonable
cost, a pair of 6 read, 4 write, and 2 read/write register arrays
might support four 2-read, 1-write instructions without
forwarding per cluster. With forwarding and 1-read operations
being fairly common, even high execution width could be
supported.)
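
A quick sanity check of that port arithmetic, as a sketch only: the function
name and the simple counting model (no forwarding, every operand read from and
every result written to the register array, each read/write port usable for
one read or one write per cycle) are assumptions for illustration, not a
description of POWER7.

```python
def ports_suffice(read_ports, write_ports, rw_ports, instrs,
                  reads_per_instr=2, writes_per_instr=1):
    """Return True if one register array can serve `instrs` instructions per cycle.

    Dedicated ports are used first; whatever demand is left over must fit
    into the shared read/write ports.
    """
    need_reads = instrs * reads_per_instr
    need_writes = instrs * writes_per_instr
    extra_reads = max(0, need_reads - read_ports)
    extra_writes = max(0, need_writes - write_ports)
    return extra_reads + extra_writes <= rw_ports


if __name__ == "__main__":
    # One cluster: 6 read, 4 write, 2 read/write ports, four 2-read/1-write ops.
    print(ports_suffice(6, 4, 2, instrs=4))
```

With four such instructions the demand is 8 reads and 4 writes, which the
6 + 2 read-capable ports and 4 write ports just cover; a fifth instruction
would not fit without forwarding.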

(Another interesting technique was using three storage arrays
composed of two banks and an array with the XOR of the
entries from the other banks. This allows the alternate bank
and XOR copy to be read when there would otherwise be a
bank conflict. I would have to search for the paper.)
> Still, you do need four copies of the register file, so this is only
> an exception because of a technicality: they found some use
> for the extra copies when SMT was not in effect.

Yes, it is a footnote, but (I think) a significant one. Maybe not
quite as significant as "accessing a storage array N times per
cycle involves providing N access ports* (*banking can be used,
especially with large storage capacity that naturally uses
multiple arrays [with the issue of conflicts], replication can be used
to multiply read ports, phased access can be used for longer
clock periods, etc.)"

I want to write a longer post about some issues with MTA,
mentioning memory interface issues (e.g., width given minimum
burst length), niche market vs. commodity effects, the
ability in HPC to use lower-thread-count SMT/FGMT sharing
one or more compute threads with one or more "latency"
threads, and caching (I think the first MTA only had instruction
caching, which makes sense for extremely cache-unfriendly
code but introduces a bandwidth as well as a latency issue; 64-bit
cache blocks could be used at the cost of relatively high tag
overhead). I am something of an MT fanboi, but my
recent productivity has been very low.

Re: The Tera MTA

<sdc5n4$5qq$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19048&group=comp.arch#19048

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: sfu...@alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Subject: Re: The Tera MTA
Date: Thu, 22 Jul 2021 09:17:38 -0700
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <sdc5n4$5qq$1@dont-email.me>
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com>
<scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com>
<b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
<2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com>
<8b75c15f-e926-4391-a6f9-05ca1ef62709n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 22 Jul 2021 16:17:40 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="54b63c1eb961672ede2d35d4df609726";
logging-data="5978"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+ccqRplvuyWPwLlU7vUPT01W8kFL94chg="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.12.0
Cancel-Lock: sha1:1I5g4UtzXR/50JE8KqVnEOr/Xik=
In-Reply-To: <8b75c15f-e926-4391-a6f9-05ca1ef62709n@googlegroups.com>
Content-Language: en-US

by: Stephen Fuld - Thu, 22 Jul 2021 16:17 UTC

On 7/22/2021 8:49 AM, Paul A. Clayton wrote:
> On Wednesday, July 21, 2021 at 2:27:02 AM UTC-4, Quadibloc wrote:
>> On Tuesday, July 20, 2021 at 12:57:07 PM UTC-6, Paul A. Clayton wrote:
> [snip]
>>> Even for SMT, this is not necessarily entirely true. POWER7's SMT4
>>> seems to reasonably count as 4-way SMT but requires no
>>> additional register storage than SMT2. It uses the replication
>>> of the register file for extra read ports to provide twice as
>>> many threads with the limit that each pair of threads is limited
>>> by the number of read ports in a cluster.
>> So if you use the 4-way SMT, you do without the extra read
>> ports...
>
> (Extra read ports *per thread*. In theory, since the demand on
> write ports would be split in SMT4 mode between clusters,
> available port count might be increased. If half of the write
> ports could be made into read/write ports at reasonable
> cost, a pair of 6 read, 4 write, and 2 read/write register arrays
> might support four 2-read, 1-write instructions without
> forwarding per cluster. With forwarding and 1-read operations
> being fairly common, even high execution width could be
> supported.)
>
> (Another interesting technique was using three storage arrays
> composed of two banks and an array with the XOR of the
> entries from the other banks. This allows the alternate bank
> and XOR copy to be read when there would otherwise be a
> bank conflict. I would have to search for the paper.)

RAIR (Redundant Array of Independent Registers) :-)

I don't understand. While that could increase read throughput, all
writes would have to be to all three banks. If you are going to require
that, why do the XOR as opposed to just having three banks with
identical values? I must be missing something.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Re: The Tera MTA

<d815ba44-a659-402a-b616-d06b98a6e8c0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19050&group=comp.arch#19050

X-Received: by 2002:a37:d86:: with SMTP id 128mr744167qkn.299.1626975499839;
Thu, 22 Jul 2021 10:38:19 -0700 (PDT)
X-Received: by 2002:a54:4608:: with SMTP id p8mr6570773oip.110.1626975499603;
Thu, 22 Jul 2021 10:38:19 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 22 Jul 2021 10:38:19 -0700 (PDT)
In-Reply-To: <sdc5n4$5qq$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=64.26.99.248; posting-account=6JNn0QoAAAD-Scrkl0ClrfutZTkrOS9S
NNTP-Posting-Host: 64.26.99.248
References: <d8e86c4a-44a4-4db2-b92b-ddd9c966b9fdn@googlegroups.com>
<127e7a94-d498-4467-8834-c6b639515c3bn@googlegroups.com> <scv1hp$gh9$1@dont-email.me>
<1b43802a-3438-4aa8-bdc4-0b86f976eda9n@googlegroups.com> <b669dfad-aafc-4933-9b2c-07900162d9fbn@googlegroups.com>
<2d30e258-5cfc-47e6-b5e1-262e5c5f2701n@googlegroups.com> <8b75c15f-e926-4391-a6f9-05ca1ef62709n@googlegroups.com>
<sdc5n4$5qq$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d815ba44-a659-402a-b616-d06b98a6e8c0n@googlegroups.com>
Subject: Re: The Tera MTA
From: paaroncl...@gmail.com (Paul A. Clayton)
Injection-Date: Thu, 22 Jul 2021 17:38:19 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: Paul A. Clayton - Thu, 22 Jul 2021 17:38 UTC

On Thursday, July 22, 2021 at 12:17:42 PM UTC-4, Stephen Fuld wrote:
> On 7/22/2021 8:49 AM, Paul A. Clayton wrote:
[snip]
>> (Another interesting technique was using three storage arrays
>> composed of two banks and an array with the XOR of the
>> entries from the other banks. This allows the alternate bank
>> and XOR copy to be read when there would otherwise be a
>> bank conflict. I would have to search for the paper.)
> RAIR (Redundant Array of Independent Registers) :-)
>
> I don't understand. While that could increase read throughput, all
> writes would have to be to all three banks. If you are going to require
> that, why do the XOR as opposed to just having three banks with
> identical values? I must be missing something.

I think part of the idea (I do not remember) was that writes can be
delayed (with forwarding/caching) and bank conflicts tend to
average out over time. A single write has to go to two of the three
arrays. For reads (with storage arrays A, B, and X), two reads that
would both access A (or both B) can instead be served by reading one
value from A (or B) directly and reconstructing the other from B (or A)
and X.
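
A toy model of the read path, as I understand the idea (not taken from the
paper): the class name, the low-half/high-half register mapping, and the
one-access-per-array-per-cycle limit are all assumptions for illustration.
The write path is simplified; it recomputes the X entry from both banks
instead of modelling delayed or buffered writes.

```python
class XorBankedRegs:
    """Banks A and B each hold half the registers; X[i] holds A[i] ^ B[i]."""

    def __init__(self, num_regs):
        assert num_regs % 2 == 0
        self.rows = num_regs // 2
        self.banks = {"a": [0] * self.rows, "b": [0] * self.rows}
        self.x = [0] * self.rows

    def _locate(self, reg):
        # Assumed mapping: low half of the register numbers in A, high half in B.
        return ("a", reg) if reg < self.rows else ("b", reg - self.rows)

    def write(self, reg, value):
        bank, row = self._locate(reg)
        self.banks[bank][row] = value
        # Simplification: real hardware would not read both banks here.
        self.x[row] = self.banks["a"][row] ^ self.banks["b"][row]

    def read_pair(self, r1, r2):
        """Two same-cycle reads, touching each array at most once.

        Returns (value1, value2, set of arrays accessed).
        """
        (bank1, row1), (bank2, row2) = self._locate(r1), self._locate(r2)
        v1 = self.banks[bank1][row1]
        if bank1 != bank2:
            return v1, self.banks[bank2][row2], {bank1, bank2}
        # Bank conflict: rebuild the second value from the other bank and X.
        other = "b" if bank1 == "a" else "a"
        v2 = self.banks[other][row2] ^ self.x[row2]
        return v1, v2, {"a", "b", "x"}


if __name__ == "__main__":
    regs = XorBankedRegs(8)
    for r in range(8):
        regs.write(r, r * 11 + 3)
    print(regs.read_pair(0, 1))  # both in A: second value rebuilt from B and X
    print(regs.read_pair(0, 5))  # different banks: two direct reads
```

The conflict case works because B[i] ^ X[i] = B[i] ^ (A[i] ^ B[i]) = A[i],
and symmetrically for B.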

The duplication method of adding read ports results in N/2 read ports
and M write ports in each of two arrays of R entries. The XOR method results
in N/2 read ports and M/2 write ports in each of three arrays of R/2 entries.
(If the XOR array does not have twice as many write ports, it would have
to be banked, I guess.)

I suspect with high access count, write buffering, and dynamic scheduling
(with read buffering — operand capture and read scheduling separate from
operation scheduling) ordinary banking would work well enough. I vaguely
recall that the paper was not satisfying, but I do not remember why.

The road to hell is paved with NAND gates. -- J. Gooding
