
Subject (Author)
* RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)
+- Re: RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)
`* Re: RISC-V U74 on Starfive Visionfive V1 (Robert Swindells)
 `* Re: RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)
  +* Re: RISC-V U74 on Starfive Visionfive V1 (Robert Swindells)
  |`* Re: RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)
  | `* Re: RISC-V U74 on Starfive Visionfive V1 (BGB)
  |  `* Re: RISC-V U74 on Starfive Visionfive V1 (MitchAlsup)
  |   `- Re: RISC-V U74 on Starfive Visionfive V1 (BGB)
  +* Re: RISC-V U74 on Starfive Visionfive V1 (Anssi Saari)
  |`- Re: RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)
  `* Re: RISC-V U74 on Starfive Visionfive V1 (antispam)
   `- Re: RISC-V U74 on Starfive Visionfive V1 (Anton Ertl)

RISC-V U74 on Starfive Visionfive V1

<2022Feb26.124939@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23783&group=comp.arch#23783
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 26 Feb 2022 11:49:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 88
Message-ID: <2022Feb26.124939@mips.complang.tuwien.ac.at>
Injection-Info: reader02.eternal-september.org; posting-host="658abee49b599875bd0b6bcb2e014f0d";
logging-data="23453"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19+QKaINNKMgeVcDES53IGO"
Cancel-Lock: sha1:+Eh8qpyU+fMooYFkrJ4cSeIe60E=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 26 Feb 2022 11:49 UTC

We received our Starfive Visionfive V1 SBC, which has a JH7100 SoC
with two U74 cores (dual-issue in-order). We installed the Fedora 33
image from Starfive on it.

First impressions:

+ It works.

+ The software is pretty mature and the installation complete.

+ In particular, perf works (well, at least for cycles and
instructions), which is more than for many of the cores on Aarch64
SBCs.

+ 8GB RAM

- But of course, this makes it easy to see that the cores run at
~1GHz, not the announced 1.5GHz. Maybe there's a way to increase
the clock, but given that we did not buy this SBC to get a fast CPU,
it's a minor blemish.

- It's sluggish when used as a desktop computer (i.e. using its own
video output, a keyboard and a mouse). It seems that it is pretty
slow at updating the frame buffer. Fortunately, we use it in that
capacity only until we set up ssh and then use it as a headless server
through ssh, where it's relatively snappy.

- I/O to the SD-card is also pretty slow, even though we use a fast MicroSD
card (Sandisk Extreme Pro 256GB).

Performance:

Gforth-fast small benchmarks (numbers are seconds):
sieve bubble matrix   fib   fft  release; CPU; gcc
0.388  0.424  0.252 0.504 0.276 20190124; RockPro64 (1416MHz Cortex-A53)
0.597  0.796  0.633 0.798 0.622 20220217; 1GHz U74 (JH7100, Visionfive V1)

(The Cortex-A53 uses 3 registers instead of 1 for stack caching, which
buys a factor of 1.1-1.3 on these benchmarks.)
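
To put the two rows on a more equal footing, the times can be converted to clock cycles using the clock rates quoted above. This is only a rough sketch: it takes the "~1GHz" estimate for the U74 as exactly 1000 MHz, which is an assumption, and it ignores the difference in stack-caching registers.

```python
# Times in seconds, copied from the table above.
BENCHMARKS = ["sieve", "bubble", "matrix", "fib", "fft"]
A53_TIMES = [0.388, 0.424, 0.252, 0.504, 0.276]  # RockPro64, 1416 MHz
U74_TIMES = [0.597, 0.796, 0.633, 0.798, 0.622]  # Visionfive V1, ~1000 MHz

A53_MHZ = 1416
U74_MHZ = 1000  # assumption: "~1GHz" taken as exactly 1000 MHz


def time_ratio(u74, a53):
    """How many times longer the U74 takes in wall-clock time."""
    return u74 / a53


def cycle_ratio(u74, a53):
    """The same comparison in clock cycles instead of seconds."""
    return (u74 * U74_MHZ) / (a53 * A53_MHZ)


for name, a53, u74 in zip(BENCHMARKS, A53_TIMES, U74_TIMES):
    print(f"{name:7s} time x{time_ratio(u74, a53):.2f}  "
          f"cycles x{cycle_ratio(u74, a53):.2f}")
```

Under these assumptions sieve comes out close to parity per cycle, while matrix stays well behind even after normalizing for the clock.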

Native-code size (in bytes) for the Gforth kernel with gforth-fast
(artifacts like unrolling factors should play little role here):

 93738 rv64gc  (1 register for stack caching)
 99988 ARM A64 (3 registers for stack caching)
105844 ARM A64 (1 register for stack caching)
108963 AMD64   (1 register for stack caching)

These numbers are determined with

gforth-fast --ss-states=2 --print-metrics -i kernl64l.fi -e bye
gforth-fast --ss-states=4 --print-metrics -i kernl64l.fi -e bye

LaTeX benchmark (numbers are seconds):

- RockPro64 (1416MHz Cortex-A53) Debian 9 (Stretch) 3.24
- Starfive Visionfive JH7100 (1GHz U74) Fedora 33 (TeX Live 2020) 5.492

But note that recent TeX Live installations take significantly more
instructions and cycles for this benchmark.

Writing/reading a 256MB file on an ext4 file system (with sync for the
write, and an empty buffer cache before the read):

17.3 MB/s time sh -c "dd if=/dev/zero of=bla2 bs=64k count=4096; sync"
20.0 MB/s time dd if=bla2 of=/dev/null bs=64k

Similar results with bs=1M count=1000

Same experiment with an NFS file system backed by BTRFS on HDDs:

21.9 MB/s time sh -c "dd if=/dev/zero of=bla2 bs=64k count=4096; sync"
32.8 MB/s time dd if=bla2 of=/dev/null bs=64k

(The advantage of the local file system is that it is better at
caching, though).

Same with a Ryzen 5800X box (on the same NFS server):

32.8 MB/s time sh -c "dd if=/dev/zero of=bla2 bs=64k count=4096; sync"
114.5 MB/s time dd if=bla2 of=/dev/null bs=64k

Read performance is very close to the Gb Ethernet limit.
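
As a cross-check of these figures, the sketch below works out the elapsed times implied by the throughputs and how close the 5800X read comes to the raw Gb Ethernet line rate. It assumes the MB/s values are decimal megabytes per second; the 125 MB/s ceiling is the raw line rate before any protocol overhead.

```python
BLOCK_SIZE = 64 * 1024                 # bs=64k
BLOCK_COUNT = 4096                     # count=4096
FILE_BYTES = BLOCK_SIZE * BLOCK_COUNT  # 268435456 bytes = 256 MiB

# Raw 1 Gb/s line rate in decimal MB/s, before protocol overhead.
GIGABIT_RAW_MB_S = 1_000_000_000 / 8 / 1_000_000


def seconds_for(rate_mb_s, nbytes=FILE_BYTES):
    """Elapsed time implied by a throughput given in decimal MB/s."""
    return nbytes / (rate_mb_s * 1_000_000)


def share_of_gigabit(rate_mb_s):
    """Fraction of the raw 1 Gb/s line rate that a throughput represents."""
    return rate_mb_s / GIGABIT_RAW_MB_S


print(f"SD card write: {seconds_for(17.3):.1f} s")
print(f"SD card read:  {seconds_for(20.0):.1f} s")
print(f"5800X NFS read: {share_of_gigabit(114.5):.0%} of raw Gb Ethernet")
```

Since Ethernet, IP and NFS framing take their share of the raw rate, a result a little above 90% of 125 MB/s leaves little headroom.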

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: RISC-V U74 on Starfive Visionfive V1

<2022Feb26.151816@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23784&group=comp.arch#23784
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 26 Feb 2022 14:18:16 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 49
Message-ID: <2022Feb26.151816@mips.complang.tuwien.ac.at>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
Injection-Info: reader02.eternal-september.org; posting-host="658abee49b599875bd0b6bcb2e014f0d";
logging-data="11234"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/k2FxxJs4pW6rruE5dZvA2"
Cancel-Lock: sha1:uVzJzkojTFGxI+UtiIGesYcjzoY=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 26 Feb 2022 14:18 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>Performance:

Some additional results: Unlike MIPS and Alpha, the RISC-V calling
convention has enough callee-saved registers to support 3 registers
for stack caching (or maybe gcc now uses caller-saved registers for
that, but I doubt that). Reading up on that, RISC-V has 12
callee-saved registers, while Alpha has 7 (including %15/fp).

>Gforth-fast small benchmarks (numbers are seconds):
> sieve bubble matrix   fib   fft  release; CPU; gcc
> 0.388  0.424  0.252 0.504 0.276 20190124; RockPro64 (1416MHz Cortex-A53)
  0.596  0.739  0.779 0.845 0.640 20220217; 1GHz U74 --ss-states=1
> 0.597  0.796  0.633 0.798 0.622 20220217; 1GHz U74 --ss-states=2
  0.538  0.760  0.602 0.782 0.630 20220217; 1GHz U74 --ss-states=3
  0.522  0.550  0.517 0.803 0.790 20220217; 1GHz U74 --ss-states=4

The slow fft result for --ss-states=4 is strange; I have seen one
0.630s result when running that, but the frequent result is ~0.8s.
Overall the results are relatively volatile.

>Native-code size (in bytes) for the Gforth kernel with gforth-fast
>(artifacts like unrolling factors should play little role here):

  89742 rv64gc  (3 registers for stack caching)
> 93738 rv64gc  (1 register for stack caching)
> 99988 ARM A64 (3 registers for stack caching)
>105844 ARM A64 (1 register for stack caching)
>108963 AMD64   (1 register for stack caching)
>
>These numbers are determined with
>
>gforth-fast --ss-states=2 --print-metrics -i kernl64l.fi -e bye
>gforth-fast --ss-states=4 --print-metrics -i kernl64l.fi -e bye
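
For a quick comparison of the code sizes listed above, here is a small sketch that expresses each one relative to the smallest configuration. The dictionary keys are just labels chosen for this example.

```python
# Native-code sizes in bytes for the Gforth kernel, from the list above.
SIZES = {
    "rv64gc, 3 regs": 89742,
    "rv64gc, 1 reg": 93738,
    "ARM A64, 3 regs": 99988,
    "ARM A64, 1 reg": 105844,
    "AMD64, 1 reg": 108963,
}


def relative_size(name, baseline="rv64gc, 3 regs"):
    """Size of `name` as a multiple of the baseline configuration."""
    return SIZES[name] / SIZES[baseline]


for name, size in SIZES.items():
    print(f"{name:16s} {size:7d} bytes  x{relative_size(name):.3f}")
```

With equal stack-caching settings (3 registers), the ARM A64 code is about 11% larger than the rv64gc code.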

One interesting detail:

0x0000000000018996 <gforth_engine+15172>: ld s1,0(s1)
0x0000000000018998 <gforth_engine+15174>: addi s10,s10,8
....
0x00000000000189a0 <gforth_engine+15182>: ld s3,0(s3)
0x00000000000189a4 <gforth_engine+15186>: addi s10,s10,8

So the first load is encoded in 2 bytes, the second in 4.
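
A likely explanation is the register restriction of the compressed (RVC) encodings: c.ld has only 3-bit register fields, which can name x8-x15 (s0, s1, a0-a5), and s3 is x19. The sketch below models just that rule plus the sp-relative c.ldsp form. The helper names are made up for this example, and it is not a full assembler model.

```python
# ABI register names mapped to x-register numbers (subset of the RISC-V ABI).
ABI_TO_X = {"sp": 2, "s0": 8, "s1": 9}
ABI_TO_X.update({f"a{i}": 10 + i for i in range(8)})      # a0-a7 = x10-x17
ABI_TO_X.update({f"s{i}": 16 + i for i in range(2, 12)})  # s2-s11 = x18-x27


def in_compressed_set(reg):
    """True if reg is one of x8-x15, the registers a 3-bit RVC field can name."""
    return 8 <= ABI_TO_X[reg] <= 15


def ld_size(rd, offset, base):
    """Encoded size in bytes of `ld rd, offset(base)` on RV64GC."""
    if offset % 8 == 0:
        # c.ld: both registers in x8-x15, offset 0..248
        if in_compressed_set(rd) and in_compressed_set(base) and 0 <= offset <= 248:
            return 2
        # c.ldsp: sp-relative, offset 0..504 (x0 as destination is not modelled)
        if base == "sp" and 0 <= offset <= 504:
            return 2
    return 4


print(ld_size("s1", 0, "s1"))  # the load at gforth_engine+15172
print(ld_size("s3", 0, "s3"))  # the load at gforth_engine+15182
```

If this is indeed the cause, which callee-saved registers the compiler picks for hot interpreter state has a direct effect on code size.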

- anton

Re: RISC-V U74 on Starfive Visionfive V1

<svdk5r$bu6$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23786&group=comp.arch#23786
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: rjs...@fdy2.co.uk (Robert Swindells)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 26 Feb 2022 16:23:23 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <svdk5r$bu6$1@dont-email.me>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 26 Feb 2022 16:23:23 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="de9e09f79848be6655730a391ce88457";
logging-data="12230"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19Lk1JPyvWPSmHCJBKPqFVX7ivUpaXY8X4="
User-Agent: Pan/0.149 (Bellevue; 4c157ba git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:EUuDJ2ScEWM0L4aZsyRS/+1YC3I=
 by: Robert Swindells - Sat, 26 Feb 2022 16:23 UTC

On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:

> We received our Starfive Visionfive V1 SBC, which has a JH7100 SoC with
> two U74 cores (dual-issue in-order). We installed the Fedora 33 image
> from Starfive on it.
>
> First impressions:
>
> + It works.
>
> + The software is pretty mature and the installation complete.
>
> + In particular, perf works (well, at least for cycles and
> instructions), which is more than for many of the cores on Aarch64
> SBCs.
>
> + 8GB RAM

Cheap Aarch64 boards are available with 8GB now as well, not just the
RPi 4.

> - It's sluggish when used as a desktop computer (i.e. using its own
> video output, a keyboard and a mouse). It seems that it is pretty
> slow at updating the frame buffer. Fortunately, we use it in that
> capacity only until we set up ssh and then use it as a headless server
> through ssh, where it's relatively snappy.

Doesn't look like it has a GPU. There are suggestions that later versions
of the SoC will get a PowerVR one; using something with open-source
support would have seemed better to me, but I don't know what they could
pick.

Re: RISC-V U74 on Starfive Visionfive V1

<2022Feb26.225344@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23794&group=comp.arch#23794
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 26 Feb 2022 21:53:44 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 21
Message-ID: <2022Feb26.225344@mips.complang.tuwien.ac.at>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="658abee49b599875bd0b6bcb2e014f0d";
logging-data="8450"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX199bWIJUgrAILHaM4yZcfDk"
Cancel-Lock: sha1:97mbU72fP1czi8IizVH+TRcl4nA=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 26 Feb 2022 21:53 UTC

Robert Swindells <rjs@fdy2.co.uk> writes:
>On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:
>> - It's sluggish when used as a desktop computer (i.e. using its own
>> video output, a keyboard and a mouse). It seems that it is pretty
>> slow at updating the frame buffer. Fortunately, we use it in that
>> capacity only until we set up ssh and then use it as a headless server
>> through ssh, where it's relatively snappy.
>
>Doesn't look like it has a GPU

It certainly does not feel like it. But even a pure video card like
the Cirrus 5428 I had in my 486/66 did not produce this sluggish
behaviour (or maybe I am spoilt by current snappy computers?-); ok, it
only supported 8-bit colour depth IIRC and 1024x768 resolution, but
given the advances in RAM bandwidth and CPU speed between the 486/66
and the 1000MHz U74, I would expect better performance.

- anton

Re: RISC-V U74 on Starfive Visionfive V1

<svfs7a$fdr$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23800&group=comp.arch#23800
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: rjs...@fdy2.co.uk (Robert Swindells)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sun, 27 Feb 2022 12:52:58 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <svfs7a$fdr$2@dont-email.me>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
<svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 27 Feb 2022 12:52:58 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="bae907c012d9c0d9ea8934f4179230a3";
logging-data="15803"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+jVHhPVPqZoVYoYOeiibG5FKrIZ2TYNGk="
User-Agent: Pan/0.149 (Bellevue; 4c157ba git@gitlab.gnome.org:GNOME/pan.git)
Cancel-Lock: sha1:dh8UqUXimf5yE++N8h9FZK8635U=
 by: Robert Swindells - Sun, 27 Feb 2022 12:52 UTC

On Sat, 26 Feb 2022 21:53:44 GMT, Anton Ertl wrote:

> Robert Swindells <rjs@fdy2.co.uk> writes:
>>On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:
>>> - It's sluggish when used as a desktop computer (i.e. using its own
>>> video output, a keyboard and a mouse). It seems that it is pretty
>>> slow at updating the frame buffer. Fortunately, we use it in that
>>> capacity only until we set up ssh and then use it as a headless server
>>> through ssh, where it's relatively snappy.
>>
>>Doesn't look like it has a GPU
>
> It certainly does not feel like it. But even a pure video card like the
> Cirrus 5428 I had in my 486/66 did not produce this sluggish behaviour
> (or maybe I am spoilt by current snappy computers?-); ok, it only
> supported 8-bit colour depth IIRC and 1024x768 resolution, but given the
> advances in RAM bandwidth and CPU speed between the 486/66 and the
> 1000MHz U74, I would expect better performance.

I think that two things have happened:

GPUs have stopped providing 2D graphics features.

Software, in particular X11, has stopped trying to use 2D GPU features.

Your Cirrus card would have been able to fill rectangles, copy areas
around on the screen to scroll windows, copy with colour expansion to
do text and draw lines all in hardware. Application software wouldn't have
had many layers above this hardware.
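
For illustration, the 2D operations listed above can be modelled in software on an 8-bit frame buffer. This is a hypothetical sketch: the `Framebuffer` class and its method names are invented here and have nothing to do with the actual Cirrus register interface.

```python
class Framebuffer:
    """8 bits per pixel, row-major, like a simple linear frame buffer."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.pixels = bytearray(width * height)

    def fill_rect(self, x, y, w, h, colour):
        """Solid fill: write one colour into a w x h rectangle."""
        for row in range(y, y + h):
            start = row * self.width + x
            self.pixels[start:start + w] = bytes([colour]) * w

    def copy_area(self, sx, sy, dx, dy, w, h):
        """Screen-to-screen blit, safe for overlapping source and destination."""
        # When moving down, copy bottom rows first so that source rows
        # are read before they get overwritten.
        rows = range(h - 1, -1, -1) if dy > sy else range(h)
        for r in rows:
            src = (sy + r) * self.width + sx
            dst = (dy + r) * self.width + dx
            # The slice on the right is a copy, so same-row overlap is fine.
            self.pixels[dst:dst + w] = self.pixels[src:src + w]

    def expand_glyph(self, x, y, rows, fg, bg):
        """Colour expansion: each bit of a 1bpp glyph row becomes fg or bg."""
        for r, bits in enumerate(rows):
            for c in range(8):
                on = bits & (0x80 >> c)
                self.pixels[(y + r) * self.width + x + c] = fg if on else bg


fb = Framebuffer(16, 8)
fb.fill_rect(0, 0, 16, 2, 7)      # paint the top two rows
fb.copy_area(0, 0, 0, 1, 16, 7)   # scroll everything down by one row
fb.expand_glyph(0, 4, [0b10000001], fg=15, bg=0)
```

An accelerated card would run loops like these in its own blitter, so the CPU only had to write a handful of parameters per operation.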

Now, your RISC-V system is probably trying to do all these things through
a software emulation of OpenGL.

Application software libraries now do a lot more in the client and often
just send bitmaps to the server to display.

I guess if you have enough conventional CPU cores then it doesn't really
matter if you dedicate a few of them to doing graphics.

I thought that Larrabee seemed a good match for how X11 currently works
but that wasn't going to be enough to make it profitable.

Some Aarch64 SoC vendors have put 2D acceleration into their chips as well
as the ARM Mali 3D GPU, but there isn't much software support for them.

The RISC-V market needs a better story on how they are going to handle
graphics, trying to use PowerVR for the GPU when people will be mostly
using Linux seems wrong to me.

Re: RISC-V U74 on Starfive Visionfive V1

<2022Feb27.153612@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=23802&group=comp.arch#23802
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sun, 27 Feb 2022 14:36:12 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 92
Message-ID: <2022Feb27.153612@mips.complang.tuwien.ac.at>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at> <svfs7a$fdr$2@dont-email.me>
Injection-Info: reader02.eternal-september.org; posting-host="1d6e0118d030dd8cbcffcfe004a01bb4";
logging-data="23523"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YExS0XTX5rNNdQo2tameQ"
Cancel-Lock: sha1:WDyy6oYaMGs8lALIiLOCoqI1yA0=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sun, 27 Feb 2022 14:36 UTC

Robert Swindells <rjs@fdy2.co.uk> writes:
>On Sat, 26 Feb 2022 21:53:44 GMT, Anton Ertl wrote:
>
>> Robert Swindells <rjs@fdy2.co.uk> writes:
>>>On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:
>>>> - It's sluggish when used as a desktop computer (i.e. using its own
>>>> video output, a keyboard and a mouse). It seems that it is pretty
>>>> slow at updating the frame buffer. Fortunately, we use it in that
>>>> capacity only until we set up ssh and then use it as a headless server
>>>> through ssh, where it's relatively snappy.
>>>
>>>Doesn't look like it has a GPU
>>
>> It certainly does not feel like it. But even a pure video card like the
>> Cirrus 5428 I had in my 486/66 did not produce this sluggish behaviour
>> (or maybe I am spoilt by current snappy computers?-); ok, it only
>> supported 8-bit colour depth IIRC and 1024x768 resolution, but given the
>> advances in RAM bandwidth and CPU speed between the 486/66 and the
>> 1000MHz U74, I would expect better performance.
>
>I think that two things have happened:
>
> GPUs have stopped providing 2D graphics features.
>
> Software, in particular X11, has stopped trying to use 2D GPU features.
>
>Your Cirrus card would have been able to fill rectangles, copy areas
>around on the screen to scroll windows, copy with colour expansion to
>do text and draw lines all in hardware.

Reading up on it, yes it did have hardware BitBLT indeed. And it
seems that XFree86 used these acceleration features. But still, it
had a 32-bit memory interface with whatever slow memory was current in
1993; the U74 should be able to run rings around it when blitting
(even though its DRAM controller could do better).

>Now, your RISC-V system is probably trying to do all these things through
>a software emulation of OpenGL.

That may be the reason for the sluggishness.

>Application software libraries now do a lot more in the client and often
>just send bitmaps to the server to display.

That should be easy for the server. But of course if it first has to
do some OpenGL scaling etc., that could be slow on a pure software
implementation of OpenGL.

>The RISC-V market needs a better story on how they are going to handle
>graphics,

Only if it wants to make inroads in the desktop and mobile markets.
For a server, there are other places that need to be addressed first
(memory controller, I/O). They also need to fix these for the
desktop.

>trying to use PowerVR for the GPU when people will be mostly
>using Linux seems wrong to me.

Yes.

The discussion about Blitting inspired me to check out write bandwidth:

Here we malloc() 1GB and use memset() to fill it with 1s:

[fedora-starfive:~/nfstmp/gforth-riscv:76808] time gforth-fast -e "1000000000 allocate throw 1000000000 1 fill bye"

real 0m6.362s
user 0m0.243s
sys 0m6.095s

Funny user/system balance. My explanation is that the system COWs
each page on the first write access, and then that page is in the
cache, and the memset runs fast (that's why we see such a low user
time) up to the next page fault.

To reduce the system overhead, now write the 1GB by allocating a 10MB
block and memset()ting it 100 times, with values 0-99:

[fedora-starfive:~/nfstmp/gforth-riscv:76809] time gforth-fast -e "10000000 allocate throw constant a : foo 100 0 do a 10000000 i fill loop ; foo bye"

real 0m4.055s
user 0m3.902s
sys 0m0.126s

So it seems that the U74 writes at about 250MB/s. That's not great,
but it probably exceeds the CL5428 by a good margin.
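
The arithmetic behind these numbers, as a small sketch. The 4 KiB page size is an assumption (the usual base page size; huge pages would change the picture), and the per-page figure simply divides the system time by the number of pages touched.

```python
PAGE_BYTES = 4096  # assumption: 4 KiB base pages, no huge pages


def mb_per_s(nbytes, seconds):
    """Throughput in decimal megabytes per second."""
    return nbytes / seconds / 1e6


# Second run: a 10 MB block filled 100 times.
second_run_bytes = 100 * 10_000_000
print(f"{mb_per_s(second_run_bytes, 4.055):.0f} MB/s by wall-clock time")
print(f"{mb_per_s(second_run_bytes, 3.902):.0f} MB/s by user time")

# First run: 1 GB touched once, 6.095 s spent in the kernel.
first_run_pages = 1_000_000_000 / PAGE_BYTES
per_page_us = 6.095 / first_run_pages * 1e6
print(f"{first_run_pages:.0f} pages, {per_page_us:.1f} us of system time per page")
```

So under these assumptions the first run spends on the order of 25 microseconds of kernel time for every freshly touched page.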

- anton

Re: RISC-V U74 on Starfive Visionfive V1

<sm0ee3om7xd.fsf@lakka.kapsi.fi>

https://www.novabbs.com/devel/article-flat.php?id=23803&group=comp.arch#23803
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: as...@sci.fi (Anssi Saari)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sun, 27 Feb 2022 21:22:06 +0200
Organization: An impatient and LOUD arachnid
Lines: 13
Message-ID: <sm0ee3om7xd.fsf@lakka.kapsi.fi>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
<svdk5r$bu6$1@dont-email.me>
<2022Feb26.225344@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="08bada79738ef684c881281032cbd5e6";
logging-data="30505"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18gLCrDfqCqcRCBwQXzpWSF"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:aVB7Rrtc05W4+1FcJJVA1lC7I7Y=
sha1:vf0Zb5JPCGbcqzuJe6zuugtRXw8=
 by: Anssi Saari - Sun, 27 Feb 2022 19:22 UTC

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> It certainly does not feel like it. But even a pure video card like
> the Cirrus 5428 I had in my 486/66 did not produce this sluggish
> behaviour (or maybe I am spoilt by current snappy computers?-); ok, it
> only supported 8-bit colour depth IIRC and 1024x768 resolution, but
> given the advances in RAM bandwidth and CPU speed between the 486/66
> and the 1000MHz U74, I would expect better performance.

From Wikipedia, looks like the Cirrus (Logic) 5428 was pretty fast for
its time and had decent 2D acceleration, even in Linux. So with a
lightweight GUI like twm or Windows Whatever of the time I think the 486
system probably was quite snappy.

Re: RISC-V U74 on Starfive Visionfive V1

<svgvp8$3tp$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=23808&group=comp.arch#23808
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sun, 27 Feb 2022 16:59:50 -0600
Organization: A noiseless patient Spider
Lines: 176
Message-ID: <svgvp8$3tp$1@dont-email.me>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
<svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at>
<svfs7a$fdr$2@dont-email.me> <2022Feb27.153612@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 27 Feb 2022 22:59:52 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f1156aedfc4c25f5791605d56cba624a";
logging-data="4025"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/XnCakx47ln0hx6w4RaZkq"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:BmA4vZoijBRqQL4TwmYeXXw42n8=
In-Reply-To: <2022Feb27.153612@mips.complang.tuwien.ac.at>
Content-Language: en-US
 by: BGB - Sun, 27 Feb 2022 22:59 UTC

On 2/27/2022 8:36 AM, Anton Ertl wrote:
> Robert Swindells <rjs@fdy2.co.uk> writes:
>> On Sat, 26 Feb 2022 21:53:44 GMT, Anton Ertl wrote:
>>
>>> Robert Swindells <rjs@fdy2.co.uk> writes:
>>>> On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:
>>>>> - It's sluggish when used as a desktop computer (i.e. using its own
>>>>> video output, a keyboard and a mouse). It seems that it is pretty
>>>>> slow at updating the frame buffer. Fortunately, we use it in that
>>>>> capacity only until we set up ssh and then use it as a headless server
>>>>> through ssh, where it's relatively snappy.
>>>>
>>>> Doesn't look like it has a GPU
>>>
>>> It certainly does not feel like it. But even a pure video card like the
>>> Cirrus 5428 I had in my 486/66 did not produce this sluggish behaviour
>>> (or maybe I am spoilt by current snappy computers?-); ok, it only
>>> supported 8-bit colour depth IIRC and 1024x768 resolution, but given the
>>> advances in RAM bandwidth and CPU speed between the 486/66 and the
>>> 1000MHz U74, I would expect better performance.
>>
>> I think that two things have happened:
>>
>> GPUs have stopped providing 2D graphics features.
>>
>> Software, in particular X11, has stopped trying to use 2D GPU features.
>>
>> Your Cirrus card would have been able to fill rectangles, copy areas
>> around on the screen to scroll windows, copy with colour expansion to
>> do text and draw lines all in hardware.
>
> Reading up on it, yes it did have hardware BitBLT indeed. And it
> seems that XFree86 used these acceleration features. But still, it
> had a 32-bit memory interface with whatever slow memory was current in
> 1993; the U74 should be able to run rings around it when blitting
> (even though its DRAM controller could do better).
>
>> Now, your RISC-V system is probably trying to do all these things through
>> a software emulation of OpenGL.
>
> That may be the reason for the sluggishness.
>

Also, the design of RISC-V seems like it would "kinda suck" for OpenGL
emulation performance (at least, limited to the parts of the ISA I am
currently aware of).

>> Application software libraries now do a lot more in the client and often
>> just send bitmaps to the server to display.
>
> That should be easy for the server. But of course if it first has to
> do some OpenGL scaling etc., that could be slow on a pure software
> implementation of OpenGL.
>

All the fancy "Compositing Window Manager" stuff is kinda pointless
fluff IMO.

There hasn't really been any practical advancement in these areas much
past Win2K and similar.

Nor is there much one can really do to move things forward, at least
given that full 3D VR UIs, or desktop backgrounds with spinning 3D
models, are mostly something that didn't make it much past 80s and 90s
era scifi movies (there not being much point IRL in having a spinning
Utah Teapot or similar as one's desktop background).

Scifi gives us needless spinning stuff, and people doing surfing motions
while wearing gloves and a VR helmet, and also presumably they find some
way to avoid the users invariably getting motion sickness as a result of
all this.

Reality gives us translucent border effects and lag...

Like, oh wow, the window border looks like frosted glass, and now the
whole OS is lagging trying to deliver this effect until the user can
find the option in Control Panel to turn it off.

OS Updates:
We're gonna switch your Power-Button setting back from Hibernate to
Sleep, turn GUI effects back on, ...
Also random "Your battery is Low, Plug in your Device soon."
notifications, me, "FFS Windows, I am running a Ye Olde Desktop PC.
There is no battery in this thing."

>> The RISC-V market needs a better story on how they are going to handle
>> graphics,
>
> Only if it wants to make inroads in the desktop and mobile markets.
> For a server, there are other places that need to be addressed first
> (memory controller, I/O). They also need to fix these for the
> desktop.
>
>> trying to use PowerVR for the GPU when people will be mostly
>> using Linux seems wrong to me.
>
> Yes.
>

Makes me half wonder what the "minimum viable GPU" could be.

I guess the main tasks it would have would be:
  Ability to copy pixels from one place in VRAM to another;
  Ability to copy pixels while applying modulation or blending;
  Ability to perform a texture fetch and ST stepping (3D / Advanced 2D);
  Ability to walk edges of a primitive triangle or quad (3D).

Ideally, it should be able to do these:
  Faster than doing them on the CPU;
  Cheaper than throwing a specialized CPU core at it
  (e.g. a CPU core with ISA tweaks for common GPU tasks).

Seems like one could do something like this with a few specialized
caches and registers:
  Framebuffer Cache(s), represent logical screen pixels, Read/Write;
  Texture Cache, holds texture blocks, Read-Only.

Would need a bunch of registers to represent the edge-walking state of
the primitive being drawn.
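
A rough software sketch of what that edge-walking state might amount to: per edge, a current x and a fixed per-scanline increment, with two edges stepped in parallel to produce a span. The function names are invented for illustration. A real rasterizer would more likely use fixed-point steppers and a precise fill rule, and would add texture and colour interpolants to the same state.

```python
def edge_steps(p0, p1):
    """Yield (y, x) along the edge p0 -> p1, one sample per scanline.

    The state is just the current x and a per-scanline increment, which
    is roughly what a hardware edge walker would keep in registers.
    """
    (x0, y0), (x1, y1) = p0, p1
    if y0 == y1:
        return  # horizontal edge: contributes no scanlines
    dxdy = (x1 - x0) / (y1 - y0)
    x = float(x0)
    for y in range(y0, y1):
        yield y, x
        x += dxdy


def triangle_spans(a, b, c):
    """Return {y: (x_left, x_right)} for a triangle with integer vertices."""
    top, mid, bot = sorted((a, b, c), key=lambda p: (p[1], p[0]))
    long_edge = dict(edge_steps(top, bot))
    short_edges = dict(edge_steps(top, mid))
    short_edges.update(edge_steps(mid, bot))
    spans = {}
    for y, xa in long_edge.items():
        xb = short_edges[y]
        spans[y] = (min(xa, xb), max(xa, xb))
    return spans


def fill_triangle(pixels, width, a, b, c, colour):
    """Fill the spans into a row-major 8bpp buffer (half-open in x)."""
    for y, (xl, xr) in triangle_spans(a, b, c).items():
        for x in range(round(xl), round(xr)):
            pixels[y * width + x] = colour


pixels = bytearray(8 * 8)
fill_triangle(pixels, 8, (0, 0), (8, 0), (0, 8), 1)
print(sum(pixels), "pixels filled")
```

The inner span fill is the part that would map onto the framebuffer cache, while the two edge steppers are the "bunch of registers".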

Maybe the GPU throws interrupts at whatever core is managing the
geometry transform?... It is that or use polling. Doing a GPU which runs
the full transform seems like a harder problem than one which runs a
rasterizer.

These would come with a big obvious drawback: They would preclude the
use of shaders.

The "able to run shaders" requirement would likely turn the "cheapest"
option into throwing one or more specialized CPU cores at it.

Maybe the GPU cores could run RISC-V, but RISC-V would still kinda suck
as the basis for a GPU ISA.

> The discussion about Blitting inspired me to check out write bandwidth:
>
> Here we malloc() 1GB and use memset() to fill it with 1s:
>
> [fedora-starfive:~/nfstmp/gforth-riscv:76808] time gforth-fast -e "1000000000 allocate throw 1000000000 1 fill bye"
>
> real 0m6.362s
> user 0m0.243s
> sys 0m6.095s
>
> Funny user/system balance. My explanation is that the system COWs
> each page on the first write access, and then that page is in the
> cache, and the memset runs fast (that's why we see such a low user
> time) up to the next page fault.
>
> To reduce the system overhead, now write the 1GB by allocating a 10MB
> block and memset()ting it 100 times, with values 0-99:
>
> [fedora-starfive:~/nfstmp/gforth-riscv:76809] time gforth-fast -e "10000000 allocate throw constant a : foo 100 0 do a 10000000 i fill loop ; foo bye"
>
> real 0m4.055s
> user 0m3.902s
> sys 0m0.126s
>
> So it seems that the U74 writes at about 250MB/s. That's not great,
> but it probably exceeds the CL5428 by a good margin.
>

Well, it is a lot faster at this than my BJX2 core running on a Nexys A7
at least...

> - anton

Re: RISC-V U74 on Starfive Visionfive V1

<87c1bfeb-984d-46aa-89d6-8b269dc82aa9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=23812&group=comp.arch#23812
X-Received: by 2002:a05:6214:f08:b0:433:6cf:9f7c with SMTP id gw8-20020a0562140f0800b0043306cf9f7cmr3999789qvb.71.1646006814823;
Sun, 27 Feb 2022 16:06:54 -0800 (PST)
X-Received: by 2002:a05:6870:c153:b0:d6:fef3:d1a5 with SMTP id
g19-20020a056870c15300b000d6fef3d1a5mr4850117oad.217.1646006814610; Sun, 27
Feb 2022 16:06:54 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Sun, 27 Feb 2022 16:06:54 -0800 (PST)
In-Reply-To: <svgvp8$3tp$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:91b0:4f78:53e9:ab3e;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:91b0:4f78:53e9:ab3e
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me>
<2022Feb26.225344@mips.complang.tuwien.ac.at> <svfs7a$fdr$2@dont-email.me>
<2022Feb27.153612@mips.complang.tuwien.ac.at> <svgvp8$3tp$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <87c1bfeb-984d-46aa-89d6-8b269dc82aa9n@googlegroups.com>
Subject: Re: RISC-V U74 on Starfive Visionfive V1
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Mon, 28 Feb 2022 00:06:54 +0000
Content-Type: text/plain; charset="UTF-8"
 by: MitchAlsup - Mon, 28 Feb 2022 00:06 UTC

On Sunday, February 27, 2022 at 4:59:55 PM UTC-6, BGB wrote:
> On 2/27/2022 8:36 AM, Anton Ertl wrote:
> >
> > Reading up on it, yes it did have hardware BitBLT indeed. And it
> > seems that XFree86 used these acceleration features. But still, it
> > had a 32-bit memory interface with whatever slow memory was current in
> > 1993; the U74 should be able to run rings around it when blitting
> > (even though its DRAM controller could do better).
> >
> >> Now, your RISC-V system is probably trying to do all these things through
> >> a software emulation of OpenGL.
> >
> > That may be the reason for the sluggishness.
> >
> Also, the design of RISC-V seems like it would "kinda suck" for OpenGL
> emulation performance (at least, limited to the parts of the ISA I am
> currently aware of).
<
Gomer Pyle would be proud........
<
> >> Application software libraries now do a lot more in the client and often
> >> just send bitmaps to the server to display.
> >
> > That should be easy for the server. But of course if it first has to
> > do some OpenGL scaling etc., that could be slow on a pure software
> > implementation of OpenGL.
> >
> All the fancy "Compositing Window Manager" stuff is kinda pointless
> fluff IMO.
>
> There hasn't really been any practical advancement in these areas much
> past Win2K and similar.
>
> Nor is there much one can really do to move things forward, at least
> given that full 3D VR UIs, or desktop backgrounds with spinning 3D
> models, are mostly something that didn't make it much past 80s and 90s
> era scifi movies (there not being much point IRL in having a spinning
> Utah Teapot or similar as one's desktop background).
>
> Scifi gives us needless spinning stuff, and people doing surfing motions
> while wearing gloves and a VR helmet, and also presumably they find some
> way to avoid the users invariably getting motion sickness as a result of
> all this.
>
I, personally, cannot stay on an internet page that uses moving advertisements.
I either turn off JavaScript or leave.
>
> Reality gives us translucent border effects and lag...
>
> Like, oh wow, the window border looks like frosted glass, and now the
> whole OS is lagging trying to deliver this effect until the user can
> find the option in Control Panel to turn it off.
>
>
> OS Updates:
> We're gonna switch your Power-Button setting back from Hibernate to
> Sleep, turn GUI effects back on, ...
> Also random "Your battery is Low, Plug in your Device soon."
> notifications, me, "FFS Windows, I am running a Ye Olde Desktop PC.
> There is no battery in this thing."
> >> The RISC-V market needs a better story on how they are going to handle
> >> graphics,
> >
> > Only if it wants to make inroads in the desktop and mobile markets.
> > For a server, there are other places that need to be addressed first
> > (memory controller, I/O). They also need to fix these for the
> > desktop.
> >
> >> trying to use PowerVR for the GPU when people will be mostly
> >> using Linux seems wrong to me.
> >
> > Yes.
> >
> Makes me half wonder what the "minimum viable GPU" could be.
>
> I guess the main task it would have would be:
> Ability to copy pixels from one place in VRAM to another;
> Ability to copy pixels while applying modulation or blending.
> Ability to perform a texture fetch and ST stepping (3D / Advanced 2D)
> Ability to walk edges of a primitive triangle or quad (3D)
<
Minimum viable GPU is the backend == Rendering. The above is a list
of things the backend can do. Blitting is simply 'texture' to triangles
normal to the viewing surface. Blending is simply sub-features of
'texture'.
>
> Ideally, it should be able to do these:
> Faster than doing them on the CPU;
> Cheaper than throwing a specialized CPU core at it.
> (Eg: CPU core with ISA tweaks for common GPU tasks).
<
It can do these faster than CPU only when the number of things requested
takes GPU way more time than the latency of getting started and stopped.
{Generally in the 1,000-10,000 triangles range.} GPUs need embarrassing
levels of parallelism to work well.
>
> Seems like one could do something like this with a few specialized
> caches and registers:
> Framebuffer Cache(s), represent logical screen pixels Read/Write;
> Texture Cache, holds texture blocks, Read-Only.
<
It might be time to integrate 'texture' into CPU memory reference pipeline
and integrate access into memory reference instructions.
>
> Would need a bunch of registers to represent the edge-walking state of
> the primitive being drawn.
<
Are you talking about the rasterizer and interpolator ?
>
> Maybe the GPU throws interrupts at whatever core is managing the
> geometry transform?... It is that or use polling. Doing a GPU which runs
> the full transform seems like a harder problem than one which runs a
> rasterizer.
>
> These would come with a big obvious drawback: They would preclude the
> use of shaders.
>
>
>
> The "able to run shaders" requirement would likely turn the "cheapest"
> option into throwing one or more specialized CPU cores at it.
>
> Maybe the GPU cores could run RISC-V, but RISC-V would still kinda suck
> as the basis for a GPU ISA.
> > The discussion about Blitting inspired me to check out write bandwidth:
> >
> > Here we malloc() 1GB and use memset() to fill it with 1s:
> >
> > [fedora-starfive:~/nfstmp/gforth-riscv:76808] time gforth-fast -e "1000000000 allocate throw 1000000000 1 fill bye"
> >
> > real 0m6.362s
> > user 0m0.243s
> > sys 0m6.095s
> >
> > Funny user/system balance. My explanation is that the system COWs
> > each page on the first write access, and then that page is in the
> > cache, and the memset runs fast (that's why we see such a low user
> > time) up to the next page fault.
> >
> > To reduce the system overhead, now write the 1GB by allocating a 10MB
> > block and memset()ting it 100 times, with values 0-99:
> >
> > [fedora-starfive:~/nfstmp/gforth-riscv:76809] time gforth-fast -e "10000000 allocate throw constant a : foo 100 0 do a 10000000 i fill loop ; foo bye"
> >
> > real 0m4.055s
> > user 0m3.902s
> > sys 0m0.126s
> >
> > So it seems that the U74 writes at about 250MB/s. That's not great,
> > but it probably exceeds the CL5428 by a good margin.
> >
> Well, it is a lot faster at this than my BJX2 core running on a Nexys A7
> at least...
>
>
> > - anton

Re: RISC-V U74 on Starfive Visionfive V1

<svhglj$ku4$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=23813&group=comp.arch#23813

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sun, 27 Feb 2022 21:48:02 -0600
Organization: A noiseless patient Spider
Lines: 316
Message-ID: <svhglj$ku4$1@dont-email.me>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at>
<svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at>
<svfs7a$fdr$2@dont-email.me> <2022Feb27.153612@mips.complang.tuwien.ac.at>
<svgvp8$3tp$1@dont-email.me>
<87c1bfeb-984d-46aa-89d6-8b269dc82aa9n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 28 Feb 2022 03:48:04 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="f827db421d493e319682b358160ec79c";
logging-data="21444"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+eLQDSTODsLFFFgjyfiWZJ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:4ttMyj2uMkFd4Tb6WzivB9lchhE=
In-Reply-To: <87c1bfeb-984d-46aa-89d6-8b269dc82aa9n@googlegroups.com>
Content-Language: en-US
 by: BGB - Mon, 28 Feb 2022 03:48 UTC

On 2/27/2022 6:06 PM, MitchAlsup wrote:
> On Sunday, February 27, 2022 at 4:59:55 PM UTC-6, BGB wrote:
>> On 2/27/2022 8:36 AM, Anton Ertl wrote:
>>>
>>> Reading up on it, yes it did have hardware BitBLT indeed. And it
>>> seems that XFree86 used these acceleration features. But still, it
>>> had a 32-bit memory interface with whatever slow memory was current in
>>> 1993; the U74 should be able to run rings around it when blitting
>>> (even though its DRAM controller could do better).
>>>
>>>> Now, your RISC-V system is probably trying to do all these things through
>>>> a software emulation of OpenGL.
>>>
>>> That may be the reason for the sluggishness.
>>>
>> Also, the design of RISC-V seems like it would "kinda suck" for OpenGL
>> emulation performance (at least, limited to the parts of the ISA I am
>> currently aware of).
> <
> Gomer Pyle would be proud........
> <

Not sure how to interpret this.

But, short of a bunch of specialized SIMD operations, and indexed
load/store, ..., it seems like a bit of a stretch that RISC-V would be
particularly suited to OpenGL emulation.

....

>>>> Application software libraries now do a lot more in the client and often
>>>> just send bitmaps to the server to display.
>>>
>>> That should be easy for the server. But of course if it first has to
>>> do some OpenGL scaling etc., that could be slow on a pure software
>>> implementation of OpenGL.
>>>
>> All the fancy "Compositing Window Manager" stuff is kinda pointless
>> fluff IMO.
>>
>> There hasn't really been any practical advancement in these areas much
>> past Win2K and similar.
>>
>> Nor is there much one can really do to move things forward, at least
>> given that full 3D VR UIs, or desktop backgrounds with spinning 3D
>> models, are mostly something that didn't make it much past 80s and 90s
>> era scifi movies (there not being much point IRL in having a spinning
>> Utah Teapot or similar as one's desktop background).
>>
>> Scifi gives us needless spinning stuff, and people doing surfing motions
>> while wearing gloves and a VR helmet, and also presumably they find some
>> way to avoid the users invariably getting motion sickness as a result of
>> all this.
>>
> I, personally, cannot stay on an internet page that uses moving advertisements.
> I either turn off JavaScript or leave.

Yeah. I also have the issue that I seem to be fairly weak against motion
sickness. I can't really play games for more than short bursts, and
watching videos of gameplay is also prone to cause issues.

For VR, the motion sickness is almost immediate. I can deal with 3D
glasses a little better though, but pretty much no one is developing
home-use "make 3D glasses not suck" technology.

>>
>> Reality gives us translucent border effects and lag...
>>
>> Like, oh wow, the window border looks like frosted glass, and now the
>> whole OS is lagging trying to deliver this effect until the user can
>> find the option in Control Panel to turn it off.
>>
>>
>> OS Updates:
>> We're gonna switch your Power-Button setting back from Hibernate to
>> Sleep, turn GUI effects back on, ...
>> Also random "Your battery is Low, Plug in your Device soon."
>> notifications, me, "FFS Windows, I am running a Ye Olde Desktop PC.
>> There is no battery in this thing."
>>>> The RISC-V market needs a better story on how they are going to handle
>>>> graphics,
>>>
>>> Only if it wants to make inroads in the desktop and mobile markets.
>>> For a server, there are other places that need to be addressed first
>>> (memory controller, I/O). They also need to fix these for the
>>> desktop.
>>>
>>>> trying to use PowerVR for the GPU when people will be mostly
>>>> using Linux seems wrong to me.
>>>
>>> Yes.
>>>
>> Makes me half wonder what the "minimum viable GPU" could be.
>>
>> I guess the main task it would have would be:
>> Ability to copy pixels from one place in VRAM to another;
>> Ability to copy pixels while applying modulation or blending.
>> Ability to perform a texture fetch and ST stepping (3D / Advanced 2D)
>> Ability to walk edges of a primitive triangle or quad (3D)
> <
> Minimum viable GPU is the backend == Rendering. The above is a list
> of things the backend can do. Blitting is simply 'texture' to triangles
> normal to the viewing surface. Blending is simply sub-features of
> 'texture'.

I was figuring it is possible it could have special cases for operations
which function like a memcpy or rectangular blit (common in GUI tasks).

>>
>> Ideally, it should be able to do these:
>> Faster than doing them on the CPU;
>> Cheaper than throwing a specialized CPU core at it.
>> (Eg: CPU core with ISA tweaks for common GPU tasks).
> <
> It can do these faster than CPU only when the number of things requested
> takes GPU way more time than the latency of getting started and stopped.
> {Generally in the 1,000-10,000 triangles range.} GPUs need embarrassing
> levels of parallelism to work well.

This likely depends on the type of GPU.
I was not imagining anything like a modern GPU.

>>
>> Seems like one could do something like this with a few specialized
>> caches and registers:
>> Framebuffer Cache(s), represent logical screen pixels Read/Write;
>> Texture Cache, holds texture blocks, Read-Only.
> <
> It might be time to integrate 'texture' into CPU memory reference pipeline
> and integrate access into memory reference instructions.

This is one possibility.

The Texture-Cache could expose something resembling a memory port,
except that it has the ability during fetch to be indexed using texture
coordinates (rather than a linear index) with a mask based on the size
of the texture, and the ability to decode block texture compression. It
would return the result as a color vector.

It would probably exclude bilinear interpolation or similar, since this
would likely be too expensive to be performed directly within a single
fetch (main issue being to deal with fetches which cross the edge of a
texture block).

But, say we have an op like:
LDUTEX (R14, R8, 0), R4 //Fetch first pixel (S+0,T+0)
LDUTEX (R14, R8, 1), R5 //Fetch second pixel (S+1,T+0)
LDUTEX (R14, R8, 2), R6 //Fetch third pixel (S+0,T+1)
LDUTEX (R14, R8, 3), R7 //Fetch fourth pixel (S+1,T+1)
(Pad Cycle) //Do something else here, *
(Pad Cycle) //Do something else here
BILERPX R4, R6, R8, R10 //Bilinear Interpolate (4R)
...
*: Could do math here, or fetch the value from the Z-Buffer, ...
Need to do something here so BILERP doesn't interlock on the loads.

With R14 giving the texture:
(47:0): Address of Texture
(59:48): Encodes the format and size of the texture
(59:56): Pixel Format
(55:52): Y Size (log2)
(51:48): X Size (log2)
R8 giving the ST Coords (2x 16.16 fixed-point)
The displacement encodes a pixel displacement:
0..3: Which corner of an interpolated pixel.
4..7: Round-Nearest(?)
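
A minimal sketch of how the descriptor fields and the "truncate and
mask" fetch described above could be decoded in software. The bit
positions and the corner numbering come from the post; the function
names, the placement of S in the low half and T in the high half of
R8, and the use of plain (x, y) texel coordinates are assumptions made
only for illustration.

```python
def decode_texture_descriptor(r14):
    """Split the 64-bit texture descriptor into the fields listed above."""
    return {
        "address": r14 & ((1 << 48) - 1),    # bits 47:0
        "x_log2": (r14 >> 48) & 0xF,         # bits 51:48
        "y_log2": (r14 >> 52) & 0xF,         # bits 55:52
        "pixel_format": (r14 >> 56) & 0xF,   # bits 59:56
    }


def texel_coords(r14, r8, corner):
    """Return the integer (x, y) texel selected for corner 0..3."""
    desc = decode_texture_descriptor(r14)
    s_fixed = r8 & 0xFFFFFFFF           # ASSUMPTION: S in the low 32 bits
    t_fixed = (r8 >> 32) & 0xFFFFFFFF   # ASSUMPTION: T in the high 32 bits
    s_int = s_fixed >> 16               # truncate the 16 fractional bits
    t_int = t_fixed >> 16
    dx = corner & 1                     # corners 1 and 3 step S by one
    dy = (corner >> 1) & 1              # corners 2 and 3 step T by one
    x = (s_int + dx) & ((1 << desc["x_log2"]) - 1)   # mask wraps at the edge
    y = (t_int + dy) & ((1 << desc["y_log2"]) - 1)
    return x, y
```

With a 64x32 texture and S = 63.5, corner 1 wraps around to x = 0,
which is the masking behaviour the fetch is described as having.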

These instructions would fall within the scope of what BJX2 can
currently encode in the Op64 and Op40x2 encodings.

Though, the cheaper option here would be to only support square
Morton-order textures (my existing TKRA GL is primarily using Morton
ordering, and non-square textures are uncommon).
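
For readers unfamiliar with Morton ordering, a small sketch of the
index computation for a square texture follows. Putting the X bits in
the even positions is a common convention but is an assumption here;
the post does not say which layout TKRA GL uses.

```python
def morton_index(x, y, size_log2):
    """Interleave the low size_log2 bits of x and y into one texel index.

    ASSUMPTION: x occupies the even bit positions and y the odd ones.
    """
    index = 0
    for bit in range(size_log2):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index
```

Texels that are neighbours in both X and Y end up close together in
memory, which is why this layout tends to suit block-oriented caches.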

The BILERPX would ignore the "integer part" of the ST coords
(with the texture fetches effectively doing a "truncate and mask" fetch).
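
As a rough software model of what such an interpolate step would
compute, the sketch below blends four fetched texels using only the
16 fractional bits of each coordinate. The function name, the channel
tuples, and the S-low/T-high layout of R8 are assumptions; the actual
BILERPX encoding and rounding are not specified in the post.

```python
def bilerp(c00, c10, c01, c11, r8):
    """Bilinear blend of four texels given as tuples of channel values.

    c00 = (S+0,T+0), c10 = (S+1,T+0), c01 = (S+0,T+1), c11 = (S+1,T+1).
    Only the fractional parts of the 16.16 coordinates are used.
    """
    frac_s = r8 & 0xFFFF            # ASSUMPTION: S in the low 32 bits of R8
    frac_t = (r8 >> 32) & 0xFFFF    # ASSUMPTION: T in the high 32 bits of R8
    one = 1 << 16
    result = []
    for a, b, c, d in zip(c00, c10, c01, c11):
        top = a * (one - frac_s) + b * frac_s        # scaled by 2**16
        bottom = c * (one - frac_s) + d * frac_s     # scaled by 2**16
        blended = top * (one - frac_t) + bottom * frac_t  # scaled by 2**32
        result.append(blended >> 32)                 # truncate, no rounding
    return tuple(result)
```

Because the integer bits are masked off, two coordinates that differ
only in their integer part give the same blend weights.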

Unclear here is the timing cost of sticking a block-texture decode on
the tail-end of a Load operation.

The Framebuffer Cache would basically be a normal L1 D-Cache.
Special features would be that it could include built-in pixel repacking.

Say, for example, during drawing, pixels are represented as a 64-bit
vector format (aaaa-rrrr-gggg-bbbb), Texture Fetch returns pixels in
this format, and Framebuffer Store repacks these into RGBA32 or RGB555
or RGBA4444 or similar.
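
A sketch of the repacking step, under the assumption that each of the
four channels is an unsigned 16-bit value with alpha in the top 16
bits (as the "aaaa-rrrr-gggg-bbbb" notation suggests) and that the
narrower formats simply keep the most significant bits. The output
layouts (0xAARRGGBB for the 32-bit case, 0rrrrrgggggbbbbb for RGB555)
and the function names are likewise assumptions.

```python
def unpack_vec64(pixel):
    """Split a 64-bit pixel vector into (a, r, g, b), 16 bits each."""
    a = (pixel >> 48) & 0xFFFF
    r = (pixel >> 32) & 0xFFFF
    g = (pixel >> 16) & 0xFFFF
    b = pixel & 0xFFFF
    return a, r, g, b


def pack_rgb555(pixel):
    """Keep the top 5 bits of r, g, b; alpha is dropped."""
    _, r, g, b = unpack_vec64(pixel)
    return ((r >> 11) << 10) | ((g >> 11) << 5) | (b >> 11)


def pack_rgba32(pixel):
    """Keep the top 8 bits of each channel (ASSUMED layout 0xAARRGGBB)."""
    a, r, g, b = unpack_vec64(pixel)
    return ((a >> 8) << 24) | ((r >> 8) << 16) | ((g >> 8) << 8) | (b >> 8)
```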

It is possible Framebuffer ops could also include raster-indexed
Load/Store, likely with a predefined table of Y strides, *, ...

*: 256/384/512/768/... This avoids the need for a multiplier, while
not wasting as much memory as padding the size to a power-of-2 (so,
320x200->384x200, 640x480->768x480, ...).
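
The reason such a table avoids a multiplier is that every entry is
either a power of two or three times a power of two, so y*stride
reduces to one or two shifts and at most one add. A sketch follows;
the table entries beyond 768 and the function names are assumptions,
and strides are counted in pixels here.

```python
# The post lists 256/384/512/768/...; the continuation is an assumption.
STRIDES = [256, 384, 512, 768, 1024, 1536, 2048]


def pick_stride(width):
    """Return the smallest table stride that is at least `width`."""
    for stride in STRIDES:
        if stride >= width:
            return stride
    raise ValueError("width exceeds the largest stride in the table")


def row_offset(y, stride):
    """Compute y * stride with shifts and at most one add."""
    shift = stride.bit_length() - 1      # position of the top set bit
    if stride == 1 << shift:
        return y << shift                # power of two: a single shift
    # Otherwise stride == 2**shift + 2**(shift - 1), e.g. 384 = 256 + 128.
    return (y << shift) + (y << (shift - 1))
```

For a 320-pixel-wide screen this picks 384, matching the 320x200 ->
384x200 example above.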

At least compared with BJX2 as-is, these could save a decent number of
clock-cycles during 3D rendering tasks.

>>
>> Would need a bunch of registers to represent the edge-walking state of
>> the primitive being drawn.
> <
> Are you talking about the rasterizer and interpolator ?


Re: RISC-V U74 on Starfive Visionfive V1

<2022Feb28.142041@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=23816&group=comp.arch#23816

Path: i2pn2.org!i2pn.org!aioe.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Mon, 28 Feb 2022 13:20:41 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 31
Message-ID: <2022Feb28.142041@mips.complang.tuwien.ac.at>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at> <sm0ee3om7xd.fsf@lakka.kapsi.fi>
Injection-Info: reader02.eternal-september.org; posting-host="a40fed29596666cb5509ee2edb014303";
logging-data="25382"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19W7pXHlBkpF1LfQyK72Mxs"
Cancel-Lock: sha1:y4NCdACcYc/kcPPIvzxcB/gAOCA=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Mon, 28 Feb 2022 13:20 UTC

Anssi Saari <as@sci.fi> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> It certainly does not feel like it. But even a pure video card like
>> the Cirrus 5428 I had in my 486/66 did not produce this sluggish
>> behaviour (or maybe I am spoilt by current snappy computers?-); ok, it
>> only supported 8-bit colour depth IIRC and 1024x768 resolution, but
>> given the advances in RAM bandwidth and CPU speed between the 486/66
>> and the 1000MHz U74, I would expect better performance.
>
>From Wikipedia, looks like the Cirrus (Logic) 5428 was pretty fast for
>its time and had decent 2D acceleration, even in Linux.

Yes, for its time. But it was still limited by its RAM bandwidth
(32-bit wide asynchronous DRAM, good for at most 40MHz
<https://www.techopedia.com/definition/6981/extended-data-out-random-access-memory-edo-ram>,
i.e. <=160MB/s). Even with its weak memory controller and using
software instead of hardware acceleration, a U74 core in a JH7100
should be able to be faster (and I measured 250MB/s on it).
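
The two figures can be checked with a little arithmetic. The sketch
below uses the 32-bit/40MHz numbers from this post and the timing from
the earlier gforth run in this thread (100 fills of a 10MB block in
4.055s of real time); the function names are made up for illustration.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz):
    """Upper bound: one bus-wide transfer per clock cycle."""
    return (bus_width_bits // 8) * clock_mhz


def measured_bandwidth_mb_s(bytes_written, seconds):
    """Average write rate in MB/s (1 MB = 10**6 bytes)."""
    return bytes_written / seconds / 1e6


cl5428_peak = peak_bandwidth_mb_s(32, 40)                  # 4 bytes * 40 MHz
u74_measured = measured_bandwidth_mb_s(100 * 10_000_000, 4.055)
```

This gives 160MB/s as the ceiling for the CL5428's memory and roughly
247MB/s for the U74 run, consistent with the "about 250MB/s" quoted.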

>So with a
>lightweight GUI like twm or Windows Whatever of the time I think the 486
>system probably was quite snappy.

If I ever use its video capabilities, I should try twm for a fairer
comparison.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: RISC-V U74 on Starfive Visionfive V1

<t0ifke$pt8$1@gioia.aioe.org>


https://www.novabbs.com/devel/article-flat.php?id=24169&group=comp.arch#24169

Path: i2pn2.org!i2pn.org!aioe.org!NZ87pNe1TKxNDknVl4tZhw.user.46.165.242.91.POSTED!not-for-mail
From: antis...@math.uni.wroc.pl
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 12 Mar 2022 15:52:46 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <t0ifke$pt8$1@gioia.aioe.org>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at>
Injection-Info: gioia.aioe.org; logging-data="26536"; posting-host="NZ87pNe1TKxNDknVl4tZhw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: tin/2.4.5-20201224 ("Glen Albyn") (Linux/5.10.0-9-amd64 (x86_64))
X-Notice: Filtered by postfilter v. 0.9.2
Cancel-Lock: sha1:xiBIJ7CWN7XGlB6wLXmOHc5sZjo=
 by: antis...@math.uni.wroc.pl - Sat, 12 Mar 2022 15:52 UTC

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Robert Swindells <rjs@fdy2.co.uk> writes:
> >On Sat, 26 Feb 2022 11:49:39 GMT, Anton Ertl wrote:
> >> - It's sluggish when used as a desktop computer (i.e. using its own
> >> video output, a keyboard and a mouse). It seems that it is pretty
> >> slow at updating the frame buffer. Fortunately, we use it in that
> >> capacity only until we set up ssh and then use it as headless server
> >> through ssh, where it's relatively snappy.
> >
> >Doesn't look like it has a GPU
>
> It certainly does not feel like it. But even a pure video card like
> the Cirrus 5428 I had in my 486/66 did not produce this sluggish
> behaviour (or maybe I am spoilt by current snappy computers?-); ok, it
> only supported 8-bit colour depth IIRC and 1024x768 resolution, but
> given the advances in RAM bandwidth and CPU speed between the 486/66
> and the 1000MHz U74, I would expect better performance.

Well, concerning speed of GUI, the best one I ever had was a 486/66
with a Trident graphics card. The Trident was so-so concerning graphics
performance (it was losing benchmarks to many other cards),
but at that time (1993-1994) the machine was pretty fast (the Pentium
was faster but not by that much) and (probably more important)
had a lot of memory (16 MB). At that time X was pretty well
optimized and, once your working set fit in RAM, was quite
fast. Later, software developers started adding features and
slowed things down quite a lot. Just the tip of the iceberg: old X
was showing just a boundary frame during window movement.
Updating the boundary frame was cheap even on a slow graphics card.
Later GUI developers decided that showing the whole window
during movement is "nicer" (debatable...). Even with 2D
acceleration I saw CPU usage peak at 100% during window
movement on _much_ faster hardware (it seems that the window
redraw frequency was whatever the hardware was capable of doing,
so faster hardware would not help).

--
Waldek Hebisch

Re: RISC-V U74 on Starfive Visionfive V1

<2022Mar12.181111@mips.complang.tuwien.ac.at>


https://www.novabbs.com/devel/article-flat.php?id=24171&group=comp.arch#24171

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: RISC-V U74 on Starfive Visionfive V1
Date: Sat, 12 Mar 2022 17:11:11 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 33
Message-ID: <2022Mar12.181111@mips.complang.tuwien.ac.at>
References: <2022Feb26.124939@mips.complang.tuwien.ac.at> <svdk5r$bu6$1@dont-email.me> <2022Feb26.225344@mips.complang.tuwien.ac.at> <t0ifke$pt8$1@gioia.aioe.org>
Injection-Info: reader02.eternal-september.org; posting-host="0fe04f065c6dc0f3006a406de4fb4fc8";
logging-data="21584"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bZ3MDNcbn89W3q8iA+wni"
Cancel-Lock: sha1:Hlo/IIQ6McBbJz16eSUyUNIr/os=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Sat, 12 Mar 2022 17:11 UTC

antispam@math.uni.wroc.pl writes:
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> It certainly does not feel like it. But even a pure video card like
>> the Cirrus 5428 I had in my 486/66 did not produce this sluggish
>> behaviour (or maybe I am spoilt by current snappy computers?-);
....
>Just tip of the iceberg: old X
>was showing just boundary frame during window movement.
>Updating boundary frame was cheap even on slow graphic card.
>Later GUI developers decided that showing whole window
>during movement is "nicer" (debatable...).

It's coming back. Yes, of course I used that in the beginning. It's
a window manager feature. E.g., twm has the option:

|OpaqueMove
| This variable indicates that the f.move function should
| actually move the window instead of just an outline so that the
| user can immediately see what the window will look like in the
| new position. This option is typically used on fast displays
| (particularly if NoGrabServer is set).

I just disabled this option, and of course it still shows the outline
instead of the complete window. It seems that you can also disable
opaque moves on XFCE
<https://www.linux.co.cr/desktops/review/2000/xfce-3.3/help.html>;
IIRC I saw opaque moves on the V1, so they might have made their setup
more lightweight.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
