
Re: Timing... (Re: More complex instructions to reduce cycle overhead)


Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Timing... (Re: More complex instructions to reduce cycle
overhead)
Date: Sat, 22 May 2021 00:10:13 -0500
Organization: A noiseless patient Spider
Lines: 198
Message-ID: <s8a3no$jc8$1@dont-email.me>
References: <s7dn5p$78r$1@newsreader4.netcologne.de>
<s7mio3$qfs$1@dont-email.me>
<00a4b04a-ef97-44fd-a3a9-aa777fcc71bbn@googlegroups.com>
<jwv1ra92e0t.fsf-monnier+comp.arch@gnu.org>
<049b46dd-4544-4fe7-861b-85f97b3269c3n@googlegroups.com>
<s7n2gj$5na$2@dont-email.me>
<5eb5bb76-37e9-4363-8d56-b1139e2d384bn@googlegroups.com>
<s7n6ah$t1$1@dont-email.me>
<590ea343-cd96-4082-800c-f02412204262n@googlegroups.com>
<s7nchh$b58$1@dont-email.me> <KiHnI.151039$wd1.100928@fx41.iad>
<s7nv66$u2t$1@newsreader4.netcologne.de> <PcQnI.365058$2A5.181861@fx45.iad>
<s7onqq$ape$1@newsreader4.netcologne.de> <s7uclo$23r$1@dont-email.me>
<2aAoI.606122$%W6.592987@fx44.iad> <s7upgf$v6r$1@dont-email.me>
<uZToI.146185$lyv9.30173@fx35.iad>
<98f17a50-83b9-4eb9-bdc0-6f3f6787e7c7n@googlegroups.com>
<s81j26$gb$1@dont-email.me> <zB9pI.338700$Skn4.250226@fx17.iad>
<s83ds5$ccq$1@dont-email.me> <s88rmh$77e$1@dont-email.me>
<2f79fea6-3fb5-40e5-b53d-7f41e99d5b6dn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 22 May 2021 05:10:16 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="7eba654dd3364d58c666703fb24f3432";
logging-data="19848"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18oXwlwhDT4SMS1gWVBttDx"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.10.1
Cancel-Lock: sha1:Pp7tgMUVi2FHJelEXR2wYpWkb00=
In-Reply-To: <2f79fea6-3fb5-40e5-b53d-7f41e99d5b6dn@googlegroups.com>
Content-Language: en-US

On 5/21/2021 7:58 PM, JimBrakefield wrote:
> On Friday, May 21, 2021 at 12:47:00 PM UTC-5, BGB wrote:
>> On 5/19/2021 11:20 AM, BGB wrote:
>>> On 5/19/2021 9:51 AM, EricP wrote:
>>>> BGB wrote:
>>>>>
>>>>> In theory, the fastest signals which can exist with this FPGA are
>>>>> 450MHz / 2.2ns.
>>>>>
>>>>> Though, IME, about all it can really do at these speeds is:
>>>>> reg2 <= reg1;
>>>>>
>>>>>
>>>>> The Kintex and Virtex can apparently go a lot faster than the Artix
>>>>> and Spartan in this regard though.
>>>>
>>>> Which Xilinx chips do you use?
>>>> It is Artix-7 (28 nm) and which model?
>>>>
>>>
>>>
>>> Primary Board:
>>> XC7A100TCSG324-1 (Nexys A7-100T)
>>>
>>> Others:
>>> XC7S50TCSG324-1 (Arty S7-50)
>>> XC7S25TCSG225-1 (CMod S7-25)
>>>
>>> Generally using Vivado 2018.3.1, ...
>>>
>>> Typically using strategies:
>>> Flow_PerfOptimized_high (synthesis, *1)
>>> Vivado Implementation Defaults (latter, *2)
>>>
>>> *1: This makes timing more likely to pass, but at the cost of higher
>>> resource usage and similar.
>>>
>>> *2: Changing this option tends to break the ability to see
>>> post-implementation resource usage stats.
>>>
>>>
>>> Of these, the Artix seems to have a slightly harder time with passing
>>> timing than either of the Spartan devices, for whatever reason.
>>> However, the Spartans are smaller.
>>>
>>> In theory, they should be about the same speed, or if anything the Artix
>>> should be faster due to having more space and thus better able to route
>>> stuff.
>>>
>>>
>>> As can be noted, whether or not things pass timing is mostly determined
>>> by Vivado. Vivado allows synthesizing for any FPGA it supports, but
>>> unless the FPGA matches the one on the board, the bitstream doesn't
>>> work.
>>>
>>>
>>> The Nexys A7 board has VGA and an SDcard holder, which is fairly useful.
>>> It was ~ $269 when I bought it.
>>>
>>> The CMod-S7 is limited to a microcontroller like configuration with the
>>> BJX2 core, as this board lacks any peripherals or external RAM.
>>>
>>>
>>> Timings I can manage on the BJX2 Core:
>>> 50MHz, works pretty well
>>> 75MHz, can work, prone to fail timing, needs L1 caches reduced to ~ 2K
>>>
>>>
>>> I have the RAM working at either 50 or 75 MHz (internally, RAM
>>> controller logic runs at 100 or 150 MHz, driving the chip at 1/2 the
>>> internal speed).
>>>
>>>
>>> I had another DDR module (DdrB) that runs the DDR chip at 1:1 speeds,
>>> but as noted, while it worked in simulation, it doesn't seem to work on
>>> actual hardware.
>>>
>>> One difference between the modules is that the 1/2 clocked module can
>>> adjust Data/DQS alignment by 1/4 cycle, whereas the 1:1 module is
>>> limited to 1/2 cycle.
>>>
>>> With DLL enabled (and above the 120MHz minimum), the data should be
>>> correctly aligned with the clock to make this fine-adjustment
>>> unnecessary, but best I could tell, the RAM chip was effectively
>>> becoming non-responsive.
>>>
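As a worked example of the clock ratios above, here is a minimal Python sketch. The only inputs are the 50/75 MHz RAM speeds and the 2:1 internal clock from the text; it assumes the 1/2-rate module gets its quarter-cycle steps by acting on either edge of its 2x clock (my reading, not something stated), and the helper names are hypothetical.

```python
# Clock-period and Data/DQS alignment-step arithmetic for the two DDR
# controller variants described above. Assumes the controller logic can act
# on either edge of its own clock (an assumption, not stated in the post).

def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def alignment_step_ns(ram_mhz: float, internal_ratio: int) -> float:
    """Finest Data/DQS adjustment when the controller runs at
    ram_mhz * internal_ratio and can use either edge of that clock."""
    return period_ns(ram_mhz * internal_ratio) / 2


for ram_mhz in (50, 75):
    print(f"RAM clock {ram_mhz} MHz: period {period_ns(ram_mhz):.2f} ns")
    # 1/2-rate module: logic at 2x the RAM clock -> quarter-cycle steps
    print(f"  1/2-rate module: logic at {ram_mhz * 2} MHz, "
          f"step {alignment_step_ns(ram_mhz, 2):.2f} ns (1/4 cycle)")
    # 1:1 module (DdrB): logic at the RAM clock -> half-cycle steps
    print(f"  1:1 module:      logic at {ram_mhz} MHz, "
          f"step {alignment_step_ns(ram_mhz, 1):.2f} ns (1/2 cycle)")
```

For comparison, the 120 MHz DLL minimum mentioned above corresponds to a chip clock period of at most about 8.33 ns.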
>> Goes and adds timing constraints to DDR related IO pins, after noting
>> that they were absent, and this apparently results in them being treated
>> as unconstrained...
>>
>> However, this resulted in basically a crap-storm of failed timing
>> constraints, as apparently, the DDR controller module wasn't actually
>> maintaining timing on its IO pins...
>>>
>>> The L2 in the Ringbus design operates at the same speed as the CPU core
>>> (as does the ringbus), and the L2 interfaces with the RAM Module (using
>>> the old/original bus design).
>>>
>>>
>>> The RAM module then contains the glue-logic to step between the external
>>> "master clock" and its internal clock speed. The spot where it steps
>>> clock speeds seems to be prone to random bit-flips.
>>>
>>> I now have logic which basically XORs everything together and currently
>>> generates an 18-bit check value. The OPM and OK signals are also checked
>>> against a bitwise-inverted duplicate (also keyed to the main check
>>> value). The way things are XOR'ed effectively functions like a
>>> horizontal parity over all the bits (probably sufficient for now, may
>>> miss multi-bit errors though).
>>>
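The XOR check above can be modeled in a few lines of Python. This is a sketch, not the actual BJX2 logic: the 128-bit data width, the helper names `fold_xor` and `signal_ok`, and the slice-by-slice folding order are assumptions, and the "keyed to the main check value" part is not modeled. It does show why single-bit flips are always caught while some multi-bit errors slip through.

```python
# Model of an XOR-fold check value carried alongside data across a
# clock-domain crossing. The 128-bit width used below is an assumption.

CHECK_BITS = 18
CHECK_MASK = (1 << CHECK_BITS) - 1


def fold_xor(value: int, width: int) -> int:
    """XOR a width-bit value down to 18 bits, one 18-bit slice at a time.
    Check bit k is the parity of input bits k, k+18, k+36, ..."""
    value &= (1 << width) - 1
    check = 0
    for shift in range(0, width, CHECK_BITS):
        check ^= (value >> shift) & CHECK_MASK
    return check


def signal_ok(sig: int, inverted_dup: int, bits: int) -> bool:
    """A control signal (e.g. OPM or OK) sent with a bitwise-inverted copy
    is valid only if the two differ in every bit."""
    mask = (1 << bits) - 1
    return ((sig ^ inverted_dup) & mask) == mask


data = 0x0123456789ABCDEF0011223344556677
sent = fold_xor(data, 128)
# A single flipped bit always changes the check value...
assert fold_xor(data ^ (1 << 77), 128) != sent
# ...but two flips in the same column (bits 5 and 23) cancel out.
assert fold_xor(data ^ (1 << 5) ^ (1 << 23), 128) == sent
```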
>> And, with all this suddenly starting to fall on its face, it began to
>> look like this may not have actually been the source of the random
>> bit flipping to begin with...
>>
>> Rather, poking around with logic in one place was affecting the
>> timing at the IO pins...
>>
>>
>> I have also determined:
>> The 50MHz DDR is actually seemingly the fastest I can drive it and still
>> pass timing.
>>
>> The 50MHz mode is actually driving some of the IO pins on a 100MHz clock, but
>> given the way DDR works, it looks like I either need to drive the pins
>> internally at 2x the external speed, or provide multiple clocks which
>> are slightly out of phase to get the required timings.
>>
>>
>> I looked at the datasheets some more, and it turns out ~ 100 MHz is the
>> fastest these IO pins can actually go (in their base form), or a little
>> faster if both posedge and negedge are used.
>>
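To put numbers on the 2x requirement, here is a rough Python check. It assumes logic that only updates on posedge must toggle each data pin once per DDR beat, and it treats the ~100 MHz figure from the datasheet reading as an assumed round constant (`PIN_LIMIT_MHZ`), not an exact spec value. The helper name is hypothetical, and the "multiple out-of-phase clocks" alternative is not modeled.

```python
# Does a given RAM clock push the DDR data pins past a fabric toggle limit?
# PIN_LIMIT_MHZ is the ~100 MHz figure from the datasheet reading above,
# used here as an assumed round number.

PIN_LIMIT_MHZ = 100.0


def fabric_clock_needed(ram_mhz: float, use_both_edges: bool = False) -> float:
    """DDR moves one data beat per clock edge (2 beats per RAM clock).
    Logic that only updates on posedge needs a clock at 2x the RAM clock;
    logic using posedge and negedge can run at 1x."""
    beats_per_us = 2 * ram_mhz
    return beats_per_us / (2 if use_both_edges else 1)


for ram_mhz in (50, 75):
    need = fabric_clock_needed(ram_mhz)
    verdict = "within" if need <= PIN_LIMIT_MHZ else "beyond"
    print(f"{ram_mhz} MHz RAM -> pins driven at {need:.0f} MHz "
          f"({verdict} ~{PIN_LIMIT_MHZ:.0f} MHz)")
```

This reproduces the pattern described: 50 MHz RAM lands exactly at the ~100 MHz limit, while 75 MHz needs 150 MHz pin drive.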
>>
>> But, it appears that MIG was able to get the pins to go faster via
>> another "secret weapon" trick: SERDES.
>>
>> So, I guess with SERDES, it is possible to feed parallel data at 50MHz
>> in one side, and get 400MHz out the other side. The reverse is also
>> possible with these pins: serial data in, parallel data out. The SERDES
>> pins apparently operate at ~ 400MHz to 800MHz.
>>
>> And, as it would so happen, the RAM is connected up to the pins which
>> have this capability, leaning in the direction of this probably being
>> what is going on.
>>
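A behavioral Python sketch of the 8:1 ratio above: each 8-bit word presented at 50 MHz leaves as 8 serial bits, i.e. 400 Mb/s on the pin. The LSB-first bit order and the helper names are assumptions for illustration; the real 7-series OSERDESE2/ISERDESE2 primitives define their own port ordering and clocking rules.

```python
# Behavioral model of an 8:1 serializer / 1:8 deserializer pair.
# Bit order (LSB first) is an assumption; real SERDES primitives define their own.

RATIO = 8  # parallel width; 50 MHz * 8 = 400 Mb/s serial


def serialize(words: list[int]) -> list[int]:
    """Expand each RATIO-bit parallel word into RATIO serial bits, LSB first."""
    bits = []
    for w in words:
        bits.extend((w >> i) & 1 for i in range(RATIO))
    return bits


def deserialize(bits: list[int]) -> list[int]:
    """Regroup serial bits (LSB first) back into RATIO-bit words,
    dropping any trailing partial word."""
    words = []
    for start in range(0, len(bits) - RATIO + 1, RATIO):
        word = 0
        for i, b in enumerate(bits[start:start + RATIO]):
            word |= b << i
        words.append(word)
    return words


parallel_mhz = 50
print(f"serial rate: {parallel_mhz * RATIO} Mb/s")
data = [0xA5, 0x3C, 0xFF, 0x00]
assert deserialize(serialize(data)) == data
```

A real receiver also has to find the word boundary in the serial stream (the 7-series ISERDESE2 has a BITSLIP input for this); the model sidesteps that by always starting at bit 0.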
>> Yeah...
>>
>>
>> Though I guess in this case, it would be a tradeoff of:
>> Leave it as-is, and live with RAM at 50MHz (safely), or 75MHz (and
>> technically failing timing);
>> Look into a DDR controller based around using SERDES as well;
>> Being like "screw it" and implementing support for an AXI bus
>> interface, and then using MIG.
>>
>>
>> Though, I am also left to suspect that the gains I would get from moving
>> to faster speeds would be limited, given I would (then) have to deal
>> with significantly larger CAS and RAS latencies and similar, likely
>> eating most of the gains...
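The latency concern can be written as a formula: a read costs roughly (tRCD + CL + BL/2) DRAM clock periods, plus whatever the controller and clock crossing add. Below is a minimal Python sketch of that arithmetic. `burst_read_ns` is a hypothetical helper; it ignores precharge, refresh and bank conflicts, and the timing values are left to the caller (e.g. from the DRAM datasheet for each candidate clock) rather than assumed here.

```python
# Activate-to-last-beat latency of one DDR burst read, in nanoseconds.
# All timing inputs are caller-supplied; nothing here comes from a datasheet.

def burst_read_ns(tck_ns: float, rcd_cycles: int, cl_cycles: int,
                  burst_len: int, overhead_ns: float = 0.0) -> float:
    """Row activate -> column read -> CAS latency -> burst transfer.
    DDR transfers two beats per clock, so a burst takes burst_len / 2 clocks.
    overhead_ns covers controller / clock-crossing delay outside the DRAM."""
    dram_cycles = rcd_cycles + cl_cycles + burst_len / 2
    return dram_cycles * tck_ns + overhead_ns
```

The DRAM terms are cycle counts times tCK, while the overhead term does not shrink with the DRAM clock. So a faster clock that also needs larger cycle counts can recover less latency than the clock ratio alone would suggest.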
>
> |> Goes and adds timing constraints to DDR related IO pins, after noting
> |> that they were absent, and this apparently results in them being treated
> |> as unconstrained...
>
> Timing-driven place and route on an FPGA is much different from ASIC:
> The FPGA router puts in effort to route the paths with least slack and
> can be very lazy with paths with much slack.
> It can also duplicate flip-flops or entire paths if it helps.
> My knowledge in this area is limited, but I have seen delays moved to
> unconstrained IO signals to the point that IO delays were enormous.
>

Yeah. Stuff was being very inconsistent; then I added some constraints
and everything blew up, with timing failing in a few places (such as the
Data and DQS pins).
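Why adding constraints suddenly "blew up" timing can be sketched in Python: slack is required time minus arrival time, and a path with no constraint has no required time at all. It therefore never shows up as failing, however long its delay gets. In XDC, IO paths are typically given a required time via set_input_delay / set_output_delay. The path names and delays below are made up for illustration, and this is a toy view, not how Vivado computes or reports slack internally.

```python
# Toy static-timing view: slack = required time - arrival time.
# Path names and delays are hypothetical illustrations.
import math
from typing import Optional


def slack_ns(arrival_ns: float, required_ns: Optional[float]) -> float:
    """Unconstrained paths (no required time) get infinite slack,
    so they are never reported as failing."""
    if required_ns is None:
        return math.inf
    return required_ns - arrival_ns


paths = {
    # name: (arrival time, required time or None if unconstrained)
    "core_alu_to_reg": (9.1, 10.0),
    "dq_out_pin": (14.7, None),              # no output constraint yet
    "dq_out_pin_constrained": (14.7, 10.0),  # same path once constrained
}

# Worst (smallest) slack first, the paths a timing-driven tool works hardest on.
for name, (arr, req) in sorted(paths.items(), key=lambda kv: slack_ns(*kv[1])):
    s = slack_ns(arr, req)
    status = "FAIL" if s < 0 else "ok"
    print(f"{name:24s} slack={s:7.2f} ns {status}")
```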

Fixed this, and things are at least a little more consistent.

Stuff is no longer blowing up at random when things are changed, and now
behavior seems to be mostly back into "deterministic" territory.

Operation at 75MHz is still unreliable / prone to break: some test data
showed 16-bit words getting duplicated in a way that implies a Data/DQS
timing issue, ... Granted, this involves driving these pins at 150 MHz,
which is apparently out of spec when they are used this way.
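One way to flag the duplicated-word symptom in a memory test is shown in the Python sketch below. It assumes the test writes a pattern with no two equal neighbors (here an incrementing 16-bit counter) and reads it back. `write_pattern` and `find_repeats` are hypothetical helpers, not part of the actual test code.

```python
# Flag 16-bit words in a readback that repeat the previous word instead of
# matching the expected pattern (the duplication symptom described above).

def write_pattern(n: int) -> list[int]:
    """Incrementing 16-bit counter: no two adjacent words are equal."""
    return [i & 0xFFFF for i in range(n)]


def find_repeats(expected: list[int], readback: list[int]) -> list[int]:
    """Indices where the readback is wrong AND equals the previous readback
    word, i.e. a beat that looks like a copy of its predecessor."""
    hits = []
    for i in range(1, min(len(expected), len(readback))):
        if readback[i] != expected[i] and readback[i] == readback[i - 1]:
            hits.append(i)
    return hits


exp = write_pattern(8)
bad = exp.copy()
bad[5] = bad[4]                # simulate beat 5 being a copy of beat 4
print(find_repeats(exp, bad))  # [5]
```

Separating "wrong and repeated" from "wrong in some other way" helps distinguish a capture-alignment problem from random bit flips.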

As for why direct-driving the pins would be limited to ~ 100MHz while
SERDES can apparently drive the same physical pins at 800MHz, I don't
know... (Unless maybe SERDES uses a separate driver? Say, one that can
drive the pin a lot faster but at a lower output power or similar.)

Some stuff is still buggy, though it looks like the cause may be related
to some other issue.
