Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

You can do more with a kind word and a gun than with just a kind word. -- Al Capone


computers / comp.arch / Re: Packing Hash Tables

Re: Packing Hash Tables

<t0ennl$ald$1@dont-email.me>

  copy mid

https://www.novabbs.com/computers/article-flat.php?id=24143&group=comp.arch#24143

  copy link   Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: cr88...@gmail.com (BGB)
Newsgroups: comp.arch
Subject: Re: Packing Hash Tables
Date: Thu, 10 Mar 2022 23:46:22 -0600
Organization: A noiseless patient Spider
Lines: 304
Message-ID: <t0ennl$ald$1@dont-email.me>
References: <43b7bfee-85be-4bab-842d-ef0d966f6dcdn@googlegroups.com>
<Ct4WJ.121302$z688.64338@fx35.iad> <t0aokj$nnc$1@dont-email.me>
<056ffd21-d834-405b-976e-46c09755f949n@googlegroups.com>
<t0df7v$afv$1@dont-email.me>
<3ca0fe64-f159-4a25-b8a3-cfe5ba7d0093n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 11 Mar 2022 05:46:29 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="e8d2320aa770a4595efb7be99d1ecbe4";
logging-data="10925"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kZqbAgE3rakp1SV9eFJgv"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.6.1
Cancel-Lock: sha1:9FIafDrtRvD30zgeUV/kyhD8hps=
In-Reply-To: <3ca0fe64-f159-4a25-b8a3-cfe5ba7d0093n@googlegroups.com>
Content-Language: en-US
 by: BGB - Fri, 11 Mar 2022 05:46 UTC

On 3/10/2022 7:58 PM, robf...@gmail.com wrote:
> On Thursday, March 10, 2022 at 1:15:31 PM UTC-5, BGB wrote:
>> On 3/10/2022 9:04 AM, robf...@gmail.com wrote:
>>> On Wednesday, March 9, 2022 at 12:37:26 PM UTC-5, BGB wrote:
>>>> On 3/9/2022 10:20 AM, EricP wrote:
>>>
>>>>> Note that operating systems such as Linux the hierarchical page table is
>>>>> part of its design. To port it to Power with its hash/inverted page table
>>>>> they kept the hierarchical table managed in memory as before,
>>>>> and move valid PTE entries into and out of the hash table.
>>>>>
>>>>> In effect, the hierarchical table is still the OS page table,
>>>>> and the hash table is treated like a giant software managed TLB.
>>>>> Software could implement, say, a FIFO mechanism to rotate a working set
>>>>> of Present hierarchical PTE's through hash PTE slots,
>>>>> similar to how a HW TLB might rotate through its entries.
>>>>>
>>>>> In this arrangement, Address Space ID's (ASID) for the hash table entries
>>>>> would be very important for efficiency, as without them a process switch
>>>>> would require invalidating all hash table entries just like a real TLB.
>>>>>
>>> Seems kind of kludgy. I am wondering why now this approach was used.
>>>
>>> I have coded in the bus interface unit the capability to use either hashed or
>>> hierarchical page tables. I was not so insane as to ignore the popularity of
>>> hierarchical page tables. Can probably use both types at the same time.
>>> Hashed tables are a better way to go for a large address
>>> space in my opinion. They use less memory and may have faster look-ups.
>>>
>> Hashed tables could also make sense as an alternative to B-Trees.
>>
>> A B-Tree lookup is likely to be relatively complicated and expensive if
>> compared to a hash lookup.
>>
>> It is possible that I could also combine a hashed top-level with
>> 2-levels of nested page table (optionally collapsing down the upper 6
>> levels into a single page).
>>>
>>>> OK. This gives more context for what I was confused about in my response.
>>>>
>>>> In my case, I use a software managed TLB, with hierarchical page tables,
>>>> and no sort of intermediate stage representation.
>>>>
>>>> TLB miss interrupt occurs, handler walks the page table, and loads the
>>>> resulting entry into the TLB. Basic design for the mechanism was
>>>> more-or-less lifted from SH-4, but with the major difference of going
>>>> from a 64-entry fully associative TLB to a 64x or 256x 4-way TLB.
>>>>
>>>> Arguably, a 6-way or 8-way TLB could be better, but more expensive.
>>>>
>>>>
>>>> I had considered potentially using B-Trees for the upper levels (instead
>>>> of hierarchical nesting) for the 96-bit mode, mostly because as-is this
>>>> leads to a "very sparse" 8-level table, and a B-Tree could be more
>>>> compact (with more traditional nested page tables for the lower 2 or 3
>>>> levels).
>>>>
>>>> Though, there are other (potentially simpler/cheaper) ways to deal with
>>>> this.
>>>>
>>>> This would not really effect the hardware in my case.
>>>
>>> Page tables are walked via hardware for Thor. Software only gets involved on a
>>> page fault. While software managing the TLB is probably comparable in
>>> performance and more flexible than hardware, TLB updates with hardware
>>> walking are almost invisible to software.
>>>
>> This was one annoyance with BJX2 with the use of software managed TLB:
>> Getting the software TLB miss interrupts stable was a major source of
>> annoyance, as there were a whole lot of "little things" that can go
>> wrong in these mechanisms.
>>
>> The L1 caches needing to be able to stop what they are doing, then
>> quietly transfer control to an ISR, then resume normal operation without
>> any adverse side effects, was a bit of a pain to debug.
>>
>> Some of the cases were infrequent, so one might to get Doom or Quake
>> launched (in a simulation) before the crash happens, where firing these
>> up inside a Verilog simulation is not exactly a fast process (the time
>> issue is also a major reason why I am using LZ compressed binaries and
>> WADs).
>>
>>
>> As noted, the MMU is temporarily disabled within ISRs, partly because
>> there is no way for an ISR to deal with a TLB miss (unlike some CPU
>> architectures, ISRs are not re-entrant in my case).
>>
>>
>> Similarly, the TLB miss handler would have to resolve page-fault
>> scenarios (for virtual memory), without access to the main filesystem
>> driver (it has to bypass the driver and access the SDCard directly), but
>> does also mean that a page fault (to virtual memory) during some other
>> SDcard IO request would be "potentially fatal".
>>
>> ...
>>
>> Nevermind whether or not it makes sense to put a pagefile on an SDcard,
>> there is nowhere else to put it in this case.
>>
>>
>> What to do with a page-fault to an unmapped address (eg: not backed to
>> the page file) is currently an unresolved issue, but this is more a
>> software thing than a hardware thing (in a real OS, would probably
>> terminate the process or similar here; for the time being, the only real
>> solution is to trigger a breakpoint).
>>
>> I may need to consider going and adding preemptive multitasking, as it
>> looks like most "real OS" stuff would depend on this (cooperative
>> multitasking doesn't scale particularly well, or leave any real good
>> solution for "OK, this program has gone into an infinite loop or
>> crashed", which basically leaves no real solution otherwise).
>>
>>
>>
>> The debugging was getting bad enough that I almost started to debate
>> whether it would be worthwhile to add hardware page walking and not have
>> to deal with this as much (but would still need a Page Fault handler, so
>> it is not entirely avoidable).
>>
>>
>> Sorta spent most of February beating on trying to debug stuff related to
>> the L1 Caches and their interaction with the TLB Miss ISR, partly as
>> "poking at something" in the L1 D$ set off an unholy bees nest of bugs.
>>
>> Likewise, much of my past debugging effort here was more focused on
>> "quick and dirty" fixes which typically don't really resolve the
>> underlying issues; but, trying to figure out "why" something is going
>> wrong is often a lot harder than adding "make it work" hacks (and then
>> notice the bug hasn't gone away, it has merely moved to a new location).
>>
>>
>> I still can't say (with much confidence) that the TLB is now stable, but
>> in my testing at the moment, it appears to work.
>>
>> Though, this debugging effort wasn't helped with Verilator also having
>> the annoying behavior that adding and removing $display statements can
>> sometimes change the behavior of the logic...
>>> The TLB can be updated via software, its almost possible then to use a software
>>> managed TLB. I guess if the TLB miss signal were fed into the interrupt system,
>>> then maybe things could have the option of software management.
>>>
>> Could be.
>>
>> In my case, TLB miss triggers an interrupt, and the ISR deals with it.
>>
>> In the ISR, there is a 'TEA' register that holds the address of whatever
>> caused the interrupt (target address for TLB miss), and an 'EXSR'
>> register where the low bits encode which interrupt had occurred (as a
>> 16-bit number divided into multiple bit fields).
>>> I get the feeling there are too many choices available in Thor. Maybe a configuration
>>> utility would be better than building in everything.
>>>
>>> Wondering what entity to place share counts, access keys, and privilege levels in. I
>>> had these in the PTE but took them out. There is only one set required for each page
>>> of memory.
>>>
>> I have this stuff in the PTEs/TLBEs.
>>
>> There is also an ACL Cache with an ACL Miss interrupt as well.
>>
>> The software side of things isn't really developed all that well yet, as
>> most of the time thus far had gone more into trying to get things stable
>> enough that the address translation and TLB miss interrupts weren't
>> prone to crash stuff.
>>
>> I seem to have this part stable now, so now it should be possible to
>> start making forward progress in this area.
>
> For Thor, the TLBE / PTE is 128-bits in size with no room to spare without having the
> share-counts, etc. included in it. A larger PTE would really complicate things.
> The TLBE value can be loaded or stored from / to the TLB using a single
> 128-bit wide register. Anything wider means using a pair of registers. I could
> maybe reduce the size of the page number fields a few bits. The fields are 48-bit.
> I had the thought that since there only needs to be one copy of the information it
> could be stored in another entity. But this means potentially adding to the memory
> access time since another table is accessed after the physical page number is
> known.
>

I also have 128-bit TLBE's, with fields for 48 bit addresses.

Though the address fields are actually 36 bits, because one can cut off
the low-order bits for the address (and because the "base form"
addressing is 48-bit).

While there is a 96-bit virtual-address mode, as noted, it works by
using TLBE's in pairs to get more bits (at the cost of associativity).
It mostly just extends the virtual address by 48 bits, with most of the
rest of the bits being unused.

I ended up implementing the 256-bit TLB load via a "double pumping"
strategy (load both halves and then the MMU sticks them together).

Would be a little bit harder with a full 64-bit virtual address space
though.

Copy/Paste

TLBE:
* (127:112) = VUGID / ACLID
* (111: 76) = Virtual Address Page
* ( 75: 64) = KR Access
* ( 63: 48) = ASID
* ( 47: 12) = Physical Address Page
* ( 11: 0) = Page Control bits

VUGID:
* (15:10) = VGID / VUID(Swap)
* ( 9: 0) = VUID / VGID(Swap)
* May be interpreted as a PID or ACLID.
** In these cases, VUGID is treated as a single 16-bit unit.

ASID:
* (15:10) = AGID (0=Shared, Global)
* ( 9: 0) = APID (0=Shared, Group)

KR Access:
* (11): Other X
* (10): Other W
* ( 9): Other R
* ( 8): Group X (VUGID Check Only)
* ( 7): Group W (VUGID Check Only)
* ( 6): Group R (VUGID Check Only)
* ( 5): User X (VUGID Check Only)
* ( 4): User W (VUGID Check Only)
* ( 3): User R (VUGID Check Only)
* ( 2): VUGID Mode 2
* ( 1): VUGID Mode 1
* ( 0): VUGID Mode 0

The VUGID Mode bits basically encode what sort of check to perform
(VUGID or ACL style).

These include, say:
Disabled, VUGID, VUGID (Swap), or ACL.

In VUGID mode, it does User/Group/Other checks, Swap switches
User/Group. ACL: Use ACL (in TLBE), Exact 16-bit match only (ACLE).

> I do not know much about ACL except that I suspect it can be quite complicated to do
> with hardware. Is there ACL for individual pages of memory? There can be users or
> groups of users and system processes all with separate access requirements. I suspect
> there would be some limits engineered into ACL. For instance, limited numbers of
> groups and users to keep the reference fields a reasonable size. Can you elaborate on
> the ACL?
>

These are per-page checks.

The ACL checking doesn't really add all that much cost or complexity
compared with the TLB itself.

Main costs are mostly the need to spend TLB bits on it, and the addition
of a Keyring Register (KRR).

Most of the harder parts of the ACL Check (namely, performing the ACL
lookups) would be handled in software, so all the hardware really needs
to do is check whether the keyring entries are in the ACL Cache, and (it
they hit) what sort of access is allowed to the page in question (the
access bits are OR'ed together).

ACLE:
* (63:44): Reserved / MBZ
* (43:32): KR Access Mode
* (31:16): KRR VUGID / APID
* (15: 0): TLB VUGID / ACLID

So, one basically does associative checks between the entries in the
ACLE cache, and the entries in KRR.

Because KRR changes relatively infrequently, the ACLE can be pretty
small (ideally, it is larger than the KRR though, so 6 or 8 entries
would be better than 4, but I am using 4 entries for now).

The combined result of VUGID/ACL, and Basic checks are combined into a
single bit-mask saying what access is allowed (in a given operating mode
and at a given time). The L1 Caches will use these bits during memory
access.

> In my PTE eight bits are used for access control to indicate read/write/execute/cache
> for either user or system. Maybe these bits would be better used to refer to an ACL. If
> I were to trim four bits off the page number fields that means a 16-bit ACL reference
> field could be supported.

OK.

I was able to fit both Basic checks and ACL/VUGID checks.

The Basic checks are used ye olde User/Supervisor and similar (R/W/X)
and a few bits which effects caching or combinations which are used
encode alternate special behaviors (Privledged-Execute and
Execute-As-ACLID).

....

SubjectRepliesAuthor
o Packing Hash Tables

By: robf...@gmail.com on Wed, 9 Mar 2022

77robf...@gmail.com
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor