devel / comp.arch / Re: Packing Hash Tables

Subject  Author
* Packing Hash Tables  robf...@gmail.com
+* Re: Packing Hash Tables  BGB
|`- Re: Packing Hash Tables  robf...@gmail.com
`* Re: Packing Hash Tables  EricP
 `* Re: Packing Hash Tables  BGB
  `* Re: Packing Hash Tables  robf...@gmail.com
   `* Re: Packing Hash Tables  BGB
    `* Re: Packing Hash Tables  robf...@gmail.com
     `* Re: Packing Hash Tables  BGB
      `* Re: Packing Hash Tables  robf...@gmail.com
       `* Re: Packing Hash Tables  MitchAlsup
        +* Re: Packing Hash Tables  robf...@gmail.com
        |`* Re: Packing Hash Tables  MitchAlsup
        | `* Re: Packing Hash Tables  EricP
        |  +- Re: Packing Hash Tables  BGB
        |  +* Re: Packing Hash Tables  MitchAlsup
        |  |`- Re: Packing Hash Tables  EricP
        |  `* Re: Packing Hash Tables  MitchAlsup
        |   +* Re: Packing Hash Tables  robf...@gmail.com
        |   |`* Re: Packing Hash Tables  MitchAlsup
        |   | `* Re: Packing Hash Tables  robf...@gmail.com
        |   |  `* Re: Packing Hash Tables  robf...@gmail.com
        |   |   +- Re: Packing Hash Tables  EricP
        |   |   `* Re: Packing Hash Tables  MitchAlsup
        |   |    `* Re: Packing Hash Tables  EricP
        |   |     `* Re: Packing Hash Tables  MitchAlsup
        |   |      +- Re: Packing Hash Tables  robf...@gmail.com
        |   |      +- Re: Packing Hash Tables  MitchAlsup
        |   |      +* Re: Packing Hash Tables  BGB
        |   |      |+* Re: Packing Hash Tables  MitchAlsup
        |   |      ||`* Re: Packing Hash Tables  BGB
        |   |      || +* Re: Packing Hash Tables  robf...@gmail.com
        |   |      || |+* Re: Packing Hash Tables  Thomas Koenig
        |   |      || ||+* Re: Packing Hash Tables  robf...@gmail.com
        |   |      || |||`- Re: Packing Hash Tables  Thomas Koenig
        |   |      || ||`- Re: Packing Hash Tables  MitchAlsup
        |   |      || |+- Re: Packing Hash Tables  MitchAlsup
        |   |      || |`- Re: Packing Hash Tables  MitchAlsup
        |   |      || +* Re: Packing Hash Tables  robf...@gmail.com
        |   |      || |`* Re: Packing Hash Tables  BGB
        |   |      || | `* Re: Packing Hash Tables  aph
        |   |      || |  `* Re: Packing Hash Tables  BGB
        |   |      || |   `* Re: Packing Hash Tables  MitchAlsup
        |   |      || |    +* Re: Packing Hash Tables  robf...@gmail.com
        |   |      || |    |`- Re: Packing Hash Tables  EricP
        |   |      || |    `* Re: Packing Hash Tables  BGB
        |   |      || |     `* Re: Packing Hash Tables  robf...@gmail.com
        |   |      || |      `* Re: Packing Hash Tables  BGB
        |   |      || |       `- Re: Packing Hash Tables  Ivan Godard
        |   |      || `- Re: Packing Hash Tables  robf...@gmail.com
        |   |      |+- Re: Packing Hash Tables  robf...@gmail.com
        |   |      |`* Re: Packing Hash Tables  EricP
        |   |      | +- Re: Packing Hash Tables  robf...@gmail.com
        |   |      | `* Re: Packing Hash Tables  MitchAlsup
        |   |      |  +* Re: Packing Hash Tables  EricP
        |   |      |  |`* Re: Packing Hash Tables  MitchAlsup
        |   |      |  | `* Re: Packing Hash Tables  EricP
        |   |      |  |  `* Re: Packing Hash Tables  MitchAlsup
        |   |      |  |   `* Re: Packing Hash Tables  EricP
        |   |      |  |    +* Re: Packing Hash Tables  robf...@gmail.com
        |   |      |  |    |`- Re: Packing Hash Tables  MitchAlsup
        |   |      |  |    `* Re: Packing Hash Tables  robf...@gmail.com
        |   |      |  |     `* Re: Packing Hash Tables  BGB
        |   |      |  |      `* Re: Packing Hash Tables  robf...@gmail.com
        |   |      |  |       +- Re: Packing Hash Tables  BGB
        |   |      |  |       +- Re: Packing Hash Tables  MitchAlsup
        |   |      |  |       `- Re: Packing Hash Tables  robf...@gmail.com
        |   |      |  `* Re: Packing Hash Tables  Ivan Godard
        |   |      |   `* Re: Packing Hash Tables  MitchAlsup
        |   |      |    `* Re: Packing Hash Tables  Thomas Koenig
        |   |      |     `* Re: Packing Hash Tables  MitchAlsup
        |   |      |      `* Re: Packing Hash Tables  Thomas Koenig
        |   |      |       `* Re: Packing Hash Tables  Stephen Fuld
        |   |      |        `- Re: Packing Hash Tables  EricP
        |   |      `* Re: Packing Hash Tables  EricP
        |   |       `- Re: Packing Hash Tables  MitchAlsup
        |   `- Re: Packing Hash Tables  EricP
        `- Re: Packing Hash Tables  BGB

Re: Packing Hash Tables

<db8ab033-b373-46dc-9381-2b5e2c82a3a0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24252&group=comp.arch#24252

 by: MitchAlsup - Mon, 14 Mar 2022 17:30 UTC

On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:

> > By specifying int:16 as the type. The compiler supports size specs for ints of 8,16,32,64, and 128. Bytes
> > are also accessible as ‘byte’. I think chars are int:16. I think there is also a bit type to support bit arrays,
> > but this is experimental.
> > int:16 my_index;
> > > PMA = Post Male Anxiety ?
> > PMA = Physical Memory Attributes, stolen from RISCV docs.
> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
<
LVL is monotonically decreasing or a page fault is raised.

Re: Packing Hash Tables

<IY3YJ.108064$3jp8.15219@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=24262&group=comp.arch#24262

 by: EricP - Tue, 15 Mar 2022 17:23 UTC

MitchAlsup wrote:
> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
>> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
>> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
>> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
> <
> LVL is monotonically decreasing or a page fault is raised.

What about address translation skipping page table levels?

A 64-bit address with 4kB pages gives a 6 level tree with the bit fields

level: 6 5 4 3 2 1 0
size: 7-9-9-9-9-9-12

A two level page table would cover the lower 30 address bits
which would be enough address space for almost all applications.

An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
a two level page table covers the lower 33 address bits.
However I want multiple MMU root registers so it can switch tables
quickly, so I would change the address bit fields to 4-7-10-10-10-13
and have the top 4 bits select a HW MMU root[N] register.
Again the lower two page table levels are usually sufficient.

I want to be able to skip from a MMU root register to page table level 2.
Also because page table pages can be reused at multiple levels
the skip count should be relative.
Also I think there could be multiple skips in a walk, say a skip
from root to 4, and 4 to 2. At least I see no reason to disallow it.

Each PTE has a field giving the relative number of levels to skip
in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
field allows it to skip directly from my HW root[N] register to level-2.

Another issue is what the fill values for the skipped bits should be.
The MMU can assume the address bits skipped were all 0's,
or the PTE could have a bit to prescribe the skipped value as 0 or 1.

There is also the page size selection field to consider.
With 8kB as the base page size, that gives multiples of
2^13 = 8kB, 2^23 = 8 MB, 2^33 = 8 GB, 2^43 = 8 TB.
Four sizes should be sufficient, which requires 1 PTE bit at each level.
That allows the HW root to skip directly to level-2 where the
PTE specifies a large page size of 8 MB.

Together this allows virtual address translation in a single memory access
in address space sizes up to 8 GB!
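
For illustration, a minimal C sketch of the relative-skip walk described
above (8kB pages, 10 index bits per level); the PTE layout and the field
names are assumptions for the example, not any real hardware's format:

#include <stdint.h>

#define PAGE_SHIFT   13
#define BITS_PER_LVL 10

typedef struct {
    uint64_t pa   : 51;  /* physical page number of next table or leaf */
    uint64_t skip : 2;   /* relative number of levels to skip downward */
    uint64_t leaf : 1;   /* set when this entry maps a page            */
} pte_t;

/* Walk from the table below the selected root register down to a leaf,
 * honoring relative skips. Returns the physical address, 0 on a fault. */
uint64_t translate(pte_t *table, uint64_t va)
{
    for (int lvl = 4; lvl >= 0; ) {
        pte_t e = table[(va >> (PAGE_SHIFT + lvl * BITS_PER_LVL))
                        & ((1u << BITS_PER_LVL) - 1)];
        if (e.leaf)          /* a leaf above level 0 is a large page */
            return ((uint64_t)e.pa << PAGE_SHIFT)
                 | (va & ((1ull << (PAGE_SHIFT + lvl * BITS_PER_LVL)) - 1));
        lvl -= 1 + (int)e.skip;          /* the skip count is relative */
        table = (pte_t *)(uintptr_t)((uint64_t)e.pa << PAGE_SHIFT);
        /* the skipped index bits are assumed to be zero here */
    }
    return 0;   /* walked past level 0 without a leaf: page fault */
}

(The physical-to-virtual conversion of the table pointer is elided; a
software model would map the page address before dereferencing it.)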

Re: Packing Hash Tables

<f4b408a9-a145-4a7f-ae80-a3f3fd46babdn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24263&group=comp.arch#24263

 by: MitchAlsup - Tue, 15 Mar 2022 18:08 UTC

On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > <
> > LVL is monotonically decreasing or a page fault is raised.
> What about address translation skipping page table levels?
<
Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
at AMD where we put in level skipping as part of its design. I think CPU tables
and I/OMMU tables should be one and the same, along with the "a pointer is a
pointer" philosophy. I/O device should get virtual address of originating thread,
I/O DMA should be translated through page tables of originating thread, I/O
completion interrupt should be delivered to Guest OS hosting originating thread.
>
> A 64-bit address with 4kB pages gives a 6 level tree with the bit fields
>
> level: 6 5 4 3 2 1 0
> size: 7-9-9-9-9-9-12
>
> A two level page table would cover the lower 30 address bits
> which would be enough address space for almost all applications.
>
> An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
> a two level page table covers the lower 33 address bits.
<
This is what I am using: 8K pages and a 63-bit VA space; I have not decided
what to do with the top-most bit--yet. I used it a while back to access
foreign address spaces, but this fell out of favor when I found
indirect accessing, where a service provider can access customer
memory via indirection (that is, using the customer's root pointer and ASID
when making accesses).
<
> However I want multiple MMU root registers so it can switch tables
> quickly, so I would change the address bit fields to 4-7-10-10-10-13
> and have the top 4 bits select a HW MMU root[N] register.
<
Why not just use the top table<62..53> as 1024 Root pointers?
Other than having a <ahem> root pointer point at this page, there
is no useful difference between a Root pointer and a PTP; they
both point at a page of self-describing descriptors {PTPs or PTEs}.
<
Each of these top level Root pointers has an LVL field which
describes which layers of the address space are skipped; or
equivalently the size of that virtual address space.
<
> Again the lower two page table levels are usually sufficient.
<
Your 7-bit table only uses 1/8 of the allocated page.
So, I don't see what advantage 16 root pointers have over 1024 root
pointers.
>
> I want to be able to skip from a MMU root register to page table level 2.
<
Root.LVL = 011
<
You can also "run" 'cat' with a 23-bit VA space: Root.LVL = 010.
Root->page_of_ptes[va<22..13>] gives you the PTE.LVL = 001, without any
PTP levels in the table.
<
> Also because page table pages can be reused at multiple levels
> the skip count should be relative.
<
This adds arithmetic (albeit small adders) to the MMU function. Why can
this not be determined when the tables are set up?
<
> Also I think there could be multiple skips in a walk, say a skip
> from root to 4, and 4 to 2. At least I see no reason to disallow it.
<
Nor did I.
<
>
> Each PTE has a field giving the relative number of levels to skip
> in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
> field allows it to skip directly from my HW root[N] register to level-2.
<
I use a 3-bit absolute next-level encoding without arithmetic.
Root.LVL determines the maximal size of the virtual address space.
PTP.LVL determines the next level in the table;
previous PTP.LVL -> PTE.LVL = 001 determines the size of the page:
{001->PTE, 010->8K, 011->8M, 100->8G, 101->8T, 110->8E}
>
> Another issue is what the fill values for the skipped bits should be.
> The MMU can assume the address bits skipped were all 0's,
> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
<
I am following the rules of canonicality AMD uses when skipping levels
in the I/O MMU.
>
> There is also the page size selection field to consider.
> With 8kB as the base page size, that gives multiples of
> 2^13 = 8kB, 2^23 = 8 MB, 2^33 = 8 GB, 2^43 = 8 TB.
> Four sizes should be sufficient, which requires 1 PTE bit at each level.
<
It is not a bit, it is part of an encoding. In any event, you end up with
5 (or 6) states, and this always requires 3 bits to encode. 2 bits for
LVL and 1 bit for PTE is not usefully different from 3 bits for LVL
with one encoding for PTE. The 3-bit encoding can also provide for
PA=VA using another encoding.
<
> That allows the HW root to skip directly to level-2 where the
> PTE specifies a large page size of 8 MB.
<
Root.LVL = 010
page_of_PTEs[va<32..23>] gives you the 8MB PTE.LVL = 001; the
10 unused bits in the PTE specify the size of the 8MB mapping
{16K,24K,...8M-8K,8M} so you don't have to allocate an entire 8M
to this mapping. And so on as PTE maps get larger.
>
> Together this allows virtual address translation in a single memory access
> in address space sizes up to 8 GB!
<
I started here about 2006.
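
As a concrete rendering of the absolute-LVL scheme above (the names and
field packing are invented for the sketch; 8K pages, 10 index bits per
level):

#include <stdint.h>

/* 3-bit LVL encoding from the discussion: 001 marks a PTE; the larger
 * values name the table level / mapping size reached next. */
enum lvl { LVL_PTE = 1, LVL_8K, LVL_8M, LVL_8G, LVL_8T, LVL_8E };

/* Index field for a table whose entries each map 8K << (10*(lvl-2)):
 * lvl = LVL_8K -> va<22..13>, lvl = LVL_8M -> va<32..23>, and so on.
 * Because LVL is absolute, the field is selected directly, with no
 * skip-count adders in the walker. */
static inline unsigned index_at(uint64_t va, unsigned lvl)
{
    return (unsigned)(va >> (13 + 10 * (lvl - 2))) & 0x3FF;
}

/* The monotonicity rule from earlier in the thread: each step must
 * strictly decrease LVL, bottoming out at LVL_PTE, else page fault. */
static inline int lvl_step_ok(unsigned cur, unsigned next)
{
    return next >= LVL_PTE && next < cur;
}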

Re: Packing Hash Tables

<7fb10857-53f2-464e-83c9-40664fb79eb1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24265&group=comp.arch#24265

 by: robf...@gmail.com - Tue, 15 Mar 2022 19:56 UTC

On Tuesday, March 15, 2022 at 2:08:26 PM UTC-4, MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> > Together this allows virtual address translation in a single memory access
> > in address space sizes up to 8 GB!
> <
> I started here about 2006.

I am using larger PTEs and smaller pages, so I absorb fewer address bits (8)
at a time. I do not think this makes a lot of difference since most of the time
apps will fit into a one- or two-level table: (8+8+12) = 28 bits = 256MB.

I have been pondering the idea of storing garbage collection card status in the
PTE. This would require 16 or 32 bits in the PTE. When a translation occurs the
accessed bit of the PTE is updated in the TLB. At the same time the card status
bit could be updated if the access was a pointer store. One issue is knowing
when a pointer store is occurring. I have this detected using a dedicated pointer
store instruction STPTR.

For garbage collection there is an array of cards somewhere in the memory
system. It would be nice to be able to pack the cards into an existing structure.
I am calling the collection of struct page entries Linux uses a page management
table, PMT. I think garbage collection cards could be stored in this table. It may
be needed to store the cards in the page table too however, as the PMT is a global
entity, and it might be desirable to process garbage collection for apps individually.
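
For illustration, a software rendering of the card-marking idea; in the
proposal above this would happen in the TLB on the STPTR path, and the
card granularity and table layout here are invented for the example:

#include <stdint.h>

#define CARD_SHIFT 9            /* one card per 512 B region (assumed) */
extern uint8_t card_table[];    /* the global "array of cards"         */

/* What a dedicated pointer store effectively does: perform the store,
 * then mark the card covering the written slot so the collector only
 * rescans dirty cards. */
static inline void stptr(void **slot, void *p)
{
    *slot = p;
    card_table[(uintptr_t)slot >> CARD_SHIFT] = 1;
}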

Re: Packing Hash Tables

<e9733ba7-c937-4e71-a28c-c1ffc908b55en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24267&group=comp.arch#24267

 by: MitchAlsup - Tue, 15 Mar 2022 21:30 UTC

On Tuesday, March 15, 2022 at 2:56:39 PM UTC-5, robf...@gmail.com wrote:
> On Tuesday, March 15, 2022 at 2:08:26 PM UTC-4, MitchAlsup wrote:
> > On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> > > MitchAlsup wrote:

> > <
> > I started here about 2006.
> I am using larger PTEs and smaller pages, so I absorb fewer address bits (8)
> at a time. I do not think this makes a lot of difference since most of the time
> apps will fit into a one- or two-level table: (8+8+12) = 28 bits = 256MB.
<
VAX made a mistake like this, too; although yours is not as radical as VAX's and won't
incur as much overhead.
<
4+8+8+8+8+8+8+12 -> 7 levels of page tables: 49 if you go with nested page tables.
Compared to:
1+10+10+10+10+10+13 -> 5 levels of page tables: 25 if you go with nested page tables.
>
> I have been pondering the idea of storing garbage collection card status in the
> PTE. This would require 16 or 32 bits in the PTE. When a translation occurs the
> accessed bit of the PTE is updated in the TLB. At the same time the card status
> bit could be updated if the access was a pointer store. One issue is knowing
> when a pointer store is occurring. I have this detected using a dedicated pointer
> store instruction STPTR.
<
I have coherent TLBs, so any store to any PTP or PTE in the MMU is recognized
and either invalidated or snarfed.
<
I have been contemplating an instruction to load the address of a given PTE/PTP.
So, you* supply a virtual address and an LVL, and you get a pointer to the PTP/PTE
which would have been used at level LVL. Reading that address gets you the PTP or
PTE; writing that address alters the PTP/PTE.
<
Page Fault handlers already get the PTP/PTE at the point PAGEFAULT was raised.
So, what they need is a quick way of getting the address of the failing PTP/PTE.
LVL from the PTP/PTE tells the LDAPT instruction where to stop in the table.
Also note: if the PAGEFAULT was raised in first-level translation it is raised to the
Guest OS, whereas if it was raised in second-level translation it is raised to the
Hypervisor, isolating when the Hypervisor needs to get in the game.
<
Should the specified level have been skipped, the returned address will be zero.
<
(*) You being Guest OS or HyperVisor.
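
A rough software model of the LDAPT idea, using the thread's 8K-page,
10-bits-per-level layout (the struct and field names are invented; the
hardware instruction would do this walk internally):

#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t pa : 51, lvl : 3, rsvd : 10; } entry_t;

/* Given a VA and a target level `want`, return the address of the
 * PTP/PTE the translation would use at that level, or NULL (zero)
 * when that level was skipped, mirroring "the returned address will
 * be zero". `lvl` starts at Root.LVL. */
entry_t *ldapt(entry_t *table, unsigned lvl, uint64_t va, unsigned want)
{
    for (;;) {
        entry_t *e = &table[(va >> (13 + 10 * (lvl - 2))) & 0x3FF];
        if (lvl == want)
            return e;            /* the entry used at level `want`     */
        if (e->lvl <= 1 || e->lvl >= lvl)
            return NULL;         /* PTE reached, or a malformed table  */
        if (e->lvl < want)
            return NULL;         /* level `want` was skipped           */
        lvl = e->lvl;
        table = (entry_t *)(uintptr_t)(e->pa << 13);
    }
}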

Re: Packing Hash Tables

<t0r6hu$p5m$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24268&group=comp.arch#24268

 by: BGB - Tue, 15 Mar 2022 23:13 UTC

On 3/15/2022 1:08 PM, MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> <
>>> LVL is monotonically decreasing or a page fault is raised.
>> What about address translation skipping page table levels?
> <
> Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
> at AMD where we put in level skipping as part of its design. I think CPU tables
> and I/OMMU tables should be one and the same, along with the "a pointer is a
> pointer" philosophy. I/O device should get virtual address of originating thread,
> I/O DMA should be translated through page tables of originating thread, I/O
> completion interrupt should be delivered to Guest OS hosting originating thread.

One drawback of level skipping, though, is that one can either ignore
bits (effectively multi-mapping the skipped levels) or assume them to
be zero (which doesn't work for randomized addresses), neither of which
is ideal in a large+sparse ASLR scenario.

But, OTOH, 8-level page tables would suck in terms of overhead, with lots of
pages having only a single entry in use.

Though, a hacky option could be to add an alternate mechanism, where
PTD entries are one of several options:
A page representing another level of the page table;
A "Short Table Descriptor".

Say, PTD Entry:
(63:48): Reserved/MBZ (PTD), Virtual Index (STD)
(47:12): Page Address
(11: 5): Low Address (STD)
( 4: 2): STD Table Size
( 1: 0): Entry Mode

Mode:
00: Unmapped
01: Page Table
10: Reserved
11: STD

Table Size:
000: 4 Entries
001: 8 Entries
010: 16 Entries
011: 32 Entries
100: 64 Entries
101: 128 Entries
110: 256 Entries
111: 512 Entries
(111 overflows to a full page)

These tables are allocated on a multiple of 32 bytes, with a single-entry
table being padded to 4 entries.

The STD entry is basically the same format as a PTD entry, except
that the high-order bits represent the index location as if it were in a
PTD page (or FFFF for unused entries).

Entries in an STD would be kept sorted in ascending order (allowing for
binary search).

The short tables would not be allowed for the last level of a page-table
lookup, which would be required to use a full page.

This is less needed in 48-bit mode, because with 16K pages, a 3-level
table can be used. The waste due to sparse page tables is likely to be
significantly lower.
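
For illustration, a sketch of how a walker might search one of these
short tables, following the layout above (the macro and exact packing
are guesses where the description leaves details open):

#include <stdint.h>

#define STD_VIDX(e)  ((unsigned)((e) >> 48))  /* bits 63:48, virtual index */

/* Binary-search a sorted STD for the entry whose virtual index equals
 * `idx`, the index the VA would have used in a full PTD page; unused
 * entries hold FFFF and therefore sort to the end. Returns the raw
 * 64-bit entry, or 0 when `idx` is not mapped. */
uint64_t std_lookup(const uint64_t *std, unsigned nent, unsigned idx)
{
    unsigned lo = 0, hi = nent;
    while (lo < hi) {
        unsigned mid = (lo + hi) / 2;
        unsigned v = STD_VIDX(std[mid]);
        if (v == idx)
            return std[mid];
        if (v < idx) lo = mid + 1; else hi = mid;
    }
    return 0;
}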


Re: Packing Hash Tables

<3cd629e4-3623-4ad2-9e0f-ea9c992325e0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24269&group=comp.arch#24269

 by: MitchAlsup - Tue, 15 Mar 2022 23:54 UTC

On Tuesday, March 15, 2022 at 6:13:06 PM UTC-5, BGB wrote:
> On 3/15/2022 1:08 PM, MitchAlsup wrote:
> > Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
> > at AMD where we put in level skipping as part of its design. I think CPU tables
> > and I/OMMU tables should be one and the same, along with the "a pointer is a
> > pointer" philosophy. I/O device should get virtual address of originating thread,
> > I/O DMA should be translated through page tables of originating thread, I/O
> > completion interrupt should be delivered to Guest OS hosting originating thread.
<
> One drawback of level skipping, though, is that one can either ignore
> bits (effectively multi-mapping the skipped levels) or assume them to
> be zero (which doesn't work for randomized addresses), neither of which
> is ideal in a large+sparse ASLR scenario.
<
It is never wise to ignore bits (these always come back to haunt you later),
and it is always wise to check the reserved-must-be-zero bits for actually
being zero.
<
AMD's canonical address space makes you look for all 1s and all 0s instead
of just all 0s. Not much added complexity for a useful service.
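
A one-function C sketch of that check (va_bits is the number of
implemented low address bits; the shift-based sign extension assumes
the usual arithmetic right shift on signed values):

#include <stdbool.h>
#include <stdint.h>

/* Canonical means the unimplemented upper bits replicate bit
 * va_bits-1, i.e. they are all 0s or all 1s. */
bool is_canonical(uint64_t va, unsigned va_bits)
{
    int64_t s = (int64_t)(va << (64 - va_bits)) >> (64 - va_bits);
    return (uint64_t)s == va;
}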


Re: Packing Hash Tables

<ae3ec2ef-ac46-4945-ad7f-40036a24f8bdn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24270&group=comp.arch#24270

 by: robf...@gmail.com - Wed, 16 Mar 2022 00:25 UTC

On Tuesday, March 15, 2022 at 7:55:01 PM UTC-4, MitchAlsup wrote:
> [snip]


Re: Packing Hash Tables

<3TbYJ.121927$GjY3.90377@fx01.iad>

https://www.novabbs.com/devel/article-flat.php?id=24272&group=comp.arch#24272
Newsgroups: comp.arch
 by: EricP - Wed, 16 Mar 2022 02:23 UTC

BGB wrote:
> On 3/15/2022 1:08 PM, MitchAlsup wrote:
>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>>> What about address translate skipping page table levels?
>> <
>> Skipping levels is, in my opinion, a necessity. I was part of the
>> I/OMMU team at AMD where we put in level skipping as part of its
>> design. I think CPU tables and I/OMMU tables should be one and the
>> same, along with the "a pointer is a pointer" philosophy. I/O device
>> should get virtual address of originating thread, I/O DMA should be
>> translated through page tables of originating thread, I/O completion
>> interrupt should be delivered to Guest OS hosting originating thread.
>
> One drawback of level skipping though, is that one can either ignore
> bits (effectively multi-mapping the skipped levels), or assumes them to
> be zero (doesn't work for randomized addresses), neither of which are
> ideal in a large+sparse ASLR scenario.

As I mention below, it can assume the missing values are 0's
or have a PTE bit to specify the value.

A PTE bit allows one to differentiate whether the page is part of a
grow-up or grow-down memory section. E.g. heap areas expand upward,
stacks expand downward.
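[A sketch of that check, with a hypothetical fill bit: if the walk
skipped the VA bits <hi-1 .. lo> between two table levels, they were
never translated, so require them all to equal the fill value -- 0 for
a grow-up (heap) region, 1 for a grow-down (stack) region:]

#include <stdint.h>
#include <stdbool.h>

static bool skipped_bits_ok(uint64_t va, unsigned hi, unsigned lo,
                            bool fill_ones)
{
    /* mask covers bits <hi-1 .. lo> */
    uint64_t mask = (hi >= 64 ? ~0ull : (1ull << hi) - 1)
                  & ~((1ull << lo) - 1);
    return (va & mask) == (fill_ones ? mask : 0);
}

[With 8 kB pages, a skip from a level-4 table down to a level-2 table
leaves bits <32..23> untranslated, i.e. hi = 33, lo = 23.]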

>>> Another issue is what the fill values should be that were skipped.
>>> MMU can assume that the address bits skipped were all 0's,
>>> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
>> <
>> I am following the rules of canonicality AMD uses when skipping levels
>> in the I/O MMU.

More feature bits also put more pressure on the PTE, which is already full.

Re: Packing Hash Tables

<ddba6add-19ab-43b6-b98f-58110a886f8cn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24273&group=comp.arch#24273
Newsgroups: comp.arch
 by: robf...@gmail.com - Wed, 16 Mar 2022 02:46 UTC

On Tuesday, March 15, 2022 at 10:24:03 PM UTC-4, EricP wrote:
> [snip]

>A PTE bit allows one to differentiate whether the page is part of a
>grow-up or grow-down memory section. E.g. heap areas expand upward,
>stacks expand downward.

I wonder if the address could be sign-extended, or multi-bit sign-extended?
One may want to have ROM / IO mapped to virtual addresses that are the
same as physical ones, high in memory.

>So, what they need is a quick way of getting an address to the failing PTP/PTE.

I am reminded of a trace cache. Something similar could be used.

It would be easier if a link back to the parent were in a PTP. Since the failing PTE
/ PTP is known, back links could be followed. I wonder if an empty entry in the
PTP could be used as a back-link. Is it faster to scan a page for a back-link or use
other means? If there were extra room in the PTP a link could be added in the
header.
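[A rough sketch of the two placements being weighed here; the entry tag
and the header slot are both hypothetical formats:]

#include <stdint.h>
#include <stddef.h>

#define PTP_ENTRIES 1024
#define TYPE_MASK   0x7ull
#define TYPE_LINK   0x6ull            /* hypothetical back-link tag */

/* Option 1: a spare entry doubles as the back-link; scan to find it. */
static uint64_t parent_by_scan(const uint64_t ptp[PTP_ENTRIES])
{
    for (size_t i = 0; i < PTP_ENTRIES; i++)
        if ((ptp[i] & TYPE_MASK) == TYPE_LINK)
            return ptp[i] & ~TYPE_MASK;
    return 0;                         /* no link present */
}

/* Option 2: a fixed header slot -- one load instead of a scan, at the
   cost of one translation entry in every PTP. */
static uint64_t parent_by_header(const uint64_t ptp[PTP_ENTRIES])
{
    return ptp[0] & ~TYPE_MASK;
}

[The scan costs a pass over the page at every faulting level; the
header slot is a single load but permanently steals one entry.]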

Re: Packing Hash Tables

<t0rjd1$d6i$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=24274&group=comp.arch#24274
Newsgroups: comp.arch
 by: BGB - Wed, 16 Mar 2022 02:52 UTC

On 3/15/2022 6:54 PM, MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 6:13:06 PM UTC-5, BGB wrote:
>> On 3/15/2022 1:08 PM, MitchAlsup wrote:
>>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>>>> MitchAlsup wrote:
>>>>> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
>>>>>> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:
>>>>>
>>>>>>> By specifying int:16 as the type. The compiler supports size specs for ints of 8, 16, 32, 64, and 128. Bytes
>>>>>>> are also accessible as ‘byte’. I think chars are int:16. I think there is also a bit type to support bit arrays,
>>>>>>> but this is experimental.
>>>>>>> int:16 my_index;
>>>>>>>> PMA = Post Male Anxiety ?
>>>>>>> PMA = Physical Memory Attributes, stolen from RISCV docs.
>>>>>> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
>>>>>> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
>>>>>> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
>>>>> <
>>>>> LVL is monotonically decreasing or a page fault is raised.
>>>> What about address translate skipping page table levels?
>>> <
>>> Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
>>> at AMD where we put in level skipping as part of its design. I think CPU tables
>>> and I/OMMU tables should be one and the same, along with the "a pointer is a
>>> pointer" philosophy. I/O device should get virtual address of originating thread,
>>> I/O DMA should be translated through page tables of originating thread, I/O
>>> completion interrupt should be delivered to Guest OS hosting originating thread.
> <
>> One drawback of level skipping though, is that one can either ignore
>> bits (effectively multi-mapping the skipped levels), or assumes them to
>> be zero (doesn't work for randomized addresses), neither of which are
>> ideal in a large+sparse ASLR scenario.
> <
> It is never wise to ignore bits (these always come back to haunt you later),
> and it is always wise to check the reserved-must-be-zero bits for actually
> being zero.
> <
> AMD's canonical address space makes you look for all 1s and all 0s instead
> of just all 0s. Not much added complexity for a useful service.

I was not proposing ignoring these bits.
As I see it, treating these bits as "anything may match" would seriously
weaken the use of ASLR.

>>
>>
>> But, OTOH, 8-level page tables would suck in terms of overhead, lots of
>> pages with only a single entry in use.
>>
>> Though, a hacky option being that could add an alternate mechanism.
>> PTD Entries are one of several options:
>> A page representing another level of the page table;
>> A "Short Table Descriptor".
>>
>> Say, PTD Entry:
>> (63:48): Reserved/MBZ (PTD), Virtual Index (STD)
>> (47:12): Page Address
>> (11: 5): Low Address (STD)
>> ( 4: 2): STD Table Size
>> ( 1: 0): Entry Mode
>>
>> Mode:
>> 00: Unmapped
>> 01: Page Table
>> 10: Reserved
>> 11: STD
>>
>> Table Size:
>> 000: 4 Entries
>> 001: 8 Entries
>> 010: 16 Entries
>> 011: 32 Entries
>> 100: 64 Entries
>> 101: 128 Entries
>> 110: 256 Entries
>> 111: 512 Entries
>> (111 overflows to a full page)
>>
>> These tables allocated on a multiple of 32 bytes, with a single entry
>> table being padded to 4 entries.
>>
>> The STD Entry being basically the same format as a PTD entry, except
>> that the high-order bits represent the index location if this were in a
>> PTD Page (or FFFF for unused entries).
>>
>> Entries in an STD would be kept sorted in ascending order (allowing for
>> binary search).
>>
>> The short tables would not be allowed for the last level of a page-table
>> lookup, which would be required to use a full page.
>>
>>
>> This is less needed in 48-bit mode, because with 16K pages, a 3-level
>> table can be used. The waste due to sparse page tables is likely to be
>> significantly lower.
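[For concreteness, the quoted PTD/STD entry unpacked as a C struct --
this is just the stated bit fields; only the names are invented:]

#include <stdint.h>

typedef struct {
    unsigned mode;    /* ( 1: 0) 00=unmapped, 01=page table, 11=STD */
    unsigned size;    /* ( 4: 2) STD entry count = 4 << size        */
    unsigned low;     /* (11: 5) low address bits (STD)             */
    uint64_t page;    /* (47:12) page / table address               */
    unsigned vindex;  /* (63:48) virtual index (STD), MBZ for a PTD */
} ptd_entry;

static ptd_entry ptd_decode(uint64_t e)
{
    ptd_entry d;
    d.mode   = (unsigned)(e & 0x3);
    d.size   = (unsigned)((e >> 2) & 0x7);
    d.low    = (unsigned)((e >> 5) & 0x7F);
    d.page   = e & 0x0000FFFFFFFFF000ull;
    d.vindex = (unsigned)(e >> 48);
    return d;
}

[An STD's 4 << size entries stay sorted on vindex, so the walker can
binary-search them; size 111 spills over to a full table page, as
noted.]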
>>>>
>>>> A 64-bit address with 4kB pages gives a 6 level tree with the bit fields
>>>>
>>>> level: 6 5 4 3 2 1 0
>>>> size: 7-9-9-9-9-9-12
>>>>
>>>> A two level page table would cover the lower 30 address bits
>>>> which would be enough address space for almost all applications.
>>>>
>>>> An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
>>>> a two level page table covers the lower 33 address bits.
>>> <
>>> This is what I am using: 8K pages, 63-bit VA space; I have not decided
>>> what to do with the top-most bit--yet. I used it a while back to access
>>> foreign address spaces, this has fallen out of favor when I found
>>> indirect accessing where a service provider can access customer
>>> memory via indirection (that is, use customer root pointer and ASID
>>> when making accesses).
>>> <

I guess I can note, 3 levels with 16K pages:
Each page holds 2048 entries, or 11 bits.
3*11+14 = 47
The (High) 8000..FFFF range doesn't use the main page table.

Meanwhile:
7*11+14=91 (Doesn't cover a 96 bit space)
8*11+14=102 (Covers a 96 bit space)

Now, say, 256 1M allocs spread over an address space, each falling in
its own randomized part of the space, with a naive page-table scheme.

3-level:
Uses ~ 8MB of page tables.

8-level:
Uses ~ 29MB of page tables.

It would go up linearly with the number of allocations, so 2048
allocations would eat 232MB of page tables, but should then start
leveling off (very slowly) as the number of 2-entry tables starts to
increase, ...
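[A back-of-envelope model behind those figures -- 16 kB table pages and
the worst-case assumption that every level below the shared root page
is private to one allocation:]

#include <stdio.h>

static double table_bytes(int levels, long allocs)
{
    /* one shared root + (levels-1) private table pages per alloc */
    return (1.0 + (double)allocs * (levels - 1)) * 16384.0;
}

int main(void)
{
    printf("3-level,  256 allocs: %6.1f MB\n", table_bytes(3,  256) / 1048576);
    printf("8-level,  256 allocs: %6.1f MB\n", table_bytes(8,  256) / 1048576);
    printf("8-level, 2048 allocs: %6.0f MB\n", table_bytes(8, 2048) / 1048576);
}

[This prints ~8 MB, ~28 MB, and ~224 MB -- the same ballpark as the
figures above; the exact numbers depend on how often allocation paths
share upper-level pages.]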

The 'STD' scheme could reduce this:
3-level: ~ 4MB
8-level: Also ~ 4MB

Most of the savings would be by reducing 1-entry tables pages from 16K
to 32B. Though, would still expand linearly with the number of
allocations (due mostly to the bottom level page tables).

Managing the page tables as a single giant B-Tree could be somewhat more
space efficient. A B-Tree could reduce 256x 1MB allocs to around 320K
regardless of how they are organized in the address space.

This would grow linearly based on the total number of pages allocated,
in the area of around 20 bytes per page, assuming 128-bit PTEs.

This would roughly double for 96-bit addressing though (requiring a
256-bit PTE), but at 640K, would still be the most space-efficient option.

The depth of the tree would depend on the total number of pages
allocated.

Likely cost is that the computational cost for a B-Tree lookup would be
considerably higher than for a more conventional page-table structure.
Computational cost (and tree depth) would be a logarithmic scale
relative to the total number of pages allocated.

A chain-hash structure could be both space efficient and simpler to work
with. Computational cost would scale linearly with the total number of
pages allocated.

>>>> [snip]


Re: Packing Hash Tables

<448e523a-bd08-414d-8489-f7453f2218edn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24275&group=comp.arch#24275
Newsgroups: comp.arch
 by: robf...@gmail.com - Wed, 16 Mar 2022 04:33 UTC

On Tuesday, March 15, 2022 at 10:52:20 PM UTC-4, BGB wrote:
> On 3/15/2022 6:54 PM, MitchAlsup wrote:
> > On Tuesday, March 15, 2022 at 6:13:06 PM UTC-5, BGB wrote:
> >> On 3/15/2022 1:08 PM, MitchAlsup wrote:
> >>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> >>>> MitchAlsup wrote:
> >>>>> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
> >>>>>> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:
> >>>>>
> >>>>>>> By specifying int:16 as the type. The compiler supports size specs for ints of 8,16,32,64, and 128. Bytes
> >>>>>>> are also accessible as ‘byte’. I think chars are int:16.. I think there is also a bit type to support bit arrays,
> >>>>>>> but this is experimental.
> >>>>>>> int:16 my_index;
> >>>>>>>> PMA = Post Male Anxiety ?
> >>>>>>> PMA = Physical Memory Attributes, stolen from RISCV docs.
> >>>>>> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
> >>>>>> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
> >>>>>> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
> >>>>> <
> >>>>> LVL is monotonically decreasing or a page fault is raised.
> >>>> What about address translate skipping page table levels?
> >>> <
> >>> Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
> >>> at AMD where we put in level skipping as part of its design. I think CPU tables
> >>> and I/OMMU tables should be one and the same, along with the "a pointer is a
> >>> pointer" philosophy. I/O device should get virtual address of originating thread,
> >>> I/O DMA should be translated thought page tables of originating thread, I/O
> >>> completion interrupt should be delivered to Guest OS hosting originating thread.
> > <
> >> One drawback of level skipping though, is that one can either ignore
> >> bits (effectively multi-mapping the skipped levels), or assumes them to
> >> be zero (doesn't work for randomized addresses), neither of which are
> >> ideal in a large+sparse ASLR scenario.
> > <
> > It is never wise to ignore bits (these always come back to haunt you later),
> > and it is always wise to check the reserved-must-be-zero bits for actually
> > being zero.
> > <
> > AMDs canonical address space makes you look for all 1s and all 0s instead
> > of just all 0s. not much added complexity for a useful service.
> I was not proposing ignoring these bits.
> As I see it, treating these bits as "anything may match" would seriously
> weaken the use of ASLR.
> >>
> >>
> >> But, OTOH, 8-level page tables would suck in terms of overhead, lots of
> >> pages with only a single entry in use.
> >>
> >> Though, a hacky option being that could add an alternate mechanism.
> >> PTD Entries are one of several options:
> >> A page representing another level of the page table;
> >> A "Short Table Descriptor".
> >>
> >> Say, PTD Entry:
> >> (63:48): Reserved/MBZ (PTD), Virtual Index (STD)
> >> (47:12): Page Address
> >> (11: 5): Low Address (STD)
> >> ( 4: 2): STD Table Size
> >> ( 1: 0): Entry Mode
> >>
> >> Mode:
> >> 00: Unmapped
> >> 01: Page Table
> >> 10: Reserved
> >> 11: STD
> >>
> >> Table Size:
> >> 000: 4 Entries
> >> 001: 8 Entries
> >> 010: 16 Entries
> >> 011: 32 Entries
> >> 100: 64 Entries
> >> 101: 128 Entries
> >> 110: 256 Entries
> >> 111: 512 Entries
> >> (111 overflows to a full page)
> >>
> >> These tables allocated on a multiple of 32 bytes, with a single entry
> >> table being padded to 4 entries.
> >>
> >> The STD Entry being basically the same format as a PTD entry, except
> >> that the high-order bits represent the index location if this were in a
> >> PTD Page (or FFFF for unused entries).
> >>
> >> Entries in an STD would be kept sorted in ascending order (allowing for
> >> binary search).
> >>
> >> The short tables would not be allowed for the last level of a page-table
> >> lookup, which would be required to use a full page.
> >>
> >>
> >> This is less needed in 48-bit mode, because with 16K pages, a 3-level
> >> table can be used. The waste due to sparse page tables is likely to be
> >> significantly lower.
> >>>>
> >>>> A 64-bit address with 4kB pages gives a 6 level tree with the bit fields
> >>>>
> >>>> level: 6 5 4 3 2 1 0
> >>>> size: 7-9-9-9-9-9-12
> >>>>
> >>>> A two level page table would cover the lower 30 address bits
> >>>> which would be enough address space for almost all applications.
> >>>>
> >>>> An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
> >>>> a two level page table covers the lower 33 address bits.
> >>> <
> >>> This is what I am using: 8K pages 63-bit VA space have not decided
> >>> what to do with the top-most bit--yet. I used it a while back to access
> >>> foreign address spaces, this has fallen out of favor when I found
> >>> indirect accessing where a service provider can access customer
> >>> memory via indirection (that is, use customer root pointer and ASID
> >>> when making accesses).
> >>> <
> I guess I can note, 3 levels with 16K pages:
> Each page holds 2048 entries, or 11 bits.
> 3*11+14 = 47
> The (High) 8000..FFFF range doesn't use the main page table.
>
> Meanwhile:
> 7*11+14=91 (Doesn't cover a 96 bit space)
> 8*11+14=102 (Covers a 96 bit space)
>
>
> Now, say, 256 1M allocs spread over an address space, each falling in
> its own randomized part of the space, with a naive page-table scheme.
>
> 3-level:
> Uses ~ 8MB of page tables.
>
> 8-level:
> Uses ~ 29MB of page tables.
>
>
> It would go up linearly with the number of allocations, so 2048
> allocations would eat 232MB of page tables, but should then start
> leveling off (very slowly) as the number of 2-entry tables starts to
> increase, ...
>
>
>
> The 'STD' scheme could reduce this:
> 3-level: ~ 4MB
> 8-level: Also ~ 4MB
>
> Most of the savings would be by reducing 1-entry tables pages from 16K
> to 32B. Though, would still expand linearly with the number of
> allocations (due mostly to the bottom level page tables).
>
>
>
> Managing the page tables as a single giant B-Tree could be somewhat more
> space efficient. A B-Tree could reduce 256x 1MB allocs to around 320K
> regardless of how they are organized in the address space.
>
>
> This would grow linearly based on the total number of pages allocated,
> in the area of around 20 bytes per page, assuming 128-bit PTEs.
>
> This would roughly double though for 96-bit addressing though (requiring
> a 256-bit PTE), but at 640K, would still be the most space-efficient option.
>
> The depth of the tree would also be linearly dependent on the total
> number of pages allocated.
>
> Likely cost is that the computational cost for a B-Tree lookup would be
> considerably higher than for a more conventional page-table structure.
> Computational cost (and tree depth) would be a logarithmic scale
> relative to the total number of pages allocated.
>
>
> A chain-hash structure could be both space efficient and simpler to work
> with. Computational cost would scale linearly with the total number of
> pages allocated.
> >>>> However I want multiple MMU root registers so it can switch tables
> >>>> quickly so I would change the address bit fields to 4-7-10-10-10-13
> >>>> and the top 4 bits select a HW MMU root[N] register.
> >>> <
> >>> Why not just use the top table<62..53> as 1024 Root pointers ?
> >>> Other than having a <ahem> root pointer point at this page, there
> >>> is no useful difference between a Root pointer and a PTP; they
> >>> both point at a page of self describing descriptors {PTPs or PTEs}
> >>> <
> >>> Each of these top level Root pointers has an LVL field which
> >>> describes which layers of the address space are skipped; or
> >>> equivalently the size of that virtual address space.
> >>> <
> >>>> Again the lower two page table levels are usually sufficient.
> >>> <
> >>> your 7-bit table only uses 1/8 of the allocated page.
> >>> So, I don't see an advantage of 16 root pointers has over 1024 root
> >>> pointers.
> >>>>
> >>>> I want to be able to skip from a MMU root register to page table level 2.
> >>> <
> >>> Root.LVL = 011
> >>> <
> >>> You can also "run" 'cat' with a 23-bit VA space Root.LVL = 010.
> >>> Root->page_or_ptes[va<22..12>] gives you the PTE.LVL = 001, without any
> >>> PTP levels in the table.
> >>> <
> >>>> Also because page table pages can be reused at multiple levels
> >>>> the skip count should be relative.
> >>> <
> >>> This adds arithmetic (albeit small adders) in the MMU function. Why can
> >>> this not be determined when the tables are setup?
> >>> <
> >>>> Also I think there could be multiple skips in a walk, say a skip
> >>>> from root to 4, and 4 to 2. At least I see no reason to disallow it.
> >>> <
> >>> Nor did I.
> >>> <
> >>>>
> >>>> Each PTE has a field giving the relative number of levels to skip
> >>>> in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
> >>>> field allows it to skip directly from my HW root[N] register to level-2.
> >>> <
> >>> I use a 3-bit absolute next level encoding without arithmetic.
> >>> Root.LVL determines the maximal size of the virtual address space.
> >>> PTP.LVL determines the next level in the table
> >>> previous PTP.LVL -> PTE.LVL = 001 determines the size of the page
> >>> {001->PTE, 010->8K, 011->8M, 100->8G, 101->8T, 110->8E}
> >>>>
> >>>> Another issue is what the fill values should be that were skipped.
> >>>> MMU can assume the the address bits skipped were all 0's,
> >>>> or the PTE could have a bit to prescribe the skipped value as 0 or 1..
> >>> <
> >>> I am following the rules of canonicality AMD uses when skipping levels
> >>> in the I/O MMU.
> >>>>
> >>>> There are also the page size selection field to consider.
> >>>> With 8kB as the base page size, that give multiples that are
> >>>> 13 = 8kB, 23 = 8 MB, 33 = 8 GB, 43 = 8 TB.
> >>>> Four sizes should be sufficient, which requires 1 PTE bit at each level.
> >>> <
> >>> It is not a bit, it is part of an encoding. In any event, you end up with
> >>> 5 (or 6) states, and this always requires 3-bits to encode. 2-bits for
> >>> LVL and 1 bit for PTE is not usefully different than 3-bits for LVL
> >>> with one encoding for PTE. The 3-bit encoding can also provide for
> >>> PA=VA using another encoding.
> >>> <
> >>>> That allows the HW root to skip directly to level-2 where the
> >>>> PTE specifies a large page size of 8 MB.
> >>> <
> >>> Root.LVL = 010
> >>> page_of_PTEs[va<32..23>] gives you the 8MB PTE.LVL = 001; the
> >>> 10-unused bits in the PTE specify the size of the 8MB mapping
> >>> {16K,24K,...8M-8K,8M} so you don't have to allocate an entire 8M
> >>> to this mapping. And so on as PTE maps get larger.
> >>>>
> >>>> Together this allows virtual address translate in a single memory access
> >>>> in address space sizes up to 8 GB!
> >>> <
> >>> I started here about 2006.


Re: Packing Hash Tables

<a4bb1e3e-d311-4dac-90d8-4008ad217105n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24276&group=comp.arch#24276
Newsgroups: comp.arch
 by: robf...@gmail.com - Wed, 16 Mar 2022 05:06 UTC

On Wednesday, March 16, 2022 at 12:33:12 AM UTC-4, robf...@gmail.com wrote:
> On Tuesday, March 15, 2022 at 10:52:20 PM UTC-4, BGB wrote:
> > On 3/15/2022 6:54 PM, MitchAlsup wrote:
> > > On Tuesday, March 15, 2022 at 6:13:06 PM UTC-5, BGB wrote:
> > >> On 3/15/2022 1:08 PM, MitchAlsup wrote:
> > >>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> > >>>> MitchAlsup wrote:
> > >>>>> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
> > >>>>>> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:
> > >>>>>
> > >>>>>>> By specifying int:16 as the type. The compiler supports size specs for ints of 8,16,32,64, and 128. Bytes
> > >>>>>>> are also accessible as ‘byte’. I think chars are int:16.. I think there is also a bit type to support bit arrays,
> > >>>>>>> but this is experimental.
> > >>>>>>> int:16 my_index;
> > >>>>>>>> PMA = Post Male Anxiety ?
> > >>>>>>> PMA = Physical Memory Attributes, stolen from RISCV docs.
> > >>>>>> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
> > >>>>>> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
> > >>>>>> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
> > >>>>> <
> > >>>>> LVL is monotonically decreasing or a page fault is raised.
> > >>>> What about address translate skipping page table levels?
> > >>> <
> > >>> Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
> > >>> at AMD where we put in level skipping as part of its design. I think CPU tables
> > >>> and I/OMMU tables should be one and the same, along with the "a pointer is a
> > >>> pointer" philosophy. I/O device should get virtual address of originating thread,
> > >>> I/O DMA should be translated thought page tables of originating thread, I/O
> > >>> completion interrupt should be delivered to Guest OS hosting originating thread.
> > > <
> > >> One drawback of level skipping though, is that one can either ignore
> > >> bits (effectively multi-mapping the skipped levels), or assumes them to
> > >> be zero (doesn't work for randomized addresses), neither of which are
> > >> ideal in a large+sparse ASLR scenario.
> > > <
> > > It is never wise to ignore bits (these always come back to haunt you later),
> > > and it is always wise to check the reserved-must-be-zero bits for actually
> > > being zero.
> > > <
> > > AMDs canonical address space makes you look for all 1s and all 0s instead
> > > of just all 0s. not much added complexity for a useful service.
> > I was not proposing ignoring these bits.
> > As I see it, treating these bits as "anything may match" would seriously
> > weaken the use of ASLR.
> > >>
> > >>
> > >> But, OTOH, 8-level page tables would suck in terms of overhead, lots of
> > >> pages with only a single entry in use.
> > >>
> > >> Though, a hacky option being that could add an alternate mechanism.
> > >> PTD Entries are one of several options:
> > >> A page representing another level of the page table;
> > >> A "Short Table Descriptor".
> > >>
> > >> Say, PTD Entry:
> > >> (63:48): Reserved/MBZ (PTD), Virtual Index (STD)
> > >> (47:12): Page Address
> > >> (11: 5): Low Address (STD)
> > >> ( 4: 2): STD Table Size
> > >> ( 1: 0): Entry Mode
> > >>
> > >> Mode:
> > >> 00: Unmapped
> > >> 01: Page Table
> > >> 10: Reserved
> > >> 11: STD
> > >>
> > >> Table Size:
> > >> 000: 4 Entries
> > >> 001: 8 Entries
> > >> 010: 16 Entries
> > >> 011: 32 Entries
> > >> 100: 64 Entries
> > >> 101: 128 Entries
> > >> 110: 256 Entries
> > >> 111: 512 Entries
> > >> (111 overflows to a full page)
> > >>
> > >> These tables allocated on a multiple of 32 bytes, with a single entry
> > >> table being padded to 4 entries.
> > >>
> > >> The STD Entry being basically the same format as a PTD entry, except
> > >> that the high-order bits represent the index location if this were in a
> > >> PTD Page (or FFFF for unused entries).
> > >>
> > >> Entries in an STD would be kept sorted in ascending order (allowing for
> > >> binary search).
> > >>
> > >> The short tables would not be allowed for the last level of a page-table
> > >> lookup, which would be required to use a full page.
> > >>
> > >>
> > >> This is less needed in 48-bit mode, because with 16K pages, a 3-level
> > >> table can be used. The waste due to sparse page tables is likely to be
> > >> significantly lower.
> > >>>>
> > >>>> A 64-bit address with 4kB pages gives a 6 level tree with the bit fields
> > >>>>
> > >>>> level: 6 5 4 3 2 1 0
> > >>>> size: 7-9-9-9-9-9-12
> > >>>>
> > >>>> A two level page table would cover the lower 30 address bits
> > >>>> which would be enough address space for almost all applications.
> > >>>>
> > >>>> An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
> > >>>> a two level page table covers the lower 33 address bits.
> > >>> <
> > >>> This is what I am using: 8K pages 63-bit VA space have not decided
> > >>> what to do with the top-most bit--yet. I used it a while back to access
> > >>> foreign address spaces, this has fallen out of favor when I found
> > >>> indirect accessing where a service provider can access customer
> > >>> memory via indirection (that is, use customer root pointer and ASID
> > >>> when making accesses).
> > >>> <
> > I guess I can note, 3 levels with 16K pages:
> > Each page holds 2048 entries, or 11 bits.
> > 3*11+14 = 47
> > The (High) 8000..FFFF range doesn't use the main page table.
> >
> > Meanwhile:
> > 7*11+14=91 (Doesn't cover a 96 bit space)
> > 8*11+14=102 (Covers a 96 bit space)
> >
> >
> > Now, say, 256 1M allocs spread over an address space, each falling in
> > its own randomized part of the space, with a naive page-table scheme.
> >
> > 3-level:
> > Uses ~ 8MB of page tables.
> >
> > 8-level:
> > Uses ~ 29MB of page tables.
> >
> >
> > It would go up linearly with the number of allocations, so 2048
> > allocations would eat 232MB of page tables, but should then start
> > leveling off (very slowly) as the number of 2-entry tables starts to
> > increase, ...
> >
> >
> >
> > The 'STD' scheme could reduce this:
> > 3-level: ~ 4MB
> > 8-level: Also ~ 4MB
> >
> > Most of the savings would be by reducing 1-entry tables pages from 16K
> > to 32B. Though, would still expand linearly with the number of
> > allocations (due mostly to the bottom level page tables).
> >
> >
> >
> > Managing the page tables as a single giant B-Tree could be somewhat more
> > space efficient. A B-Tree could reduce 256x 1MB allocs to around 320K
> > regardless of how they are organized in the address space.
> >
> >
> > This would grow linearly based on the total number of pages allocated,
> > in the area of around 20 bytes per page, assuming 128-bit PTEs.
> >
> > This would roughly double though for 96-bit addressing though (requiring
> > a 256-bit PTE), but at 640K, would still be the most space-efficient option.
> >
> > The depth of the tree would also be linearly dependent on the total
> > number of pages allocated.
> >
> > Likely cost is that the computational cost for a B-Tree lookup would be
> > considerably higher than for a more conventional page-table structure.
> > Computational cost (and tree depth) would be a logarithmic scale
> > relative to the total number of pages allocated.
> >
> >
> > A chain-hash structure could be both space efficient and simpler to work
> > with. Computational cost would scale linearly with the total number of
> > pages allocated.
> > >>>> However I want multiple MMU root registers so it can switch tables
> > >>>> quickly so I would change the address bit fields to 4-7-10-10-10-13
> > >>>> and the top 4 bits select a HW MMU root[N] register.
> > >>> <
> > >>> Why not just use the top table<62..53> as 1024 Root pointers ?
> > >>> Other than having a <ahem> root pointer point at this page, there
> > >>> is no useful difference between a Root pointer and a PTP; they
> > >>> both point at a page of self describing descriptors {PTPs or PTEs}
> > >>> <
> > >>> Each of these top level Root pointers has an LVL field which
> > >>> describes which layers of the address space are skipped; or
> > >>> equivalently the size of that virtual address space.
> > >>> <
> > >>>> Again the lower two page table levels are usually sufficient.
> > >>> <
> > >>> your 7-bit table only uses 1/8 of the allocated page.
> > >>> So, I don't see an advantage of 16 root pointers has over 1024 root
> > >>> pointers.
> > >>>>
> > >>>> I want to be able to skip from a MMU root register to page table level 2.
> > >>> <
> > >>> Root.LVL = 011
> > >>> <
> > >>> You can also "run" 'cat' with a 23-bit VA space Root.LVL = 010.
> > >>> Root->page_or_ptes[va<22..12>] gives you the PTE.LVL = 001, without any
> > >>> PTP levels in the table.
> > >>> <
> > >>>> Also because page table pages can be reused at multiple levels
> > >>>> the skip count should be relative.
> > >>> <
> > >>> This adds arithmetic (albeit small adders) in the MMU function. Why can
> > >>> this not be determined when the tables are setup?
> > >>> <
> > >>>> Also I think there could be multiple skips in a walk, say a skip
> > >>>> from root to 4, and 4 to 2. At least I see no reason to disallow it.
> > >>> <
> > >>> Nor did I.
> > >>> <
> > >>>>
> > >>>> Each PTE has a field giving the relative number of levels to skip
> > >>>> in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
> > >>>> field allows it to skip directly from my HW root[N] register to level-2.
> > >>> <
> > >>> I use a 3-bit absolute next level encoding without arithmetic.
> > >>> Root.LVL determines the maximal size of the virtual address space.
> > >>> PTP.LVL determines the next level in the table
> > >>> previous PTP.LVL -> PTE.LVL = 001 determines the size of the page
> > >>> {001->PTE, 010->8K, 011->8M, 100->8G, 101->8T, 110->8E}
> > >>>>
> > >>>> Another issue is what the fill values should be that were skipped.
> > >>>> MMU can assume that the address bits skipped were all 0's,
> > >>>> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
> > >>> <
> > >>> I am following the rules of canonicality AMD uses when skipping levels
> > >>> in the I/O MMU.
> > >>>>
> > >>>> There are also the page size selection field to consider.
> > >>>> With 8kB as the base page size, that gives multiples that are
> > >>>> 2^13 = 8 kB, 2^23 = 8 MB, 2^33 = 8 GB, 2^43 = 8 TB.
> > >>>> Four sizes should be sufficient, which requires 1 PTE bit at each level.
> > >>> <
> > >>> It is not a bit, it is part of an encoding. In any event, you end up with
> > >>> 5 (or 6) states, and this always requires 3-bits to encode. 2-bits for
> > >>> LVL and 1 bit for PTE is not usefully different than 3-bits for LVL
> > >>> with one encoding for PTE. The 3-bit encoding can also provide for
> > >>> PA=VA using another encoding.
> > >>> <
> > >>>> That allows the HW root to skip directly to level-2 where the
> > >>>> PTE specifies a large page size of 8 MB.
> > >>> <
> > >>> Root.LVL = 010
> > >>> page_of_PTEs[va<32..23>] gives you the 8MB PTE.LVL = 001; the
> > >>> 10-unused bits in the PTE specify the size of the 8MB mapping
> > >>> {16K,24K,...8M-8K,8M} so you don't have to allocate an entire 8M
> > >>> to this mapping. And so on as PTE maps get larger.
> > >>>>
> > >>>> Together this allows virtual address translate in a single memory access
> > >>>> in address space sizes up to 8 GB!
> > >>> <
> > >>> I started here about 2006.
> I am preferring hash tables over hierarchical tables. They are easier to deal with
> in some ways. They do not need to be processed with power-of-two addressing, so
> the size of entries can be adjusted to suit.
>
> I am wondering though how often pages of memory are shared?
>
> Using a simple hash table where the table index corresponds to the physical
> address, four virtual address entries could be supported at the same cost as
> using a complex hash table that stores both the physical and virtual addresses
> and doubles the size of the table to allow for some sharing. But with the simple
> table, sharing would be limited to four shares per page. To get more shares of a
> page, pages would need to be replicated at a different physical address. But it
> seems like it could work.
>
> Big users of shared memory are probably the OS and other libraries and frameworks.
>
> Making copies might be one way to help with system integrity. If something
> goes amiss with one copy of the OS, other copies are not affected.
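A sketch of the "simple" physically indexed table described above, under the assumption that the hash of the virtual page number selects the frame entry, so a matching index itself is the physical page number (all names hypothetical):

    #include <stdint.h>

    #define NFRAMES (1u << 20)       /* one entry per physical page */
    #define SHARES_PER_PAGE 4

    /* Each entry records up to four VPNs that map to this frame. */
    typedef struct {
        uint64_t vpn[SHARES_PER_PAGE];   /* 0 = empty slot */
    } frame_entry;

    static frame_entry frame[NFRAMES];

    static uint64_t hash_vpn(uint64_t vpn) {
        return (vpn * 0x9E3779B97F4A7C15ull) & (NFRAMES - 1);
    }

    /* Returns the PPN, or -1 on a miss. A fifth sharer of a frame
       cannot be recorded; the page would have to be replicated. */
    int64_t translate(uint64_t vpn) {
        uint64_t idx = hash_vpn(vpn);
        for (int i = 0; i < SHARES_PER_PAGE; i++)
            if (frame[idx].vpn[i] == vpn)
                return (int64_t)idx;
        return -1;
    }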


Re: Packing Hash Tables

<t0s13k$br0$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24279&group=comp.arch#24279
 by: Thomas Koenig - Wed, 16 Mar 2022 06:46 UTC

robf...@gmail.com <robfi680@gmail.com> wrote:

> I am wondering though how often pages of memory are shared?

Shared libraries are a prime example. On my Linux system, there
are currently 372 processes (bletch), and I strongly suspect every
one links against libc, which has a size of around 2 Megabytes.

Another is multithreaded applications. Anything using pthreads
uses shared memory between its threads (realized in Linux by
setting the right flags to clone(2) or clone3(2)).

Pretty ubiquitous, I would say.

Re: Packing Hash Tables

<82e9f378-f1e5-4482-ab6d-aa532bcac4cfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24280&group=comp.arch#24280
 by: robf...@gmail.com - Wed, 16 Mar 2022 07:32 UTC

On Wednesday, March 16, 2022 at 2:46:15 AM UTC-4, Thomas Koenig wrote:
> robf...@gmail.com <robf...@gmail.com> wrote:
> > I am wondering though how often pages of memory are shared?
> Shared libraries are a prime example. On my Linux system, there
> are currently 372 processes (bletch), and I strongly suspect every
> one links against libc, which has a size of around 2 Megabytes.
>
> Another is multithreaded applications. Anything using pthreads
> uses shared memory between its threads (realized in Linux by
> setting the right flags to clone(2) or clone3(2)).
>
> Pretty ubiquitous, I would say.

I was looking at the Linux docs and found a spot where it mentions a process
that goes around looking for pages that can be shared between processes. Apparently
there was a limit that could be set, with a minimum of two but no maximum
mentioned. I would think for integrity reasons there would be a limit. What if libc
gets corrupted? Suddenly 372 processes do not work?

Came up with a novel idea, I think. Added six-bit base and limit fields to the PTE.
They work by dividing up the page into 64 zones and allowing a PTE to use only
part of a page. So, multiple virtual addresses may share the same physical page
without being able to access the data outside of the zones defined by the base and
limit values. The base field is added to virtual address bits 10 to 15; the addition
rotates the data in the selected zone into view. The limit
field determines how many consecutive zones are visible. For instance, with a 64kB
page size the page is divided into 1kB zones. Selecting a base of 3 will rotate the
virtual address so that zone 3 appears at the zero position. Selecting a limit of 3 will
make four 1kB blocks visible. Selecting a base of zero and a limit of 63 will make the
entire page available.
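In code, the zone check and rotation would look something like this (a sketch assuming a 64 kB page split into 64 x 1 kB zones; field names are invented):

    #include <stdint.h>

    typedef struct { uint64_t ppn; uint8_t base, limit; } zone_pte;

    /* Returns the physical byte offset within the page, or -1 on a
       protection fault (access beyond the visible zones). */
    int64_t zone_offset(const zone_pte *pte, uint64_t va) {
        uint64_t zone = (va >> 10) & 0x3F;             /* VA bits 15..10 */
        if (zone > pte->limit)
            return -1;                                 /* outside window */
        uint64_t rotated = (zone + pte->base) & 0x3F;  /* mod-64 rotate */
        return (int64_t)((rotated << 10) | (va & 0x3FF));
    }

With base = 3 and limit = 3, virtual zones 0..3 are visible and land on physical zones 3..6, matching the example above.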

Re: Packing Hash Tables

<cqmYJ.150006$LN2.92082@fx13.iad>

https://www.novabbs.com/devel/article-flat.php?id=24281&group=comp.arch#24281
 by: EricP - Wed, 16 Mar 2022 14:23 UTC

MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
>>>> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:
>>>>> By specifying int:16 as the type. The compiler supports size specs for ints of 8,16,32,64, and 128. Bytes
>>>>> are also accessible as ‘byte’. I think chars are int:16.. I think there is also a bit type to support bit arrays,
>>>>> but this is experimental.
>>>>> int:16 my_index;
>>>>>> PMA = Post Male Anxiety ?
>>>>> PMA = Physical Memory Attributes, stolen from RISCV docs.
>>>> Borrowed the level idea for my PT. I was just wondering what happens when a page at the same LVL as the
>>>> current is pointed to? Seems like it could form a linked list or a loop, or is a fault generated? At the moment
>>>> I have the code forcing the LVL down a level if it is the same or higher than the current LVL.
>>> <
>>> LVL is monotonically decreasing or a page fault is raised.
>> What about address translate skipping page table levels?
> <
> Skipping levels is, in my opinion, a necessity. I was part of the I/OMMU team
> at AMD where we put in level skipping as part of its design. I think CPU tables
> and I/OMMU tables should be one and the same, along with the "a pointer is a
> pointer" philosophy. I/O device should get virtual address of originating thread,
> I/O DMA should be translated through page tables of originating thread, I/O
> completion interrupt should be delivered to Guest OS hosting originating thread.
>> A 64-bit address with 4kB pages gives a 6 level tree with the bit fields
>>
>> level: 6 5 4 3 2 1 0
>> size: 7-9-9-9-9-9-12
>>
>> A two level page table would cover the lower 30 address bits
>> which would be enough address space for almost all applications.
>>
>> An 8kB page gives a 5 level tree 1-10-10-10-10-13 and
>> a two level page table covers the lower 33 address bits.
> <
> This is what I am using: 8K pages 63-bit VA space have not decided
> what to do with the top-most bit--yet. I used it a while back to access
> foreign address spaces, this has fallen out of favor when I found
> indirect accessing where a service provider can access customer
> memory via indirection (that is, use customer root pointer and ASID
> when making accesses).

Originally I had the 1 high address bit selecting between two
root HW registers. The intent was that the process address space
page table was pointed to by root[0] and kernel by root[1].
Switching processes required writing just root[0].

However, later I wanted Block Address Translate (BAT) in addition
to page table translates, and I needed multiples of them.
16 entries seems sufficient for now.
Each MMU register entry can be a page table root, or a BAT root.
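A sketch of that 4-7-10-10-10-13 decomposition with 16 root registers (field positions are inferred from the bit counts; the root register layout is just a placeholder):

    #include <stdint.h>

    typedef struct {
        uint64_t base_pa;   /* table root or BAT root; format is up to HW */
        uint8_t  skip;      /* levels to skip from the root */
        uint8_t  is_bat;
    } mmu_root;

    static mmu_root root[16];

    static unsigned root_sel(uint64_t va) { return (va >> 60) & 0xF;  } /* 4 bits */
    static unsigned lvl5_idx(uint64_t va) { return (va >> 53) & 0x7F; } /* 7 bits */
    static unsigned lvl_idx(uint64_t va, int lvl) {   /* levels 4..1, 10 bits each */
        return (va >> (13 + 10 * (lvl - 1))) & 0x3FF;
    }
    static unsigned page_off(uint64_t va) { return va & 0x1FFF; }      /* 13 bits */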

> <
>> However I want multiple MMU root registers so it can switch tables
>> quickly so I would change the address bit fields to 4-7-10-10-10-13
>> and the top 4 bits select a HW MMU root[N] register.
> <
> Why not just use the top table<62..53> as 1024 Root pointers ?
> Other than having a <ahem> root pointer point at this page, there
> is no useful difference between a Root pointer and a PTP; they
> both point at a page of self describing descriptors {PTPs or PTEs}

A bunch of reasons.
The root is the functional equivalent of the x86 CR3 register,
and as such it can have any format or size.
It is not constrained to fit nicely into a 64-bit slot.

I want to be able to switch the processes address space with just
a few HW register writes. Munging a bunch of stuff together in a table
means that large chunks of the table have to be edited to switch.

I want BAT entries to be self contained in HW registers that don't
have to access memory to load a descriptor.

> <
> Each of these top level Root pointers has an LVL field which
> describes which layers of the address space are skipped; or
> equivalently the size of that virtual address space.

Yes. The HW root pointer register can have any format - it does not
have to match a PTE layout - so the skip count field can be any size.

If I also want to support PTE's that can specify a skip count then
a 2-bit field should be sufficient. But PTEs are already pretty full.

> <
>> Again the lower two page table levels are usually sufficient.
> <
> your 7-bit table only uses 1/8 of the allocated page.

The top level table of the page table doesn't have to be 8 kB.

Here it is a 1 kB table aligned on a 1 kB boundary.
The root register physical address has sufficient bits to locate
that table on any 1 kB boundary, such as inside a process header.

> So, I don't see an advantage that 16 root pointers have over 1024 root
> pointers.

Because if it is stored in memory then the format needs to be defined.
If the root is a HW register it can have any format to suit its function.
A PTE is 64 bits but a BAT descriptor is 3*64=192 bits.

1024 entries would have to be loaded from memory dynamically.
I didn't want that as it implies a hardware predefined memory format.
I want the root registers written by code so the OS can decide where
and how to store this information.

I want the number to be small so process address space switch is fast,
usually one or two register writes.

>> I want to be able to skip from a MMU root register to page table level 2.
> <
> Root.LVL = 011

Yes, a skip count of 3 skips over levels 5,4,3 to 2.
Level-2 is sufficient as root doesn't need to point directly at
a 8 kB level-1 PTP mapping 8 MB or a single 8 MB data page.
8 MB is really too small to be useful.

> <
> You can also "run" 'cat' with a 23-bit VA space Root.LVL = 010.
> Root->page_or_ptes[va<22..12>] gives you the PTE.LVL = 001, without any
> PTP levels in the table.

That would be a skip count of 4 to skip to level-1.
The root would have to have a "large page" bit to indicate either
a 8 kB PTP pointing to up to 8 MB of small pages, or 1 large 8 MB page.

An 8 MB virtual area is too small to be useful.
I didn't think that was worth the extra trouble.

> <
>> Also because page table pages can be reused at multiple levels
>> the skip count should be relative.
> <
> This adds arithmetic (albeit small adders) in the MMU function. Why can
> this not be determined when the tables are set up?

It is only used by the HW tablewalker and SW page fault handler,
and it really only decides what the next state of a state machine is.
That looks more like a MUX than an adder.

> <
>> Also I think there could be multiple skips in a walk, say a skip
>> from root to 4, and 4 to 2. At least I see no reason to disallow it.
> <
> Nor did I.

I'm kind of on the fence about this.

MMU has to support skip from root to a lower level, but the root bits
are stored in a HW register independent of the PTE format.

These interior skip counts however require 2 PTE bits
and the PTE is already pretty full.

And I'm not convinced that an OS would make use of these
interior skip counts if they were present.

> <
>> Each PTE has a field giving the relative number of levels to skip
>> in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
>> field allows it to skip directly from my HW root[N] register to level-2.
> <
> I use a 3-bit absolute next level encoding without arithmetic.
> Root.LVL determines the maximal size of the virtual address space.
> PTP.LVL determines the next level in the table
> previous PTP.LVL -> PTE.LVL = 001 determines the size of the page
> {001->PTE, 010->8K, 011->8M, 100->8G, 101->8T, 110->8E}

I considered absolute too but what made me decide for relative skip count
was the potential for page table pages to be reused at multiple levels
by the table management routines, as I describe in a different msg.

Absolute LVL breaks with that, relative doesn't break.
And this only matters to the tablewalker state machine.

Anyway I'm not convinced an OS could use multiple skips if they were present.
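A sketch of what the relative-skip decision looks like in the walker, assuming a 2-bit skip field as above (the root's own skip field can be wider); this is really a small table of next states rather than general arithmetic:

    /* Next table level, counting down; the root behaves as level 6 here.
       A result below 1 means the entry skipped past the PTE level. */
    typedef struct { unsigned skip : 2; unsigned present : 1; } walk_entry;

    int next_level(int cur_level, walk_entry e) {
        int next = cur_level - 1 - (int)e.skip;   /* relative skip */
        return (next >= 1) ? next : -1;           /* -1 => malformed, fault */
    }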

>> Another issue is what the fill values should be that were skipped.
>> MMU can assume that the address bits skipped were all 0's,
>> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
> <
> I am following the rules of canonicality AMD uses when skipping levels
> in the I/O MMU.

I didn't know AMD IOMMU did skips. I'll look at it.

>> There are also the page size selection field to consider.
>> With 8kB as the base page size, that gives multiples that are
>> 2^13 = 8 kB, 2^23 = 8 MB, 2^33 = 8 GB, 2^43 = 8 TB.
>> Four sizes should be sufficient, which requires 1 PTE bit at each level.
> <
> It is not a bit, it is part of an encoding. In any event, you end up with
> 5 (or 6) states, and this always requires 3-bits to encode. 2-bits for
> LVL and 1 bit for PTE is not usefully different than 3-bits for LVL
> with one encoding for PTE. The 3-bit encoding can also provide for
> PA=VA using another encoding.


Re: Packing Hash Tables

<6d38f64c-ff74-45f9-9ee5-3d3619518d15n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24283&group=comp.arch#24283
 by: MitchAlsup - Wed, 16 Mar 2022 15:57 UTC

On Tuesday, March 15, 2022 at 9:46:07 PM UTC-5, robf...@gmail.com wrote:
> On Tuesday, March 15, 2022 at 10:24:03 PM UTC-4, EricP wrote:
> > BGB wrote:
> > > On 3/15/2022 1:08 PM, MitchAlsup wrote:
> > >> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
> > >>> What about address translate skipping page table levels?
> > >> <
> > >> Skipping levels is, in my opinion, a necessity. I was part of the
> > >> I/OMMU team
> > >> at AMD where we put in level skipping as part of its design. I think
> > >> CPU tables
> > >> and I/OMMU tables should be one and the same, along with the "a
> > >> pointer is a
> > >> pointer" philosophy. I/O device should get virtual address of
> > >> originating thread,
> > >> I/O DMA should be translated through page tables of originating
> > >> thread, I/O
> > >> completion interrupt should be delivered to Guest OS hosting
> > >> originating thread.
> > >
> > > One drawback of level skipping though, is that one can either ignore
> > > bits (effectively multi-mapping the skipped levels), or assume them to
> > > be zero (doesn't work for randomized addresses), neither of which are
> > > ideal in a large+sparse ASLR scenario.
> > As I mention below, it can assume the missing values are 0's
> > or have a PTE bit to specify the value.
> >
> > A PTE bit allows one to differentiate between whether the
> > page is part of a grow-up or grow-down memory section.
> > E.g. heap areas expand upward, stack expand downwards.
> > >>> Another issue is what the fill values should be that were skipped.
> > >>> MMU can assume that the address bits skipped were all 0's,
> > >>> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
> > >> <
> > >> I am following the rules of canonicality AMD uses when skipping levels
> > >> in the I/O MMU.
> > More feature bits also put more pressure on the PTE, which is already full.
>
> >A PTE bit allows one to differentiate between whether the
> >page is part of a grow-up or grow-down memory section.
> >E.g. heap areas expand upward, stack expand downwards.
<
> I wonder if the address could be sign-extended, or multi-bit sign-extended?
> One may want to have ROM / IO mapped to virtual addresses that are the
> same as physical ones, high in memory.
<
I put ROM in its own 64-bit physical address space.
I put MMI/O in its own 64-bit physical address space.
I put configuration in its own 64-bit physical address space.
Leaving all 64-bits for real memory (DRAM).
<
> >So, what they need is a quick way of getting an address to the failing PTP/PTE.
> I am reminded of a trace cache. Something similar could be used.
>
> It would be easier if a link back to the parent were in a PTP. Since the failing PTE
> / PTP is known, back links could be followed. I wonder if an empty entry in the
> PTP could be used as a back-link. Is it faster to scan a page for a back-link or use
> other means? If there were extra room in the PTP a link could be added in the
> header.
<
Linux uses struct_page (about 2 cache lines per entry) to keep track of "all the
states a page can get into": so struct_page[PPN] gets you to everything you need
to know about a page in 1 layer of indirection.
<
Since HW makes no claims to the bit patterns when P = 0 (not present), the PPN of this
entry should be pointing at what this non-present PTP would have been pointing
at were it present.
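The one-indirection lookup Mitch describes, sketched with made-up field names (Linux's real struct page is considerably more involved):

    /* Per-frame bookkeeping, indexed directly by physical page number. */
    struct page_state {
        unsigned long flags;     /* dirty, locked, ... */
        int refcount;            /* mappings sharing the frame */
        void *owner;             /* file / anonymous region backing it */
    };

    extern struct page_state struct_page[];   /* one entry per PPN */

    static inline struct page_state *page_info(unsigned long ppn) {
        return &struct_page[ppn];   /* struct_page[PPN], one indirection */
    }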

Re: Packing Hash Tables

<t0t1e3$1ic$1@newsreader4.netcologne.de>

https://www.novabbs.com/devel/article-flat.php?id=24284&group=comp.arch#24284
 by: Thomas Koenig - Wed, 16 Mar 2022 15:57 UTC

robf...@gmail.com <robfi680@gmail.com> wrote:
> On Wednesday, March 16, 2022 at 2:46:15 AM UTC-4, Thomas Koenig wrote:
>> robf...@gmail.com <robf...@gmail.com> wrote:
>> > I am wondering though how often pages of memory are shared?
>> Shared libraries are a prime example. On my Linux system, there
>> are currently 372 processes (bletch), and I strongly suspect every
>> one links against libc, which has a size of around 2 Megabytes.
>>
>> Another is multithreaded applications. Anything using pthreads
>> uses shared memory between its threads (realized in Linux by
>> setting the right flags to clone(2) or clone3(2)).
>>
>> Pretty ubiquitous, I would say.

.... plus I forgot multiple invocations of the same binary;
the static code should also be shared.

> I was looking at the Linux docs and found a spot where it mentions a process
> that goes around looking for pages that can be shared between processes.

This is mostly for virtual machines which share lots of pages.
It is also necessary to enable this in the kernel; the
memory has to be obtained via private anonymous mmap(2) using
MAP_PRIVATE|MAP_ANONYMOUS, and the process in question has to use
madvise(2) with MADV_MERGEABLE.

Can be useful, but only in rather specialized circumstances.
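For the curious, the opt-in sequence looks like this (a minimal sketch; it needs a kernel built with CONFIG_KSM and the ksmd daemon switched on via /sys/kernel/mm/ksm/run):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 64 << 20;    /* 64 MB of private anonymous memory */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Mark the region mergeable; ksmd coalesces identical pages. */
        if (madvise(p, len, MADV_MERGEABLE) != 0)
            perror("madvise(MADV_MERGEABLE)");
        return 0;
    }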

Re: Packing Hash Tables

<50bb99b1-aa80-4c9b-b417-42845ad678ffn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24285&group=comp.arch#24285
 by: MitchAlsup - Wed, 16 Mar 2022 16:04 UTC

On Tuesday, March 15, 2022 at 11:33:12 PM UTC-5, robf...@gmail.com wrote:
> On Tuesday, March 15, 2022 at 10:52:20 PM UTC-4, BGB wrote:
> > On 3/15/2022 6:54 PM, MitchAlsup wrote:

> > >>> I started here about 2006.
> I am preferring hash tables over hierarchical tables. They are easier to deal with
> in some ways. They do not need to be processed with power-of-two addressing, so
> the size of entries can be adjusted to suit.
<
I went through a phase like that circa 1993.
>
> I am wondering though how often pages of memory are shared?
<
You should consider::
all code is shared
all constant data is shared
Many files are shared (mmaped)
In Linux everything starts out as shared and loses this notion by Copy-on-write.
>
> Using a simple hash table where the table index corresponds to the physical
> address, four virtual address entries could be supported at the same cost as
> using a complex hash table that stores both the physical and virtual addresses
> and doubles the size of the table to allow for some sharing. But with the simple
> table, sharing would be limited to four shares per page. To get more shares of a
> page, pages would need to be replicated at a different physical address. But it
> seems like it could work.
>
> Big users of shared memory are probably the OS and other libraries and frameworks.
>
> Making copies might be one way to help with system integrity. If something
> goes amiss with one copy of the OS, other copies are not affected.
<
HyperVisors allow one Guest OS to crash and have running applications migrate to
another Guest OS which has not crashed.

Re: Packing Hash Tables

<2ba33ba9-c381-4250-b0be-99488c950db0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24286&group=comp.arch#24286
 by: MitchAlsup - Wed, 16 Mar 2022 16:10 UTC

On Wednesday, March 16, 2022 at 2:32:09 AM UTC-5, robf...@gmail.com wrote:
> On Wednesday, March 16, 2022 at 2:46:15 AM UTC-4, Thomas Koenig wrote:
> > robf...@gmail.com <robf...@gmail.com> wrote:
> > > I am wondering though how often pages of memory are shared?
> > Shared libraries are a prime example. On my Linux system, there
> > are currently 372 processes (bletch), and I strongly suspect every
> > one links against libc, which has a size of around 2 Megabytes.
> >
> > Another is multithreaded applications. Anything using pthreads
> > uses shared memory between its threads (realized in Linux by
> > setting the right flags to clone(2) or clone3(2)).
> >
> > Pretty ubiquitous, I would say.
> I was looking at the Linux docs and found a spot where it mentions a process
> that goes around looking for pages that can be shared between processes. Apparently
> there was a limit that could be set, with a minimum of two but no maximum
> mentioned. I would think for integrity reasons there would be a limit. What if libc
> gets corrupted? Suddenly 372 processes do not work?
>
> Came up with a novel idea, I think. Added six-bit base and limit fields to the PTE.
<
Works great if you have 72-bits in a word. But not so great if you have a 64-bit
PTE mapping from a 64-bit VA to a 64-bit PA.
<
> They work by dividing up the page into 64 zones and allowing a PTE to use only
> part of a page. So, multiple virtual addresses may share the same physical page
> without being able to access the data outside of the zones defined by the base and
> limit values. The base field is added to virtual address bits 10 to 15; the addition
> rotates the data in the selected zone into view. The limit
> field determines how many consecutive zones are visible. For instance, with a 64kB
> page size the page is divided into 1kB zones. Selecting a base of 3 will rotate the
> virtual address so that zone 3 appears at the zero position. Selecting a limit of 3 will
> make four 1kB blocks visible. Selecting a base of zero and a limit of 63 will make the
> entire page available.

Re: Packing Hash Tables

<1cd1abea-7522-488e-9fc4-efa978cc7a12n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24287&group=comp.arch#24287
 by: MitchAlsup - Wed, 16 Mar 2022 16:34 UTC

On Wednesday, March 16, 2022 at 9:24:13 AM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:

> >> However I want multiple MMU root registers so it can switch tables
> >> quickly so I would change the address bit fields to 4-7-10-10-10-13
> >> and the top 4 bits select a HW MMU root[N] register.
> > <
> > Why not just use the top table<62..53> as 1024 Root pointers ?
> > Other than having a <ahem> root pointer point at this page, there
> > is no useful difference between a Root pointer and a PTP; they
> > both point at a page of self describing descriptors {PTPs or PTEs}
> A bunch of reasons.
> The root is the functional equivalent of the x86 CR3 register,
> and as such it can have any format or size.
> It is not constrained to fit nicely into a 64-bit slot.
<
My machine does not have any control registers, nor does a processor
"perform" a context switch.
>
> I want to be able to switch the processes address space with just
> a few HW register writes. Munging a bunch of stuff together in a table
> means that large chunks of the table have to be edited to switch.
>
> I want BAT entries to be self contained in HW registers that don't
> have to access memory to load a descriptor.
> > <
> > Each of these top level Root pointers has an LVL field which
> > describes which layers of the address space are skipped; or
> > equivalently the size of that virtual address space.
> Yes. The HW root pointer register can have any format - it does not
> have to match a PTE layout - so the skip count field can be any size.
>
> If I also want to support PTE's that can specify a skip count then
> a 2-bit field should be sufficient. But PTEs are already pretty full.
> > <
> >> Again the lower two page table levels are usually sufficient.
> > <
> > your 7-bit table only uses 1/8 of the allocated page.
> The top level table of the page table doesn't have to be 8 kB.
>
> Here it is a 1 kB table aligned on a 1 kB boundary.
> The root register physical address has sufficient bits to locate
> that table on any 1 kB boundary, such as inside a process header.
> > So, I don't see an advantage that 16 root pointers have over 1024 root
> > pointers.
> Because if it is stored in memory then the format needs to be defined.
> If the root is a HW register it can have any format to suit its function.
> A PTE is 64 bits but a BAT descriptor is 3*64=192 bits.
>
> 1024 entries would have to be loaded from memory dynamically.
> I didn't want that as it implies a hardware predefined memory format.
> I want the root registers written by code so the OS can decide where
> and how to store this information.
>
> I want the number to be small so process address space switch is fast,
> usually one or two register writes.
> >> I want to be able to skip from a MMU root register to page table level 2.
> > <
> > Root.LVL = 011
> Yes, a skip count of 3 skips over levels 5,4,3 to 2.
> Level-2 is sufficient as root doesn't need to point directly at
> a 8 kB level-1 PTP mapping 8 MB or a single 8 MB data page.
> 8 MB is really too small to be useful.
> > <
> > You can also "run" 'cat' with a 23-bit VA space Root.LVL = 010.
> > Root->page_or_ptes[va<22..12>] gives you the PTE.LVL = 001, without any
> > PTP levels in the table.
> That would be a skip count of 4 to skip to level-1.
> The root would have to have a "large page" bit to indicate either
> a 8 kB PTP pointing to up to 8 MB of small pages, or 1 large 8 MB page.
>
> An 8 MB virtual area is too small to be useful.
> I didn't think that was worth the extra trouble.
> > <
> >> Also because page table pages can be reused at multiple levels
> >> the skip count should be relative.
> > <
> > This adds arithmetic (albeit small adders) in the MMU function. Why can
> > this not be determined when the tables are set up?
> It is only used by the HW tablewalker and SW page fault handler,
> and it really only decides what the next state of a state machine is.
> That looks more like a MUX than an adder.
> > <
> >> Also I think there could be multiple skips in a walk, say a skip
> >> from root to 4, and 4 to 2. At least I see no reason to disallow it.
> > <
> > Nor did I.
> I'm kind of on the fence about this.
>
> MMU has to support skip from root to a lower level, but the root bits
> are stored in a HW register independent of the PTE format.
>
> These interior skip counts however require 2 PTE bits
> and the PTE is already pretty full.
>
> And I'm not convinced that an OS would make use of these
> interior skip counts if they were present.
<
Imagine a running program that a user wants to continue running but insert a
debugger into. My thought was that the debugger could get some upper layer
in the MMU tables and point at the application tables. If the debugger is
already running you may want to share this debugger with another copy of
itself, so you may be forced to skip a few levels to the application tables.
Thus more than one level of skip.
<
> > <
> >> Each PTE has a field giving the relative number of levels to skip
> >> in a table walk. With 8kB pages and a 5 level table, a 2-bit PTE
> >> field allows it to skip directly from my HW root[N] register to level-2.
> > <
> > I use a 3-bit absolute next level encoding without arithmetic.
> > Root.LVL determines the maximal size of the virtual address space.
> > PTP.LVL determines the next level in the table
> > previous PTP.LVL -> PTE.LVL = 001 determines the size of the page
> > {001->PTE, 010->8K, 011->8M, 100->8G, 101->8T, 110->8E}
<
> I considered absolute too but what made me decide for relative skip count
> was the potential for page table pages to be reused at multiple levels
> by the table management routines, as I describe in a different msg.
>
> Absolute LVL breaks with that, relative doesn't break.
> And this only matters to the tablewalker state machine.
<
I agree that it only matters to the table walker, but you have added a cycle of
delay between levels of walking the table.
>
> Anyway I'm not convinced an OS could use multiple skips if they were present.
> >> Another issue is what the fill values should be that were skipped.
> >> MMU can assume that the address bits skipped were all 0's,
> >> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
> > <
> > I am following the rules of canonicality AMD uses when skipping levels
> > in the I/O MMU.
> I didn't know AMD IOMMU did skips. I'll look at it.
> >> There are also the page size selection field to consider.
> >> With 8kB as the base page size, that gives multiples that are
> >> 2^13 = 8 kB, 2^23 = 8 MB, 2^33 = 8 GB, 2^43 = 8 TB.
> >> Four sizes should be sufficient, which requires 1 PTE bit at each level.
> > <
> > It is not a bit, it is part of an encoding. In any event, you end up with
> > 5 (or 6) states, and this always requires 3-bits to encode. 2-bits for
> > LVL and 1 bit for PTE is not usefully different than 3-bits for LVL
> > with one encoding for PTE. The 3-bit encoding can also provide for
> > PA=VA using another encoding.
> We must be thinking about different things.
> At each level is either a single large naturally aligned data page
> whose size is defined by the level,
<
There are PTPs which only point at 8KB pages.
Then there are PTEs which can point at naturally aligned super pages.
......these can be restricted to the number of 8KB pages they contain.
......So, you could map a single {16KB, 256KB,...} page using an 8M entry.
<
> or a PTP containing PTE's that point to the next lower level.
<
Always 1 8KB page.
<
> And that next lower level is either a single large page, or a 8 kB PTP.
<
I prefer to think of it as 1024 translation entries 8Bytes each.
PTPs can be freely interspersed with PTEs.
<
> So 1 bit suffices to specify that the next lower level is "large".
<
See, this was my point--you already consumed 2-bits for level skipping;
now you add 1-bit to designate a PTE (as opposed to a PTP.)
Instead of using 3-bits as you describe, I use those 3 bits as
illustrated above. But in particular, by remembering the LVL
of the entry that got you the PTE, you KNOW the size of the page!
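In code form, the walker just remembers the previous LVL (an illustrative sketch; the sizes are the ones listed in the encoding above, including the post's jump from 8 TB to 8 EB):

    #include <stdint.h>

    /* Size of the page a PTE maps, given the LVL of the PTP (or Root)
       whose entry led to it. 001 denotes the PTE itself. */
    uint64_t page_size_for_prev_lvl(unsigned prev_lvl) {
        switch (prev_lvl) {
        case 2: return 1ull << 13;   /* 010 -> 8 KB */
        case 3: return 1ull << 23;   /* 011 -> 8 MB */
        case 4: return 1ull << 33;   /* 100 -> 8 GB */
        case 5: return 1ull << 43;   /* 101 -> 8 TB */
        case 6: return 1ull << 63;   /* 110 -> 8 EB, as listed */
        default: return 0;           /* 001 = PTE; others invalid here */
        }
    }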
<
> The tablewalker knows which level it is working on and what "large" means.
>
> So if root skips to a "large' page at level-3 that is defined
> as a single aligned 8 TB page with 43 bit byte offsets.
> If root skips to a "large" page at level-2 that is defined
> as a single aligned 8 GB page with 33 bit byte offsets.
<
Root always, in my model, points at an 8KB page containing 1024
translation entries: an entry from that table could be as you describe.
{That is: Root is known to be a PTP.}
>
> I think it unlikely that either of those features would ever be used.
> The requirement for physical alignment pretty much kills its usefulness.
<
I doubt any PTEs larger than 8GB will be used.
<
> > <
> >> That allows the HW root to skip directly to level-2 where the
> >> PTE specifies a large page size of 8 MB.
> > <
> > Root.LVL = 010
> > page_of_PTEs[va<32..23>] gives you the 8MB PTE.LVL = 001; the
> > 10-unused bits in the PTE specify the size of the 8MB mapping
> > {16K,24K,...8M-8K,8M} so you don't have to allocate an entire 8M
> > to this mapping. And so on as PTE maps get larger.
> That's a good idea.
> I was always bothered by those "wasted" bits.
<
This still comes with the alignment restrictions, so it does not alleviate
the entire issue, just makes it more tolerable.
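A sketch of the bounds check that partial-size mapping implies, assuming the ten spare bits hold a count of valid 8 KB pages within the aligned 8 MB slot (names invented):

    #include <stdint.h>

    typedef struct {
        uint64_t ppn;          /* base of the (8 MB aligned) region */
        unsigned npages : 10;  /* 0 => 1024 pages = the full 8 MB */
    } big_pte;

    int in_bounds(const big_pte *e, uint64_t va) {
        unsigned page_in_region = (va >> 13) & 0x3FF;  /* which 8 kB page */
        unsigned limit = e->npages ? e->npages : 1024;
        return page_in_region < limit;                 /* else fault */
    }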
<
> >> Together this allows virtual address translate in a single memory access
> >> in address space sizes up to 8 GB!
> > <
> > I started here about 2006.
> I've been musing on it a long time too.
> Mostly disconnected ideas that need to coalesce.
>
> I think the skip count for the guest process and kernel page tables,
> combined with a BAT entry to map a guest OS onto host memory,
> might be the solution to the N^2 VM translate problem.
> In theory it could translate with max of 2 memory lookups.
<
I was thinking that when an OS is first launched, it would be given 8GB of space.
This is mapped by HV in 1 layer of translation, and unless HV has to start
paging Guest OS, it can be left as 1 translation {for a long time}.


Re: Packing Hash Tables

<4VoYJ.108094$3jp8.77411@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=24288&group=comp.arch#24288
 by: EricP - Wed, 16 Mar 2022 17:13 UTC

MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 9:46:07 PM UTC-5, robf...@gmail.com wrote:
>>> On 3/15/2022 1:08 PM, MitchAlsup wrote:
>>> So, what they need is a quick way of getting an address to the failing PTP/PTE.
>> I am reminded of a trace cache. Something similar could be used.
>>
>> It would be easier if a link back to the parent were in a PTP. Since the failing PTE
>> / PTP is known, back links could be followed. I wonder if an empty entry in the
>> PTP could be used as a back-link. Is it faster to scan a page for a back-link or use
>> other means? If there were extra room in the PTP a link could be added in the
>> header.
> <
> Linux uses struct_page (about 2 cache lines per entry) to keep track of "all the
> states a page can get into": so struct_page[PPN] gets you to everything you need
> to know about a page in 1 layer of indirection.
> <
> > Since HW makes no claims to the bit patterns when P = 0 (not present), the PPN of this
> entry should be pointing at what this non-present PTP would have been pointing
> at were it present.

The struct_page table is used for different purposes.
It is used for recording info about the status of each physical page.
There can be multiple virtual addresses in multiple processes
mapped to the same physical page.

Page faults are different.
The page fault exception stack gets the instruction pointer of
the start of the faulting instruction, the faulting address,
and some status bits indicating the error and what access was being attempted.

The page faulting thread's handler acquires a page table mutex
and uses the faulting virtual address to walk down the PGT doing
the same checks a HW walker would do. For it to access the page table
the PTP must (somehow) already be mapped into virtual memory.

There are many situations that can occur.
Two threads could have faulted the same page at once.
The first fixes the fault, releases the mutex and returns.
The second walks the table and finds the fault already fixed
and just returns.

Or maybe it walks the table and finds an interior PTP marked Not-Present.
Or maybe the faulting page was Present but marked read-only and
this thread attempted a write.

In those cases the fault handler uses the faulting virtual address to
look up the memory section range covering it in the virtual section table.
If it finds a valid memory section then it can take corrective action.
For the missing interior PTP it could fill in the tree and then map a data page.
For the write to a read-only page, if its copy-on-write then copy to a
new physical page and change the PTE to the new PPN with write access.

In its simplest sense the page fault handler is deterministic like a
Programmable Logic Array (PLA), with input state codes and don't-care values,
actions to take, and next state outputs. The problem is there are lots
of states and actions, and a lot of fiddly things to take care of
like what if you need a new physical page and the free list is empty.
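The flow just walked through, reduced to a skeleton (every type and helper here is a stand-in for OS machinery, not a real kernel API):

    typedef struct { int present, writable, cow; unsigned long ppn; } pte_sw;

    extern void lock_page_table(void);
    extern void unlock_page_table(void);
    extern pte_sw *walk(unsigned long va);              /* NULL if no PTE */
    extern int  find_section(unsigned long va);         /* valid region?  */
    extern void fill_tree_and_map(unsigned long va);    /* PTPs + page    */
    extern unsigned long copy_frame(unsigned long ppn); /* may block on
                                                           an empty free list */
    int handle_fault(unsigned long va, int is_write) {
        lock_page_table();
        pte_sw *pte = walk(va);

        if (pte && pte->present && (!is_write || pte->writable)) {
            unlock_page_table();      /* another thread fixed it first */
            return 0;
        }
        if (!find_section(va)) {
            unlock_page_table();
            return -1;                /* genuine access violation */
        }
        if (!pte || !pte->present) {
            fill_tree_and_map(va);    /* missing interior PTPs + data page */
        } else if (is_write && pte->cow) {
            pte->ppn = copy_frame(pte->ppn);   /* copy-on-write */
            pte->writable = 1;
            pte->cow = 0;
        }
        unlock_page_table();
        return 0;
    }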

Re: Packing Hash Tables

<63fe646c-495b-4a18-934f-3d3b51dce9e9n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=24289&group=comp.arch#24289
 by: MitchAlsup - Wed, 16 Mar 2022 18:17 UTC

On Wednesday, March 16, 2022 at 12:13:39 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, March 15, 2022 at 9:46:07 PM UTC-5, robf...@gmail.com wrote:
> >>> On 3/15/2022 1:08 PM, MitchAlsup wrote:
> >>> So, what they need is a quick way of getting an address to the failing PTP/PTE.
> >> I am reminded of a trace cache. Something similar could be used.
> >>
> >> It would be easier if a link back to the parent were in a PTP. Since the failing PTE
> >> / PTP is known, back links could be followed. I wonder if an empty entry in the
> >> PTP could be used as a back-link. Is it faster to scan a page for a back-link or use
> >> other means? If there were extra room in the PTP a link could be added in the
> >> header.
> > <
> > Linux uses struct_page (about 2 cache lines per entry) to keep track of "all the
> > states a page can get into": so struct_page[PPN] gets you to everything you need
> > to know about a page in 1 layer of indirection.
> > <
> > Since HW makes no claims to the bit patterns when p (present) = 0, PPN of this
> > entry should be pointing at what this non-present PTP would have been pointing
> > at were it present.
> The struct_page table is used for different purposes.
> It is used for recording info about the status of each physical page.
> There can be multiple virtual addresses in multiple processes
> mapped to the same physical page.
>
> Page faults are different.
> The page fault exception stack gets the instruction pointer of
> the start of the faulting instruction, the faulting address,
> and some status bits indicating the error and what access was being attempted.
<
Page Fault gets attached to thread number (which accesses the faulting IP)
....................gets the faulting instruction-specifier in R1
....................gets the faulting virtual address in R2
....................gets the address of the fault-generating PTE/PTP in R3
If the problem is in the Guest OS translations, the Guest OS page fault
handler receives control. If the problem is in the HyperVisor tables, the
HV page fault handler receives control. In both cases the faulting thread
becomes a "customer" of the fault handler. This customer mechanism is a
means of accessing customer memory to fix exceptions, and of accessing
customer state {to fix page tables, ...} from afar, using indirection,
without needing to have the customer's virtual memory mapped into the
handler.
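In C terms, the handler effectively receives something like the
following at entry (a reader's paraphrase of the description above,
not a defined ABI):

#include <stdint.h>

struct fault_entry_state {
    uint64_t thread;      /* faulting thread number (reaches faulting IP) */
    uint64_t inst_spec;   /* R1: faulting instruction-specifier */
    uint64_t fault_va;    /* R2: faulting virtual address */
    uint64_t pte_addr;    /* R3: address of the fault-generating PTE/PTP */
};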
>
> The page faulting thread's handler acquires a page table mutex
> and uses the faulting virtual address to walk down the PGT doing
> the same checks a HW walker would do. For it to access the page table
> the PTP must (somehow) already be mapped into virtual memory.
<
No need to walk the tables: you already have a pointer to the fault-detecting
PTP/PTE. Only if you need to go back up the table do you need to walk it.
But, here, you can LDPT LVL-1 and let HW give you the address of the
upper-level table.
>
> There are many situations that can occur.
> Two threads could have faulted the same page at once.
> The first fixes the fault, releases the mutex and returns.
> The second walks the table and finds the fault already fixed
> and just returns.
<
Except you don't have to walk the table.
>
> Or maybe it walks the table and finds an interior PTP marked Not-Present.
> Or maybe the faulting page was Present but marked read-only and
> this thread attempted a write.
<
Anything is possible; you just don't have to walk the table: you receive
control with a pointer to where the fault-detecting PTP/PTE resides in the
translation tables themselves.
<
And you only have to lock the table if you want to change it.
And when you get the table locked, you have to check that the table has
not already been updated by some other thread. More test-and-test-and-set
than just test-and-set.
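A minimal C11 sketch of that test-and-test-and-set pattern, including
the re-check after acquisition (fault_already_fixed() is an assumed
helper):

#include <stdatomic.h>
#include <stdbool.h>

extern _Atomic int table_lock;          /* 0 = free, 1 = held */
extern bool fault_already_fixed(void);  /* assumed re-check helper */

bool lock_and_recheck(void)
{
    for (;;) {
        /* test... : spin on a plain read, cheap and cache-local */
        while (atomic_load_explicit(&table_lock, memory_order_relaxed))
            ;
        /* ...and-set: only now attempt the atomic exchange */
        if (!atomic_exchange_explicit(&table_lock, 1, memory_order_acquire))
            break;
    }
    if (fault_already_fixed()) {        /* another thread beat us to it */
        atomic_store_explicit(&table_lock, 0, memory_order_release);
        return false;                   /* nothing left to do */
    }
    return true;    /* caller updates the table, then releases the lock */
}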
>
> In those cases the fault handler uses the faulting virtual address to
> look up the memory section range covering it in the virtual section table.
Sure,
> If it finds a valid memory section then it can take corrective action.
sure,
> For the missing interior it could fill in the tree and then a data page.
sure,
> For a write to a read-only page, if it's copy-on-write, then copy to a
> new physical page and change the PTE to the new PPN with write access.
normal,
>
> In its simplest sense the page fault handler is deterministic, like a
> Programmable Logic Array (PLA): input state codes with don't-care values,
> actions to take, and next-state outputs. The problem is there are lots
> of states and actions, and a lot of fiddly things to take care of,
> like what if you need a new physical page and the free list is empty.
<
It is the job of the architecture to define an interface from the point of
fault detection to the point that receives control, such that the handler
has as little work to do as possible, and to give the handler a means of
accessing HW-formatted data faster than software could access it using
ordinary instructions (LDPT).
It is not the job of HW to manage as many states as SW has decided
are necessary and reasonable, nor should HW define the format of any
SW data structures--mechanism, not policy.

Re: Packing Hash Tables

<t0t9js$dg9$1@dont-email.me>

 by: BGB - Wed, 16 Mar 2022 18:17 UTC

On 3/16/2022 12:06 AM, robf...@gmail.com wrote:
> On Wednesday, March 16, 2022 at 12:33:12 AM UTC-4, robf...@gmail.com wrote:
>> On Tuesday, March 15, 2022 at 10:52:20 PM UTC-4, BGB wrote:
>>> On 3/15/2022 6:54 PM, MitchAlsup wrote:
>>>> On Tuesday, March 15, 2022 at 6:13:06 PM UTC-5, BGB wrote:
>>>>> On 3/15/2022 1:08 PM, MitchAlsup wrote:
>>>>>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>>>>>>> MitchAlsup wrote:
>>>>>>>> On Monday, March 14, 2022 at 7:40:00 AM UTC-5, robf...@gmail.com wrote:
>>>>>>>>> On Sunday, March 13, 2022 at 7:10:48 PM UTC-4, robf...@gmail.com wrote:
>>>>>>>>

<big snip, context didn't appear to be used here>

>> I am preferring hash tables over hierarchical tables. They are easier to deal with
>> in some ways. They do not need to be processed in power-of-two addressing, so the
>> size of entries can be adjusted to suit.
>>

Possibly; they could have an advantage in terms of space, but:
The storage area still needs to be variable-sized to deal with general use cases;
The performance of a chain hash depends heavily on the size of the
top-level hash table;
A direct hash would not deal well with resizing.

I am not sure whether "in general" a chain hash or B-Tree would be
better for a large sparse address space; I would need to model this.

Chain Hash, Example:
  16K Head page:
    U32 n_page;   //number of pages in chain hash
    U32 free_pte; //freed page-table entry
    2040 x 32b: List of sub-pages (chain-hash data area)
    2048 x 32b: Top-level hash index (into chain area)

This structure would address up to 16GB with 16K pages and a 32B PTE.

Could be extended by using the trick of treating the higher-number
sub-page entries as indirect:
0..1023: Direct Page Reference
1024..1535: Indirect page reference
... Should be plenty ...

Should be relatively fast for smaller address-space sizes.
For a 4GB address space, one would be looking at a chain length of
around 128 (4GB / 16K pages = 256K PTEs spread across 2048 hash buckets).
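
A rough C sketch of the above layout and its lookup; the chain linkage
(a 'next' index inside each 32B PTE), the placeholder hash, and the
subpage_base() helper are my assumptions, not part of the layout as
given:

#include <stddef.h>
#include <stdint.h>

#define SUBPAGES        2040
#define BUCKETS         2048
#define PTE_PER_SUBPAGE 512             /* 16K page / 32B PTE */

typedef struct {                        /* one 32B PTE, fields assumed */
    uint64_t va_tag;                    /* virtual page number */
    uint64_t pa_ctrl;                   /* PPN + control bits */
    uint32_t next;                      /* next PTE index in chain; 0 = end */
    uint32_t pad[3];
} pte32_t;

typedef struct {                        /* fits in one 16K page (24B slack) */
    uint32_t n_page;                    /* pages in the chain hash */
    uint32_t free_pte;                  /* head of freed-PTE list */
    uint32_t subpage[SUBPAGES];         /* PPNs of chain-data sub-pages */
    uint32_t bucket[BUCKETS];           /* per-bucket head PTE index */
} chain_head_t;

/* Assumed helper: map a sub-page's PPN into a usable pointer. */
extern pte32_t *subpage_base(chain_head_t *h, uint32_t ppn);

pte32_t *lookup(chain_head_t *h, uint64_t vpn)
{
    /* placeholder hash: low VPN bits; index 0 doubles as end-of-chain */
    uint32_t idx = h->bucket[vpn & (BUCKETS - 1)];
    while (idx) {
        pte32_t *p = subpage_base(h, h->subpage[idx / PTE_PER_SUBPAGE])
                     + (idx % PTE_PER_SUBPAGE);
        if (p->va_tag == vpn)
            return p;
        idx = p->next;
    }
    return NULL;                        /* miss: fall back to page fault */
}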

In a B-Tree, each 16K page would be a node (internal or leaf):
  64B: Node Header
    Node Level
    Next/Prev Node
    ...
  510 x 32B: PTE or PDE (64 + 510*32 = 16K exactly)

Handling insert or delete operations is a little more involved for
B-Trees than for a chain hash.

Likely to be slower for lookup except for larger address spaces.

Lookup would be ~ 9 steps per node (within the binary search).

4GB falls on the 2/3 level split, or between 18 and 27 steps.

I guess it is a question of which is cheaper:
~ 27 steps in a binary search;
Walking 128 entries in a chain hash.
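
For reference, the per-node binary search might look like this in C
(field names and the 32B entry layout are assumed):

#include <stdint.h>

typedef struct {
    uint64_t va_key;     /* key: virtual page / pivot address */
    uint64_t pa_ctrl;    /* PPN + control (leaf) or child PPN (interior) */
    uint64_t acl;        /* VUGID / ACL, etc. */
    uint64_t pad;
} bt_ent_t;              /* 32B PTE or PDE */

typedef struct {
    uint16_t level;      /* 0 = leaf */
    uint16_t count;      /* entries in use, 1..510 */
    uint8_t  hdr_pad[60];/* pad header to 64B */
    bt_ent_t ent[510];   /* 64 + 510*32 = 16384B = one 16K page */
} bt_node_t;

/* Find the last entry with va_key <= vpn: the child to descend into,
   or the candidate PTE at a leaf. ~log2(510) = 9 probes per node.
   Assumes count >= 1. */
int node_search(const bt_node_t *n, uint64_t vpn)
{
    int lo = 0, hi = n->count - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;    /* bias up so lo always advances */
        if (n->ent[mid].va_key <= vpn)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}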

Another option could be an AVL tree, but this would have higher memory
overhead than a B-Tree (however, the logic for working with these trees
would be a fair bit simpler).

Say, we spend ~64B per node:
  Node Level
  Left/Right Pointers (16B)
  PTE/PDE (32B) if leaf, or center/pivot point if non-leaf
The PTE could be stored in 24B as well, for a 96-bit VA.

A 4GB address space would need ~ 33MB ...
An AVL tree would kinda suck on this point.

Could squeeze the AVL nodes down slightly if we are OK with a
non-power-of-2 node size (or, maybe get them down to 32B via
bit-packing), say:
(255:174): Virtual Address
(173:168): ?
(167:160): Node Depth (0=Leaf)
(159:124): Left Node
(123: 88): Right Node
( 87: 76): ?
( 75: 64): KR Mode
( 63: 48): VUGID / ACLID
( 47: 14): PA
( 13: 0): Control Bits

Would take ~ 17MB for 4GB, vs ~ 9MB for the B-Tree.

Would be able to walk the tree in 18 or 19 steps, and is not needlessly
penalized by a sparse address space.

For performance reasons, the PTE would likely have the VA expressed as a
continuous span of bits, which would be better for integer-compare ops
during lookup.
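
A sketch of what that continuous-span compare could look like in C,
treating the 32B node as four little-endian 64-bit words per the bit
layout above (unsigned __int128 is a GCC/Clang extension, used here
just to keep the 82-bit compare to a single operation):

#include <stdint.h>

typedef struct { uint64_t w[4]; } avl_node_t;   /* 32B packed node */

/* Extract the continuous 82-bit VA field, bits (255:174):
   low 18 bits come from w[2] bits 63:46, upper 64 bits from w[3]. */
static unsigned __int128 node_va(const avl_node_t *n)
{
    return ((unsigned __int128)n->w[3] << 18) | (n->w[2] >> 46);
}

/* Walk decision: one integer compare per node; ~18-19 steps for 4GB. */
static int go_left(const avl_node_t *n, unsigned __int128 va)
{
    return va < node_va(n);
}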

I guess maybe one could also look into other possibilities as well...

>> I am wondering though how often pages of memory are shared?
>>

Lots, particularly if one assumes that any read-only sections (such as
".text") are shared.

Whether or not one considers the same page shared between multiple
threads as shared is a secondary question.

>> Using a simple hash table where the table index corresponds to the physical
>> address, four virtual-address entries could be supported at the same cost as
>> using a complex hash table that stores both the physical and virtual addresses
>> and doubles the size of the table to allow for some sharing. But with the simple
>> table, sharing would be limited to four shares per page. To get more shares of a
>> page, pages would need to be replicated at a different physical address. But it
>> seems like it could work.
>>
>> Big users of shared memory are probably the OS and other libraries and frameworks.
>>
>> Making copies might be one way to help with system integrity. If something
>> goes amiss with one copy of the OS, other copies are not affected.
>
> Just read this, which says the page size should be 16kB.
>
> https://u.cs.biu.ac.il/~wisemay/page-size.pdf

I did my own experiments, and 16K came out as the local optimum in my
case as well.

....

I have recently been off experimenting with trying to get virtual memory
(pagefile) stuff working.

I can now run programs from remapped virtual addresses, but trying to
use the pagefile causes stuff to break pretty quickly (the program
crashes roughly as soon as it uses up enough memory that it starts
trying to page things out to swap).

Re: Packing Hash Tables

<t0tbte$1tk$1@dont-email.me>

 by: Ivan Godard - Wed, 16 Mar 2022 18:56 UTC

On 3/16/2022 8:57 AM, MitchAlsup wrote:
> On Tuesday, March 15, 2022 at 9:46:07 PM UTC-5, robf...@gmail.com wrote:
>> On Tuesday, March 15, 2022 at 10:24:03 PM UTC-4, EricP wrote:
>>> BGB wrote:
>>>> On 3/15/2022 1:08 PM, MitchAlsup wrote:
>>>>> On Tuesday, March 15, 2022 at 12:23:56 PM UTC-5, EricP wrote:
>>>>>> What about address translate skipping page table levels?
>>>>> <
> >>>>> Skipping levels is, in my opinion, a necessity. I was part of the
> >>>>> I/OMMU team at AMD where we put in level skipping as part of its
> >>>>> design. I think CPU tables and I/OMMU tables should be one and the
> >>>>> same, along with the "a pointer is a pointer" philosophy. An I/O
> >>>>> device should get the virtual address of the originating thread,
> >>>>> I/O DMA should be translated through the page tables of the
> >>>>> originating thread, and the I/O completion interrupt should be
> >>>>> delivered to the Guest OS hosting the originating thread.
>>>>
> >>>> One drawback of level skipping, though, is that one can either ignore
> >>>> bits (effectively multi-mapping the skipped levels), or assume them to
> >>>> be zero (doesn't work for randomized addresses), neither of which is
> >>>> ideal in a large+sparse ASLR scenario.
>>> As I mention below, it can assume the missing values are 0's
>>> or have a PTE bit to specify the value.
>>>
>>> A PTE bit allows one to differentiate between whether the
>>> page is part of a grow-up or grow-down memory section.
>>> E.g. heap areas expand upward, stack expand downwards.
>>>>>> Another issue is what the fill values should be that were skipped.
>>>>>> MMU can assume the the address bits skipped were all 0's,
>>>>>> or the PTE could have a bit to prescribe the skipped value as 0 or 1.
>>>>> <
>>>>> I am following the rules of canonicality AMD uses when skipping levels
>>>>> in the I/O MMU.
> >>> More feature bits also put more pressure on the PTE, which is already full.
>>
>>> A PTE bit allows one to differentiate between whether the
>>> page is part of a grow-up or grow-down memory section.
>>> E.g. heap areas expand upward, stack expand downwards.
> <
>> I wonder if the address could be sign-extended, or multi-bit sign-extended?
>> One may want to have ROM / IO mapped to virtual addresses that are the
>> same as physical ones, high in memory.
> <
> I put ROM in its own 64-bit physical address space.
> I put MMI/O in its own 64-bit physical address space.
> I put configuration in its own 64-bit physical address space.
> Leaving all 64-bits for real memory (DRAM).

As opposed to mapping virtual to physical and then picking off slices of
physical with apertures that map physical to device?

The problem with embedding subspaces (which is what apertures recognize)
into the TLB is that it prevents dynamically adding more spaces. Say you
have two kinds of RAM, on disjoint pins but both with device addresses
starting at zero. That's just another aperture, but in the TLB it's a new
chip.

Granted, that's not something done by bog-standard usage. I may have been
perverted by experience with partitionable pinsets.
