

devel / comp.arch / Re: Safepoints

Re: Safepoints

<0a195651-4c2e-4a3a-a452-6c650d01ce63n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19604&group=comp.arch#19604

 by: MitchAlsup - Fri, 6 Aug 2021 17:06 UTC

On Friday, August 6, 2021 at 11:31:15 AM UTC-5, EricP wrote:
> David Brown wrote:
> > On 05/08/2021 04:40, EricP wrote:
> >> David Brown wrote:
> >>> On 04/08/2021 21:55, EricP wrote:
> >>
> >>>> If those map to single instructions on a uC, great.
> >>>> If not, then they do the equivalent PUSHF, INTD, LD, ST, POPF sequence.
> >>>> If an INTD_N instruction is available then it saves a PUSHF and POPF.
> >>>> But LL/SC saves any disable and is more generally useful even on a uC.
> >>>>
> >>> No, it is not more useful - because it does not work in conjunction with
> >>> interrupts.
> >> Its not clear what you think does not work with interrupts but if it is
> >> LL/SC the interrupt should clear the lock flag and prevent the SC.
> >>
> >> The ARMv7-M manual says exceptions do clear the flag:
> >> section A3.4.4 Context switch support
> >> "the local monitor is changed to Open Access automatically
> >> as part of an exception entry or exit"
> >>
> >> In ARM parlance Open Access means not exclusive - the load-lock is
> >> canceled.
> >
> > Yes, I realise that. This is not primarily designed to make ldrex/strex
> > work in an interrupt function, but to ensure that you don't have
> > problems when a task gets interrupted in the middle of a ldrex/strex
> > sequence, then another task starts a new sequence (claiming the lock for
> > itself), then the first task resumes again and thinks it has the lock.
<
> I don't think LL/SC can function correctly if it does not cancel the
> lock on interrupt. Alpha documents this with the LL and SC instructions.
<
Interrupts, exceptions, and traps ALL need to undo any pending illusion
of ATOMICity. Any invisible transfer of control must undo these things.
ESM has the concept of a control point associated with the event, and
control is transferred to this point prior to transferring control to the
interrupt, exception, trap handler. So, not only do you have to get rid
of any pending "I am in an ATOMIC event" notion, you have to leave the
IP pointing at a position from which the entire ATOMIC event can be attempted
again; or you have to point the IP at a place where the thread KNOWS that
the event failed.
>
> It is unfortunate that ARM documents the lock-cancel-on-interrupt separately
> (and I might add, making it almost impossible to find unless you
> pretty much know what to look for - that section took 1/2 hour to find)
> as this might give the impression that it is a secondary side effect
> rather than critical to the correct functioning.
>
> So lock-cancel-on-interrupt should not be viewed as a side effect
> but as a basic feature that you may use to your advantage.
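
[Editor's note: a rough software-level analogue of the restart behaviour described above - not Mitch's ESM mechanism, and not tied to any particular ISA - is the familiar retry loop around an LL/SC or compare-and-swap, where the whole read-modify-write is re-attempted from the top whenever the attempt is disturbed. A minimal sketch in C using the GCC/Clang __atomic builtins:]

    #include <stdint.h>

    /* Retry loop: if the attempt is disturbed (for example, the underlying
     * LL/SC reservation is cleared by an interrupt), the compare-exchange
     * fails and control returns to the top, so the whole event is redone. */
    static inline uint32_t atomic_increment(volatile uint32_t *p)
    {
        uint32_t old, updated;
        do {
            old = __atomic_load_n(p, __ATOMIC_RELAXED);   /* restart point */
            updated = old + 1;
        } while (!__atomic_compare_exchange_n(p, &old, updated,
                                              /*weak=*/1,
                                              __ATOMIC_ACQ_REL,
                                              __ATOMIC_RELAXED));
        return updated;
    }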

Re: Safepoints

<sejue2$j43$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19605&group=comp.arch#19605

 by: David Brown - Fri, 6 Aug 2021 18:18 UTC

On 06/08/2021 17:56, EricP wrote:
> David Brown wrote:
>>
>> I am not sure I a explaining things well here.
>
> Your explanations are fine but we have been talking about different things.
>
> You have been mostly referring to atomic memory operations between
> threads or smp cpus, and I explicitly excluded atomic operations.
>
> I was expanding on Mitch's earlier point about turning asynchronous
> events delivered using interrupt semantics into synchronous ones,
> which is what the original poster was asking about.
>

OK, that does make a difference!

> By atomic I mean memory operations between concurrent threads or cpus.
> Non interruptible sequence is orthogonal, within a single thread or cpu.
> E.g. on x86 ADD [mem],reg is a non interruptible R-M-W sequence but
> is not atomic. LCK ADD [mem],reg is both atomic and non interruptible.

Yes. On the systems I deal with, non-interruptible implies atomic since
there is only one core. (For other memory masters, such as DMA
controllers, you use other techniques to be sure that all accesses are
consistent with intended behaviour.)

(Atomic does not imply non-interruptible - if a sequence can be
restarted, and nothing is able to observe a half-way stage or change it
in the middle, then it is atomic even if it can be interrupted. The
ldrex/strex protected 32-bit increment is atomic but interruptible.)

>
>> The assembly for a 32-bit increment (of a variable in memory) in Thumb-2
>> will be something along the lines of :
>>
>>     ldr r3, [r2]
>>     add r3, r3, #1
>>     str r3, [r2]
>>
>> Three instructions, with one load and one store.  With tightly-coupled
>> memory (or on an M0 to M4 microcontroller, on-board ram), loads and
>> stores are two cycles.  So that is 3 instructions, 5 cycles.
>>
>> Making this atomic on an M0 microcontroller is :
>>
>>     cpsid i
>>
>>     ldr r3, [r2]
>>     add r3, r3, #1
>>     str r3, [r2]
>>
>>     cpsie
>>
>> Two extra instructions, at two cycles.  (These will generally not be
>> needed inside an interrupt function, unless the same data will be
>> accessed by different interrupt routines with different priorities.)
>>
>> If you want a more flexible sequence that saves and restores interrupt
>> status, and also need it safe for the dual-issue M7, you can use:
>>
>>     mrs r1, primask
>>     cpsid i
>>     dsb
>>
>>     ldr r3, [r2]
>>     add r3, r3, #1
>>     str r3, [r2]
>>
>>     msr primask, r1
>>
>> Four instructions, four cycles overhead for making the sequence atomic.
>
> Yes, most OS programming standards require interrupt state
> be saved and restored so the code is reentrant.

It depends on where you are in the code - sometimes you already know
that interrupts are disabled (this is typically the case inside
interrupt routines in simpler microcontrollers), and very often you know
that interrupts are enabled (for an ARM Cortex-M device, that is pretty
much all the time, except during such specific critical sections).
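
[Editor's note: for reference, a minimal C sketch of the save-and-restore pattern being discussed, assuming a Cortex-M target with the standard CMSIS intrinsics (__get_PRIMASK, __set_PRIMASK, __disable_irq, __DSB) available; whether and where the DSB really belongs is exactly the open question discussed further down.]

    #include <stdint.h>
    #include "cmsis_compiler.h"   /* assumed: device header providing CMSIS intrinsics */

    /* Reentrant critical section: save PRIMASK, disable interrupts, do the
     * read-modify-write, restore the previous interrupt state. */
    static inline void atomic_add_u32(volatile uint32_t *p, uint32_t n)
    {
        uint32_t primask = __get_PRIMASK();  /* remember current interrupt state */
        __disable_irq();                     /* cpsid i */
        __DSB();                             /* cautious: ensure the disable has taken effect */

        *p += n;                             /* plain load/add/store, now uninterruptible */

        __set_PRIMASK(primask);              /* restore previous interrupt state */
    }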

>
>> The equivalent for this using ldrex/strex is indeed short and fast:
>>
>> loop:
>>     ldrex r3, [r2]
>>     add r3, r3, 1
>>     strex r1, r3, [r2]
>>     cmp r1, #0
>>     bne loop
>>     dsb
>
> I looked at the ARMv7 manual again and I still think,
> _for code that is within a single thread or cpu as per
> the original question about asynchronous signals_
> that the DSB is unnecessary in this particular case.
> A cpu will always see its own instructions in program order.
> As DSB stalls the cpu until all prior memory and branches have
> completed it would impose an extra overhead on the sequence.

I've looked at the manuals too, and I really don't think they are clear
enough on the matter. ARM's own examples do, however, use a DSB in many
such cases. In fact, there should perhaps also be one next to the
instruction that restores the interrupt status. I would expect it to come
before the "msr primask" (or "cpsie i") that re-enables interrupts, but I
can see that in the FreeRTOS kernel code (which has been reviewed by ARM
folk) it comes after re-enabling interrupts.

It would be a lot easier if the ARM manuals gave more details here!

The primary point is to ensure that the interrupt disabling actually
takes effect before the load happens, so that you eliminate the
possibility of an interrupt happening after the load. While it is
correct that the effects of instructions are seen in order on the cpu,
that does not necessarily apply in combination with interrupts, which
are, by definition, asynchronous. In particular, writes can be
delayed and occur later in the pipeline. Also note that an M7 is dual
issue, which could also complicate matters. (Again, the ARM manuals do
not give helpful details here as far as I can see, but use DSB liberally.)

For another example, on the AVR microcontroller (which does document
these details), interrupt disabling takes effect one instruction later
due to pipelining. Thus on the AVR, following an interrupt disable, you
always make sure you have a "harmless" instruction (a NOP, if you have
nothing better) before you have a memory access instruction.

On a Cortex-M, a DSB is a single cycle instruction in most situations.

It would be nice to be sure here - but I'd rather err on the side of
having an extra DSB than risk problems. I think I might now add one
before re-enabling interrupts too. And I will see if I can set up some
test code to provoke problems.

>
>> The ldrex and strex instructions don't take any longer than their
>> non-locking equivalents.  And this is safe to use in interrupts.  In
>> real-time work, it is vital to track worst-case execution time, not
>> best-case or common-case.  So though the best case might be 3 cycles
>> overhead, an extra round of the loop might be another 9 cycles.  (Single
>> extra rounds will happen occasionally - multiple extra rounds should not
>> be realistically possible.)
>>
>> So it seems to be a viable alternative to disabling interrupts, with
>> approximately the same overhead.  However, it has two major
>> disadvantages to go with its obvious advantage of not blocking
>> interrupts.
>>
>> It will only work for sequences ending in a single write of 32 bits or
>> less, and it will only work for restartable sequences.
>
> Yes, this is a limitation of LL/SC - it cannot do double wide operations.
>
> A few years ago I proposed an enhancement here which
> allows LL/SC to multiple locations within a single cache line.
>
>   LL  - retains the lock if load is to the same cache line
>   SCH - store conditional and hold stores and retains lock
>   SCR - store conditional and release stores and releases lock
>   CLL - Clear lock
>
> The extra cost in the cache controller is holding the first line updates
> separate until the release occurs in case it needs to roll back.
> For the modified line it can either use a separate cache line buffer,
> or if L2 is inclusive it can update L1 and either keep or toss that copy.
>

That could be a useful extension, yes. (On Cortex M devices, the
granularity of the lock is the entire memory space, so it would not need
to be restricted to a cache line. And often you don't have a cache, or
the memory you are accessing is in uncached tightly-coupled memory.)

>> Suppose, instead, that the atomic operation you want is not a simple
>> increment of a 32-bit value, but is storing a 64-bit value.  With the
>> interrupt disable strategy, you still have exactly the same 4
>> instruction, 4 cycle overhead (or two instructions in the simplest
>> version), for both read and write routines.  How do you do this with
>> ldrex/strex ?
>
> Yes unfortunately in general you don't.
>
> However there are specific situations where it can be done.
> E.g. reading a 64-bit clock on a 32-bit cpu since the clock is always
> increasing one can read high1 read low, read high2, compare high1,high2
>

Indeed. (That is precisely the method I use for 64-bit clocks - a trick
I learned from 6502 code on the BBC Micro mentioned in another thread here.)
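
[Editor's note: a minimal C sketch of that high/low/high read, assuming the always-increasing counter is visible as two 32-bit halves; the names tick_hi/tick_lo are illustrative only.]

    #include <stdint.h>

    /* The 64-bit tick counter as seen by a 32-bit CPU; only ever incremented. */
    extern volatile uint32_t tick_hi;
    extern volatile uint32_t tick_lo;

    /* Read high, then low, then high again; if the high word changed, a carry
     * from low to high happened in between, so retry the whole read. */
    static uint64_t read_tick64(void)
    {
        uint32_t hi1, lo, hi2;
        do {
            hi1 = tick_hi;
            lo  = tick_lo;
            hi2 = tick_hi;
        } while (hi1 != hi2);
        return ((uint64_t)hi1 << 32) | lo;
    }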

> Another example is updating a 64-bit PTE on a 32-bit system without using
> spinlocks and never leave the intermediate PTE in an illegal state.
> One does the reads and writes setting and clearing bits
> in a particular order.

Yes, there are many specific methods that can be used when you know more
details. If a 64-bit value is only written during an interrupt, and
read from thread code, then the interrupt can just use a volatile write
(generally an "strd" store-double instruction in Thumb-2) and the
application code can read it safely as a volatile (with an "ldrd" load-
double instruction - which gets restarted if it gets interrupted).


Re: Safepoints

<sejul5$kkc$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=19606&group=comp.arch#19606

 by: David Brown - Fri, 6 Aug 2021 18:22 UTC

On 06/08/2021 18:30, EricP wrote:
> David Brown wrote:
>> On 05/08/2021 04:40, EricP wrote:
>>> David Brown wrote:
>>>> On 04/08/2021 21:55, EricP wrote:
>>>
>>>>> If those map to single instructions on a uC, great.
>>>>> If not, then they do the equivalent PUSHF, INTD, LD, ST, POPF
>>>>> sequence.
>>>>> If an INTD_N instruction is available then it saves a PUSHF and POPF.
>>>>> But LL/SC saves any disable and is more generally useful even on a uC.
>>>>>
>>>> No, it is not more useful - because it does not work in conjunction
>>>> with
>>>> interrupts.
>>> Its not clear what you think does not work with interrupts but if it is
>>> LL/SC the interrupt should clear the lock flag and prevent the SC.
>>>
>>> The ARMv7-M manual says exceptions do clear the flag:
>>> section A3.4.4 Context switch support
>>> "the local monitor is changed to Open Access automatically
>>> as part of an exception entry or exit"
>>>
>>> In ARM parlance Open Access means not exclusive - the load-lock is
>>> canceled.
>>
>> Yes, I realise that.  This is not primarily designed to make ldrex/strex
>> work in an interrupt function, but to ensure that you don't have
>> problems when a task gets interrupted in the middle of a ldrex/strex
>> sequence, then another task starts a new sequence (claiming the lock for
>> itself), then the first task resumes again and thinks it has the lock.
>
> I don't think LL/SC can function correctly if it does not cancel the
> lock on interrupt. Alpha documents this with the LL and SC instructions.
>

That is correct. And ARM says that it /is/ cancelled, at least on the
Cortex-M devices. (There are so many ARM cores, and I have not looked
at the details for any others.)

> It is unfortunate that ARM documents the lock-cancel-on-interrupt
> separately
> (and I might add, making it almost impossible to find unless you
> pretty much know what to look for - that section took 1/2 hour to find)
> as this might give the impression that it is a secondary side effect
> rather than critical to the correct functioning.

Yes, ARM documentation is not always as helpful as it might be (putting
it diplomatically). It also does not help that some of the
implementation details are left to the core implementers, rather than ARM.

>
> So lock-cancel-on-interrupt should not be viewed as a side effect
> but as a basic feature that you may use to your advantage.
>

It is essential - or at least, it is essential if the interrupt causes a
task switch.

Re: Safepoints

<9b51ff18-801c-482e-b883-a921aa5f014dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19623&group=comp.arch#19623

 by: Paul A. Clayton - Sat, 7 Aug 2021 21:10 UTC

On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
[snip]
> A few years ago I proposed an enhancement here which
> allows LL/SC to multiple locations within a single cache line.
>
> LL - retains the lock if load is to the same cache line
> SCH - store conditional and hold stores and retains lock
> SCR - store conditional and release stores and releases lock
> CLL - Clear lock

One would not necessarily need SCH and one could merge SCR
and CLL into ordinary SC — if one did not need escaping memory
accesses. (For single cache block transactional memory, it might
not be excessively inconvenient to pre-load all cache-block-external
reads into registers and store any external writes after the
commitment. If one is not guaranteeing atomicity for external
accesses, such seems acceptable.)

This same interface could be trivially extended to support a broader
range of transactional memory subsets. Adding escaping/non-atomic
accesses would seem to require adding instructions beyond LL/SC
(with a new ISA one might be able to get away with using register
names to annotate types of loads but that seems a bit scary; being
able to mark stack and TLS as strictly thread local might be an
alternative to allowing a limited use for logging and such with a
benefit of sometimes reducing the monitored footprint).

In my opinion, a small family of TM "sizes" could be defined
(partially according to platform use — i.e., some hardware would
not support certain features — partially for hardware optimization
potential — communicating the features used might be problematic
— partially for software optimization, zArch-like guaranteed completion
"constrained transactions" could simplify software). It is not clear if
single 'word' LL/SC is sufficiently cheaper to implement that any
implementation would not support, e.g., 32-byte aligned granule
monitoring.

By only specializing the transaction initializing load and the transaction
ending store, function code could be shared with transactional memory
and locks. (Supporting locks and TM by having an if-then-else at the
start and end of a critical section may be unattractive. On the other
hand, a TM critical section would typically need to check a lock and
avoid using the TM method if the lock is held; this seems an ugly
aspect of TM.)

I suspect it would be useful to share code between TM and lock
critical sections. Associating a lock name with a TM-attempted
critical section would seem to facilitate reduced tracking (if software
guaranteed that all contention used that lock name).

Transaction nesting also seems to be a sore spot. Flattening the
transaction is simpler for software (internal sub-transactions that
can complete independently do not have to be memoized/tracked
at the cost of having to be redone on an external failure).

Supporting a conflict count (like Mitch Alsup's Advanced Synchronization
Facility) might not be especially difficult given a full-word-sized
return value. (I do wonder if multiple success values would be useful.
I could imagine knowing that one's successful transaction killed
another might be useful; that thread *might* be able to reduce the
probability of future contention (choosing other tasks, choosing less
efficient but less contentious ways of accomplishing tasks) or
reduce the cost of contention (migrating the task to reduce cache
coherence overhead — which suggests a connection between performance
monitoring and TM; even outside of transactions, information about
cache content sharing might be useful [a little like
"informing loads", loads for which a miss could be detected to choose
a different path]. Transactional memory hardware is a form of
behavior monitoring which might be generalized in some useful
manner.)

(I/O interactions also face issues with transactional memory. Some of
this seems to come from legacy conceptions/interfaces — e.g.,
non-idempotent reads do not seem to be universally useful even for I/O
devices. There may also be uses for allowing partial 'leakage' of data
within a transaction between threads generally and not just I/O device
accesses.)

A large aspect of the challenge is defining interfaces which facilitate
sharing in conceptualizations across diverse uses, support flexibility
and wisdom in assigning work/responsibilities to hardware, system
software, and application software components, and facilitate change
(experimentation seems important for developing wisdom and
costs and benefits change with time and use-goals).

> The extra cost in the cache controller is holding the first line updates
> separate until the release occurs in case it needs to roll back.
> For the modified line it can either use a separate cache line buffer,
> or if L2 is inclusive it can update L1 and either keep or toss that copy.

If performance is not a problem (transactions almost always succeed
or memory is fast), using main memory for recovery might be
reasonable. Even for a 64-byte chunk, buffering the writes seems
unlikely to be excessively expensive outside of the smallest
microcontrollers. There are probably other uses for such buffer
storage that would not have temporal overlap (perhaps a circular
buffer that connects to an I/O device??) so that even some small
microcontroller processors might not have a problem.

(Microcontrollers present an interesting test of an interface's
generality. However, a microcontroller need not support all features
an application processor would be expected to support and may
need to support some features — related to real-time guarantees, e.g., —
that an application processor might not find useful. Software and
conception sharing is useful, but compartmentalization of information
is also useful.)

With respect to this post thread's context, the distinction between
atomic with respect to interrupts and atomic with respect to
inter-thread communication may not — in my opinion — be as
great as commonly conceived. Interrupts can be viewed as
thread switches (or thread invocations with the current thread
continuing in parallel, possibly speculatively).

This also brings in the diversity among locks and more
speculative transactional memory (this seems to be a design
space). A traditional disabling of interrupts is analogous to a
lock. I suspect there is potential for conceptual unification that
could both extend understanding to a broader user base by
sharing concepts across uses, provide a simpler conception,
and possibly facilitate optimization opportunities in software
and hardware.

It seems quite possible for an "interrupt" not to disrupt the
'atomicity' of an operation just as two threads may run
concurrently without necessarily causing a memory transaction
to abort. For real-time uses, guarantees about resource
availability/timeliness could be important; some threads
currently designed to use interrupt-blocking might be able to
run fine-grained interleaved with an interrupt handling thread
while some might be so time-constrained that the critical
thread needs all resources to meet the real-time guarantee.
(This seems related to thread priority and general resource
allocation. Again, an opportunity for conceptual unification.
Lessons from economics might be applied to this aspect of
resource management under communication/understanding
constraints.)

I have a similar view with respect to treating I/O agents as not
fundamentally different from processing agents/threads.
(Cache-coherent I/O is one already-taken step in this direction.)
I am nearly clueless about how such could be designed and what
the implications would be, but it seems an obvious path, especially
with I/O-like accelerators, 'smart' network interfaces, complex
storage devices, etc.

(As a reader might guess, I had a block of free time with Internet
access that coincided with being 'adequately caffeinated'.☺)

Re: Safepoints

<30b55b24-ba70-4da4-8bc9-a37d9503df82n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19625&group=comp.arch#19625

 by: MitchAlsup - Sat, 7 Aug 2021 21:39 UTC

On Saturday, August 7, 2021 at 4:10:33 PM UTC-5, Paul A. Clayton wrote:
> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
> [snip]
> > A few years ago I proposed an enhancement here which
> > allows LL/SC to multiple locations within a single cache line.
> >
> > LL - retains the lock if load is to the same cache line
> > SCH - store conditional and hold stores and retains lock
> > SCR - store conditional and release stores and releases lock
> > CLL - Clear lock
<
Side note:
My 66000 has a lock bit in the 2-register memory instructions.
All of these instructions have a lock bit. When a LD instruction
has the lock bit set, the cache line of that load is "participating"
in an ATOMIC event. When a ST instruction has the lock bit set,
that ST is the last instruction of the ATOMIC event. The CLL
is equivalent to a ST instruction, with the lock bit set, to a
non-participating cache line. So while STH, STR, and CLL are not
distinct instructions, they are already present within the existing
encoding.
<
> One would not necessarily need SCH and one could merge SCR
> and CLL into ordinary SC — if one did not need escaping memory
> accesses.
<
95%+ of ATOMIC events do not need to use the CLL form.
<
> (For single cache block transactional memory, it might
> not be excessively inconvenient to pre-load all cache-block-external
> reads into registers and store any external writes after the
> commitment. If one is not guaranteeing atomicity for external
> accesses, such seems acceptable.)
>
> This same interface could be trivially extended to support a broader
> range of transactional memory subsets.
<
As I told MS: ASF is not transactional memory, but it can be used to
help build transactional memory. ESM has this same property.
<
> Adding escaping/non-atomic
> accesses would seem to require adding instructions beyond LL/SC
<
Not new instructions but simply defining the semantics of multiple
uses of those instructions.
<
> (with a new ISA one might be able to get away with using register
> names to annotate types of loads but that seems a bit scary; being
> able to mark stack and TLS as strictly thread local might be an
> alternative to allowing a limited use for logging and such with a
> benefit of sometimes reducing the monitored footprint).
>
> In my opinion, a small family of TM "sizes" could be defined
> (partially according to platform use — i.e., some hardware would
> not support certain features — partially for hardware optimization
> potential — communicating the features used might be problematic
> — partially for software optimization, zArch-like guaranteed completion
> "constrained transactions" could simplify software). It is not clear if
> single 'word' LL/SC is sufficiently cheaper to implement that any
> implementation would not support, e.g., 32-byte aligned granule
> monitoring.
<
In retrospect: one can EASILY* perform ATOMIC events on as many
cache lines as the processor's miss buffer can have outstanding
at any point in time (typically around 8 for GBOoO machines).
<
(*) the EASY part is the HW, the hard part is the proper definition of
the semantics of the instructions under all possible conditions--
including: exceptions, interrupts, interference {light through withering}
cache effects,...
>
> By only specializing the transaction initializing load and the transaction
> ending store, function code could be shared with transactional memory
> and locks.
<
This is exactly what ESM and ASF did ! It got the instruction set out of
needing to add another ATOMIC instruction every processor release.
<
> (Supporting locks and TM by having an if-then-else at the
> start and end of a critical section may be unattractive. On the other
> hand, a TM critical section would typically need to check a lock and
> avoid using the TM method if the lock is held; this seems an ugly
> aspect of TM.)
>
> I suspect it would be useful to share code between TM and lock
> critical sections. Associating a lock name with a TM-attempted
> critical section would seem to facilitate reduced tracking (if software
> guaranteed that all contention used that lock name).
>
> Transaction nesting also seems to be a sore spot. Flattening the
> transaction is simpler for software (internal sub-transactions that
> can complete independently do not have to be memoized/tracked
> at the cost of having to be redone on an external failure).
>
> Supporting conflict count (like Mitch Alsup's Advanced Synchronization
> Faclity) might not be especially difficult given a full-word-sized
> return value. (I do wonder if multiple success values would be useful.
> I could imagine knowing that one's successful transaction killed
> another might be useful; that thread *might* be able to reduce the
> probability of future contention (choosing other tasks, choosing less
> efficient but less contentious ways of accomplishing tasks)
<
The thread that successfully got through the ATOMIC event should go on
and perform whatever work it was supposed to perform. The bonked
thread should look for ways to proactively avoid interference in the future.
This is where the count comes in. Given that thread k got bonked, it
understands interference is present. Given that each bonked thread gets
a unique count, then all bonked threads have a way to attempt another
ATOMIC event on a different unit of work which has a vastly higher
probability of success than everyone pounding on the front of the queue.
It is this proactivity that reduces the problem from BigO(n^3) to BigO(3)
when properly programmed.
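<
[Editor's note: a hedged illustration of the idea above - not Mitch's ESM interface; try_claim() and the contention count it reports are invented stand-ins - showing how a per-failure count can steer "bonked" threads to different units of work instead of everyone pounding on the front of the queue.]

    #include <stdbool.h>
    #include <stddef.h>

    #define NWORK 64

    /* Hypothetical: try_claim() attempts the atomic event on one work slot
     * and, on failure, reports how many other threads collided (the count). */
    extern bool try_claim(size_t slot, unsigned *contention_count);
    extern void do_work(size_t slot);

    void worker(size_t preferred_slot)
    {
        size_t slot = preferred_slot;
        unsigned count = 0;

        for (;;) {
            if (try_claim(slot, &count)) {
                do_work(slot);          /* we won: go do the work */
                return;
            }
            /* We got bonked: use the count to spread out, so each loser
             * retries a *different* slot rather than the head of the queue. */
            slot = (slot + count + 1) % NWORK;
        }
    }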
<
> or
> reduce the cost of contention (migrating the task to reduce cache
> coherence overhead — which presents the thought of connection
> between performance monitoring and TM, even outside of transactions
> information about cache content sharing might be useful [a little like
> "informing loads", loads for which a miss could be detected to choose
> a different path]. Transactional memory hardware is a form of
> behavior monitoring which might be generalized in some useful
> manner.)
>
> (I/O interactions also face issues with transactional memory. Some of
> this seems to come from legacy conceptions/interfaces — e.g.,
> non-idempotent reads do not seem to be universally useful even for I/O
> devices. There may also be uses for allowing partial 'leakage' of data
> within a transaction between threads generally and not just I/O device
> accesses.)
>
> A large aspect of the challenge is defining interfaces which facilitate
> sharing in conceptualizations across diverse uses, support flexibility
> and wisdom in assigning work/responsibilities to hardware, system
> software, and application software components, and facilitate change
> (experimentation seems important for developing wisdom and
> costs and benefits change with time and use-goals).
> > The extra cost in the cache controller is holding the first line updates
> > separate until the release occurs in case it needs to roll back.
> > For the modified line it can either use a separate cache line buffer,
> > or if L2 is inclusive it can update L1 and either keep or toss that copy.
> If performance is not a problem (transactions almost always succeed
> or memory is fast), using main memory for recovery might be
> reasonable. Even for a 64-byte chunk, buffering the writes seems
> unlikely to be excessively expensive outside of the smallest
> microcontrollers. There are probably other uses for such buffer
> storage that would not have temporal overlap (perhaps a circular
> buffer that connects to an I/O device??) so that even some small
> microcontroller processors might not have a problem.
>
> (Microcontrollers present an interesting test of an interface's
> generality. However, a microcontroller need not support all features
> an application processor would be expected to support and may
> need to support some features — related to real-time guarantees, e.g., —
> that an application processor might not find useful. Software and
> conception sharing is useful, but compartmentalization of information
> is also useful.)
>
> With respect to this post thread's context, the distinction between
> atomic with respect to interrupts and atomic with respect to
> inter-thread communication may not — in my opinion — be as
> great as commonly conceived. Interrupts can be viewed as
> thread switches (or thread invocations with the current thread
> continuing in parallel, possibly speculatively).
>
> This also brings in the diversity among locks and more
> speculative transactional memory (this seems to be a design
> space). A traditional disabling of interrupts is analogous to a
> lock. I suspect there is potential for conceptual unification that
> could both extend understanding to a broader user base by
> sharing concepts across uses, provide a simpler conception,
> and possibly facilitate optimization opportunities in software
> and hardware.
>
> It seems quite possible for an "interrupt" not to disrupt the
> 'atomicity' of an operation just as two threads may run
> concurrently without necessarily causing a memory transaction
> to abort.
<
Remember, no interested 3rd party can see any (ANY) intermediate
state of the ATOMIC event. The interested 3rd party CAN be the ISR !!!
(trying to enqueue his little unit of work on the run queues.)
<
I came to the conclusion that attempting to provide the illusion of
ATOMICity over an interrupt, exception, trap... is "just asking for trouble."
<
Also note: single stepping through an ATOMIC event and achieving
success cannot be allowed.
<
> For real-time uses, guarantees about resource
> availability/timeliness could be important; some threads
> currently designed to use interrupt-blocking might be able to
> run fine-grained interleaved with an interrupt handling thread
> while some might be so time-constrained that the critical
> thread needs all resources to meet the real-time guarantee.
> (This seems related to thread priority and general resource
> allocation. Again, an opportunity for conceptual unification.
> Lessons from economics might be applied to this aspect of
> resource management under communication/understanding
> constraints.)
>
> I have a similar view with respect to treating I/O agents as not
> fundamentally different from processing agents/threads.
> (Cache-coherent I/O is one already-taken step in this direction.)
> I am nearly clueless about how such could be designed and what
> the implications would be, but is seems an obvious path, especially
> with I/O-like accelerators, 'smart' network interfaces, complex
> storage devices, etc.
<
This simply means the bus interconnect uses coherent accesses
for all line-sized transfers. The tricky coherent protocols (patented)
get in the way here, too.
>
> (As a reader might guess, I had a block of free time with Internet
> access that coincided with being 'adequately caffeinated'.☺)
<
"Adequately" is severely under-representing the caffeination level here.


Re: Safepoints

<9%IPI.2750$WG5.976@fx38.iad>

https://www.novabbs.com/devel/article-flat.php?id=19632&group=comp.arch#19632

 by: EricP - Sun, 8 Aug 2021 04:22 UTC

Paul A. Clayton wrote:
> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
> [snip]
>> A few years ago I proposed an enhancement here which
>> allows LL/SC to multiple locations within a single cache line.
>>
>> LL - retains the lock if load is to the same cache line
>> SCH - store conditional and hold stores and retains lock
>> SCR - store conditional and release stores and releases lock
>> CLL - Clear lock
>
> One would not necessarily need SCH and one could merge SCR
> and CLL into ordinary SC — if one did not need escaping memory
> accesses. (For single cache block transactional memory, it might
> not be excessively inconvenient to pre-load all cache-block-external
> reads into registers and store any external writes after the
> commitment. If one is not guaranteeing atomicity for external
> accesses, such seems acceptable.)

The LL reads the cache line and marks it locked.
If another LL is to a different line it releases the first lock
and applies it to the second line.
If the SCH or SCR are to a different line than the LL then they fail.
SCR is needed to distinguish when the store and lock sequence ends.

The reason I added CLL, and its use is optional, is that LL reads
the line exclusive, then pins it in cache for some minimum number
of clocks, to prevent ping-ponging and to ensure forward progress
for at least one contender.

For original LL/SC that line pin clock count is small but it only
has to cover one LL, maybe a simple calculation, and one SC.

If one allows multiple LL, presumably more calculations,
and multiple SCH and an SCR (maybe 8 loads and 8 stores),
then that minimum line pin window must be expanded accordingly,
again to ensure forward progress for at least one contender.

Having a line pinned longer than necessary could have performance
implications for some applications. CLL gives applications that
decide to bail out of a multiple-update sequence a way to clear
the lock and unpin the line as soon as possible.
If they don't use CLL then the lock clock times out
and unpins the line eventually anyway.

While the line is held exclusive and pinned it cannot be accessed
by any other cpu or DMA device. Also the pinning might be (cheaply)
implemented by simply stopping listening to the inbound cache
coherence messages for a while, which could have system wide
performance implications beyond the single pinned line.

This multiple update window size likely precludes the use of the
cheap approach, and requires more complicated message handling to
pick off messages for the pinned line.
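
[Editor's note: to make the intended usage concrete, a hedged C sketch of a two-field update within one cache line; the __ll/__sch/__scr intrinsics do not exist in any real toolchain and are declared here only so the sketch compiles - they simply mirror the instruction semantics EricP describes.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical intrinsics mirroring the proposed instructions:
     *   __ll(p)     - load p and lock the cache line containing it
     *   __sch(p, v) - store-conditional-and-hold: store v, keep the lock
     *   __scr(p, v) - store-conditional-and-release: store v, end the event
     * The conditional stores fail (return false) if the lock was lost. */
    extern uint32_t __ll(volatile uint32_t *p);
    extern bool     __sch(volatile uint32_t *p, uint32_t v);
    extern bool     __scr(volatile uint32_t *p, uint32_t v);

    /* Two fields in one cache line that must appear to update together. */
    struct pair {
        _Alignas(64) uint32_t head;
        uint32_t count;
    };

    bool push_update(struct pair *q, uint32_t new_head)
    {
        for (;;) {
            (void)__ll(&q->head);              /* start the event, lock the line */
            uint32_t count = __ll(&q->count);  /* same line, so the lock is retained */

            if (!__sch(&q->head, new_head))    /* first store: hold */
                continue;                      /* lock lost - retry the whole event */
            if (__scr(&q->count, count + 1))   /* last store: release */
                return true;
            /* SCR failed: the held store is rolled back; retry the whole event */
        }
    }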

Re: Safepoints

<senrfc$mkj$2@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19633&group=comp.arch#19633

 by: Chris M. Thomasson - Sun, 8 Aug 2021 05:52 UTC

On 8/7/2021 9:22 PM, EricP wrote:
> Paul A. Clayton wrote:
>> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
>> [snip]
>>> A few years ago I proposed an enhancement here which
>>> allows LL/SC to multiple locations within a single cache line.
>>>
>>> LL - retains the lock if load is to the same cache line
>>> SCH - store conditional and hold stores and retains lock
>>> SCR - store conditional and release stores and releases lock
>>> CLL - Clear lock
>>
>> One would not necessarily need SCH and one could merge SCR
>> and CLL into ordinary SC — if one did not need escaping memory
>> accesses. (For single cache block transactional memory, it might not
>> be excessively inconvenient to pre-load all cache-block-external reads
>> into registers and store any external writes after the commitment. If
>> one is not guaranteeing atomicity for external accesses, such seems
>> acceptable.)
>
> The LL reads the cache line and marks it locked.
> If another LL is to a different line it releases the first lock
> and applies it to the second line.
> If the SCH or SCR are to a different line than the LL then they fail.
> SCR is needed to distinguish when the store and lock sequence ends.
[...]

I remember having to pay attention to the reservation granule for LL/SC.
Iirc, some implementations can be prone to livelock. False sharing on the
reservation granule was bad mojo!
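
[Editor's note: a common mitigation for the problem Chris describes is simply to align and pad data so that unrelated variables never share a reservation granule; a minimal C sketch, assuming a 64-byte granule.]

    #include <stdint.h>

    /* Assume the reservation granule (often the cache line) is 64 bytes.
     * Giving each LL/SC target its own granule avoids false sharing, where
     * unrelated stores in the same granule repeatedly kill a neighbour's
     * reservation and can lead to livelock. */
    #define GRANULE 64

    struct padded_counter {
        _Alignas(GRANULE) volatile uint32_t value;  /* the LL/SC target */
        /* rest of the granule is padding added by the alignment */
    };

    /* One counter per CPU: no two ever share a reservation granule. */
    static struct padded_counter per_cpu_counter[8];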

Re: Safepoints

<seoomh$sac$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19641&group=comp.arch#19641

 by: Terje Mathisen - Sun, 8 Aug 2021 14:11 UTC

MitchAlsup wrote:
> The thread that successfully got through the ATOMIC event should go on
> and perform whatever work it was supposed to perform. The bonked
> thread should look for ways to proactively avoid interference in the future.
> This is where the count comes in. Given that thread k got bonked, it
> understands interference is present. Given that each bonked thread gets
> a unique count, then all bonked threads have a way to attempt another
> ATOMIC event on a different unit of work which has a vastly higher
> probability of success than everyone pounding on the front of the queue.
> It is this proactivity that reduces the problem from BigO(n^3) to BigO(3)
> when properly programmed.

This is exactly the same algorithm we used to fix the "life or death"
class bug in the surgical alarm setup at the latest Oslo region hospital:

They had more than 1200 active devices, so after a server room
(one of three) power cycle, all the devices connected to that server
room would try to reconnect at the same time, and this generated enough
load that none of them could stay connected even if they got in. The
workaround was to add a single new response type packet to the
(re-)connect logic.

If the server detected that it was getting close to the maximum number
of simultaneous connection attempts, it would calculate when enough of
them would be finished and send back a single response saying "Busy now,
try again in N seconds". At this point the client would wait in line,
counting down the seconds, and then on the second attempt it was
guaranteed to get serviced.

Like Mitch notes here, the only requirement is the ability to send back
some sort of queue/wait token and then the clients can cooperate on
turning a horrible contention issue into an orderly queue.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Re: Safepoints

<49455ffc-731e-4c59-8f81-b1bdf0978e66n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19646&group=comp.arch#19646

 by: MitchAlsup - Sun, 8 Aug 2021 15:56 UTC

On Sunday, August 8, 2021 at 9:11:33 AM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > The thread that successfully got through the ATOMIC event should go on
> > and perform whatever work it was supposed to perform. The bonked
> > thread should look for ways to proactively avoid interference in the future.
> > This is where the count comes in. Given that thread k got bonked, it
> > understands interference is present. Given that each bonked thread gets
> > a unique count, then all bonked threads have a way to attempt another
> > ATOMIC event on a different unit of work which has a vastly higher
> > probability of success than everyone pounding on the front of the queue.
> > It is this proactivity that reduces the problem from BigO(n^3) to BigO(3)
> > when properly programmed.
> This is exactly the same algorithm we used to fix the "life or death"
> class bug in the surgical alarm setup at the newest Oslo region hospital:
>
> Since they had more than 1200 active devices, after a server room
> (one of three) power cycle all the devices connected to that server
> room would try to reconnect at the same time, and this generated enough
> load that none of them could stay connected even if they got in. The
> workaround was to add a single new response type packet to the
> (re-)connect logic.
>
> If the server detected that it was getting close to the maximum number
> of simultaneous connection attempts, it would calculate when enough of
> them would be finished and send back a single response saying "Busy now,
> try again in N seconds". At this point the client would wait in line,
> counting down the seconds, and then on the second attempt it was
> guaranteed to get serviced.
<
This was the phone system problem after a NYC blackout in the mid 1980s.
After the server regained power, all the other nodes attempted to re-
connect and flooded the server, causing the reconnect to fail. The system
would reboot properly in "the lab", but in the real world the delay from
node to node was variable (wire length and speed of light). The fix was to
add (something like) 10ms to the timeout clock of each node and the
system came back up. The "lab" system had the timeout clock set
to work only at topologically short distances, not real-world distances.
<
{Story told to me by a person at Bell Northern Research}
>
> Like Mitch notes here, the only requirement is the ability to send back
> some sort of queue/wait token and then the clients can cooperate on
> turning a horrible contention issue into an orderly queue.
>
> Terje
>
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Re: Safepoints

<qWTPI.3160$WG5.2339@fx38.iad>

https://www.novabbs.com/devel/article-flat.php?id=19653&group=comp.arch#19653

 by: EricP - Sun, 8 Aug 2021 16:49 UTC

Chris M. Thomasson wrote:
> On 8/7/2021 9:22 PM, EricP wrote:
>> Paul A. Clayton wrote:
>>> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
>>> [snip]
>>>> A few years ago I proposed an enhancement here which
>>>> allows LL/SC to multiple locations within a single cache line.
>>>>
>>>> LL - retains the lock if load is to the same cache line
>>>> SCH - store conditional and hold stores and retains lock
>>>> SCR - store conditional and release stores and releases lock
>>>> CLL - Clear lock
>>>
>>> One would not necessarily need SCH and one could merge SCR
>>> and CLL into ordinary SC — if one did not need escaping memory
>>> accesses. (For single cache block transactional memory, it might not
>>> be excessively inconvenient to pre-load all cache-block-external
>>> reads into registers and store any external writes after the
>>> commitment. If one is not guaranteeing atomicity for external
>>> accesses, such seems acceptable.)
>>
>> The LL reads the cache line and marks it locked.
>> If another LL is to a different line it releases the first lock
>> and applies it to the second line.
>> If the SCH or SCR are to a different line than the LL then they fail.
>> SCR is needed to distinguish when the store and lock sequence ends.
> [...]
>
> I remember having to pay attention to the reservation granule for LL/SC.
> Iirc, some implementations can be prone to live lock. False sharing the
> reservation granule was bad mojo!

The temporary pinning prevents livelock (ping-pong) between actual
contenders, and prevents aborts due to false sharing.
It ensures that one of N contenders succeeds (in the lock limit window).

The local cache controller does not control who gets exclusive access
to the line so at that level there is no way to know which contender
will succeed, and there is no fair ordering, and no upper bound on
number of LL/SC retries or delay of any individual.

However if the cache uses directory coherence controllers it ensures
that only one hand-off of exclusive ownership occurs at once,
and the directory controller should have an inbound request queue
that is serviced in FIFO order for each line.

Taken together, the local pinning and the coherence directory request FIFO
should give fair ordering and guaranteed forward progress for all
contenders for exclusive access to one cache line.

If the coherence protocol uses a shared snoopy bus then there is no
FIFO queue controlling ownership order and that guarantee disappears.
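
For concreteness, here is roughly what the LL/SCH/SCR/CLL proposal quoted
above might look like from the software side. The __ll/__sch/__scr/__cll
names are made-up stand-ins for the proposed operations (declared as
externs only so the sketch is well-formed C), and the ring header is an
invented example that assumes both fields share one cache line:

extern long __ll(long *p);            /* load-locked, pins the line      */
extern int  __sch(long *p, long v);   /* store-cond-and-hold, keeps lock */
extern int  __scr(long *p, long v);   /* store-cond-and-release          */
extern void __cll(void);              /* clear lock, commit nothing      */

enum { RING_SIZE = 256 };

struct ring_hdr { long head; long count; };  /* assumed to share a line */

/* Claim one slot: advance head and bump count as one atomic pair. */
int try_claim_slot(struct ring_hdr *r, long *slot_out)
{
    long head  = __ll(&r->head);             /* pins the cache line      */
    long count = __ll(&r->count);            /* same line, lock retained */

    if (count >= RING_SIZE) {                /* full: give up cleanly    */
        __cll();                             /* no stores become visible */
        return 0;
    }
    if (!__sch(&r->head, (head + 1) % RING_SIZE))
        return -1;                           /* lost the line, caller retries */
    if (!__scr(&r->count, count + 1))        /* commit both, release lock     */
        return -1;
    *slot_out = head;                        /* the slot we claimed           */
    return 1;
}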

Re: Safepoints

<sepcgc$1nks$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19663&group=comp.arch#19663

 by: Chris M. Thomasson - Sun, 8 Aug 2021 19:49 UTC

On 8/8/2021 9:49 AM, EricP wrote:
> Chris M. Thomasson wrote:
>> On 8/7/2021 9:22 PM, EricP wrote:
>>> Paul A. Clayton wrote:
>>>> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
>>>> [snip]
>>>>> A few years ago I proposed an enhancement here which
>>>>> allows LL/SC to multiple locations within a single cache line.
>>>>>
>>>>> LL - retains the lock if load is to the same cache line
>>>>> SCH - store conditional and hold stores and retains lock
>>>>> SCR - store conditional and release stores and releases lock
>>>>> CLL - Clear lock
>>>>
>>>> One would not necessarily need SCH and one could merge SCR
>>>> and CLL into ordinary SC — if one did not need escaping memory
>>>> accesses. (For single cache block transactional memory, it might not
>>>> be excessively inconvenient to pre-load all cache-block-external
>>>> reads into registers and store any external writes after the
>>>> commitment. If one is not guaranteeing atomicity for external
>>>> accesses, such seems acceptable.)
>>>
>>> The LL reads the cache line and marks it locked.
>>> If another LL is to a different line it releases the first lock
>>> and applies it to the second line.
>>> If the SCH or SCR are to a different line than the LL then they fail.
>>> SCR is needed to distinguish when the store and lock sequence ends.
>> [...]
>>
>> I remember having to pay attention to the reservation granule for
>> LL/SC. Iirc, some implementations can be prone to live lock. False
>> sharing the reservation granule was bad mojo!
>
> The temporary pinning prevents livelock (ping-pong) between actual
> contenders and aborts due to false sharing.
> It ensures that one of N contenders succeeds (in the lock limit window).
>
> The local cache controller does not control who gets exclusive access
> to the line so at that level there is no way to know which contender
> will succeed, and there is no fair ordering, and no upper bound on
> number of LL/SC retries or delay of any individual.
>
> However if the cache uses directory coherence controllers it ensures
> that only one hand-off of exclusive ownership occurs at once,
> and the directory controller should have an inbound request queue
> that is serviced in FIFO order for each line.
>
> Taken together, the local pinning and coherence directory request FIFO
> should give fair ordering and guaranteed forward progress for all
> for exclusive access to one cache line.
>
> If the coherence protocol uses a shared snoopy bus then there is no
> FIFO queue controlling ownership order and that guarantee disappears.
>
>

True. Fwiw, iirc, I tried to simulate a live lock using DWCAS where a
thread would try to increment the high part of the counter, and a shit
load of other threads would just mutate the low part of the counter
willy nilly. It would definitely increase the failure rate!

;^)
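
A rough C11 reconstruction of the shape of that experiment (illustrative
names, not the original test code; on x86-64 the 16-byte atomics
typically need -mcx16 so the compiler can use LOCK CMPXCHG16B rather
than a library fallback):

#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } pair_t;
static _Atomic pair_t counter;

/* One thread: try to bump the high half, counting DWCAS failures. */
uint64_t bump_hi(unsigned long *failures)
{
    pair_t seen = atomic_load(&counter);
    for (;;) {
        pair_t want = { seen.lo, seen.hi + 1 };   /* carry lo along */
        if (atomic_compare_exchange_weak(&counter, &seen, want))
            return want.hi;
        ++*failures;          /* lo (or hi) changed under us;
                                 'seen' was refreshed by the CAS   */
    }
}

/* Many threads: mutate the low half willy nilly. */
void churn_lo(long iters)
{
    pair_t seen = atomic_load(&counter);
    while (iters-- > 0) {
        pair_t want = { seen.lo + 1, seen.hi };
        if (atomic_compare_exchange_weak(&counter, &seen, want))
            seen = want;      /* keep our local copy current        */
        /* on failure the CAS already refreshed 'seen' for us       */
    }
}

The more churn_lo() threads are running, the more often bump_hi()'s
compare-exchange fails, which is the failure-rate increase described
above.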

Re: Safepoints

<a011f551-60f8-4850-a6ef-767bd11c261an@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19665&group=comp.arch#19665

 by: MitchAlsup - Sun, 8 Aug 2021 20:29 UTC

On Sunday, August 8, 2021 at 2:49:35 PM UTC-5, Chris M. Thomasson wrote:
> On 8/8/2021 9:49 AM, EricP wrote:
> > Chris M. Thomasson wrote:
> >> On 8/7/2021 9:22 PM, EricP wrote:
> >>> Paul A. Clayton wrote:
> >>>> On Friday, August 6, 2021 at 11:58:19 AM UTC-4, EricP wrote:
> >>>> [snip]
> >>>>> A few years ago I proposed an enhancement here which
> >>>>> allows LL/SC to multiple locations within a single cache line.
> >>>>>
> >>>>> LL - retains the lock if load is to the same cache line
> >>>>> SCH - store conditional and hold stores and retains lock
> >>>>> SCR - store conditional and release stores and releases lock
> >>>>> CLL - Clear lock
> >>>>
> >>>> One would not necessarily need SCH and one could merge SCR
> >>>> and CLL into ordinary SC — if one did not need escaping memory
> >>>> accesses. (For single cache block transactional memory, it might not
> >>>> be excessively inconvenient to pre-load all cache-block-external
> >>>> reads into registers and store any external writes after the
> >>>> commitment. If one is not guaranteeing atomicity for external
> >>>> accesses, such seems acceptable.)
> >>>
> >>> The LL reads the cache line and marks it locked.
> >>> If another LL is to a different line it releases the first lock
> >>> and applies it to the second line.
> >>> If the SCH or SCR are to a different line than the LL then they fail.
> >>> SCR is needed to distinguish when the store and lock sequence ends.
> >> [...]
> >>
> >> I remember having to pay attention to the reservation granule for
> >> LL/SC. Iirc, some implementations can be prone to live lock. False
> >> sharing the reservation granule was bad mojo!
> >
> > The temporary pinning prevents livelock (ping-pong) between actual
> > contenders and aborts due to false sharing.
> > It ensures that one of N contenders succeeds (in the lock limit window).
> >
> > The local cache controller does not control who gets exclusive access
> > to the line so at that level there is no way to know which contender
> > will succeed, and there is no fair ordering, and no upper bound on
> > number of LL/SC retries or delay of any individual.
> >
> > However if the cache uses directory coherence controllers it ensures
> > that only one hand-off of exclusive ownership occurs at once,
> > and the directory controller should have an inbound request queue
> > that is serviced in FIFO order for each line.
> >
> > Taken together, the local pinning and coherence directory request FIFO
> > should give fair ordering and guaranteed forward progress for all
> > for exclusive access to one cache line.
> >
> > If the coherence protocol uses a shared snoopy bus then there is no
> > FIFO queue controlling ownership order and that guarantee disappears.
> >
> >
> True. Fwiw, iirc, I tried to simulate a live lock using DWCAS where a
> thread would try to increment the high part of the counter, and a shit
> load of other threads would just mutate the low part of the counter
> willy nilly. It would definitely increase the failure rate!
<
I worked on a system where there were 3 buses; each one had a numerical
proof that it was coherent, and they were connected together as a coherent
system. Our processors were crashing about once a month running transaction
processing in the lab (the product was not yet ready to be sold). A senior
engineer had already spent 6 months with a team of other company engineers
trying to solve the problem.
<
I found the problem the first day (logic analyzer trace) but did not recognize
that I had its fingerprints until the following day.
<
The numerical proof DEPENDED on the bus response to snooped transactions
having at least 4 cycles of delay, and our system was responding in cycle 3.
Our competitor's system was responding in cycle 5.
<
BTW: I found the problem by treating the system as a physics experiment.
It was known when I arrived that the problem occurred when the place
in the OS that changes the page tables takes an interrupt. So I had one of
their team engineers write a task that, when it received control, immediately
asked to give up its time slice and go back to sleep. We fired up 24 such tasks
and the time between system crashes went from a month to milliseconds!
We got a trace of the problem on the third such crash, and found the timing
irregularity the next day by overlaying the logic analyzer traces and detecting
that there was a cycle where a CPU was table walking while the bus displayed
incoherent behavior.
<
It still took 2 more days to figure out how to alter the FPGA that sequenced
the bus interconnect.
<
Did I mention this was a one-way ticket on Dec 19 and I got home by Christmas!
>
> ;^)

Re: Safepoints

<D5wQI.2739$yW1.1705@fx08.iad>

https://www.novabbs.com/devel/article-flat.php?id=19709&group=comp.arch#19709

 by: EricP - Tue, 10 Aug 2021 14:31 UTC

Chris M. Thomasson wrote:
>
> True. Fwiw, iirc, I tried to simulate a live lock using DWCAS where a
> thread would try to increment the high part of the counter, and a shit
> load of other threads would just mutate the low part of the counter
> willy nilly. It would definitely increase the failure rate!
>
> ;^)

Just out of curiosity, what was this test run on, x86? x64?
What did the test do and what result did you see?
If x86/x64, did you put a LOCK prefix on the CMPXCHG8B/CMPXCHG16B
or just leave them naked (non-atomic)?

Note that Intel documents CMPXCHG8B and CMPXCHG16B as
"To simplify the interface to the processor’s bus, the destination operand
receives a write cycle without regard to the result of the comparison."

Without a LOCK prefix a CMPXCHG could in theory interleave with another
and clobber the other's changes, and one might not see this happen
unless you explicitly test for it.
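
For reference, this is roughly what an explicitly LOCKed DWCAS looks
like when hand-rolled with GCC-style inline asm (a sketch only; real
code should normally just use the compiler's __atomic/__sync builtins,
which emit the LOCK prefix for you, and the operand must be 16-byte
aligned):

#include <stdint.h>

typedef struct { _Alignas(16) uint64_t lo; uint64_t hi; } u128;

/* Returns 1 and stores 'desired' if *dst == *expected; otherwise
   returns 0 and refreshes *expected with the current contents. */
static inline int dwcas(u128 *dst, u128 *expected, u128 desired)
{
    uint64_t exp_lo = expected->lo, exp_hi = expected->hi;
    unsigned char ok;

    __asm__ __volatile__(
        "lock cmpxchg16b %[mem]\n\t"  /* drop "lock" and the exchange is
                                         no longer atomic between CPUs  */
        "setz %[ok]"
        : [ok] "=q"(ok), [mem] "+m"(*dst),
          "+a"(exp_lo), "+d"(exp_hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");

    if (!ok) {                        /* on failure RDX:RAX holds the
                                         current contents of *dst       */
        expected->lo = exp_lo;
        expected->hi = exp_hi;
    }
    return ok;
}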

Re: Safepoints

<35a020f8-6228-4b68-a39c-d89e8475bb90n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19713&group=comp.arch#19713

 by: Paul A. Clayton - Tue, 10 Aug 2021 17:58 UTC

On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> On Saturday, August 7, 2021 at 4:10:33 PM UTC-5, Paul A. Clayton wrote:
[snip]
> Side note:
> My 66000 has a lock bit in the 2-register memory instructions.
> All of these instructions have a lock-bit. When a LD instruction
> has the lock bit set, the cache line of that load is "participating"
> an ATOMIC event. When a ST instruction has the lock bit set
> that ST is the last instruction of the ATOMIC event. The CLL
> is equivalent to a ST instruction to a non-participating cache line
> with the lock bit set. So while STH, STR, and CLL are not instructions
> they are present within the endoding already present.

I am almost surprised, Mitch, that you did not use a
PRED-like instruction modifier to extend the semantics of
memory access instructions in its shadow.☺ In addition to
allowing any load/store instruction to have the extra
annotation applied independent of ordinary instruction
encoding constraints, the encoding freedom might facilitate
more extensive annotations, e.g., by adding an optional
32-bit or 64-bit "immediate" to the LOCK instruction
modifier.

I was thinking of a similar "dummy" SC to replace CLL (clear
lock), inspired by the dummy stores used to implement
memory-based conditional move (where the register CMOV/
SELECT instruction sets up/retains the store address or
retains/sets up a dummy address). The main problem would be
forcing a failure so that the previous stores do not become
visible. For a single cache block reservation this would be
trivial, but extending the reservation to an arbitrary set
of addresses makes it difficult to be sure that the SC will
fail. On a general register machine, one might reserve one
register name to indicate a SC that always fails. If the ISA
has a zero register, using this as a base register for a SC might
be defined as always failing. Using an architecturally defined
stack pointer or thread-local-storage pointer as a base address
might work, but CLL does seem less crufty and potentially
constraining.

>> One would not necessarily need SCH and one could merge SCR
>> and CLL into ordinary SC — if one did not need escaping memory
>> accesses.
>
> 95%+ of ATOMIC events do not need to use the CLL form.

Yet that leaves the 5% or so that desire an abort mechanism. For ESM,
ensuring an atomicity failure is easy since the working set is
pre-defined. For a less constrained mechanism, inducing a
failure without an explicit abort might be more complicated.

>> (For single cache block transactional memory, it might
>> not be excessively inconvenient to pre-load all cache-block-external
>> reads into registers and store any external writes after the
>> commitment. If one is not guaranteeing atomicity for external
>> accesses, such seems acceptable.)
>>
>> This same interface could be trivially extended to support a broader
>> range of transactional memory subsets.
>
> As I told MS: ASF is not transactional memory, but it can be used to
> help build transactional memory. ESM has this same property.

I consider LL/SC transactional memory (with the limit of
single load read set and single store write set with the
read set identical to the write set), but this is a
classification/nomenclature issue.

One problem with not having hardware support for more
extensive atomic operations is the cost of software tracking
read and write sets. Hardware can exploit the presence of
cache tags to reduce the overhead of tracking participation
and conservative filters might be more efficiently
implemented in hardware. This seems to be a little similar
to the argument for software cache coherence; the greater
flexibility of software can be advantageous. However,
specialization is often more efficient even when the fit is
not as good.

(There might be a way for hardware to cooperate with
software, even allowing software to more flexibly use such
features as a conservative filter. The distinction between
hardware and software can get fuzzy; I seem to recall
reading that some coherence management has been handled by a
special-purpose processor running a small set of software —
effectively the hardware configuration/control state extends
into an instruction memory.)

>> Adding escaping/non-atomic
>> accesses would seem to require adding instructions beyond LL/SC
>
> Not new instructions but simply defining the semantics of multiple
> uses of those instructions.

This assumes one has memory access instructions with special
annotations already (ASF used x86's LOCK prefix to indicate
participation, IIRC; ESM has 'lock' bits).

[snip]
>> By only specializing the transaction initializing load and the transaction
>> ending store, function code could be shared with transactional memory
>> and locks.
>
> This is exactly what ESM and ASF did ! It got the instruction set out of
> needing to add another ATOMIC instruction every processor release.

If I understand correctly, one can run an ESM critical
section inside of a software lock for that section (and the
atomicity will be guaranteed by the software lock), but this
adds some unnecessary tracking overhead since the software
lock guarantees exclusivity.

I am thinking about something like lock elision. This seems
mainly outside of ESM's design goal which seems to be about
providing a flexible set of atomic operations with
sufficient constraints to support speculation and stronger
forward progress assurances.

[snip]
>> Supporting conflict count (like Mitch Alsup's Advanced Synchronization
>> Faclity) might not be especially difficult given a full-word-sized
>> return value. (I do wonder if multiple success values would be useful.
>> I could imagine knowing that one's successful transaction killed
>> another might be useful; that thread *might* be able to reduce the
>> probability of future contention (choosing other tasks, choosing less
>> efficient but less contentious ways of accomplishing tasks)
>
> The thread that successfully got through the ATOMIC event should go on
> and perform whatever work it was supposed to perform. The bonked
> thread should look for ways to proactively avoid interference in the future.
> This is where the count comes in. Given that thread k got bonked, it
> understands interference is present. Given that each bonked thread gets
> a unique count, then all bonked threads have a way to attempt another
> ATOMIC event on a different unit of work which has a vastly higher
> probability of success than everyone pounding on the front of the queue.
> It is this proactivity that reduces the problem from BigO(n^3) to BigO(3)
> when properly programmed.

While you have made a strong case for failures providing
more information than just failure (or failure and cannot
succeed on retry), my (small) point was that success might
benefit from having more information than just "success".
There seems to be a perspective that successes cannot use
metadata comparable to an error code; I am not convinced
that this perspective is universally true, especially if
cooperation and "playing nice" is desirable. (This brings to
mind the failure of the Unix nice command. In part there
seems to have been no benefit in being nice — such as
rewarding frugal resource use by lowering the price of the
resource if community demand is lowered and saving 'money'
for later resource purchases of this or other resources.)

[snip]
> Remember, no interested 3rd party can see any (ANY) intermediate
> state of the ATOMIC event. The interested 3rd party CAN be the ISR !!!
> (trying to enqueue his little unit of work on the run queues.)

This is true for an *atomic* operation, but one might
imagine a more complex interface that includes leaking
information out of an in-progress quasi-atomic operation.

> I came to the conclusion that attempting to provide the illusion of
> ATOMICity over an interrupt, exception, trap... is "just asking for trouble."

I can very much understand limiting early implementations
in that way. I would not want to make that a feature that is
guaranteed never to be implemented. Some unexpected use case
or clever implementation mechanism might arise that makes
such attractive. (Of course, ISA guarantees are not really
absolute; one can simply declare that a new ISA is being
provided.)

> Also note: single stepping through an ATOMIC event and achieving
> success cannot be allowed.

While that does seem to be generally "asking for trouble", I
am not certain that single stepping is impossible — not
having given it that much thought and, especially, not
having the best grasp of synchronization issues.

At minimum, it seems one could provide a trace of processor
state for a successful atomic event, allowing something
similar to single stepping. For ESM, I think one could even
provide a "lock" that prevents other threads from
introducing conflicts while post-atomic single-stepping;
the eight blocks could be kept in a special reserved state
enforcing the "lock". I do not have a clue whether that
would be useful.


Re: Safepoints

<7f7207de-30cf-4f82-845c-44cdfb2853f3n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19715&group=comp.arch#19715

 by: MitchAlsup - Tue, 10 Aug 2021 18:53 UTC

On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> > On Saturday, August 7, 2021 at 4:10:33 PM UTC-5, Paul A. Clayton wrote:
> [snip]
> > Side note:
> > My 66000 has a lock bit in the 2-register memory instructions.
> > All of these instructions have a lock-bit. When a LD instruction
> > has the lock bit set, the cache line of that load is "participating"
> > an ATOMIC event. When a ST instruction has the lock bit set
> > that ST is the last instruction of the ATOMIC event. The CLL
> > is equivalent to a ST instruction to a non-participating cache line
> > with the lock bit set. So while STH, STR, and CLL are not instructions
> > they are present within the endoding already present.
<
> I am almost surprised, Mitch, that you did not use a
> PRED-like instruction modifier to extend the semantics of
> memory access instructions in its shadow.☺ In addition to
> allowing any load/store instruction to have the extra
> annotation applied independent of ordinary instruction
> encoding constraints, the encoding freedom might facilitate
> more extensive annotations, e.g., by adding an optional
> 32-bit or 64-bit "immediate" to the LOCK instruction
> modifier.)
<
To your credit, you are the first person to have noted this.
<
But ESM grew out of ASF, and ASF had no predicates; indeed, when
I did the ESM stuff (circa 2007) I did not have a PRED in my ISA.
<
You have a sharp thinking cap !!
<
In retrospect, 8 instructions is not enough to perform some of the
things ESM is capable of doing.
>
> I was thinking of a similar "dummy" SC to replace CLL (clear
> lock), inspired by the dummy stores used to implement
> memory-based conditional move (where the register CMOV/
> SELECT instruction sets up/retains the store address or
> retains/sets up a dummy address). The main problem would be
> forcing a failure so that the previous stores do not become
> visible. For a single cache block reservation this would be
> trivial, but extending the reservation to an arbitrary set
<
Arbitrary: yes, but a fixed (smallish) number of them.
<
> of addresses makes it difficult to be sure that the SC will
> fail. On a general register machine, one might reserve one
> register name to indicate a SC that always fails. If the ISA
> has a zero register, using this as a base register for a SC might
> be defined as always failing. Using an architecturally defined
> stack pointer or thread-local-storage pointer as a base address
> might work, but CLL does seem less crufty and potentially
> constraining.
<
Whatever mechanics fits with the rest of your architecture can be
made to work adequately. I chose to do this all in the Miss Buffer.
<
> >> One would not necessarily need SCH and one could merge SCR
> >> and CLL into ordinary SC — if one did not need escaping memory
> >> accesses.
> >
> > 95%+ of ATOMIC events do not need to use the CLL form.
<
> Yet that leaves 5-% that desire an abort mechanism. For ESM,
> ensuring an atomicity failure is easy since the working set is
> pre-defined. For a less constrained mechanism, inducing a
> failure without an explicit abort might be more complicated.
<
In ESM, failures are detected in the Miss Buffer and the Miss Buffer
has been given the ability to interrupt the current instruction stream.
Such an interruption causes control to transfer to the failure control
point but does not transfer control to an actual interrupt. So the place
where the SNOOP is detected to "have interfered" with the ATOMIC
event is where the "interrupt" is raised.
<
Now: setting up that control point:: When the first inbound memory reference
(LD or PREfetch) with the lock bit is encountered, its address becomes the
failure control point. If a subsequent Branch-on-the-condition-of-interference
is encountered, the target address becomes the failure control point. All
STs and PUSHes to participating cache lines are deferred until the success point.
<
> >> (For single cache block transactional memory, it might
> >> not be excessively inconvenient to pre-load all cache-block-external
> >> reads into registers and store any external writes after the
> >> commitment. If one is not guaranteeing atomicity for external
> >> accesses, such seems acceptable.)
> >>
> >> This same interface could be trivially extended to support a broader
> >> range of transactional memory subsets.
> >
> > As I told MS: ASF is not transactional memory, but it can be used to
> > help build transactional memory. ESM has this same property.
<
> I consider LL/SC transactional memory (with the limit of
> single load read set and single store write set with the
> read set identical to the write set), but this is a
> classification/nomenclature issue.
<
{{eyes wider open than usual and staring with a gaze}} Interesting
>
> One problem with not having hardware support for more
> extensive atomic operations is the cost of software tracking
> read and write sets. Hardware can exploit the presence of
> cache tags to reduce the overhead of tracking participation
> and conservative filters might be more efficiently
> implemented in hardware. This seems to be a little similar
> to the argument for software cache coherence; the greater
> flexibility of software can be advantageous. However,
> specialization is often more efficient even when the fit is
> not as good.
>
> (There might be a way for hardware to cooperate with
> software, even allowing software to more flexibly use such
> features as a conservative filter. The distinction between
> hardware and software can get fuzzy; I seem to recall
> reading that some coherence management has been handled by a
> special-purpose processor running a small set of software —
> effectively the hardware configuration/control state extends
> into an instruction memory.)
<
ASF and ESM are, in essence, a way to get out of the game of inventing
more and more exotic synchronization instructions over time. SW can
create whatever ATOMIC primitives it desires and wrap them up in
subroutines, macros, or code to be inlined.
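
(As a trivial conventional-hardware illustration of that point, in plain
C11 with a compare-exchange loop rather than ESM code: a primitive many
ISAs do not provide directly, fetch-and-max, wrapped in an inlinable
subroutine. Under ESM/ASF the body would instead be bracketed by the
lock-bit-annotated load and store.)

#include <stdatomic.h>
#include <stdint.h>

/* fetch-and-max composed in software; no new ATOMIC instruction needed. */
static inline uint64_t
atomic_fetch_max_u64(_Atomic uint64_t *p, uint64_t v)
{
    uint64_t old = atomic_load_explicit(p, memory_order_relaxed);
    while (old < v &&
           !atomic_compare_exchange_weak_explicit(p, &old, v,
                    memory_order_acq_rel, memory_order_relaxed))
        ;               /* 'old' is refreshed on each failed attempt */
    return old;         /* value observed before any update          */
}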
>
> >> Adding escaping/non-atomic
> >> accesses would seem to require adding instructions beyond LL/SC
> >
> > Not new instructions but simply defining the semantics of multiple
> > uses of those instructions.
<
> This assumes on has memory access instructions with special
> annotations already (ASF used x86's LOCK prefix to indicate
> participation, IIRC; ESM has 'lock' bits).
<
I guess one should also include casting of these lock bits by means of
a PRED-like instruction.
>
> [snip]
> >> By only specializing the transaction initializing load and the transaction
> >> ending store, function code could be shared with transactional memory
> >> and locks.
> >
> > This is exactly what ESM and ASF did ! It got the instruction set out of
> > needing to add another ATOMIC instruction every processor release.
<
> If I understand correctly, one can run an ESM critical
> section inside of a software lock for that section (and the
> atomicity will be guaranteed by the software lock), but this
> adds some unnecessary tracking overhead since the software
> lock guarantees exclusivity.
>
> I am thinking about something like lock elision. This seems
> mainly outside of ESM's design goal which seems to be about
> providing a flexible set of atomic operations with
> sufficient constraints to support speculation and stronger
> forward progress assurances.
<
Yes, at first blush, one can run ESM under a SW lock........but why ?
ESM is present to allow the ILLUSION of ATOMICity without ever
locking anything!
>
> [snip]
> >> Supporting conflict count (like Mitch Alsup's Advanced Synchronization
> >> Faclity) might not be especially difficult given a full-word-sized
> >> return value. (I do wonder if multiple success values would be useful.
> >> I could imagine knowing that one's successful transaction killed
> >> another might be useful; that thread *might* be able to reduce the
> >> probability of future contention (choosing other tasks, choosing less
> >> efficient but less contentious ways of accomplishing tasks)
> >
> > The thread that successfully got through the ATOMIC event should go on
> > and perform whatever work it was supposed to perform. The bonked
> > thread should look for ways to proactively avoid interference in the future.
> > This is where the count comes in. Given that thread k got bonked, it
> > understands interference is present. Given that each bonked thread gets
> > a unique count, then all bonked threads have a way to attempt another
> > ATOMIC event on a different unit of work which has a vastly higher
> > probability of success than everyone pounding on the front of the queue.
> > It is this proactivity that reduces the problem from BigO(n^3) to BigO(3)
> > when properly programmed.
<
> While you have made a strong case for failures providing
> more information than just failure (or failure and cannot
> succeed on retry), my (small) point was that success might
> benefit from having more information than just "success".
> There seems to be a perspective that successes cannot use
> metadata comparable to an error code; I am not convinced
> that this perspective is universally true, especially if
> cooperation and "playing nice" is desirable. (This brings to
> mind the failure of the Unix nice command. In part there
> seems to have been no benefit in being nice — such as
> rewarding frugal resource use by lower the price of the
> resource if community demand is lowered and saving 'money'
> for later resource purchases of this or other resources.)
<
I will grant you that the success path might be able to use some
kind of indication pertaining to the number of threads interfering
with an event, but I never ran into a good reason for this. If someone
can come up with a use case, the mechanics of inserting this into
ESM is straightforward.
>
> [snip]
> > Remember, no interested 3rd party can see any (ANY) intermediate
> > state of the ATOMIC event. The interested 3rd party CAN be the ISR !!!
> > (trying to enqueue his little unit of work on the run queues.)
<
> This is true for an *atomic* operation, but one might
> imagine a more complex interface that includes leaking
> information out of an in-progress quasi-atomic operation.
<
Sounds like it would create more trouble than it would be worth.
<
> > I came to the conclusion that attempting to provide the illusion of
> > ATOMICity over an interrupt, exception, trap... is "just asking for trouble."
<
> I can very much understand limiting early implementations
> in that way. I would not want to make that a feature that is
> guaranteed never to be implemented. Some unexpected use case
> or clever implementation mechanism might arise that makes
> such attractive. (Of course, ISA guarantees are not really
> absolute; one can simply declare that a new ISA is being
> provided.)
<
> > Also note: single stepping through an ATOMIC event and achieving
> > success cannot be allowed.
<
> While that does seem to be generally "asking for trouble", I
> am not certain that single stepping is impossible — not
> having given it that much thought and, especially, not
> having the best grasp of synchronization issues.
<
Consider single stepping through an ATOMIC event under actual
contention::
<
By the time the human gets the prompt (>) at the control terminal,
millions upon millions of instructions have been performed, and
by the time the human makes his first response, millions and
millions more instructions have been performed.
<
Under contention, it is doubtful that the single stepping event
will ever succeed, so all one can single step through is the failure
cases. So why not fail when control is transferred out of the event ?
>
> At minimum, it seems one could provide a trace of processor
> state for a successful atomic event, allowing something
> similar to single stepping.
<
The problem is NOT single stepping !!
The problem is that you cannot make it SMELL ATOMIC !!
<
> For ESM, I think one could even
> provide a "lock" that prevents other threads from
> introducing conflicts while post-atomic single-stepping;
<
You are setting yourself up for deadlock.
<
> the eight blocks could be kept in a special reserved state
> enforcing the "lock". I do not have a clue whether that
> would be useful.
>
> Perhaps versioned memory would even facilitate examining
> past behavior while allowing new versions to be accessed
> rather than "locking" the participating memory addresses?
>
> [snip]
> >> (As a reader might guess, I had a block of free time with Internet
> >> access that coincided with being 'adequately caffeinated'.☺)
> >
> > Adequately is severely under representing the caffenation level here.
<
> But I *like* my drug, and I am not certain that all of my
> "high energy" traits are always due to coffee and/or sugar.


Re: Safepoints

<XFAQI.17105$lK.8288@fx41.iad>

https://www.novabbs.com/devel/article-flat.php?id=19716&group=comp.arch#19716

 by: EricP - Tue, 10 Aug 2021 19:43 UTC

MitchAlsup wrote:
> On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> <
>>> Also note: single stepping through an ATOMIC event and achieving
>>> success cannot be allowed.
> <
>> While that does seem to be generally "asking for trouble", I
>> am not certain that single stepping is impossible — not
>> having given it that much thought and, especially, not
>> having the best grasp of synchronization issues.
> <
> Consider single stepping through an ATOMIC event under actual
> contention::
> <
> By the time the human gets the carrot > at the control terminal,
> million upon millions of instructions have been performed, and
> by the time the human makes his first response, millions and
> millions of more instructions have been performed.
> <
> Under contention, it is doubtful that the single stepping event
> will ever succeed, so all one can single step through is the failure
> cases. So why not fail when control is transferred out of the event ?
>> At minimum, it seems one could provide a trace of processor
>> state for a successful atomic event, allowing something
>> similar to single stepping.
> <
> The problem is NOT single stepping !!
> The problem is that you cannot make it SMELL ATOMIC !!

It could have a HW debug trace log (a circular buffer register)
that starts being written if you single step into an ESM sequence,
and that a suitably privileged debugger could extract on an exception.

Re: Safepoints

<a91e6c35-aae5-4c1e-8809-85e6e36573d6n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19717&group=comp.arch#19717

 by: MitchAlsup - Tue, 10 Aug 2021 20:03 UTC

On Tuesday, August 10, 2021 at 2:43:53 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
> >> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> > <
> >>> Also note: single stepping through an ATOMIC event and achieving
> >>> success cannot be allowed.
> > <
> >> While that does seem to be generally "asking for trouble", I
> >> am not certain that single stepping is impossible — not
> >> having given it that much thought and, especially, not
> >> having the best grasp of synchronization issues.
> > <
> > Consider single stepping through an ATOMIC event under actual
> > contention::
> > <
> > By the time the human gets the carrot > at the control terminal,
> > million upon millions of instructions have been performed, and
> > by the time the human makes his first response, millions and
> > millions of more instructions have been performed.
> > <
> > Under contention, it is doubtful that the single stepping event
> > will ever succeed, so all one can single step through is the failure
> > cases. So why not fail when control is transferred out of the event ?
> >> At minimum, it seems one could provide a trace of processor
> >> state for a successful atomic event, allowing something
> >> similar to single stepping.
> > <
> > The problem is NOT single stepping !!
> > The problem is that you cannot make it SMELL ATOMIC !!
<
> It could have a HW debug trace log (a circular buffer register)
> that starts being written if you single step into an ESM sequence,
> that a suitably privileged debugger could extract on exception.
<
It is a given that you will probably have one of these anyway.
<
You still can't make things operating at the speed of humans
smell atomic with dozens of 5 GHz cores !!! and no actual lock
mechanism.

Re: Safepoints

<PNDQI.17240$lK.1540@fx41.iad>

https://www.novabbs.com/devel/article-flat.php?id=19719&group=comp.arch#19719

 by: EricP - Tue, 10 Aug 2021 23:16 UTC

MitchAlsup wrote:
> On Tuesday, August 10, 2021 at 2:43:53 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
>>>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
>>> <
>>>>> Also note: single stepping through an ATOMIC event and achieving
>>>>> success cannot be allowed.
>>> <
>>>> While that does seem to be generally "asking for trouble", I
>>>> am not certain that single stepping is impossible — not
>>>> having given it that much thought and, especially, not
>>>> having the best grasp of synchronization issues.
>>> <
>>> Consider single stepping through an ATOMIC event under actual
>>> contention::
>>> <
>>> By the time the human gets the carrot > at the control terminal,
>>> million upon millions of instructions have been performed, and
>>> by the time the human makes his first response, millions and
>>> millions of more instructions have been performed.
>>> <
>>> Under contention, it is doubtful that the single stepping event
>>> will ever succeed, so all one can single step through is the failure
>>> cases. So why not fail when control is transferred out of the event ?
>>>> At minimum, it seems one could provide a trace of processor
>>>> state for a successful atomic event, allowing something
>>>> similar to single stepping.
>>> <
>>> The problem is NOT single stepping !!
>>> The problem is that you cannot make it SMELL ATOMIC !!
> <
>> It could have a HW debug trace log (a circular buffer register)
>> that starts being written if you single step into an ESM sequence,
>> that a suitably privileged debugger could extract on exception.
> <
> It is a given that you will probably have one of these anyway.

Ah, good, so it's just a matter of documenting it. :-)

> <
> You still can't make things operating at the speed of humans
> smell atomic with dozens of 5 GHz cores !!! and no actual lock
> mechanism.

Single step shuts off at LOCK start and begins HW logging.
Single step restarts at COMMIT and turns log off.
Between start and commit, ESM does its normal thing atomically and
without interference, and the HW log serves as the scope into what
happened inside.
The debugger puts breakpoint instructions inside the ESM region to
trigger exceptions and the log displays what happened up to the break.

Re: Safepoints

<7dc6c144-aad1-4a0d-afc6-8e8b801fb509n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19720&group=comp.arch#19720

 by: MitchAlsup - Tue, 10 Aug 2021 23:30 UTC

On Tuesday, August 10, 2021 at 6:17:06 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, August 10, 2021 at 2:43:53 PM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
> >>>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> >>> <
> >>>>> Also note: single stepping through an ATOMIC event and achieving
> >>>>> success cannot be allowed.
> >>> <
> >>>> While that does seem to be generally "asking for trouble", I
> >>>> am not certain that single stepping is impossible — not
> >>>> having given it that much thought and, especially, not
> >>>> having the best grasp of synchronization issues.
> >>> <
> >>> Consider single stepping through an ATOMIC event under actual
> >>> contention::
> >>> <
> >>> By the time the human gets the carrot > at the control terminal,
> >>> million upon millions of instructions have been performed, and
> >>> by the time the human makes his first response, millions and
> >>> millions of more instructions have been performed.
> >>> <
> >>> Under contention, it is doubtful that the single stepping event
> >>> will ever succeed, so all one can single step through is the failure
> >>> cases. So why not fail when control is transferred out of the event ?
> >>>> At minimum, it seems one could provide a trace of processor
> >>>> state for a successful atomic event, allowing something
> >>>> similar to single stepping.
> >>> <
> >>> The problem is NOT single stepping !!
> >>> The problem is that you cannot make it SMELL ATOMIC !!
> > <
> >> It could have a HW debug trace log (a circular buffer register)
> >> that starts being written if you single step into an ESM sequence,
> >> that a suitably privileged debugger could extract on exception.
> > <
> > It is a given that you will probably have one of these anyway.
> Ah, good, so it's just a matter of documenting it. :-)
> > <
> > You still can't make things operating at the speed of humans
> > smell atomic with dozens of 5 GHz cores !!! and no actual lock
> > mechanism.
<
> Single step shuts off at LOCK start and begins HW logging.
> Single step restarts at COMMIT and turns log off.
> Between start and commit ESM does its normal thing atomically
> without interference and the HW log is the scope inside.
> The debugger puts breakpoint instructions inside the ESM region to
> trigger exceptions and the log displays what happened up to the break.
<
What if a large number of the 8-gazillion other threads perform a
variety of ATOMIC events while the debugger is gaining control?
<
The user (or debugger) might have to sift through Gigabytes of
HW logging to find the portion appropriate for his thread. And
one could argue that the debugger might not have permission
to perform such a sifting, should "secure" SW be performing
ATOMIC events; certainly the user should not have such permission !

Re: Safepoints

<sev8to$1vao$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19723&group=comp.arch#19723

 by: Chris M. Thomasson - Wed, 11 Aug 2021 01:25 UTC

On 8/10/2021 7:31 AM, EricP wrote:
> Chris M. Thomasson wrote:
>>
>> True. Fwiw, iirc, I tried to simulate a live lock using DWCAS where a
>> thread would try to increment the high part of the counter, and a shit
>> load of other threads would just mutate the low part of the counter
>> willy nilly. It would definitely increase the failure rate!
>>
>> ;^)
>
> Just out of curiosity, what was this test run on, x86? x64?
> What did the test do and what result did you see?
> If x86/x64, did you put a LCK prefix on the CMPXCHG8B/CMPXCHG16B
> or just leave them naked (non-atomic)?
>
> Note that Intel documents CMPXCHG8B and CMPXCHG16B as
> "To simplify the interface to the processor’s bus, the destination operand
> receives a write cycle without regard to the result of the comparison."
>
> Without a LCK prefix a CMPXCHG could in theory interleave with another
> but clobber the other changes and one might not see this happen
> unless you explicitly test for it.
>
>

Iirc, it was on one of the first hyperthreaded x86's. I was using a
LOCK'ed CMPXCHG8B, or DWCAS (doubleword compare-and-swap) if you will.
In a loop, the main thread did a DWCAS that tried to increment the high
part of the counter (two contiguous unsigned 32-bit words).
The other threads just mutated the low part of the word without using
CMPXCHG8B... just MOVs. Iirc, the processor suffered from the aliasing
problem as well.

Iirc, it sure messed with the failure rate of the main thread doing the
DWCAS. It roughly simulated an LL/SC livelock scenario, with false-sharing
threads mutating the reservation granule.
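
A rough reconstruction of that test in portable C++ follows. Two
liberties are taken, so treat it as a sketch rather than the original
code: C++11 atomics stand in for hand-written LOCK CMPXCHG8B, and a
relaxed fetch_xor on the low half stands in for the plain 32-bit MOVs
(a half-word store cannot be expressed on a std::atomic<uint64_t>).
The effect it shows is the same: traffic on the low half keeps
invalidating the expected value and drives up the DWCAS failure rate.

#include <atomic>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// 64-bit counter: high 32 bits are the real count, low 32 bits are noise.
std::atomic<std::uint64_t> counter{0};
std::atomic<bool> stop{false};

void incrementer(int iters, long& retries) {
    for (int i = 0; i < iters; ++i) {
        std::uint64_t expected = counter.load();
        // Retry until the high half has been bumped by exactly one.
        while (!counter.compare_exchange_weak(
                   expected, expected + (std::uint64_t{1} << 32)))
            ++retries;                 // each failure is one "DWCAS" retry
    }
}

void low_half_noise() {
    std::uint32_t x = 0;
    while (!stop.load(std::memory_order_relaxed))
        // Flip bits in the low half only; the high count is never disturbed,
        // but the incrementer's expected value keeps going stale.
        counter.fetch_xor(++x, std::memory_order_relaxed);
}

int main() {
    long retries = 0;
    std::vector<std::thread> noise;
    for (int i = 0; i < 4; ++i) noise.emplace_back(low_half_noise);
    std::thread t(incrementer, 1000000, std::ref(retries));
    t.join();
    stop = true;
    for (auto& n : noise) n.join();
    std::printf("high count = %u, CAS retries = %ld\n",
                static_cast<unsigned>(counter.load() >> 32), retries);
}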

Re: Safepoints

<sev92o$1vao$2@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=19724&group=comp.arch#19724

 by: Chris M. Thomasson - Wed, 11 Aug 2021 01:27 UTC

On 8/10/2021 6:25 PM, Chris M. Thomasson wrote:
> On 8/10/2021 7:31 AM, EricP wrote:
>> Chris M. Thomasson wrote:
>>>
>>> True. Fwiw, iirc, I tried to simulate a live lock using DWCAS where a
>>> thread would try to increment the high part of the counter, and a
>>> shit load of other threads would just mutate the low part of the
>>> counter willy nilly. It would definitely increase the failure rate!
>>>
>>> ;^)
>>
>> Just out of curiosity, what was this test run on, x86? x64?
>> What did the test do and what result did you see?
>> If x86/x64, did you put a LCK prefix on the CMPXCHG8B/CMPXCHG16B
>> or just leave them naked (non-atomic)?
>>
>> Note that Intel documents CMPXCHG8B and CMPXCHG16B as
>> "To simplify the interface to the processor’s bus, the destination
>> operand
>> receives a write cycle without regard to the result of the comparison."
>>
>> Without a LCK prefix a CMPXCHG could in theory interleave with another
>> but clobber the other changes and one might not see this happen
>> unless you explicitly test for it.
>>
>>
>
> Iirc, it was on one of the first hyperthreaded x86's. I was using a
> LOCK'ed cmpxchg8b, or (DWCAS) if you will for doubleword compare and
> swap. The main thread just did a DWCAS and tried to increment the high
> part of the counter (two contiguous unsigned 32-bit words) in a loop.
> The other threads just mutated the low part of the word without using

To be more precise, the other threads just mutated the low part of the
doubleword.

> cmpxchg8b... Just MOV's. Iirc, the processor suffered from the aliasing
> problem as well.
>
> Iirc, it sure messed with the failure rate of the main thread doing the
> DWCAS. It kind of simulated a LL/SC livelock  like scenario wrt mutating
> the reservation granule from false sharing threads.

Re: Safepoints

<5ZTQI.727$9F6.229@fx13.iad>

https://www.novabbs.com/devel/article-flat.php?id=19731&group=comp.arch#19731

 by: EricP - Wed, 11 Aug 2021 17:40 UTC

MitchAlsup wrote:
> On Tuesday, August 10, 2021 at 6:17:06 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> On Tuesday, August 10, 2021 at 2:43:53 PM UTC-5, EricP wrote:
>>>> MitchAlsup wrote:
>>>>> On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
>>>>>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
>>>>> <
>>>>>>> Also note: single stepping through an ATOMIC event and achieving
>>>>>>> success cannot be allowed.
>>>>> <
>>>>>> While that does seem to be generally "asking for trouble", I
>>>>>> am not certain that single stepping is impossible — not
>>>>>> having given it that much thought and, especially, not
>>>>>> having the best grasp of synchronization issues.
>>>>> <
>>>>> Consider single stepping through an ATOMIC event under actual
>>>>> contention::
>>>>> <
>>>>> By the time the human gets the carrot > at the control terminal,
>>>>> million upon millions of instructions have been performed, and
>>>>> by the time the human makes his first response, millions and
>>>>> millions of more instructions have been performed.
>>>>> <
>>>>> Under contention, it is doubtful that the single stepping event
>>>>> will ever succeed, so all one can single step through is the failure
>>>>> cases. So why not fail when control is transferred out of the event ?
>>>>>> At minimum, it seems one could provide a trace of processor
>>>>>> state for a successful atomic event, allowing something
>>>>>> similar to single stepping.
>>>>> <
>>>>> The problem is NOT single stepping !!
>>>>> The problem is that you cannot make it SMELL ATOMIC !!
>>> <
>>>> It could have a HW debug trace log (a circular buffer register)
>>>> that starts being written if you single step into an ESM sequence,
>>>> that a suitably privileged debugger could extract on exception.
>>> <
>>> It is a given that you will probably have one of these anyway.
>> Ah, good, so it's just a matter of documenting it. :-)
>>> <
>>> You still can't make things operating at the speed of humans
>>> smell atomic with dozens of 5 GHz cores !!! and no actual lock
>>> mechanism.
> <
>> Single step shuts off at LOCK start and begins HW logging.
>> Single step restarts at COMMIT and turns log off.
>> Between start and commit ESM does its normal thing atomically
>> without interference and the HW log is the scope inside.
>> The debugger puts breakpoint instructions inside the ESM region to
>> trigger exceptions and the log displays what happened up to the break.
> <
> What if a large number of the 8-gazillion other threads perform a
> variety of ATOMIC events while the debugger is gaining control?

Why would you be debugging a system with a zillion threads running?
But that's ok.

> <
> The user (or debugger) might have to sift through Gigabytes of
> HW logging to find the portion appropriate for his thread. And
> one could argue that the debugger might not have permission
> to perform such a sifting, should "secure" SW be performing
> ATOMIC events; certainly the user should not have such permission !

The log buffer holds a single thread's instruction history and
is part of that thread's context, just like the single-step flag.

Also, putting a break instruction in the shared transaction code
would break all executors and abort their transactions.
The modified code page would have to be copy-on-write so that
only the thread being debugged can see the break instruction,
and the page-table pages would have to be copied and adjusted
accordingly when the debugged thread executes.
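
A toy model of that copy-on-write breakpoint scheme (not an OS
implementation; every name here is invented for illustration): only the
debugged thread gets a private, patched copy of the code page, so the
other executors, and their transactions, never see the break instruction.

#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

constexpr std::size_t kPageSize = 4096;
constexpr std::uint8_t kBreakOpcode = 0xCC;   // e.g. x86 INT3
using Page = std::array<std::uint8_t, kPageSize>;

// The pristine code page every thread shares by default.
std::shared_ptr<Page> shared_code_page = std::make_shared<Page>();

struct Thread {
    // Private overrides; any page not listed here resolves to the shared one.
    std::unordered_map<std::uint64_t, std::shared_ptr<Page>> private_pages;
};

// Patch a breakpoint into a private copy visible only to thread t.
void set_breakpoint(Thread& t, std::uint64_t page_va, std::size_t offset) {
    auto copy = std::make_shared<Page>(*shared_code_page);  // copy-on-write
    (*copy)[offset] = kBreakOpcode;
    t.private_pages[page_va] = copy;
}

// Address translation, as the per-thread page tables would see it.
const Page& page_for(const Thread& t, std::uint64_t page_va) {
    auto it = t.private_pages.find(page_va);
    return it != t.private_pages.end() ? *it->second : *shared_code_page;
}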

Re: Safepoints

<e81f4a1e-2fea-4191-894e-476878a518cen@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19733&group=comp.arch#19733

 by: MitchAlsup - Wed, 11 Aug 2021 19:41 UTC

On Wednesday, August 11, 2021 at 12:41:26 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Tuesday, August 10, 2021 at 6:17:06 PM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> On Tuesday, August 10, 2021 at 2:43:53 PM UTC-5, EricP wrote:
> >>>> MitchAlsup wrote:
> >>>>> On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
> >>>>>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> >>>>> <
> >>>>>>> Also note: single stepping through an ATOMIC event and achieving
> >>>>>>> success cannot be allowed.
> >>>>> <
> >>>>>> While that does seem to be generally "asking for trouble", I
> >>>>>> am not certain that single stepping is impossible — not
> >>>>>> having given it that much thought and, especially, not
> >>>>>> having the best grasp of synchronization issues.
> >>>>> <
> >>>>> Consider single stepping through an ATOMIC event under actual
> >>>>> contention::
> >>>>> <
> >>>>> By the time the human gets the carrot > at the control terminal,
> >>>>> million upon millions of instructions have been performed, and
> >>>>> by the time the human makes his first response, millions and
> >>>>> millions of more instructions have been performed.
> >>>>> <
> >>>>> Under contention, it is doubtful that the single stepping event
> >>>>> will ever succeed, so all one can single step through is the failure
> >>>>> cases. So why not fail when control is transferred out of the event ?
> >>>>>> At minimum, it seems one could provide a trace of processor
> >>>>>> state for a successful atomic event, allowing something
> >>>>>> similar to single stepping.
> >>>>> <
> >>>>> The problem is NOT single stepping !!
> >>>>> The problem is that you cannot make it SMELL ATOMIC !!
> >>> <
> >>>> It could have a HW debug trace log (a circular buffer register)
> >>>> that starts being written if you single step into an ESM sequence,
> >>>> that a suitably privileged debugger could extract on exception.
> >>> <
> >>> It is a given that you will probably have one of these anyway.
> >> Ah, good, so it's just a matter of documenting it. :-)
> >>> <
> >>> You still can't make things operating at the speed of humans
> >>> smell atomic with dozens of 5 GHz cores !!! and no actual lock
> >>> mechanism.
> > <
> >> Single step shuts off at LOCK start and begins HW logging.
> >> Single step restarts at COMMIT and turns log off.
> >> Between start and commit ESM does its normal thing atomically
> >> without interference and the HW log is the scope inside.
> >> The debugger puts breakpoint instructions inside the ESM region to
> >> trigger exceptions and the log displays what happened up to the break.
> > <
> > What if a large number of the 8-gazillion other threads perform a
> > variety of ATOMIC events while the debugger is gaining control?
<
> Why would you be debugging a system with a zillion threads running?
<
There is no need for ATOMICity when you are running only one thread.
<
But more importantly, how do you debug ATOMIC events on a server
running a zillion threads on a database ??
<
Without the zillion threads you cannot see the interference and reason
about the ATOMIC failures (or successes which SHOULD have been
failures) !! Which is why you need that kind of interference.
<
> But that's ok.
> > <
> > The user (or debugger) might have to sift through Gigabytes of
> > HW logging to find the portion appropriate for his thread. And
> > one could argue that the debugger might not have permission
> > to perform such a sifting, should "secure" SW be performing
> > ATOMIC events; certainly the user should not have such permission !
<
> The log buffer is of a single threads' instruction history and
> is part of that thread context just like the single step flag.
<
The HW loggers we used at AMD contained:: user, supervisor, hypervisor,
and secure memory requests, requests from various other sub-systems (such
as the TLB and the table walkers), prefetches, and DRAM-controller events.
So we could determine an exact order of events over the whole system and
feed this into our simulators to figure out exactly where something
happened (or not). The log was, in effect, a compressed version of
"everything".
<
The HW traces were considered AMD "eyes only".
>
> Also putting a break instruction in the shared transaction code
> would break all executers and abort their transactions.
<
Another reason one cannot single-step ATOMIC events.
<
> The modified code page would have to be copy-on-write so
> only the thread being debugged can see the break instruction,
> and the page table pages have to be copied and adjusted
> accordingly when the debug thread executes.

Re: Safepoints

<a3f971e5-4db6-4427-b231-03fdc78ebc95n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19752&group=comp.arch#19752

 by: Paul A. Clayton - Thu, 12 Aug 2021 18:26 UTC

On Tuesday, August 10, 2021 at 2:53:56 PM UTC-4, MitchAlsup wrote:
>On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
>> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
>>> On Saturday, August 7, 2021 at 4:10:33 PM UTC-5, Paul A. Clayton wrote:
[snip]
>> I am almost surprised, Mitch, that you did not use a
>> PRED-like instruction modifier to extend the semantics of
>> memory access instructions in its shadow.☺ In addition to
>> allowing any load/store instruction to have the extra
>> annotation applied independent of ordinary instruction
>> encoding constraints, the encoding freedom might facilitate
>> more extensive annotations, e.g., by adding an optional
>> 32-bit or 64-bit "immediate" to the LOCK instruction
>> modifier.)
>
> To your credit, you are the first person to have noted this.
>
> But, ESM grew out of ASF and ASF had no predicates, indeed, when
> I did the ESM stuff (circa 2007) I did not have a PRED in my ISA.
>
> You have a sharp thinking cap !!

Thank you for the compliment. (I plan to add it to my list.)

As is often the case, the association was caught by
coincidence of mental bias (reusing mechanisms and coherent
design) and circumstance (ASF's use of x86's LOCK prefix was
in mind, so a LOCK instruction modifier is not a large
inventive step when applied to an ISA that uses such modifiers
as an extension mechanism for special cases). The role of
mental bias is one reason why mental diversity is useful in
teams. While one cannot program serendipity, environmental
diversity (i.e., do not just focus on one problem) seems to help.

(I do wish that more of my gifts were used. While noticing
inconsistencies or opportunities for greater coherence is
useful in working as a library page [now in two counties
#&☠%~⛈⛤🗡🔧 part-time minimum-wage jobs!], I suspect I
would be more useful in a research role. Sadly, for me and
perhaps for humanity [in that less effective use of resources
hurts humanity], I am not fit for most conventional roles. I
have been able to provide a little edutainment on the
Internet with my gifts.)

> In retrospect, 8 instructions is not enough to perform some of the
> things ESM is capable of doing.

>> I was thinking of a similar "dummy" SC to replace CLL (clear
>> lock), inspired by the dummy stores used to implement
>> memory-based conditional move (where the register CMOV/
>> SELECT instruction sets up/retains the store address or
>> retains/sets up a dummy address). The main problem would be
>> forcing a failure so that the previous stores do not become
>> visible. For a single cache block reservation this would be
>> trivial, but extending the reservation to an arbitrary set
>
> Arbitrary: yes, but a fixed (smallish) number of them.

I was thinking arbitrarily *large*. Small read and write sets
are great for providing guarantees, but I think it would be
nice to have support for relatively large transactions. With a
strict limit to only eight cache blocks participating, software
might reasonably know an address not in those eight blocks. I
would also like the terminating SC to be able to be a new block.

(One "solution" would be to use page fault/permission violation
as a terminating condition for SC/the transaction. Software
might reasonably be able to define an address that is guaranteed
to be unwritable.)

>> of addresses makes it difficult to be sure that the SC will
>> fail. On a general register machine, one might reserve one
>> register name to indicate a SC that always fails. If the ISA
>> has a zero register, using this as a base register for a SC might
>> be defined as always failing. Using an architecturally defined
>> stack pointer or thread-local-storage pointer as a base address
>> might work, but CLL does seem less crufty and potentially
>> constraining.
>
> Whatever mechanics fits with the rest of your architecture can be
> made to work adequately. I chose to do this all in the Miss Buffer.

"Work adequately" may be above average (Sturgeon's Law), but
having a coherent/elegant design is desirable.

[snip]
>> I consider LL/SC transactional memory (with the limit of
>> single load read set and single store write set with the
>> read set identical to the write set), but this is a
>> classification/nomenclature issue.
>
> {{eyes wider open than usual and staring with a gaze}} Interesting

If one defines a transaction as "a failable operation
collecting multiple operations that together would normally
not be atomic into an atomic unit, with retry as a typical
fallback", LL/SC ticks all the boxes: two operations
("multiple"), a store would not normally be atomic with a
load but is made atomic, and retry is the typical fallback.

This is an example of my clumper classification tendency.
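
As a concrete instance of that smallest transaction, the usual
compare-exchange retry loop (portable C++ here) has exactly that shape;
on LL/SC machines such as ARM, POWER, or RISC-V, compilers typically
lower it to a load-linked/store-conditional pair with a retry branch.

#include <atomic>
#include <cstdint>

std::uint64_t fetch_and_double(std::atomic<std::uint64_t>& x) {
    std::uint64_t old = x.load();                    // plays the role of LL
    while (!x.compare_exchange_weak(old, old * 2))   // plays the role of SC
        ;   // SC failed: retry (old was refreshed with the current value)
    return old;
}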

[snip]
> ASF and ESM are, in essence, a way to get out of the game of inventing
> more and more exotic synchronization instructions over time. SW can
> create whatever ATOMIC primitives it desires and wrap them up in
> subroutines, macros, or code to be inlined.

Yes, and it is very nice for that purpose. I just feel that
a broader transactional system could use much of the hardware
support for an ASF/ESM-like system and unify the interface
somewhat among "large" atomic primitives and something more
associated with transactional memory.

[snip]
> Yes, at first blush, one can run ESM under a SW lock........but why ?
> ESM is present to allow the ILLUSION of ATOMICity without ever
> locking anything!
[snip]
>>> Also note: single stepping through an ATOMIC event and achieving
>>> success cannot be allowed.
>>
>> While that does seem to be generally "asking for trouble", I
>> am not certain that single stepping is impossible — not
>> having given it that much thought and, especially, not
>> having the best grasp of synchronization issues.
>
> Consider single stepping through an ATOMIC event under actual
> contention::
>
> By the time the human gets the carrot > at the control terminal,
> million upon millions of instructions have been performed, and
> by the time the human makes his first response, millions and
> millions of more instructions have been performed.
>
> Under contention, it is doubtful that the single stepping event
> will ever succeed, so all one can single step through is the failure
> cases. So why not fail when control is transferred out of the event ?
>
>> At minimum, it seems one could provide a trace of processor
>> state for a successful atomic event, allowing something
>> similar to single stepping.
>
> The problem is NOT single stepping !!
> The problem is that you cannot make it SMELL ATOMIC !!

I think I did not communicate my proposal clearly. The
operation would be "in the past". One would be single-
stepping through a sequence that has already happened. Yes,
this means that the full debugging power of single-stepping
would not be available — one would not be able to modify
values and continue — and some confusion would be possible
(checking the value of a variable outside the transaction
could give a later version of that variable's value).

Such limited single-stepping might still be useful for
understanding what the software is doing.

>> For ESM, I think one could even
>> provide a "lock" that prevents other threads from
>> introducing conflicts while post-atomic single-stepping;
>
> You are setting yourself up for deadlock.

With software locks, single-stepping through a critical
section effectively produces "coma-lock". Single-thread
single-stepping through parallel code will not provide the
same "experience" as single-stepping through serial code.
Even with running all communicating threads in lock step,
the actual behavior would not be realistic as timing
influences behavior in a parallel system.

Re: Safepoints

<54f5dd93-7082-4735-b222-8e9404eda380n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=19758&group=comp.arch#19758

 by: MitchAlsup - Thu, 12 Aug 2021 20:57 UTC

On Thursday, August 12, 2021 at 1:26:26 PM UTC-5, Paul A. Clayton wrote:
> On Tuesday, August 10, 2021 at 2:53:56 PM UTC-4, MitchAlsup wrote:
> >On Tuesday, August 10, 2021 at 12:58:10 PM UTC-5, Paul A. Clayton wrote:
> >> On Saturday, August 7, 2021 at 5:39:18 PM UTC-4, MitchAlsup wrote:
> >>> On Saturday, August 7, 2021 at 4:10:33 PM UTC-5, Paul A. Clayton wrote:
> [snip]
> >> I am almost surprised, Mitch, that you did not use a
> >> PRED-like instruction modifier to extend the semantics of
> >> memory access instructions in its shadow.☺ In addition to
> >> allowing any load/store instruction to have the extra
> >> annotation applied independent of ordinary instruction
> >> encoding constraints, the encoding freedom might facilitate
> >> more extensive annotations, e.g., by adding an optional
> >> 32-bit or 64-bit "immediate" to the LOCK instruction
> >> modifier.)
> >
> > To your credit, you are the first person to have noted this.
> >
> > But, ESM grew out of ASF and ASF had no predicates, indeed, when
> > I did the ESM stuff (circa 2007) I did not have a PRED in my ISA.
> >
> > You have a sharp thinking cap !!
> Thank you for the compliment. (I plan to add it to my list.)
>
> As is often the case, the association was caught by
> coincidence of mental bias (reusing mechanisms and coherent
> design) and circumstance (ASF's use of x86's LOCK prefix was
> in mind, so a LOCK instruction modifier is not a large
> inventive step when applied to an ISA that uses such modifiers
> as an extension mechanism for special cases). The role of
> mental bias is one reason why mental diversity is useful in
> teams. While one cannot program seredipity, environmental
> diversity (i.e., do not just focus on one problem) seems to help.
>
> (I do wish that more of my gifts were used. While noticing
> inconsistencies or opportunities for greater coherence is
> useful in working as a library page [now in two counties
> #&☠%~⛈⛤🗡🔧 part-time minimum-wage jobs!], I suspect I
> would be more useful in a research role. Sadly, for me and
> perhaps for humanity [in that less effective use of resources
> hurts humanity], I am not fit for most conventional roles. I
> have been able to provide a little edutainment on the
> Internet with my gifts.)
> > In retrospect, 8 instructions is not enough to perform some of the
> > things ESM is capable of doing.
>
> >> I was thinking of a similar "dummy" SC to replace CLL (clear
> >> lock), inspired by the dummy stores used to implement
> >> memory-based conditional move (where the register CMOV/
> >> SELECT instruction sets up/retains the store address or
> >> retains/sets up a dummy address). The main problem would be
> >> forcing a failure so that the previous stores do not become
> >> visible. For a single cache block reservation this would be
> >> trivial, but extending the reservation to an arbitrary set
> >
> > Arbitrary: yes, but a fixed (smallish) number of them.
<
> I was thinking arbitrarily *large*. Small read and write sets
> are great for providing guarantees, but I think it would be
> nice to have support for relatively large transactions. With a
> strict limit to only eight cache blocks participating, software
> might reasonably know an address not in those eight blocks. I
> would also like the terminating SC to be able to be a new block.
<
Hardware in general does not do "large" well. In particular, the
number I chose is exactly the number of miss buffers in Opteron,
which I reused with very minor additions, so it was essentially
FREE.
>
> (One "solution" would be to use page fault/permission violation
> as a terminating condition for SC/the transaction. Software
> might reasonably be able to define an address that is guaranteed
> to be unwritable.)
<
It is the turning of these things on and then off again that creates
all the difficulty.
<
> >> of addresses makes it difficult to be sure that the SC will
> >> fail. On a general register machine, one might reserve one
> >> register name to indicate a SC that always fails. If the ISA
> >> has a zero register, using this as a base register for a SC might
> >> be defined as always failing. Using an architecturally defined
> >> stack pointer or thread-local-storage pointer as a base address
> >> might work, but CLL does seem less crufty and potentially
> >> constraining.
> >
> > Whatever mechanics fits with the rest of your architecture can be
> > made to work adequately. I chose to do this all in the Miss Buffer.
<
> "Work adequately" may be above average (Sturgeon's Law), but
> having a coherent/elegant design is desirable.
<
We are trying to go from 1 location (T&S, CAS) and 2 locations (DCAS)
to a handful or more in this step. Until we have history on using these
mechanisms we have no data on how many would be "nice".
<
In any event, having 5 cache lines participate means we can
convert 3 ATOMIC events into 1 when moving data around in
a concurrent data structure. This LOWERS the system-wide
interference, maybe enough that we don't need "large" as defined
above. Until we have data, 8 is more than enough.
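
To make the interference argument concrete, consider the degenerate case
of moving one unit between two shared counters with only single-location
atomics (a plain C++ sketch, not ESM code): the move takes two separate
atomic events, and in between a third party can observe a state where the
unit is in neither place.

#include <atomic>
#include <cstdint>
#include <cstdio>

std::atomic<std::int64_t> a{10}, b{0};

void move_one_unit() {
    a.fetch_sub(1);   // atomic event #1: the unit has left a...
    // ...a concurrent reader of (a + b) here sees 9, not 10...
    b.fetch_add(1);   // atomic event #2: ...and only now arrives at b
}

int main() {
    move_one_unit();
    // A multi-line atomic event would let both updates commit as one, so
    // no observer could ever see the intermediate state, and the move
    // would cost one system-wide interference event instead of two.
    std::printf("a=%lld b=%lld\n",
                static_cast<long long>(a.load()),
                static_cast<long long>(b.load()));
}
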
>
> [snip]
> >> I consider LL/SC transactional memory (with the limit of
> >> single load read set and single store write set with the
> >> read set identical to the write set), but this is a
> >> classification/nomenclature issue.
> >
> > {{eyes wider open than usual and staring with a gaze}} Interesting
<
> If one defines a transaction as "a failable operation
> collecting multiple operations that together would normally
> not be atomic into an atomic unit, with retry as a typical
> fallback", LL/SC ticks all the boxes: two operations
> ("multiple"), a store would not normally be atomic with a
> load but is made atomic, and retry is the typical fallback.
>
> This is an example of my clumper classification tendency.
<
Repeat after me: ESM is not TM, ESM is not TM, and never will be.
<
Transactions that are suitably small can be done inside ESM, and
transactions of 2×-4× the size ESM supports can be accelerated
using ESM. But large, general HW TM is not going to happen.
>
> [snip]
> > ASF and ESM are, in essence, a way to get out of the game of inventing
> > more and more exotic synchronization instructions over time. SW can
> > create whatever ATOMIC primitives it desires and wrap them up in
> > subroutines, macros, or code to be inlined.
<
> Yes, and it is very nice for that purpose. I just feel that
> a broader transactional system could use much of the hardware
> support for and ASF/ESM-like system and unify the interface
> somewhat among "large" atomic primitives and something more
> associated with transactional memory.
<
See history comment above:: it is not yet time for TM.
>
> [snip]
> > Yes, at first blush, one can run ESM under a SW lock........but why ?
> > ESM is present to allow the ILLUSION of ATOMICity without ever
> > locking anything!
> [snip]
> >>> Also note: single stepping through an ATOMIC event and achieving
> >>> success cannot be allowed.
> >>
> >> While that does seem to be generally "asking for trouble", I
> >> am not certain that single stepping is impossible — not
> >> having given it that much thought and, especially, not
> >> having the best grasp of synchronization issues.
> >
> > Consider single stepping through an ATOMIC event under actual
> > contention::
> >
> > By the time the human gets the carrot > at the control terminal,
> > million upon millions of instructions have been performed, and
> > by the time the human makes his first response, millions and
> > millions of more instructions have been performed.
> >
> > Under contention, it is doubtful that the single stepping event
> > will ever succeed, so all one can single step through is the failure
> > cases. So why not fail when control is transferred out of the event ?
> >
> >> At minimum, it seems one could provide a trace of processor
> >> state for a successful atomic event, allowing something
> >> similar to single stepping.
> >
> > The problem is NOT single stepping !!
> > The problem is that you cannot make it SMELL ATOMIC !!
<
> I think I did not communicate my proposal clearly. The
> operation would be "in the past". One would be single-
> stepping through a sequence that has already happened. Yes,
> this means that the full debugging power of single-stepping
> would not be available — one would not be able to modify
> values and continue — and some confusion would be possible
> (checking the value of a variable outside the transaction
> could give a later version of that variable's value).
<
Once again: the overhead is all in the turning stuff on and then back off.
>
> Such limited single-stepping might still be useful for
> understanding what the software is doing.
<
While I agree, I still don't see HOW to retain the Illusion of ATOMICity
over ALL kinds of memory references (including I/O) which may
interact with the ATOMICness of 3rd party views.
<
> >> For ESM, I think one could even
> >> provide a "lock" that prevents other threads from
> >> introducing conflicts while post-atomic single-stepping;
> >
> > You are setting yourself up for deadlock.
<
> With software locks, single-stepping through a critical
> section effectively produces "coma-lock". Single-thread
> single-stepping through parallel code will not provide the
> same "experience" as single-stepping through serial code.
<
But if you have only one thread, the ATOMIC stuff will NEVER fail.
Yet it is the failure cases that are most interesting to get right.
So, single stepping one thread buys so little it does not seem
worth it.
<
> Even with running all communicating threads in lock step,
> the actual behavior would not be realistic as timing
> influences behavior in a parallel system.

