Re: How to write a simple driver in bare metal systems: volatile, memory barrier, critical sections and so on

From: pozzu...@gmail.com (pozz)
Newsgroups: comp.arch.embedded
Subject: Re: How to write a simple driver in bare metal systems: volatile,
memory barrier, critical sections and so on
Date: Sat, 23 Oct 2021 22:49:37 +0200
Message-ID: <sl1sh2$vq6$1@dont-email.me>
References: <skvcnd$5dv$1@dont-email.me> <sl1c3a$dfr$1@dont-email.me>
In-Reply-To: <sl1c3a$dfr$1@dont-email.me>

On 23/10/2021 18:09, David Brown wrote:
> On 23/10/2021 00:07, pozz wrote:
>> Even though I have been writing software for embedded systems for more
>> than 10 years, there's a topic that from time to time makes me think
>> for hours and leaves me with many doubts.
>
> It's nice to see a thread like this here - the group needs such discussions!
>
>>
>> Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
>> The software is bare metal, without any OS. The main pattern is the well
>> known mainloop (background code) that is interrupted by ISR.
>>
>> Interrupts are used mainly for timings and for low-level driver. For
>> example, the UART reception ISR move the last received char in a FIFO
>> buffer, while the mainloop code pops new data from the FIFO.
>>
>>
>> static struct {
>>   unsigned char buf[RXBUF_SIZE];
>>   uint8_t in;
>>   uint8_t out;
>> } rxfifo;
>>
>> /* ISR */
>> void uart_rx_isr(void) {
>>   unsigned char c = UART->DATA;
>>   rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
>>   rxfifo.in++;
>
> Unless you are sure that RXBUF_SIZE is a power of two, this is going to
> be quite slow on an AVR. Modulo means division, and while division by a
> constant is usually optimised to a multiplication by the compiler, you
> still have a multiply, a shift, and some compensation for it all being
> done as signed integer arithmetic.
>
> It's also wrong, for non-power of two sizes, since the wrapping of your
> increment and your modulo RXBUF_SIZE get out of sync.

Yes, RXBUF_SIZE is a power of two.

> The usual choice is to track "head" and "tail", and use something like:
>
> void uart_rx_isr(void) {
>     unsigned char c = UART->DATA;
>     // Reset interrupt flag
>     uint8_t next = rxfifo.tail;
>     rxfifo.buf[next] = c;
>     next++;
>     if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
>     rxfifo.tail = next;
> }

That isn't the point of this thread, anyway...
You insist that tail always stays in the range [0...RXBUF_SIZE - 1]. My
approach is different.

RXBUF_SIZE is a power of two, usually <=256. in and out are uint8_t and
*can* reach the maximum value of 255, even if RXBUF_SIZE is only 128.
Everything still works.

Suppose rxfifo.in = rxfifo.out = 127, so the FIFO is empty. When a new
char is received, it is saved into rxfifo.buf[127 % 128 = 127] and
rxfifo.in is incremented to 128.
Now the mainloop detects the new char (in != out), reads it from
rxfifo.buf[127 % 128 = 127] and increments out, which becomes 128.

The next byte will be saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to
rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in will be 130.

When the mainloop tries to pop data from the FIFO, it tests

rxfifo.in (130) != rxfifo.out (128)

The test is true, so the code extracts chars from rxfifo.buf[out % 128],
that is rxfifo.buf[0]... and so on.

I hope that explanation is clear.
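The scheme is easy to check on a host machine. Here is a minimal,
hardware-free sketch of it (the fifo_push/fifo_pop names are mine, just
standing in for the ISR side and the mainloop side):

```c
#include <stdint.h>

#define RXBUF_SIZE 128  /* must be a power of two, <= 256 */

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;   /* free-running write index (ISR side) */
    uint8_t out;  /* free-running read index (mainloop side) */
} rxfifo;

/* What the ISR would do with a received char. */
static void fifo_push(unsigned char c)
{
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;   /* uint8_t wraps 255 -> 0 */
}

/* What the mainloop would do; returns -1 when the FIFO is empty. */
static int fifo_pop(void)
{
    int c = -1;
    if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    return c;
}
```

The indices stay in sync with the modulo because the uint8_t wrap
happens at 256, which is a multiple of RXBUF_SIZE; this is exactly why
the scheme breaks for non-power-of-two sizes.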

>
>>   // Reset interrupt flag
>> }
>>
>> /* Called regularly from mainloop code */
>> int uart_task(void) {
>>   int c = -1;
>>   if (rxfifo.out != rxfifo.in) {
>>     c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>     rxfifo.out++;
>>   }
>>   return c;
>> }
>
> int uart_task(void) {
>     int c = -1;
>     uint8_t next = rxfifo.head;
>     if (next != rxfifo.tail) {
>         c = rxfifo.buf[next];
>         next++;
>         if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
>         rxfifo.head = next;
>     }
>     return c;
> }
>
> These don't track buffer overflow at all - you need to call uart_task()
> often enough to avoid that.

Sure, with a good choice of RXBUF_SIZE, a buffer overflow should never
happen. Anyway, if it does happen, the higher-level layers (the
protocol) should detect a corrupted packet.
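As a side note, the free-running indices make overflow detection cheap
if it's ever wanted: the fill level is just the index difference. A
sketch (fifo_count is a hypothetical helper, not from the thread):

```c
#include <stdint.h>

/* Fill level of the FIFO.  With free-running uint8_t indices the
 * unsigned difference is correct even across the 255 -> 0 wrap,
 * as long as RXBUF_SIZE <= 256: modular arithmetic does the work. */
static uint8_t fifo_count(uint8_t in, uint8_t out)
{
    return (uint8_t)(in - out);
}
```

The ISR could compare this against RXBUF_SIZE before storing and, for
example, bump an overflow counter instead of overwriting unread data.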

> (I'm skipping volatiles so we don't get ahead of your point.)
>
>>
>>
>> From a 20-year-old article[1] by Nigel Jones, this seems a situation
>> where volatile must be used for rxfifo.in, because it is modified by an
>> ISR and used in the mainloop code.
>>
>
> Certainly whenever data is shared between ISR's and mainloop code, or
> different threads, then you need to think about how to make sure data is
> synchronised and exchanged. "volatile" is one method, atomics are
> another, and memory barriers can be used.
>
>> I don't think so, rxfifo.in is read from memory only one time in
>> uart_task(), so there isn't the risk that compiler can optimize badly.
>
> That is incorrect in two ways. One - barring compiler bugs (which do
> occur, but they are very rare compared to user bugs), there is no such
> thing as "optimising badly". If optimising changes the behaviour of the
> code, other than its size and speed, the code is wrong.

Yes of course, but I don't think the absence of volatile for rxfifo.in,
even though it can change in the ISR, could be a *real* problem with
*modern and current* compilers.

The volatile attribute is needed to prevent compiler optimizations
(which would be a bad thing, given the volatile nature of the
variable), but in that code it's difficult to think of an optimization,
caused by the absence of volatile, that changes the behaviour
erroneously... except reordering.

> Two - it is a
> very bad idea to imagine that having code inside a function somehow
> "protects" it from re-ordering or other optimisation.

I didn't say that; on the contrary, I was thinking exactly of
reordering issues.

> Functions can be inlined, outlined, cloned, and shuffled about.
> Link-time optimisation, code in headers, C++ modules, and other
> cross-unit optimisations are becoming more and more common. So while it
> might be true /today/ that the compiler has no alternative but to read
> rxfifo.in once per call to uart_task(), you cannot assume that will be
> the case with later compilers or with more advanced optimisation
> techniques enabled. It is safer, more portable, and more future-proof
> to avoid such assumptions.

Ok, you are talking about future scenarios. I don't think this is
actually a real problem today. Anyway, your observation makes sense.

>
>> Even if ISR is fired immediately after the if statement, this doesn't
>> bring to a dangerous state: the just received data will be processed at
>> the next call to uart_task().
>>
>> So IMHO volatile isn't necessary here. And critical sections (i.e.
>> disabling interrupts) aren't useful too.
>>
>> However I'm thinking about memory barrier. Suppose the compiler reorder
>> the instructions in uart_task() as follows:
>>
>>
>>   c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>   if (rxfifo.out != rxfifo.in) {
>>     rxfifo.out++;
>>     return c;
>>   } else {
>>     return -1;
>>   }
>>
>>
>> Here there's a big problem, because compiler decided to firstly read
>> rxfifo.buf[] and then test in and out equality. If the ISR is fired
>> immediately after moving data to c (most probably an internal register),
>> the condition in the if statement will be true and the register value is
>> returned. However the register value isn't correct.
>
> You are absolutely correct.
>
>>
>> I don't think any modern C compiler reorder uart_task() in this way, but
>> we can't be sure. The result shouldn't change for the compiler, so it
>> can do this kind of things.
>
> It is not an unreasonable re-arrangement. On processors with
> out-of-order execution (which does not apply to the AVR or Cortex-M),
> compilers will often push loads as early as they can in the instruction
> stream so that they start the cache loading process as quickly as
> possible. (But note that on such "big" processors, much of this
> discussion on volatile and memory barriers is not sufficient, especially
> if there is more than one core. You need atomics and fences, but that's
> a story for another day.)
>
>>
>> How to fix this issue if I want to be extremely sure the compiler will
>> not reorder this way? Applying volatile to rxfifo.in shouldn't help for
>> this, because compiler is allowed to reorder access of non volatile
>> variables yet[2].
>>
>
> The important thing about "volatile" is that it is /accesses/ that are
> volatile, not objects. A volatile object is nothing more than an object
> for which all accesses are volatile by default. But you can use
> volatile accesses on non-volatile objects. This macro is your friend:
>
> #define volatileAccess(v) *((volatile typeof((v)) *) &(v))
>
> (Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
> - if you are using other compilers then you can make an uglier
> equivalent using _Generic. However, if you are using a C compiler that
> supports C11, it is probably gcc or clang, and you can use the "typeof"
> extension.)
>
> That macro will let you make a volatile read or write to an object
> without requiring that /all/ accesses to it are volatile.

This is a good point. The code in the ISR can't be interrupted, so
there's no need for volatile accesses inside the ISR.

>> One solution is adding a memory barrier in this way:
>>
>>
>> int uart_task(void) {
>>   int c = -1;
>>   if (rxfifo.out != rxfifo.in) {
>>     memory_barrier();
>>     c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>     rxfifo.out++;
>>   }
>>   return c;
>> }
>>
>
> Note that you are forcing the compiler to read "out" twice here, as it
> can't keep the value of "out" in a register across the memory barrier.

Yes, you're right. A small penalty to avoid the problem of reordering.
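For concreteness, the thread never shows how memory_barrier() is
defined; a common definition for GCC-family compilers (an assumption on
my part) is a pure compiler barrier:

```c
#include <stdint.h>

/* Compiler barrier for GCC/Clang: emits no instruction, but the
 * "memory" clobber forbids the compiler from moving memory accesses
 * across it.  Enough on a single-core MCU; multi-core systems need
 * real hardware fences on top of this. */
#define memory_barrier() __asm__ __volatile__("" ::: "memory")

#define RXBUF_SIZE 128

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;
    uint8_t out;
} rxfifo;

int uart_task(void)
{
    int c = -1;
    if (rxfifo.out != rxfifo.in) {
        memory_barrier();  /* buf[] may not be read before the test */
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    return c;
}
```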

> (And as I mentioned before, the compiler might be able to do larger
> scale optimisation across compilation units or functions, and in that
> way keep values across multiple calls to uart_task.)
>
>>
>> However this approach appears dangerous to me. You have to check and
>> double-check if, when and where memory barriers are necessary, and
>> it's easy to skip a barrier where it's needed and add a barrier where
>> it isn't needed.
>
> Memory barriers are certainly useful, but they are a shotgun approach -
> they affect /everything/ involving reads and writes to memory. (But
> remember they don't affect ordering of calculations.)
>
>>
>> So I'm thinking that a sub-optimal (regarding efficiency) but
>> reliable (regarding the risk of skipping a barrier where it is
>> needed) approach could be to enter a critical section (disabling
>> interrupts) anyway, even where it isn't strictly needed.
>>
>>
>> int uart_task(void) {
>>   ENTER_CRITICAL_SECTION();
>>   int c = -1;
>>   if (rxfifo.out != rxfifo.in) {
>>     c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>     rxfifo.out++;
>>   }
>>   EXIT_CRITICAL_SECTION();
>>   return c;
>> }
>
> Critical sections for something like this are /way/ overkill. And a
> critical section with a division in the middle? Not a good idea.
>
>>
>>
>> Another solution could be to apply volatile keyword to rxfifo.in *AND*
>> rxfifo.buf too, so compiler can't change the order of accesses them.
>>
>> Do you have other suggestions?
>>
>
> Marking "in" and "buf" as volatile is /far/ better than using a critical
> section, and likely to be more efficient than a memory barrier. You can
> also use volatileAccess rather than making buf volatile, and it is often
> slightly more efficient to cache volatile variables in a local variable
> while working with them.

Yes, I think so too. Lately I have read many experts saying volatile is
often a bad thing, so I'm re-thinking its use compared with other
approaches.
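For reference, the volatile-member variant suggested above might look
like this (my sketch, following the thread's member names; volatile
accesses may not be reordered relative to each other, so the buf[] read
cannot be hoisted above the in/out comparison):

```c
#include <stdint.h>

#define RXBUF_SIZE 128

static struct {
    volatile unsigned char buf[RXBUF_SIZE];  /* volatile: ordered reads */
    volatile uint8_t in;                     /* written by the ISR */
    uint8_t out;                             /* mainloop-private */
} rxfifo;

int uart_task(void)
{
    int c = -1;
    uint8_t out = rxfifo.out;  /* cache in a local while working */
    if (out != rxfifo.in) {    /* volatile read #1 */
        c = rxfifo.buf[out % RXBUF_SIZE];  /* volatile read #2, after #1 */
        rxfifo.out = out + 1;
    }
    return c;
}
```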

>> [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
>> [2] https://blog.regehr.org/archives/28
>
