Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

Just go with the flow control, roll with the crunches, and, when you get a prompt, type like hell.


computers / comp.arch.embedded / Re: How to write a simple driver in bare metal systems: volatile, memory barrier, critical sections and so on

Re: How to write a simple driver in bare metal systems: volatile, memory barrier, critical sections and so on

<sl1c3a$dfr$1@dont-email.me>

  copy mid

https://www.novabbs.com/computers/article-flat.php?id=646&group=comp.arch.embedded#646

  copy link   Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: david.br...@hesbynett.no (David Brown)
Newsgroups: comp.arch.embedded
Subject: Re: How to write a simple driver in bare metal systems: volatile,
memory barrier, critical sections and so on
Date: Sat, 23 Oct 2021 18:09:13 +0200
Organization: A noiseless patient Spider
Lines: 247
Message-ID: <sl1c3a$dfr$1@dont-email.me>
References: <skvcnd$5dv$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 23 Oct 2021 16:09:14 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="c15e4a1fc7736cdb6f8490df24172930";
logging-data="13819"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/EF2tzZd1J7RLgw4lTvPUuJ2YVfqJqv7Q="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
Cancel-Lock: sha1:NwEaoccV29CUKtmgwT6L8/cKXhw=
In-Reply-To: <skvcnd$5dv$1@dont-email.me>
Content-Language: en-GB
 by: David Brown - Sat, 23 Oct 2021 16:09 UTC

On 23/10/2021 00:07, pozz wrote:
> Even I write software for embedded systems for more than 10 years,
> there's an argument that from time to time let me think for hours and
> leave me with many doubts.

It's nice to see a thread like this here - the group needs such discussions!

>
> Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
> The software is bare metal, without any OS. The main pattern is the well
> known mainloop (background code) that is interrupted by ISR.
>
> Interrupts are used mainly for timings and for low-level driver. For
> example, the UART reception ISR move the last received char in a FIFO
> buffer, while the mainloop code pops new data from the FIFO.
>
>
> static struct {
>   unsigned char buf[RXBUF_SIZE];
>   uint8_t in;
>   uint8_t out;
> } rxfifo;
>
> /* ISR */
> void uart_rx_isr(void) {
>   unsigned char c = UART->DATA;
>   rxfifo.buf[in % RXBUF_SIZE] = c;
>   rxfifo.in++;

Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a
constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.

It's also wrong, for non-power of two sizes, since the wrapping of your
increment and your modulo RXBUF_SIZE get out of sync.

The usual choice is to track "head" and "tail", and use something like:

void uart_rx_isr(void) {
unsigned char c = UART->DATA;
// Reset interrupt flag
uint8_t next = rxfifo.tail;
rxfifo.buf[next] = c;
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.tail = next;
}

>   // Reset interrupt flag
> }
>
> /* Called regularly from mainloop code */
> int uart_task(void) {
>   int c = -1;
>   if (out != in) {
>     c = rxfifo.buf[out % RXBUF_SIZE];
>     out++;
>   }
>   return -1;
> }

int uart_task(void) {
int c = -1;
uint8_t next = rxfifo.head;
if (next != rxfifo.tail) {
c = rxfifo.buf[next];
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.head = next;
}
return c;
}

These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.

(I'm skipping volatiles so we don't get ahead of your point.)

>
>
> From a 20-years old article[1] by Nigle Jones, this seems a situation
> where volatile must be used for rxfifo.in, because is modified by an ISR
> and used in the mainloop code.
>

Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is
synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.

> I don't think so, rxfifo.in is read from memory only one time in
> uart_task(), so there isn't the risk that compiler can optimize badly.

That is incorrect in two ways. One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the
code, other than its size and speed, the code is wrong. Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.

Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it
might be true /today/ that the compiler has no alternative but to read
rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.

> Even if ISR is fired immediately after the if statement, this doesn't
> bring to a dangerous state: the just received data will be processed at
> the next call to uart_task().
>
> So IMHO volatile isn't necessary here. And critical sections (i.e.
> disabling interrupts) aren't useful too.
>
> However I'm thinking about memory barrier. Suppose the compiler reorder
> the instructions in uart_task() as follows:
>
>
>   c = rxfifo.buf[out % RXBUF_SIZE]
>   if (out != in) {
>     out++;
>     return c;
>   } else {
>     return -1;
>   }
>
>
> Here there's a big problem, because compiler decided to firstly read
> rxfifo.buf[] and then test in and out equality. If the ISR is fired
> immediately after moving data to c (most probably an internal register),
> the condition in the if statement will be true and the register value is
> returned. However the register value isn't correct.

You are absolutely correct.

>
> I don't think any modern C compiler reorder uart_task() in this way, but
> we can't be sure. The result shouldn't change for the compiler, so it
> can do this kind of things.

It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)

>
> How to fix this issue if I want to be extremely sure the compiler will
> not reorder this way? Applying volatile to rxfifo.in shouldn't help for
> this, because compiler is allowed to reorder access of non volatile
> variables yet[2].
>

The important thing about "volatile" is that it is /accesses/ that are
volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that
supports C11, it is probably gcc or clang, and you can use the "typeof"
extension.)

That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.

> One solution is adding a memory barrier in this way:
>
>
> int uart_task(void) {
>   int c = -1;
>   if (out != in) {
>     memory_barrier();
>     c = rxfifo.buf[out % RXBUF_SIZE];
>     out++;
>   }
>   return -1;
> }
>

Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.
(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)

>
> However this approach appears to me dangerous. You have to check and
> double check if, when and where memory barriers are necessary and it's
> simple to skip a barrier where it's nedded and add a barrier where it
> isn't needed.

Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)

>
> So I'm thinking that a sub-optimal (regarding efficiency) but reliable
> (regarding the risk to skip a barrier where it is needed) could be to
> enter a critical section (disabling interrupts) anyway, if it isn't
> strictly needed.
>
>
> int uart_task(void) {
>   ENTER_CRITICAL_SECTION();
>   int c = -1;
>   if (out != in) {
>     c = rxfifo.buf[out % RXBUF_SIZE];
>     out++;
>   }
>   EXIT_CRITICAL_SECTION();
>   return -1;
> }

Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.

>
>
> Another solution could be to apply volatile keyword to rxfifo.in *AND*
> rxfifo.buf too, so compiler can't change the order of accesses them.
>
> Do you have other suggestions?
>

Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.

>
>
> [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
> [2] https://blog.regehr.org/archives/28

SubjectRepliesAuthor
o How to write a simple driver in bare metal systems: volatile, memory

By: pozz on Fri, 22 Oct 2021

58pozz
server_pubkey.txt

rocksolid light 0.9.81
clearnet tor