

devel / comp.arch.embedded / Re: Stack analysis tool that really work?

Subject -- Author
* Stack analysis tool that really work? -- pozz
+- Re: Stack analysis tool that really work? -- Niklas Holsti
`* Re: Stack analysis tool that really work? -- Don Y
 +* Re: Stack analysis tool that really work? -- Niklas Holsti
 |`* Re: Stack analysis tool that really work? -- Don Y
 | `* Re: Stack analysis tool that really work? -- Niklas Holsti
 |  `- Re: Stack analysis tool that really work? -- Don Y
 `* Re: Stack analysis tool that really work? -- pozz
  +* Re: Stack analysis tool that really work? -- Don Y
  |`* Re: Stack analysis tool that really work? -- pozz
  | `* Re: Stack analysis tool that really work? -- Don Y
  |  `* Re: Stack analysis tool that really work? -- Niklas Holsti
  |   `- Re: Stack analysis tool that really work? -- Don Y
  `* Re: Stack analysis tool that really work? -- Niklas Holsti
   `* Re: Stack analysis tool that really work? -- Don Y
    +* Re: Stack analysis tool that really work? -- Niklas Holsti
    |`* Re: Stack analysis tool that really work? -- Don Y
    | `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |  `* Re: Stack analysis tool that really work? -- Don Y
    |   `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    +* Re: Stack analysis tool that really work? -- Don Y
    |    |`* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    | `* Re: Stack analysis tool that really work? -- Don Y
    |    |  +* Re: Stack analysis tool that really work? -- Don Y
    |    |  |`* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    |  | `* Re: Stack analysis tool that really work? -- Don Y
    |    |  |  `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    |  |   `- Re: Stack analysis tool that really work? -- Don Y
    |    |  `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    |   `* Re: Stack analysis tool that really work? -- Don Y
    |    |    `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |    |     +- Re: Stack analysis tool that really work? -- Niklas Holsti
    |    |     `* Re: Stack analysis tool that really work? -- Don Y
    |    |      `- Re: Stack analysis tool that really work? -- Niklas Holsti
    |    `* Re: Stack analysis tool that really work? -- George Neuner
    |     `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |      `* Re: Stack analysis tool that really work? -- George Neuner
    |       `* Re: Stack analysis tool that really work? -- Niklas Holsti
    |        +* Re: Stack analysis tool that really work? -- Paul Rubin
    |        |`- Re: Stack analysis tool that really work? -- Niklas Holsti
    |        `- Re: Stack analysis tool that really work? -- Don Y
    `* Re: Stack analysis tool that really work? -- Niklas Holsti
     `- Re: Stack analysis tool that really work? -- Don Y

Re: Stack analysis tool that really work?

<sekfad$uvl$2@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=591&group=comp.arch.embedded#591
 by: Don Y - Fri, 6 Aug 2021 23:06 UTC

On 8/6/2021 4:04 PM, Don Y wrote:
>> That said, for some processors it is easy to recognize at decode-time most of
>> the instructions that access the stack, and some versions of Bound-T let one
>> specify different access times for stack accesses and for general
>> (unclassified) accesses. That can be useful if the stack is located in fast
>> memory, but other data are in slower memory.
>
> I'm thinking, specifically, about I/Os -- which are increasingly memory
> mapped (including register spaces).

Sorry, I should be more clear. I'm looking at the issues that would affect
*my* needs in *my* environment -- realizing these may be different than
those you've previously targeted. (e.g., VMM, FPU emulation, etc.)

Re: Stack analysis tool that really work?

<egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=592&group=comp.arch.embedded#592
 by: George Neuner - Sat, 7 Aug 2021 02:55 UTC

On Thu, 5 Aug 2021 18:39:51 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

> :
>At present the tool assumes that every instruction and control transfer
>can be assigned its own WCET, in machine cycles, independent of other
>context or data values. The only dynamic aspects of instruction
>execution time that are modelled are the possible pipeline stalls due to
>read-after-write interlocks. As a special case, in the version for
>SPARC, the concurrent execution of the Integer Unit and the Floating
>Point Unit is modelled to estimate where and for how long the IU has to
>wait for the FPU to complete an operation.
>
>Good algorithms for computing WCET bounds for most kinds of instruction
>caches are known, but I did not get around to implementing any of those,
>so only cache-less processors are supported now. If you need WCET
>analysis of caches, the AbsInt tool is best.
>
>Data caches are still a hard problem, where the WCET analyses tend to be
>quite pessimistic. Moreover, the increasing use of eviction algorithms
>other than LRU (for example, pseudo-random eviction) lessens the
>precision of the cache analyses, even for instruction caches.
> :

How do you handle branch/jump mispredicts?

More importantly, how do you handle chains of conditional branches
and/or switch/case constructs which can mispredict at every decision
point? The branch targets may not be in cache - neither mispredicted
targets nor the actual one. Worst case for a horribly mispredicted
switch/case can be absolutely dreadful.

Similar problem for table-driven code: unless the jump is always made
through the /same/ table element, to a first approximation the jump
will mispredict /every/ time.

Late-binding OO code is a problem too, though not to the same degree: a
lot of code really is monomorphic, with a given call usually or always
targeting the same class and method. But real polymorphic code (like
table-driven code) suffers greatly when dealing with heterogeneous objects.

George

Re: Stack analysis tool that really work?

<in768jFeenmU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=593&group=comp.arch.embedded#593
 by: Niklas Holsti - Sat, 7 Aug 2021 10:40 UTC

On 2021-08-07 5:55, George Neuner wrote:
> On Thu, 5 Aug 2021 18:39:51 +0300, Niklas Holsti
> <niklas.holsti@tidorum.invalid> wrote:
>
>> :
>> At present the tool assumes that every instruction and control transfer
>> can be assigned its own WCET, in machine cycles, independent of other
>> context or data values. The only dynamic aspects of instruction
>> execution time that are modelled are the possible pipeline stalls due to
>> read-after-write interlocks.
...

> How do you handle branch/jump mispredicts?

See above re the assumptions of the tool.

The WCET analysis function of the Bound-T tool is (at present) suited
only for simple, deterministic processors. The tool development started
in the late 1990s when the company I then worked for developed
applications mainly for such processors.

If the processor uses a static branch prediction scheme (for example,
predict forward branches as not taken and backward branches as taken),
the mispredict penalty can be included in the WCET assigned to the
non-predicted edge in the control-flow graph at instruction decode-time.
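
(As a concrete illustration of attaching the penalty to the non-predicted
edge at decode time, here is a small C sketch. The cycle counts, the names,
and the "backward taken, forward not taken" rule are assumptions made up
for the example; this is not Bound-T's actual code.)

/* Sketch only: attaching static-prediction mispredict penalties to
 * control-flow-graph edges at decode time. Cycle counts, names and the
 * prediction rule are assumed for the example.
 */
#include <stdbool.h>
#include <stdio.h>

enum { TAKEN_BASE_CYCLES = 2, MISPREDICT_PENALTY = 3 };

struct cfg_edge {
    unsigned wcet_cycles;   /* cost charged when execution follows this edge */
};

/* Conditional branch at address 'pc' with target 'target': fill in the
 * WCETs of its two out-edges under the static prediction rule.
 */
static void decode_cond_branch(unsigned pc, unsigned target,
                               struct cfg_edge *taken_edge,
                               struct cfg_edge *fallthrough_edge)
{
    bool predicted_taken = (target <= pc);              /* backward branch */

    taken_edge->wcet_cycles =
        TAKEN_BASE_CYCLES + (predicted_taken ? 0 : MISPREDICT_PENALTY);
    fallthrough_edge->wcet_cycles =
        predicted_taken ? MISPREDICT_PENALTY : 0;
}

int main(void)
{
    struct cfg_edge taken, fallthrough;
    decode_cond_branch(0x2010, 0x2000, &taken, &fallthrough); /* loop-closing branch */
    printf("taken edge: %u cycles, fall-through edge: %u cycles\n",
           taken.wcet_cycles, fallthrough.wcet_cycles);
    return 0;
}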

For dynamic (history-dependent) branch predictors, various models for
static analysis have been proposed and published, but they have the same
problems as cache-analysis models: the HW microarchitectures are
becoming more and more complex and varied, making the modelling effort
more expensive, both in implementation effort and in analysis time, and
reducing the customer base for each specific model.

The WCET-analysis tools from AbsInt include detailed signal-by-signal
simulations of the microarchitectures. Unfortunately such details are
seldom documented well, and I believe AbsInt has had to enter special
agreements with processor suppliers for access to such information. And
the details can change with each processor revision...

Speculative executions such as branch prediction can create so-called
"timing anomalies" which make the WCET analysis much harder. A timing
anomaly is any property of a processor that makes it invalid to estimate
the WCET of a sequence of instructions by assuming that the first
instruction encounters its worst case (for example, a cache miss instead
of a cache hit) and then continuing the analysis of the rest of the
sequence while propagating that assumption forward as an assumption on
the state of the processor.

The canonical example of a timing anomaly is a processor where a cache
hit at one instruction changes the processor state in such a way that a
later instruction takes much longer than it would have taken, had the
first instruction encountered a cache miss instead of a hit. Propagating
the "first instruction missed" assumption to the context of the later
instruction may make the analysis under-estimate the execution time of
that later instruction. To handle timing anomalies, the analysis must
consider all possible combinations: first instruction hits or misses,
second instruction hits or misses, etc. Exponential complexity.
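
(A toy C sketch of why this explodes: to stay safe the analysis must explore
both outcomes at every instruction, because an early hit can set up a larger
penalty later. The four-instruction sequence, the costs, and the single
state bit are all invented for illustration.)

/* Toy illustration (not from any real tool): exhaustively exploring the
 * hit/miss outcome of each instruction in a short sequence, because a
 * hit at one instruction may change processor state and slow a later
 * one. 2^N cases for N instructions; all numbers are invented.
 */
#include <stdio.h>

#define N 4   /* instructions in the sequence */

/* Hypothetical cost model: a hit costs 1 cycle, a miss 10, and if
 * instruction 1 hit (state != 0) then instruction 3 pays 20 extra.
 */
static unsigned cost(int instr, int hit, int state)
{
    unsigned c = hit ? 1 : 10;
    if (instr == 3 && state)
        c += 20;
    return c;
}

/* Worst-case time of instructions instr..N-1, given the current state. */
static unsigned worst(int instr, int state)
{
    if (instr == N)
        return 0;
    unsigned if_miss = cost(instr, 0, state) + worst(instr + 1, state);
    unsigned if_hit  = cost(instr, 1, state) + worst(instr + 1, state || instr == 1);
    return if_hit > if_miss ? if_hit : if_miss;
}

int main(void)
{
    printf("WCET bound over all hit/miss combinations: %u cycles\n", worst(0, 0));
    return 0;
}

Here the all-miss path totals 40 cycles, but the true worst case is 51
cycles and goes through a *hit* at instruction 1 -- exactly the kind of
anomaly described above.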

It amuses me that such processor properties, which are/were the bane of
static WCET analysis, are now in the limelight as the origin of the side
channels for various malware, such as Spectre exploits.

> More importantly, how do you handle chains of conditional branches
> and/or switch/case constructs which can mispredict at every decision
> point? The branch targets may not be in cache - neither mispredicted
> targets nor the actual one. Worst case for a horribly mispredicted
> switch/case can be absolutely dreadful.

I don't know if those questions are addressed to me (re the Bound-T
tool) or to WCET analysis in general. The advanced WCET tools, like the
ones from AbsInt, do try to cover such cases by their cache analysis to
provide a safe upper bound on the WCET, but the degree of pessimism
(over-estimation) increases with the complexity of the code.

That said, in many programs most of the processing time is taken by
relatively simple loops, for which the present I-cache analyses work
quite well, and for which even D-cache analysis can (I believe) work
satisfactorily if the memory addressing patterns are regular.

Re: Stack analysis tool that really work?

<in77v3FeoomU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=594&group=comp.arch.embedded#594
 by: Niklas Holsti - Sat, 7 Aug 2021 11:09 UTC

On 2021-08-07 2:04, Don Y wrote:
> On 8/6/2021 12:58 PM, Niklas Holsti wrote:

>>> OK.  I'll have to build a more current version of GNAT.
>>
>> For playing around, I would just use the GNAT Community Edition. Or
>> the FSF GNAT that comes with MinGW32.
>
> Ugh!  No, I'll just build a fresh copy.  I do all my development
> under NetBSD so have everything I want/need ('cept the Ada tools)
> there, already.

You are braver than I am. Note that the front-end of GNAT is implemented
in Ada, so it has to be bootstrapped with an Ada compiler. And the
current version of GNAT requires quite an up-to-date Ada compiler for
the bootstrap. In practice, I think people have found that only GNAT can
build GNAT.

Many Linux distributions come with GNAT pre-built. Debian-derived
distributions have good support, I hear.

>> The analysis works on the control-flow graph (per subprogram). The
>> WCET for each basic block is the sum of the WCETs of the instructions
>> in that block. The WCET for an execution path through the graph (from
>> entry to return) is the sum of the WCETs of the basic blocks in the
>> path, plus the WCETs assigned to each edge (control transfer) between
>> basic blocks in the path.
>
> So, the costs of conditional control transfers are handled separately (?)

I'm not sure I understand your question. Each edge in the control-flow
graph, that is, each transition from one instruction or one basic block
to a possible next instruction or block, is assigned a WCET value when
the instructions are decoded and entered in the control-flow graph.
Usually this WCET is zero for a fall-through edge from a non-branch
instruction to the next, and is non-zero only for edges from branch
instructions, whether conditional or unconditional. For a conditional
branch, the "not taken" edge usually has a zero WCET and the "taken"
edge has some non-zero WCET, as specified in the processor documentation
(remember we are assuming a simple processor).
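
(So, in C-like terms, the cost of a path is just a sum over the blocks and
the edges taken; a minimal sketch with invented numbers, not the tool's own
data structures:)

/* Minimal sketch of the path-WCET sum (invented structures and numbers):
 * block WCETs plus the WCETs of the edges taken between them.
 */
#include <stdio.h>

struct step {
    unsigned block_wcet;   /* WCET of the basic block (sum of its instructions) */
    unsigned edge_wcet;    /* WCET of the edge taken to the next block           */
};                         /* (0 for fall-through edges and on the last step)    */

static unsigned path_wcet(const struct step *path, unsigned len)
{
    unsigned total = 0;
    for (unsigned i = 0; i < len; i++)
        total += path[i].block_wcet + path[i].edge_wcet;
    return total;
}

int main(void)
{
    /* entry block, then the "taken" side of a conditional branch, then return */
    const struct step path[] = {
        { 5, 2 },   /* 5-cycle block, taken branch edge costs 2 extra cycles */
        { 7, 0 },   /* fall-through to the return block                      */
        { 3, 0 },
    };
    printf("WCET of this path: %u cycles\n", path_wcet(path, 3));   /* 17 */
    return 0;
}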

>>> I assume this is semi table-driven (?)
>>
>> I don't understand that question. Please clarify.
>
> A large number of opcodes, each with particular costs.
> Do you build a jungle of conditionals to subdivide the
> "opcode space" into groups of similar-cost operations?
> Or, do you just have a table of:
>
> {opcode, mask, type[1], cost}
>
> [1] used to apply some further heuristics to your
> refinement of "cost"

I would say that the approach to translating instructions into their
analysis models (elements in the control-flow graph) is usually not
table-driven, but emulates the field-by-field decoding and case analysis
that would be done by a disassembler or emulator (and Bound-T can
produce a disassembly of the code). The internal model includes not only
the cost of the instruction, but also its effect on the computation. For
example, an instruction like "increment the accumulator", INC A, would
be decoded into a model that says

WCET: 1 cycle.
Control flow: to next instruction, unconditional.
Effect: A := A + 1.

and also the effect, if any, on condition flags.
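
(For comparison with the {opcode, mask, type, cost} table idea above, a
decoded model of that kind could be held in something like the following C
structure. Purely illustrative: Bound-T itself is written in Ada, its real
model is richer, and all names here are invented.)

/* Illustrative only: a C rendering of the kind of per-instruction model
 * described above (WCET, control flow, effect). All names are invented.
 */
#include <stdio.h>

enum flow_kind { FLOW_FALLTHROUGH, FLOW_JUMP, FLOW_COND_BRANCH, FLOW_CALL, FLOW_RETURN };

struct effect {                  /* e.g. A := A + 1 */
    int      target_reg;         /* register being assigned             */
    int      source_reg;         /* register read (-1 if none)          */
    long     addend;             /* constant added                      */
    unsigned flags_written;      /* bit mask of condition flags defined */
};

struct instr_model {
    unsigned       wcet_cycles;  /* e.g. 1 for INC A           */
    enum flow_kind flow;         /* where control can go next  */
    unsigned       target;       /* branch/call target, if any */
    struct effect  effect;       /* effect on the computation  */
};

/* "INC A" on a hypothetical accumulator machine: */
static const struct instr_model inc_a = {
    .wcet_cycles = 1,
    .flow        = FLOW_FALLTHROUGH,
    .effect      = { .target_reg = 0 /* A */, .source_reg = 0 /* A */,
                     .addend = 1, .flags_written = 0x3 /* say, Z and N */ },
};

int main(void)
{
    printf("INC A: %u cycle(s), adds %ld to register %d\n",
           inc_a.wcet_cycles, inc_a.effect.addend, inc_a.effect.target_reg);
    return 0;
}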

>>> -- as a preface to inquiring as to
>>> how you develop (and verify!) a test suite?
>>
>> In the same way as for any complex program. Final validation by
>> running and measuring test code on a real processor or a
>> cycle-accurate simulation.
>
> Ah, OK.  So, you could "can" that particular exemplar and use it
> to test a modification to the code (without having to run the app
> on "real hardware", again)

Yep.

> I don't rely on "live" tests for my code. Rather, I use tools to
> generate good test coverage and then verify the results are what I
> expect. I find this easier and more easily extensible (I can test
> ARM code by running an x86 port of that code)

At my former job, implementing SW for ESA satellite applications, we
almost always tested the code first on normal PCs, in a simulated I/O
environment, and then on the embedded target. Easier for Ada than for C.

>> For processors where cache misses are much slower than cache hits
>> (which is fast coming to mean almost all processors) IMO an I-cache
>> analysis is necessary for static WCET analysis to be useful.
>
> I look at it the other way around. Assume there is NO cache. You
> know that your code WILL run in less time than this case. Regardless
> of the frequency or presence of competing events.
>
> Processors are fast enough, now, that you can usually afford to "step up"
> in capability for comparatively little cost.

Sometimes, but not always. The ESA applications we made were usually
running (worst case) at about 70% processor load, and faster processors
for space applications were very expensive, not only in euros but also
in power, volume and mass, which are scarce resources in space.

Re: Stack analysis tool that really work?

<in78kdFeu52U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=595&group=comp.arch.embedded#595
 by: Niklas Holsti - Sat, 7 Aug 2021 11:20 UTC

On 2021-08-07 2:06, Don Y wrote:
> On 8/6/2021 4:04 PM, Don Y wrote:
>>> That said, for some processors it is easy to recognize at decode-time
>>> most of the instructions that access the stack, and some versions of
>>> Bound-T let one specify different access times for stack accesses and
>>> for general (unclassified) accesses. That can be useful if the stack
>>> is located in fast memory, but other data are in slower memory.
>>
>> I'm thinking, specifically, about I/Os -- which are increasingly memory
>> mapped (including register spaces).
>
> Sorry, I should be more clear.  I'm looking at the issues that would affect
> *my* needs in *my* environment -- realizing these may be different than
> those you've previously targeted.  (e.g., VMM, FPU emulation, etc.)

Ok. If you describe your needs, perhaps I can comment.

In my experience, access to memory-mapped registers tends to be not much
slower than access to memory. Also, quite often the addresses in MMIO
accesses are static or easy to derive in static analysis, so the WCET
tool could choose the proper access time, if the tool knows enough about
the system architecture.

If by "FPU emulation" you mean SW-implemented FP instructions, of course
we have encountered those. They are often hand-coded assembler with very
complex and tricky control flow, which makes it hard to find loop bounds
automatically, so manual annotations must be used instead.

What is VMM?

Re: Stack analysis tool that really work?

<sem7o3$v1d$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=596&group=comp.arch.embedded#596
 by: Don Y - Sat, 7 Aug 2021 15:09 UTC

On 8/7/2021 4:09 AM, Niklas Holsti wrote:
> On 2021-08-07 2:04, Don Y wrote:
>> On 8/6/2021 12:58 PM, Niklas Holsti wrote:
>
>>>> OK. I'll have to build a more current version of GNAT.
>>>
>>> For playing around, I would just use the GNAT Community Edition. Or the FSF
>>> GNAT that comes with MinGW32.
>>
>> Ugh! No, I'll just build a fresh copy. I do all my development
>> under NetBSD so have everything I want/need ('cept the Ada tools)
>> there, already.
>
> You are braver than I am. Note that the front-end of GNAT is implemented in
> Ada, so it has to be bootstrapped with an Ada compiler. And the current version
> of GNAT requires quite an up-to-date Ada compiler for the bootstrap. In
> practice, I think people have found that only GNAT can build GNAT.
>
> Many Linux distributions come with GNAT pre-built. Debian-derived distributions
> have good support, I hear.

I don't run Linux. I already support Solaris/SPARC, Windows and NetBSD hosts.
Adding yet another just makes things harder.

I build all of the "programs" ("packages") on my NetBSD boxes. In the past,
the userland was built by folks of dubious abilities. They'd just try to
"get it to compile" (not even "compile without warnings"). So, there would
often be bugs and failures in the ported code.

[I recall a prebuilt gnuplot that didn't even successfully pass the
COMPREHENSIVE test suite -- because the person porting it didn't know
what certain functions should look like, graphically!]

Building myself lets me see the warnings/errors thrown. And, also lets
me inspect the sources as, often, the configuration options are not
completely documented (but apparent *if* you look at the sources).

As building package Q may rely on packages B and L, which, in turn, rely on
F, G and Z, this is often a bit involved. But, if those dependencies
can be used by other packages, that's less work, later.

[I keep track of what I build, the order I build them, any patches I
make to the sources, etc.]

>>> The analysis works on the control-flow graph (per subprogram). The WCET for
>>> each basic block is the sum of the WCETs of the instructions in that block.
>>> The WCET for an execution path through the graph (from entry to return) is
> the sum of the WCETs of the basic blocks in the path, plus the WCETs
>>> assigned to each edge (control transfer) between basic blocks in the path.
>>
>> So, the costs of conditional control transfers are handled separately (?)
>
> I'm not sure I understand your question. Each edge in the control-flow graph,
> that is, each transition from one instruction or one basic block to a possible
> next instruction or block, is assigned a WCET value when the instructions are
> decoded and entered in the control-flow graph. Usually this WCET is zero for a
> fall-through edge from a non-branch instruction to the next, and is non-zero
> only for edges from branch instructions, whether conditional or unconditional.
> For a conditional branch, the "not taken" edge usually has a zero WCET and the
> "taken" edge has some non-zero WCET, as specified in the processor
> documentation (remember we are assuming a simple processor).

OK, that's what I was addressing; namely, that the two outcomes of a
(conditional) branch can have different costs -- yet those appeared to be
"outside" the blocks that you were describing.

>>>> I assume this is semi table-driven (?)
>>>
>>> I don't understand that question. Please clarify.
>>
>> A large number of opcodes, each with particular costs.
>> Do you build a jungle of conditionals to subdivide the
>> "opcode space" into groups of similar-cost operations?
>> Or, do you just have a table of:
>>
>> {opcode, mask, type[1], cost}
>>
>> [1] used to apply some further heuristics to your
>> refinement of "cost"
>
> I would say that the approach to translating instructions into their analysis
> models (elements in the control-flow graph) is usually not table-driven, but
> emulates the field-by-field decoding and case analysis that would be done by a
> disassembler or emulator (and Bound-T can produce a disassembly of the code).

Oh! I would have opted for a (possibly big) table. But, that's just personal
preference (I like table-driven algorithms).

> The internal model includes not only the cost of the instruction, but also its
> effect on the computation. For example, an instruction like "increment the
> accumulator", INC A, would be decoded into a model that says
>
> WCET: 1 cycle.
> Control flow: to next instruction, unconditional.
> Effect: A := A + 1.
>
> and also the effect, if any, on condition flags.

Why do you care about the "effect"? Are you trying to determine which branches
will be taken? How many iterations through loops? etc.

Doesn't this require a fairly complete model of the processor?

>> I don't rely on "live" tests for my code. Rather, I use tools to
>> generate good test coverage and then verify the results are what I
>> expect. I find this easier and more easily extensible (I can test
>> ARM code by running an x86 port of that code)
>
> At my former job, implementing SW for ESA satellite applications, we almost
> always tested the code first on normal PCs, in a simulated I/O environment, and
> then on the embedded target. Easier for Ada than for C.

Historically, it's been hard for me to have access to real hardware.
So, my development style pushes all the I/O out to the fringes of
the algorithms; there is little more than a "copy" that occurs
between the algorithm and the I/O.

So, I can debug most of my code (95+%) just by simulating input
values and capturing output values. It's no worse than testing
a math library.

Then, to move to real hardware, just ensure that the transfers in and
out are atomic/intact/correct.

E.g., for my speech synthesizers, I just had them generate D/AC values
that I captured using some test scaffolding. Then, massaged the captured
data into a form that a media player could process and "played" them
on a PC.

For my gesture recognizer, I captured/simulated input from the various
types of input devices in a form that the algorithms expected. Then,
just passed the data to the algorithms and noted their "conclusions".
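
(Schematically, that style looks something like the following -- a made-up C
example of the algorithm/I-O split, not code from any of the projects
mentioned. On a host build the "shims" read and write captured sample data
instead of touching hardware.)

/* Made-up example of the algorithm/I-O split: the algorithm only sees
 * buffers, so it can be exercised on any host with captured or
 * simulated data.
 */
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Pure algorithm: no hardware access, trivially testable on a PC. */
static void filter_samples(const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((in[i] + (i ? in[i - 1] : in[i])) / 2);
}

/* Thin I/O shims: the only code that differs between host and target.
 * These are host-side stand-ins using stdin/stdout; on real hardware
 * they would talk to the ADC and DAC instead.
 */
static size_t read_adc_block(int16_t *buf, size_t max)
{
    return fread(buf, sizeof buf[0], max, stdin);
}

static void write_dac_block(const int16_t *buf, size_t n)
{
    fwrite(buf, sizeof buf[0], n, stdout);
}

int main(void)
{
    int16_t in[256], out[256];
    size_t n;
    while ((n = read_adc_block(in, 256)) > 0) {
        filter_samples(in, out, n);
        write_dac_block(out, n);
    }
    return 0;
}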

>>> For processors where cache misses are much slower than cache hits (which is
>>> fast coming to mean almost all processors) IMO an I-cache analysis is
>>> necessary for static WCET analysis to be useful.
>>
>> I look at it the other way around. Assume there is NO cache. You
>> know that your code WILL run in less time than this case. Regardless
>> of the frequency or presence of competing events.
>>
>> Processors are fast enough, now, that you can usually afford to "step up"
>> in capability for comparatively little cost.
>
> Sometimes, but not always. The ESA applications we made were usually running
> (worst case) at about 70% processor load, and faster processors for space
> applications were very expensive, not only in euros but also in power, volume
> and mass, which are scarce resources in space.

Yes, of course. I'm speaking of the types of applications that I have dealt
with, not "universally".

My "cost constrained" days were back in the days of the "small/cheap CPU"
where you had to predict performance accurately -- but, doing so was
relatively easy (because processors had very predictable instruction
execution times).

Re: Stack analysis tool that really work?

<in7rubFipjuU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=597&group=comp.arch.embedded#597
 by: Niklas Holsti - Sat, 7 Aug 2021 16:50 UTC

On 2021-08-07 18:09, Don Y wrote:
> On 8/7/2021 4:09 AM, Niklas Holsti wrote:

(On how the Bound-T tool models instructions for stack and WCET analysis:)

>> The internal model includes not only the cost of the instruction, but
>> also its effect on the computation. For example, an instruction like
>> "increment the accumulator", INC A, would be decoded into a model that
>> says
>>
>>      WCET: 1 cycle.
>>      Control flow: to next instruction, unconditional.
>>      Effect: A := A + 1.
>>
>> and also the effect, if any, on condition flags.
>
> Why do you care about the "effect"? Are you trying to determine
> which branches will be taken? How many iterations through loops?
> etc.

The effect is necessary for any analysis that depends on run-time
values. Loop bounds are the prime example for WCET analysis, but
stack-usage analysis also needs it. For example, the effect of a push
instruction is to increment the "local stack height" model variable by
some number that reflects the size of the pushed data (which size can,
in principle, be non-static and computed at run time).

The stack-usage analysis finds the maximum possible value of the local
stack height in each subprogram, both globally and at each call site
within the subprogram, then adds up the local stack heights along each
possible call path to find the total usage for that call path, and then
finds and reports the path with the largest total usage.
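
(A bare-bones C sketch of that accumulation over the call graph, with
invented structures; the real analysis also has to cope with recursion,
indirect calls, annotations, and so on.)

/* Sketch of the "add local stack heights along call paths" step.
 * Invented structures; recursion and indirect calls are not handled.
 */
#include <stdio.h>

#define MAX_CALLEES 4

struct subprogram {
    const char *name;
    unsigned local_max;                      /* max local stack height         */
    unsigned height_at_call[MAX_CALLEES];    /* local height at each call site */
    const struct subprogram *callee[MAX_CALLEES];
    unsigned n_calls;
};

/* Worst-case total stack used by 'sub' and everything it can call. */
static unsigned total_stack(const struct subprogram *sub)
{
    unsigned worst = sub->local_max;
    for (unsigned i = 0; i < sub->n_calls; i++) {
        unsigned through_call =
            sub->height_at_call[i] + total_stack(sub->callee[i]);
        if (through_call > worst)
            worst = through_call;
    }
    return worst;
}

int main(void)
{
    struct subprogram leaf = { "leaf", 16, {0},     {0},            0 };
    struct subprogram mid  = { "mid",  32, {24},    {&leaf},        1 };
    struct subprogram top  = { "top",  48, {40, 8}, {&mid, &leaf},  2 };

    printf("worst-case stack from %s: %u bytes\n", top.name, total_stack(&top));
    return 0;
}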

> Doesn't this require a fairly complete model of the processor?

Yes indeed. The model could be somewhat simpler for stack-usage analysis
than for WCET analysis. For stack usage, the relevant effect of an
instruction is just to increase or decrease the local stack height by a
static constant. If push/pop happens in loops they are usually balanced
so that a loop iteration has no net effect on stack usage, making loop
iteration bounds unnecessary.

> My "cost constrained" days were back in the days of the "small/cheap
> CPU" where you had to predict performance accurately -- but, doing so
> was relatively easy (because processors had very predictable
> instruction execution times).

And that is the kind of processor that Bound-T was designed to support
for WCET analysis. Stack-usage analysis is not so sensitive.

Re: Stack analysis tool that really work?

<in7slsFit3vU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=598&group=comp.arch.embedded#598
 by: Niklas Holsti - Sat, 7 Aug 2021 17:02 UTC

Oops, a correction:

On 2021-08-07 19:50, Niklas Holsti wrote:

> For stack usage, the relevant effect of an instruction is just to
> increase or decrease the local stack height by a static constant.

I meant to say that the change in local stack height is _often_ a static
constant, not that it _always_ is static.

Re: Stack analysis tool that really work?

<semfoj$o70$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=599&group=comp.arch.embedded#599
 by: Don Y - Sat, 7 Aug 2021 17:26 UTC

On 8/7/2021 4:20 AM, Niklas Holsti wrote:
> On 2021-08-07 2:06, Don Y wrote:
>> On 8/6/2021 4:04 PM, Don Y wrote:
>>>> That said, for some processors it is easy to recognize at decode-time most
>>>> of the instructions that access the stack, and some versions of Bound-T let
>>>> one specify different access times for stack accesses and for general
>>>> (unclassified) accesses. That can be useful if the stack is located in fast
>>>> memory, but other data are in slower memory.
>>>
>>> I'm thinking, specifically, about I/Os -- which are increasingly memory
>>> mapped (including register spaces).
>>
>> Sorry, I should be more clear. I'm looking at the issues that would affect
>> *my* needs in *my* environment -- realizing these may be different than
>> those you've previously targeted. (e.g., VMM, FPU emulation, etc.)
>
>
> Ok. If you describe your needs, perhaps I can comment.

I'm targeting some of the bigger ARM offerings -- A53/55. My impression is
they go through more "contortions" to eke out additional performance than some
of the smaller processors.

> In my experience, access to memory-mapped registers tends to be not much slower
> than access to memory. Also, quite often the addresses in MMIO accesses are
> static or easy to derive in static analysis, so the WCET tool could choose the
> proper access time, if the tool knows enough about the system architecture.

Yes, that last phrase being the kicker. Can this be simplified to something
as crude as a "memory map"?

> If by "FPU emulation" you mean SW-implemented FP instructions, of course we
> have encountered those. They are often hand-coded assembler with very complex
> and tricky control flow, which makes it hard to find loop bounds automatically,
> so manual annotations must be used instead.
>
> What is VMM?

Virtual Memory Management. I.e., when an opcode fetch (or argument reference)
can not only take longer than a cache miss... but *considerably* longer as
the physical memory is mapped in while the instruction stream "stalls".

[Note that a page fault need not map physical memory in the traditional sense.
It can also cause some "special" function to be invoked to provide the
requisite data/access. So, the cost of a fault can vary depending on
*what* is faulting and which pager is handling that fault]

Re: Stack analysis tool that really work?

<semg7r$r71$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=600&group=comp.arch.embedded#600
 by: Don Y - Sat, 7 Aug 2021 17:34 UTC

On 8/7/2021 9:50 AM, Niklas Holsti wrote:
> On 2021-08-07 18:09, Don Y wrote:
>> On 8/7/2021 4:09 AM, Niklas Holsti wrote:
>
>
> (On how the Bound-T tool models instructions for stack and WCET analysis:)
>
>>> The internal model includes not only the cost of the instruction, but also
>>> its effect on the computation. For example, an instruction like "increment
>>> the accumulator", INC A, would be decoded into a model that says
>>>
>>> WCET: 1 cycle.
>>> Control flow: to next instruction, unconditional.
>>> Effect: A := A + 1.
>>>
>>> and also the effect, if any, on condition flags.
>>
>> Why do you care about the "effect"? Are you trying to determine
>> which branches will be taken? How many iterations through loops?
>> etc.
>
> The effect is necessary for any analysis that depends on run-time values. Loop
> bounds are the prime example for WCET analysis, but stack-usage analysis also
> needs it. For example, the effect of a push instruction is to increment the
> "local stack height" model variable by some number that reflects the size of
> the pushed data (which size can, in principle, be non-static and computed at
> run time).

OK. But, you're not trying to emulate/simulate the complete algorithm;
just handle the side effects of value changes.

But wouldn't you *have* to do a more thorough emulation?

foo(count) {
    for (i = 0; i < 5 * count; i++) {
        diddle()
    }
}

> The stack-usage analysis finds the maximum possible value of the local stack
> height in each subprogram, both globally and at each call site within the
> subprogram, then adds up the local stack heights along each possible call path
> to find the total usage for that call path, and then finds and reports the path
> with the largest total usage.
>
>> Doesn't this require a fairly complete model of the processor?
>
> Yes indeed. The model could be somewhat simpler for stack-usage analysis than
> for WCET analysis. For stack usage, the relevant effect of an instruction is
> just to increase or decrease the local stack height by a static constant. If
> push/pop happens in loops they are usually balanced so that a loop iteration
> has no net effect on stack usage, making loop iteration bounds unnecessary.

reverse(count, string) {
    while (count-- > 0) {
        reverse(count-1, &string[1])
        emit(string[0])
    }
}

No, that's a shitty example. I'll have to think on it when I have more time...

>> My "cost constrained" days were back in the days of the "small/cheap
>> CPU" where you had to predict performance accurately -- but, doing so
>> was relatively easy (because processors had very predictable
>> instruction execution times).
>
> And that is the kind of processor that Bound-T was designed to support for WCET
> analysis. Stack-usage analysis is not so sensitive.

Ah, OK.

Time to get my neighbor's lunch ready...

Re: Stack analysis tool that really work?

<in7v7cFjeflU1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=601&group=comp.arch.embedded#601
 by: Niklas Holsti - Sat, 7 Aug 2021 17:46 UTC

On 2021-08-07 20:26, Don Y wrote:
> On 8/7/2021 4:20 AM, Niklas Holsti wrote:

(In a discussion about WCET analysis more than stack analysis:)

>> Ok. If you describe your needs, perhaps I can comment.
>
> I'm targeting some of the bigger ARM offerings -- A53/55. My
> impression is they go through more "contortions" to eke out
> additional performance than some of the smaller processors.

Out of scope for Bound-T, I'm sure. Even the AbsInt tool has no support
for static WCET analysis of those processors and their ARM page suggests
a hybrid tool instead (https://www.absint.com/ait/arm.htm).

>> In my experience, access to memory-mapped registers tends to be not
>> much slower than access to memory. Also, quite often the addresses in
>> MMIO accesses are static or easy to derive in static analysis, so the
>> WCET tool could choose the proper access time, if the tool knows
>> enough about the system architecture.
>
> Yes, that last phrase being the kicker. Can this be simplified to
> something as crude as a "memory map"?

It seems so, if the access time is simply a function of the accessed
address.
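
(Presumably something as simple as the following -- a sketch of that "crude
memory map", with made-up regions and cycle counts:)

/* Sketch of the "crude memory map" idea: access time as a function of
 * the accessed address. Regions and cycle counts are made up.
 */
#include <stdint.h>
#include <stdio.h>

struct region {
    uint32_t base, limit;        /* address range [base, limit) */
    unsigned access_cycles;
};

static const struct region memory_map[] = {
    { 0x00000000u, 0x00080000u, 1 },   /* on-chip SRAM / TCM */
    { 0x20000000u, 0x20100000u, 3 },   /* external RAM       */
    { 0x40000000u, 0x50000000u, 5 },   /* memory-mapped I/O  */
};

static unsigned access_time(uint32_t addr)
{
    for (unsigned i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++)
        if (addr >= memory_map[i].base && addr < memory_map[i].limit)
            return memory_map[i].access_cycles;
    return 10;                         /* unclassified: assume the worst */
}

int main(void)
{
    printf("access at 0x40011004: %u cycles\n", access_time(0x40011004u));
    return 0;
}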

>> What is VMM?
>
> Virtual Memory Management. I.e., when an opcode fetch (or argument
> reference) can not only take longer than a cache miss... but
> *considerably* longer as the physical memory is mapped in while the
> instruction stream "stalls".

I don't recall seeing any static WCET analysis for page faults, but
there may be some for Translation Look-aside Buffer misses. Out of my
competence, and out of scope for Bound-T, certainly.

> [Note that a page fault need not map physical memory in the
> traditional sense. It can also cause some "special" function to be
> invoked to provide the requisite data/access. So, the cost of a
> fault can vary depending on *what* is faulting and which pager is
> handling that fault]

You may be able to map some of that into a very capable schedulability
analyzer, one that can handle chains of "tasks" passing data/messages to
each other. But translating the application logic and system behaviour
into a model for such a schedulability analyzer is not trivial.

Re: Stack analysis tool that really work?

<in80k2Fjmv2U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=602&group=comp.arch.embedded#602
 by: Niklas Holsti - Sat, 7 Aug 2021 18:10 UTC

On 2021-08-07 20:34, Don Y wrote:
> On 8/7/2021 9:50 AM, Niklas Holsti wrote:
>> On 2021-08-07 18:09, Don Y wrote:
>>> On 8/7/2021 4:09 AM, Niklas Holsti wrote:
>>
>>
>> (On how the Bound-T tool models instructions for stack and WCET
>> analysis:)
>>
>>>> The internal model includes not only the cost of the instruction,
>>>> but also its effect on the computation. For example, an instruction
>>>> like "increment the accumulator", INC A, would be decoded into a
>>>> model that says
>>>>
>>>>      WCET: 1 cycle.
>>>>      Control flow: to next instruction, unconditional.
>>>>      Effect: A := A + 1.
>>>>
>>>> and also the effect, if any, on condition flags.
>>>
>>> Why do you care about the "effect"?  Are you trying to determine
>>> which branches will be taken?  How many iterations through loops?
>>> etc.
>>
>> The effect is necessary for any analysis that depends on run-time
>> values. Loop bounds are the prime example for WCET analysis, but
>> stack-usage analysis also needs it. For example, the effect of a push
>> instruction is to increment the "local stack height" model variable by
>> some number that reflects the size of the pushed data (which size can,
>> in principle, be non-static and computed at run time).
>
> OK.  But, you're not trying to emulate/simulate the complete algorithm;
> just handle the side effects of value changes.
>
> But wouldn't you *have* to do a more thorough emulation?
>
> foo(count) {
>    for (i = 0; i < 5 * count; i++) {
>        diddle()
>    }
> }

The point and aim of _static_ analysis is to abstract the computation to
model only the aspects that are relevant for the goals of the analysis,
and especially to avoid any step-by-step emulation or interpretation --
that would become "dynamic analysis".

For that example, analysing the subprogram foo stand-alone, the WCET
analysis in Bound-T would conclude that, in the loop:

- The variable "i" is an induction variable that increases by one on
each iteration. This looks promising, and means that Bound-T can model
the value of "i" as the initial value (zero) plus the loop iteration
number (starting the count at iteration number zero).

- But the analysis will not find a numeric upper bound on the iteration
number, because the value of "count" is unknown.

If this analysis of foo occurs while analysing some higher-level
subprogram that calls foo, Bound-T would next try to compute bounds for
the actual value of "count" in each such call. Say that the call is simply

foo (33);

This would make Bound-T reanalyze foo in the context count=33, which
would show that the loop can be repeated only under the condition

(iteration number) < 5 * 33 = 165

which bounds the number of
loop iterations and lets Bound-T produce a WCET bound for this specific
call of foo (assuming that Bound-T can produce a WCET bound for the
"diddle" subprogram too).

Such repeated analyses down a call-path with increasing amount of
context will of course increase the analysis time.
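
(For what it's worth, the bound for that particular call is easy to check:
the loop runs while i < 5 * count, so foo(33) iterates at most 5 * 33 = 165
times. A trivial C rendering of that context-dependent bound, purely
illustrative:)

/* Purely illustrative: the context-dependent bound for foo's loop.
 * Given an interval [lo, hi] known for "count" at a particular call,
 * the loop "for (i = 0; i < 5 * count; i++)" runs at most 5 * hi times.
 */
#include <stdio.h>

struct interval { long lo, hi; };

static struct interval foo_loop_iterations(struct interval count)
{
    struct interval it = { 5 * count.lo, 5 * count.hi };
    if (it.lo < 0) it.lo = 0;          /* the loop body never runs for count <= 0 */
    if (it.hi < 0) it.hi = 0;
    return it;
}

int main(void)
{
    struct interval c = { 33, 33 };    /* the call foo(33) */
    struct interval it = foo_loop_iterations(c);
    printf("loop iterations in foo(33): %ld .. %ld\n", it.lo, it.hi);   /* 165 .. 165 */
    return 0;
}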

Re: Stack analysis tool that really work?

<sen0lb$9s9$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=603&group=comp.arch.embedded#603
 by: Don Y - Sat, 7 Aug 2021 22:14 UTC

On 8/7/2021 10:46 AM, Niklas Holsti wrote:
> On 2021-08-07 20:26, Don Y wrote:
>> On 8/7/2021 4:20 AM, Niklas Holsti wrote:
>
> (In a discussion about WCET analysis more than stack analysis:)
>
>>> Ok. If you describe your needs, perhaps I can comment.
>>
>> I'm targeting some of the bigger ARM offerings -- A53/55. My
>> impression is they go through more "contortions" to eke out
>> additional performance than some of the smaller processors.
>
> Out of scope for Bound-T, I'm sure. Even the AbsInt tool has no support for
> static WCET analysis of those processors and their ARM page suggests a hybrid
> tool instead (https://www.absint.com/ait/arm.htm).

You can see why I favor hand-waving away all of the "tricks" the processors
can play to improve performance! Granted, the numbers that result are TRULY
"worst case" -- and likely significantly inflated!

But, if you size for that, then all of the "noncritical" stuff runs "for free"!

>>> In my experience, access to memory-mapped registers tends to be not much
>>> slower than access to memory. Also, quite often the addresses in MMIO
>>> accesses are static or easy to derive in static analysis, so the WCET tool
>>> could choose the proper access time, if the tool knows enough about the
>>> system architecture.
>>
>> Yes, that last phrase being the kicker. Can this be simplified to something
>> as crude as a "memory map"?
>
> It seems so, if the access time is simply a function of the accessed address.
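
As a rough illustration of what such a "memory map" could look like to
the tool -- the regions, cycle counts, and the access_cycles helper
below are invented, not taken from Bound-T or any other tool:

    /* Invented example: worst-case access time as a pure
       function of the accessed address.                    */
    struct region {
        unsigned long start, end;   /* inclusive address range */
        unsigned      cycles;       /* assumed access time     */
    };

    static const struct region memory_map[] = {
        { 0x00000000UL, 0x0003FFFFUL,  1 },  /* on-chip SRAM      */
        { 0x40000000UL, 0x4000FFFFUL,  4 },  /* peripheral (MMIO) */
        { 0x80000000UL, 0x8FFFFFFFUL, 20 },  /* external DRAM     */
    };

    unsigned access_cycles(unsigned long addr)
    {
        unsigned i;

        for (i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++)
            if (addr >= memory_map[i].start && addr <= memory_map[i].end)
                return memory_map[i].cycles;
        return 100;                 /* pessimistic default */
    }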
>
>>> What is VMM?
>>
>> Virtual Memory Management. I.e., when an opcode fetch (or argument
>> reference) can not only take longer than a cache miss... but
>> *considerably* longer as the physical memory is mapped in while the
>> instruction stream "stalls".
>
> I don't recall seeing any static WCET analysis for page faults, but there may
> be some for Translation Look-aside Buffer misses. Out of my competence, and out
> of scope for Bound-T, certainly.

Unlike a cache miss, it's not easy to predict those costs without an
intimate model of the software.

E.g., two "closely located" addresses can have entirely different
behaviors (timing) on a page fault. And, you likely can't predict where
those addresses will reside as that's a function of how they are mapped
at runtime.

Unlike a conventional VMM system (faulting in/out pages from a secondary/disk
store), I rely on the mechanism heavily for communication and process container
hacks.

I.e., there's a good reason to over-specify the hardware -- so you can be
inefficient in your use of it! :-/

>> [Note that a page fault need not map physical memory in the
>> traditional sense. It can also cause some "special" function to be
>> invoked to provide the requisite data/access. So, the cost of a
>> fault can vary depend on *what* is faulting and which pager is
>> handling that fault]
>
> You may be able to map some of that into a very capable schedulability
> analyzer, one that can handle chains of "tasks" passing data/messages to each
> other. But translating the application logic and system behaviour into a model
> for such a schedulability analyzer is not trivial.

As my workload is dynamically defined (can grow or shrink, algorithmically),
I don't really fret this. It's not like a closed system where what's inside
*has* to work. If something isn't working, I can shed load (or, move it to
another processor). And, if something is underutilized, *add* load (possibly
moved from another processor).

You typically do this intuitively on your workstation; if things start
to run sluggishly, you kill off some applications (and make a mental note
to run them, again, later). And, if the workstation isn't "doing anything",
you *add* application processes.

If you needed to "make world", you might power up another workstation so
it didn't impede the activities of the "normal" workstation you're using.

Re: Stack analysis tool that really work?

<uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>

https://www.novabbs.com/devel/article-flat.php?id=604&group=comp.arch.embedded#604

Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Stack analysis tool that really work?
Date: Sun, 08 Aug 2021 18:34:25 -0400
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
References: <sdukj9$8tu$1@dont-email.me> <sea47k$j3s$1@dont-email.me> <seb3fa$82q$1@dont-email.me> <imvvdmFu7i4U1@mid.individual.net> <seek0l$e3r$1@dont-email.me> <in05quF11e0U1@mid.individual.net> <sef5c3$5so$1@dont-email.me> <in1tnaFbu8rU1@mid.individual.net> <segoc2$6dj$1@dont-email.me> <in2f27FfeehU1@mid.individual.net> <egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com> <in768jFeenmU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="dd4cf3c2e313b1580ca4134e1f642f98";
logging-data="1424"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19yeT/6mzwNEWl9tssMKTA2bd6KAKNF/Rk="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:PkdQTSBaR2r74T1n3ZplSLKZR+o=
 by: George Neuner - Sun, 8 Aug 2021 22:34 UTC

On Sat, 7 Aug 2021 13:40:19 +0300, Niklas Holsti
<niklas.holsti@tidorum.invalid> wrote:

>On 2021-08-07 5:55, George Neuner wrote:
>
>> More importantly, how do you handle chains of conditional branches
>> and/or switch/case constructs which can mispredict at every decision
>> point? The branch targets may not be in cache - neither mispredicted
>> targets nor the actual one. Worst case for a horribly mispredicted
>> switch/case can be absolutely dreadful.
>
>
>I don't know if those questions are addressed to me (re the Bound-T
>tool) or to WCET analysis in general.

A little of both really. I was hoping you had some smart(er) approach
to estimating misprediction effects in systems that use dynamic
prediction - even if it's just heuristic.

A lot of more powerful chips now are being used even in 'small'
systems, and some of them do have dynamic prediction. Note that Don
is talking about Cortex A53, A55, etc.

>The advanced WCET tools, like the
>ones from AbsInt, do try to cover such cases by their cache analysis to
>provide a safe upper bound on the WCET, but the degree of pessimism
>(over-estimation) increases with the complexity of the code.
>
>That said, in many programs most of the processing time is taken by
>relatively simple loops, for which the present I-cache analyses work
>quite well, and for which even D-cache analysis can (I believe) work
>satisfactorily if the memory addressing patterns are regular.

Dynamic prediction handles loop control quite well ... it's all the
other branching code that is the problem.

WCET has use outside the 'embedded' world also. A lot of my work was
in QA/QC vision systems: tons of code, tons of features, workstation
class processors, and still having to be /hard real time/.

George

Re: Stack analysis tool that really work?

<inc519Feuf2U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=605&group=comp.arch.embedded#605

Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch.embedded
Subject: Re: Stack analysis tool that really work?
Date: Mon, 9 Aug 2021 10:50:00 +0300
Organization: Tidorum Ltd
Lines: 56
Message-ID: <inc519Feuf2U1@mid.individual.net>
References: <sdukj9$8tu$1@dont-email.me> <sea47k$j3s$1@dont-email.me>
<seb3fa$82q$1@dont-email.me> <imvvdmFu7i4U1@mid.individual.net>
<seek0l$e3r$1@dont-email.me> <in05quF11e0U1@mid.individual.net>
<sef5c3$5so$1@dont-email.me> <in1tnaFbu8rU1@mid.individual.net>
<segoc2$6dj$1@dont-email.me> <in2f27FfeehU1@mid.individual.net>
<egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com>
<in768jFeenmU1@mid.individual.net>
<uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net aVd6PXz4qUbK6xkWWFlA8AHrM8+XcPc6GAzF8RAyVNZghREAQy
Cancel-Lock: sha1:tDQuLZVeZZq1Y5LPiG+kYO6gaTA=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0)
Gecko/20100101 Thunderbird/78.12.0
In-Reply-To: <uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
Content-Language: en-US
 by: Niklas Holsti - Mon, 9 Aug 2021 07:50 UTC

On 2021-08-09 1:34, George Neuner wrote:
> On Sat, 7 Aug 2021 13:40:19 +0300, Niklas Holsti
> <niklas.holsti@tidorum.invalid> wrote:
>
>> On 2021-08-07 5:55, George Neuner wrote:
>>
>>> More importantly, how do you handle chains of conditional branches
>>> and/or switch/case constructs which can mispredict at every decision
>>> point? The branch targets may not be in cache - neither mispredicted
>>> targets nor the actual one. Worst case for a horribly mispredicted
>>> switch/case can be absolutely dreadful.
>>
>>
>> I don't know if those questions are addressed to me (re the Bound-T
>> tool) or to WCET analysis in general.
>
> A little of both really. I was hoping you had some smart(er) approach
> to estimating misprediction effects in systems that use dynamic
> prediction - even if it's just heuristic.

Sorry, but no.

> A lot of more powerful chips now are being used even in 'small'
> systems, and some of them do have dynamic prediction. Note that Don
> is talking about Cortex A53, A55, etc.

Indeed, and therefore static WCET analysis is waning, and hybrid (partly
measurement-based) WCET estimation is waxing.

> Dynamic prediction handles loop control quite well ... it's all the
> other branching code that is the problem.
>
> WCET has use outside the 'embedded' world also. A lot of my work was
> in QA/QC vision systems: tons of code, tons of features, workstation
> class processors, and still having to be /hard real time/.

(I would call that "embedded", because it is a computerized system used
for a single purpose.)

One could say, perhaps meanly, that that is an ill-posed problem, with
the wrong choice of processors. But cost matters, of course...

If you are not satisfied with Don's approach (extreme over-provision of
processor power) you could try the hybrid WCET-estimation tools
(RapiTime or TimeWeaver) which do not need to model the processors, but
need to measure fine-grained execution times (on the basic-block level).
The problem with such tools is that they cannot guarantee to produce an
upper bound on the WCET, only a bound that holds with high probability.
And, AIUI, at present that probability cannot be computed, and certainly
depends on the test suite being measured. For example, on whether those
tests lead to mispredictions in chains of conditional branches.
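
(A toy illustration of the measurement-based idea -- not the actual
RapiTime or TimeWeaver method; the block names, cycle counts, and
execution counts are invented:)

    #include <stdio.h>

    /* Toy hybrid estimate: combine the largest *observed* execution
       time of each basic block with a bound on how often the block
       can execute on the worst-case path.  If testing never drives a
       block to its true worst case, the result is optimistic -- hence
       "high probability", not a guaranteed upper bound.             */

    struct block {
        const char *name;
        unsigned    max_observed_cycles;  /* from tracing/instrumentation */
        unsigned    max_exec_count;       /* from structural path analysis */
    };

    static unsigned long estimate(const struct block *b, unsigned n)
    {
        unsigned long total = 0;
        unsigned i;

        for (i = 0; i < n; i++)
            total += (unsigned long)b[i].max_observed_cycles
                     * b[i].max_exec_count;
        return total;
    }

    int main(void)
    {
        const struct block blocks[] = {
            { "entry",     12, 1 },
            { "loop body", 35, 7 },
            { "exit",       9, 1 },
        };
        printf("hybrid WCET estimate: %lu cycles\n",
               estimate(blocks, sizeof blocks / sizeof blocks[0]));
        return 0;
    }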

Re: Stack analysis tool that really work?

<87czqmpj9h.fsf@nightsong.com>

https://www.novabbs.com/devel/article-flat.php?id=606&group=comp.arch.embedded#606

Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: no.em...@nospam.invalid (Paul Rubin)
Newsgroups: comp.arch.embedded
Subject: Re: Stack analysis tool that really work?
Date: Mon, 09 Aug 2021 11:26:02 -0700
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <87czqmpj9h.fsf@nightsong.com>
References: <sdukj9$8tu$1@dont-email.me> <sea47k$j3s$1@dont-email.me>
<seb3fa$82q$1@dont-email.me> <imvvdmFu7i4U1@mid.individual.net>
<seek0l$e3r$1@dont-email.me> <in05quF11e0U1@mid.individual.net>
<sef5c3$5so$1@dont-email.me> <in1tnaFbu8rU1@mid.individual.net>
<segoc2$6dj$1@dont-email.me> <in2f27FfeehU1@mid.individual.net>
<egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com>
<in768jFeenmU1@mid.individual.net>
<uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
<inc519Feuf2U1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="c9893294079cdd1de3d3d077bf25c8b0";
logging-data="11619"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18zV5zW7RRQl2CWLuRlX13K"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cancel-Lock: sha1:nsh7eMOSx69mC0Hxls6I0CaoVxo=
sha1:Npl8ne21j/QzHAZjZFZOWFmpAzg=
 by: Paul Rubin - Mon, 9 Aug 2021 18:26 UTC

Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
> And, AIUI, at present that probability cannot be
> computed, and certainly depends on the test suite being measured. For
> example, on whether those tests lead to mispredictions in chains of
> conditional branches.

Maybe architectural simulation of the target cpu can help, if the
architecture is known (i.e. exact workings of the pipelines, branch
predictors etc). And maybe forthcoming RISC-V cpus will be more open
about this than the current ARM stuff is.

Re: Stack analysis tool that really work?

<seru91$unc$1@dont-email.me>

https://www.novabbs.com/devel/article-flat.php?id=607&group=comp.arch.embedded#607

Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Stack analysis tool that really work?
Date: Mon, 9 Aug 2021 12:04:56 -0700
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <seru91$unc$1@dont-email.me>
References: <sdukj9$8tu$1@dont-email.me> <sea47k$j3s$1@dont-email.me>
<seb3fa$82q$1@dont-email.me> <imvvdmFu7i4U1@mid.individual.net>
<seek0l$e3r$1@dont-email.me> <in05quF11e0U1@mid.individual.net>
<sef5c3$5so$1@dont-email.me> <in1tnaFbu8rU1@mid.individual.net>
<segoc2$6dj$1@dont-email.me> <in2f27FfeehU1@mid.individual.net>
<egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com>
<in768jFeenmU1@mid.individual.net>
<uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
<inc519Feuf2U1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 9 Aug 2021 19:05:05 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="42541d25e731b7df3cfef577e52db1fa";
logging-data="31468"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//9MQq7Py2oyljWwH5oSnk"
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
Thunderbird/52.1.1
Cancel-Lock: sha1:gS0kAp43VQxiA15YBnpYQx6xgWA=
In-Reply-To: <inc519Feuf2U1@mid.individual.net>
Content-Language: en-US
 by: Don Y - Mon, 9 Aug 2021 19:04 UTC

On 8/9/2021 12:50 AM, Niklas Holsti wrote:
> If you are not satisfied with Don's approach (extreme over-provision of
> processor power)

It's not necessarily "extreme". I only grossly underestimate the processor's
ability to do the "must do" aspects of the design. This ensures that they
actually *will* get done.

And, what's *left* will likely exceed the needs of what I'd *like* to (also)
get done -- but, as that is (by definition) not a "must do" aspect of the
design, there are no criteria as to HOW MUCH must get done (including
"none" and "all")

In my world, I can defer execution of some tasks, *move* tasks from an
overburdened processor to a less-burdened one, *add* physical processors
to the mix (on demand), etc.

So, trying to match (at design time) the load to the capabilities is
neither necessary nor desirable/economical.

There's no (a priori) "hard limit" on the number/types of applications that
you can run on your PC, is there? Instead, you dynamically review the
performance you are experiencing and adjust your expectations, accordingly.

Maybe kill off some application that isn't *as* useful to you (at the
moment) as some other. Or, defer invoking an application until some
*other* has exceeded its utility.

You could, of course, just buy a bigger/faster PC! Yet, you don't (up
to a point). Because that changes the economics of the solution.
And, because you can "make do" with your current investment just by
more intelligently scheduling its use!

By minimizing the "must do" aspect of the problem, you give yourself
the most flexibility in how you address that/those requirements.
The rest is "free", figuratively speaking.

For example, one of my applications handles telephony. It screens
incoming callers, does speaker identification ("who is calling"),
etc. To train the recognizer, I have it "listen in" on every call
given that it now KNOWS who I am talking with and can extract/update
speech parameters from the additional "training material" that is
present, FOR FREE, in the ongoing conversation. (instead of explicitly
requiring callers to "train" the recognizer in a "training session")

But, there's no reason that this has to happen:
- in real time
- while the conversation is in progress
- anytime "soon"

So, if you *record* the conversation (relatively inexpensive as you
are already moving the audio through the processor), you can pick
some *later* time to run the model update task. Maybe in the middle
of the night when there are fewer demands (i.e., no one telephoning
you at 3AM!)

And, if something happens to "come up" that is more important than
this activity, you can checkpoint the operation, kill off the process
and let the more important activity have access to those resources.
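
A bare-bones sketch of that pattern -- process the recording in small
chunks, remember how far you got, and bail out when something more
important needs the resources. The job structure and hook names below
are invented for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    /* Invented example of a checkpointable, deferrable job. */
    struct update_job {
        unsigned next_frame;     /* checkpoint: how far we have gotten */
        unsigned total_frames;   /* size of the recorded conversation  */
    };

    /* Placeholder for "something more important came up". */
    static bool preempt_requested(void) { return false; }

    /* Placeholder for the real per-frame model update. */
    static void update_model_with_frame(unsigned frame) { (void)frame; }

    /* Run until done or preempted; returns true when finished.
       Because progress lives in the job struct, the work can be
       killed here and resumed at 3AM from the same point.          */
    static bool run_update(struct update_job *job)
    {
        while (job->next_frame < job->total_frames) {
            if (preempt_requested())
                return false;              /* checkpointed, not done */
            update_model_with_frame(job->next_frame);
            job->next_frame++;
        }
        return true;
    }

    int main(void)
    {
        struct update_job job = { 0, 1000 };

        if (!run_update(&job))
            printf("deferred at frame %u\n", job.next_frame);
        else
            printf("model update complete\n");
        return 0;
    }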

Yes, it would be a much simpler design if you could just say:
"I *need* the following resources to be able to update the
recognizer models AS THE CONVERSATION WAS HAPPENING". But,
when the conversation was *over*, you'd have all those extra
re$ource$ sitting idle.

Turn "hard" requirements into *soft* ones -- and then shift them,
in time, to periods of lower resource utilization.

[The "hard real-time is hard but soft real-time is HARDER" idiom]

> you could try the hybrid WCET-estimation tools (RapiTime or
> TimeWeaver) which do not need to model the processors, but need to measure
> fine-grained execution times (on the basic-block level). The problem with such
> tools is that they cannot guarantee to produce an upper bound on the WCET, only
> a bound that holds with high probability. And, AIUI, at present that
> probability cannot be computed, and certainly depends on the test suite being
> measured. For example, on whether those tests lead to mispredictions in chains
> of conditional branches.

The equivalent mechanism in my world is monitoring *actual* load.
As it increases, you have a mismatch between needs and capabilities.
So, adjust one, the other, or both!

Shed load (remove processes from the current node)

and/or

Add capacity (*move* process to another node, including bringing
that other node "on-line" to handle the load!)
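
Again, just a sketch: the thresholds and the shed/migrate/resume hooks
below are invented to show the shape of the decision, not any
particular implementation:

    #include <stdio.h>

    /* Invented thresholds and hooks, for illustration only. */
    #define HIGH_WATER 0.85  /* load above this: shed or add capacity */
    #define LOW_WATER  0.40  /* load below this: take on deferred work */

    static int  spare_node_available(void)            { return 1;       }
    static void migrate_soft_task_to_spare_node(void) { puts("migrate"); }
    static void shed_one_soft_task(void)              { puts("shed");    }
    static void resume_deferred_work(void)            { puts("resume");  }

    /* Called periodically with the measured utilization of this node.
       MUST-DO work is never shed; only the "would like to do" load is
       moved or dropped.                                               */
    static void rebalance(double utilization)
    {
        if (utilization > HIGH_WATER) {
            if (spare_node_available())
                migrate_soft_task_to_spare_node();
            else
                shed_one_soft_task();
        } else if (utilization < LOW_WATER) {
            resume_deferred_work();
        }
    }

    int main(void)
    {
        rebalance(0.92);   /* overloaded  -> migrate or shed */
        rebalance(0.25);   /* underloaded -> take on more    */
        return 0;
    }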

If you keep in mind the fact that you only have to deal with
the MUST DO tasks, this is a lot easier to wrap your head
around. E.g., eventually, there *will* be a situation where
what you *want* to do exceeds the capabilities that you
have available -- period. So, you have to remember that only
some of those are MUST DOs.

Yeah, it would be nice to have 3000 applications loaded and ready
at a mouse-click... but, that wouldn't be worth the cost of
the system required to support it!

Re: Stack analysis tool that really work?

<indfp8Fnja9U1@mid.individual.net>

https://www.novabbs.com/devel/article-flat.php?id=608&group=comp.arch.embedded#608

Newsgroups: comp.arch.embedded
Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: niklas.h...@tidorum.invalid (Niklas Holsti)
Newsgroups: comp.arch.embedded
Subject: Re: Stack analysis tool that really work?
Date: Mon, 9 Aug 2021 22:59:36 +0300
Organization: Tidorum Ltd
Lines: 25
Message-ID: <indfp8Fnja9U1@mid.individual.net>
References: <sdukj9$8tu$1@dont-email.me> <sea47k$j3s$1@dont-email.me>
<seb3fa$82q$1@dont-email.me> <imvvdmFu7i4U1@mid.individual.net>
<seek0l$e3r$1@dont-email.me> <in05quF11e0U1@mid.individual.net>
<sef5c3$5so$1@dont-email.me> <in1tnaFbu8rU1@mid.individual.net>
<segoc2$6dj$1@dont-email.me> <in2f27FfeehU1@mid.individual.net>
<egrrggh9q1hasur4lt1unptujjhvk2v5ra@4ax.com>
<in768jFeenmU1@mid.individual.net>
<uci0hgpvq5a2rr3oel6i68qvgecnb48rke@4ax.com>
<inc519Feuf2U1@mid.individual.net> <87czqmpj9h.fsf@nightsong.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net Gjlnj3iEd229ViMUZRi7WQJNBc0TRtNJyW7SCo8GCFZlaYE/4m
Cancel-Lock: sha1:cwYRBdxz0KD9qt1rzKTF64yZEZI=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0)
Gecko/20100101 Thunderbird/78.12.0
In-Reply-To: <87czqmpj9h.fsf@nightsong.com>
Content-Language: en-US
 by: Niklas Holsti - Mon, 9 Aug 2021 19:59 UTC

On 2021-08-09 21:26, Paul Rubin wrote:
> Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
>> And, AIUI, at present that probability cannot be
>> computed, and certainly depends on the test suite being measured. For
>> example, on whether those tests lead to mispredictions in chains of
>> conditional branches.
>
> Maybe architectural simulation of the target cpu can help, if the
> architecture is known (i.e. exact workings of the pipelines, branch
> predictors etc).

For timing, one needs *micro* -architectural simulation (as the term is
commonly used, for example in comp.arch). But I think you mean that, so
this is a terminological quibble.

> And maybe forthcoming RISC-V cpus will be more open about this than
> the current ARM stuff is.

I suspect that the microarchitecture will be where the various RISC-V
implementors will compete (that, and peripherals and packaging), so I'm
not optimistic that they will be very open about their
microarchitectures. However, I don't know how far into the micro-level
the RISC-V standardization and open-source licensing extends.
