

devel / comp.arch.embedded / Re: Text on FSM

Subject                       Author
* Re: Text on FSM             George Neuner
`* Re: Text on FSM            Don Y
 `* Re: Text on FSM           George Neuner
  +* Re: Text on FSM          Don Y
  |`* Re: Text on FSM         George Neuner
  | `* Re: Text on FSM        Don Y
  |  `* Re: Text on FSM       George Neuner
  |   `- Re: Text on FSM      Don Y
  `* Re: Text on FSM          Clifford Heath
   `* Re: Text on FSM         George Neuner
    +* Re: Text on FSM        Clifford Heath
    |`* Re: Text on FSM       George Neuner
    | `- Re: Text on FSM      Clifford Heath
    `- Re: Text on FSM        Richard Damon

Re: Text on FSM

<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1317&group=comp.arch.embedded#1317

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Wed, 22 Mar 2023 16:37:53 -0400
Organization: A noiseless patient Spider
Lines: 306
Message-ID: <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com> <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="0190555e06e03656320c0b4319df993d";
logging-data="830661"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oqDFNQcI/Rl6hND3BD/pGLHhWigkVAhI="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:28hhTLsLrnpFc8CcTHVNYdu/j50=
 by: George Neuner - Wed, 22 Mar 2023 20:37 UTC

Hi Don,

Sorry for the delay here ... you know what's going on.

On Tue, 14 Mar 2023 19:40:07 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 3/14/2023 6:29 PM, George Neuner wrote:
>> On Mon, 13 Mar 2023 22:59:25 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>>
>>> On 3/13/2023 8:07 PM, George Neuner wrote:
>>
>> When you merge two FSM you often get redundant "don't care" nodes, but
>> you also can get nodes which either are impossible to enter [dead
>> code], or impossible to leave [halt], because there are no legal
>> transitions that will permit it. Joining FSM involves identifying and
>> pruning both types of nodes.
>
>Then how did you decide they were equivalent? Clearly, (at least)
>one had a different set of options/transitions that is not supported
>in the "merged" implementation.

You merge multiple DFA by constructing an equivalent NDFA where all
the transitions that lead to the same action are coalesced into a
single node (effectively eliminating the redundancies). Some of the
impossible halt states may also be eliminated as redundancies.

Once the minimum state NDFA is built, you turn /that/ back into a DFA
to increase performance.

>>>> Then there is flex.
>>>>
>>>> flex has some DFA optimizations available. First, flex can compress
>>>> the data tables which encode its DFA states. Second, flex can
>>>> discover "equivalence" classes: groups of characters which result in
>>>> the same action. And finally, flex can [sometimes] discover
>>>> "meta-equivalence" classes: commonly expected sequences of characters
>>>> and/or other equivalence classes.
>>>
>>> But, those are mapping equivalent *inputs* together, not equivalent
>>> *states*. I.e., treating "BEGIN" and "begin" as equivalent.
>>
>> No. Consider the case of just recognizing a decimal digit: compare
>> the graph using the alternation: (0|1|2|3|4|5|6|7|8|9), vs the graph
>> using the class [:digit:].
>>
>> Using the OR alternation, including start you have 11 nodes. Start has
>> 10 transitions exiting, and each digit node has a single transition
>> entering.
>
>Are you using "node" as a synonym for "state"?

I am using graph terminology - mostly because all FA builders start by
constructing a state /graph/. After the graph is complete, it may be
implemented using tables, or Duff's device, etc. ... but it doesn't
start that way. 8-)
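
As a concrete illustration of the table form, here is a minimal C
sketch (the states, events, and the little machine itself are invented
for illustration; a Mealy-style version would pair each table entry
with an action pointer):

#include <stdio.h>

/* next_tbl[state][event] -> next state: the machine is one lookup */
enum state { START, IN_NUMBER, DONE, N_STATES };
enum event { EV_DIGIT, EV_OTHER, EV_EOF, N_EVENTS };

static const enum state next_tbl[N_STATES][N_EVENTS] = {
    /*               EV_DIGIT   EV_OTHER  EV_EOF */
    /* START     */ { IN_NUMBER, START,    DONE },
    /* IN_NUMBER */ { IN_NUMBER, START,    DONE },
    /* DONE      */ { DONE,      DONE,     DONE },
};

static enum event classify(int c)      /* map raw input to an event */
{
    if (c == EOF)             return EV_EOF;
    if (c >= '0' && c <= '9') return EV_DIGIT;
    return EV_OTHER;
}

int main(void)
{
    enum state s = START;
    while (s != DONE)
        s = next_tbl[s][classify(getchar())];
    return 0;
}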

>E.g.,
>
> State Powered_Up
>On '1' Goto Entry Executing Accept_Digit()
>On '2' Goto Entry Executing Accept_Digit()
>On '3' Goto Entry Executing Accept_Digit()
>On '4' Goto Entry Executing Accept_Digit()
>On '5' Goto Entry Executing Accept_Digit()
>On '6' Goto Entry Executing Accept_Digit()
>On '7' Goto Entry Executing Accept_Digit()
>On '8' Goto Entry Executing Accept_Digit()
>On '9' Goto Entry Executing Accept_Digit()
>..
>
> State Operation_Complete
>On '1' Goto Entry Executing Accept_Digit()
>On '2' Goto Entry Executing Accept_Digit()
>On '3' Goto Entry Executing Accept_Digit()
>On '4' Goto Entry Executing Accept_Digit()
>On '5' Goto Entry Executing Accept_Digit()
>On '6' Goto Entry Executing Accept_Digit()
>On '7' Goto Entry Executing Accept_Digit()
>On '8' Goto Entry Executing Accept_Digit()
>On '9' Goto Entry Executing Accept_Digit()
>..
>
>In each of these, I would have a single transition exiting the
>state labeled "0+1+2+3+4+5+6+7+8+9" invoking Accept_Digit()
>and destined for "Entry"
>
>Operation_Complete (i.e., the *end* of some undisclosed sequence
>of actions, preparing to restart on another iteration) is equivalent
>to Powered_Up -- presumably the state just before that sequence of
>actions is first started (assuming "..." are equivalent)
>
>I can detect this with an implication table. Even if a series of
>states are involved (step1, step2, step3) An algorithm can
>similarly detect it. And, remove one of these two states (or
>series of states) from the machine (for example, "Operation_Complete"),
>replacing all references to it in the machine with "Powered_Up".

Absolutely it /could/ be done with table-based algorithms, but I've
never seen it done that way in any real (non-toy) implementation ...
graph-based solutions scale better to large problems.
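
For reference, the pairwise marking Don describes fits in a few lines
of C. A sketch, over an invented 3-state machine in which states 0
and 1 are equivalent by construction (it prints "states 0 and 1 can
be merged"):

#include <stdbool.h>
#include <stdio.h>

enum { NS = 3, NI = 2 };                 /* states, inputs */
static const int  succ[NS][NI] = {{1,2},{1,2},{2,0}};
static const bool out[NS]      = { false, false, true };

int main(void)
{
    bool distinct[NS][NS] = {{false}};
    /* base case: states with different outputs are distinguishable */
    for (int p = 0; p < NS; p++)
        for (int q = 0; q < NS; q++)
            if (out[p] != out[q]) distinct[p][q] = true;
    /* propagate: p,q distinct if some input leads to a distinct pair */
    for (bool changed = true; changed; ) {
        changed = false;
        for (int p = 0; p < NS; p++)
            for (int q = 0; q < NS; q++)
                for (int i = 0; i < NI && !distinct[p][q]; i++)
                    if (distinct[succ[p][i]][succ[q][i]])
                        distinct[p][q] = changed = true;
    }
    for (int p = 0; p < NS; p++)
        for (int q = p + 1; q < NS; q++)
            if (!distinct[p][q])
                printf("states %d and %d can be merged\n", p, q);
    return 0;
}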

>> Using the digit class, you have 2 nodes, with 10 transitions that all
>> get you from start to the digit class node.
>
>I don't see where the extra nodes (states) come from.

Think about the "class" as a state in an /NDFA/ ... there are multiple
transitions that can get you in - one transition (action) that gets
you out.

When the "class" is implemented it becomes (for lack of better term) a
subroutine (subgraph) in the resulting DFA. The more the subgraph can
be reused, the more savings in the DFA.

>> Obviously this is simplistic, because the members of the character
>> class form a subgraph which itself has to be recognized. The
>> important point here is that the subgraph as a whole can represent a
>> /single/ node in a much more complex graph - its constituent
>> characters need not be repeated in the complex graph. More on this
>> below.
>>
>> A complex DFA that combines many different regex may present other
>> opportunities to recognize given (possibly arbitrary) sets of
>> characters - opportunities that may not be apparent from looking at the
>> constituent regex.
>>
>>> Would it recognize a common sequence of state transitions
>>> that occurs in two different places in the grammar? E.g.,
>>> specifying the syntax for a numeric quantity in two different
>>> places only to realize that it's actually the same part of
>>> the grammar expressed as two different instances?
>>
>> When given the option to find equivalence classes, flex can identify
>> sets of characters that are used repeatedly. Those characters are
>> gathered into an "equivalence" that then can be a node in the DFA
>> instead of redundantly repeating individual characters.
>
>OK, but that just replaces N *identical*, "parallel" transitions
>with a single one labeled with a class instead of a discrete value.
>Just like "0+1+2+3+4+5+6+7+8+9". It hasn't reduced the number of states.

It will reduce the number of states IF the class is reused with the
same action. Consider

integer: [+-]?[:digit:]+
float: (((integer)\.)|((integer)?\.(integer)))(\e(integer))?

The [:digit:] class is reused 5 times. The definition of "integer"
also forms a (5x) reused meta-class that flex could recognize if told
to look for them.

Since, in this example, the [:digit:] class is always used with the
same action, it will be implemented in the DFA state graph just once.
Since the class itself consists of ~13 states: START that waits for
input, 0..9 that accept input, common EXIT, and common ERROR out if
something else is input ... treating it AS a class saves 52 states (13
x 4) in the state graph.

The common exit and error states may be eliminated from the final
FA (assuming no conflicting uses of [:digit:]), but they will be
included in the initial construction of the state graph (think of them
like subroutine preamble/postamble).

>> Remember DFA are deterministic - a node can't take different actions
>> depending on which of multiple transitions entered (or left) it ... so
>> if you want the same character to be recognized in a different context
>> (leading to a different action), you must repeat it in a different
>> node.


>>> <number> ::= <digit><digits>
>>>
>>> <value> ::= <digit><digits>
>>>
>>> <expr> ::= <number> <op> <number> | <value> <op> <value>
>>>
>>> (silly example; but the inputs in each case are the same)
>>
>> You're mixing abstraction levels here: <digit>, <digits>, <number>,
>> and <value> are lexical tokens, whereas <expr> is syntax.
>
>I'm making the point that <number> and <value> are equivalent
>results. A machine that determines if an input satisfied
>one would similarly satisfy the other.
>
>So, the two machines would be duplicates of each other and
>subject to being optimized into one (because <expr> treats
><number> and <value> as equivalent in *its* machine.)


[article truncated]
Re: Text on FSM

<tvg986$rvkg$1@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=1318&group=comp.arch.embedded#1318

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Wed, 22 Mar 2023 18:15:43 -0700
Organization: A noiseless patient Spider
Lines: 371
Message-ID: <tvg986$rvkg$1@dont-email.me>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com>
<26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com>
<tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me>
<4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me>
<pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 23 Mar 2023 01:15:50 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4d0fd6444d1029c17bb039aa7b9b550e";
logging-data="917136"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+2SkCKzDxS3IHrvRbHTTS/"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:d5xsbQlKQKSai4AFJEdtMjQMdag=
In-Reply-To: <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com>
Content-Language: en-US
 by: Don Y - Thu, 23 Mar 2023 01:15 UTC

On 3/22/2023 1:37 PM, George Neuner wrote:
>>>>> Then there is flex.
>>>>>
>>>>> flex has some DFA optimizations available. First, flex can compress
>>>>> the data tables which encode its DFA states. Second, flex can
>>>>> discover "equivalence" classes: groups of characters which result in
>>>>> the same action. And finally, flex can [sometimes] discover
>>>>> "meta-equivalence" classes: commonly expected sequences of characters
>>>>> and/or other equivalence classes.
>>>>
>>>> But, those are mapping equivalent *inputs* together, not equivalent
>>>> *states*. I.e., treating "BEGIN" and "begin" as equivalent.
>>>
>>> No. Consider the case of just recognizing a decimal digit: compare
>>> the graph using the alternation: (0|1|2|3|4|5|6|7|8|9), vs the graph
>>> using the class [:digit:].
>>>
>>> Using the OR alternation, including start you have 11 nodes. Start has
>>> 10 transitions exiting, and each digit node has a single transition
>>> entering.
>>
>> Are you using "node" as a synonym for "state"?
>
> I am using graph terminology - mostly because all FA builders start by
> constructing a state /graph/. After the graph is complete, it may be
> implemented using tables, or Duff's device, etc. ... but it doesn't
> start that way. 8-)

I build FSMs similarly. But, you can't commit graphs to
ASCII text whereas tables are a natural consequence.

The difference seems largely to be that DFA are geared towards
expressing "languages" (sequences of symbols) whereas FSMs
are geared towards expressing sequences of events/actions.

E.g., you can build an expression to describe the legal
symbol sequences to create a particular type of token
(NUMBER, BEGIN, END, etc.) with the assumption that every
symbol *in* those tokens is handled by a single action
(accumulate_digit).

By contrast, an FSM will often have a variety of very different
symbols recognized in a given state and acted upon differently
(POWER_FAIL, POWER_RESTORED, LOW_BATTERY, CLEAR, ENTER, BARCODE_DETECTED,
etc.). These tend to have more "work" associated with their
recognition than a set of equivalent symbols (e.g., digits).

And, while each may be handled differently from the others,
they tend to be handled the same way when encountered in
different states. I.e., a POWER_FAIL is processed the same
each place it is considered significant.

So, you want a way of expressing this common set of conditions/symbols/events
in a way that can be reused in many different states.

>> E.g.,
>>
>> State Powered_Up
>> On '1' Goto Entry Executing Accept_Digit()
>> On '2' Goto Entry Executing Accept_Digit()
>> On '3' Goto Entry Executing Accept_Digit()
>> On '4' Goto Entry Executing Accept_Digit()
>> On '5' Goto Entry Executing Accept_Digit()
>> On '6' Goto Entry Executing Accept_Digit()
>> On '7' Goto Entry Executing Accept_Digit()
>> On '8' Goto Entry Executing Accept_Digit()
>> On '9' Goto Entry Executing Accept_Digit()
Check Exceptions
>> ..
>>
>> State Operation_Complete
>> On '1' Goto Entry Executing Accept_Digit()
>> On '2' Goto Entry Executing Accept_Digit()
>> On '3' Goto Entry Executing Accept_Digit()
>> On '4' Goto Entry Executing Accept_Digit()
>> On '5' Goto Entry Executing Accept_Digit()
>> On '6' Goto Entry Executing Accept_Digit()
>> On '7' Goto Entry Executing Accept_Digit()
>> On '8' Goto Entry Executing Accept_Digit()
>> On '9' Goto Entry Executing Accept_Digit()
Check Exceptions
>> ..

State Exceptions (this is a misnomer but used for syntactic simplicity)
On Barcode_Detected Goto HandleBarcode Executing OverwriteAccumulator()
On Power_Fail Goto SignalOutage Executing AlertOperator()
On Low_Battery Goto SafeguardData Executing OrderlyShutdown()

>> In each of these, I would have a single transition exiting the
>> state labeled "0+1+2+3+4+5+6+7+8+9" invoking Accept_Digit()
>> and destined for "Entry"
>>
>> Operation_Complete (i.e., the *end* of some undisclosed sequence
>> of actions, preparing to restart on another iteration) is equivalent
>> to Powered_Up -- presumably the state just before that sequence of
>> actions is first started (assuming "..." are equivalent)
>>
>> I can detect this with an implication table. Even if a series of
>> states are involved (step1, step2, step3) An algorithm can
>> similarly detect it. And, remove one of these two states (or
>> series of states) from the machine (for example, "Operation_Complete"),
>> replacing all references to it in the machine with "Powered_Up".
>
> Absolutely it /could/ be done with table based algorithms, but I've
> never seen it done that way in any real (non toy) implementation ...
> graph based solutions scale better to large problems.
>
>>> Using the digit class, you have 2 nodes, with 10 transitions that all
>>> get you from start to the digit class node.
>>
>> I don't see where the extra nodes (states) come from.
>
> Think about the "class" as a state in an /NDFA/ ... there are multiple
> transitions that can get you in - one transition (action) that gets
> you out.
>
> When the "class" is implemented it becomes (for lack of better term) a
> subroutine (subgraph) in the resulting DFA. The more the subgraph can
> be reused, the more savings in the DFA.
>
>>> Obviously this is simplistic, because the members of the character
>>> class form a subgraph which itself has to be recognized. The
>>> important point here is that the subgraph as a whole can represent a
>>> /single/ node in a much more complex graph - its constituent
>>> characters need not be repeated in the complex graph. More on this
>>> below.
>>>
>>> A complex DFA that combines many different regex may present other
>>> opportunities to recognize given (possibly arbitrary) sets of
>>> characters - opportunities that may not be apparent from looking at the
>>> constituent regex.
>>>
>>>> Would it recognize a common sequence of state transitions
>>>> that occurs in two different places in the grammar? E.g.,
>>>> specifying the syntax for a numeric quantity in two different
>>>> places only to realize that it's actually the same part of
>>>> the grammar expressed as two different instances?
>>>
>>> When given the option to find equivalence classes, flex can identify
>>> sets of characters that are used repeatedly. Those characters are
>>> gathered into an "equivalence" that then can be a node in the DFA
>>> instead of redundantly repeating individual characters.
>>
>> OK, but that just replaces N *identical*, "parallel" transitions
>> with a single one labeled with a class instead of a discrete value.
>> Just like "0+1+2+3+4+5+6+7+8+9". It hasn't reduced the number of states.
>
> It will reduce the number of states IF the class is reused with the
> same action. Consider
>
> integer: [+-]?[:digit:]+
> float: (((integer)\.)|((integer)?\.(integer)))(\e(integer))?
>
> The [:digit:] class is reused 5 times. The definition of "integer"
> also forms a (5x) reused meta-class that flex could recognize if told
> to look for them.

OK. It's a special case of alternation *and* equivalence.
I build "state subroutines" to handle sets of symbols that
are handled the same way but "invoked" from different states
(see Exceptions). But, the individual symbols can invoke
different actions *and* different next states -- as long as
they are consistent in each "application".
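
A minimal C sketch of that arrangement, reusing the names from the
Exceptions table above (the handlers and everything else are invented
stubs): each state handler defers to the shared routine first, and the
shared routine supplies both the action and the next state.

enum state { POWERED_UP, OPERATION_COMPLETE, HANDLE_BARCODE,
             SIGNAL_OUTAGE, SAFEGUARD_DATA, ENTRY, NO_TRANSITION };
enum event { EV_DIGIT, EV_BARCODE, EV_POWER_FAIL, EV_LOW_BATTERY };

static void overwrite_accumulator(void) {}   /* invented stubs */
static void alert_operator(void)        {}
static void orderly_shutdown(void)      {}
static void accept_digit(void)          {}

/* the shared "state subroutine": same everywhere it is "invoked" */
static enum state check_exceptions(enum event e)
{
    switch (e) {
    case EV_BARCODE:     overwrite_accumulator(); return HANDLE_BARCODE;
    case EV_POWER_FAIL:  alert_operator();        return SIGNAL_OUTAGE;
    case EV_LOW_BATTERY: orderly_shutdown();      return SAFEGUARD_DATA;
    default:             return NO_TRANSITION;    /* caller handles it */
    }
}

static enum state powered_up(enum event e)
{
    enum state s = check_exceptions(e);
    if (s != NO_TRANSITION) return s;
    if (e == EV_DIGIT) { accept_digit(); return ENTRY; }
    return POWERED_UP;                   /* ignore anything else */
}
/* operation_complete() etc. call check_exceptions() the same way */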

> Since, in this example, the [:digit:] class is always used with the
> same action, it will be implemented in the DFA state graph just once.
> Since the class itself consists of ~13 states: START that waits for
> input, 0..9 that accept input, common EXIT, and common ERROR out if
> something else is input ... treating it AS a class saves 52 states (13
> x 4) in the state graph.
>
> The common exit and error states may be eliminated from the final
> FA (assuming no conflicting uses of [:digit:]), but they will be
> included in the initial construction of the state graph (think of them
> like subroutine preamble/postamble).
>
>>> Remember DFA are deterministic - a node can't take different actions
>>> depending on which of multiple transitions entered (or left) it ... so
>>> if you want the same character to be recognized in a different context
>>> (leading to a different action), you must repeat it in a different
>>> node.
>
>>>> <number> ::= <digit><digits>
>>>>
>>>> <value> ::= <digit><digits>
>>>>
>>>> <expr> ::= <number> <op> <number> | <value> <op> <value>
>>>>
>>>> (silly example; but the inputs in each case are the same)
>>>
>>> You're mixing abstraction levels here: <digit>, <digits>, <number>,
>>> and <value> are lexical tokens, whereas <expr> is syntax.
>>
>> I'm making the point that <number> and <value> are equivalent
>> results. A machine that determines if an input satisfied
>> one would similarly satisfy the other.
>>
>> So, the two machines would be duplicates of each other and
>> subject to being optimized into one (because <expr> treats
>> <number> and <value> as equivalent in *its* machine.)
>
> Yes they are duplicates at some level of abstraction, but that level
> is above what the tool can recognize and deal with.


[article truncated]
Re: Text on FSM

<eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1352&group=comp.arch.embedded#1352

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Sun, 26 Mar 2023 00:45:09 -0400
Organization: A noiseless patient Spider
Lines: 127
Message-ID: <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com> <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="04ffdeaa7bda21c9acf07165188262ba";
logging-data="2701812"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/O4HPoOqXkoj3ZWTRF35mfH7Lp+5g7y4E="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:8McDoOPAvyYm3B2LE02NkjZec60=
 by: George Neuner - Sun, 26 Mar 2023 04:45 UTC

On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
>On 3/22/2023 1:37 PM, George Neuner wrote:

>I build FSMs similarly. But, you can't commit graphs to
>ASCII text whereas tables are a natural consequence.

Actually you can serialize graphs as text, but the result may be
more or less difficult for a human to read.

Trivial example: fire up Racket and enter

#lang racket
(require racket/serialize)

(let ([outer (mcons 0 (mcons 1 (mcons 2 (mcons 3 '()))))]
      [inner (mcons 9 (mcons 8 (mcons 7 '())))])
  (writeln outer)
  (writeln inner)
  ;; tie each list back on itself, then graft inner into outer,
  ;; giving a graph with 2 cycles
  (set-mcdr! (mcdr (mcdr (mcdr outer))) outer)
  (set-mcdr! (mcdr (mcdr inner)) inner)
  (set-mcar! (mcdr outer) inner)
  (writeln outer)                            ; printed with #n= cycle labels
  (writeln (serialize outer))                ; text form, safe to save
  (writeln (deserialize (serialize outer)))) ; round-trips the cycles

This creates a simple graph having 2 cycles and prints it. See what
happens. The result of (serialize _) is a string that can be saved to
a file. You can substitute structs for mcons (mutable cons) if you
want to experiment with making more complicated graphs.

Tables are great as implementation ... but for a construction tool
that needs to store and modify graphs, it /can/ be a pain to
reconstruct complicated graphs from table representations. It's a
hell of a lot easier to read back a suitably serialized form.

>The difference seems largely to be that DFA are geared towards
>expressing "languages" (sequences of symbols) whereas FSMs
>are geared towards expressing sequences of events/actions.

The terms "FA" (finite automaton) and "FSM" (finite state machine)
are, in fact, synonymous.

What is confusing is that we got to this point through discussion of
parsing and lexing tools - which ARE geared toward languages.
Moreover, yacc and bison do NOT implement a general FA, but rather a
particular variety of FA that is useful for language parsing and which
involves an auxiliary stack.

Purely as a technical matter, (f)lex can create general FA assuming
that transition conditions can be represented as character input to
the reader. The "reader" function is completely redefinable: the
default is to read from STDIN, but, in fact, a custom reader can do
absolutely anything under the hood so long as it returns a character
(or EOF) when called.
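
Sketch of such a custom reader, using flex's documented YY_INPUT hook
(this goes in the definitions section of the .l file; next_event_char()
is a hypothetical queue-draining function of our own that returns -1
when empty):

extern int next_event_char(void);

#define YY_INPUT(buf, result, max_size)          \
    do {                                         \
        int c = next_event_char();               \
        if (c < 0) result = YY_NULL;             \
        else { (buf)[0] = (char)c; result = 1; } \
    } while (0)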

In practice you would not want to do this. A decent UML tool would be
a much better choice.

>By contrast, an FSM will often have a variety of very different
>symbols recognized in a given state and acted upon differently
>(POWER_FAIL, POWER_RESTORED, LOW_BATTERY, CLEAR, ENTER, BARCODE_DETECTED,
>etc.). These tend to have more "work" associated with their
>recognition than a set of equivalent symbols (e.g., digits).
>
>And, while each may be handled differently from the others,
>they tend to be handled the same way when encountered in
>different states. I.e., a POWER_FAIL is processed the same
>each place it is considered significant.

Yes. And you could (at least theoretically) represent this in flex by
encoding POWER_FAIL, etc. as characters or strings and sending those
characters or strings as input to the reader when those events occur.
Internal state transitions can be handled the same way: send
characters to the reader.

Again, this is an abuse of the tool. Just because you can do it does
not mean you should do it.

>I build "state subroutines" to handle sets of symbols that
>are handled the same way but "invoked" from different states
>(see Exceptions). But, the individual symbols can invoke
>different actions *and* different next states -- as long as
>they are consistent in each "application".

flex (not lex) permits defining contextual "start" states, which the
code can arbitrarily switch among. The same input can be treated
differently in different start states. These really are coroutines -
not subroutines - and the user code decides which state to switch to
next, but flex does provide a stack so you can use them as
subroutines (without having to track the nesting yourself).
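
A hand-rolled analogue of that stack, for flavor (all names invented):
a return stack lets two states "call" the same subgraph and resume
where they left off, much like yy_push_state()/yy_pop_state() in flex.

#include <stdio.h>

enum state { MAIN_A, MAIN_B, SUB_WORK, DONE };

static enum state ret_stack[8];
static int sp = 0;

static void call_sub(enum state *s, enum state resume)
{
    ret_stack[sp++] = resume;        /* like yy_push_state() */
    *s = SUB_WORK;
}

int main(void)
{
    enum state s = MAIN_A;
    while (s != DONE) {
        switch (s) {
        case MAIN_A: puts("A: calling sub"); call_sub(&s, MAIN_B); break;
        case MAIN_B: puts("B: calling sub"); call_sub(&s, DONE);   break;
        case SUB_WORK:
            puts("  shared subgraph runs here");
            s = ret_stack[--sp];     /* like yy_pop_state() */
            break;
        default: break;
        }
    }
    return 0;
}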

>> If instead you did something like:
>>
>> integer: [:digit:] return 'i'
>> hex: [:digit:]|['a'-'f'] return 'h';
>>
>> This would blow up in your face because 0..9 would never be recognized
>> as a hex digit, but more importantly the 2 uses of the class lead
>> /immediately/ to different actions so the class subroutine (subgraph)
>> would have to be repeated in the FA with different exit actions.
>
>Yes. If the tool places an implicit priority on the rules
>based on the order in which they are encountered. I intentionally
>don't specify this in the design of the tables, leaving the
>"post processor" some latitude in how it implements them
>and the runtime some potential efficiencies.

The tool places priority on the longest, most precise match. It falls
back on definition order when the input - as given - matches multiple
patterns.

But again, start states can (sometimes) be used to get around this
behavior.

George

Re: Text on FSM

<tvp75g$2llm5$2@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=1353&group=comp.arch.embedded#1353

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Sun, 26 Mar 2023 03:35:27 -0700
Organization: A noiseless patient Spider
Lines: 149
Message-ID: <tvp75g$2llm5$2@dont-email.me>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com>
<26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com>
<tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me>
<4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me>
<pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me>
<eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 26 Mar 2023 10:35:28 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="a8d59998ec1e6f700925da750fe27eba";
logging-data="2807493"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kQUSqJbjba6r9+J0KplMJ"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:Ni7audtREz0AUPTei/s+iJJa2e8=
Content-Language: en-US
In-Reply-To: <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
 by: Don Y - Sun, 26 Mar 2023 10:35 UTC

On 3/25/2023 9:45 PM, George Neuner wrote:
>> The difference seems largely to be that DFA are geared towards
>> expressing "languages" (sequences of symbols) whereas FSMs
>> are geared towards expressing sequences of events/actions.
>
> The terms "FA" (finite automaton) and "FSM" (finite state machine)
> are, in fact, synonymous.

Yes, though (IME) taught to different audiences and in different ways.

My hardware classes talked about FSMs, Mealy/Moore, "state diagrams"
and optimization techniques.

My software classes talked about DFAs, EBNFs, *railroad* diagrams
but never a mention of optimization tools or techniques.

They also seem to be applied differently. E.g., in a (hardware) FSM,
it is not uncommon to list a logical expression as the stimulus
for a transition (e.g., "LineFeed & /Last_Line" vs. "LineFeed & LastLine"
directing the machine to two different states with two different outputs
or actions). In DFAs, it was always just sequences of symbols -- the
sorts of things that would specify a grammar (inherently serial, one-at-a-time
"conditions").

> What is confusing is that we got to this point through discussion of
> parsing and lexing tools - which ARE geared toward languages.

From:
"AFAICT, none of these tools knows how to optimize states by
noting equivalences (?) [George would know for sure]
OTOH, when dealing with hardware machines, it's a common step
to reduce via implication tables, etc."

I.e., one could express an FSM as a yacc grammar -- write an "action
routine" (admittedly, probably very terse due to the format of a yacc
file) for each "symbol" as if symbols were events/input conditions.
So, conceivably, the lexer could generate LF_Not_LastLine and
LF_LastLine symbols that the yacc parser could act on (with suitable
actions assigned on the "right hand side")

Given this duality, the pertinence of my above comment is evident:
could yacc (et al.) identify the same sorts of equivalent states
that I would identify and eliminate with implication table analysis
if it was an FSM?

> Moreover, yacc and bison do NOT implement a general FA, but rather a
> particular variety of FA that is useful for language parsing and which
> involves an auxiliary stack.
>
> Purely as a technical matter, (f)lex can create general FA assuming
> that transition conditions can be represented as character input to
> the reader. The "reader" function is completely redefinable: the
> default is to read from STDIN, but, in fact, a custom reader can do
> absolutely anything under the hood so long as it returns a character
> (or EOF) when called.

Therein lies a notable limitation. In a (hardware) FSM, there are no limits
to the number of inputs that can CONCURRENTLY be examined by the machine.
E.g., I could label a transition with:
A*/B*/C*D*E*F*/G*H*I*J*/K*/L*/M + N*O*P*Q + /R*/S*/T*U*V + W + X*Y*Z
To represent this to lex/yacc, I would have to reduce it to a "narrow"
symbol -- possible if there are only a limited number of such combinations
in the grammar (as sourced by the lexer).
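
That reduction might look like this in C -- a sketch in which each
product term of the expression above becomes a care-mask plus the
required values on the cared-about bits, mapped to an invented symbol
the lexer could consume (the bit assignments are invented too):

#include <stdint.h>

struct term { uint32_t mask, value; int symbol; };

static const struct term terms[] = {
    { 0x00001FFF, 0x00000F78, 'W' },  /* the 13-variable first term */
    { 0x0001E000, 0x0001E000, 'X' },  /* the N*O*P*Q term           */
};

static int encode(uint32_t inputs)    /* sampled input vector in    */
{
    for (unsigned i = 0; i < sizeof terms / sizeof terms[0]; i++)
        if ((inputs & terms[i].mask) == terms[i].value)
            return terms[i].symbol;   /* narrow symbol out          */
    return 0;                         /* no term matched            */
}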

> In practice you would not want to do this. A decent UML tool would be
> a much better choice.
>
>> By contrast, an FSM will often have a variety of very different
>> symbols recognized in a given state and acted upon differently
>> (POWER_FAIL, POWER_RESTORED, LOW_BATTERY, CLEAR, ENTER, BARCODE_DETECTED,
>> etc.). These tend to have more "work" associated with their
>> recognition than a set of equivalent symbols (e.g., digits).
>>
>> And, while each may be handled differently from the others,
>> they tend to be handled the same way when encountered in
>> different states. I.e., a POWER_FAIL is processed the same
>> each place it is considered significant.
>
> Yes. And you could (at least theoretically) represent this in flex by
> encoding POWER_FAIL, etc. as characters or strings and sending those
> characters or strings as input to the reader when those events occur.
> Internal state transitions can be handled the same way: send
> characters to the reader.
>
> Again, this is an abuse of the tool. Just because you can do it does
> not mean you should do it.

It would have appeal *if* it could perform reductions/optimizations
that would otherwise have to be done by hand (or with another tool).

>> I build "state subroutines" to handle sets of symbols that
>> are handled the same way but "invoked" from different states
>> (see Exceptions). But, the individual symbols can invoke
>> different actions *and* different next states -- as long as
>> they are consistent in each "application".
>
> flex (not lex) permits defining contextual "start" states, which the
> code can arbitrarily switch among. The same input can be treated
> differently in different start states. These really are coroutines -
> not subroutines - and the user code decides which state to switch to
> next, but flex does provide a stack so you can use them as
> subroutines (without having to track the nesting yourself).
>
>>> If instead you did something like:
>>>
>>> integer: [:digit:] return 'i'
>>> hex: [:digit:]|['a'-'f'] return 'h';
>>>
>>> This would blow up in your face because 0..9 would never be recognized
>>> as a hex digit, but more importantly the 2 uses of the class lead
>>> /immediately/ to different actions so the class subroutine (subgraph)
>>> would have to be repeated in the FA with different exit actions.
>>
>> Yes. If the tool places an implicit priority on the rules
>> based on the order in which they are encountered. I intentionally
>> don't specify this in the design of the tables, leaving the
>> "post processor" some latitude in how it implements them
>> and the runtime some potential efficiencies.
>
> The tool places priority on the longest, most precise match. It falls
> back on definition order when the input - as given - matches multiple
> patterns.

In a (hardware) FSM, one would see all of the "possible exits" from a
particular state and could notice ambiguities:
X*Y*/Z
X*Y
clearly overlap.

Furthermore, one could detect these conflicts with a simple tool;
it need not understand the entire machine, just look at a single state
and the transitions leaving it.
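
Such a tool really is simple. A sketch in C, representing each guard
as a care-mask plus required values (as in the earlier encode()
sketch; bit assignments invented): two guards like X*Y*/Z and X*Y
overlap exactly when they agree on every bit both care about.

#include <stdbool.h>
#include <stdint.h>

struct guard { uint32_t mask, value; };

static bool overlaps(struct guard a, struct guard b)
{
    uint32_t both = a.mask & b.mask;        /* bits both care about */
    return (a.value & both) == (b.value & both);
}

/* X = bit 2, Y = bit 1, Z = bit 0 */
static const struct guard g1 = { 0x7, 0x6 };  /* X=1, Y=1, Z=0          */
static const struct guard g2 = { 0x6, 0x6 };  /* X=1, Y=1, Z don't-care */
/* overlaps(g1, g2) is true: every input matching g1 also matches g2 */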

> But again, start states can (sometimes) be used to get around this
> behavior.

What's interesting (being a hardware-software person) is that, despite
the obvious duality, the approaches taken to these technologies are so
disjointed. DFA tend to use a parser-generator of preference while FSMs
(in software) have a variety of different implementations with dramatic
design and runtime differences in efficiencies.

Similarly, hardware FSMs tend to be designed with total disregard
for the possible applicability of parser generators, regex compilers, etc.

It's as if each domain has its own notion of how the technology should
be applied and implemented.

Re: Text on FSM

<17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com>


https://www.novabbs.com/devel/article-flat.php?id=1368&group=comp.arch.embedded#1368

Subject: Re: Text on FSM
Newsgroups: comp.arch.embedded
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com> <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
From: no.s...@please.net (Clifford Heath)
Date: Mon, 27 Mar 2023 16:18:51 +1100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
Mime-Version: 1.0
In-Reply-To: <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Lines: 53
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!news.thecubenet.com!not-for-mail
Nntp-Posting-Date: Mon, 27 Mar 2023 05:18:54 +0000
X-Received-Bytes: 3877
Organization: theCubeNet - www.thecubenet.com
X-Complaints-To: abuse@thecubenet.com
Message-Id: <17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com>
 by: Clifford Heath - Mon, 27 Mar 2023 05:18 UTC

On 26/03/23 15:45, George Neuner wrote:
> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
> The terms "FA" (finite automaton) and "FSM" (finite state machine)
> are, in fact, synonymous.
>
> What is confusing is that we got to this point through discussion of
> parsing and lexing tools - which ARE geared toward languages.
> Moreover, yacc and bison do NOT implement a general FA, but rather a
> particular variety of FA that useful for language parsing and which
> involves an auxiliary stack.

The stack means it's not an FA. Yacc and bison exist for the sole purpose
of processing LALR(1) grammars that cannot be processed with an FA. Also
because the grammars are LALR, the stack is a bottom-up stack, so it
doesn't resemble anything you'll see in a top-down parser, and you'll
get parse errors that probably don't really tell you what is wrong with
the input :P. Lex/Flex on the other hand exist to process only finite
states. The FSM algorithms they use are more efficient than any
algorithm that can handle LALR(1), which is why they still exist as
independent tools.

Notably, the combination of yacc&lex (or flex&bison) still isn't
powerful enough even to parse C without extra help - goto labels blow
things up and there is a hand-coded hack in the C language lexers for it.

ANTLR also implements some version of an LR/LALR parser, but instead of
a finite 2-token lookahead, it transforms arbitrary lookahead
expressions into something finite (an FSM), and if it can't do that, it
fails. Terence Parr got his PhD for figuring out how to do that
transformation... and lived to tell the tale. :)

Anyone interested in the overlap between regular languages and finite
state machines should refer to the excellent
<https://github.com/katef/libfsm>. You can give it an assortment of
regular expressions and it will unify them and construct a DFA to
process them. The README at the top of that page has a simple example,
and there's a tutorial if you want to look further. This library is
perfectly at home processing arbitrary binary file formats and
protocols, not just programming language text files. But only the parts
that are reducible to an FA... Nevertheless, there is absolutely nothing
wrong with using this kind of library to write arbitrary FSMs.

I'm currently building a generalised parsing engine that also has the
capability of processing arbitrary binary file and network stream
formats, using a VM approach that interprets something very like a BNF,
but in prefix notation (+a means one-or-more "a"s, not a+). It's tiny,
efficient, embeddable, but can take a protocol description in a very few
bytes of VM code to handle almost any new protocol or format. I don't
think that has been done before, and I've wanted to do it for 25 years.

Clifford Heath.

Re: Text on FSM

<13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1369&group=comp.arch.embedded#1369

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Mon, 27 Mar 2023 02:32:46 -0400
Organization: A noiseless patient Spider
Lines: 172
Message-ID: <13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com> <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <tvp75g$2llm5$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="7b2d9af77c653824fde1502d9fb20f0c";
logging-data="3272636"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Ws01CZ78ILhNGZW410fVXzf+Xnldr/sY="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:CouXzr8aZ9UcmH1JfCzZ9Ier7MY=
 by: George Neuner - Mon, 27 Mar 2023 06:32 UTC

On Sun, 26 Mar 2023 03:35:27 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>On 3/25/2023 9:45 PM, George Neuner wrote:
>
>My hardware classes talked about FSMs, Mealy/Moore, "state diagrams"
>and optimization techniques.
>
>My software classes talked about DFAs, EBNFs, *railroad* diagrams
>but never a mention of optimization tools or techniques.

This I think is a teaching failure.

Before we go on here we have to clarify a possible terminology trap:
"deterministic" vs "non-deterministic".

In the context of FA, "deterministic" means that the machine can be
only in one state at any given time. "non-deterministic" means that
the machine (at least logically) can simultaneously be in a set of
multiple states.

To explain this better, I'm falling back on lexing because it is
simple minded. You will need to generalize the concepts to consider
other possible uses.

Ignoring the behavior of any real-world tools and just thinking about
an *ideal* recognizer, consider

integer: [:digit:]+
hex : [:digit:]+|[a-fA-F]+

Lacking further input, the sequence "1234" is ambiguous - the
recognizer doesn't know yet whether it has an integer value or a hex
value. Logically it must consider both patterns simultaneously, and
so logically the recognizer must be an NDFA.

For every NDFA there is a corresponding DFA which contains an equal or
greater number of states. Where the NDFA logically would be in a set
of states simultaneously, the corresponding DFA will contain not only
those explicit NDFA states but also additional states which represent
possible combinations of those states which the NDFA could find itself
in. The additional states are required because a DFA can be in only
one state at any given time, so it needs a way to (logically)
represent being in multiple states simultaneously. The additional
"set" states serve to disambiguate ambiguous state transitions ...
eventually the DFA must arrive in one of the explicit states of the
original NDFA.
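
A compact C sketch of that construction (the classic worklist
formulation; the 3-state, 2-input NDFA here is invented, and each DFA
"set state" is kept as a bitmask of NDFA states):

#include <stdint.h>
#include <stdio.h>

enum { NN = 3, NI = 2, MAXD = 1 << NN };

/* ndfa[s][i] = bitmask of possible successors of state s on input i */
static const uint32_t ndfa[NN][NI] = {
    { 0x3, 0x0 },    /* state 0, input 0: may go to 0 or 1 */
    { 0x0, 0x4 },    /* state 1, input 1: goes to 2        */
    { 0x4, 0x4 },    /* state 2: loops                     */
};

int main(void)
{
    uint32_t dstates[MAXD];
    int ndst = 0;
    dstates[ndst++] = 0x1;              /* start set = { state 0 } */

    for (int w = 0; w < ndst; w++) {    /* worklist of set-states  */
        for (int i = 0; i < NI; i++) {
            uint32_t to = 0;
            for (int s = 0; s < NN; s++)
                if (dstates[w] & (1u << s)) to |= ndfa[s][i];
            if (to == 0) continue;      /* no move on this input   */
            int k = 0;
            while (k < ndst && dstates[k] != to) k++;
            if (k == ndst) dstates[ndst++] = to;   /* new DFA state */
            printf("{%x} --%d--> {%x}\n", (unsigned)dstates[w], i,
                   (unsigned)to);
        }
    }
    printf("%d DFA states cover the NDFA\n", ndst);
    return 0;
}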

The typical notion of FSM as taught to hardware oriented students
corresponds to non-deterministic FA. Hardware can directly implement
an NDFA, but software can only *emulate* it - with all the caveats
implied by emulation.

Algorithms to transform graph based NDFA to DFA and back again have
been known at least since the 1950s, as have ways of generating table
driven vs switch based machines from a graph. But, typically, none of
this ever is taught to hardware oriented students (or even most
software oriented students) - if they learn anything at all, they
learn some practical methods to manually achieve the same results.

From the software viewpoint, you rarely, if ever, would try to design
a DFA directly. Instead you would design an NDFA that does what you
want, and then (for performance) you have it algorithmically
transformed into its corresponding DFA form. The transformation
[assuming it's done right ;-)] produces an optimal DFA state machine.

(f)lex is a tool that can - at least technically - create general
state machines. However, because it was designed for string
recognition, its machine description language is specialized for that
use.

yacc and bison don't even try to create general state machines - they
create a very specific type of FA which is optimized for parsing. And
again, because they were designed for parsing, their machine
description languages are specialized for that task.

UML tools are what you need to consider for more general FA / FSM.

>They also seem to be applied differently. E.g., in a (hardware) FSM,
>it is not uncommon to list a logical expression as the stimulus
>for a transition (e.g., "LineFeed & /Last_Line" vs. "LineFeed & LastLine"
>directing the machine to two different states with two different outputs
>or actions). In DFAs, it was always just sequences of symbols -- the
>sorts of things that would specify a grammar (inherently serial, one-at-a-time
>"conditions").

FSM *are* FA -- the two are just alternate terms for the same concept.

There is nothing whatsoever which limits one or the other to any
particular uses. Any apparent difference is an artifact of how they
are taught to students in different disciplines: hardware students
learn practice but rarely, if ever, learn the theory.

And, in truth, only CS students taking language / compiler courses
ever will learn how to build NDFA and DFA state graphs, convert one
graph form into the other, or how to generate table driven or switch
code from a state graph.

>> Purely as a technical matter, (f)lex can create general FA assuming
>> that transition conditions can be represented as character input to
>> the reader. The "reader" function is completely redefinable: the
>> default is to read from STDIN, but, in fact, a custom reader can do
>> absolutely anything under the hood so long as it returns a character
>> (or EOF) when called.
>
>Therein lies a notable limitation. In a (hardware) FSM, there are no limits
>to the number of inputs that can CONCURRENTLY be examined by the machine.
>E.g., I could label a transition with:
> A*/B*/C*D*E*F*/G*H*I*J*/K*/L*/M + N*O*P*Q + /R*/S*/T*U*V + W + X*Y*Z
>To represent this to lex/yacc, I would have to reduce it to a "narrow"
>symbol -- possible if there are only a limited number of such combinations
>in the grammar (as sourced by the lexer).

You could just use the string above to represent the condition.

But this is where (f)lex falls down hard: you would have to define
strings that represent all possible combinations of your simultaneous
conditions, and to drive the resulting DFA the code that monitors your
hardware must be able to send those condition strings into the
recognizer.

If you can do that, (f)lex will happily generate a working state
machine for you.

>> In practice you would not want to do this. A decent UML tool would be
>> a much better choice.

>In a (hardware) FSM, one would see all of the "possible exits" from a
>particular state and could notice ambiguities:
> X*Y*/Z
> X*Y
>clearly overlap.
>
>Furthermore, one could detect these conflicts with a simple tool;
>it need not understand the entire machine, just look at a single state
>and the transitions leaving it.

That's why you need a tool designed for the purpose. All of our
discussion here about what is possible with (f)lex is academic ...
nobody in their right mind should be doing it.

>What's interesting (being a hardware-software person) is that, despite
>the obvious duality, the approaches taken to these technologies are so
>disjointed. DFA tend to use a parser-generator of preference while FSMs
>(in software) have a variety of different implementations with dramatic
>design and runtime differences in efficiencies.
>
>Similarly, hardware FSMs tend to be designed with total disregard
>for the possible applicability of parser generators, regex compilers, etc.
>
>It's as if each domain has its own notion of how the technology should
>be applied and implemented.

Unfortunately yes. I think very few people ever think about it enough
to recognize that.

Re: Text on FSM

<tvrjcj$34re6$3@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=1370&group=comp.arch.embedded#1370

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Mon, 27 Mar 2023 01:16:17 -0700
Organization: A noiseless patient Spider
Lines: 294
Message-ID: <tvrjcj$34re6$3@dont-email.me>
References: <ba8e4635-0eb1-4a92-99a7-e6bf7ba67e14n@googlegroups.com>
<26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com>
<tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me>
<4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me>
<pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me>
<eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <tvp75g$2llm5$2@dont-email.me>
<13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 27 Mar 2023 08:16:20 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f954b856733522b78ad9fe38e0b28162";
logging-data="3304902"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/JjGD0ojHqRyd/Gbg0gnKE"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:eMlMQM2az9zyqFGmppaudg9ku08=
In-Reply-To: <13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com>
Content-Language: en-US
 by: Don Y - Mon, 27 Mar 2023 08:16 UTC

On 3/26/2023 11:32 PM, George Neuner wrote:
> On Sun, 26 Mar 2023 03:35:27 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> On 3/25/2023 9:45 PM, George Neuner wrote:
>>
>> My hardware classes talked about FSMs, Mealy/Moore, "state diagrams"
>> and optimization techniques.
>>
>> My software classes talked about DFAs, EBNFs, *railroad* diagrams
>> but never a mention of optimization tools or techniques.
>
> This I think is a teaching failure.

I agree, and disagree.

Where do you draw the line between disciplines? With hardware,
you're exposed to lots of tools for logic reduction, etc. You
learn about hazards, races, etc. These aren't mentioned in
software contexts -- but still exist (perhaps even more so,
as software *isn't* a set of concurrent actions; possibly
why so many software people have difficulty "thinking in
parallel"?).

The curriculum I was exposed to was a mixture of hardware courses
and software courses. People following a "pure EE" degree
took the same hardware courses as I *and* some of the "core"
software courses that I did. OTOH, folks pursuing the CS
option skipped some of the more "advanced" hardware courses
and, instead, took the more advanced *software* courses -- which
were devoid of any "pure EE" students.

Should these overlapping courses have been taught with more
of a multi-discipline emphasis? Should the instructors have
conditioned each presentation with "for you CS students..."
and "for you EE students..."?

Or, should they have expected the students to be aware enough to
recognize these dualities??

> Before we go on here we have to clarify a possible terminology trap:
> "deterministic" vs "non-deterministic".
>
> In the context of FA, "deterministic" means that the machine can be
> only in one state at any given time. "non-deterministic" means that
> the machine (at least logically) can simultaneously be in a set of
> multiple states.

Yes ("logically" -- virtually? -- being the key concept there)

> To explain this better, I'm falling back on lexing because it is
> simple minded. You will need to generalize the concepts to consider
> other possible uses.
>
> Ignoring the behavior of any real-world tools and just thinking about
> an *ideal* recognizer, consider
>
> integer: [:digit:]+
> hex : [:digit:]+|[a-fA-F]+
>
> Lacking further input, the sequence "1234" is ambiguous - the
> recognizer doesn't know yet whether it has an integer value or a hex
> value. Logically it must consider both patterns simultaneously, and
> so logically the recognizer must be an NDFA.
>
> For every NDFA there is a corresponding DFA which contains an equal or
> greater number of states. Where the NDFA logically would be in a set
> of states simultaneously, the corresponding DFA will contain not only
> those explicit NDFA states but also additional states which represent
> possible combinations of those states which the NDFA could find itself
> in. The additional states are required because a DFA can be in only
> one state at any given time, so it needs a way to (logically)
> represent being in multiple states simultaneously. The additional
> "set" states serve to disambiguate ambiguous state transitions ...
> eventually the DFA must arrive in one of the explicit states of the
> original NDFA.
>
> The typical notion of FSM as taught to hardware oriented students
> corresponds to non-deterministic FA. Hardware can directly implement
> an NDFA, but software can only *emulate* it - with all the caveats
> implied by emulation.

Here being where the "virtually" comes into play.

The hardware machine *is* only in a single ACTUAL state at any given
time (because there is just one set of state variables and that tuple
defines THE state). Until an [a..f] is encountered, it is content
being in that single state.

However, once one is encountered, it has to recognize that it
actually is in one *particular* variant of that state (assuming
that "hex" and "integer" can have different contexts, elsewhere
in the grammar)

> Algorithms to transform graph based NDFA to DFA and back again have
> been known at least since the 1950s, as have ways of generating table
> driven vs switch based machines from a graph. But, typically, none of
> this ever is taught to hardware oriented students (or even most
> software oriented students) - if they learn anything at all, they
> learn some practical methods to manually achieve the same results.
>
> From the software viewpoint, you rarely, if ever, would try to design
> a DFA directly. Instead you would design an NDFA that does what you
> want, and then (for performance) you have it algorithmically
> transformed into its corresponding DFA form. The transformation
> [assuming it's done right ;-)] produces an optimal DFA state machine.
>
> (f)lex is a tool that can - at least technically - create general
> state machines. However, because it was designed for string
> recognition, its machine description language is specialized for that
> use.

Exactly.

> yacc and bison don't even try to create general state machines - they
> create a very specific type of FA which is optimized for parsing. And
> again, because they were designed for parsing, their machine
> description languages are specialized for that task.
>
> UML tools are what you need to consider for more general FA / FSM.

Which brings us full circle to the top of the thread.
I contend that to be expressive enough (i.e., to act AS
equivalents for) to generate code, such a notation would
be just as complex as writing that code.

And, given that one *must* write code -- but needn't always
reduce a design to an FSM -- you end up developing a second tool
that the developer is reliant upon but with less "practice"
than that of writing code.

>> They also seem to be applied differently. E.g., in a (hardware) FSM,
>> it is not uncommon to list a logical expression as the stimulus
>> for a transition (e.g., "LineFeed & /Last_Line" vs. "LineFeed & LastLine"
>> directing the machine to two different states with two different outputs
>> or actions). In DFAs, it was always just sequences of symbols -- the
>> sorts of things that would specify a grammar (inherently serial, one-at-a-time
>> "conditions").
>
> FSM *are* FA are just alternate terms for the same concept.
>
> There is nothing whatsoever which limits one or the other to any
> particular uses. Any apparent difference is an artifact of how they
> are taught to students in different disciplines: hardware students
> learn practice but rarely, if ever, learn the theory.

In hardware designs, you can directly see the costs of an
implementation: how many FFs to represent the state,
how much combinatorial logic to determine next_state and
outputs, etc. So, optimization (can) results in a savings
of circuitry. And, can speed up the machine by eliminating
a serial "step".

[E.g., in the state diagram I posted, one could add a
FINISHED state at the "bottom" of each sequence of states
in the different paths through the graph. But, this would
just add an extra clock cycle before the machine could
return to the *top* of the graph and start the next sequence]

And, because hardware folks *do* think in parallel, they
see solutions in a different way than serial-thinking
software types.

E.g., if you wanted to know how much data was in a FIFO,
you'd subtract tail_pointer from head_pointer (neglecting
wrap) to get a result.

But, this takes an operation invoked AFTER head and tail
are stable.

A hardware person would move that computation in parallel
with the head and tail update actions. E.g., at reset,
head=tail, amount=0. Each time head is advanced, increase
amount SIMULTANEOUSLY. Each time tail is advanced, decrease
amount simultaneously. When both head and tail advance
at the same time, amount remains unchanged.

Because this "difference" was moved up in time, the
circuit can run faster -- you don't have to wait for the
values of the head and tail pointers to "settle" and
THEN propagate through the differencing logic; that was
already done WHILE the pointers were being updated!

The software person could adopt a similar strategy,
but it doesn't *save* anything because the "amount"
still has to be processed in a separate instruction
cycle -- either increment/decrement/do-nothing *or*
compute head-tail.
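
For comparison, the software rendition of carrying the count along
with the pointer updates might look like this (a hypothetical byte
FIFO, not from any actual design discussed here):

    /* Sketch: the FIFO carries its occupancy along with the pointer
       updates instead of deriving head - tail afterward. */
    #include <stdint.h>

    #define N 64

    struct fifo {
        uint8_t  buf[N];
        unsigned head, tail, count;     /* count maintained in step */
    };

    static int fifo_put(struct fifo *f, uint8_t b)
    {
        if (f->count == N) return -1;       /* full */
        f->buf[f->head] = b;
        f->head = (f->head + 1) % N;
        f->count++;                         /* "simultaneous" update */
        return 0;
    }

    static int fifo_get(struct fifo *f, uint8_t *b)
    {
        if (f->count == 0) return -1;       /* empty */
        *b = f->buf[f->tail];
        f->tail = (f->tail + 1) % N;
        f->count--;
        return 0;
    }

As noted above, in software the count update still costs an
instruction; the only win is that nobody downstream has to reason
about pointer wrap to learn the occupancy.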

> And, in truth, only CS students taking language / compiler courses
> ever will learn how to build NDFA and DFA state graphs, convert one
> graph form into the other, or how to generate table driven or switch
> code from a state graph.


Re: Text on FSM

<0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1413&group=comp.arch.embedded#1413

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Tue, 28 Mar 2023 11:17:37 -0400
Organization: A noiseless patient Spider
Lines: 112
Message-ID: <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>
References: <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="21a3c5eb292cb3a29d2cbada3b76d6c7";
logging-data="3998736"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Yq15OqrxuOQ8R+QIJYrc0BFTg6K0PiNU="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:nudWH6lz50tJGIgR2PhlKsRTGVg=
 by: George Neuner - Tue, 28 Mar 2023 15:17 UTC

On Mon, 27 Mar 2023 16:18:51 +1100, Clifford Heath
<no.spam@please.net> wrote:

>On 26/03/23 15:45, George Neuner wrote:
>> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
>> <blockedofcourse@foo.invalid> wrote:
>> The terms "FA" (finite automaton) and "FSM" (finite state machine)
>> are, in fact, synonymous.
>>
>> What is confusing is that we got to this point through discussion of
>> parsing and lexing tools - which ARE geared toward languages.
>> Moreover, yacc and bison do NOT implement a general FA, but rather a
>> particular variety of FA that is useful for language parsing and which
>> involves an auxiliary stack.
>
>
>The stack means it's not a FA.

No, it still is an FA ... it just is a specialized form.

>Yacc and bison exist for the sole purpose
>of processing LALR2 grammars that cannot be processed with an FA.

You mean LALR(1) in the case of yacc, and LR(1) in the case of bison.
LALR is a restricted case of LR, it does /not/ mean LR with lookahead.
Neither tool can handle 2 tokens of lookahead.

Stackless FA, in fact, can process LR(1) grammars ... they just need
(typically many) more states in the machine to do so. The stack FA
was created specifically to reduce the memory footprint of the parser
- necessary in ~1970, but generally much less of a concern now.

>Also
>because the grammars are LALR, the stack is a bottom-up stack, so it
>doesn't resemble anything you'll see in a top-down parser, ...

True.

> ... and you'll
>get parse errors that probably don't really tell you what is wrong with
>the input :P.

You can't rely on the tool for error handling (or even just messages)
... you really need to add deliberate error handling.

>Lex/Flex on the other hand exists to process only finite
>states. The FSM algorithms they use are more efficient than any
>algorithm that can handle LALR2, which is why these tools still exist as
>independent tools.

They exist separately because they were intended for different tasks
AND because they needed a considerable (at the time) amount of memory
to analyze the input spec and generate a recognizer for it.

In fact, regex tools existed already for a number of years before
either lex or yacc came about. The difference was most previous tools
directly /interpreted/ regex patterns, and typically tried them one at
a time under host program control (e.g., see RE2C), whereas lex
compiled multiple patterns into a single recognizer that (effectively)
tried all patterns simultaneously. This made lex recognizers much
faster (though larger) and far more efficient for handling large
numbers of patterns [such as in a language compiler].

>Notably, the combination of yacc&lex (or flex&bison) still isn't
>powerful enough even to parse C without extra help - goto labels blow
>thing up and there is a hand-coded hack in the C language lexers for it.
>
>ANTLR also implements some version of an LR/LALR parser ...

ANTLR implements LL(*) which is LL with unbounded lookahead. There
are other LL(k) tools which require the programmer to choose a fixed
amount of lookahead (and fail to process the grammar if the k value is
too small). ANTLR analyzes the grammar and computes what lookahead is
required pattern by pattern.

> ... but instead of
>a finite 2 tokens lookahead, it transforms arbitrary lookahead
>expressions into something finite (an FSM), and if it can't do that, it
>fails. Terence Parr got his PhD for figuring out how to do that
>transformation... and lived to tell the tale. :)

Almost: substitute variable k for 2. And again, ANTLR is LL.

>Anyone interested in the overlap between regular languages and finite
>state machines should refer to the excellent
><https://github.com/katef/libfsm>. You can give it an assortment of
>regular expressions and it will unify them and construct a DFA to
>process them. The README at the top of that page has a simple example,
>and there's a tutorial if you want to look further. This library is
>perfectly at home processing arbitrary binary file formats and
>protocols, not just programming language text files. But only the parts
>that are reducible to a FA... Nevertheless there is absolutely nothing
>wrong with using this kind of library to write arbitrary FSMs.
>
>I'm currently building a generalised parsing engine that also has the
>capability of processing arbitrary binary file and network stream
>formats, using a VM approach that interprets something very like a BNF,
>but in prefix notation (+a means one-or-more "a"s, not a+). It's tiny,
>efficient, embeddable, but can take a protocol description in a very few
>bytes of VM code to handle almost any new protocol or format. I don't
>think that has been done before, and I've wanted to do it for 25 years.

I would be interested to see that (when it's finished, of course).
Good luck!

>Clifford Heath.

Re: Text on FSM

<vmb62itqu227alnrpth40ic4davtgqiehd@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1415&group=comp.arch.embedded#1415

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Tue, 28 Mar 2023 15:25:44 -0400
Organization: A noiseless patient Spider
Lines: 176
Message-ID: <vmb62itqu227alnrpth40ic4davtgqiehd@4ax.com>
References: <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <tvp75g$2llm5$2@dont-email.me> <13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com> <tvrjcj$34re6$3@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="21a3c5eb292cb3a29d2cbada3b76d6c7";
logging-data="4079128"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YzhUsFL04QdApsMs10sXUNw5tEP+pY3I="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:t+9SYf82fAUWNNO+FoWML6/pIs0=
 by: George Neuner - Tue, 28 Mar 2023 19:25 UTC

On Mon, 27 Mar 2023 01:16:17 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
>On 3/26/2023 11:32 PM, George Neuner wrote:

>The hardware machine *is* only in a single ACTUAL state at any given
>time (because there is just one set of state variables and that tuple
>defines THE state). Until an [a..f] is encountered, it is content
>being in that single state.

The hardware is in a single state, but that state simultaneously
reflects the current value (for lack of a better term) of potentially
many variables / signals.

The same is true of software DFA. The major difference wrt hardware
is that a transition between states can be based on only 1 logical
condition. That one logical condition may, in fact, be a
conglomeration of any number of things, but so far as the /machine/ is
concerned, it manifests as one unique input. Therefore every possible
(combination of things) condition which might cause a state transition
has to be enumerated separately.

Software NDFA trade processing speed for (sometimes lots of) saved
state memory. NDFA have no more capability than DFA, but they can do
the same job with fewer explicit machine states, because a single
state in an NDFA can represent multiple states of the corresponding
DFA. The tradeoff is that transition conditions in NDFA often are much
more complex than in DFA, and thus evaluating them takes more time and
effort.
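
Concretely, a software DFA driven by two raw signals has to fold them
into one enumerated input symbol, something like this sketch (states
and signals are invented for illustration):

    /* Every COMBINATION of raw conditions becomes one input symbol;
       the transition table then enumerates all of them per state. */
    enum state { IDLE, SEND, NSTATES };

    static int input_symbol(int ready, int buffer_full)
    {
        return (ready << 1) | buffer_full;    /* 4 enumerated inputs */
    }

    static const enum state next[NSTATES][4] = {
        /*         r0/f0  r0/f1  r1/f0  r1/f1 */
        [IDLE] = { IDLE,  IDLE,  SEND,  IDLE },   /* ready & !full */
        [SEND] = { IDLE,  IDLE,  SEND,  SEND },
    };

    /* usage: state = next[state][input_symbol(ready, buffer_full)]; */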

>However, once one is encountered, it has to recognize that it
>actually is in one *particular* variant of that state (assuming
>that "hex" and "integer" can have different contexts, elsewhere
>in the grammar)

That is why hardware is a better analogy to NDFA, where many (or all)
of the variants may be represented by a single machine state. In DFA,
there would need to be a separate state for each variant.

>> UML tools are what you need to consider for more general FA / FSM.
>
>Which brings us full circle to the top of the thread.
>I contend that to be expressive enough to generate code
>(i.e., to act AS an equivalent for it), such a notation would
>be just as complex as writing that code.
>
>And, given that one *must* write code -- but needn't always
>reduce a design to an FSM -- you end up developing a second tool
>that the developer is reliant upon but with less "practice"
>than that of writing code.

Agree and disagree.

YMMV, but a lot of hand-written state machines I have seen over the
years included a lot of duplicated condition / transition decision
code that could have been simplified or eliminated by the introduction
of additional explicit states.

Reminded of the proverb: "programmers are great at figuring out what
CAN be done in parallel, but not what SHOULD be done in parallel".

A tool can aid in figuring out what states are necessary, given the
conditions, to create an optimal (software) machine.

>In hardware designs, you can directly see the costs of an
>implementation: how many FFs to represent the state,
>how much combinatorial logic to determine next_state and
>outputs, etc. So, optimization (can) results in a savings
>of circuitry. And, can speed up the machine by eliminating
>a serial "step".

You can think of a state in a software DFA as being analogous to a
1-bit latch. Effectively all you need to know is whether or not the
state currently is active.

State transitions don't have a simple mapping to hardware as they can
be (essentially) unrestricted logical expressions. Evaluation needs
at least a simple ALU (full set of logic ops), an accumulator, and a
(per combination) latch to store the result.
[If the signals all are simple on/off, the accumulator and the latches
could maybe all be 1-bit.]
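
A sketch of that analogy in C, reusing Don's earlier
LineFeed/LastLine example (the state names and hold behavior are
invented for illustration):

    /* One-hot: each state is a 1-bit "latch" in q; next-state logic
       is evaluated from the current bits plus inputs, then all bits
       update at once, as if clocked. */
    #include <stdint.h>

    #define TOP      0x1u
    #define PRINTING 0x2u
    #define EJECT    0x4u

    static uint32_t step(uint32_t q, int start, int linefeed, int last_line)
    {
        uint32_t d = 0;                     /* next-state "D" inputs */

        if ((q & TOP) && start)                       d |= PRINTING;
        if ((q & PRINTING) && linefeed && !last_line) d |= PRINTING;
        if ((q & PRINTING) && linefeed &&  last_line) d |= EJECT;
        if  (q & EJECT)                               d |= TOP;

        if (d == 0) d = q;      /* no condition fired: hold state */
        return d;               /* all "latches" clock simultaneously */
    }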

>> And, in truth, only CS students taking language / compiler courses
>> ever will learn how to build NDFA and DFA state graphs, convert one
>> graph form into the other, or how to generate table driven or switch
>> code from a state graph.
>
>My education is dated in that *all* CS students learned how to design
>grammars, build compilers, etc. when I was taught. Now, I suspect
>"CS" means "programmer".

No. CS students learn theory. CSE and maybe also IS students learn
about development toolchains.

This dichotomy between theory and practice has existed at least since
the 80's (when I was in college) and probably started even earlier.
Prior to ~ late 90s, explicit CSE degrees didn't exist - there were
just certificate programming courses (if applicable), and the project
management aspects had to be learned on the job.

CSEs really are just "developers with a degree".

>>> What's interesting (being a hardware-software person) is that, despite
>>> the obvious duality, the approaches taken to these technologies is so
>>> disjointed. DFA tend to use a parser-generator of preference while FSMs
>>> (in software) have a variety of different implementations with dramatic
>>> design and runtime differences in efficiencies.
>>>
>>> Similarly, that hardware FSMs tend to be designed with total disregard
>>> to the possible applicability of parser generators, regex compilers, etc.
>>>
>>> Its as if each domain has its own notion of how the technology should
>>> be applied and implemented.
>>
>> Unfortunately yes. I think very few people ever think about it enough
>> to recognize that.
>
>Because they likely don't work in both domains.
>
>Think about it; as a hardware person, I see nothing different between:
> ready * /buffer_full
>and
> /(/ready + buffer_full)
>I could draw either representation schematically and recognize that
>the same gates were involved. I would choose the "expression"
>(rendition) that best "fit into" what *followed* that "signal".
>
>For software people, this seems to require a conscious effort
>("What are the equivalent ways of expressing this and which
>makes most sense to someone reading my code, later?") so you
>often see expressions that you have to THINK about instead of
>being more intuitively expressed.

I'm primarily a software person, though I have done simple (mostly
TTL) interface hardware, and some not so simple FPGA programming [but
that I think still counts as "software"]. I have done a lot of
bit-banging and bare hardware programming.

I think the problem really is that too many programmers now do NOT
ever learn assembler. I had learned a few different assembler
languages before I learned C, and I think it helped immensely because
I never had any trouble with pointers or indirections, etc., or
manually managing memory ... the very things that tend to confound C
newbies.

>Likewise, a hardware person KNOWS that changing multiple
>signals "concurrently" can lead to races and hazards.
>But, a software person has to be lectured in atomic operators
>(because time is serial to him -- ASSUMING he thinks about it!).

Too much specialization in education.

Concurrency, parallelism and atomic operations tend to be addressed
(not "taught" per se) only in OS classes. Many CS students do not
take OS classes. Atomics and threading are covered in CSE, but only
the practical uses of them and not the theory (or how they evolved
which I think is almost as important).

>Folks taught in (just) one domain often are poor practitioners
>in the other.

The software industry, in particular, now tends to frown upon
generalists for developer positions, and for management any prior
developer experience no longer much matters.

If you can't demonstrate significant expertise in ___ of the week, in
most places you won't even make it past HR to be interviewed by the
people who can recognize that your prior experience has relevance and
that you could quickly learn whatever is needed to do the job.

Re: Text on FSM

<1750b43d382a2476$9$1290337$9aa1cca1@news.thecubenet.com>


https://www.novabbs.com/devel/article-flat.php?id=1420&group=comp.arch.embedded#1420

Subject: Re: Text on FSM
Newsgroups: comp.arch.embedded
References: <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com> <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com> <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>
From: no.s...@please.net (Clifford Heath)
Date: Wed, 29 Mar 2023 09:00:34 +1100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
Mime-Version: 1.0
In-Reply-To: <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Lines: 177
Path: i2pn2.org!i2pn.org!news.1d4.us!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!news.thecubenet.com!not-for-mail
Nntp-Posting-Date: Tue, 28 Mar 2023 22:00:36 +0000
Organization: theCubeNet - www.thecubenet.com
X-Complaints-To: abuse@thecubenet.com
Message-Id: <1750b43d382a2476$9$1290337$9aa1cca1@news.thecubenet.com>
X-Received-Bytes: 9493
 by: Clifford Heath - Tue, 28 Mar 2023 22:00 UTC

On 29/03/23 02:17, George Neuner wrote:
> On Mon, 27 Mar 2023 16:18:51 +1100, Clifford Heath
> <no.spam@please.net> wrote:
>
>> On 26/03/23 15:45, George Neuner wrote:
>>> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>> The terms "FA" (finite automaton) and "FSM" (finite state machine)
>>> are, in fact, synonymous.
>>>
>>> What is confusing is that we got to this point through discussion of
>>> parsing and lexing tools - which ARE geared toward languages.
>>> Moreover, yacc and bison do NOT implement a general FA, but rather a
>>> particular variety of FA that is useful for language parsing and which
>>> involves an auxiliary stack.
>>
>> The stack means it's not a FA.
>
> No, it still is an FA ... it just is a specialized form.

Ok, it's an FA operating on a stack. The stack makes the whole thing
non-regular, aka infinite, so it's only an FA if you exclude the stack
from the machine.

> Stackless FA, in fact, can process LR(1) grammars ... they just need
> (typically many) more states in the machine to do so

No. A stack is not finite. Every FA is finite, that's why they're called
FA. If you want to process a regular language, you can use an FA. If you
want to process an irregular language, you cannot - you need somewhere
to store unbounded states and an FA *cannot* do that. It's in the
definition of such things!

>> ... and you'll
>> get parse errors that probably don't really tell you what is wrong with
>> the input :P.
>
> You can't rely on the tool for error handling (or even just messages)
> ... you really need to add deliberate error handling.

I wasn't talking about error recovery, just about reporting. Both are
hugely easier in an LL grammar. In the PEG parsers that I favour, you
can almost always just report the rules on the stack at the furthest
point reached, and (in all the grammars I've implemented) that gives a
better error report than anything you'd bother to create manually.

It amuses me that the folk who understand grammar well enough to be able
to produce powerful parser generators seem to be universally incapable
of generating code that can report parse failures in plain language.
Something about their brain's language centres has become so esoteric
that normal language escapes them.

>> Lex/Flex on the other hand exists to process only finite
>> states. The FSM algorithms they use are more efficient than any
>> algorithm that can handle LALR2, which is why these tools still exist as
>> independent tools.
>
> They exist separately because they were intended for different tasks

The articles published at the time I first used them (in 1980) clearly
stated that the two tools were needed "because we don't have a single
algorithm that is equally efficient at both tokenisation and parsing".

That's the entire reason that there *are* two separate tasks.
The same justification existed for the existence of egrep vs grep, BTW.

I tried to find the actual text, but it eludes me.

In a PEG parser, both tasks are equally efficient, and they are combined
into one grammar, so there aren't two tasks any more.

> In fact, regex tools existed already for a number of years before
> either lex or yacc came about. The difference was most previous tools
> directly /interpreted/ regex patterns,

Ken Thompson's implementation in the mid 1960s (documented in a 1968
CACM paper) translated the regexp into machine code. The list of
possible states was just a sequence of function call instructions.

The technique of converting multiple NFAs into a single DFA has also
been in use since the early 70s.

Have a read of Russ Cox's excellent presentation of these topics here:
<https://swtch.com/~rsc/regexp/regexp1.html>.

I implemented Thompson's algorithm here before deprecating it for PEGs:
<https://github.com/cjheath/strpp/blob/main/include/strregex.h>

It's baffling that many major languages *still* implement Regex using
the incredibly inferior backtracking approach.

> whereas lex
> compiled multiple patterns into a single recognizer that (effectively)
> tried all patterns simultaneously.

Exactly, that's the NFA* -> DFA thing I talked about.

> ANTLR implements LL(*) which is LL with unbounded lookahead.

It's unbounded, but must be regular. Many languages (including my
Constellation Query Language) require unbounded non-regular look-ahead,
which PEG provides, at some extra cost in memory. But the pathological
cases which *require* memoization only occur rarely, so a global packrat
strategy is sub-optimal.

> There
> are other LL(k) tools which require the programmer to choose a fixed
> amount of lookahead (and fail to process the grammar if the k value is
> too small). ANTLR analyzes the grammar and computes what lookahead is
> required pattern by pattern.

That's a poor description of how it works. It looks ahead using an FA,
so lookahead must be regular ("Finite State").

>> Anyone interested in the overlap between regular languages and finite
>> state machines should refer to the excellent
>> <https://github.com/katef/libfsm>.

Did you look at the FSM on the main README page of that site?
It shows two RE's being combined into one DFA. Very neat stuff.

>> I'm currently building a generalised parsing engine that also has the
>> capability of processing arbitrary binary file and network stream
>> formats, using a VM approach that interprets something very like a BNF,
>> but in prefix notation (+a means one-or-more "a"s, not a+). It's tiny,
>> efficient, embeddable, but can take a protocol description in a very few
>> bytes of VM code to handle almost any new protocol or format. I don't
>> think that has been done before, and I've wanted to do it for 25 years.
>
> I would be interested to see that (when it's finished, of course).
> Good luck!

The putative grammar for Px is here (but this doesn't describe captures
fully):
<https://github.com/cjheath/strpp/blob/main/grammars/px.px>

and the Pegexp engine is here (a template that I'm specialising to add
non-regular aka full LL grammar capability):
<https://github.com/cjheath/strpp/blob/main/include/pegexp.h>

The Px grammar rewritten as a named-map of Pegexp expressions is here:
<https://github.com/cjheath/strpp/blob/main/test/peg_test.cpp#L55-L91>
but I'll use a better structure for a compiled Px grammar, so that names
don't need to be looked up at runtime.

I've almost finished dicking with the structure of input streams that
will make it feasible for this to process data directly arriving on a
socket, and only caching as much as is needed for back-up and retry.
It's also possible to compile with/without UTF-8 support, but I can make
that more convenient. It's possible to specify binary matching even in a
Unicode parser though.

I want captures to do things like turn the ASCII digits on an HTTP
Content-Length header into a binary integer, save that integer as a
capture variable, and use that variable to count bytes in a later
repetition. This will enable a simple grammar to describe all of HTTP/2.
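
By way of illustration only, here is that capture-and-count done by
hand in C (nothing below is Px; it just shows what the grammar would
be doing for you):

    /* Sketch: capture a number from the input, then use it as a
       repeat count over a later section. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *msg = "Content-Length: 5\r\n\r\nhello";
        const char *p = strstr(msg, "Content-Length:");
        if (!p) return 1;

        unsigned long n = 0;
        for (p += strlen("Content-Length:"); *p == ' '; p++) ;
        while (*p >= '0' && *p <= '9')       /* ASCII digits -> binary */
            n = n * 10 + (unsigned)(*p++ - '0');

        p = strstr(p, "\r\n\r\n");           /* skip to the body */
        if (!p) return 1;
        p += 4;

        printf("body (%lu bytes): %.*s\n", n, (int)n, p);  /* counted */
        return 0;
    }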

By nesting parsers (incrementally feeding capture sections to a nested
parser) it should be possible to, for example, run a protocol engine that
generates an HTTP/2 request (generating from an HTTP request grammar),
parses the response chunks, feeds base64-encoded chunks into a
conversion function (not specified in Px), and feeds the output of that
conversion into e.g. a JPEG parser that actually verifies the JPEG
format, and can e.g. extract (as a parse capture) the GPS location from
inside the Exif data attached... and all without having to extend or
recompile the engine. Just load the target grammar, and if it succeeds,
you get the GPS location... and all file formats have been validated.

I envisage a world where the file-system is type-safe; almost no file is
a pure byte-stream, and it's not possible to save a JPEG file that
doesn't match the JPEG syntax. The file system must be pre-loaded with a
grammar for every new file type before writing such a file.

Clifford Heath.

Re: Text on FSM

<tvvuqo$3v0lq$2@dont-email.me>


https://www.novabbs.com/devel/article-flat.php?id=1422&group=comp.arch.embedded#1422

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: blockedo...@foo.invalid (Don Y)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Tue, 28 Mar 2023 16:56:03 -0700
Organization: A noiseless patient Spider
Lines: 206
Message-ID: <tvvuqo$3v0lq$2@dont-email.me>
References: <tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me>
<4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me>
<pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me>
<eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <tvp75g$2llm5$2@dont-email.me>
<13522i5vfdfa4k8i5b3afeqfmg0c1nkle1@4ax.com> <tvrjcj$34re6$3@dont-email.me>
<vmb62itqu227alnrpth40ic4davtgqiehd@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 28 Mar 2023 23:56:09 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5a2a6e95ead5130af94ec0b00fcccd8e";
logging-data="4162234"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/T0a4zCVkr6rUa7lKH+VCk"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:ZXZ/klOMuGnzS+Rejyj5vxVDbw8=
Content-Language: en-US
In-Reply-To: <vmb62itqu227alnrpth40ic4davtgqiehd@4ax.com>
 by: Don Y - Tue, 28 Mar 2023 23:56 UTC

"Progress?" (Y/N)

On 3/28/2023 12:25 PM, George Neuner wrote:

>>> UML tools are what you need to consider for more general FA / FSM.
>>
>> Which brings us full circle to the top of the thread.
>> I contend that to be expressive enough to generate code
>> (i.e., to act AS an equivalent for it), such a notation would
>> be just as complex as writing that code.
>>
>> And, given that one *must* write code -- but needn't always
>> reduce a design to an FSM -- you end up developing a second tool
>> that the developer is reliant upon but with less "practice"
>> than that of writing code.
>
> Agree and disagree.
>
> YMMV, but a lot of hand written state machines I have seen over the
> years included a lot of duplicated condition / transition decision
> code that could have been simplified or eliminated by the introduction
> of additional explicit states.

I think that is a consequence of the design approach. It is not
uncommon (an understatement?) for software to be developed
incrementally; build part of the machine, build a bit more,
and more, ... done.

This just doesn't work in hardware. You'd have to effectively
discard all of your previous work each time you "add a bit more".

As a result, you think about the entire machine before you settle
on an architecture or even begin the implementation.

Imagine designing a processor. You need to have an idea as to what
the entire instruction set is likely going to be before you can
figure out what inputs the control store needs to be able to
examine.

Software, OTOH, can always squeeze in a tweak to an existing
design. It's only when you're "done" that you can (*might*)
step back and look at your result -- and risk a refactoring.

> Reminded of the proverb: "programmers are great at figuring out what
> CAN be done in parallel, but not what SHOULD be done in parallel".
>
> A tool can aid in figuring out what states are necessary, given the
> conditions, to create an optimal (software) machine.

This can have an advantage with incremental design. But, again, it
means the developer has to be "fluent" in the tool as well
as in the implementation language(s), etc.

>>> And, in truth, only CS students taking language / compiler courses
>>> ever will learn how to build NDFA and DFA state graphs, convert one
>>> graph form into the other, or how to generate table driven or switch
>>> code from a state graph.
>>
>> My education is dated in that *all* CS students learned how to design
>> grammars, build compilers, etc. when I was taught. Now, I suspect
>> "CS" means "programmer".
>
> No. CS students learn theory. CSE and maybe also IS students learn
> about development toolchains.

We learned of none of that. The theory being that it was too fluid
and dependent on your actual career.

E.g., EVERY CS course used a different language, operating system/environment,
etc. None of this was considered important to the material being presented.
Just an annoying "implementation" detail.

There was no mention of MPUs (MCUs and SoCs not existing back then), hardware
interfaces, etc. You didn't "count bytes" or microseconds but, rather, dealt
with all resources just as "big O". More "implementation details".

[A similar approach was taken with *hardware*. Learn the concepts
and "how to learn" and worry about the implementation details once
you're on the job -- whatever that might be.]

> This dichotomy between theory and practice has existed at least since
> the 80's (when I was in college) and probably started even earlier.
> Prior to ~ late 90s, explicit CSE degrees didn't exist - there were
> just certificate programming courses (if applicable), and the project
> management aspects had to be learned on the job.

Ah, project management wasn't taught, at all! Nor the economics
associated with design. More implementation details. (This being
the biggest shortcoming, IMO, in my education. What value all
the theory if it's not economically feasible to use it? OTOH,
why limit the education to those things that are feasible *today*
and compromise the education for *tomorrow*?)

>> For software people, this seems to require a conscious effort
>> ("What are the equivalent ways of expressing this and which
>> makes most sense to someone reading my code, later?") so you
>> often see expressions that you have to THINK about instead of
>> being more intuitively expressed.
>
> I'm primarily a software person, though I have done simple (mostly
> TTL) interface hardware, and some not so simple FPGA programming [but
> that I think still counts as "software"]. I have done a lot of
> bit-banging and bare hardware programming.
>
> I think the problem really is that too many programmers now do NOT
> ever learn assembler. I had learned a few different assembler
> languages before I learned C, and I think it helped immensely because
> I never had any trouble with pointers or indirections, etc., or
> manually managing memory ... the very things that tend to confound C
> newbies.

Yes! In my case, my interest in hardware (I thought "computer science"
was going to teach me to design *computers*) led me to select more
"elective" courses that concentrated on those aspects of designs.
So, when a language concept was presented, I could visualize what
the hardware had to do to make it happen. E.g., I think of
a pointer to an "array" as "&array[0]", as that's just what's going
through my mind as I write the reference.

It also facilitated my design of the instruction sets for my CPUs
as I could think of what the *software* would want to do and
how I could design the processor to facilitate those activities.

[I keep looking for my notes on the various (hypothetical)
"machines" that we discussed and the consequences for
information hiding, parameter passing, etc. Back then,
they were just "arbitrary letters" (e.g., S-machine)
intended to illustrate different concepts. And, how different
languages would rely on them]

>> Likewise, a hardware person KNOWS that changing multiple
>> signals "concurrently" can lead to races and hazards.
>> But, a software person has to be lectured in atomic operators
>> (because time is serial to him -- ASSUMING he thinks about it!).
>
> Too much specialization in education.
>
> Concurrency, parallelism and atomic operations tend to be addressed
> (not "taught" per se) only in OS classes. Many CS students do not
> take OS classes. Atomics and threading are covered in CSE, but only
> the practical uses of them and not the theory (or how they evolved
> which I think is almost as important).

Again, all of this was part of the "CS" curriculum, "back then". But,
always as abstractions. Petri nets, etc. No need to deal with an
actual implementation because the "4 years" of an education would
lead to a very different set of implementations between when the
course was taught and the material applied!

>> Folks taught in (just) one domain often are poor practitioners
>> in the other.
>
> The software industry, in particular, now tends to frown upon
> generalists for developer positions, and for management any prior
> developer experience no longer much matters.

Did it ever? :> ---------------^^^^^^^^^^^^ Think: "Peter Principle"

> If you can't demonstrate significant expertise in ___ of the week, in
> most places you won't even make it past HR to be interviewed by the
> people who can recognize that your prior experience has relevance and
> that you could quickly learn whatever is needed to do the job.

Yes. I see friends who have "checklists" of specific skillsets
(i.e., familiar with product/platform X) as the first level
sort for candidates. I think the feeling is that they can't
afford to wait for you to get up to speed on platform X and
likely won't invest much in you when product Y comes along
(fish or cut bait!).

This has been the opposite of my experiences. The application
domain is *so* broad that you can't realistically expect someone to
have experience with some arbitrary X that is your bread-and-butter.
So, you want someone who has demonstrated an ability to work in
a variety of application domains, price points, etc. as reassurance
that they can learn your constraints.

[How many people have designed tablet press instrumentation?
Or, marine autopilots? Or video game hardware? Or, marking
engines? Or... If you set those as preconditions for
job openings, you end up with few-to-none applicants!]

A variety of experiences also tends to be indicative of folks
who enjoy learning -- instead of just refining a given set of
skills indefinitely. How much do you learn designing version Y
(Y >> X) of the same product?


Re: Text on FSM

<mq072itelqtm4sc1f3odmb6q1c02noo51d@4ax.com>


https://www.novabbs.com/devel/article-flat.php?id=1423&group=comp.arch.embedded#1423

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: gneun...@comcast.net (George Neuner)
Newsgroups: comp.arch.embedded
Subject: Re: Text on FSM
Date: Tue, 28 Mar 2023 21:27:49 -0400
Organization: A noiseless patient Spider
Lines: 235
Message-ID: <mq072itelqtm4sc1f3odmb6q1c02noo51d@4ax.com>
References: <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com> <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com> <1750b43d382a2476$9$1290337$9aa1cca1@news.thecubenet.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="d3564fbf747fa60b0e243dd12080e1cc";
logging-data="4187935"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197EQGrJ5u+C4/mi5HiriBmV/J0K34wXak="
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:+T0HhxJCl7r+XEvwNe+mqV/JMXs=
 by: George Neuner - Wed, 29 Mar 2023 01:27 UTC

On Wed, 29 Mar 2023 09:00:34 +1100, Clifford Heath
<no.spam@please.net> wrote:

>On 29/03/23 02:17, George Neuner wrote:
>> On Mon, 27 Mar 2023 16:18:51 +1100, Clifford Heath
>> <no.spam@please.net> wrote:
>>
>>> On 26/03/23 15:45, George Neuner wrote:
>>>> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
>>>> <blockedofcourse@foo.invalid> wrote:
>>>> The terms "FA" (finite automaton) and "FSM" (finite state machine)
>>>> are, in fact, synonymous.
>>>>
>>>> What is confusing is that we got to this point through discussion of
>>>> parsing and lexing tools - which ARE geared toward languages.
>>>> Moreover, yacc and bison do NOT implement a general FA, but rather a
>>>> particular variety of FA that is useful for language parsing and which
>>>> involves an auxiliary stack.
>>>
>>> The stack means it's not a FA.
>>
>> No, it still is an FA ... it just is a specialized form.
>
>Ok, it's an FA operating on a stack. The stack makes the whole thing
>non-regular, aka infinite, so it's only an FA if you exclude the stack
>from the machine.
>
>> Stackless FA, in fact, can process LR(1) grammars ... they just need
>> (typically many) more states in the machine to do so
>
>
>No. A stack is not finite.

Nor is the input stream. So what? The stack is NOT part of the
machine, it is a memory used BY the state machine.

>Every FA is finite, that's why they're called
>FA. If you want to process a regular language, you can use an FA. If you
>want to process an irregular language, you cannot - you need somewhere
>to store unbounded states and an FA *cannot* do that. It's in the
>definition of such things!

Nowhere in the definition of finite automaton does it say the
automaton is limited to what can be encoded by its states. In
particular there is no prohibition against using an external memory.
Recall that Turing machines used tapes of infinite length.

In any event, I'm still not following why you think this somehow is
important.

>>> ... and you'll
>>> get parse errors that probably don't really tell you what is wrong with
>>> the input :P.
>>
>> You can't rely on the tool for error handling (or even just messages)
>> ... you really need to add deliberate error handling.
>
>I wasn't talking about error recovery, just about reporting. Both are
>hugely easier in an LL grammar. In the PEG parsers that I favour, you
>can almost always just report the rules on the stack at the furthest
>point reached, and (in all the grammars I've implemented) that gives a
>better error report than anything you'd bother to create manually.

I wasn't talking about recovery either. When using an LR parser the
grammar designer/implementer has to augment BOTH error reporting and
error handling - which may or may not involve "recovery". See next.

>It amuses me that the folk who understand grammar well enough to be able
>to produce powerful parser generators seem to be universally incapable
>of generating code that can report parse failures in plain language.
>Something about their brain's language centres has become so esoteric
>that normal language escapes them.

LR works by incrementally assembling a sequence of tokens and looking
for a pattern that matches it.

LL works by selecting a pattern and incrementally looking to match
that pattern with the sequence of tokens beginning at the current
position in the input. Of course the pattern may be an alternation
having multiple possibilities, but the principle of operation remains.

Very, very different.

Neither method innately knows the context when a pattern match fails,
but in LL the context is readily apparent from the driver code which
directs the parse, so it is easy to provide a (somewhat) meaningful
error message just by maintaining a stack of the non-terminals already
matched and dumping the last N entries.

In contrast, in LR the context of the current match is given by the
machine state and the stack of unreduced (as-yet unmatched) tokens.
There is nothing readily available that could be used to provide a
user meaningful message ... you'd have to examine the machine state to
figure out even what you /might/ be looking for. Your position in the
input is about as close as you can get to a meaningful message without
the user code manually tracking context.
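
A minimal sketch of the LL side of that, in C (the grammar and all
names are invented): a recursive-descent parser keeps a stack of rule
names purely so that a failure can dump the path it was on.

    /* Sketch: on a parse error, report "while matching: term expr"
       plus the input offset. Grammar: expr = term {'+' term} ;
       term = digit. */
    #include <stdio.h>

    static const char *in, *pos;
    static const char *rules[16];
    static int depth;

    static void fail(void)
    {
        fprintf(stderr, "parse error at offset %ld, while matching:",
                (long)(pos - in));
        for (int i = depth - 1; i >= 0; i--)
            fprintf(stderr, " %s", rules[i]);
        fprintf(stderr, "\n");
    }

    static int term(void)
    {
        rules[depth++] = "term";
        if (*pos >= '0' && *pos <= '9') { pos++; depth--; return 1; }
        fail(); return 0;       /* leave the stack intact for the dump */
    }

    static int expr(void)
    {
        rules[depth++] = "expr";
        if (!term()) return 0;
        while (*pos == '+')
            { pos++; if (!term()) return 0; }
        depth--; return 1;
    }

    int main(void)
    {
        in = pos = "1+2+x";     /* fails at the 'x', offset 4 */
        return expr() ? 0 : 1;
    }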

>>> Lex/Flex on the other hand exists to process only finite
>>> states. The FSM algorithms they use are more efficient than any
>>> algorithm that can handle LALR2, which is why these tools still exist as
>>> independent tools.
>>
>> They exist separately because they were intended for different tasks
>
>The articles published at the time I first used them (in 1980) clearly
>stated that the two tools were needed "because we don't have a single
>algorithm that is equally efficient at both tokenisation and parsing".

That was true, but wasn't the reason AT THE TIME they were written.
They were separate first and foremost because they were written at
different times. They never were combined because most machines of
that time did not have enough memory to handle the analysis and
recognizer generation for even moderately complex grammars ... making
the tool larger by including lexing was out of the question.

After a while, it was simply inertia that kept them from being
combined. Everyone was used to the status quo and so even when memory
sizes grew to the point where having a combination tool could be
useful, very few people cared.

Inertia is the reason why a lot of potentially interesting things
never happened. Diversion is the other reason - the people who could
have done it were doing other things.

>Ken Thompson's implementation in the mid 1960s (documented in a 1968
>CACM paper) translated the regexp into machine code. The list of
>possible states was just a sequence of function call instructions.

Yes, but Thompson's method was not widely used - again because of
memory sizes. Most uses of regex used many patterns, and it was more
efficient (memory-wise) to simply interpret the pattern directly: one
driver function to handle N patterns.

>The technique of converting multiple NFAs into a single DFA has also
>been in use since the early 70s.

Yes, and lex is from ~1973, IIRC. It was the first /publicly/
available tool able to combine multiple NDFAs into a single DFA.

>> ANTLR implements LL(*) which is LL with unbounded lookahead.
>
>It's unbounded, but must be regular. Many languages (including my
>Constellation Query Language) require unbounded non-regular look-ahead,
>which PEG provides, at some extra cost in memory. But the pathological
>cases which *require* memoization only occur rarely, so a global packrat
>strategy is sub-optimal.
>
>> There
>> are other LL(k) tools which require the programmer to choose a fixed
>> amount of lookahead (and fail to process the grammar if the k value is
>> too small). ANTLR analyzes the grammar and computes what lookahead is
>> required pattern by pattern.
>
>That's a poor description of how it works. It looks ahead using an FA,
>so lookahead must be regular ("Finite State").

No. Lookahead (and backtracking both) simply requires maintaining a
queue of as-yet unmatched tokens. It certainly could be done by a
state machine, but it does NOT require a state machine.
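
i.e. a queue as simple as the following sketch (the token type and
the next_token() lexer hook are assumed placeholders, not from any
real tool):

    /* Sketch: lookahead as a plain ring buffer of tokens pulled from
       the lexer on demand - no state machine required. */
    typedef int token;                /* stand-in for a real token type */
    extern token next_token(void);    /* assumed lexer entry point */

    #define QMAX 16
    static token q[QMAX];
    static int qhead, qlen;

    static token peek(int k)          /* k-th unconsumed token, k >= 0 */
    {
        while (qlen <= k)
            q[(qhead + qlen++) % QMAX] = next_token();
        return q[(qhead + k) % QMAX];
    }

    static token consume(void)
    {
        token t = peek(0);
        qhead = (qhead + 1) % QMAX;
        qlen--;
        return t;
    }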

>>> Anyone interested in the overlap between regular languages and finite
>>> state machines should refer to the excellent
>>> <https://github.com/katef/libfsm>.
>
>Did you look at the FSM on the main README page of that site?
>It shows two RE's being combined into one DFA. Very neat stuff.

I haven't examined their method. It may be that they have found some
particularly efficient way to do it. That would be great. But
algorithms for merging FAs in graph representation have been around at
least since the 60s.

>>> I'm currently building a generalised parsing engine that also has the
>>> capability of processing arbitrary binary file and network stream
>>> formats, using a VM approach that interprets something very like a BNF,
>>> but in prefix notation (+a means one-or-more "a"s, not a+). It's tiny,
>>> efficient, embeddable, but can take a protocol description in a very few
>>> bytes of VM code to handle almost any new protocol or format. I don't
>>> think that has been done before, and I've wanted to do it for 25 years.
>>
>> I would be interested to see that (when it's finished, of course).
>> Good luck!
>
>The putative grammar for Px is here (but this doesn't describe captures
>fully):
><https://github.com/cjheath/strpp/blob/main/grammars/px.px>
>
>and the Pegexp engine is here (a template that I'm specialising to add
>non-regular aka full LL grammar capability):
><https://github.com/cjheath/strpp/blob/main/include/pegexp.h>
>
>The Px grammar rewritten as a named-map of Pegexp expressions is here:
><https://github.com/cjheath/strpp/blob/main/test/peg_test.cpp#L55-L91>
>but I'll use a better structure for a compiled Px grammar, so that names
>don't need to be looked up at runtime.
>
>I've almost finished dicking with the structure of input streams that
>will make it feasible for this to process data directly arriving on a
>socket, and only caching as much as is needed for back-up and retry.
>It's also possible to compile with/without UTF-8 support, but I can make
>that more convenient. It's possible to specify binary matching even in a
>Unicode parser though.
>
>I want captures to do things like turn the ASCII digits on an HTTP
>Content-Length header into a binary integer, save that integer as a
>capture variable, and use that variable to count bytes in a later
>repetition. This will enable a simple grammar to describe all of HTTP/2.
>
>By nesting parsers (incrementally feeding capture sections to a nested
>parser) it should be possible to for example, run a protocol engine that
>generates an HTTP/2 request (generating from an HTTP request grammar),
>parses the response chunks, feeds base64-encoded chunks into a
>conversion function (not specified in Px), and the output of that
>conversion into e.g. a JPEG parser that actually verifies the JPEG
>format, and can e.g. extract (as a parse capture) the GPS location from
>inside the Exif data attached... and all without having to extend or
>recompile the engine. Just load the target grammar, and if it succeeds,
>you get the GPS location... and all file formats have been validated.
>
>I envisage a world where the file-system is type-safe; almost no file is
>a pure byte-stream, and it's not possible to save a JPEG file that
>doesn't match the JPEG syntax. The file system must be pre-loaded with a
>grammar for every new file type before writing such a file.
>
>Clifford Heath.


Re: Text on FSM

<ExMUL.1278536$8_id.830436@fx09.iad>


https://www.novabbs.com/devel/article-flat.php?id=1424&group=comp.arch.embedded#1424

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.uzoreto.com!peer02.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx09.iad.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.9.0
Subject: Re: Text on FSM
Newsgroups: comp.arch.embedded
References: <26a5b358-b2c1-4bbe-b25b-0f08bc63e3cen@googlegroups.com>
<tuo0i9$21309$1@solani.org> <tuo8r6$3v26d$1@dont-email.me>
<4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me>
<pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me>
<gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me>
<eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com>
<17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com>
<0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>
From: Rich...@Damon-Family.org (Richard Damon)
Content-Language: en-US
In-Reply-To: <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 36
Message-ID: <ExMUL.1278536$8_id.830436@fx09.iad>
X-Complaints-To: abuse@easynews.com
Organization: Forte - www.forteinc.com
X-Complaints-Info: Please be sure to forward a copy of ALL headers otherwise we will be unable to process your complaint properly.
Date: Tue, 28 Mar 2023 21:31:13 -0400
X-Received-Bytes: 2773
 by: Richard Damon - Wed, 29 Mar 2023 01:31 UTC

On 3/28/23 11:17 AM, George Neuner wrote:
> On Mon, 27 Mar 2023 16:18:51 +1100, Clifford Heath
> <no.spam@please.net> wrote:
>
>> On 26/03/23 15:45, George Neuner wrote:
>>> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>> The terms "FA" (finite automaton) and "FSM" (finite state machine)
>>> are, in fact, synonymous.
>>>
>>> What is confusing is that we got to this point through discussion of
>>> parsing and lexing tools - which ARE geared toward languages.
>>> Moreover, yacc and bison do NOT implement a general FA, but rather a
>> particular variety of FA that is useful for language parsing and which
>>> involves an auxiliary stack.
>>
>>
>> The stack means it's not a FA.
>
> No, it still is an FA ... it just is a specialized form.

The key point is that when talking about the "class" of a machine,
stacks are generally considered at least "effectively" infinite.

Give an FA an (unbounded) stack and you have a different class of
machine. Yes, in practice, you normally have a limit on the size of the
stack, but it is generally big enough to handle the problem at hand.

In the same way, a computer with 16 GB of RAM and a TB of storage is
still technically an FA, but normally isn't treated as one, as the
methods to analyze an FA are impractical at that scale.
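
A concrete illustration of the class difference: balanced parentheses,
the textbook non-regular language. The unbounded counter below is
exactly the "memory" that a fixed set of states cannot provide
(sketch, for illustration only):

    /* Recognize balanced parentheses - impossible for a pure FA at
       arbitrary nesting depth, trivial with one unbounded counter. */
    #include <stdio.h>

    static int balanced(const char *s)
    {
        long depth = 0;                 /* unbounded in principle */
        for (; *s; s++) {
            if (*s == '(') depth++;
            else if (*s == ')' && --depth < 0) return 0;
        }
        return depth == 0;
    }

    int main(void)
    {
        printf("%d %d\n", balanced("(()())"), balanced("(()"));
        return 0;   /* prints: 1 0 */
    }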

>> Clifford Heath.

Re: Text on FSM

<1751015432cc8c67$607$406094$5aa10cbd@news.thecubenet.com>


https://www.novabbs.com/devel/article-flat.php?id=1427&group=comp.arch.embedded#1427

Subject: Re: Text on FSM
Newsgroups: comp.arch.embedded
References: <tuo8r6$3v26d$1@dont-email.me> <4lkv0it2p2kjt1suaivjtinebe4hd3scpj@4ax.com> <tup2g8$73r2$2@dont-email.me> <pi721i1k9146plsobdamel4f451k1tm2t1@4ax.com> <turb65$msvn$2@dont-email.me> <gujm1ihuvs8n1939vh42davuqq6meel3ht@4ax.com> <tvg986$rvkg$1@dont-email.me> <eefv1ilvqdj6f0nkk97iek6750ev8grca7@4ax.com> <17502eff0f7ca5e6$753$3048414$5aa10cad@news.thecubenet.com> <0qu52i97b4v6r4lsdu3gclh9emu429c582@4ax.com> <1750b43d382a2476$9$1290337$9aa1cca1@news.thecubenet.com> <mq072itelqtm4sc1f3odmb6q1c02noo51d@4ax.com>
From: no.s...@please.net (Clifford Heath)
Date: Thu, 30 Mar 2023 08:33:16 +1100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
Mime-Version: 1.0
In-Reply-To: <mq072itelqtm4sc1f3odmb6q1c02noo51d@4ax.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Lines: 241
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!news.thecubenet.com!not-for-mail
Nntp-Posting-Date: Wed, 29 Mar 2023 21:33:17 +0000
Organization: theCubeNet - www.thecubenet.com
X-Complaints-To: abuse@thecubenet.com
Message-Id: <1751015432cc8c67$607$406094$5aa10cbd@news.thecubenet.com>
X-Received-Bytes: 12784
 by: Clifford Heath - Wed, 29 Mar 2023 21:33 UTC

On 29/03/23 12:27, George Neuner wrote:
> On Wed, 29 Mar 2023 09:00:34 +1100, Clifford Heath
> <no.spam@please.net> wrote:
>
>> On 29/03/23 02:17, George Neuner wrote:
>>> On Mon, 27 Mar 2023 16:18:51 +1100, Clifford Heath
>>> <no.spam@please.net> wrote:
>>>
>>>> On 26/03/23 15:45, George Neuner wrote:
>>>>> On Wed, 22 Mar 2023 18:15:43 -0700, Don Y
>>>>> <blockedofcourse@foo.invalid> wrote:
>>>>> The terms "FA" (finite automaton) and "FSM" (finite state machine)
>>>>> are, in fact, synonymous.
>>>>>
>>>>> What is confusing is that we got to this point through discussion of
>>>>> parsing and lexing tools - which ARE geared toward languages.
>>>>> Moreover, yacc and bison do NOT implement a general FA, but rather a
>>>>> particular variety of FA that is useful for language parsing and which
>>>>> involves an auxiliary stack.
>>>>
>>>> The stack means it's not a FA.
>>>
>>> No, it still is an FA ... it just is a specialized form.
>>
>> Ok, it's an FA operating on a stack. The stack makes the whole thing
>> non-regular, aka infinite, so it's only an FA if you exclude the stack
>>from the machine.
>>
>>> Stackless FA, in fact, can process LR(1) grammars ... they just need
>>> (typically many) more states in the machine to do so
>>
>>
>> No. A stack is not finite.
>
> Nor is the input stream. So what? The stack is NOT part of the
> machine, it is a memory used BY the state machine.
>
>
>> Every FA is finite, that's why they're called
>> FA. If you want to process a regular language, you can use an FA. If you
>> want to process an irregular language, you cannot - you need somewhere
>> to store unbounded states and an FA *cannot* do that. It's in the
>> definition of such things!
>
> Nowhere in the definition of finite automaton does it say the
> automaton is limited to what can be encoded by its states. In
> particular there is no prohibition against using an external memory.
> Recall that Turing machines used tapes of infinite length.
>
> In any event, I'm still not following why you think this somehow is
> important.
>
>
>>>> ... and you'll
>>>> get parse errors that probably don't really tell you what is wrong with
>>>> the input :P.
>>>
>>> You can't rely on the tool for error handling (or even just messages)
>>> ... you really need to add deliberate error handling.
>>
>> I wasn't talking about error recovery, just about reporting. Both are
>> hugely easier in an LL grammar. In the PEG parsers that I favour, you
>> can almost always just report the rules on the stack at the furthest
>> point reached, and (in all the grammars I've implemented) that gives a
>> better error report than anything you'd bother to create manually.
>
> I wasn't talking about recovery either. When using an LR parser the
> grammar designer/implementer has to augment BOTH error reporting and
> error handling - which may or may not involve "recovery". See next.
>
>
>> It amuses me that the folk who understand grammar well enough to be able
>> to produce powerful parser generators seem to be universally incapable
>> of generating code that can report parse failures in plain language.
>> Something about their brain's language centres has become so esoteric
>> that normal language escapes them.
>
> LR works by incrementally assembling a sequence of tokens and looking
> for a pattern that matches it.
>
> LL works by selecting a pattern and incrementally looking to match
> that pattern with the sequence of tokens beginning at the current
> position in the input. Of course the pattern may be an alternation
> having multiple possibilities, but the principle of operation remains.
>
> Very, very different.
>
> Neither method innately knows the context when a pattern match fails,
> but in LL the context is readily apparent from the driver code which
> directs the parse, so it is easy to provide a (somewhat) meaningful
> error message just by maintaining a stack of the non-terminals already
> matched and dumping the last N entries.
>
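
A minimal sketch of that rule-stack technique (invented names, not
lifted from yacc, bison, or ANTLR): each rule pushes its name while it
is active, and the error report dumps the innermost few.

  #include <cstdio>
  #include <string>
  #include <vector>

  static std::vector<std::string> rule_stack;

  struct RuleTrace {                    // RAII: pops on scope exit
      RuleTrace(const char* name) { rule_stack.push_back(name); }
      ~RuleTrace()                { rule_stack.pop_back(); }
  };

  void report_error(const char* msg, size_t offset) {
      fprintf(stderr, "parse error at offset %zu: %s\n  while matching:",
              offset, msg);
      size_t n = rule_stack.size(), shown = n < 3 ? n : 3;  // last N=3
      for (size_t i = n - shown; i < n; i++)
          fprintf(stderr, " %s", rule_stack[i].c_str());
      fputc('\n', stderr);
  }

A rule body then opens with something like RuleTrace t("expr"); and
the context comes along for free.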
> In contrast, in LR the context of the current match is given by the
> machine state and the stack of unreduced (as-yet unmatched) tokens.
> There is nothing readily available that could be used to provide a
> user meaningful message ... you'd have to examine the machine state to
> figure out even what you /might/ be looking for. Your position in the
> input is about as close as you can get to a meaningful message without
> the user code manually tracking context.
>
>
>>>> Lex/Flex on the other hand exist to process only finite-state
>>>> (regular) patterns. The FSM algorithms they use are more efficient than
>>>> any algorithm that can handle LALR(2), which is why these tools still
>>>> exist as independent tools.
>>>
>>> They exist separately because they were intended for different tasks
>>
>> The articles published at the time I first used them (in 1980) clearly
>> stated that the two tools were needed "because we don't have a single
>> algorithm that is equally efficient at both tokenisation and parsing".
>
> That was true, but wasn't the reason AT THE TIME they were written.
> They were separate first and foremost because they were written at
> different times. They never were combined because most machines of
> that time did not have enough memory to handle the analysis and
> recognizer generation for even moderately complex grammars ... making
> the tool larger by including lexing was out of the question.
>
> After a while, it was simply inertia that kept them from being
> combined. Everyone was used to the status quo and so even when memory
> sizes grew to the point where having a combination tool could be
> useful, very few people cared.
>
> Inertia is the reason why a lot of potentially interesting things
> never happened. Diversion is the other reason - the people who could
> have done it were doing other things.
>
>
>> Ken Thompson's implementation in the mid 1960s (documented in a 1968
>> CACM paper) translated the regexp into machine code. The list of
>> possible states was just a sequence of function call instructions.
>
> Yes, but Thompson's method was not widely used - again because of
> memory sizes. Most uses of regex used many patterns, and it was more
> efficient (memory-wise) to simply interpret the pattern directly: one
> driver function to handle N patterns.
>
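
The interpretive approach is essentially one small driver walking the
pattern text at run time -- pattern as data, not as compiled code, so
one function serves any number of patterns. A sketch in the spirit of
the matcher Kernighan and Pike published in "The Practice of
Programming" (literals, '.' and '*' only):

  // Forward declaration: match pattern re at the start of text.
  static int match_here(const char* re, const char* text);

  static int match_star(int c, const char* re, const char* text) {
      do {                               // zero or more of c, greedily
          if (match_here(re, text)) return 1;
      } while (*text != '\0' && (*text++ == c || c == '.'));
      return 0;
  }

  static int match_here(const char* re, const char* text) {
      if (re[0] == '\0') return 1;       // pattern exhausted: success
      if (re[1] == '*')  return match_star(re[0], re + 2, text);
      if (*text != '\0' && (re[0] == '.' || re[0] == *text))
          return match_here(re + 1, text + 1);
      return 0;
  }

  int match(const char* re, const char* text) {  // search anywhere
      do {
          if (match_here(re, text)) return 1;
      } while (*text++ != '\0');
      return 0;
  }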
>> The technique of converting multiple NFAs into a single DFA has also
>> been in use since the early 70s.
>
> Yes, and lex is from ~1973, IIRC. It was the first /publicly/
> available tool able to combine multiple NDFAs into a single DFA.
>
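
The classical construction at issue here is the subset (powerset)
construction, and it is short. A hypothetical sketch, ignoring epsilon
moves and accepting states; merging several NFAs amounts to starting
from the union of their start states:

  #include <map>
  #include <queue>
  #include <set>

  using NfaState = int;
  using NfaSet   = std::set<NfaState>;
  using Nfa      = std::map<NfaState, std::map<char, NfaSet>>;

  // Each DFA state is a *set* of NFA states.
  std::map<NfaSet, std::map<char, NfaSet>>
  subset_construct(const Nfa& nfa, const NfaSet& start) {
      std::map<NfaSet, std::map<char, NfaSet>> dfa;
      std::queue<NfaSet> work;
      work.push(start);
      while (!work.empty()) {
          NfaSet cur = work.front(); work.pop();
          if (dfa.count(cur)) continue;          // already expanded
          auto& row = dfa[cur];
          for (NfaState s : cur) {               // union all moves
              auto it = nfa.find(s);
              if (it == nfa.end()) continue;
              for (const auto& [sym, succs] : it->second)
                  row[sym].insert(succs.begin(), succs.end());
          }
          for (const auto& [sym, next] : row)
              work.push(next);                   // expand successors
      }
      return dfa;
  }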
>
>>> ANTLR implements LL(*) which is LL with unbounded lookahead.
>>
>> It's unbounded, but must be regular. Many languages (including my
>> Constellation Query Language) require unbounded non-regular look-ahead,
>> which PEG provides, at some extra cost in memory. But the pathological
>> cases which *require* memoization only occur rarely, so a global packrat
>> strategy is sub-optimal.
>>
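
For reference, the memoization itself is tiny -- the cost of packrat
parsing is the table, not the code. A hypothetical sketch, keyed by
(rule, input position):

  #include <cstddef>
  #include <map>
  #include <utility>

  struct MemoResult { bool matched; size_t end; };
  using MemoKey = std::pair<int, size_t>;       // (rule id, offset)

  static std::map<MemoKey, MemoResult> memo;

  MemoResult apply_rule(int rule, size_t pos) {
      auto it = memo.find(MemoKey(rule, pos));
      if (it != memo.end()) return it->second;  // cache hit
      MemoResult r = {false, pos};   // ... actually run the rule here ...
      memo[MemoKey(rule, pos)] = r;
      return r;
  }

Memoizing only the rules that actually get revisited, rather than every
(rule, position) pair, is the alternative to the global packrat
strategy being called sub-optimal here.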
>>> There
>>> are other LL(k) tools which require the programmer to choose a fixed
>>> amount of lookahead (and fail to process the grammar if the k value is
>>> too small). ANTLR analyzes the grammar and computes what lookahead is
>>> required pattern by pattern.
>>
>> That's a poor description of how it works. It looks ahead using an FA,
>> so lookahead must be regular ("Finite State").
>
> No. Lookahead (and backtracking, too) simply requires maintaining a
> queue of as-yet unmatched tokens. It certainly could be done by a
> state machine, but it does NOT require a state machine.
>
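
A minimal sketch of such a queue (hypothetical interface, not ANTLR's):
lookahead fills the buffer on demand, and backtracking is just moving
the cursor back.

  #include <functional>
  #include <string>
  #include <vector>

  struct Token { std::string text; };

  class TokenQueue {
      std::function<Token()> next;   // the lexer, supplied by caller
      std::vector<Token> buf;        // fetched but not yet consumed
      size_t cursor = 0;
  public:
      explicit TokenQueue(std::function<Token()> lexer)
          : next(std::move(lexer)) {}
      const Token& peek(size_t k) {          // look ahead k tokens
          while (cursor + k >= buf.size())
              buf.push_back(next());
          return buf[cursor + k];
      }
      size_t mark() const     { return cursor; }  // save position
      void   rewind(size_t m) { cursor = m; }     // backtrack
      void   consume()        { ++cursor; }       // accept one token
  };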
>
>>>> Anyone interested in the overlap between regular languages and finite
>>>> state machines should refer to the excellent
>>>> <https://github.com/katef/libfsm>.
>>
>> Did you look at the FSM on the main README page of that site?
>> It shows two RE's being combined into one DFA. Very neat stuff.
>
> I haven't examined their method. It may be that they have found some
> particularly efficient way to do it. That would be great. But
> algorithms for merging FAs in graph representation have been around at
> least since the 60s.
>
>
>>>> I'm currently building a generalised parsing engine that also has the
>>>> capability of processing arbitrary binary file and network stream
>>>> formats, using a VM approach that interprets something very like a BNF,
>>>> but in prefix notation (+a means one-or-more "a"s, not a+). It's tiny,
>>>> efficient, embeddable, but can take a protocol description in a very few
>>>> bytes of VM code to handle almost any new protocol or format. I don't
>>>> think that has been done before, and I've wanted to do it for 25 years.
>>>
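
For flavour only -- this is NOT Clifford's engine, just a toy showing
why prefix operators suit an interpreter: the operator arrives before
its operand, so the matcher can dispatch immediately, with no
restructuring of the pattern. Literals, '?' (optional) and '+'
(one-or-more) only; repetition is possessive, as in PEGs:

  // Advance past one term: any prefix operators, then one literal.
  const char* skip_term(const char* p) {
      while (*p == '?' || *p == '+') ++p;
      return p + 1;
  }

  // Match the single term at pat; on success, advance in.
  bool match_term(const char* pat, const char*& in) {
      switch (*pat) {
      case '?': {                          // optional: never fails
          const char* save = in;
          if (!match_term(pat + 1, in)) in = save;
          return true;
      }
      case '+': {                          // one-or-more, greedy
          const char* before = in;
          if (!match_term(pat + 1, in)) return false;
          while (in != before) {           // stop on no progress
              before = in;
              if (!match_term(pat + 1, in)) break;
          }
          return true;
      }
      default:                             // literal character
          if (*in != *pat) return false;
          ++in;
          return true;
      }
  }

  // Match a whole pattern (a sequence of terms) as a prefix of in.
  bool match(const char* pat, const char* in) {
      for (; *pat != '\0'; pat = skip_term(pat))
          if (!match_term(pat, in)) return false;
      return true;
  }

  // e.g. match("+ab", "aaab") succeeds; match("+ab", "b") fails.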
>>> I would be interested to see that (when it's finished, of course).
>>> Good luck!
>>
>> The putative grammar for Px is here (but this doesn't describe captures
>> fully):
>> <https://github.com/cjheath/strpp/blob/main/grammars/px.px>
>>
>> and the Pegexp engine is here (a template that I'm specialising to add
>> non-regular aka full LL grammar capability):
>> <https://github.com/cjheath/strpp/blob/main/include/pegexp.h>
>>
>> The Px grammar rewritten as a named-map of Pegexp expressions is here:
>> <https://github.com/cjheath/strpp/blob/main/test/peg_test.cpp#L55-L91>
>> but I'll use a better structure for a compiled Px grammar, so that names
>> don't need to be looked up at runtime.
>>
>> I've almost finished dicking with the structure of input streams that
>> will make it feasible for this to process data directly arriving on a
>> socket, and only caching as much as is needed for back-up and retry.
>> It's also possible to compile with/without UTF-8 support, but I can make
>> that more convenient. It's possible to specify binary matching even in a
>> Unicode parser though.
>>
>> I want captures to do things like turn the ASCII digits on an HTTP
>> Content-Length header into a binary integer, save that integer as a
>> capture variable, and use that variable to count bytes in a later
>> repetition. This will enable a simple grammar to describe all of HTTP/2.
>>
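
A hypothetical, grossly simplified illustration of that
capture-then-count idea in plain C++ (not Px notation): the digits
become an integer, and the integer bounds the later repetition.

  #include <cstddef>
  #include <string>

  struct Input { const char* p; const char* end; };

  // Matches: digits '\n' <exactly-that-many-bytes>
  bool counted_body(Input& in, std::string& body) {
      size_t len = 0;
      bool any = false;
      while (in.p < in.end && *in.p >= '0' && *in.p <= '9') {
          len = len * 10 + size_t(*in.p++ - '0');  // capture as binary
          any = true;
      }
      if (!any || in.p >= in.end || *in.p != '\n') return false;
      ++in.p;                                      // field terminator
      if (size_t(in.end - in.p) < len) return false;
      body.assign(in.p, len);                      // count len bytes
      in.p += len;
      return true;
  }

In the grammar-driven version, both the conversion and the counted
repetition are expressed in the grammar itself, which is the novel
part.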
>> By nesting parsers (incrementally feeding capture sections to a nested
>> parser) it should be possible to, for example, run a protocol engine that
>> generates an HTTP/2 request (generating from an HTTP request grammar),
>> parses the response chunks, feeds base64-encoded chunks into a
>> conversion function (not specified in Px), and the output of that
>> conversion into e.g. a JPEG parser that actually verifies the JPEG
>> format, and can e.g. extract (as a parse capture) the GPS location from
>> inside the Exif data attached... and all without having to extend or
>> recompile the engine. Just load the target grammar, and if it succeeds,
>> you get the GPS location... and all file formats have been validated.
>>
>> I envisage a world where the file-system is type-safe; almost no file is
>> a pure byte-stream, and it's not possible to save a JPEG file that
>> doesn't match the JPEG syntax. The file system must be pre-loaded with a
>> grammar for every new file type before writing such a file.
>>
>> Clifford Heath.
>
> George
>

