Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  newsreader  groups  login

Message-ID:  

If you're not part of the solution, you're part of the precipitate.


devel / comp.compilers / Re: Integer sizes and DFAs

SubjectAuthor
* Integer sizes and DFAsChristopher F Clark
`- Re: Integer sizes and DFAsKaz Kylheku

1
Integer sizes and DFAs

<22-03-071@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=348&group=comp.compilers#348

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: christop...@compiler-resources.com (Christopher F Clark)
Newsgroups: comp.compilers
Subject: Integer sizes and DFAs
Date: Sat, 26 Mar 2022 17:16:15 +0200
Organization: Compilers Central
Lines: 27
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-071@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="83479"; mail-complaints-to="abuse@iecc.com"
Keywords: design
Posted-Date: 26 Mar 2022 12:22:25 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Christopher F Clark - Sat, 26 Mar 2022 15:16 UTC

One of the posters mentioned that ignoring the size of integers used
in the DFAs was "not computer science". Actually, it is. It's called
the RAM model of computing. And, in it you assume each "integer"
occupies one location and is also sufficient to address any memory
location. Sa, accessing any integer any where in memory is an O(1)
operation.

Note how that even works for the DNA DFAs, assuming 32 bit integers
and a byte addressed machine. With that model, you can fit 10 million
(1E7) states in the 2 billion byte (2GB) address space (each state has
4, 4 byte entries or 16 bytes). You might even fit something over 100
million (1E8) states there. Thus, your memory access time is linear
in the worst case (assuming every request is a cache miss).

With 64 bit integers, one is essentially only limited by total memory
(including disk) and it is still linear although the worst case time
is every access is a page fault. Probably not particularly practical,
but CS is about theory and how you model it.

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Re: Integer sizes and DFAs

<22-03-072@comp.compilers>

  copy mid

https://www.novabbs.com/devel/article-flat.php?id=349&group=comp.compilers#349

  copy link   Newsgroups: comp.compilers
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: 480-992-...@kylheku.com (Kaz Kylheku)
Newsgroups: comp.compilers
Subject: Re: Integer sizes and DFAs
Date: Sat, 26 Mar 2022 18:39:53 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 57
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-072@comp.compilers>
References: <22-03-071@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="19106"; mail-complaints-to="abuse@iecc.com"
Keywords: performance, architecture, comment
Posted-Date: 26 Mar 2022 14:49:17 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
 by: Kaz Kylheku - Sat, 26 Mar 2022 18:39 UTC

On 2022-03-26, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> One of the posters mentioned that ignoring the size of integers used
> in the DFAs was "not computer science". Actually, it is. It's called
> the RAM model of computing. And, in it you assume each "integer"
> occupies one location and is also sufficient to address any memory
> location. Sa, accessing any integer any where in memory is an O(1)
> operation.

The O notation is based on the independent variable(s) like n being able
to taken on unlimited, arbitrarily large values.

In certain common situations we model elementary operations as being
O(1), with certain justifications. That simplifies the analysis, while
leaving its results still relevant and true for practical applications
running on present hardware.

The justification needs to be articulated though.

> Note how that even works for the DNA DFAs, assuming 32 bit integers
> and a byte addressed machine. With that model, you can fit 10 million
> (1E7) states in the 2 billion byte (2GB) address space (each state has
> 4, 4 byte entries or 16 bytes).

Each 32 bit state has 4 byte entries only if it doesn't express that
it branches to other states for certain input symbols.

A DFA state is a row in a table, not just the number which
identifies/enumerates that row.

If the input symbols are bytes, then the abstract row size capped to 256
entries.

(Larger symbol sets can exist (e.g. the example of Unicode) but
it's a reasonabl assumption that the symbol set is finite. Still, it can
be large.)

If we have a DFA built from certain lexical rules, and we extend those
rules, it could be the case that existing states blow up in size,
because of having to branch to many more other states on new characters
that were not included in the old rules.

If a machine initially recongnizes the set of strings { "foo", "bar" }
then its state zero might have a condensed representation of two
transitions on the sybmols 'f' and 'b', such as a tiny two-element
array that is subject to a linear search.

If we extend the recognized set of strings, state 0 may have to branch
on more symbols than just 'f' and 'b'. Depending on how we are
representing states, the size of the object may change.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[No existing computer is actually Turing equivalent because every real
computer has a finite memory, but we manage to do O(whatever) calculations
anyway. I'm not sure how productive this line of argument is beyond noting
what sorts of tasks are or aren't likely to fit in real computers. -John]

1
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor