about line atomizer

Newsgroups: comp.lang.c
 by: fir - Wed, 27 Apr 2022 14:18 UTC

I have written before about the code splitter (which breaks an input text
file into logical lines). I find this splitter very important, so I have
written about it many times. The second important thing is the line
atomizer, which in turn breaks a logical line into atoms.

Note that I write about this because I find it not just a piece of code to
discuss but something with some scientific value. I also find it quite
original, I guess. I don't see people writing this way, though to me this
method seems superior (at least in terms of clarity).

What I have is a basic atomizer. It splits a line into alphanumeric words of various lengths and also returns punctuation marks, one by one.
So if I atomize

void foo(int x, int y) {

I get this result:

1: "void"
2: "foo"
3: "("
4: "int"
5: "x"
6: ","
7: "int"
8: "y"
9: ")"
10: "{"

So there are only 2 types of atoms, I would say.

I based the code of furia on this. I omitted quoted literals ("aaaaaas ss"
and ' djkj dkj'), and this was a mistake because I should still add them as a 3rd type of atom.
I guess I should also add things like 23.67876 as a single atom. I should probably add comments as another type of atom too.
I am also wondering whether to treat $ and @ as alphanumeric characters (so they could be part of
alphanumeric words rather than being taken as standalone operators/punctuation),
and whether to add #fffff as a single atom (I think I could take # as a hex prefix,
a shorter alternative to 0xfffff).

What else do you think I could treat as a single atom?

The basic atomizer code is:

chunk FindFirstAtom_4furia(chunk txt) // not generalized
{
    chunk result = {0};

    if(ChunkIsEmpty(txt))
        return result;

    int len = ChunkLength(txt);

    /// find beginning: skip whitespace up to the first punctuation or alphanumeric char
    int i;

    for(i=0; i<len; i++)
    {
        if(IsWhiteSpace(txt.beg[i])) {
            continue;
        }

        if(IsPunctuation(txt.beg[i])) {
            // a punctuation atom is always exactly one char long
            result.beg = &txt.beg[i];
            result.end = &txt.beg[i];
            return result;
        }

        if(IsAlpha(txt.beg[i])) {
            result.beg = &txt.beg[i];
            goto find_alpha_atom_end;
        }
    }

    return result; // beginning not found

find_alpha_atom_end:

    // extend the alphanumeric atom until whitespace or punctuation
    for(int j=i+1; j<len; j++)
    {
        if(IsAlpha(txt.beg[j])) {
            continue;
        }

        if(IsWhiteSpace(txt.beg[j])) {
            result.end = &txt.beg[j-1];
            return result;
        }

        if(IsPunctuation(txt.beg[j])) {
            result.end = &txt.beg[j-1];
            return result;
        }
    }

    result.end = &txt.beg[len-1]; // atom runs to the end of the text
    return result;
}

chunks LineAtomizer4furia(chunk txt)
{
    static chunk* internal_ram_for_chunks = NULL;

    int chunks_number_reserved_storage = 1000;

    internal_ram_for_chunks = (chunk*) realloc(internal_ram_for_chunks, chunks_number_reserved_storage * sizeof(chunk) );

    chunks results = {0};

    ///////// collect them

    chunk txt4scan = txt;

    int n;

    for( n=0; ; n++)
    {
        chunk atom = FindFirstAtom_4furia(txt4scan);

        // printf("\n>>>");
        // PrintChunk(atom);

        if(ChunkIsEmpty(atom))
        {
            // n == 0 leaves last one slot before first
            results.first = &internal_ram_for_chunks[0];
            results.last = &internal_ram_for_chunks[n-1];

            return results;
        }

        // advance only after the empty check, so atom.end is known to be valid
        txt4scan.beg = atom.end + 1;

        if( n >= chunks_number_reserved_storage ) // resize storage before writing slot n
        {
            chunks_number_reserved_storage*=4;
            internal_ram_for_chunks = (chunk*) realloc(internal_ram_for_chunks, chunks_number_reserved_storage * sizeof(chunk) );
        }

        internal_ram_for_chunks[n] = atom;
    }
}

void TestFuriaAtomizer()
{
    chunk ch = Chunk(" internal_ram_for_chunks = #22fa34e 0x44zfa3 0x32af2 3230z09 \"sass ass\" s 'aasdd2as asd' `sa d2 2#$` (chunk*) realloc(internal_ram_for_chunks, chunks_number_reserved_storage * 298728/ sizeof(chunk) ); ");
    chunks atoms = LineAtomizer4furia(ch);
    PrintChunks(atoms);
}
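
The post does not show the chunk type or the helpers the code calls (Chunk, ChunkIsEmpty, ChunkLength, IsWhiteSpace, IsPunctuation, IsAlpha). A minimal sketch of what they might look like follows. Everything in it is an assumption made for illustration, not fir's actual code: a chunk is taken to be a pair of inclusive pointers, an empty chunk has beg == NULL (or end before beg), and IsAlpha is taken to accept letters, digits and underscore, since the atomizer treats 1zzz and z111 as words.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: [beg, end] is an inclusive range; {NULL, NULL} is empty. */
typedef struct { const char *beg; const char *end; } chunk;

/* Assumed layout: [first, last] is an inclusive range of chunk slots. */
typedef struct { chunk *first; chunk *last; } chunks;

static int ChunkIsEmpty(chunk c)
{
    return c.beg == NULL || c.end < c.beg;
}

static int ChunkLength(chunk c)
{
    return ChunkIsEmpty(c) ? 0 : (int)(c.end - c.beg) + 1;
}

/* Wrap a NUL-terminated string; the chunk does not own the memory. */
static chunk Chunk(const char *s)
{
    assert(s != NULL);
    chunk c = {0};
    size_t n = strlen(s);
    if (n > 0) {
        c.beg = s;
        c.end = s + n - 1;
    }
    return c;
}

static int IsWhiteSpace(char ch)
{
    return isspace((unsigned char)ch) != 0;
}

/* "Alpha" here means any character that may appear inside a word atom. */
static int IsAlpha(char ch)
{
    return isalnum((unsigned char)ch) || ch == '_';
}

static int IsPunctuation(char ch)
{
    return ispunct((unsigned char)ch) && ch != '_';
}
```

Treating "end before beg" as empty matters for the loop in LineAtomizer4furia: after the last atom, txt4scan.beg moves one past txt4scan.end, and the scan has to report an empty result at that point for the loop to terminate.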

Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 15:54 UTC

For example, I am confused about what to treat as a number chunk.
Right now I just treat all alphanumeric particles as atoms (like zzzz 1111 1zzz z111).
I could probably treat those starting with a digit 0-9 as numbers (including 1zzzzz),
and maybe everything starting with # as numbers too (not fully sure about this).
To me #ff is better than 0xff (it looks better and is shorter).
There is also the problem of the dot: if 1zzz is a number, do I atomize the dot as part of the
number, as in 1zzz.zzz, or treat it as 3 atoms? (And what about 1zz.zz.zz.z.zz?) 1.1 is a number, 1.a is probably a number, a.a is 3 atoms I guess, but is a.1 3 atoms too? Or maybe two?
This dot issue is particularly unclear, imo.

Re: about line atomizer

 by: Scott Lurndal - Wed, 27 Apr 2022 16:05 UTC

fir <profesor.fir@gmail.com> writes:
>This dot issue is particularly unclear, imo.

Rather than re-inventing the wheel, why don't you just use
lex(1) to generate your tokenizer?

DIGIT   [0-9]
ID      [a-zA-Z_][a-zA-Z0-9_]*

%%

{DIGIT}+ {
    push_uint64(yyextra, strtoul(yytext, NULL, 10));
}

0[xX][0-9a-fA-F]+ {
    push_uint64(yyextra, strtoul(yytext, NULL, 16));
}

n[0-9]"."[a-zA-Z0-9_]+"["[0-9]"]" {
    push_identifier(yyextra, yytext);
}

{ID} {
    push_identifier(yyextra, yytext);
}

"+"|"-"|"*"|"/"|"%"|"<<"|">>"|"&"|"|"|"^"|"=="|"!="|"<="|">="|"<"|">"|"=" {
    push_opname(yyextra, yytext);
}

"(" {
    push_lparen(yyextra);
}

")" {
    push_rparen(yyextra);
}

Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 16:28 UTC

On Wednesday, 27 April 2022 at 18:05:50 UTC+2, Scott Lurndal wrote:
> Rather than re-inventing the wheel, why don't you just use
> lex(1) to generate your tokenizer?
I'd say I'm inventing a new wheel rather than reinventing the old one, and that's a bit of
a difference.
I don't know what this lex (or yacc) is; it always seemed too boring to learn, but if someone wants
to say something about it I may read it.

This splitter+atomizer approach (based on chunks) is especially good for several reasons, imo.
(It is a bit sad that I haven't invented anything new since I worked out this splitter+atomizer, and
that was already a few years ago.)

Re: about line atomizer

 by: Ben - Wed, 27 Apr 2022 16:28 UTC

fir <profesor.fir@gmail.com> writes:

> What else do you think I could treat as a single atom?

That depends on your objectives, but even for a simple subset of C you
should probably recognise sequences like ++, --, += (etc), ..., ->, <<
and so on as being single tokens.

As for numbers, "1ull" is a number in C as are "0x.1P-1" and "1f". And
in C23 so is ".0dd".

--
Ben.

Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 17:09 UTC

On Wednesday, 27 April 2022 at 18:28:17 UTC+2, Ben wrote:
> > What else do you think I could treat as a single atom?
> That depends on your objectives, but even for a simple subset of C you
> should probably recognise sequences like ++, --, += (etc), ..., ->, <<
> and so on as being single tokens.

I thought about it (because I also see that most punctuation marks work alone but some of them work in pairs, like ++ -- // += etc.), but in the end this may not be such a good idea (I'm still not sure). The reason is that if
later code checks things like atom[i]=='+', then writing atom[i]=='+' && atom[i+1]=='=' does no great harm, and
being able to assume that every punctuation atom is 1 char long seems to give me more in that later code. I am not sure,
though, but for now I have decided not to do it (if I did, I would generally do more at the atom level
than just a few cases like '++', and so far I haven't seen how much more I could do; this is still something to consider).
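
One way to keep the atomizer emitting only 1-char punctuation while still recognising ++ or += later is to let the next layer test whether two punctuation atoms touch in the source text. A minimal sketch, assuming the inclusive [beg, end] chunk layout suggested by the posted code; the function name IsTwoCharOperator and its signature are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive [beg, end] range, as the posted code appears to use (assumed layout). */
typedef struct { const char *beg; const char *end; } chunk;

/* Returns 1 if atoms[i] and atoms[i+1] are the single characters a and b
   with nothing between them in the source line, otherwise 0. */
static int IsTwoCharOperator(const chunk *atoms, size_t count, size_t i,
                             char a, char b)
{
    assert(atoms != NULL);
    if (i + 1 >= count)
        return 0;

    const chunk *x = &atoms[i];
    const chunk *y = &atoms[i + 1];

    if (x->beg != x->end || y->beg != y->end)
        return 0;                      /* both atoms must be 1 char long */
    if (x->end + 1 != y->beg)
        return 0;                      /* must touch: "+=" but not "+ =" */

    return *x->beg == a && *y->beg == b;
}
```

The adjacency test is the part that a plain atom[i]=='+' && atom[i+1]=='=' check would miss: without it, "+ =" written with a space would also be accepted as +=.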

Normally, after layer 1 (the splitter / logical lines) and layer 2 (atoms) I go straight to a forest of ifs (small or big, however you want to call it). However, I noticed recently that I may need something more between the atoms and the ifs. At the moment I'm not sure what to call it. It would be, for example, a piece of code that presents
expressions to me, as in a function call argument list:
boo(x+x+2, foo(a+b/2,f()*3+zise,7*foo2.z.z.3) );

I have code that emits assembly for expressions (only simple ones right now), but in the example above there are 4 of them, so it seems I may need some 3rd layer after the splitter and atomizer. I'm not sure yet what to think about this; maybe it's a kind of 'unwinder' (code that unwinds things and produces a sequential list?).
I don't have much of an idea, but the point is that the atom level should produce atoms in the form that is simplest for the next layer to work with. For sure, giving "akjhsakjh21298&^%&^" as a single atom is very helpful here, and the same goes for numbers like 23.233d; giving ++ as a single atom gives much less. Still, it's kind of unclear,
I agree. ++ goes in the opposite direction (or at least it can be seen that way): the atomizer breaks things into atoms, and then some later layer could recognize particles like ++ or the foo2.z.z.3 above (whatever that is; z.3 could be seen as a shortcut for z[3], or as a structure field if structures allowed fields to be named with numbers).

> As for numbers, "1ull" is a number in C as are "0x.1P-1" and "1f". And
> in C23 so is ".0dd".

A conservative view would be to treat a number as an alphanumeric run beginning with a digit or a dot and containing
at most one dot.
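
This conservative rule can be sketched as a small scanner. The function name NumberAtomLength is hypothetical, and one detail is an added assumption not stated in the post: a leading dot only starts a number when a digit follows it, so that a lone "." stays punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the number atom at the start of s[0..len), or 0 if s does not
   start with a digit or with a dot followed by a digit.
   Rule: letters and digits continue the atom, and at most one dot is allowed. */
static size_t NumberAtomLength(const char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;

    int starts_digit = isdigit((unsigned char)s[0]) != 0;
    int starts_dot = s[0] == '.' && len > 1 && isdigit((unsigned char)s[1]);
    if (!starts_digit && !starts_dot)
        return 0;

    size_t i = 0;
    int dots = 0;
    while (i < len) {
        unsigned char ch = (unsigned char)s[i];
        if (ch == '.') {
            if (dots == 1)
                break;          /* a second dot ends the atom */
            dots++;
        } else if (!isalnum(ch)) {
            break;              /* whitespace or punctuation ends the atom */
        }
        i++;
    }
    return i;
}
```

Under this rule 11.22ssz.3 yields the atom 11.22ssz and leaves .3 for the next scan, and 1ull comes out whole. Ben's 0x.1P-1 is cut after 0x.1P, because the rule has no notion of an exponent sign.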

This, however, eventually leads to the existence of bad atoms. I don't remember exactly, but in my simple splitter bad atoms do not exist because you only get 2 kinds of them: either a 1-char punctuation symbol or an n-char alphanumeric word. Now it would yield bad, weird atoms too.


Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 17:31 UTC

Sorry, that was probably extremely hard to read.

What I think is that I need to balance clarity of the rules against usability of the outcome, and those dotted numbers are a problem here: they need to be atomized for usability, but their form breaks clarity.

For example, I'm fairly sure I shouldn't atomize a.b.c as one atom (because then I would need to split it again in the compiler), but I definitely need to treat 11.22 as one. Now, things like 11.22ssz lead to either an extended number or a bad atom (I would rather think of it as an extended number), but 11.22ssz.3.s.s.s$zzz.z.z.33.@zz is almost surely a bad atom (or its beginning is a number wrongly attached to a series of atoms). I'm not sure what decision to make here.

Quotes, by the way, also lead to bad atoms, because I'm sure quoted literals should always look like "aa jlk k " (beginning and ending with a quote mark), so aa"xx" or "aaa"z already makes bad atoms. So I'm not sure whether I should raise a compile error or allow that.
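
A quoted literal that must begin and end with the same quote mark can be scanned on its own, which leaves the error-or-allow decision to the caller. A rough sketch under assumptions that are not in the post: the name QuotedAtomLength is hypothetical, there are no escape sequences, and the three quote characters are the ones appearing in the test string of the first message.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the quoted literal at the start of s[0..len), including both
   quote marks, or 0 if s does not start with a quote mark or the literal
   is never closed. Assumption: no escape sequences; the opening character
   is also the closing one. */
static size_t QuotedAtomLength(const char *s, size_t len)
{
    assert(s != NULL);
    if (len == 0)
        return 0;

    char q = s[0];
    if (q != '"' && q != '\'' && q != '`')
        return 0;

    for (size_t i = 1; i < len; i++) {
        if (s[i] == q)
            return i + 1;   /* include the closing quote mark */
    }
    return 0;               /* unterminated: the caller decides if this is an error */
}
```

With a scanner like this, "aaa"z gives the atom "aaa" and leaves z for the next scan. Whether a word glued directly to a literal should then be reported as an error is a separate check on the neighbouring atoms.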

Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 17:57 UTC


Maybe I will do it this way: aa"xx" is two atoms, aa and "xx".
11.22ssz.3.s.s.s$zzz.z.z.33.@zz I will treat as a number (an identifier is letters, digits and _$@; a number is the same but begins with a digit and may also contain dots. I forgot that it may also have a minus sign, and there is the question of whether to allow a space between the minus and the digit; I will probably disallow that space).

Treating this as a number just seems simpler than working out where to cut it, and if I cut it at the second dot, then seeing the rest as numbers, punctuation and identifiers glued together seems too weird.

I'm not quite happy with this state of things, as it is a bit of a mess, imo.

Damn, I'm not even sure of that; maybe I should be stricter and treat only the 11.22 above as a number and the part containing .3. as alphanumeric... damn problematic case.

Re: about line atomizer

 by: fir - Wed, 27 Apr 2022 18:23 UTC


The dotted and negative numbers cause so much trouble that I seriously doubt whether I should handle them on this atomic layer at all. For example, take -11.22: this seems to be a number, but it also depends on what comes before and after, as in 11-11.22.3 etc. The rules are so complex that maybe, sadly, it shouldn't be resolved here. These cases seem to be proof that it shouldn't: this syntax is too bad to fit at this level. It would fit if a number had a definite form, like a quoted string ("asasa"): an opening quote, some characters, a closing quote, where that type of quote would mean "this is a number". Without that it is too messy.

Re: about line atomizer

<e729d8da-b5a4-4de1-8a89-0810722ecfc8n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21516&group=comp.lang.c#21516

X-Received: by 2002:a0c:e906:0:b0:456:540b:4e87 with SMTP id a6-20020a0ce906000000b00456540b4e87mr2071823qvo.47.1651087609652;
Wed, 27 Apr 2022 12:26:49 -0700 (PDT)
X-Received: by 2002:a37:b605:0:b0:69e:6d6f:aea7 with SMTP id
g5-20020a37b605000000b0069e6d6faea7mr17000428qkf.655.1651087609489; Wed, 27
Apr 2022 12:26:49 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 27 Apr 2022 12:26:49 -0700 (PDT)
In-Reply-To: <899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.138; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.138
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e729d8da-b5a4-4de1-8a89-0810722ecfc8n@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Wed, 27 Apr 2022 19:26:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 25
 by: fir - Wed, 27 Apr 2022 19:26 UTC

On Wednesday, 27 April 2022 at 20:23:08 UTC+2, fir wrote:
> >
> > Damn, I'm not even sure of that. Maybe I should be stricter and treat only the 11.22 above as a number and the part containing .3. as alphanumeric... damn problematic case.
> The dotted and negative numbers cause so much trouble that I seriously doubt whether I should handle them on this atomic layer at all. For example, take -11.22: this seems to be a number, but it also depends on what comes before and after, as in 11-11.22.3 etc. The rules are so complex that maybe, sadly, it shouldn't be resolved here. These cases seem to be proof that it shouldn't: this syntax is too bad to fit at this level. It would fit if a number had a definite form, like a quoted string ("asasa"): an opening quote, some characters, a closing quote, where that type of quote would mean "this is a number". Without that it is too messy.

In the end I don't know... I wrote two versions: one that does not glue 11.22-type atoms and another that does (using the previously mentioned rule: if an atom starts with a digit, a dot does not end it).

I would need to check which is more practical. The gluing version seems like it could be more convenient to use, but writing something like GiveNumber(i), which checks whether there is an integer chunk or an integer-dot-integer sequence, is probably not much more complex, and the atom layer is cleaner, so I don't know... It seems I had more will and skill to resolve this type of problem back then (maybe my head has reshaped into a less focused state).
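
A minimal sketch of what such a GiveNumber(i)-style check could look like on top of the non-gluing atomizer. It assumes the atoms are available as an array of NUL-terminated strings (the posts use a chunk type that is not shown here), and the names is_integer_atom and number_atoms_at are hypothetical:

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is a non-empty run of decimal digits, 0 otherwise. */
static int is_integer_atom(const char *s)
{
    if (*s == '\0') return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Hypothetical stand-in for GiveNumber(i): reports how many atoms,
   starting at index i, form a number.
   0 = no number here, 1 = integer, 3 = integer "." integer. */
static int number_atoms_at(const char *const *atoms, int n, int i)
{
    if (i < 0 || i >= n || !is_integer_atom(atoms[i])) return 0;
    if (i + 2 < n && strcmp(atoms[i + 1], ".") == 0
                  && is_integer_atom(atoms[i + 2]))
        return 3;
    return 1;
}
```

With the atoms "x" "=" "11" "." "22", calling number_atoms_at at index 2 reports three atoms, so the caller can consume 11.22 as one number while the atom layer itself stays free of number rules.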

Re: about line atomizer

<87h76emf9v.fsf@bsb.me.uk>

https://www.novabbs.com/devel/article-flat.php?id=21517&group=comp.lang.c#21517

Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ben.use...@bsb.me.uk (Ben)
Newsgroups: comp.lang.c
Subject: Re: about line atomizer
Date: Wed, 27 Apr 2022 21:40:12 +0100
Organization: A noiseless patient Spider
Lines: 19
Message-ID: <87h76emf9v.fsf@bsb.me.uk>
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk>
<54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com>
<3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: reader02.eternal-september.org; posting-host="ea2fbcde32935fc0fbc4de3f4e5a010f";
logging-data="21047"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19nLRxOjQJbBbuavSXi0Iuc/CF8XHFQBRc="
Cancel-Lock: sha1:p1HXxewQLvIVII0TqB7oh/sgvyM=
sha1:zdNh9bNe5ej1tbXa1Bh9VWD9c+w=
X-BSB-Auth: 1.6ad1a7c093192a09b75e.20220427214012BST.87h76emf9v.fsf@bsb.me.uk
 by: Ben - Wed, 27 Apr 2022 20:40 UTC

fir <profesor.fir@gmail.com> writes:

> For example, take -11.22: this seems to be a number, but it also depends
> on what comes before and after, as in 11-11.22.3 etc.

No. In C, -11.22 is not a number, it's an operator followed by a number
(a floating point constant).

> ...the rules are so complex that maybe, sadly, it shouldn't be
> resolved here.

The generally accepted technique is to split the input into lexical
tokens and then apply a parser to that token stream.

All of these are well-understood problems with well-known and effective
solutions.
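
As a rough illustration of that first step, here is a simplified hand-written scanner. It is a sketch only, not C's actual lexical grammar, and the names token_kind, token and next_token are made up for this example. It shows how -11.22 comes out as a punctuator followed by a number, with no special case for the minus sign:

```c
#include <ctype.h>
#include <stddef.h>

enum token_kind { TOK_END, TOK_NUMBER, TOK_IDENT, TOK_PUNCT };

struct token {
    enum token_kind kind;
    const char *start;   /* points into the input, not NUL-terminated */
    size_t len;
};

/* Scans one token starting at *p and advances *p past it. */
static struct token next_token(const char **p)
{
    const char *s = *p;
    struct token t;

    while (isspace((unsigned char)*s)) s++;
    t.start = s;
    if (*s == '\0') {
        t.kind = TOK_END;
    } else if (isdigit((unsigned char)*s)) {
        /* digits, optionally followed by one '.' and more digits */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*s)) s++;
        if (*s == '.' && isdigit((unsigned char)s[1])) {
            s++;
            while (isdigit((unsigned char)*s)) s++;
        }
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)*s) || *s == '_') s++;
    } else {
        /* every other character is a one-character punctuator */
        t.kind = TOK_PUNCT;
        s++;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}
```

With this rule set, 11-11.22.3 splits into 11, -, 11.22, ., 3, and deciding whether that sequence means anything is left to the parser.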

--
Ben.

Re: about line atomizer

<e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21518&group=comp.lang.c#21518

X-Received: by 2002:a05:620a:2683:b0:69c:8c9c:5f80 with SMTP id c3-20020a05620a268300b0069c8c9c5f80mr17907275qkp.367.1651094390580;
Wed, 27 Apr 2022 14:19:50 -0700 (PDT)
X-Received: by 2002:ac8:5b86:0:b0:2e2:72c:9e06 with SMTP id
a6-20020ac85b86000000b002e2072c9e06mr20245078qta.113.1651094390418; Wed, 27
Apr 2022 14:19:50 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 27 Apr 2022 14:19:50 -0700 (PDT)
In-Reply-To: <87h76emf9v.fsf@bsb.me.uk>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.116; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.116
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com> <87h76emf9v.fsf@bsb.me.uk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Wed, 27 Apr 2022 21:19:50 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 20
 by: fir - Wed, 27 Apr 2022 21:19 UTC

On Wednesday, 27 April 2022 at 22:40:26 UTC+2, Ben wrote:
> fir <profes...@gmail.com> writes:
>
> > For example, take -11.22: this seems to be a number, but it also depends
> > on what comes before and after, as in 11-11.22.3 etc.
> No. In C, -11.22 is not a number, it's an operator followed by a number
> (a floating point constant).
> > ...the rules are so complex that maybe, sadly, it shouldn't be
> > resolved here.
> The generally accepted technique is to split the input into lexical
> tokens and then apply a parser to that token stream.
>
> All of these are well-understood problems with well-known and effective
> solutions.
>
Sort of, but not fully... As far as I know, they don't use resizable arrays of chunks, and I think some decisions about how to view things may differ. Also note that my goal is not to write a compiler but to rethink some of the inner workings in the way I see them.

Re: about line atomizer

<d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21519&group=comp.lang.c#21519

X-Received: by 2002:a05:620a:a18:b0:69f:97af:c06f with SMTP id i24-20020a05620a0a1800b0069f97afc06fmr2295080qka.561.1651097928771;
Wed, 27 Apr 2022 15:18:48 -0700 (PDT)
X-Received: by 2002:a05:6214:2627:b0:446:6577:6ef8 with SMTP id
gv7-20020a056214262700b0044665776ef8mr21417818qvb.85.1651097928641; Wed, 27
Apr 2022 15:18:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 27 Apr 2022 15:18:48 -0700 (PDT)
In-Reply-To: <e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.99; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.99
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com> <87h76emf9v.fsf@bsb.me.uk>
<e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Wed, 27 Apr 2022 22:18:48 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 81
 by: fir - Wed, 27 Apr 2022 22:18 UTC

On Wednesday, 27 April 2022 at 23:19:57 UTC+2, fir wrote:
> On Wednesday, 27 April 2022 at 22:40:26 UTC+2, Ben wrote:
> > fir <profes...@gmail.com> writes:
> >
> > > For example, take -11.22: this seems to be a number, but it also depends
> > > on what comes before and after, as in 11-11.22.3 etc.
> > No. In C, -11.22 is not a number, it's an operator followed by a number
> > (a floating point constant).
> > > ...the rules are so complex that maybe, sadly, it shouldn't be
> > > resolved here.
> > The generally accepted technique is to split the input into lexical
> > tokens and then apply a parser to that token stream.
> >
> > All of these are well-understood problems with well-known and effective
> > solutions.
> >
> Sort of, but not fully... As far as I know, they don't use resizable arrays of chunks, and I think some decisions about how to view things may differ. Also note that my goal is not to write a compiler but to rethink some of the inner workings in the way I see them.

Specifically, I'm not convinced that compilers are written in such a nice form as this:

int main(void)
{
    chunk d = LoadChunk(input_file_name);
    chunks lines = SplitOnLogLines4Furia(d);
    int n = ChunksLength(lines);
    for(int i=0; i<n; i++)
    {
        chunk current_line = lines.first[i];
        if(ChunkIsEmpty(current_line)) continue;
        chunks atoms = LineAtomizer4furia(current_line);
        int atoms_len = ChunksLength(atoms);
        if(atoms_len<1) continue;

        // access the atoms of each line here, emit assembly
    }
}

Maybe I'm reinventing things, but I haven't heard of this, and I see many people do text processing
in C in a much worse way.

This piece of code above shows, by the way, why clarity about what the splitter produces and what the atomizer produces matters in the inner loop.

This code is clean, in my opinion, but the attempts to atomize numbers turned out to be messy.

I'm not sure whether I will get tired of this within the month and only come back to it some time in the future,
but if not, maybe I will say something about the inner constructions in that inner loop
(which for now I have more or less prototyped and badly written; I'm not sure yet whether some relatively clean schemes will grow out of them).

For example, from what I was posting, some may have seen that there is a scheme of realloc
with a static seed contained in a function, so that it works like a resizable array hidden inside the function.
This seems to be a good candidate for such a scheme.
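
A minimal sketch of that realloc-with-static-seed idea, assuming a plain array of int for simplicity (the posts store chunks, whose type is not shown). The function name add_int and the growth policy are illustrative only:

```c
#include <stdlib.h>

/* Appends value to a growable int array that lives inside this function.
   Returns the array base, or NULL if growing failed (the previously
   stored elements are kept in that case). If count is not NULL, it
   receives the number of stored elements after a successful append. */
static int *add_int(int value, size_t *count)
{
    static int *items = NULL;    /* the "static seed" */
    static size_t used = 0;
    static size_t capacity = 0;

    if (used == capacity) {
        size_t new_capacity = capacity ? capacity * 2 : 16;
        int *grown = realloc(items, new_capacity * sizeof *grown);
        if (grown == NULL) return NULL;
        items = grown;
        capacity = new_capacity;
    }
    items[used++] = value;
    if (count) *count = used;
    return items;
}
```

The caller never sees the capacity bookkeeping, which is what keeps the call sites short. The cost is that there is exactly one such array per function, and the function is not reentrant or thread-safe.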

This way, for example, a scanning pass in which you need to build information on the variables defined in
the source could be done like this:

for(int i=0; i<n; i++)
{
    AddVariableDefinition( FindVariableDefinition(current_line_atoms));
    // say FindVariableDefinition returns an int2 structure with the indices of atoms like "int" "size_x"
    // (which I need later, for example when compiling expressions, to know what is int and what is float)
}

I find some sense in searching for this kind of 'streamlined' pattern.

Re: about line atomizer

<05c6a98e-1ff4-4fca-aa2a-779a688e9d3en@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21520&group=comp.lang.c#21520

X-Received: by 2002:a05:620a:2552:b0:67b:32e2:2400 with SMTP id s18-20020a05620a255200b0067b32e22400mr17114507qko.768.1651100168828;
Wed, 27 Apr 2022 15:56:08 -0700 (PDT)
X-Received: by 2002:a05:622a:14cb:b0:2e1:9fc5:424d with SMTP id
u11-20020a05622a14cb00b002e19fc5424dmr20631035qtx.543.1651100168693; Wed, 27
Apr 2022 15:56:08 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Wed, 27 Apr 2022 15:56:08 -0700 (PDT)
In-Reply-To: <d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.206; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.206
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com> <87h76emf9v.fsf@bsb.me.uk>
<e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com> <d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <05c6a98e-1ff4-4fca-aa2a-779a688e9d3en@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Wed, 27 Apr 2022 22:56:08 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 27
 by: fir - Wed, 27 Apr 2022 22:56 UTC

On Thursday, 28 April 2022 at 00:18:56 UTC+2, fir wrote:
>
> for(int i=0; i<n; i++)
> {
> AddVariableDefinition( FindVariableDefinition(current_line_atoms));
> // say FindVariableDefinition returns an int2 structure with the indices of atoms like "int" "size_x"
> // (which I need later, for example when compiling expressions, to know what is int and what is float)
>
> }
>
This example is, by the way, a bit fictional. I do have such code for scanning these variables (and it is a piece of cake to write in half an hour or so), but in reality FindVariableDefinition does not return the info; instead it calls AddVariableDefinition in a couple of places inside its inner tree. (This is, for example, because it also keeps track of which function scope it is currently in, which it also recognizes, and which is trivial.)

All in all, I'm not sure what comes of this fictional example, whether nothing or something. If that piece of code can't be shaped this way, maybe it can be shaped in some other streamlined way. What I find interesting is finding ways to shape code nicely, and maybe I will find something interesting yet.

> I find some sense in searching for this kind of 'streamlined' pattern.

Re: about line atomizer

<312f1100-cd1c-4006-b21b-dc82103af3dfn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21574&group=comp.lang.c#21574

X-Received: by 2002:a05:620a:c86:b0:69f:c7cb:935a with SMTP id q6-20020a05620a0c8600b0069fc7cb935amr3649332qki.229.1651409840593;
Sun, 01 May 2022 05:57:20 -0700 (PDT)
X-Received: by 2002:a05:620a:4543:b0:69f:c79d:8691 with SMTP id
u3-20020a05620a454300b0069fc79d8691mr3527371qkp.529.1651409840438; Sun, 01
May 2022 05:57:20 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Sun, 1 May 2022 05:57:20 -0700 (PDT)
In-Reply-To: <05c6a98e-1ff4-4fca-aa2a-779a688e9d3en@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.138; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.138
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com> <87h76emf9v.fsf@bsb.me.uk>
<e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com> <d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>
<05c6a98e-1ff4-4fca-aa2a-779a688e9d3en@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <312f1100-cd1c-4006-b21b-dc82103af3dfn@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Sun, 01 May 2022 12:57:20 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 24
 by: fir - Sun, 1 May 2022 12:57 UTC

Feeling that my head has partially gone back to the old ways of thinking about C again (partially),
I could say that after the splitter and atomizer I need the one thing mentioned here,
which I would name the expression digger (so you've got three: splittr, atommizzr, diggr).

This is because in the proto-C code I have, I've generally got lines of definitions and lines of calls,
so it's like
d
c
c
d
d
c
c
c
d
and so on (where a call may also be a call to an entity, not only to a function, so 1,2,3,a,foo() are all calls).

One extension of this is expressions (the punctuation-mark-based ones),
but even before that, the extension is what I could call a list/vector.
It is a way to put a vector of calls and definitions on one line, so this leads
to the need to write an expression digger, which takes chunks and returns a list of subchunks.

I need to get a bit clearer on this (in my code).
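
One possible reading of such a digger, sketched under heavy assumptions: the line is a comma-separated list, commas inside parentheses belong to the enclosed call, and a subchunk is just a start pointer plus a length. The names span and dig_list are hypothetical and do not come from the posts:

```c
#include <stddef.h>

/* A subchunk: a view into the original text, not NUL-terminated. */
struct span { const char *start; size_t len; };

/* Splits text[0..len) on commas that are outside parentheses.
   Stores at most max pieces in out and returns the total number of
   pieces found (which may exceed max). */
static size_t dig_list(const char *text, size_t len,
                       struct span *out, size_t max)
{
    size_t count = 0, begin = 0;
    int depth = 0;

    for (size_t i = 0; i <= len; i++) {
        int at_end = (i == len);
        if (!at_end && text[i] == '(') depth++;
        else if (!at_end && text[i] == ')' && depth > 0) depth--;
        if (at_end || (text[i] == ',' && depth == 0)) {
            if (count < max) {
                out[count].start = text + begin;
                out[count].len = i - begin;
            }
            count++;
            begin = i + 1;
        }
    }
    return count;
}
```

For the text 1,2,foo(a,b),x this gives four subchunks, with foo(a,b) kept whole, so each piece can then be handed on as a single call or definition.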

Re: about line atomizer

<e82bf2e2-42ff-4341-b9b9-73245b10fa0dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=21575&group=comp.lang.c#21575

X-Received: by 2002:a05:620a:2552:b0:67b:32e2:2400 with SMTP id s18-20020a05620a255200b0067b32e22400mr5307019qko.768.1651409954911;
Sun, 01 May 2022 05:59:14 -0700 (PDT)
X-Received: by 2002:a37:62d1:0:b0:69f:85e0:9b7a with SMTP id
w200-20020a3762d1000000b0069f85e09b7amr5480208qkb.100.1651409954794; Sun, 01
May 2022 05:59:14 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.c
Date: Sun, 1 May 2022 05:59:14 -0700 (PDT)
In-Reply-To: <312f1100-cd1c-4006-b21b-dc82103af3dfn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.172.255.138; posting-account=Sb6m8goAAABbWsBL7gouk3bfLsuxwMgN
NNTP-Posting-Host: 5.172.255.138
References: <f623d01f-e1e7-4809-a4fc-e84d837ff0a2n@googlegroups.com>
<874k2eo5ij.fsf@bsb.me.uk> <54d723a8-d5ce-401b-97c8-bcf1f6ed5274n@googlegroups.com>
<90ff3f36-3954-477f-aee5-8b019d4dcdc4n@googlegroups.com> <3d129f8b-79e2-4db1-a642-461935c148c9n@googlegroups.com>
<899ec799-e400-450e-b8d4-97e373599b9en@googlegroups.com> <87h76emf9v.fsf@bsb.me.uk>
<e08c06c4-4488-4413-b1c4-b64a9c4666ffn@googlegroups.com> <d8f47cb4-82ac-4d9e-8a16-69be6b2372b1n@googlegroups.com>
<05c6a98e-1ff4-4fca-aa2a-779a688e9d3en@googlegroups.com> <312f1100-cd1c-4006-b21b-dc82103af3dfn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e82bf2e2-42ff-4341-b9b9-73245b10fa0dn@googlegroups.com>
Subject: Re: about line atomizer
From: profesor...@gmail.com (fir)
Injection-Date: Sun, 01 May 2022 12:59:14 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 34
 by: fir - Sun, 1 May 2022 12:59 UTC

On Sunday, 1 May 2022 at 14:57:28 UTC+2, fir wrote:
> Feeling that my head has partially gone back to the old ways of thinking about C again (partially),
> I could say that after the splitter and atomizer I need the one thing mentioned here,
> which I would name the expression digger (so you've got three: splittr, atommizzr, diggr).
>
> This is because in the proto-C code I have, I've generally got lines of definitions and lines of calls,
> so it's like
> d
> c
> c
> d
> d
> c
> c
> c
> d
> and so on (where a call may also be a call to an entity, not only to a function, so 1,2,3,a,foo() are all calls).
>
> One extension of this is expressions (the punctuation-mark-based ones),
> but even before that, the extension is what I could call a list/vector.
> It is a way to put a vector of calls and definitions on one line, so this leads
> to the need to write an expression digger, which takes chunks and returns a list of subchunks.
>
> I need to get a bit clearer on this (in my code).

The fourth thing/stage, after the splitter, atomizer and digger, I could probably name the
microcompiler: it just compiles what was dug up.
