preprocessor shenanigans: labels in pseudo-assembly code

<ecd9f451-4b81-4bfe-8b84-a318214ef763n@googlegroups.com>

Newsgroups: comp.lang.c
Date: Wed, 28 Jul 2021 09:47:55 -0700 (PDT)
From: luser.dr...@gmail.com (luser droog)
 by: luser droog - Wed, 28 Jul 2021 16:47 UTC

I've been playing some more with my 8086 emulator which is
written with a heavy use of factorization using macros (esp.
X-macros). Layered on top of the emulator is a Forth interpreter
written in a mixture of pre-compiled Forth code (arrays of 16-bit
pointers to other Forth words) and assembly macros (which
evaluate to arrays of bytes).

This actually works out surprisingly well for what it does but one
big chore is having to manually calculate offsets for branching.
On the plus side, the assembler macros are simple enough that
it's not too hard to figure out how many bytes each instruction
takes. But it's an extra burden that *real* assemblers don't place
on the user.

First I'll explain more of what I have and later describe how I
think it can be extended to help with branch calculation.

The macros for each assembly operator evaluate to the bytes
of that instruction. E.g.:

#define INT(no) 0xCD,0x##no
#define LITTLEENDIAN(w) (w)%0x100,(w)/0x100
#define MOVI(r,a) 0xb8+r,LITTLEENDIAN(a)
#define MOVBI(r,a) 0xb0+r,a
#define AXCH(r) 0x90+r
#define CALL 0xE8

In my program, these are invoked within a few larger macros
which build Forth dictionary entries and copy them into the 8086
memory.

typedef unsigned char UC;
typedef unsigned short US;
#define NAMESTRING(...) # __VA_ARGS__

#define CODE(n, e, ...) \
const US e = P_CODE_PTR; \
{ \
struct code_entry x = { \
.link = link, \
.flags = flags, \
.name_len = sizeof( NAMESTRING(n) ) - 1, \
.name = NAMESTRING(n) , \
.code = P_PARAM_PTR , \
/*.param = { __VA_ARGS__ , NEXT }*/ \
}; \
link = p - mem; \
memcpy( p, &x, sizeof x ); \
if(trace)printf("%s:%x ", #e, e); \
p += sizeof x; \
UC params[] = { __VA_ARGS__ , NEXT }; \
memcpy( p, params, sizeof params ); \
p += sizeof params; \
} \
/*end CODE()*/

All the assembly instructions are passed to this macro as the ... argument.
They are then used to fill the unsigned char array params[], which is
copied into memory.

So, I think it should be possible to break up the assembly code into
"phrases" which could be grouped with parens, and within a phrase
a pointer to the start of the phrase is available. This won't help with
forward branches but it should make backward branches much easier.
You get to *drop a dot* by starting a new section wrapped in parens.

It may make the macro gymnastics more involved. I may need to bring
in PP_NARG and dispatch to a set of macros, one for each length.

If that works, then it should also be possible to add named labels to
each phrase. But then it may start to need lots of extra parens that
may be distracting for code that doesn't need named labels.

Re: preprocessor shenanigans: labels in pseudo-assembly code

<b73dc02b-7996-4831-80c3-ac2527b23369n@googlegroups.com>

Newsgroups: comp.lang.c
Date: Thu, 29 Jul 2021 19:40:05 -0700 (PDT)
In-Reply-To: <ecd9f451-4b81-4bfe-8b84-a318214ef763n@googlegroups.com>
From: luser.dr...@gmail.com (luser droog)
 by: luser droog - Fri, 30 Jul 2021 02:40 UTC

On Wednesday, July 28, 2021 at 11:48:03 AM UTC-5, luser droog wrote:
> So, I think it should be possible to break up the assembly code into
> "phrases" which could be grouped with parens, and within a phrase
> a pointer to the start of the phrase is available. This won't help with
> forward branches but it should make backward branches much easier.
> You get to *drop a dot* by starting a new section wrapped in parens.
>
> It may make the macro gymnastics more involved. I may need to bring
> in PP_NARG and dispatch to a set of macros, one for each length.
>
> If that works, then it should also be possible to add named labels to
> each phrase. But then it may start to need lots of extra parens that
> may be distracting for code that doesn't need named labels.

I tried that and it works but doesn't help. I coded up the idea, using
ppnarg, pasting and dispatching, wrapping the phrases so the
_ pointer (ie. "dot" or "dollar", the current offset) is correctly updated
with the size of the previous phrases.

Then I tried using it on some actual assembly code from the Forth
interpreter, and it doesn't seem to help me calculate the relative
offsets that I want.

$ cat label.h
#include "ppnarg.h"

#define NAMESTRING(...) # __VA_ARGS__
#define PHRASE(code,more) \
{ \
unsigned int so_far = _; \
unsigned char bytes[] = { code }; \
memcpy( ptr, bytes, sizeof bytes ); \
unsigned int _ = so_far + sizeof bytes; \
more \
}
#define CODE1(a) PHRASE(a,)
#define CODE2(a,b) PHRASE(a,PHRASE(b,))
#define CODE3(a,b,c) PHRASE(a,PHRASE(b,PHRASE(c,)))
#define CODE4(a,b,c,d) PHRASE(a, PHRASE(b, PHRASE(c, PHRASE(d,))))
#define CODE5(a,b,c,d,e) PHRASE(a, PHRASE(b, PHRASE(c, PHRASE(d, PHRASE(e,)))))
#define CODE_(...) CAT(CODE, __VA_ARGS__ )
#define STET(a) a
#define CAT(a,b) a ## b
#define CODEPHRASES(...) CODE_( STET(PP_NARG(__VA_ARGS__)) ) ( __VA_ARGS__ )

#define CODE(n, ...) \
{ \
char name[] = NAMESTRING( n ); \
unsigned int _ = 0; \
CODEPHRASES( __VA_ARGS__ ) \
}

CODE(this, (MOVAL,37,JMP,-4))

CODE(this, (MOVAL,37,JMP,-4),
(MOVAL,36,HALT) )

CODE(cmove, (POP(CX), POP(DX), POP(AX), PUSH(SI), CLD, OR(,R,CX,CX), JZ, 9),
(MOV(,R,SI,AX), MOV(,R,DI,DX)),
(BYTE+MOVS, DEC_(R,CX), JNZ, -5),
(POP(SI)))

CODE(plusminus, (POP(BX), OR(,R,BX,BX), JS, 9),
(POP(AX), OR(,R,AX,AX), JNS, 2),
(NEG(R,AX), JMP, 7),
(POP(AX), OR(,R,AX,AX), JS, 2),
(NEG(R,AX), PUSH(AX)))

$ cpp -P label.h | indent
{ char name[] = "this";
unsigned int _ = 0;
{
unsigned int so_far = _;
unsigned char bytes[] = { (MOVAL, 37, JMP, -4) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
}
}

{
char name[] = "this";
unsigned int _ = 0;
{
unsigned int so_far = _;
unsigned char bytes[] = { (MOVAL, 37, JMP, -4) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (MOVAL, 36, HALT) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
}
}
}

{
char name[] = "cmove";
unsigned int _ = 0;
{
unsigned int so_far = _;
unsigned char bytes[] =
{ (POP (CX), POP (DX), POP (AX), PUSH (SI), CLD, OR (, R, CX, CX), JZ,
9) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (MOV (, R, SI, AX), MOV (, R, DI, DX)) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (BYTE + MOVS, DEC_ (R, CX), JNZ, -5) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (POP (SI)) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
}
}
}
}
}

{
char name[] = "plusminus";
unsigned int _ = 0;
{
unsigned int so_far = _;
unsigned char bytes[] = { (POP (BX), OR (, R, BX, BX), JS, 9) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (POP (AX), OR (, R, AX, AX), JNS, 2) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (NEG (R, AX), JMP, 7) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (POP (AX), OR (, R, AX, AX), JS, 2) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
{
unsigned int so_far = _;
unsigned char bytes[] = { (NEG (R, AX), PUSH (AX)) };
memcpy (ptr, bytes, sizeof bytes);
unsigned int _ = so_far + sizeof bytes;
}
}
}
}
}
}

So, then I had another idea. I really just need to insert a byte
here and there in the sequence which is a count of a group
of preceding or following bytes. That led me to 2 macros
PREPEND_COUNT(...) and APPEND_COUNT(...) to wrap
bytes that will be jumped around. To jump backward, the
count needs to be negated and increased by one (to count
the count byte itself). So I tried this out on one of the
fragments from above.

$ cat label2.h
#define PREPEND_COUNT(tweak,...) tweak PP_NARG(__VA_ARGS__), __VA_ARGS__
#define APPEND_COUNT(tweak,...) __VA_ARGS__, tweak PP_NARG(__VA_ARGS__)
#define JMP_FORWARD(...) PREPEND_COUNT(, __VA_ARGS__ )
#define JMP_BACK(...) APPEND_COUNT(-1-, __VA_ARGS__ )

#define MOV(k,m,r,g) k+MOV, m ## r ## g
#define DEC_(m,r) DEC, m ## r

POP(CX), POP(DX), POP(AX), PUSH(SI), CLD, OR(,R,CX,CX), JZ,
JMP_FORWARD(MOV(,R,SI,AX), MOV(,R,DI,DX),
JMP_BACK(BYTE+MOVS, DEC_(R,CX), JNZ)),
POP(SI)

$ cpp -P label2.h | indent
POP (CX), POP (DX), POP (AX), PUSH (SI), CLD, OR (, R, CX, CX), JZ,
9, +MOV, RSIAX, +MOV, RDIDX, BYTE + MOVS, DEC, RCX, JNZ, -1 - 4, POP (SI)

Not quite what I imagined, but this actually helps with counting offsets.
Unfortunately it makes the assembly code harder to read, I think.

Re: preprocessor shenanigans: labels in pseudo-assembly code

<a7089f2a-b487-4abf-9dcc-93e863fcf342n@googlegroups.com>

Newsgroups: comp.lang.c
Date: Fri, 30 Jul 2021 00:06:55 -0700 (PDT)
In-Reply-To: <ecd9f451-4b81-4bfe-8b84-a318214ef763n@googlegroups.com>
From: luser.dr...@gmail.com (luser droog)
 by: luser droog - Fri, 30 Jul 2021 07:06 UTC

On Wednesday, July 28, 2021 at 11:48:03 AM UTC-5, luser droog wrote:

> So, I think it should be possible to break up the assembly code into
> "phrases" which could be grouped with parens, and within a phrase
> a pointer to the start of the phrase is available. This won't help with
> forward branches but it should make backward branches much easier.
> You get to *drop a dot* by starting a new section wrapped in parens.
>
> It may make the macro gymnastics more involved. I may need to bring
> in PP_NARG and dispatch to a set of macros, one for each length.
>
> If that works, then it should also be possible to add named labels to
> each phrase. But then it may start to need lots of extra parens that
> may be distracting for code that doesn't need named labels.

Back to the first idea. I added a label to each phrase. Then I changed
it to run through the argument list twice: the first pass creates the
labels and just counts the length of each byte sequence; the second
pass copies each byte separately so the _ variable (dot pointer) is
constantly updated. Currently limited to 5 labelled phrases of up to
9 bytes each.

It's a lot of machinery, but the usage is looking better. The offset
calculations are now easy and readable.

$ cat label2.h
#include "ppnarg.h"

#define FIRST(a,...) a
#define REST(a,...) __VA_ARGS__
#define STET(...) __VA_ARGS__
#define CAT(a,b) a ## b
#define LABEL(label,offset) \
unsigned int label = offset;
#define LABEL1(a) LABEL(FIRST a, _)
#define LABEL2(a,b) LABEL1(a) LABEL(FIRST b, FIRST a +PP_NARG(REST a))
#define LABEL3(a,b,c) LABEL2(a,b) LABEL(FIRST c, FIRST b +PP_NARG(REST b))
#define LABEL4(a,b,c,d) LABEL3(a,b,c) LABEL(FIRST d, FIRST c +PP_NARG(REST c))
#define LABEL5(a,b,c,d,e) LABEL4(a,b,c,d) LABEL(FIRST e, FIRST d +PP_NARG(REST d))
#define LABEL_(...) CAT(LABEL, __VA_ARGS__)
#define LABELS(...) LABEL_( STET(PP_NARG(__VA_ARGS__)) ) ( __VA_ARGS__ )
#define BYTE1(a) ++_; *ptr++ = (a);
#define BYTE2(a,b) BYTE1(a) BYTE1(b)
#define BYTE3(a,b,c) BYTE2(a,b) BYTE1(c)
#define BYTE4(a,b,c,d) BYTE3(a,b,c) BYTE1(d)
#define BYTE5(a,b,c,d,e) BYTE4(a,b,c,d) BYTE1(e)
#define BYTE6(a,b,c,d,e,f) BYTE5(a,b,c,d,e) BYTE1(f)
#define BYTE7(a,b,c,d,e,f,g) BYTE6(a,b,c,d,e,f) BYTE1(g)
#define BYTE8(a,b,c,d,e,f,g,h) BYTE7(a,b,c,d,e,f,g) BYTE1(h)
#define BYTE9(a,b,c,d,e,f,g,h,i) BYTE8(a,b,c,d,e,f,g,h) BYTE1(i)
#define BYTE_(...) CAT(BYTE, __VA_ARGS__)
#define BYTES(...) BYTE_( STET(PP_NARG(__VA_ARGS__)) ) ( __VA_ARGS__ )
#define PHRASE(code) BYTES( REST code )
#define PHRASE1(a) PHRASE(a)
#define PHRASE2(a,b) PHRASE(a) PHRASE(b)
#define PHRASE3(a,b,c) PHRASE(a) PHRASE2(b,c)
#define PHRASE4(a,b,c,d) PHRASE(a) PHRASE3(b,c,d)
#define PHRASE5(a,b,c,d,e) PHRASE(a) PHRASE4(b,c,d,e)
#define PHRASE_(...) CAT(PHRASE, __VA_ARGS__)
#define PHRASES(...) PHRASE_( STET(PP_NARG(__VA_ARGS__)) ) ( __VA_ARGS__ )

#define CODE(n, ...) \
{ \
char name[] = # n ; \
unsigned int _ = 0; \
LABELS( __VA_ARGS__ ) \
PHRASES( __VA_ARGS__ ) \
}

#define OR(k,m,r,g) k+OR,m##r##g
#define MOV(k,m,r,g) k+MOV,m##r##g
#define DEC_(m,r) DEC,m##r

CODE(cmove, (L1, POP(CX), POP(DX), POP(AX), PUSH(SI), CLD, OR(,R,CX,CX), JZ, L4-_),
(L2, MOV(,R,SI,AX), MOV(,R,DI,DX)),
(L3, BYTE+MOVS, DEC_(R,CX), JNZ, L3-_),
(L4, POP(SI)))
$ cpp -P label2.h
{ char name[] = "cmove" ; unsigned int _ = 0; unsigned int L1 = _; unsigned int L2 = L1 +9; unsigned int L3 = L2 +4; unsigned int L4 = L3 +5; ++_; *ptr++ = (POP(CX)); ++_; *ptr++ = (POP(DX)); ++_; *ptr++ = (POP(AX)); ++_; *ptr++ = (PUSH(SI)); ++_; *ptr++ = (CLD); ++_; *ptr++ = (+OR); ++_; *ptr++ = (RCXCX); ++_; *ptr++ = (JZ); ++_; *ptr++ = (L4-_); ++_; *ptr++ = (+MOV); ++_; *ptr++ = (RSIAX); ++_; *ptr++ = (+MOV); ++_; *ptr++ = (RDIDX); ++_; *ptr++ = (BYTE+MOVS); ++_; *ptr++ = (DEC); ++_; *ptr++ = (RCX); ++_; *ptr++ = (JNZ); ++_; *ptr++ = (L3-_); ++_; *ptr++ = (POP(SI)); }
$
