Welcome to novaBBS (click a section below)

mail files register newsreader groups login

Message-ID:

Life is NP-hard, and then you die. -- Dave Cock

On the ugly Heapsort implementation which becomes beautiful in VFX Forth

Subject	Author
On the ugly Heapsort implementation which becomes beautiful in VFX Forth	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	Stephen Pelc
Re: On the ugly Heapsort implementation which becomes beautiful in	Marcel Hendrix
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	Anton Ertl
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	antispam
Re: On the ugly Heapsort implementation which becomes beautiful in	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in	minf...@arcor.de
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	none
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	Paul Rubin
Re: On the ugly Heapsort implementation which becomes beautiful in	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	antispam
Re: On the ugly Heapsort implementation which becomes beautiful in	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in	none
Re: On the ugly Heapsort implementation which becomes beautiful in	Marcel Hendrix
Re: On the ugly Heapsort implementation which becomes beautiful in	none
Re: On the ugly Heapsort implementation which becomes beautiful in	Michał Kasprzak
Re: On the ugly Heapsort implementation which becomes beautiful in	Kerr-Mudd, John
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	none
Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth	Paul Rubin

On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19661&group=comp.lang.forth#19661

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:a05:622a:1b01:b0:343:582f:3e07 with SMTP id bb1-20020a05622a1b0100b00343582f3e07mr8512888qtb.578.1660424291706;
Sat, 13 Aug 2022 13:58:11 -0700 (PDT)
X-Received: by 2002:ac8:5995:0:b0:343:77a0:7c39 with SMTP id
e21-20020ac85995000000b0034377a07c39mr6677147qte.663.1660424291379; Sat, 13
Aug 2022 13:58:11 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sat, 13 Aug 2022 13:58:11 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.145.29; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.145.29
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
Subject: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Sat, 13 Aug 2022 20:58:11 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 15348

by: Michał Kasprzak - Sat, 13 Aug 2022 20:58 UTC

Hello Waldek, Stephen and others!

Not so long ago my compatriot professor Waldek Hebisch "has coded" the Forth version of the Heap Sort Algorithm.

http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs

Heap Sort is a beautiful algorithm which learning will give a lot to each of you! You will learn what "binary trees" are, when they are "complete", how to represent a binary tree as an ordinary array, why this array should be indexed from 1, and what relates this binary index to moving through the tree. You will learn what "heap condition" a complete binary tree must satisfy in order to be a "heap". Finally, you will learn the surprising non-obvious thing, why you are creating a heap of arbitrary data in O(n) pessimistic time, which is faster than you thought!
The Heap Sort Algorithm sorts in pessimistic O(n*log n) time which is better than QuickSort which does it in O(n^2). Admittedly Merge Sort has this time too, but Heap Sort sorts in place and Merge Sort needs extra second memory.

You can INCLUDE the above file after downloading it and after SET-SIZE to 2GB and reloading 64-bit VFX Forth or I will paste 2 words here:

: sift
( rra ra ir ii )
DUP 2 * 1+ ( rra ra ir ii jj )
BEGIN
>R OVER R@ SWAP R> OVER OVER ( rra ra ir ii jj ir jj ir jj )
> WHILE ( rra ra ir ii jj ir jj )
1+ > IF ( rra ra ir ii jj )
>R >R OVER R> SWAP R> SWAP OVER ( rra ra ir ii jj ra jj )
CELLS + DUP @ SWAP CELL+ @ < IF ( rra ra ir ii jj )
1 +
THEN
THEN ( rra ra ir ii jj )
>R >R >R OVER OVER R> ROT ROT R> ROT ROT R@
ROT ROT R> ( rra ra ir ii jj rra ra jj )
CELLS OVER + @ ROT ( rra ra ir ii jj ra rr_jj rra )
OVER < IF ( rra ra ir ii jj ra rr_jj )
>R >R OVER CELLS R> + R> SWAP !
SWAP DROP DUP DUP + 1+
ELSE
DROP DROP DROP ( rra ra ir ii )
OVER 1 +
THEN ( rra ra ir ii jj )
REPEAT
DROP DROP DROP SWAP DROP ( rra ra ii )
CELLS + !
;

: heapsort ( ra n )
0 OVER 1 - 2 / DO
OVER OVER OVER ( ra n ra n ra )
I CELLS + @ ROT ROT I ( ra n rra ra n I )
sift
-1 +LOOP
1 - 1 SWAP DO ( ra )
DUP DUP I CELLS + ( ra ra ra+I )
DUP @ ( ra ra ra+I rra )
SWAP ROT ( ra rra ra+I ra )
@ SWAP ! ( ra rra )
I 1 = IF
OVER !
ELSE
OVER I 0 ( rra ra I 0 )
sift
THEN
-1 +LOOP
;

As everybody can see the above code is ugly. Alas, this code is even an example of why not to use Forth. Admit, the code discourages you.

You probably want to see how it works. Let's create an array of 10 numbers and sort it:

create example 13 , 7 , 9 , 12 , 1 , 8 , 2 , 99 , 14 , 3 ,
: print 10 0 do example i cells + @ . loop ;
print
example 10 heapsort
print

Note 1: The word heapsort incorrectly leaves an address on the stack. There is no DROP before ; in heapsort.
There is even more junk on the stack after all the rama_taxis is done.

Question to Waldek: But basically your heapsort works so it could be used all over the world. The question of copyright arises. Do you give your heapsort code to the community for free and agree to any use of your heapsort, even without mentioning that it is you who "has coded" it?

Question to all of you: Can you rewrite this algorithm so that it becomes Forth's pride and not Forth's insult?
Maybe you might want to use variables or locals or arrange human comments?
You don't have to stay in CORE, but even if you want to stay like Waldek, then VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear of using variables by Forthers comes from.

But when we use VFX Forth, the above code becomes beautiful.
It's because of Stephen who designed the VFX code generator.
Be sure to disassemble words sift and heapsort to see this beauty!

dasm sift
( 00977880 488BD3 ) MOV RDX, RBX
( 00977883 48D1E3 ) SHL RBX, # 1
( 00977886 48FFC3 ) INC RBX
( 00977889 488D6DF8 ) LEA RBP, [RBP+-08]
( 0097788D 48895500 ) MOV [RBP], RDX
( 00977891 90 ) NOP
( 00977892 90 ) NOP
( 00977893 90 ) NOP
( 00977894 90 ) NOP
( 00977895 90 ) NOP
( 00977896 90 ) NOP
( 00977897 90 ) NOP
( 00977898 53 ) PUSH RBX
( 00977899 488B1C24 ) MOV RBX, [RSP]
( 0097789D 5A ) POP RDX
( 0097789E 483B5508 ) CMP RDX, [RBP+08]
( 009778A2 488D6DF0 ) LEA RBP, [RBP+-10]
( 009778A6 488B4D18 ) MOV RCX, [RBP+18]
( 009778AA 48894D00 ) MOV [RBP], RCX
( 009778AE 48895D08 ) MOV [RBP+08], RBX
( 009778B2 488BDA ) MOV RBX, RDX
( 009778B5 0F8DC1000000 ) JNL/GE 0097797C
( 009778BB 48FFC3 ) INC RBX
( 009778BE 483B5D00 ) CMP RBX, [RBP]
( 009778C2 488B5D08 ) MOV RBX, [RBP+08]
( 009778C6 488D6D10 ) LEA RBP, [RBP+10]
( 009778CA 0F8D2A000000 ) JNL/GE 009778FA
( 009778D0 53 ) PUSH RBX
( 009778D1 48FF7500 ) PUSH QWORD [RBP]
( 009778D5 5B ) POP RBX
( 009778D6 5A ) POP RDX
( 009778D7 488BCA ) MOV RCX, RDX
( 009778DA 48C1E103 ) SHL RCX, # 03
( 009778DE 48034D10 ) ADD RCX, [RBP+10]
( 009778E2 488B01 ) MOV RAX, 0 [RCX]
( 009778E5 483B4108 ) CMP RAX, [RCX+08]
( 009778E9 48895D00 ) MOV [RBP], RBX
( 009778ED 488BDA ) MOV RBX, RDX
( 009778F0 0F8D04000000 ) JNL/GE 009778FA
( 009778F6 4883C301 ) ADD RBX, # 01
( 009778FA 53 ) PUSH RBX
( 009778FB 48FF7500 ) PUSH QWORD [RBP]
( 009778FF 48FF7508 ) PUSH QWORD [RBP+08]
( 00977903 5B ) POP RBX
( 00977904 5A ) POP RDX
( 00977905 488B0C24 ) MOV RCX, [RSP]
( 00977909 58 ) POP RAX
( 0097790A 48C1E003 ) SHL RAX, # 03
( 0097790E 48034510 ) ADD RAX, [RBP+10]
( 00977912 4C8B00 ) MOV R8, 0 [RAX]
( 00977915 4C3B4518 ) CMP R8, [RBP+18]
( 00977919 488D6DF0 ) LEA RBP, [RBP+-10]
( 0097791D 488B4520 ) MOV RAX, [RBP+20]
( 00977921 48894500 ) MOV [RBP], RAX
( 00977925 48894D08 ) MOV [RBP+08], RCX
( 00977929 48895510 ) MOV [RBP+10], RDX
( 0097792D 48895D18 ) MOV [RBP+18], RBX
( 00977931 498BD8 ) MOV RBX, R8
( 00977934 0F8E31000000 ) JLE/NG 0097796B
( 0097793A 53 ) PUSH RBX
( 0097793B 48FF7500 ) PUSH QWORD [RBP]
( 0097793F 488B5D10 ) MOV RBX, [RBP+10]
( 00977943 48C1E303 ) SHL RBX, # 03
( 00977947 5A ) POP RDX
( 00977948 4803DA ) ADD RBX, RDX
( 0097794B 5A ) POP RDX
( 0097794C 488913 ) MOV 0 [RBX], RDX
( 0097794F 488B5D08 ) MOV RBX, [RBP+08]
( 00977953 48035D08 ) ADD RBX, [RBP+08]
( 00977957 48FFC3 ) INC RBX
( 0097795A 488B5508 ) MOV RDX, [RBP+08]
( 0097795E 48895510 ) MOV [RBP+10], RDX
( 00977962 488D6D10 ) LEA RBP, [RBP+10]
( 00977966 E90C000000 ) JMP 00977977
( 0097796B 488B5D18 ) MOV RBX, [RBP+18]
( 0097796F 4883C301 ) ADD RBX, # 01
( 00977973 488D6D10 ) LEA RBP, [RBP+10]
( 00977977 E91CFFFFFF ) JMP 00977898
( 0097797C 488B5D10 ) MOV RBX, [RBP+10]
( 00977980 48C1E303 ) SHL RBX, # 03
( 00977984 48035D20 ) ADD RBX, [RBP+20]
( 00977988 488B5528 ) MOV RDX, [RBP+28]
( 0097798C 488913 ) MOV 0 [RBX], RDX
( 0097798F 488B5D30 ) MOV RBX, [RBP+30]
( 00977993 488D6D38 ) LEA RBP, [RBP+38]
( 00977997 C3 ) RET/NEXT
( 280 bytes, 86 instructions )

dasm heapsort
( 009779D0 488BD3 ) MOV RDX, RBX
( 009779D3 4883C3FF ) ADD RBX, # -01
( 009779D7 B902000000 ) MOV ECX, # 00000002
( 009779DC 488BC3 ) MOV RAX, RBX
( 009779DF 488BDA ) MOV RBX, RDX
( 009779E2 4899 ) CQO
( 009779E4 48F7F9 ) IDIV RCX
( 009779E7 488D6DF0 ) LEA RBP, [RBP+-10]
( 009779EB 48C7450000000000 ) MOV QWord [RBP], # 00000000
( 009779F3 48895D08 ) MOV [RBP+08], RBX
( 009779F7 488BD8 ) MOV RBX, RAX
( 009779FA E81195A9FF487A970000000 CALL 00410F10 (DO) 0000000000977A48
( 00977A07 90 ) NOP
( 00977A08 498BD6 ) MOV RDX, R14
( 00977A0B 48C1E203 ) SHL RDX, # 03
( 00977A0F 48035500 ) ADD RDX, [RBP]
( 00977A13 498BCE ) MOV RCX, R14
( 00977A16 488D6DE0 ) LEA RBP, [RBP+-20]
( 00977A1A 48895D00 ) MOV [RBP], RBX
( 00977A1E 488B4520 ) MOV RAX, [RBP+20]
( 00977A22 48894508 ) MOV [RBP+08], RAX
( 00977A26 488B02 ) MOV RAX, 0 [RDX]
( 00977A29 48894510 ) MOV [RBP+10], RAX
( 00977A2D 48895D18 ) MOV [RBP+18], RBX
( 00977A31 488BD9 ) MOV RBX, RCX
( 00977A34 E847FEFFFF ) CALL 00977880 SIFT
( 00977A39 4983C6FF ) ADD R14, # -01
( 00977A3D 4983C7FF ) ADD R15, # -01
( 00977A41 71C5 ) JNO 00977A08
( 00977A43 415E ) POP R14
( 00977A45 415F ) POP R15
( 00977A47 58 ) POP RAX
( 00977A48 4883C3FF ) ADD RBX, # -01
( 00977A4C 488D6DF8 ) LEA RBP, [RBP+-08]
( 00977A50 48C7450001000000 ) MOV QWord [RBP], # 00000001
( 00977A58 E8B394A9FFD77A970000000 CALL 00410F10 (DO) 0000000000977AD7
( 00977A65 90 ) NOP
( 00977A66 90 ) NOP
( 00977A67 90 ) NOP
( 00977A68 498BD6 ) MOV RDX, R14
( 00977A6B 48C1E203 ) SHL RDX, # 03
( 00977A6F 4803D3 ) ADD RDX, RBX
( 00977A72 488B0A ) MOV RCX, 0 [RDX]
( 00977A75 488B03 ) MOV RAX, 0 [RBX]
( 00977A78 488902 ) MOV 0 [RDX], RAX
( 00977A7B 498BD6 ) MOV RDX, R14
( 00977A7E 4883FA01 ) CMP RDX, # 01
( 00977A82 488D6DF8 ) LEA RBP, [RBP+-08]
( 00977A86 48895D00 ) MOV [RBP], RBX
( 00977A8A 488BD9 ) MOV RBX, RCX
( 00977A8D 0F8514000000 ) JNZ/NE 00977AA7
( 00977A93 488B5500 ) MOV RDX, [RBP]
( 00977A97 48891A ) MOV 0 [RDX], RBX
( 00977A9A 488B5D00 ) MOV RBX, [RBP]
( 00977A9E 488D6D08 ) LEA RBP, [RBP+08]
( 00977AA2 E921000000 ) JMP 00977AC8
( 00977AA7 498BD6 ) MOV RDX, R14
( 00977AAA 488D6DE8 ) LEA RBP, [RBP+-18]
( 00977AAE 48895500 ) MOV [RBP], RDX
( 00977AB2 488B5518 ) MOV RDX, [RBP+18]
( 00977AB6 48895508 ) MOV [RBP+08], RDX
( 00977ABA 48895D10 ) MOV [RBP+10], RBX
( 00977ABE BB00000000 ) MOV EBX, # 00000000
( 00977AC3 E8B8FDFFFF ) CALL 00977880 SIFT
( 00977AC8 4983C6FF ) ADD R14, # -01
( 00977ACC 4983C7FF ) ADD R15, # -01
( 00977AD0 7196 ) JNO 00977A68
( 00977AD2 415E ) POP R14
( 00977AD4 415F ) POP R15
( 00977AD6 58 ) POP RAX
( 00977AD7 C3 ) RET/NEXT
( 264 bytes, 71 instructions )

Click here to read the complete article

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<tdae7o$342na$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19667&group=comp.lang.forth#19667

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: step...@vfxforth.com (Stephen Pelc)
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Sun, 14 Aug 2022 09:11:20 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <tdae7o$342na$1@dont-email.me>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 14 Aug 2022 09:11:20 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="e9c94bfa78cc5d27537c01b350ef1714";
logging-data="3279594"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/3EqblCluuGCx+Mrovp7bQ"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:Kmw1451VeJggXSkUXKDmTQoboyk=
X-Usenapp: v1.22/l - Full License

by: Stephen Pelc - Sun, 14 Aug 2022 09:11 UTC

On 13 Aug 2022 at 22:58:11 CEST, "Michał Kasprzak" <siarczek83@gmail.com>
wrote:
> Question to Stephen Pelc: After all, your wonderful optimization fairy tale
> can be carried on. Why not to use single instruction:
> MOV RDX, [8*R14+00977830]
> instead of two of yours:
> MOV RDX, R14
> MOV RDX, [8*RDX+00977830]
> or even load the target registry rbx right away:
> MOV RBX, [8*R14+00977830]
> instead of your 3 instructions?
> MOV RDX, R14
> MOV RDX, [8*RDX+00977830]
> MOV RBX, RDX

R14 is a spacial register which must not be changed. The optimiser simply
copies it rather than marking it as read-only. That'll get fixed in two CPUs
time. The line MOV RBX, RDX comes from the stack shuffle at a basic block
boundary and is not part of the complex load. Whether the destination register
choice is optimal depends very much on coding style.

Stephen
--
Stephen Pelc, stephen@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612,
+34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<8e482221-e1a0-4e34-9959-34fa26e58ce6n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19668&group=comp.lang.forth#19668

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:ad4:4eec:0:b0:476:b2b2:1809 with SMTP id dv12-20020ad44eec000000b00476b2b21809mr9660742qvb.40.1660469677530;
Sun, 14 Aug 2022 02:34:37 -0700 (PDT)
X-Received: by 2002:a0c:e0cc:0:b0:474:94d4:53e with SMTP id
x12-20020a0ce0cc000000b0047494d4053emr9487680qvk.2.1660469677414; Sun, 14 Aug
2022 02:34:37 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Sun, 14 Aug 2022 02:34:37 -0700 (PDT)
In-Reply-To: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:1c05:2f14:600:8886:e67:72f1:efb9;
posting-account=-JQ2RQoAAAB6B5tcBTSdvOqrD1HpT_Rk
NNTP-Posting-Host: 2001:1c05:2f14:600:8886:e67:72f1:efb9
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8e482221-e1a0-4e34-9959-34fa26e58ce6n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: mhx...@iae.nl (Marcel Hendrix)
Injection-Date: Sun, 14 Aug 2022 09:34:37 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 8308

by: Marcel Hendrix - Sun, 14 Aug 2022 09:34 UTC

On Saturday, August 13, 2022 at 10:58:12 PM UTC+2, siarc...@gmail.com wrote:
[..]
> Question to all of you: Can you rewrite this algorithm so that it becomes Forth's
> pride and not Forth's insult?
> Maybe you might want to use variables or locals or arrange human comments?
> You don't have to stay in CORE, but even if you want to stay like Waldek, then
> VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear
> of using variables by Forthers comes from.
[..]

The most beautiful thing about this is that the unmodified
Forth code works even on my machine using a different compiler :--)

FORTH> ' heapsort idis
$01340180 : heapsort
$0134018A pop rbx
$0134018B lea rdi, [rbx -1 +] qword
$0134018F push rbx
$01340190 mov rbx, 0 d#
$01340197 mov rcx, rbx
$0134019A mov rbx, rdi
$0134019D sar rbx, 1 b#
$013401A1 call (DO) offset NEAR
$013401AB lea rax, [rax 0 +] qword
$013401B0 pop rdi
$013401B1 mov rax, [rbp 0 +] qword
$013401B5 mov rdx, [rbp 0 +] qword
$013401B9 push rdi
$013401BA push rbx
$013401BB push [rdi rax*8] qword
$013401BE push rdi
$013401BF push rbx
$013401C0 push rdx
$013401C1 lea rbx, [rdx*2 1 +] qword
$013401C9 lea rax, [rax 0 +] qword
$013401D0 pop rdi
$013401D1 pop rax
$013401D2 cmp rbx, rax
$013401D5 push rax
$013401D6 push rdi
$013401D7 push rbx
$013401D8 push rax
$013401D9 jge $01340255 offset NEAR
$013401DF pop rdi
$013401E0 lea rax, [rbx 1 +] qword
$013401E4 cmp rax, rdi
$013401E7 jge $01340208 offset NEAR
$013401ED pop rbx
$013401EE pop rdi
$013401EF pop rax
$013401F0 pop rdx
$013401F1 mov rcx, [rdx rbx*8] qword
$013401F5 cmp rcx, [rdx rbx*8 8 +] qword
$013401FA push rdx
$013401FB push rax
$013401FC push rdi
$013401FD jge $01340207 offset NEAR
$01340203 lea rbx, [rbx 1 +] qword
$01340207 push rbx
$01340208 pop rbx
$01340209 pop rdi
$0134020A pop rax
$0134020B pop rdx
$0134020C pop r9
$0134020E mov rcx, [rdx rbx*8] qword
$01340212 cmp rcx, r9
$01340215 push r9
$01340217 push rdx
$01340218 push rax
$01340219 push rdi
$0134021A push rbx
$0134021B push rdx
$0134021C push [rdx rbx*8] qword
$0134021F mov rbx, rcx
$01340222 jle $0134023F offset NEAR
$01340228 pop rbx
$01340229 pop rdi
$0134022A pop rax
$0134022B pop rdx
$0134022C mov [rdi rdx*8] qword, rbx
$01340230 push rax
$01340231 lea rbx, [rax*2 1 +] qword
$01340239 push rbx
$0134023A jmp $0134024D offset NEAR
$0134023F pop rbx
$01340240 pop rbx
$01340241 pop rbx
$01340242 pop rbx
$01340243 mov rdi, [rsp] qword
$01340247 push rbx
$01340248 lea rbx, [rdi 1 +] qword
$0134024C push rbx
$0134024D pop rbx
$0134024E jmp $013401D0 offset NEAR
$01340253 push rbx
$01340254 pop rbx
$01340255 pop rbx
$01340256 pop rbx
$01340257 pop rbx
$01340258 pop rdi
$01340259 pop rdi
$0134025A pop rax
$0134025B mov [rdi rbx*8] qword, rax
$0134025F mov rbx, -1 d#
$01340266 add [rbp 0 +] qword, rbx
$0134026A add [rbp 8 +] qword, rbx
$0134026E pop rbx
$0134026F jno $013401B0 offset NEAR
$01340275 add rbp, #24 b#
$01340279 push 1 b#
$0134027B lea rbx, [rbx -1 +] qword
$0134027F pop rcx
$01340280 call (DO) offset NEAR
$0134028A nop
$0134028B lea rax, [rax 0 +] qword
$01340290 mov rdi, [rbp 0 +] qword
$01340294 mov rax, [rbx rdi*8] qword
$01340298 mov rdx, [rbx] qword
$0134029B mov [rbx rdi*8] qword, rdx
$0134029F mov rdi, [rbp 0 +] qword
$013402A3 cmp rdi, 1 b#
$013402A7 push rbx
$013402A8 mov rbx, rcx
$013402AB mov rcx, rax
$013402AE mov rbx, rcx
$013402B1 jne $013402C3 offset NEAR
$013402B7 pop rdi
$013402B8 mov [rdi] qword, rbx
$013402BB mov rbx, rdi
$013402BE jmp $01340369 offset NEAR
$013402C3 pop rdi
$013402C4 mov rax, [rbp 0 +] qword
$013402C8 push rdi
$013402C9 push rbx
$013402CA push rdi
$013402CB push rax
$013402CC xor rcx, rcx
$013402CF mov rbx, 1 d#
$013402D6 nop
$013402D7 nop
$013402D8 pop rdi
$013402D9 cmp rbx, rdi
$013402DC push rdi
$013402DD push rcx
$013402DE push rbx
$013402DF push rdi
$013402E0 jge $0134035E offset NEAR
$013402E6 pop rdi
$013402E7 lea rax, [rbx 1 +] qword
$013402EB cmp rax, rdi
$013402EE jge $0134030F offset NEAR
$013402F4 pop rbx
$013402F5 pop rdi
$013402F6 pop rax
$013402F7 pop rdx
$013402F8 mov rcx, [rdx rbx*8] qword
$013402FC cmp rcx, [rdx rbx*8 8 +] qword
$01340301 push rdx
$01340302 push rax
$01340303 push rdi
$01340304 jge $0134030E offset NEAR
$0134030A lea rbx, [rbx 1 +] qword
$0134030E push rbx
$0134030F pop rbx
$01340310 pop rdi
$01340311 pop rax
$01340312 pop rdx
$01340313 pop r9
$01340315 mov rcx, [rdx rbx*8] qword
$01340319 cmp rcx, r9
$0134031C push r9
$0134031E push rdx
$0134031F push rax
$01340320 push rdi
$01340321 push rbx
$01340322 push rdx
$01340323 push [rdx rbx*8] qword
$01340326 mov rbx, rcx
$01340329 jle $01340346 offset NEAR
$0134032F pop rbx
$01340330 pop rdi
$01340331 pop rax
$01340332 pop rdx
$01340333 mov [rdi rdx*8] qword, rbx
$01340337 push rax
$01340338 lea rbx, [rax*2 1 +] qword
$01340340 push rbx
ok
FORTH> create example 13 , 7 , 9 , 12 , 1 , 8 , 2 , 99 , 14 , 3 , ok
FORTH> : print 10 0 do example i cells + @ . loop ; ok
FORTH> print 13 7 9 12 1 8 2 99 14 3 ok
FORTH> example 10 heapsort ok
[1]FORTH> print 1 2 3 7 8 9 12 13 14 99 ok
[1]FORTH> . 20187136 ok

-marcel

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<2022Aug14.120711@mips.complang.tuwien.ac.at>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19670&group=comp.lang.forth#19670

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Sun, 14 Aug 2022 10:07:11 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 211
Message-ID: <2022Aug14.120711@mips.complang.tuwien.ac.at>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
Injection-Info: reader01.eternal-september.org; posting-host="910bc94d163dbc241ff1d1ae5be3fcf0";
logging-data="3299238"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+RFuGTq0o/YxZhuCrWeUqh"
Cancel-Lock: sha1:yZd+hmFRQEO0sHxxuE//crZgnyA=
X-newsreader: xrn 10.00-beta-3

by: Anton Ertl - Sun, 14 Aug 2022 10:07 UTC

=?UTF-8?Q?Micha=C5=82_Kasprzak?= <siarczek83@gmail.com> writes:
>Hello Waldek, Stephen and others!
>
>Not so long ago my compatriot professor Waldek Hebisch "has coded" the Fort=
>h version of the Heap Sort Algorithm.
>
>http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs=20
>
>Heap Sort is a beautiful algorithm which learning will give a lot to each o=
>f you! You will learn what "binary trees" are, when they are "complete", ho=
>w to represent a binary tree as an ordinary array, why this array should be=
> indexed from 1, and what relates this binary index to moving through the t=
>ree. You will learn what "heap condition" a complete binary tree must satis=
>fy in order to be a "heap". Finally, you will learn the surprising non-obvi=
>ous thing, why you are creating a heap of arbitrary data in O(n) pessimisti=
>c time, which is faster than you thought!
>The Heap Sort Algorithm sorts in pessimistic O(n*log n) time which is bette=
>r than QuickSort which does it in O(n^2). Admittedly Merge Sort has this ti=
>me too, but Heap Sort sorts in place and Merge Sort needs extra second memo=
>ry.

But Quicksort and Mergesort have good spatial locality, and Heapsort
has bad spatial locality. Using the taxi_hs.fs program linked to
above, and the taxi-qs.fs variant that uses quicksort instead of
heapsort <http://www.complang.tuwien.ac.at/forth/programs/taxi-qs.fs>,
we see this nicely:

Hash table ?sort Heapsort Quicksort
taxi2.4th taxi.cc taxi_hs.fs taxi-qs.fs
gforth-fast vfx64 g++ -O2 vfx64 vfx64
187_333_032 218_295_645 147_818_881 837_764_079 124_043_270 l2 accesses
102_958_191 117_248_139 6_476_984 589_376_272 8_858_405 l2 misses

The sort used by taxi.cc is probably quicksort or mergesort.

The bottom line (on a Ryzen 5800X) is also not great for taxi_qs.fs:

s cycles instructions
3.35 16_039_014_129 10_085_237_291 gforth-fast 9000 taxi2.4th
1.56 7_353_893_755 2_627_580_253 vfx64 9000 taxi2.4th
1.70 8_053_294_428 10_664_441_167 a.out (taxi.cc binary)
10.71 50_793_570_097 64_996_544_038 vfx64 taxi_hs.fs
2.56 11_446_320_529 17_253_502_741 vfx64 taxi-qs.fs

The number of executed instructions is also a lot higher for taxi_hs
than for taxi-qs.fs (and the taxi.cc binary), so it's not just the
cache misses that slow heapsort down.

>: sift
> ( rra ra ir ii )
> DUP 2 * 1+ ( rra ra ir ii jj )
> BEGIN=20
> >R OVER R@ SWAP R> OVER OVER ( rra ra ir ii jj ir jj ir jj )
> > WHILE ( rra ra ir ii jj ir jj )
> 1+ > IF ( rra ra ir ii jj )
> >R >R OVER R> SWAP R> SWAP OVER ( rra ra ir ii jj ra jj )
> CELLS + DUP @ SWAP CELL+ @ < IF ( rra ra ir ii jj )
> 1 +
> THEN
> THEN ( rra ra ir ii jj )
> >R >R >R OVER OVER R> ROT ROT R> ROT ROT R@
> ROT ROT R> ( rra ra ir ii jj rra ra jj )
> CELLS OVER + @ ROT ( rra ra ir ii jj ra rr_jj rra )
> OVER < IF ( rra ra ir ii jj ra rr_jj )
> >R >R OVER CELLS R> + R> SWAP !
> SWAP DROP DUP DUP + 1+
> ELSE
> DROP DROP DROP ( rra ra ir ii )
> OVER 1 +
> THEN ( rra ra ir ii jj )
> REPEAT
> DROP DROP DROP SWAP DROP ( rra ra ii )
> CELLS + !
>;
>
>: heapsort ( ra n )
> 0 OVER 1 - 2 / DO
> OVER OVER OVER ( ra n ra n ra )
> I CELLS + @ ROT ROT I ( ra n rra ra n I )
> sift
> -1 +LOOP
> 1 - 1 SWAP DO ( ra )
> DUP DUP I CELLS + ( ra ra ra+I )
> DUP @ ( ra ra ra+I rra )
> SWAP ROT ( ra rra ra+I ra )
> @ SWAP ! ( ra rra )
> I 1 =3D IF
> OVER !
> ELSE
> OVER I 0 ( rra ra I 0 )
> sift
> THEN
> -1 +LOOP
>;
...
>As everybody can see the above code is ugly. Alas, this code is even an exa=
>mple of why not to use Forth.

One might argue "how to not use Forth", although the suggestions for
alternatives will differ. Waldek Hebisch suggests using locals, but
of course this is an anathema to purists. One might consider heapsort
another challenge to those who claim that every problem can be solved
without locals by good factoring.

>But when we use VFX Forth, the above code becomes beautiful.
>It's because of Stephen who designed the VFX code generator.
>Be sure to disassemble words sift and heapsort to see this beauty!

Actually lxf produces nicer code (but because the 64-bit version has
not been released yet, it does not work for the taxi problem with the
parameters we have been using):

VFX64 lxf
MOV RDX, RBX imul eax , ebx , # 2h
SHL RBX, # 1 inc eax
INC RBX mov [ebp-4h] , ebx
LEA RBP, [RBP+-08] mov ebx , eax
MOV [RBP], RDX lea ebp , [ebp-4h]
NOP cmp [ebp+4h] , ebx
NOP mov eax , [ebp+4h]
NOP mov [ebp-8h] , eax
NOP mov [ebp-4h] , ebx
NOP lea ebp , [ebp-8h]
NOP jle "0804FC80"
NOP inc ebx
PUSH RBX cmp [ebp] , ebx
MOV RBX, [RSP] mov ebx , [ebp+4h]
POP RDX lea ebp , [ebp+8h]
CMP RDX, [RBP+08] jle "0804FC31"
LEA RBP, [RBP+-10] mov eax , ebx
MOV RCX, [RBP+18] shl eax , 2h
MOV [RBP], RCX add eax , [ebp+8h]
MOV [RBP+08], RBX mov ecx , eax
MOV RBX, RDX mov ecx , [ecx]
JNL/GE 0097797C add eax , # 4h
INC RBX mov eax , [eax]
CMP RBX, [RBP] cmp ecx , eax
MOV RBX, [RBP+08] jge "0804FC31"
LEA RBP, [RBP+10] add ebx , # 1h
JNL/GE 009778FA mov eax , ebx
PUSH RBX shl eax , 2h
PUSH QWORD [RBP] add eax , [ebp+8h]
POP RBX mov eax , [eax]
POP RDX cmp [ebp+Ch] , eax
MOV RCX, RDX mov ecx , [ebp+8h]
SHL RCX, # 03 mov [ebp-8h] , ecx
ADD RCX, [RBP+10] mov [ebp-4h] , ebx
MOV RAX, 0 [RCX] mov ebx , eax
CMP RAX, [RCX+08] lea ebp , [ebp-8h]
MOV [RBP], RBX jge "0804FC72"
MOV RBX, RDX mov eax , [ebp+8h]
JNL/GE 009778FA shl eax , 2h
ADD RBX, # 01 add eax , [ebp]
PUSH RBX mov [eax] , ebx
PUSH QWORD [RBP] mov ebx , [ebp+4h]
PUSH QWORD [RBP+08] add ebx , [ebp+4h]
POP RBX inc ebx
POP RDX mov eax , [ebp+4h]
MOV RCX, [RSP] mov [ebp+8h] , eax
POP RAX lea ebp , [ebp+8h]
SHL RAX, # 03 jmp "0804FC7B"
ADD RAX, [RBP+10] mov ebx , [ebp+Ch]
MOV R8, 0 [RAX] add ebx , # 1h
CMP R8, [RBP+18] lea ebp , [ebp+8h]
LEA RBP, [RBP+-10] jmp "0804FBF0"
MOV RAX, [RBP+20] shl dword [ebp+8h] , 2h
MOV [RBP], RAX mov ebx , [ebp+10h]
MOV [RBP+08], RCX add ebx , [ebp+8h]
MOV [RBP+10], RDX mov eax , [ebp+14h]
MOV [RBP+18], RBX mov [ebx] , eax
MOV RBX, R8 mov ebx , [ebp+18h]
JLE/NG 0097796B lea ebp , [ebp+1Ch]
PUSH RBX ret near
PUSH QWORD [RBP]
MOV RBX, [RBP+10]
SHL RBX, # 03
POP RDX
ADD RBX, RDX
POP RDX
MOV 0 [RBX], RDX
MOV RBX, [RBP+08]
ADD RBX, [RBP+08]
INC RBX
MOV RDX, [RBP+08]
MOV [RBP+10], RDX
LEA RBP, [RBP+10]
JMP 00977977
MOV RBX, [RBP+18]
ADD RBX, # 01
LEA RBP, [RBP+10]
JMP 00977898
MOV RBX, [RBP+10]
SHL RBX, # 03
ADD RBX, [RBP+20]
MOV RDX, [RBP+28]
MOV 0 [RBX], RDX
MOV RBX, [RBP+30]
LEA RBP, [RBP+38]
RET/NEXT
( 280 bytes, 86 instructions )( 178 bytes, 50 instructions )

Click here to read the complete article

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<tdc4nc$2fm$1@gioia.aioe.org>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19690&group=comp.lang.forth#19690

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!i2pn.org!aioe.org!NZ87pNe1TKxNDknVl4tZhw.user.46.165.242.91.POSTED!not-for-mail
From: antis...@math.uni.wroc.pl
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Mon, 15 Aug 2022 00:41:16 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <tdc4nc$2fm$1@gioia.aioe.org>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
Injection-Info: gioia.aioe.org; logging-data="2550"; posting-host="NZ87pNe1TKxNDknVl4tZhw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: tin/2.4.5-20201224 ("Glen Albyn") (Linux/5.10.0-9-amd64 (x86_64))
Cancel-Lock: sha1:6r9sJ7LdCmG0/Ls6lrXeREazXnM=
X-Notice: Filtered by postfilter v. 0.9.2

by: antis...@math.uni.wroc.pl - Mon, 15 Aug 2022 00:41 UTC

Micha? Kasprzak <siarczek83@gmail.com> wrote:
> Hello Waldek, Stephen and others!
>
> Not so long ago my compatriot professor Waldek Hebisch "has coded" the Forth version of the Heap Sort Algorithm.
>
> http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs

<snip>

> As everybody can see the above code is ugly. Alas, this code is even an example of why not to use Forth. Admit, the code discourages you.
>
> You probably want to see how it works. Let's create an array of 10 numbers and sort it:
>
> create example 13 , 7 , 9 , 12 , 1 , 8 , 2 , 99 , 14 , 3 ,
> : print 10 0 do example i cells + @ . loop ;
> print
> example 10 heapsort
> print
>
> Note 1: The word heapsort incorrectly leaves an address on the stack. There is no DROP before ; in heapsort.
> There is even more junk on the stack after all the rama_taxis is done.

I added missing DROP-s and did few (very minimal cleanups).

> Question to Waldek: But basically your heapsort works so it could be used all over the world. The question of copyright arises. Do you give your heapsort code to the community for free and agree to any use of your heapsort, even without mentioning that it is you who "has coded" it?

Yes. This is pretty standard algorithm and coding is pretty
strightforward.

> Question to all of you: Can you rewrite this algorithm so that it becomes Forth's pride and not Forth's insult?
> Maybe you might want to use variables or locals or arrange human comments?

In the other disscussion I provided version using locals. Concerning comments,
data stack is used to store variables and stack comments say you which variables
are on the stack at important points. Note that actual computaions in both
versions are the same, only difference is that first versions uses stack
manipulations to access and store variables.

There is also different version at Rosetta code:

https://rosettacode.org/wiki/Sorting_algorithms/Heapsort#Forth

> You don't have to stay in CORE, but even if you want to stay like Waldek, then VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear of using variables by Forthers comes from.

Well, one reason to write code the way I did was to give compiler
(optimizer) reasonable chance to generate good code. Namely,
given sequence of stack/return stack operations with no
control between operations means that compiler can compute
stack effect of each such seqence. I keeps stack balanced
and in each iteration I access the same stack locations
so compiler can cache used parts of stack in registers and
replace all stack manipulations by register transfers.
After that there will be some redundant transfers, but
rather simple optimizations can remove them. So with
sufficiently smart compiler versions with locals
(which I consider much nicer) and CORE version should
give the same code.

Going beyond CORE (but with no locals) I would use PICK and
TUCK. VARIABLE has both runtime cost and makes program
_much_ harder to understand> in my ugly version one needs
to look at few 2-3 line fragments to understand what
happens, with liberal use of VARIABLE in bigger program
one my be forced to look at hundreds or thousends of
lines and analyse their interactions.

> But when we use VFX Forth, the above code becomes beautiful.
> It's because of Stephen who designed the VFX code generator.
> Be sure to disassemble words sift and heapsort to see this beauty!

Nice, but I admit that I hoped for better result. Version
with locals translated to C gives 21 computational
instructions. C passes arguments in registers, so sticking
to Forth convention we would need 5 additional instructions,
giving 26 in total. AFAICS VFX version has 79 computational
instructions (and 7 NOP-s for padding). In particular
I see 33 accesses to data stack and a bunch of adjustments
of data stack pointer, while 4 accesses and 1 adjustement are
enough.

>
> dasm sift
> ( 00977880 488BD3 ) MOV RDX, RBX
> ( 00977883 48D1E3 ) SHL RBX, # 1
> ( 00977886 48FFC3 ) INC RBX
> ( 00977889 488D6DF8 ) LEA RBP, [RBP+-08]
> ( 0097788D 48895500 ) MOV [RBP], RDX
> ( 00977891 90 ) NOP
> ( 00977892 90 ) NOP
> ( 00977893 90 ) NOP
> ( 00977894 90 ) NOP
> ( 00977895 90 ) NOP
> ( 00977896 90 ) NOP
> ( 00977897 90 ) NOP
> ( 00977898 53 ) PUSH RBX
> ( 00977899 488B1C24 ) MOV RBX, [RSP]
> ( 0097789D 5A ) POP RDX
> ( 0097789E 483B5508 ) CMP RDX, [RBP+08]
> ( 009778A2 488D6DF0 ) LEA RBP, [RBP+-10]
> ( 009778A6 488B4D18 ) MOV RCX, [RBP+18]
> ( 009778AA 48894D00 ) MOV [RBP], RCX
> ( 009778AE 48895D08 ) MOV [RBP+08], RBX
> ( 009778B2 488BDA ) MOV RBX, RDX
> ( 009778B5 0F8DC1000000 ) JNL/GE 0097797C
> ( 009778BB 48FFC3 ) INC RBX
> ( 009778BE 483B5D00 ) CMP RBX, [RBP]
> ( 009778C2 488B5D08 ) MOV RBX, [RBP+08]
> ( 009778C6 488D6D10 ) LEA RBP, [RBP+10]
> ( 009778CA 0F8D2A000000 ) JNL/GE 009778FA
> ( 009778D0 53 ) PUSH RBX
> ( 009778D1 48FF7500 ) PUSH QWORD [RBP]
> ( 009778D5 5B ) POP RBX
> ( 009778D6 5A ) POP RDX
> ( 009778D7 488BCA ) MOV RCX, RDX
> ( 009778DA 48C1E103 ) SHL RCX, # 03
> ( 009778DE 48034D10 ) ADD RCX, [RBP+10]
> ( 009778E2 488B01 ) MOV RAX, 0 [RCX]
> ( 009778E5 483B4108 ) CMP RAX, [RCX+08]
> ( 009778E9 48895D00 ) MOV [RBP], RBX
> ( 009778ED 488BDA ) MOV RBX, RDX
> ( 009778F0 0F8D04000000 ) JNL/GE 009778FA
> ( 009778F6 4883C301 ) ADD RBX, # 01
> ( 009778FA 53 ) PUSH RBX
> ( 009778FB 48FF7500 ) PUSH QWORD [RBP]
> ( 009778FF 48FF7508 ) PUSH QWORD [RBP+08]
> ( 00977903 5B ) POP RBX
> ( 00977904 5A ) POP RDX
> ( 00977905 488B0C24 ) MOV RCX, [RSP]
> ( 00977909 58 ) POP RAX
> ( 0097790A 48C1E003 ) SHL RAX, # 03
> ( 0097790E 48034510 ) ADD RAX, [RBP+10]
> ( 00977912 4C8B00 ) MOV R8, 0 [RAX]
> ( 00977915 4C3B4518 ) CMP R8, [RBP+18]
> ( 00977919 488D6DF0 ) LEA RBP, [RBP+-10]
> ( 0097791D 488B4520 ) MOV RAX, [RBP+20]
> ( 00977921 48894500 ) MOV [RBP], RAX
> ( 00977925 48894D08 ) MOV [RBP+08], RCX
> ( 00977929 48895510 ) MOV [RBP+10], RDX
> ( 0097792D 48895D18 ) MOV [RBP+18], RBX
> ( 00977931 498BD8 ) MOV RBX, R8
> ( 00977934 0F8E31000000 ) JLE/NG 0097796B
> ( 0097793A 53 ) PUSH RBX
> ( 0097793B 48FF7500 ) PUSH QWORD [RBP]
> ( 0097793F 488B5D10 ) MOV RBX, [RBP+10]
> ( 00977943 48C1E303 ) SHL RBX, # 03
> ( 00977947 5A ) POP RDX
> ( 00977948 4803DA ) ADD RBX, RDX
> ( 0097794B 5A ) POP RDX
> ( 0097794C 488913 ) MOV 0 [RBX], RDX
> ( 0097794F 488B5D08 ) MOV RBX, [RBP+08]
> ( 00977953 48035D08 ) ADD RBX, [RBP+08]
> ( 00977957 48FFC3 ) INC RBX
> ( 0097795A 488B5508 ) MOV RDX, [RBP+08]
> ( 0097795E 48895510 ) MOV [RBP+10], RDX
> ( 00977962 488D6D10 ) LEA RBP, [RBP+10]
> ( 00977966 E90C000000 ) JMP 00977977
> ( 0097796B 488B5D18 ) MOV RBX, [RBP+18]
> ( 0097796F 4883C301 ) ADD RBX, # 01
> ( 00977973 488D6D10 ) LEA RBP, [RBP+10]
> ( 00977977 E91CFFFFFF ) JMP 00977898
> ( 0097797C 488B5D10 ) MOV RBX, [RBP+10]
> ( 00977980 48C1E303 ) SHL RBX, # 03
> ( 00977984 48035D20 ) ADD RBX, [RBP+20]
> ( 00977988 488B5528 ) MOV RDX, [RBP+28]
> ( 0097798C 488913 ) MOV 0 [RBX], RDX
> ( 0097798F 488B5D30 ) MOV RBX, [RBP+30]
> ( 00977993 488D6D38 ) LEA RBP, [RBP+38]
> ( 00977997 C3 ) RET/NEXT
> ( 280 bytes, 86 instructions )

Click here to read the complete article

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<eedc6fa6-8743-43ff-b6e0-3a571f73b73cn@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19696&group=comp.lang.forth#19696

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:a37:5585:0:b0:6b9:2fad:612c with SMTP id j127-20020a375585000000b006b92fad612cmr12312149qkb.490.1660581748031;
Mon, 15 Aug 2022 09:42:28 -0700 (PDT)
X-Received: by 2002:a05:620a:15a2:b0:6bb:2861:51b with SMTP id
f2-20020a05620a15a200b006bb2861051bmr4237071qkk.478.1660581747763; Mon, 15
Aug 2022 09:42:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 15 Aug 2022 09:42:27 -0700 (PDT)
In-Reply-To: <tdc4nc$2fm$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.17.169; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.17.169
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com> <tdc4nc$2fm$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <eedc6fa6-8743-43ff-b6e0-3a571f73b73cn@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Mon, 15 Aug 2022 16:42:28 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 6692

by: Michał Kasprzak - Mon, 15 Aug 2022 16:42 UTC

Thank you Waldek on behalf of the community and my own for your heapsort code! :-)

I have been working on the code since the morning. I hope not only me, but if not all of you then at least Zombie (Anton's new nickname) is also looking for holes and ways to rewrite the code ;)
So far I have found a non-critical mistake in the implementation of the algorithm. Take it easy, sorting is still working well thanks to a favorable coincidence.

Note 2: The algorithm starts with building the heap. Running from the end through the tree nodes, it tries to restore the heap condition to each node.
Leaf nodes always satisfy the heap condition. So we don't have to fix them. We can omit leaves from the heap-building process. It is why we start the loop with the index of the last parent node.
Remember that in practice unfortunately we index the array from 0, not from 1, as the theory would like.
In our situation, the index of the last parent node is n/2-1. You can check it by drawing on a sheet of paper. Unfortunately, Waldek at the beginning of heapsort calculates it like this:
OVER 1 - 2 /
This means that Waldek calculates (n-1)/2. For example, if we wanted to sort 11 items, n = 11. From the correct formula n/2-1=4 (we use integer arithmetic), from the Waldek formula (n-1)/2=5.
Check on the paper to see who is right. The last parent node has index 4, starting with 0. Node 5 is a leaf.
There should be written 2/ 1- behind OVER at the beginning of the heapsort word in the code (by the way, both 2/ 1- words are in Core):
OVER 2/ 1-
Despite this mistake, the Waldek's heapsort code works, but sometimes it does too much unnecessarily.

Meanwhile, while waiting for more results...
I found Marcel Hendrix's post in another thread from 16 years ago. It shows what Marcel is happy about, and with fond memories, allows you to see how our community made progress ;)
The discussion was about the "forth Insertion Sort". Until the end of my post there is a quote from Marcel:

This one is complete, ready-to-run, and works. It has been tested for all the
Forths I think worthwhile.

-marcel

-- ==========================
( descending INSERTION-SORT )

\ ANEW -insert

0 [IF] From Wikipedia, the free encyclopedia

Insertion sort is much less efficient on large lists than the more
advanced algorithms such as quicksort, heapsort, or merge sort, but it
has various advantages:

- Efficient on (quite) small data sets
- Efficient on data sets which are already substantially sorted
- More efficient in practice than most other simple O(n2) algorithms such
as selection sort or bubble sort: the average time is n2/4 and it is
linear in the best case
- Stable (does not change the relative order of elements with equal keys)
- In-place (only requires a constant amount O(1) of extra memory space)

It is an online algorithm, in that it can sort a list as it receives
it. In abstract terms, each iteration of an insertion sort removes an
element from the input data, inserting it at the correct position in
the already sorted list, until no elements are left in the input.

Sorting is done in-place. The result array after k iterations contains
the first k entries of the input array and is sorted. In each step, the
first remaining entry of the input is removed, inserted into the result
at the right position, thus extending the result.

The most common variant, which operates on arrays, can be described as:

Suppose we have a method called insert designed to insert a value into a
sorted sequence at the beginning of an array. It operates by starting at
the end of the sequence and shifting each element one place to the right
until a suitable position is found for the new element. It has the side
effect of overwriting the value stored immediately after the sorted
sequence in the array. To perform insertion sort, start at the left end
of the array and invoke insert to insert each element encountered into
its correct position. The ordered sequence into which we insert it is
stored at the beginning of the array in the set of indexes already
examined. Each insertion overwrites a single value, but this is okay
because it's the value we're inserting.
[THEN]

( ----------------------------------------------------------------------------------- )

: @+ ( addr -- addr' val ) DUP CELL+ SWAP @ ;
: 0>= ( n -- bool ) 0< 0= ;

: 'a[jj] ( -- addr ) S" jj CELLS array + " EVALUATE ; IMMEDIATE
: 'a[jj+1] ( -- addr ) S" jj 1+ CELLS array + " EVALUATE ; IMMEDIATE
: a[I] ( -- ) S" I CELLS array + @ " EVALUATE ; IMMEDIATE
: a[jj] ( -- ) S" jj CELLS array + @ " EVALUATE ; IMMEDIATE

( ----------------------------------------------------------------------------------- )

: insertion-sort ( array size -- )
0 0 LOCALS| temp jj size array |
size 1
?DO I TO jj
a[I] TO temp
BEGIN jj 1- DUP TO jj 0>=
WHILE temp a[jj] >
WHILE 'a[jj] @+ SWAP !
REPEAT THEN
temp 'a[jj+1] !
LOOP ;

: .DATA ( array count -- ) CR 0 ?DO DUP @ . CELL+ LOOP DROP ;

( ----------------------------------------------------------------------------------- )

9 CONSTANT mycount
CREATE myarray
9 , 3 , 4 , 12 , 7 , 1 , 4 , 3 , 2 ,

: test ( -- )
myarray mycount insertion-sort
myarray mycount .DATA ;

( ----------------------------------------------------------------------------------- )
( tested with Win32Forth 4.2, Win32Forth 6.10.04, gForth, SwiftForth, VFX, iForth 2.0 )

\ FORTH> test
\ 12 9 7 4 4 3 3 2 1 ok

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<88941405-53a6-424a-a1a4-fc17dc3b3eaen@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19697&group=comp.lang.forth#19697

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:a05:6214:2b0f:b0:478:5319:c4ff with SMTP id jx15-20020a0562142b0f00b004785319c4ffmr14209707qvb.66.1660582493523;
Mon, 15 Aug 2022 09:54:53 -0700 (PDT)
X-Received: by 2002:a37:4115:0:b0:6b9:3bd8:4981 with SMTP id
o21-20020a374115000000b006b93bd84981mr11856266qka.663.1660582493368; Mon, 15
Aug 2022 09:54:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 15 Aug 2022 09:54:53 -0700 (PDT)
In-Reply-To: <eedc6fa6-8743-43ff-b6e0-3a571f73b73cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=2003:f7:1f27:a897:88aa:d3e9:4574:e7ec;
posting-account=AqNUYgoAAADmkK2pN-RKms8sww57W0Iw
NNTP-Posting-Host: 2003:f7:1f27:a897:88aa:d3e9:4574:e7ec
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<tdc4nc$2fm$1@gioia.aioe.org> <eedc6fa6-8743-43ff-b6e0-3a571f73b73cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <88941405-53a6-424a-a1a4-fc17dc3b3eaen@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: minfo...@arcor.de (minf...@arcor.de)
Injection-Date: Mon, 15 Aug 2022 16:54:53 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 3488

by: minf...@arcor.de - Mon, 15 Aug 2022 16:54 UTC

siarc...@gmail.com schrieb am Montag, 15. August 2022 um 18:42:29 UTC+2:
> Thank you Waldek on behalf of the community and my own for your heapsort code! :-)
>
> I have been working on the code since the morning. I hope not only me, but if not all of you then at least Zombie (Anton's new nickname) is also looking for holes and ways to rewrite the code ;)
> So far I have found a non-critical mistake in the implementation of the algorithm. Take it easy, sorting is still working well thanks to a favorable coincidence.
>
> Note 2: The algorithm starts with building the heap. Running from the end through the tree nodes, it tries to restore the heap condition to each node.
> Leaf nodes always satisfy the heap condition. So we don't have to fix them. We can omit leaves from the heap-building process. It is why we start the loop with the index of the last parent node.
> Remember that in practice unfortunately we index the array from 0, not from 1, as the theory would like.
> In our situation, the index of the last parent node is n/2-1. You can check it by drawing on a sheet of paper. Unfortunately, Waldek at the beginning of heapsort calculates it like this:
> OVER 1 - 2 /
> This means that Waldek calculates (n-1)/2. For example, if we wanted to sort 11 items, n = 11. From the correct formula n/2-1=4 (we use integer arithmetic), from the Waldek formula (n-1)/2=5.
> Check on the paper to see who is right. The last parent node has index 4, starting with 0. Node 5 is a leaf.
> There should be written 2/ 1- behind OVER at the beginning of the heapsort word in the code (by the way, both 2/ 1- words are in Core):
> OVER 2/ 1-
> Despite this mistake, the Waldek's heapsort code works, but sometimes it does too much unnecessarily.
>
>
> Meanwhile, while waiting for more results...
> I found Marcel Hendrix's post in another thread from 16 years ago. It shows what Marcel is happy about, and with fond memories, allows you to see how our community made progress ;)
> The discussion was about the "forth Insertion Sort". Until the end of my post there is a quote from Marcel:
>

Seems Gavino is back from therapy ...

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19705&group=comp.lang.forth#19705

copy link Newsgroups: comp.lang.forth

Newsgroups: comp.lang.forth
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: alb...@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5>
Organization: KPN B.V.
Date: Tue, 16 Aug 2022 13:11:51 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe006.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 37
Injection-Date: Tue, 16 Aug 2022 13:11:51 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 2550

by: none - Tue, 16 Aug 2022 11:11 UTC

In article <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>,
Micha� Kasprzak <siarczek83@gmail.com> wrote:
>Hello Waldek, Stephen and others!
>
>Not so long ago my compatriot professor Waldek Hebisch "has coded" the
>Forth version of the Heap Sort Algorithm.
>
>http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs
>
>Heap Sort is a beautiful algorithm which learning will give a lot to each
>of you! You will learn what "binary trees" are, when they are "complete",
>how to represent a binary tree as an ordinary array, why this array should
>be indexed from 1, and what relates this binary index to moving through
>the tree. You will learn what "heap condition" a complete binary tree must
>satisfy in order to be a "heap". Finally, you will learn the surprising
>non-obvious thing, why you are creating a heap of arbitrary data in O(n)
>pessimistic time, which is faster than you thought!
>The Heap Sort Algorithm sorts in pessimistic O(n*log n) time which is
>better than QuickSort which does it in O(n^2). Admittedly Merge Sort has
>this time too, but Heap Sort sorts in place and Merge Sort needs extra
>second memory.

Not true for a typical use case.
See my MERGE-SORT in ciforth. Merge Sort should properly be used with
records of variable size, pointing to each other via links.
Then it needs no extra storage. The implementation I suggest use at
its worst case O(log(n)) memory for keeping track.
Note that quicksort needs worst case O(n) for this, which can be
diminished heuristically -not guaranteed- to O(log(n)).

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<87zgg4x7w2.fsf@nightsong.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19717&group=comp.lang.forth#19717

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!rocksolid2!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: no.em...@nospam.invalid (Paul Rubin)
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Tue, 16 Aug 2022 13:15:09 -0700
Organization: A noiseless patient Spider
Lines: 7
Message-ID: <87zgg4x7w2.fsf@nightsong.com>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="f0d43cf04e13d886c70fe043b4866c76";
logging-data="165544"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19OKJkTNUaUJpdHpr8Cz3To"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:rLDw0P9S3vFDMcZeFnVGyAMZ4EQ=
sha1:ybTk0/uyfH/NvrCHk81sW2RyHcs=

by: Paul Rubin - Tue, 16 Aug 2022 20:15 UTC

albert@cherry.(none) (albert) writes:
> its worst case O(log(n)) memory for keeping track.
> Note that quicksort needs worst case O(n) for this, which can be
> diminished heuristically -not guaranteed- to O(log(n)).

Doesn't quicksort guarantee O(log(n)) if you always sort the smaller
partition first after pivoting?

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19724&group=comp.lang.forth#19724

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:a05:622a:1453:b0:344:5909:ba44 with SMTP id v19-20020a05622a145300b003445909ba44mr10958327qtx.132.1660690061655;
Tue, 16 Aug 2022 15:47:41 -0700 (PDT)
X-Received: by 2002:a37:8084:0:b0:6b9:2963:5eab with SMTP id
b126-20020a378084000000b006b929635eabmr16446423qkd.404.1660690061443; Tue, 16
Aug 2022 15:47:41 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Tue, 16 Aug 2022 15:47:41 -0700 (PDT)
In-Reply-To: <87zgg4x7w2.fsf@nightsong.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.144.83; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.144.83
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5> <87zgg4x7w2.fsf@nightsong.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Tue, 16 Aug 2022 22:47:41 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

by: Michał Kasprzak - Tue, 16 Aug 2022 22:47 UTC

Hello Waldek, I have been still analyzing your heapsort code.
I must praise you for how you swap the values in the nodes of the subtree in the sift word.
I really like how you delay saving the subtree's root value until you reach the target location :-)

It looks like you are probably proud of your sift word and paid less attention to the heapsort word ;)

Your sift word uses 4 parametrs which you named as follows:
rra is the value of the ii-th array element,
ra is the address of the array,
ir is the total number of elements in the array,
ii is the index of the subtree's root node, which is where we want to restore the heap condition.
You also use the notation jj for the index of the left child of node ii.

Note 3: Your word sift controls index ranges very well and also works well for a single-element array, when invoked with the parameters ir = 1 and ii = 0, right?
In the sorting loop in the heapsort word near the end of the code you put the following condition IF:
I 1 = IF
OVER !
ELSE
OVER I 0 ( rra ra I 0 )
sift
THEN
This IF condition is tested n-1 times in the loop. You do these costly tests just to distinguish the case of I = 1.
But this is not needed because the case I = 1 is handled as well as the rest of the cases by sift.
If you wanted to get faster acceleration of this one case, it is definitely not worth doing it for the price of n-1 checks.
But perhaps the real reason for highlighting this case was some concern about the execution of this edge case.
Maybe when you started writing the sift word, you didn't know how it would end, and the sift evolved - you applied optimizations.
Please correct me if I am wrong, but all this IF is not needed and all you do after ELSE is enough.

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<nnd$315d86d6$1163e175@68246f71fd727ab1>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19725&group=comp.lang.forth#19725

copy link Newsgroups: comp.lang.forth

Newsgroups: comp.lang.forth
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com> <nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5> <87zgg4x7w2.fsf@nightsong.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: alb...@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$315d86d6$1163e175@68246f71fd727ab1>
Organization: KPN B.V.
Date: Wed, 17 Aug 2022 00:57:34 +0200
Path: i2pn2.org!i2pn.org!aioe.org!feeder1.feed.usenet.farm!feed.usenet.farm!feeder.usenetexpress.com!tr3.eu1.usenetexpress.com!94.232.112.246.MISMATCH!feed.abavia.com!abe006.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 28
Injection-Date: Wed, 17 Aug 2022 00:57:34 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"

by: none - Tue, 16 Aug 2022 22:57 UTC

In article <87zgg4x7w2.fsf@nightsong.com>,
Paul Rubin <no.email@nospam.invalid> wrote:
>albert@cherry.(none) (albert) writes:
>> its worst case O(log(n)) memory for keeping track.
>> Note that quicksort needs worst case O(n) for this, which can be
>> diminished heuristically -not guaranteed- to O(log(n)).
>
>Doesn't quicksort guarantee O(log(n)) if you always sort the smaller
>partition first after pivoting?

To quote my numerical math professor:
In a pathological case the smaller partition can be O(1) size.
That makes for O(n) steps. By randomizing the partitioning, the chance
however may become vanishingly small.

Knuth gives an example of a naive implementation that takes O(n)
steps for an already sorted set.

There is a good reason to select the smaller partition though.
It cannot prevent the time to explode, but it restricts the storage.

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<87v8qryaqz.fsf@nightsong.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19728&group=comp.lang.forth#19728

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: no.em...@nospam.invalid (Paul Rubin)
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Tue, 16 Aug 2022 17:28:04 -0700
Organization: A noiseless patient Spider
Lines: 8
Message-ID: <87v8qryaqz.fsf@nightsong.com>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5>
<87zgg4x7w2.fsf@nightsong.com>
<nnd$315d86d6$1163e175@68246f71fd727ab1>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: reader01.eternal-september.org; posting-host="a47a672cb88b2f3c078b6a1b80a7837f";
logging-data="215762"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//4tO2IR6EnNISxJgZwjaR"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cancel-Lock: sha1:2AWlPbcZYK/O0GDjbnnbDmBceq0=
sha1:5tRS7UCYuxeWj7dzI1C+v/+YZU4=

by: Paul Rubin - Wed, 17 Aug 2022 00:28 UTC

albert@cherry.(none) (albert) writes:
> To quote my numerical math professor:
> In a pathological case the smaller partition can be O(1) size.
> That makes for O(n) steps.

I thought we were talking about the amount of auxiliary space needed,
not the runtime. Yes the runtime can be quadratic in the worst case:
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<tdilmk$7lk$1@gioia.aioe.org>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19736&group=comp.lang.forth#19736

copy link Newsgroups: comp.lang.forth

Path: i2pn2.org!i2pn.org!aioe.org!NZ87pNe1TKxNDknVl4tZhw.user.46.165.242.91.POSTED!not-for-mail
From: antis...@math.uni.wroc.pl
Newsgroups: comp.lang.forth
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth
Date: Wed, 17 Aug 2022 12:07:49 -0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <tdilmk$7lk$1@gioia.aioe.org>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com> <nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5> <87zgg4x7w2.fsf@nightsong.com> <f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com>
Injection-Info: gioia.aioe.org; logging-data="7860"; posting-host="NZ87pNe1TKxNDknVl4tZhw.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: tin/2.4.5-20201224 ("Glen Albyn") (Linux/5.10.0-9-amd64 (x86_64))
Cancel-Lock: sha1:kR85Fx75RbC3HSYiqUiFYymNAII=
X-Notice: Filtered by postfilter v. 0.9.2

by: antis...@math.uni.wroc.pl - Wed, 17 Aug 2022 12:07 UTC

Micha? Kasprzak <siarczek83@gmail.com> wrote:
> Hello Waldek, I have been still analyzing your heapsort code.
> I must praise you for how you swap the values ??in the nodes of the subtree in the sift word.
> I really like how you delay saving the subtree's root value until you reach the target location :-)

To be clear: that is classice heapsort, it is in textbooks in this
form. No invention on my part.
> It looks like you are probably proud of your sift word and paid less attention to the heapsort word ;)
>
> Your sift word uses 4 parametrs which you named as follows:
> rra is the value of the ii-th array element,
> ra is the address of the array,
> ir is the total number of elements in the array,
> ii is the index of the subtree's root node, which is where we want to restore the heap condition.
> You also use the notation jj for the index of the left child of node ii.
>
> Note 3: Your word sift controls index ranges very well and also works well for a single-element array, when invoked with the parameters ir = 1 and ii = 0, right?
> In the sorting loop in the heapsort word near the end of the code you put the following condition IF:
> I 1 = IF
> OVER !
> ELSE
> OVER I 0 ( rra ra I 0 )
> sift
> THEN
> This IF condition is tested n-1 times in the loop. You do these costly tests just to distinguish the case of I = 1.
> But this is not needed because the case I = 1 is handled as well as the rest of the cases by sift.
> If you wanted to get faster acceleration of this one case, it is definitely not worth doing it for the price of n-1 checks.

Well, concerning "costly": test for I = 1 should be two machine
instructions. If it is more costly, then you have bigger problem.
And if you want to have best speed use this test as loop termination
test.

> But perhaps the real reason for highlighting this case was some concern about the execution of this edge case.
> Maybe when you started writing the sift word, you didn't know how it would end, and the sift evolved - you applied optimizations.

No. I have similar implementations is several languages. Except for
Forth, in other languages I simply return from I = 1 branch and there
is no separate loop termination test. In Forth you can not simply
return: there are stack cleanups and the way to return/exit loop
depends on which kind of loop do you have. So I did not try to
optimize loop termination test.

Let me add that from the start it was clear that solution of
taxi problem using heapsort will be slower than version using
hash tables. Rather, I wanted to show that solution while
slower is still practical. And to see how well Forth
implementations can execute this code. Clearly, efficiency
of Forth implementation have _much_ more influence than the
extra test. In fact, smart enough Forth could see that I = 1
case terminates the loop, skip normal termination test and
arrage jump out of the loop.

> Please correct me if I am wrong, but all this IF is not needed and all you do after ELSE is enough.

Yes, ELSE branch is enough.

--
Waldek Hebisch

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19796&group=comp.lang.forth#19796

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:ae9:c112:0:b0:6bb:e7de:1bdd with SMTP id z18-20020ae9c112000000b006bbe7de1bddmr9120793qki.338.1661197055467;
Mon, 22 Aug 2022 12:37:35 -0700 (PDT)
X-Received: by 2002:a05:6214:1c0b:b0:495:f3f9:6714 with SMTP id
u11-20020a0562141c0b00b00495f3f96714mr17285653qvc.19.1661197055180; Mon, 22
Aug 2022 12:37:35 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Mon, 22 Aug 2022 12:37:34 -0700 (PDT)
In-Reply-To: <tdilmk$7lk$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.151.59; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.151.59
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5> <87zgg4x7w2.fsf@nightsong.com>
<f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com> <tdilmk$7lk$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Mon, 22 Aug 2022 19:37:35 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 13985

by: Michał Kasprzak - Mon, 22 Aug 2022 19:37 UTC

I checked the Mouse (Waldek's new nickname) information, how good it looks when compiling its code in C. On my Fedora Linux 36 (which is known to be very convenient for programmers because it has the latest software versions installed out of the box) I have installed:
gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)
I got similarly good results as Mouse. Although gcc achieves excellent optimization results, they are still far from perfect.
gcc didn't really show off when we replace jj = ir with break in sift, which should have helped gcc, but destroyed the optimization of the entire code ;(

I have a gift for all of you: Mouse's Ugly Heapsort written in x64 assembly language that works in all native 64-bit Forths.
It is supposed to be the best, because the fastest heapsort ever created. We define the best one in such a way that if, after the presentation of the code and your criticism (which the whole world counts on), no one can speed up or improve this code, the code is practically the best.
Since this code will be used by future generations, long after we are gone, try to improve it to leave something for the world after our vacation this year. Of course, all this code is intended for free use by anyone without even mentioning the authors.
First, I will describe the programming technique used.
Since the code is supposed to work on every native 64-bit Forth, we cannot use the built-in assembler because this one is different in every Forth.
We will use an external assembler which will generate a binary code that can be inserted into any 64-bit native Forth.
In this case, I used NASM that you can install on Fedora with the command: sudo dnf install nasm
Source files in NASM are very easy to write, just remember to put one directive at the beginning of the file: BITS 64
The default NASM format is binary (plain code, no extra data), so we don't need to specify the format when assembling. We can add the generation of a so-called listing file, which can help us make choices about the use of certain instructions. We will be dealing with three files written in assembly language: heapsort.asm, prolog.asm, epilog.asm. We assemble these files with the following commands:
nasm -l heapsort.lst heapsort.asm
nasm -l prolog.lst prolog.asm
nasm -l epilog.lst epilog.asm
After assembly we will get 3 short binary files: heapsort, prolog, epilog (without extension).
The code we write must be relocatable, which is very simple and natural for x64 assembler.
The code we write must have a property called "Passthrough", which means it must start at the beginning and always end at the end. Such code can be easily combined with other codes with this property, and also inline in the native Forth.
The question remains how to pass parameters from any Forth to our assembly code.
Each native Forth can implant the parameter stack differently. These can be typical downward stacks, or upward stacking stacks, which has the added benefit of being able to fit two stacks in one block of memory. These stacks may have the top element cached in the CPU register or not.
We need to find some common ground in all native Forths. Native means that it calls words using the processor's call instruction. So each native Forth uses the CPU stack as the return stack.
Each Forth can move an element from the parameter stack to the return stack using the word >R
So we can move 2 arguments onto the return stack using >R >R and the assembly language code can handle them.
Even though Pony's Ciforth does not live up to the natural assumption of using the CPU stack as a return stack, he is not in a lost position either. In Ciforth, the CPU stack is used as the parameter stack, and the parameters we need are already there, but in reverse order. So in Ciforth you just need to use SWAP instead of >R >R
In the future, when creating the word heapsort in Forth, we could automate the detection of what type of situation we are dealing with and using conditional compilation insert >R >R or SWAP
We will write our assembler heapsort in the file heapsort.asm. It requires two parameters: the address of the array to be sorted in rdi and the number of elements to be sorted in rsi. The choice of registers is free, and this has been made to conform to the Linux C calling convention.
We need to save the CPU registers used by Forth before running our code. Since we don't generally know which registers in different Forths need to be kept, we keep all registers used by our code.
The code in the prolog.asm file takes care of this. In addition to saving the used registers, it loads the prepared parameters from the processor stack to the rdi and rsi we set.
It's easy to see what epilog.asm does. Of course, it restores the stored CPU registers and removes used parameters from the return stack.
Of course, if you know your Forth you can simplify the prolog.asm and epilog.asm files. You don't have to keep all the registers used then, just the ones your Forth requires. You don't even need to move parameters to the return stack using >R >R, just transfer them from the parameter stack to the rdi and rsi registers immediately.
I am using VFX Forth to test these solutions.
In VFX Forth it is very easy to inline binary code when defining a word using the DATA-FILE word. If your Forth doesn't have that word, I highly recommend creating it. Ultimately, however, we will not need it. In the final version portable between different Forths, we will use the comma code insertion method.
So in VFX Forth we can write the word heapsort as follows:

: heapsort
>R >R
[ DATA-FILE /home/fedora/prolog DROP
DATA-FILE /home/fedora/heapsort DROP
DATA-FILE /home/fedora/epilog DROP
] ;

Do you like my solution? I am asking for criticism.
Below are 3 files: heapsort.asm, prolog.asm, epilog.asm. See if you can make any improvements to the code in heapsort.asm, even if it is only one CPU cycle or one byte.
As my English is very poor, I would also like to ask you to fix the comments in the heapsort.asm file. These comments, in my opinion, are of great value.
If you try to improve and do not get anything done, share this information - this is also a valuable result!
All language programmers will be able to use this code, let them have something from our Forthers group!

---------- start of file: heapsort.asm, cut below this line ----------
BITS 64
; rdi = array's address
; rsi = array's number of elements = n
heapsort: mov rax, rsi
shr rax, 1
; nothing to sort if there is 0 or 1 element in the array
jz the_end
; now we are sure that always n>=2
; r10 = index of the last parent node
; r10 is also the counter of the heap building loop
lea r10, [rax-1]
; rbx = index of the left child of the last parent node
; rbx is also index of the left child of r10 node in the loop
lea rbx, [rax+rax-1]
; heap building loop
; sift the counter node's value stored in r9 to the proper position
; r9 = array[r10]
build_loop: mov r9, qword [rdi+8*r10]
; r11 is index of target place for sifted values
; we start sifting from the counter node's index
mov r11, r10
; rax is the index of a current child of r11 node
; we start r11 with the index of the left child
mov rax, rbx
jmp sift1_while
; we will find the child with bigger value and store its index in rax
; we will load this bigger value to rdx
; rcx holds the index of the right child
sift1: lea rcx, [rax+1]
mov rdx, qword [rdi+8*rax]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, rsi
jnb sift1_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rdx
jle sift1_check
; then we will proceed the right child node
mov rdx, r8
mov rax, rcx
; compare the values of the current child node with the sifting node
sift1_check: cmp r9, rdx
; if the sifting node's value is bigger then we end sifting
jnl build_next
; we store the child's value into the parent node's place
mov qword [rdi+8*r11], rdx
; we make the child's node the new parent's node
mov r11, rax
lea rax, [rax+rax+1]
; do sift if there is still at least one child
sift1_while: cmp rax, rsi
jb sift1
; we store sifted value into the target place and repeat the loop
build_next: sub r10, 1
mov qword [rdi+8*r11], r9
sub rbx, 2
cmp r10, -1
jne build_loop
; the sorting loop starts with n-1
; rax will hold the counter of the sorting loop
lea rax, [rsi-1]
; the sorting loop
; rdx = array[0]
; r9 = array[counter]
sort_loop: mov rdx, qword [rdi]
mov r9, qword [rdi+8*rax]
; we will sift the node r10=0 which value is in r9
xor r10d, r10d
; we store array[counter]=array[0]
mov qword [rdi+8*rax], rdx
; now we will store in rdx the left child node's index
mov edx, 1
jmp sift2_while
; sift2 works like sift1 with changed registers
; we will find the child with bigger value and store its index in rdx
; we will load this bigger value to rsi
; rcx holds the index of the right child
sift2: lea rcx, [rdx+1]
mov rsi, qword [rdi+8*rdx]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, rax
jnb sift2_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rsi
jle sift2_check
; then we will proceed the right child node
mov rsi, r8
mov rdx, rcx
; compare the values of the current child node with the sifting node
sift2_check: cmp r9, rsi
; if the sifting node's value is bigger then we end sifting
jnl sort_next
; we store the child's value into the parent node's place
mov qword [rdi+8*r10], rsi
; we make the child's node the new parent's node
mov r10, rdx
lea rdx, [rdx+rdx+1]
; do sift if there is still at least one child
sift2_while: cmp rdx, rax
jb sift2
; we store sifted value into the target place
sort_next: mov qword [rdi+8*r10], r9
; next step of the sorting loop
sub rax, 1
; the sorting loop ends with 1
jnz sort_loop
the_end:
---------- end of file: heapsort.asm, cut above this line ----------

Click here to read the complete article

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<b5f4752a-7a33-4463-ac4e-f96562404ae8n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19809&group=comp.lang.forth#19809

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:ae9:ed91:0:b0:6bb:29c0:8b0c with SMTP id c139-20020ae9ed91000000b006bb29c08b0cmr17193257qkg.676.1661282153230;
Tue, 23 Aug 2022 12:15:53 -0700 (PDT)
X-Received: by 2002:a05:620a:400f:b0:6af:53bc:7311 with SMTP id
h15-20020a05620a400f00b006af53bc7311mr17066075qko.305.1661282152999; Tue, 23
Aug 2022 12:15:52 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Tue, 23 Aug 2022 12:15:52 -0700 (PDT)
In-Reply-To: <c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.151.59; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.151.59
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<nnd$2edc9dc1$49c18213@c6f049e09e0a8dd5> <87zgg4x7w2.fsf@nightsong.com>
<f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com> <tdilmk$7lk$1@gioia.aioe.org>
<c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <b5f4752a-7a33-4463-ac4e-f96562404ae8n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Tue, 23 Aug 2022 19:15:53 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7737

by: Michał Kasprzak - Tue, 23 Aug 2022 19:15 UTC

Look at the following code snippet in the heapsort.asm file starting with the build_next label:
build_next: sub r10, 1
mov qword [rdi+8*r11], r9
sub rbx, 2
cmp r10, -1
jne build_loop
I was in a hurry to deliver my gift to you as soon as possible and I missed correcting it ;)
Of course, it is worth replacing the above piece of code with the following code:
build_next: mov qword [rdi+8*r11], r9
sub rbx, 2
sub r10, 1
jnc build_loop
We got 4 bytes shorter code, 1 instruction less in code and maybe faster :)
Below I am inserting a revised version of heapsort.asm file.

By the way, although "dec reg64" is 1 byte shorter than "sub reg64, 1", I don't use "dec reg64" on purpose. This is because of the known partial-flag stall problem that you can read about on the internet.

Using an unused rbp register instead of the extended r8-r11 registers does not save any code length in the heapsort.asm file. Although push / pop rbp is 1 byte shorter than push / pop r8, which would benefit prolog.asm and epilog.asm, if we go to a specific Forth version, it will probably turn out that every Forth needs to keep rbp and none of them has to keep r8, so it is not worth trying to force yourself to use rbp ;)

Be brave, criticize :)

---------- start of file: heapsort.asm, cut below this line ----------
BITS 64
; rdi = array's address
; rsi = array's number of elements = n
heapsort: mov rax, rsi
shr rax, 1
; nothing to sort if there is 0 or 1 element in the array
jz the_end
; now we are sure that always n>=2
; r10 = index of the last parent node
; r10 is also the counter of the heap building loop
lea r10, [rax-1]
; rbx = index of the left child of the last parent node
; rbx is also index of the left child of r10 node in the loop
lea rbx, [rax+rax-1]
; heap building loop
; sift the counter node's value stored in r9 to the proper position
; r9 = array[r10]
build_loop: mov r9, qword [rdi+8*r10]
; r11 is index of target place for sifted values
; we start sifting from the counter node's index
mov r11, r10
; rax is the index of a current child of r11 node
; we start r11 with the index of the left child
mov rax, rbx
jmp sift1_while
; we will find the child with bigger value and store its index in rax
; we will load this bigger value to rdx
; rcx holds the index of the right child
sift1: lea rcx, [rax+1]
mov rdx, qword [rdi+8*rax]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, rsi
jnb sift1_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rdx
jle sift1_check
; then we will proceed the right child node
mov rdx, r8
mov rax, rcx
; compare the values of the current child node with the sifting node
sift1_check: cmp r9, rdx
; if the sifting node's value is bigger then we end sifting
jnl build_next
; we store the child's value into the parent node's place
mov qword [rdi+8*r11], rdx
; we make the child's node the new parent's node
mov r11, rax
lea rax, [rax+rax+1]
; do sift if there is still at least one child
sift1_while: cmp rax, rsi
jb sift1
; we store sifted value into the target place and repeat the loop
build_next: mov qword [rdi+8*r11], r9
sub rbx, 2
sub r10, 1
jnc build_loop
; the sorting loop starts with n-1
; rax will hold the counter of the sorting loop
lea rax, [rsi-1]
; the sorting loop
; rdx = array[0]
; r9 = array[counter]
sort_loop: mov rdx, qword [rdi]
mov r9, qword [rdi+8*rax]
; we will sift the node r10=0 which value is in r9
xor r10d, r10d
; we store array[counter]=array[0]
mov qword [rdi+8*rax], rdx
; now we will store in rdx the left child node's index
mov edx, 1
jmp sift2_while
; sift2 works like sift1 with changed registers
; we will find the child with bigger value and store its index in rdx
; we will load this bigger value to rsi
; rcx holds the index of the right child
sift2: lea rcx, [rdx+1]
mov rsi, qword [rdi+8*rdx]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, rax
jnb sift2_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rsi
jle sift2_check
; then we will proceed the right child node
mov rsi, r8
mov rdx, rcx
; compare the values of the current child node with the sifting node
sift2_check: cmp r9, rsi
; if the sifting node's value is bigger then we end sifting
jnl sort_next
; we store the child's value into the parent node's place
mov qword [rdi+8*r10], rsi
; we make the child's node the new parent's node
mov r10, rdx
lea rdx, [rdx+rdx+1]
; do sift if there is still at least one child
sift2_while: cmp rdx, rax
jb sift2
; we store sifted value into the target place
sort_next: mov qword [rdi+8*r10], r9
; next step of the sorting loop
sub rax, 1
; the sorting loop ends with 1
jnz sort_loop
the_end:
---------- end of file: heapsort.asm, cut above this line ----------

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19810&group=comp.lang.forth#19810

copy link Newsgroups: comp.lang.forth

Newsgroups: comp.lang.forth
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com> <f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com> <tdilmk$7lk$1@gioia.aioe.org> <c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: alb...@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>
Organization: KPN B.V.
Date: Tue, 23 Aug 2022 21:32:53 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe006.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 39
Injection-Date: Tue, 23 Aug 2022 21:32:53 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 3670

by: none - Tue, 23 Aug 2022 19:32 UTC

In article <c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>,
Micha� Kasprzak <siarczek83@gmail.com> wrote:
>I checked the Mouse (Waldek's new nickname) information, how good it looks when compiling its code in C. On my Fedora
>Linux 36 (which is known to be very convenient for programmers because it has the latest software versions installed out
>of the box) I have installed:
>gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)
>I got similarly good results as Mouse. Although gcc achieves excellent optimization results, they are still far from perfect.
>gcc didn't really show off when we replace jj = ir with break in sift, which should have helped gcc, but destroyed the
>optimization of the entire code ;(
>
>I have a gift for all of you: Mouse's Ugly Heapsort written in x64 assembly language that works in all native 64-bit Forths.
>It is supposed to be the best, because the fastest heapsort ever created. We define the best one in such a way that if,
>after the presentation of the code and your criticism (which the whole world counts on), no one can speed up or improve
>this code, the code is practically the best.
>Since this code will be used by future generations, long after we are gone, try to improve it to leave something for the
>world after our vacation this year. Of course, all this code is intended for free use by anyone without even mentioning
>the authors.
>First, I will describe the programming technique used.
>Since the code is supposed to work on every native 64-bit Forth, we cannot use the built-in assembler because this one is
>different in every Forth.
>We will use an external assembler which will generate a binary code that can be inserted into any 64-bit native Forth.
>In this case, I used NASM that you can install on Fedora with the command: sudo dnf install nasm
>Source files in NASM are very easy to write, just remember to put one directive at the beginning of the file: BITS 64
>The default NASM format is binary (plain code, no extra data), so we don't need to specify the format when assembling. We
>can add the generation of a so-called listing file, which can help us make choices about the use of certain instructions.
>We will be dealing with three files written in assembly language: heapsort.asm, prolog.asm, epilog.asm. We assemble these
>files with the following commands:
>nasm -l heapsort.lst heapsort.asm
>nasm -l prolog.lst prolog.asm
>nasm -l epilog.lst epilog.asm

This is quite a valuable technique. It is an alternative to calling c-routines
from Forth.
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19811&group=comp.lang.forth#19811

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:ac8:5a55:0:b0:343:72f9:7053 with SMTP id o21-20020ac85a55000000b0034372f97053mr20919118qta.518.1661285057583;
Tue, 23 Aug 2022 13:04:17 -0700 (PDT)
X-Received: by 2002:a05:622a:54c:b0:343:5a1a:9ace with SMTP id
m12-20020a05622a054c00b003435a1a9acemr21203186qtx.546.1661285057469; Tue, 23
Aug 2022 13:04:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Tue, 23 Aug 2022 13:04:17 -0700 (PDT)
In-Reply-To: <nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>
Injection-Info: google-groups.googlegroups.com; posting-host=2001:1c05:2f14:600:1cb:4fcd:f6ce:554a;
posting-account=-JQ2RQoAAAB6B5tcBTSdvOqrD1HpT_Rk
NNTP-Posting-Host: 2001:1c05:2f14:600:1cb:4fcd:f6ce:554a
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<f7a0b9b6-8604-4da6-bc40-b6d6caa06128n@googlegroups.com> <tdilmk$7lk$1@gioia.aioe.org>
<c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com> <nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: mhx...@iae.nl (Marcel Hendrix)
Injection-Date: Tue, 23 Aug 2022 20:04:17 +0000
Content-Type: text/plain; charset="UTF-8"
X-Received-Bytes: 1662

by: Marcel Hendrix - Tue, 23 Aug 2022 20:04 UTC

On Tuesday, August 23, 2022 at 9:32:56 PM UTC+2, none albert wrote:
[..]
> This is quite a valuable technique. It is an alternative to calling c-routines
> from Forth.

Hardly. I am not going to write, e.g., the Faddeev-Leverrier algorithm
in assembler.

-marcel

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<nnd$4e420334$6b6909ad@7615ce88ee79052f>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19812&group=comp.lang.forth#19812

copy link Newsgroups: comp.lang.forth

Newsgroups: comp.lang.forth
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com> <c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com> <nnd$5c5527ce$00205d81@05fde1a4fb5a15f5> <05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
From: alb...@cherry (none)
Originator: albert@cherry.(none) (albert)
Message-ID: <nnd$4e420334$6b6909ad@7615ce88ee79052f>
Organization: KPN B.V.
Date: Tue, 23 Aug 2022 23:47:47 +0200
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe006.abavia.com!abp001.abavia.com!news.kpn.nl!not-for-mail
Lines: 24
Injection-Date: Tue, 23 Aug 2022 23:47:47 +0200
Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com"
X-Received-Bytes: 1706

by: none - Tue, 23 Aug 2022 21:47 UTC

In article <05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com>,
Marcel Hendrix <mhx@iae.nl> wrote:
>On Tuesday, August 23, 2022 at 9:32:56 PM UTC+2, none albert wrote:
>[..]
>> This is quite a valuable technique. It is an alternative to calling c-routines
>> from Forth.
>
>Hardly. I am not going to write, e.g., the Faddeev-Leverrier algorithm
>in assembler.

You don't have to. Assembler trumps a high level language.
Write that algorithm in c, then use the assembler output to
add it to Forth. No OS-specific techniques required.

>
>-marcel

Re: On the ugly Heapsort implementation which becomes beautiful in VFX Forth

<571f5d51-609e-46c5-abcc-25fb52e42014n@googlegroups.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19820&group=comp.lang.forth#19820

copy link Newsgroups: comp.lang.forth

X-Received: by 2002:a05:620a:7e2:b0:6bc:980:db39 with SMTP id k2-20020a05620a07e200b006bc0980db39mr3844459qkk.176.1661447847193;
Thu, 25 Aug 2022 10:17:27 -0700 (PDT)
X-Received: by 2002:a05:6214:2307:b0:46e:5fbc:3c30 with SMTP id
gc7-20020a056214230700b0046e5fbc3c30mr4393785qvb.21.1661447847034; Thu, 25
Aug 2022 10:17:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.forth
Date: Thu, 25 Aug 2022 10:17:26 -0700 (PDT)
In-Reply-To: <nnd$4e420334$6b6909ad@7615ce88ee79052f>
Injection-Info: google-groups.googlegroups.com; posting-host=5.173.151.59; posting-account=6_WQ-goAAAB1rf-V6zYvzNSXvyUB0tj4
NNTP-Posting-Host: 5.173.151.59
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com> <nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>
<05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com> <nnd$4e420334$6b6909ad@7615ce88ee79052f>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <571f5d51-609e-46c5-abcc-25fb52e42014n@googlegroups.com>
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
VFX Forth
From: siarcze...@gmail.com (Michał Kasprzak)
Injection-Date: Thu, 25 Aug 2022 17:17:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 7278

by: Michał Kasprzak - Thu, 25 Aug 2022 17:17 UTC

Look at the xor r10d, r10d instruction in the heapsort.asm file.
This known technique of limiting code length by working on the lower 32 bits of 64-bit registers, which works great for rax-rdx registers, unfortunately does not work for extended registers r8-r15. Even the 32-bit xor r10d, r10d requires a prefix byte and the entire instruction is also 3 bytes long.
It's a pity, but we can swap the roles of the r10 and rax registers in the sorting loop part of the code to be able to write xor eax, eax (which is 2 bytes long) instead of xor r10d, r10d and get 1 byte shorter code :)
And that's exactly what I did in the patched version of heapsort.asm below.
Now the revised assembled code is 181 bytes long. It is really short! When you honestly compare this length with the numbers you saw earlier in this thread remember that this code has a sift code inlined in two places for quicker operation.

Who next of you will shorten this code more or find some improvement to this code? :)

---------- start of file: heapsort.asm, cut below this line ----------
BITS 64
; rdi = array's address
; rsi = array's number of elements = n
heapsort: mov rax, rsi
shr rax, 1
; nothing to sort if there is 0 or 1 element in the array
jz the_end
; now we are sure that always n>=2
; r10 = index of the last parent node
; r10 is also the counter of the heap building loop
lea r10, [rax-1]
; rbx = index of the left child of the last parent node
; rbx is also index of the left child of r10 node in the loop
lea rbx, [rax+rax-1]
; heap building loop
; sift the counter node's value stored in r9 to the proper position
; r9 = array[r10]
build_loop: mov r9, qword [rdi+8*r10]
; r11 is index of target place for sifted values
; we start sifting from the counter node's index
mov r11, r10
; rax is the index of a current child of r11 node
; we start r11 with the index of the left child
mov rax, rbx
jmp sift1_while
; we will find the child with bigger value and store its index in rax
; we will load this bigger value to rdx
; rcx holds the index of the right child
sift1: lea rcx, [rax+1]
mov rdx, qword [rdi+8*rax]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, rsi
jnb sift1_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rdx
jle sift1_check
; then we will proceed the right child node
mov rdx, r8
mov rax, rcx
; compare the values of the current child node with the sifting node
sift1_check: cmp r9, rdx
; if the sifting node's value is bigger then we end sifting
jnl build_next
; we store the child's value into the parent node's place
mov qword [rdi+8*r11], rdx
; we make the child's node the new parent's node
mov r11, rax
lea rax, [rax+rax+1]
; do sift if there is still at least one child
sift1_while: cmp rax, rsi
jb sift1
; we store sifted value into the target place and repeat the loop
build_next: mov qword [rdi+8*r11], r9
sub rbx, 2
sub r10, 1
jnc build_loop
; the sorting loop starts with n-1
; r10 will hold the counter of the sorting loop
lea r10, [rsi-1]
; the sorting loop
; rdx = array[0]
; r9 = array[counter]
sort_loop: mov rdx, qword [rdi]
mov r9, qword [rdi+8*r10]
; we will sift the node rax=0 which value is in r9
xor eax, eax
; we store array[counter]=array[0]
mov qword [rdi+8*r10], rdx
; now we will store in rdx the left child node's index
mov edx, 1
jmp sift2_while
; sift2 works like sift1 with changed registers
; we will find the child with bigger value and store its index in rdx
; we will load this bigger value to rsi
; rcx holds the index of the right child
sift2: lea rcx, [rdx+1]
mov rsi, qword [rdi+8*rdx]
; note that the right child may not exist
; then we consider the left child to be bigger
cmp rcx, r10
jnb sift2_check
; we load the right child node's value to r8
mov r8, qword [rdi+8*rcx]
; if the right child node's value is bigger than the left child's
cmp r8, rsi
jle sift2_check
; then we will proceed the right child node
mov rsi, r8
mov rdx, rcx
; compare the values of the current child node with the sifting node
sift2_check: cmp r9, rsi
; if the sifting node's value is bigger then we end sifting
jnl sort_next
; we store the child's value into the parent node's place
mov qword [rdi+8*rax], rsi
; we make the child's node the new parent's node
mov rax, rdx
lea rdx, [rdx+rdx+1]
; do sift if there is still at least one child
sift2_while: cmp rdx, r10
jb sift2
; we store sifted value into the target place
sort_next: mov qword [rdi+8*rax], r9
; next step of the sorting loop
sub r10, 1
; the sorting loop ends with 1
jnz sort_loop
the_end:
---------- end of file: heapsort.asm, cut above this line ----------

Re: On the ugly Heapsort implementation which becomes beautiful in

<20220825193234.21fabf3d63751c5e9fb7092d@127.0.0.1>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=19823&group=comp.lang.forth#19823

copy link Newsgroups: comp.lang.forth comp.lang.asm.x86

Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: adm...@nospicedham.127.0.0.1 (Kerr-Mudd, John)
Newsgroups: comp.lang.forth,comp.lang.asm.x86
Subject: Re: On the ugly Heapsort implementation which becomes beautiful in
Date: Thu, 25 Aug 2022 19:32:34 +0100
Organization: Dis
Lines: 135
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <20220825193234.21fabf3d63751c5e9fb7092d@127.0.0.1>
References: <277ad6ed-e704-410d-bd31-335f2a7e8b17n@googlegroups.com>
<c3e7bae3-a98a-439e-a6ec-e7e7e3175f8cn@googlegroups.com>
<nnd$5c5527ce$00205d81@05fde1a4fb5a15f5>
<05ffffa4-cb98-42d0-8228-5b6d0a743b44n@googlegroups.com>
<nnd$4e420334$6b6909ad@7615ce88ee79052f>
<571f5d51-609e-46c5-abcc-25fb52e42014n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader01.eternal-september.org; posting-host="734bf053f5ddfb3c9dfafd52b1ae671b";
logging-data="3907651"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/RoUystCmqXmNqGqMnduzcKX+OXGZ7Cnw="
Cancel-Lock: sha1:LKfni1WyV9GMO7wqRtSbz6AeEfs=

by: Kerr-Mudd, John - Thu, 25 Aug 2022 18:32 UTC

On Thu, 25 Aug 2022 10:17:26 -0700 (PDT)
Michał Kasprzak <siarczek83@gmail.com> wrote:

> Look at the xor r10d, r10d instruction in the heapsort.asm file.
> This known technique of limiting code length by working on the lower 32 bits of 64-bit registers, which works great for rax-rdx registers, unfortunately does not work for extended registers r8-r15. Even the 32-bit xor r10d, r10d requires a prefix byte and the entire instruction is also 3 bytes long.
> It's a pity, but we can swap the roles of the r10 and rax registers in the sorting loop part of the code to be able to write xor eax, eax (which is 2 bytes long) instead of xor r10d, r10d and get 1 byte shorter code :)
> And that's exactly what I did in the patched version of heapsort.asm below.
> Now the revised assembled code is 181 bytes long. It is really short! When you honestly compare this length with the numbers you saw earlier in this thread remember that this code has a sift code inlined in two places for quicker operation.
>
> Who next of you will shorten this code more or find some improvement to this code? :)
>

You might try an asm group
I'll try a xpost to clax

> ---------- start of file: heapsort.asm, cut below this line ----------
> BITS 64
> ; rdi = array's address
> ; rsi = array's number of elements = n
> heapsort: mov rax, rsi
> shr rax, 1
> ; nothing to sort if there is 0 or 1 element in the array
> jz the_end
> ; now we are sure that always n>=2
> ; r10 = index of the last parent node
> ; r10 is also the counter of the heap building loop
> lea r10, [rax-1]
> ; rbx = index of the left child of the last parent node
> ; rbx is also index of the left child of r10 node in the loop
> lea rbx, [rax+rax-1]
> ; heap building loop
> ; sift the counter node's value stored in r9 to the proper position
> ; r9 = array[r10]
> build_loop: mov r9, qword [rdi+8*r10]
> ; r11 is index of target place for sifted values
> ; we start sifting from the counter node's index
> mov r11, r10
> ; rax is the index of a current child of r11 node
> ; we start r11 with the index of the left child
> mov rax, rbx
> jmp sift1_while
> ; we will find the child with bigger value and store its index in rax
> ; we will load this bigger value to rdx
> ; rcx holds the index of the right child
> sift1: lea rcx, [rax+1]
> mov rdx, qword [rdi+8*rax]
> ; note that the right child may not exist
> ; then we consider the left child to be bigger
> cmp rcx, rsi
> jnb sift1_check
> ; we load the right child node's value to r8
> mov r8, qword [rdi+8*rcx]
> ; if the right child node's value is bigger than the left child's
> cmp r8, rdx
> jle sift1_check
> ; then we will proceed the right child node
> mov rdx, r8
> mov rax, rcx
> ; compare the values of the current child node with the sifting node
> sift1_check: cmp r9, rdx
> ; if the sifting node's value is bigger then we end sifting
> jnl build_next
> ; we store the child's value into the parent node's place
> mov qword [rdi+8*r11], rdx
> ; we make the child's node the new parent's node
> mov r11, rax
> lea rax, [rax+rax+1]
> ; do sift if there is still at least one child
> sift1_while: cmp rax, rsi
> jb sift1
> ; we store sifted value into the target place and repeat the loop
> build_next: mov qword [rdi+8*r11], r9
> sub rbx, 2
> sub r10, 1
> jnc build_loop
> ; the sorting loop starts with n-1
> ; r10 will hold the counter of the sorting loop
> lea r10, [rsi-1]
> ; the sorting loop
> ; rdx = array[0]
> ; r9 = array[counter]
> sort_loop: mov rdx, qword [rdi]
> mov r9, qword [rdi+8*r10]
> ; we will sift the node rax=0 which value is in r9
> xor eax, eax
> ; we store array[counter]=array[0]
> mov qword [rdi+8*r10], rdx
> ; now we will store in rdx the left child node's index
> mov edx, 1
> jmp sift2_while
> ; sift2 works like sift1 with changed registers
> ; we will find the child with bigger value and store its index in rdx
> ; we will load this bigger value to rsi
> ; rcx holds the index of the right child
> sift2: lea rcx, [rdx+1]
> mov rsi, qword [rdi+8*rdx]
> ; note that the right child may not exist
> ; then we consider the left child to be bigger
> cmp rcx, r10
> jnb sift2_check
> ; we load the right child node's value to r8
> mov r8, qword [rdi+8*rcx]
> ; if the right child node's value is bigger than the left child's
> cmp r8, rsi
> jle sift2_check
> ; then we will proceed the right child node
> mov rsi, r8
> mov rdx, rcx
> ; compare the values of the current child node with the sifting node
> sift2_check: cmp r9, rsi
> ; if the sifting node's value is bigger then we end sifting
> jnl sort_next
> ; we store the child's value into the parent node's place
> mov qword [rdi+8*rax], rsi
> ; we make the child's node the new parent's node
> mov rax, rdx
> lea rdx, [rdx+rdx+1]
> ; do sift if there is still at least one child
> sift2_while: cmp rdx, r10
> jb sift2
> ; we store sifted value into the target place
> sort_next: mov qword [rdi+8*rax], r9
> ; next step of the sorting loop
> sub r10, 1
> ; the sorting loop ends with 1
> jnz sort_loop
> the_end:
> ---------- end of file: heapsort.asm, cut above this line ----------

--
Bah, and indeed Humbug.

server_pubkey.txt

rocksolid light 0.9.81
clearnet tor