Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  nodelist  faq  login

Time sharing: The use of many people by the computer.


programming / alt.lang.asm / Re: Check if all 256 bits are clear or set ?

SubjectAuthor
* Check if all 256 bits are clear or set ?skybuck2000
`* Re: Check if all 256 bits are clear or set ?skybuck2000
 `* Re: Check if all 256 bits are clear or set ?skybuck2000
  `* Re: Check if all 256 bits are clear or set ?skybuck2000
   +- Re: Check if all 256 bits are clear or set ?Kerr-Mudd, John
   `* Re: Check if all 256 bits are clear or set ?skybuck2000
    `* Re: Check if all 256 bits are clear or set ?skybuck2000
     `* Re: Check if all 256 bits are clear or set ?wolfgang kern
      `* Re: Check if all 256 bits are clear or set ?skybuck2000
       `- Re: Check if all 256 bits are clear or set ?wolfgang kern

1
Subject: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 00:47 UTC
X-Received: by 2002:a37:4152:: with SMTP id o79mr6529128qka.169.1635468473493;
Thu, 28 Oct 2021 17:47:53 -0700 (PDT)
X-Received: by 2002:a9d:67d3:: with SMTP id c19mr6219734otn.71.1635468473268;
Thu, 28 Oct 2021 17:47:53 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Thu, 28 Oct 2021 17:47:53 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
Subject: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Fri, 29 Oct 2021 00:47:53 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 55
View all headers
I have a data structure in one of my applications which is an array of 32 bytes.

So basically 256 bits.

The code needs to check if all bits are empty or if all bits are full. So basically all clear or all set.

Currently this data structure uses an additional counter of 1 byte to smartly keep track of when a bit is set or cleared individually and it acts upon it's conclusion =D
(secret code =D)

So basically this code uses 1 branch to check the counter, and sometimes it does a little decrementing and sometimes a little incrementing.

So the code itself is pretty efficient.

There is ofcourse a slight little "problem" / "inefficiency" with this technique it consumes 1 additional byte.

So 1 out of 33 bytes. Maybe there are other implicit penalties where this 33 data structure is misaligned because of this 33 byte structure.

So reducing this structure to just 32 bytes could not only safe memory but it might also increase performance a little bit, to maybe a lot ?

So at least memory wise this would give a 3% reduction of memory consumption.

Performance wise I don't know.

Currently checking 256 bits with code with be much worse.

Possibilities:

1. Check each bit individually, this would be pretty stupid and slow.
2. Check each byte, so 32 comparisons, still pretty slow.
3. Check each word, so 16 comparisons, a bit better.
4. Check each longword, so 8 comparisons, a lot better but still slow.
5. Check each quadword, so 4 comparisons, only applicable in 64 bit application.

On some systems 64 bit integer compare can still take two clock cycles.

Is there something better and faster ?

Is there a 128 bit compare in a single clock or two ?

Is there a 256 bit compare in a single clock or two ?

Just wondering ! ;)

Anyway in case such an instruction would exist for example:

if Are256BitsClear( DataStructure )

if Are256BitsSet( DataStructure )

then the counter could most likely/ofcourse be removed =D
and the data structure would be a nice 32 byte fit and a bit more efficient.

Perhaps it might even make sense to use 4 comparisons if the alignment somehow gives it more performance then the overhead of 4 comparisons lol.

Bye,
  Skybuck.


Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 01:01 UTC
References: 1
X-Received: by 2002:ad4:5423:: with SMTP id g3mr7487997qvt.45.1635469311879;
Thu, 28 Oct 2021 18:01:51 -0700 (PDT)
X-Received: by 2002:a05:6830:60f:: with SMTP id w15mr6321139oti.150.1635469311570;
Thu, 28 Oct 2021 18:01:51 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Thu, 28 Oct 2021 18:01:51 -0700 (PDT)
In-Reply-To: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Fri, 29 Oct 2021 01:01:51 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 62
View all headers
I was just scanning the internet/googling it and I came across this, which is a pretty neat trick to start with lol:

https://github.com/holiman/uint256/blob/master/uint256.go

Particularly this code, written in go, I have never programmed in go but I can still read/understand some of it:

// Cmp compares z and x and returns:
//
//   -1 if z <  x
//    0 if z == x
//   +1 if z >  x
//
func (z *Int) Cmp(x *Int) (r int) {
// z < x <=> z - x < 0 i.e. when subtraction overflows.
d0, carry := bits.Sub64(z[0], x[0], 0)
d1, carry := bits.Sub64(z[1], x[1], carry)
d2, carry := bits.Sub64(z[2], x[2], carry)
d3, carry := bits.Sub64(z[3], x[3], carry)
if carry == 1 {
return -1
}
if d0|d1|d2|d3 == 0 {
return 0
}
return 1
}

The interesting part is in:

if d0|d1|d2|d3 == 0 {

Apperently what this code does is it retrieves 4x64 bit quantities.

OR-s them together.

And then checks the result of that with a single branch.

So at the assembler/instruction/compiler generated code level this would probably result in something like:
mov register1, 64 bits from array[0]
mov register2, 64 bits from array[1]
mov register3, 64 bits from array[2]
mov register4, 64 bits from array[3]
or register1, register2
or register1, register3
or register1, register4
jump if zero/jump if not zero.

Something like that, so this could be pretty efficient ! at least "branch times" are avoided/costly comparisons.
The jump zero flag instruction could also be pretty efficient, even more efficient than a cmp, because the or instruction may already have set it.

The only slight downside to this would be 7 extra instructions taking up some instruction cache.

All in all not to bad for a first google ! ;) This might be quite quick actually ! LOL and might be worth it.

And the nicest part about it is it doesn't require any special assembler or avx or mmx or sse instructions... hmmmm...

(And ofcourse the compiler has to be slightly efficient at generating this exact assembler code, but it can probably do that ! ;) maybe it has even some more tricks
up it's sleeve.)

Can you do better than this though ? Wondering...

Bye,
  Skybuck.


Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 01:12 UTC
References: 1 2
X-Received: by 2002:ac8:5f54:: with SMTP id y20mr8446769qta.324.1635469941669;
Thu, 28 Oct 2021 18:12:21 -0700 (PDT)
X-Received: by 2002:a4a:e292:: with SMTP id k18mr5568237oot.80.1635469941437;
Thu, 28 Oct 2021 18:12:21 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Thu, 28 Oct 2021 18:12:21 -0700 (PDT)
In-Reply-To: <c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com> <c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Fri, 29 Oct 2021 01:12:21 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 31
View all headers
This would make a fine implementation for Are256BitsClear.

This leaves Are256BitsSet still to be implemented...

Hmm can the same trick be done for AllSet ?

Probably with AND and checking for FF  also decimally known as 255 :)

So for Are256BitsSet this code may work:

mov register1, 64 bits from array[0]
mov register2, 64 bits from array[1]
mov register3, 64 bits from array[2]
mov register4, 64 bits from array[3]
and register1, register2
and register1, register3
and register1, register4

jump if zero/jump if not zero.

^ Not sure about the jump if not zero instruction...  this would probably be invalid.

It needs to be 255 or nothing at all.

My initially thought was to maybe negate it or something, turn 255 specific to zero somehow...

Maybe subtract 255 from it to see if it's zero... if not then it's not full.

Hmmm ?

Bye for now,
  Skybuck.


Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 02:18 UTC
References: 1 2 3
X-Received: by 2002:ad4:5423:: with SMTP id g3mr7799321qvt.45.1635473897031;
Thu, 28 Oct 2021 19:18:17 -0700 (PDT)
X-Received: by 2002:a05:6830:14d5:: with SMTP id t21mr6443833otq.341.1635473896790;
Thu, 28 Oct 2021 19:18:16 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Thu, 28 Oct 2021 19:18:16 -0700 (PDT)
In-Reply-To: <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com> <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Fri, 29 Oct 2021 02:18:17 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 66
View all headers
I found it fun to scan the internet to see if there is anybody that has/needs these same methods.

I came across unity documentation:

https://docs.unity3d.com/Packages/com.unity.render-pipelines.core@7.2/api/UnityEngine.Rendering.BitArray256.html

Describing

allTrue

True if all bits are 1.
Declaration

public bool allTrue { get; }

and it took a while to find the exact source code but it's here:

https://github.com/needle-mirror/com.unity.render-pipelines.core/blob/master/Runtime/Utilities/BitArray.cs

  public bool allTrue => data1 == ulong.MaxValue && data2 == ulong.MaxValue && data3 == ulong.MaxValue && data4 == ulong.MaxValue;

from this class to give it a bit more context:
    /// <summary>
    /// Bit array of size 256.
    /// </summary>
    [Serializable]
    [System.Diagnostics.DebuggerDisplay("{this.GetType().Name} {humanizedData}")]
    public struct BitArray256 : IBitArray
    {
        [SerializeField]
        ulong data1;
        [SerializeField]
        ulong data2;
        [SerializeField]
        ulong data3;
        [SerializeField]
        ulong data4;

        /// <summary>Number of elements in the bit array.</summary>
        public uint capacity => 256u;
        /// <summary>True if all bits are 0.</summary>
        public bool allFalse => data1 == 0uL && data2 == 0uL && data3 == 0uL && data4 == 0uL;
        /// <summary>True if all bits are 1.</summary>
        public bool allTrue => data1 == ulong.MaxValue && data2 == ulong.MaxValue && data3 == ulong.MaxValue && data4 == ulong.MaxValue;
        /// <summary>Returns the bit array in a human readable form.</summary>

So let's go back to this code:

  public bool allTrue => data1 == ulong.MaxValue && data2 == ulong.MaxValue && data3 == ulong.MaxValue && data4 == ulong.MaxValue;

It's a bit funny to see both solutions for allFalse and AllTrue the naming is nice, in my case you could think of it as AllEmpty and AllFull hint hint about secret code lol.

Anyway

At least the allFalse code could be written a bit more efficient HAHA ! =D
as already explored.

But can the allTrue code be written more efficient as well ? I think so...

At least the and's could be done on the data itself

(data1 && data2 && data3 && data4) == ulong.MaxValue
3's and's
1 comparison to max value.

Bye for now,
  Skybuck.


Subject: Re: Check if all 256 bits are clear or set ?
From: Kerr-Mudd, John
Newsgroups: alt.lang.asm
Organization: Dis
Date: Fri, 29 Oct 2021 09:29 UTC
References: 1 2 3 4
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: adm...@127.0.0.1 (Kerr-Mudd, John)
Newsgroups: alt.lang.asm
Subject: Re: Check if all 256 bits are clear or set ?
Date: Fri, 29 Oct 2021 10:29:35 +0100
Organization: Dis
Lines: 34
Message-ID: <20211029102935.c27654ef550d92ffa145f864@127.0.0.1>
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
<60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="9bd604fcd2ad61c26298231578f47fa5";
logging-data="15940"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+4aJeBDlMp7+d+/p5GTC/MQHYJnaN1ILI="
Cancel-Lock: sha1:D+2anEOMC/QLkedeA+p+8w4iy8Q=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
;X-no-Archive: Maybe
GNU: Terry Pratchett
View all headers
On Thu, 28 Oct 2021 19:18:16 -0700 (PDT)
skybuck2000 <skybuck2000@hotmail.com> wrote:

[]  not asm code, but still something easily done:

  mov ecx,NumElements
  mov esi,Array-8   ; 8 bytes=256 bits
CheckElement:
   add esi,8 ; next array element
   mov eax,word [esi]
   or  eax,word [esi+4] ;; use 'and' for all set,
  ; if you want both tests use an additional register!
   jz  Found
  loop CheckElement
;;
; prt none found
  ret
Found:
; prt allzeros found at [esi]
  ret

;;test data
; 0123456789ABCDEF0123456789ABCDEF
Array:
 dd 0
 dd 0 ; all zero case

 dd 1
 dd 0 ; not all zero case

 dd -1
 dd -1 ; all set
--
Bah, and indeed Humbug.


Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 12:07 UTC
References: 1 2 3 4
X-Received: by 2002:a05:620a:44c2:: with SMTP id y2mr8620498qkp.351.1635509220679;
Fri, 29 Oct 2021 05:07:00 -0700 (PDT)
X-Received: by 2002:a4a:bd15:: with SMTP id n21mr4617338oop.58.1635509220416;
Fri, 29 Oct 2021 05:07:00 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.mixmin.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Fri, 29 Oct 2021 05:07:00 -0700 (PDT)
In-Reply-To: <8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com> <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Fri, 29 Oct 2021 12:07:00 +0000
Content-Type: text/plain; charset="UTF-8"
View all headers
I have created a "professional" github page, so you guys can download the example source code properly and quickly using GIT and don't have to hassle with newsgroup crappy messaging formats:

TData and TLink optimization project

https://github.com/SkybuckFlying/TDataTLinkOptimizationProject

The aim of the project is to:

    Optimize the IsEmpty and IsFull routines for TData and TLink.
    Optimize the Empty and Full routines for TData and TLink.
    Implement WIN64 platform support by implementing BitSet and BitReset.

YES BABY ! =D It's 2021 ! Time to start sharing code a little bit more professionally and effectively and actually good usuable.

However in case you don't know how to use GIT and still want to have a look at the source code then I will post it here anyway,

we can also discuss further ideas for optimization on newsgroups, I am particularly interested in any high speed assembler assistence for x86 and/or x64 instruction set !

unit UnitTDataTLink;

{

TestProgram/Optimization Project for TData and TLink

version 0.02 created on 29 october 2021 by Skybuck Flying

TData contains an array of 32 bytes, in other words 256 bits.
TLink contains an array of 256 pointers.

The mission of this project is to optimize the IsEmpty and IsFull functions
to run as fast as possible on modern x86/x64 processors.

Bonus would be to optimize Empty and Full routines, however beware
the TLink structure uses special pointer values for empty and full.

The verification routines do not need to be optimized and should not
be optimized and are only there to verify any optimized code.

I will sure this code in github for easy downloading and uploading.

Would be cool to get some help with optimizing this code.

The original code used a mCount : byte; member to optimize the code
however this increases the TData structure to 33 bytes which could
be causing misalignment and consumes 3% more memory.

The TLink structure would also fit better if mCount is illiminated.

Both structures contained a mCount : byte variable and have been removed.

Now the same functionality to check for empty and full should be implemented
with as fast as code as possible without using any extra member fields
for the structures to keep them memory efficient like this.

Perhaps these link functions can be optimized with AVX-512 or so ?

}

{

version 0.03 created on 29 october 2021 by Skybuck Flying

Console program turned into multi-device-application project for
multi-platform support.

Project and Source files seperated into seperate folders to avoid
file clutter.

Only WIN32 platform currently implemented.

Other platforms possible thanks to multi-device-project/applicaation.

For some reason it was not possible to add platforms to a console project.

So I modified code so it can work with a multi-device-application, this
added a lot of platforms.

WIN64 platform support is desired !

The rest would be a bonus !

}

interface

uses
Classes; // for TStrings

type
TData = packed record
mData : packed array[0..31] of byte;

procedure Empty;
procedure Full;

function IsEmpty : boolean;
function IsFull : boolean;

// debug routines
function VerifyIsEmpty : boolean;
function VerifyIsFull : boolean;
function VerifyOne : boolean;
function VerifyRandomButOne : boolean;
procedure VerifyAll;
end;

TLink = packed record
mLink : packed array[0..255] of pointer;

procedure Empty;
procedure Full;

function IsEmpty : boolean;
function IsFull : boolean;

// debug routines
function VerifyIsEmpty : boolean;
function VerifyIsFull : boolean;
function VerifyOne : boolean;
function VerifyRandomButOne : boolean;
procedure VerifyAll;
end;

procedure MainTestProgram( ParaOutput : TStrings );

implementation

uses
SysUtils; // for IntToStr

const
ConstDataTestLoops = 1000000; // one million ! =D
ConstLinkTestLoops =  100000; // one hundred thousand ! =D

var
const_empty : pointer;
const_full : pointer;

var
vOutput : TStrings;

procedure OutputMessage( ParaString : string );
begin
vOutput.Add( ParaString );
end;

// only works on win32 platform, need win64 platform versions !

{$ifdef WIN32}
// unlocked versions (multi-threading-unsafe)
procedure BitReset( ParaMemory : pointer; ParaBitIndex : integer );
asm
btr [eax], edx
end;

procedure BitSet( ParaMemory : pointer; ParaBitIndex : integer );
asm
bts [eax], edx
end;

// locked versions (multi-threading-safe)
procedure LockedBitReset( ParaMemory : pointer; ParaBitIndex : integer );
asm
LOCK btr [eax], edx
end;

procedure LockedBitSet( ParaMemory : pointer; ParaBitIndex : integer );
asm
LOCK bts [eax], edx
end;
{$endif}

{$ifdef WIN64}
// unlocked versions (multi-threading-unsafe)
// optimize me ?
procedure BitReset( ParaMemory : pointer; ParaBitIndex : integer );
begin
// to be implemented
ERROR NOT IMPLEMENTED
end;

procedure BitSet( ParaMemory : pointer; ParaBitIndex : integer );
begin
// to be implemented
ERROR NOT IMPLEMENTED
end;

// not really required for this test program but would be nice to have a 64 bit version of this as well.
procedure LockedBitReset( ParaMemory : pointer; ParaBitIndex : integer );
begin

end;

procedure LockedBitSet( ParaMemory : pointer; ParaBitIndex : integer );
begin

end;
{$endif}

{$ifdef WIN32}
procedure TData.Empty;
type
TUInt32Array = packed array[0..7] of UInt32;
begin
TUInt32Array(mData)[0] := 0;
TUInt32Array(mData)[1] := 0;
TUInt32Array(mData)[2] := 0;
TUInt32Array(mData)[3] := 0;
TUInt32Array(mData)[4] := 0;
TUInt32Array(mData)[5] := 0;
TUInt32Array(mData)[6] := 0;
TUInt32Array(mData)[7] := 0;
end;
{$endif}

{$ifdef WIN64}
procedure TData.Empty;
type
TUInt64Array = packed array[0..3] of UInt64;
begin
TUInt64Array(mData)[0] := 0;
TUInt64Array(mData)[1] := 0;
TUInt64Array(mData)[2] := 0;
TUInt64Array(mData)[3] := 0;
end;
{$endif}

{$ifdef WIN32}
procedure TData.Full;
type
TUInt32Array = packed array[0..7] of longword;
begin
//                         11223344
TUInt32Array(mData)[0] := $FFFFFFFF;
TUInt32Array(mData)[1] := $FFFFFFFF;
TUInt32Array(mData)[2] := $FFFFFFFF;
TUInt32Array(mData)[3] := $FFFFFFFF;
TUInt32Array(mData)[4] := $FFFFFFFF;
TUInt32Array(mData)[5] := $FFFFFFFF;
TUInt32Array(mData)[6] := $FFFFFFFF;
TUInt32Array(mData)[7] := $FFFFFFFF;
end;
{$endif}

{$ifdef WIN64}
procedure TData.Full;
type
TUInt64Array = packed array[0..3] of UInt64;
begin
//                         1122334455667788
TUInt64Array(mData)[0] := $FFFFFFFFFFFFFFFF;
TUInt64Array(mData)[1] := $FFFFFFFFFFFFFFFF;
TUInt64Array(mData)[2] := $FFFFFFFFFFFFFFFF;
TUInt64Array(mData)[3] := $FFFFFFFFFFFFFFFF;
end;
{$endif}

{$ifdef WIN32}
function TData.IsEmpty : boolean;
type
TUInt32Array = packed array[0..7] of longword;
var
vResult : longword;
begin
result := False;

vResult :=
TUInt32Array(mData)[0] or
TUInt32Array(mData)[1] or
TUInt32Array(mData)[2] or
TUInt32Array(mData)[3] or
TUInt32Array(mData)[4] or
TUInt32Array(mData)[5] or
TUInt32Array(mData)[6] or
TUInt32Array(mData)[7];

if vResult = 0 then
begin
result := True;
end;
end;
{$endif}

{$ifdef WIN64}
function TData.IsEmpty : boolean;
type
TUInt64Array = packed array[0..3] of UInt64;
var
vResult : UInt64;
begin
result := False;

vResult :=
TUInt64Array(mData)[0] or
TUInt64Array(mData)[1] or
TUInt64Array(mData)[2] or
TUInt64Array(mData)[3];

if vResult = 0 then
begin
result := True;
end;
end;
{$endif}

{$ifdef WIN32}
function TData.IsFull : boolean;
type
TUInt32Array = packed array[0..7] of longword;
var
vResult : UInt64;
begin
result := False;

vResult :=
TUInt32Array(mData)[0] and
TUInt32Array(mData)[1] and
TUInt32Array(mData)[2] and
TUInt32Array(mData)[3] and
TUInt32Array(mData)[4] and
TUInt32Array(mData)[5] and
TUInt32Array(mData)[6] and
TUInt32Array(mData)[7];

//            11223344
if vResult = $FFFFFFFF then
begin
result := True;
end;
end;
{$endif}

{$ifdef WIN64}
function TData.IsFull : boolean;
type
TUInt64Array = packed array[0..3] of UInt64;
var
vResult : UInt64;
begin
result := False;

vResult :=
TUInt64Array(mData)[0] and
TUInt64Array(mData)[1] and
TUInt64Array(mData)[2] and
TUInt64Array(mData)[3];

//            1122334455667788
if vResult = $FFFFFFFFFFFFFFFF then
begin
result := True;
end;
end;
{$endif}

// TData verification routines

function TData.VerifyIsEmpty : boolean;
var
vIndex : integer;
begin
result := True;
for vIndex := 0 to 31 do
begin
if mData[vIndex] <> 0 then
begin
result := False;
end;
end;
end;

function TData.VerifyIsFull : boolean;

Click here to read the complete article
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Tue, 23 Nov 2021 16:23 UTC
References: 1 2 3 4 5
X-Received: by 2002:a05:622a:609:: with SMTP id z9mr7710974qta.243.1637684635689;
Tue, 23 Nov 2021 08:23:55 -0800 (PST)
X-Received: by 2002:a05:6830:1690:: with SMTP id k16mr5572011otr.148.1637684635424;
Tue, 23 Nov 2021 08:23:55 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Tue, 23 Nov 2021 08:23:55 -0800 (PST)
In-Reply-To: <03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com> <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com> <03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <24992b1e-3798-44c9-b2cb-09fe08e0142dn@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Tue, 23 Nov 2021 16:23:55 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 131
View all headers
Conceptually this is very smart to invert the logic !

Your idea (Peter 'Shaggy' Haywood) has given me a new idea for the code:

Instead of wasting cpu cycles/times checking everything instead only check what is needed to come to a certain "inverted conclusions".

It is indeed not necessary to check all bits to see if they are all clear, if one of them is not clear then obviously the data is not empty, and the function can exit earlier.

It is indeed not necessary to check all bits to see if they are all set, if one of them is not set then obviously the data is not full, and the function can exit earlier.

HAHA ! Pretty smart and brilliant ! ;) =D

Or is it ? hmmmm

Maybe not.... it depends...

Let's analyze code below:

{$ifdef WIN32}
function TData.IsEmpty : boolean;
type
TUInt32Array = packed array[0..7] of longword;
var
vResult : longword;
begin
result := False;

vResult :=
TUInt32Array(mData)[0] or
TUInt32Array(mData)[1] or
TUInt32Array(mData)[2] or
TUInt32Array(mData)[3] or
TUInt32Array(mData)[4] or
TUInt32Array(mData)[5] or
TUInt32Array(mData)[6] or
TUInt32Array(mData)[7];

if vResult = 0 then
begin
result := True;
end;
end;
{$endif}

Theoretically it needs to pull in 8x 32 bit integers, this could strain the memory bus somewhat, slow down the cpu, while it's waiting for data.... and it may pollute the l1 data cache unnecessarily.

On the other hand it does only one branch comparison.

Now the alternative would be to do 8 branch comparisons. Perhaps there is a golden ground in the middle, maybe only do 2 or 4 comparisons that is a possibility as well.

So different combinations could be tried to see which one performs best.

Here is where a little bit of artificial intelligence could be handy.

The application could measure all versions during active/real world usage with statisticall data tracking... for the first few minutes or so...

Then once another variations and statistics collected it can switch to non-statistical methods and pick the one which it believes is the fastest.

Anyway let's create a worst case counter example and analyze that, untested code:

To cool thing is this also gets rid of the extra variable.

function TData.IsEmpty : boolean;
type
TUInt32Array = packed array[0..7] of longword;
begin
result := False;

// if not empty then exit.
if TUInt32Array(mData)[0]) <> 0 then exit;
if TUInt32Array(mData)[1]) <> 0 then exit;
if TUInt32Array(mData)[2]) <> 0 then exit;
if TUInt32Array(mData)[3]) <> 0 then exit;
if TUInt32Array(mData)[4]) <> 0 then exit;
if TUInt32Array(mData)[5]) <> 0 then exit;
if TUInt32Array(mData)[6]) <> 0 then exit;
if TUInt32Array(mData)[7]) <> 0 then exit;

result := True; // if execution reaches here then return true.
end;
{$endif}

Pretty cool,

These are still 8 data accesses, no or instructions, 8 comparisons, possibly 8 jumps

On average this will probably run at 4,4,4

It will take up some more branch predicators, less data cache

Hard to say which one will run faster ;)  for now my bet would be on the first one, because or instructions are 1 cycle and produces can be 15 or something.

However data access can be in the range of 400 nanoseconds, maybe 400 cycles, that could severely impact performance.

And this this newer version might run faster, depends on how could the L1 data cache lining is, if it's good maybe 10 cycles per data access... for a average total of 40.

Hmmm... I may have to benchmark this sometime ! ;)

And the isFull version:

// untested code

{$ifdef WIN32}
function TData.IsFull : boolean;
type
TUInt32Array = packed array[0..7] of longword;
begin
result := False;

// if not full exit
if TUInt32Array(mData)[0] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[1] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[2] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[3] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[4] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[5] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[6] <> $FFFFFFFF then exit;
if TUInt32Array(mData)[7] <> $FFFFFFFF then exit;

result := True;
end;
{$endif}

Ok interesting,

Now we truely have something to benchmark ! ;) =D

I am kinda busy so this will have to wait time a further time, but other people are welcome to benchmark this themselfes ! ;) :)

Bye for now,
  And thanks for the feedback and ideas !
    Skybuck Flying ! =D


Subject: Re: Check if all 256 bits are clear or set ?
From: wolfgang kern
Newsgroups: alt.lang.asm
Organization: Aioe.org NNTP Server
Date: Tue, 23 Nov 2021 18:58 UTC
References: 1 2 3 4 5 6
Path: i2pn2.org!i2pn.org!aioe.org!2P8cjsBfU8q2CWcmNOHU/A.user.46.165.242.75.POSTED!not-for-mail
From: nowh...@nevernet.at (wolfgang kern)
Newsgroups: alt.lang.asm
Subject: Re: Check if all 256 bits are clear or set ?
Date: Tue, 23 Nov 2021 19:58:01 +0100
Organization: Aioe.org NNTP Server
Message-ID: <snjdjp$epo$1@gioia.aioe.org>
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
<60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
<03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
<24992b1e-3798-44c9-b2cb-09fe08e0142dn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="15160"; posting-host="2P8cjsBfU8q2CWcmNOHU/A.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.1.2
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
View all headers
On 23/11/2021 17:23, skybuck2000 wrote:
,,,,
Ok interesting,

Now we truely have something to benchmark ! ;) =D

I am kinda busy so this will have to wait time a further time, but other people are welcome to benchmark this themselfes ! ;) :)


don't waste your time because its more than obvious that OR chains are much faster and shorter than IF THEN jump for an all zero check.
and not a single branch is required with setcc/cmov after the last OR.

similar story for check if all set: AND all and INC last.
if the INC doesn't set ZF then it wasn't an all bits set value.
__
wolfgang


Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2000
Newsgroups: alt.lang.asm
Date: Wed, 24 Nov 2021 20:37 UTC
References: 1 2 3 4 5 6 7
X-Received: by 2002:a37:43cf:: with SMTP id q198mr1232741qka.689.1637786258164;
Wed, 24 Nov 2021 12:37:38 -0800 (PST)
X-Received: by 2002:a9d:2f27:: with SMTP id h36mr3796197otb.361.1637786257912;
Wed, 24 Nov 2021 12:37:37 -0800 (PST)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.misty.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: alt.lang.asm
Date: Wed, 24 Nov 2021 12:37:37 -0800 (PST)
In-Reply-To: <snjdjp$epo$1@gioia.aioe.org>
Injection-Info: google-groups.googlegroups.com; posting-host=84.25.28.171; posting-account=np6u_wkAAADxbE7UBGUIOm-csir6aX02
NNTP-Posting-Host: 84.25.28.171
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com> <60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com> <03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
<24992b1e-3798-44c9-b2cb-09fe08e0142dn@googlegroups.com> <snjdjp$epo$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <bb3e083a-d179-4f45-b161-9fd8f94dfa46n@googlegroups.com>
Subject: Re: Check if all 256 bits are clear or set ?
From: skybuck2...@hotmail.com (skybuck2000)
Injection-Date: Wed, 24 Nov 2021 20:37:38 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 41
View all headers
On Tuesday, November 23, 2021 at 7:58:03 PM UTC+1, wolfgang kern wrote:
On 23/11/2021 17:23, skybuck2000 wrote:
,,,,
Ok interesting,

Now we truely have something to benchmark ! ;) =D

I am kinda busy so this will have to wait time a further time, but other people are welcome to benchmark this themselfes ! ;) :)

don't waste your time because its more than obvious that OR chains are
much faster and shorter than IF THEN jump for an all zero check.
and not a single branch is required with setcc/cmov after the last OR.

similar story for check if all set: AND all and INC last.
if the INC doesn't set ZF then it wasn't an all bits set value.

What is the first 4 integers are in the L1 Data cache or on a page boundary near the end.

And the next 4 integers are not in L1 Data cache and on second 4k page.

This could at least incur a 400 cycles or so ??

Me no expert in that but that little I think to know.

You seem to be assuming all data is in L1 data cache, and that could be your flaw.

Though on average a high percentage would indeed be in L1 data cache.

Though I would not be surprise if somehow you are wrong ! =D

Perhaps because of other processing occuring, there is always something to trip things up ! =D

And many different computers, but ok, todays systems... can be assumed ! ;)

Though this kind of software should be able to run on 8486, pentium, athlons, skylakes, threadrippers and any future computers/processors.

L1 caches seem to be shrinking in size on many-core CPUs.

If that shrinkage continues, code like this may be affected.

Bye,
  Skybuck


Subject: Re: Check if all 256 bits are clear or set ?
From: wolfgang kern
Newsgroups: alt.lang.asm
Organization: Aioe.org NNTP Server
Date: Thu, 25 Nov 2021 11:16 UTC
References: 1 2 3 4 5 6 7 8
Path: i2pn2.org!i2pn.org!aioe.org!G2H052nv0/CT4xmzUoRVow.user.46.165.242.75.POSTED!not-for-mail
From: nowh...@nevernet.at (wolfgang kern)
Newsgroups: alt.lang.asm
Subject: Re: Check if all 256 bits are clear or set ?
Date: Thu, 25 Nov 2021 12:16:53 +0100
Organization: Aioe.org NNTP Server
Message-ID: <snnrb6$1v5k$1@gioia.aioe.org>
References: <8e1696c7-c2f6-4033-9c0e-55e06feaa2b1n@googlegroups.com>
<c5a802d8-b92f-479e-b6f9-3f40618cd559n@googlegroups.com>
<60adee4e-7c80-48cd-9471-da1cdf80655an@googlegroups.com>
<8b2aa435-b361-48ca-9a36-fd8fb03a41f2n@googlegroups.com>
<03aabf97-be7e-4ebc-9d55-7fc2410e801dn@googlegroups.com>
<24992b1e-3798-44c9-b2cb-09fe08e0142dn@googlegroups.com>
<snjdjp$epo$1@gioia.aioe.org>
<bb3e083a-d179-4f45-b161-9fd8f94dfa46n@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: gioia.aioe.org; logging-data="64692"; posting-host="G2H052nv0/CT4xmzUoRVow.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Thunderbird/91.1.2
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.9.2
View all headers
On 24/11/2021 21:37, skybuck2000 wrote:
On Tuesday, November 23, 2021 at 7:58:03 PM UTC+1, wolfgang kern wrote:
On 23/11/2021 17:23, skybuck2000 wrote:
,,,,
Ok interesting,

Now we truely have something to benchmark ! ;) =D

I am kinda busy so this will have to wait time a further time, but other people are welcome to benchmark this themselfes ! ;) :)

don't waste your time because its more than obvious that OR chains are
much faster and shorter than IF THEN jump for an all zero check.
and not a single branch is required with setcc/cmov after the last OR.

similar story for check if all set: AND all and INC last.
if the INC doesn't set ZF then it wasn't an all bits set value.

What is the first 4 integers are in the L1 Data cache or on a page boundary near the end.

I'd say this were only possible on a stupid memory layout

And the next 4 integers are not in L1 Data cache and on second 4k page.

This could at least incur a 400 cycles or so ??

nope, usually about 35 cycles per cache burst read.

Me no expert in that but that little I think to know.

You seem to be assuming all data is in L1 data cache, and that could be your flaw.

Though on average a high percentage would indeed be in L1 data cache.

Though I would not be surprise if somehow you are wrong ! =D

Perhaps because of other processing occuring, there is always something to trip things up ! =D

And many different computers, but ok, todays systems... can be assumed ! ;)

Though this kind of software should be able to run on 8486, pentium, athlons, skylakes, threadrippers and any future computers/processors.

L1 caches seem to be shrinking in size on many-core CPUs.

If that shrinkage continues, code like this may be affected.

whatsoever you think to know ...
IF THEN structs will always lose against OR/AND-chains in terms of size and speed regardless of where the data are.
and again any split memory compound sound pretty careless programming.
__
wolfgang


1
rocksolid light 0.7.2
clearneti2ptor