Rocksolid Light

Welcome to novaBBS (click a section below)

mail  files  register  nodelist  faq  login

THEGODDESSOFTHENETHASTWISTINGFINGERSANDHERVOICEISLIKEAJAVELININTHENIGHTDUDE


computers / comp.arch / Bit Swizzling

SubjectAuthor
* Bit SwizzlingRick C. Hodgin
+* Re: Bit SwizzlingMitchAlsup
|`- Re: Bit SwizzlingMitchAlsup
+* Re: Bit SwizzlingJoshua Landau
|`- Re: Bit SwizzlingRick C. Hodgin
+- Re: Bit SwizzlingBart
+* Re: Bit SwizzlingChris
|`* Re: Bit SwizzlingJohn Levine
| +- Re: Bit SwizzlingRick C. Hodgin
| `- Re: Bit SwizzlingChris
`* Re: Bit Swizzlingolcott
 `- Re: Bit Swizzlingolcott

1
Subject: Bit Swizzling
From: Rick C. Hodgin
Newsgroups: comp.lang.c, comp.arch, comp.arch.fpga, comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Sat, 5 Sep 2020 00:33 UTC
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: rick.c.h...@nospicedham.gmail.com (Rick C. Hodgin)
Newsgroups: comp.lang.c,comp.arch,comp.arch.fpga,comp.lang.asm.x86
Subject: Bit Swizzling
Date: Fri, 4 Sep 2020 20:33:23 -0400
Organization: Liberty Software Foundation
Lines: 57
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <riumcj$3j9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="22b79ff0dc4692c6394fd5df7481c0d9";
logging-data="9370"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19HWKakfgFgI4Oi8oDA8uZW/Rw74YlrEh0="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:ZtJ45OBBOfJo4klsWtqHGRxhrbQ=
View all headers

I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);

     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)
     {
         o |= (((v & (1 << s)) >> s) << d);
     }

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.

--
Rick C. Hodgin



Subject: Re: Bit Swizzling
From: MitchAlsup
Newsgroups: comp.arch
Date: Sat, 5 Sep 2020 01:26 UTC
References: 1
X-Received: by 2002:a37:a143:: with SMTP id k64mr10992299qke.266.1599269187132;
Fri, 04 Sep 2020 18:26:27 -0700 (PDT)
X-Received: by 2002:a9d:7e93:: with SMTP id m19mr7162833otp.132.1599269186874;
Fri, 04 Sep 2020 18:26:26 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 4 Sep 2020 18:26:26 -0700 (PDT)
In-Reply-To: <riumcj$3j9$1@dont-email.me>
Complaints-To: groups-abuse@google.com
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d1ea:b65f:5e33:34b3;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d1ea:b65f:5e33:34b3
References: <riumcj$3j9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <ce570e1a-d80c-4791-92b5-a81451f2ae20o@googlegroups.com>
Subject: Re: Bit Swizzling
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 05 Sep 2020 01:26:27 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 94
View all headers
On Friday, September 4, 2020 at 7:44:20 PM UTC-5, Rick C. Hodgin wrote:
I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);

     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)

The input variable s should be a for loop inside of the swizzle function.
The input variable d should be consumed as part of running the for loop.
The constant 1 I think should be 15 or 0xF

     {
         o |= (((v & (1 << s)) >> s) << d);
     }

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.

--
Rick C. Hodgin

The Hardware answer is that swizzle is simply a multiplexer with k inputs
of size j such that j×k = bits-in-register. With the number of bits in a
register a power of 2, K will also be a power of 2, and j a logarithm of
a power of 2.

For a 64-bit register and a byte swizzler, one needs eight 8-bit multiplexers
and eight 3-bit multiplex indexes (or a 8×3 = 24-bit constant)

The One I put in the Samsung GPU ISA used 32-bit registers and 4-bit swizzle
sizes; so we still had 8 multiplexers and the index count was still 3-bits.
The inputs could be from a constant field or from a register.

In both cases any given multiplexer produces a fixed number of bits at a
fixed position in the output. This is actually faster than a shift or
add (because the depth of the multiplexer is thinner at 2-gates of delay).

Swizzle can also be used to perform various SMEAR activities:

     0404040404040404 = SWIZZLE( 0706050403020100, 44444444 );

The problem in SW is that the native register sizes are "all wrong" so
on has to compute and paste.

unit64_t swizzle( uint64_t container, uint32_t indexer )
{
     uint64_t out = 0;
     for( uint64_t i = 0; i < 64; i += 8, indexer >>= 4 )
          out |= ((container >> (indexer & 15)) & 256) << i;
     return out;
}


Subject: Re: Bit Swizzling
From: MitchAlsup
Newsgroups: comp.arch
Date: Sat, 5 Sep 2020 01:27 UTC
References: 1 2
X-Received: by 2002:a37:8043:: with SMTP id b64mr10913721qkd.396.1599269245747;
Fri, 04 Sep 2020 18:27:25 -0700 (PDT)
X-Received: by 2002:a9d:6a59:: with SMTP id h25mr7640026otn.104.1599269245365;
Fri, 04 Sep 2020 18:27:25 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!peer01.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 4 Sep 2020 18:27:25 -0700 (PDT)
In-Reply-To: <ce570e1a-d80c-4791-92b5-a81451f2ae20o@googlegroups.com>
Complaints-To: groups-abuse@google.com
Injection-Info: google-groups.googlegroups.com; posting-host=2600:1700:291:29f0:d1ea:b65f:5e33:34b3;
posting-account=H_G_JQkAAADS6onOMb-dqvUozKse7mcM
NNTP-Posting-Host: 2600:1700:291:29f0:d1ea:b65f:5e33:34b3
References: <riumcj$3j9$1@dont-email.me> <ce570e1a-d80c-4791-92b5-a81451f2ae20o@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <24129195-1c62-4584-b140-b3d6ea59a981o@googlegroups.com>
Subject: Re: Bit Swizzling
From: MitchAl...@aol.com (MitchAlsup)
Injection-Date: Sat, 05 Sep 2020 01:27:25 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 5143
X-Received-Body-CRC: 66396865
View all headers
On Friday, September 4, 2020 at 8:26:30 PM UTC-5, MitchAlsup wrote:
On Friday, September 4, 2020 at 7:44:20 PM UTC-5, Rick C. Hodgin wrote:
I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);

     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)

The input variable s should be a for loop inside of the swizzle function.
The input variable d should be consumed as part of running the for loop.
The constant 1 I think should be 15 or 0xF

     {
         o |= (((v & (1 << s)) >> s) << d);
     }

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.

--
Rick C. Hodgin

The Hardware answer is that swizzle is simply a multiplexer with k inputs
of size j such that j×k = bits-in-register. With the number of bits in a
register a power of 2, K will also be a power of 2, and j a logarithm of
a power of 2.

For a 64-bit register and a byte swizzler, one needs eight 8-bit multiplexers
and eight 3-bit multiplex indexes (or a 8×3 = 24-bit constant)

The One I put in the Samsung GPU ISA used 32-bit registers and 4-bit swizzle
sizes; so we still had 8 multiplexers and the index count was still 3-bits.
The inputs could be from a constant field or from a register.

In both cases any given multiplexer produces a fixed number of bits at a
fixed position in the output. This is actually faster than a shift or
add (because the depth of the multiplexer is thinner at 2-gates of delay)..

Swizzle can also be used to perform various SMEAR activities:

     0404040404040404 = SWIZZLE( 0706050403020100, 44444444 );

The problem in SW is that the native register sizes are "all wrong" so
on has to compute and paste.

unit64_t swizzle( uint64_t container, uint32_t indexer )
{
     uint64_t out = 0;
     for( uint64_t i = 0; i < 64; i += 8, indexer >>= 4 )
          out |= ((container >> (indexer & 15)) & 256) << i;
     return out;
}

Make that 255 instead of 256.......


Subject: Re: Bit Swizzling
From: Joshua Landau
Newsgroups: comp.arch
Date: Sat, 5 Sep 2020 03:32 UTC
References: 1
X-Received: by 2002:ac8:691:: with SMTP id f17mr12150614qth.241.1599276763727; Fri, 04 Sep 2020 20:32:43 -0700 (PDT)
X-Received: by 2002:a9d:77d4:: with SMTP id w20mr7426958otl.182.1599276763493; Fri, 04 Sep 2020 20:32:43 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.etla.org!news.uzoreto.com!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Fri, 4 Sep 2020 20:32:43 -0700 (PDT)
In-Reply-To: <riumcj$3j9$1@dont-email.me>
Complaints-To: groups-abuse@google.com
Injection-Info: google-groups.googlegroups.com; posting-host=92.18.31.192; posting-account=yWPmqwoAAAAcRl38EiYubbHR9c-W21xe
NNTP-Posting-Host: 92.18.31.192
References: <riumcj$3j9$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <33f295b6-5987-4eee-b3b1-ffcc3caf585fn@googlegroups.com>
Subject: Re: Bit Swizzling
From: joshua.l...@gmail.com (Joshua Landau)
Injection-Date: Sat, 05 Sep 2020 03:32:43 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 68
View all headers
On Saturday, 5 September 2020 at 01:44:20 UTC+1, Rick C. Hodgin wrote:
I'm not sure where to ask this question, so I pushed it out to several
groups. You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation. So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

v = get_value();
o = 0;
swizzle1(o, v, 0, 6);
swizzle1(o, v, 1, 0);
swizzle1(o, v, 2, 3);
swizzle1(o, v, 3, 1);
swizzle1(o, v, 4, 2);
swizzle1(o, v, 5, 7);
swizzle1(o, v, 6, 4);
swizzle1(o, v, 7, 5);

// Untested, off the top of my head
void swizzle(unsigned char& o, unsigned char v, int s, int d)
{
o |= (((v & (1 << s)) >> s) << d);
}

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence. It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

For an 8 bit value there's a quick method.

Start with abcdefgh
PDEP to get 0000000a0000000b0000000c0000000d0000000e0000000f0000000g0000000h
Multiply by an appropriate 64 bit constant to get the values in the high bits; in your example 0x20_01_80_40_04_10_08_02
Shift down by 56.

If your value is 64 bit, you can instead do
(x >> 0) & 0x0101010101010101
(x >> 1) & 0x0101010101010101
....
(x >> 7) & 0x0101010101010101
to get your strided inputs, perform a multiplication on each to get the wanted ordering of those 8 bits,
and then PDEP each into place. The overall cost is 15 shifts, 8 ands, 8 muls, 8 PDEPs and 7 ors,
with a critical path of shift-and-mul-shift-pdep-or-or-or.

I can think of more optimal sequences for special-cases.


Subject: Re: Bit Swizzling
From: Bart
Newsgroups: comp.lang.c, comp.arch, comp.arch.fpga, comp.lang.asm.x86
Organization: virginmedia.com
Date: Sat, 5 Sep 2020 11:33 UTC
References: 1
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: bc...@nospicedham.freeuk.com (Bart)
Newsgroups: comp.lang.c,comp.arch,comp.arch.fpga,comp.lang.asm.x86
Subject: Re: Bit Swizzling
Date: Sat, 5 Sep 2020 12:33:32 +0100
Organization: virginmedia.com
Lines: 99
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <oIK4H.1039975$Ij4.921359@fx29.am4>
References: <riumcj$3j9$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="22b79ff0dc4692c6394fd5df7481c0d9";
logging-data="29215"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/TAvP07BF3WRF8Wx3TTi7MXgDhffkiPuw="
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:68.0) Gecko/20100101
Thunderbird/68.11.0
Cancel-Lock: sha1:YtrIy5r5OuBhIM4Vg+Ax+/aTQ9E=
View all headers
On 05/09/2020 01:33, Rick C. Hodgin wrote:

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06

If it's just 8-bit bytes, then it's easy: use any method to build a map that translates one 8-bit value to its swizzled version. I assume it will be used many millions of times.

I can easily swizzle the data using a brute force method:

     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);

     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)
     {
         o |= (((v & (1 << s)) >> s) << d);
     }

It took me a while to figure out what this means, which appears to be take bit s of v, and store it as bit d of o. (But o has to start at 0 since |= means it can't change a 1 in o to 0.)

In my language it would be simply this:

    o.[d] := v.[s]

although there, it can write both 1s and 0s into o.

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

The algorithm that works on what, bits of C++ code? What is its input? What are the aims: to make it fast? Will the mapping always be known constants, or variables? Will all bits be translated or just some?

And, what would be its output?

If it's just reordering the bits in a word (where you don't map, for example, both bits 5 and 7 to bit 3), that doesn't sound too hard (although it probably is!). But importantly, this isn't done in-place.

Up to 8 or 16 bits, probably you can use tables, but the tables need to be set up.

Otherwise the simplest thing I can think of, for an N-bit word, is a table of translation pairs. For your 8-bit example this will just be the parameters to your swizzle function, but as data:

   (0,6)
   (1,0)
   (2,3)
   (3,1)
   (4,2)
   (5,7)
   (6,4)
   (7,5)

This be input to a routine that will do the job, or it can be the input to an algorithem that generates, for example, one giant logical expression.

I can probably just about do that last part, if it doesn't need to be optimised, example:

   y = ((x&1)<<6)|((x&2)>>1)|((x&4)<<1)|((x&8)>>2)|
       ((x&16)>>2)|((x&32)<<2)|((x&64)>>2)|((x&128)>>2);

where x is the input word, and y is the output word. I did this with a 12-line script based on that data input. (Not tested; may need extra parens.)

I'm sure gcc-O3 will find some optimisations in there if there are any.



Subject: Re: Bit Swizzling
From: Chris
Newsgroups: comp.arch
Organization: Aioe.org NNTP Server
Date: Sat, 5 Sep 2020 12:12 UTC
References: 1
Path: i2pn2.org!i2pn.org!aioe.org!.POSTED.y2H3LZeB0rKxhDNwUatTDQ.user.gioia.aioe.org!not-for-mail
From: xxx.syse...@gfsys.co.uk (Chris)
Newsgroups: comp.arch
Subject: Re: Bit Swizzling
Date: Sat, 05 Sep 2020 13:12:02 +0100
Organization: Aioe.org NNTP Server
Lines: 60
Message-ID: <rivvah$1neg$2@gioia.aioe.org>
References: <riumcj$3j9$1@dont-email.me>
NNTP-Posting-Host: y2H3LZeB0rKxhDNwUatTDQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (X11; SunOS sun4u; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
X-Notice: Filtered by postfilter v. 0.9.2
View all headers
On 09/05/20 01:33, Rick C. Hodgin wrote:

I'm not sure where to ask this question, so I pushed it out to several
groups. You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation. So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

v = get_value();
o = 0;
swizzle1(o, v, 0, 6);
swizzle1(o, v, 1, 0);
swizzle1(o, v, 2, 3);
swizzle1(o, v, 3, 1);
swizzle1(o, v, 4, 2);
swizzle1(o, v, 5, 7);
swizzle1(o, v, 6, 4);
swizzle1(o, v, 7, 5);

// Untested, off the top of my head
void swizzle(unsigned char& o, unsigned char v, int s, int d)
{
o |= (((v & (1 << s)) >> s) << d);
}

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence. It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.


Why not just use a lookup table ?. Minimum ops and fast...

Chris



Subject: Re: Bit Swizzling
From: Rick C. Hodgin
Newsgroups: comp.arch
Organization: Liberty Software Foundation
Date: Sat, 5 Sep 2020 13:16 UTC
References: 1 2
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: rick.c.h...@gmail.com (Rick C. Hodgin)
Newsgroups: comp.arch
Subject: Re: Bit Swizzling
Date: Sat, 5 Sep 2020 09:16:12 -0400
Organization: Liberty Software Foundation
Lines: 36
Message-ID: <rj032u$mou$1@dont-email.me>
References: <riumcj$3j9$1@dont-email.me>
<33f295b6-5987-4eee-b3b1-ffcc3caf585fn@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 5 Sep 2020 13:16:14 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="9616f26d2be0b5776599f3d14f99326d";
logging-data="23326"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/b7d7LwymENfc8gYrJN2Ee77SrsW807H8="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:pXoKJ+TGydJhJcFLtW35wklWLL4=
In-Reply-To: <33f295b6-5987-4eee-b3b1-ffcc3caf585fn@googlegroups.com>
Content-Language: en-US
View all headers
On 9/4/20 11:32 PM, Joshua Landau wrote:
For an 8 bit value there's a quick method.

Start with abcdefgh
PDEP to get 0000000a0000000b0000000c0000000d0000000e0000000f0000000g0000000h
Multiply by an appropriate 64 bit constant to get the values in the high bits; in your example 0x20_01_80_40_04_10_08_02
Shift down by 56.

That's a good one when parallel ops are available.  Fast, and it doesn't
require much overhead.  I like it.

If your value is 64 bit, you can instead do
(x >> 0) & 0x0101010101010101
(x >> 1) & 0x0101010101010101
...
(x >> 7) & 0x0101010101010101
to get your strided inputs, perform a multiplication on each to get the wanted ordering of those 8 bits,
and then PDEP each into place. The overall cost is 15 shifts, 8 ands, 8 muls, 8 PDEPs and 7 ors,
with a critical path of shift-and-mul-shift-pdep-or-or-or.

I can think of more optimal sequences for special-cases.

For an optimal software algorithm where only logical bitwise operators
are available on 8-bit, 16-bit, 32-bit, or 64-bit quantities, I'm going
to have to brute-force the logic through using bit identifiers in bytes,
and then do an analysis to see if any sequences exist in the destination
which match the source, and then extract out those components and
iteratively resolve down until the operation is exhausted with other
manual steps.

I was hoping this was a solved problem that is well-known with easy-to-
read C code taught in CS 351 or something where I could garner and
adapt their well-documented code. :-)

--
Rick C. Hodgin


Subject: Re: Bit Swizzling
From: John Levine
Newsgroups: comp.arch, comp.compilers
Organization: Taughannock Networks
Date: Sat, 5 Sep 2020 18:50 UTC
References: 1 2 3
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: joh...@taugh.com (John Levine)
Newsgroups: comp.arch,comp.compilers
Subject: Re: Bit Swizzling
Date: 5 Sep 2020 18:50:16 -0000
Organization: Taughannock Networks
Lines: 16
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-09-016@comp.compilers>
References: <riumcj$3j9$1@dont-email.me> <rivvah$1neg$2@gioia.aioe.org> <20-09-014@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="1888"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize
Posted-Date: 05 Sep 2020 14:52:13 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Cleverness: some
View all headers
In article <rivvah$1neg$2@gioia.aioe.org>,
-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

Why not just use a lookup table ?. Minimum ops and fast...

Assuming you're looking for something you can implement in logic
rather than by table lookup, it sounds like a set of Karnaugh maps.

--
Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly



Subject: Re: Bit Swizzling
From: Rick C. Hodgin
Newsgroups: comp.arch
Organization: Liberty Software Foundation
Date: Sun, 6 Sep 2020 15:01 UTC
References: 1 2 3 4
Path: i2pn2.org!i2pn.org!news.swapon.de!eternal-september.org!feeder.eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: rick.c.h...@gmail.com (Rick C. Hodgin)
Newsgroups: comp.arch
Subject: Re: Bit Swizzling
Date: Sun, 6 Sep 2020 11:01:41 -0400
Organization: Liberty Software Foundation
Lines: 22
Message-ID: <rj2tkl$4rj$1@dont-email.me>
References: <riumcj$3j9$1@dont-email.me> <rivvah$1neg$2@gioia.aioe.org>
<20-09-014@comp.compilers> <20-09-016@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 6 Sep 2020 15:01:41 -0000 (UTC)
Injection-Info: reader02.eternal-september.org; posting-host="8b34da463572b8a1b67b91ab4456e699";
logging-data="4979"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+WuXmVV5XKDngWHfZEfAdaLWU9oAoA2ew="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:u+5Ed+SgqDkBH2iBFP78flkL4yg=
In-Reply-To: <20-09-016@comp.compilers>
Content-Language: en-US
View all headers
On 9/5/20 2:50 PM, John Levine wrote:
 > Assuming you're looking for something you can implement in logic
 > rather than by table lookup...

Correct.

I'm looking for an algorithm to analyze bit swizzling requirements and
generate optimal code for them given the capabilities and constraints of
various ISAs.

 > it sounds like a set of Karnaugh maps

This is part of my CAlive project.  I'm looking for an algorithm which
already has the logic worked out, rather than (re-)inventing myself.

Note:  I originally cross-posted this to comp.compilers yesterday, but
        John Levine did not allow it to post.  I reply here only to the
        other group he cross-posted to originally, comp.arch.

--
Rick C. Hodgin



Subject: Re: Bit Swizzling
From: Chris
Newsgroups: comp.arch, comp.compilers
Organization: Aioe.org NNTP Server
Date: Sun, 6 Sep 2020 16:48 UTC
References: 1 2 3 4
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.etla.org!nntp-feed.chiark.greenend.org.uk!ewrotcd!maths.tcd.ie!newsswitch.lcs.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: xxx.syse...@gfsys.co.uk (Chris)
Newsgroups: comp.arch,comp.compilers
Subject: Re: Bit Swizzling
Date: Sun, 06 Sep 2020 17:48:58 +0100
Organization: Aioe.org NNTP Server
Lines: 28
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-09-019@comp.compilers>
References: <riumcj$3j9$1@dont-email.me> <rivvah$1neg$2@gioia.aioe.org> <20-09-014@comp.compilers> <20-09-016@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970";
logging-data="79013"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, hardware, comment
Posted-Date: 06 Sep 2020 13:20:15 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
View all headers
On 09/05/20 19:50, John Levine wrote:
In article<rivvah$1neg$2@gioia.aioe.org>,
-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

Why not just use a lookup table ?. Minimum ops and fast...

Assuming you're looking for something you can implement in logic
rather than by table lookup, it sounds like a set of Karnaugh maps.

Unless this is just an intellectual exercise for the fun of it, an
engineer would choose the minimal design at lowest cost to
satisfy the requirements. Table methods don't have to be in
software as a single eprom or gate array could do it in hardware.
8 inputs to address lines, then 8 bits of output, scale as required,
so why make life more difficult than necessary ?.

Old project here, where a programmer spent nearly 5 pages of 6800
asm to translate an input connect pin layout to that required for
the internal functions. Code was impenetrable, so substituted a
256 byte lookup table. Less space that the code and easily
modified for new requirements...

Chris
[I think the question is whether there is a way to mechanically
generate a version of the opaque assembler. -John]


Subject: Re: Operand forwarding: complexity and limits
From: EricP
Newsgroups: comp.arch
Date: Tue, 8 Sep 2020 16:12 UTC
References: 1 2 3 4
Path: i2pn2.org!i2pn.org!aioe.org!peer02.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx24.iad.POSTED!not-for-mail
From: ThatWoul...@thevillage.com (EricP)
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: Operand forwarding: complexity and limits
References: <9b9a4e1d-fa6c-4e03-bd71-54a7be1cd509n@googlegroups.com> <e2293476-8c3e-4d82-831c-d6dfcd7c058do@googlegroups.com> <_4B5H.70060$5l1.62999@fx10.iad> <rj787n$k4f$1@dont-email.me>
In-Reply-To: <rj787n$k4f$1@dont-email.me>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 46
Message-ID: <35O5H.63545$Ml5.22945@fx24.iad>
X-Complaints-To: abuse@UsenetServer.com
NNTP-Posting-Date: Tue, 08 Sep 2020 16:13:51 UTC
Date: Tue, 08 Sep 2020 12:12:56 -0400
X-Received-Bytes: 2474
X-Received-Body-CRC: 3248685039
View all headers
Ivan Godard wrote:
On 9/7/2020 6:25 PM, EricP wrote:

         ----- operand_bus
         |
         | ------------
         v v          |
         mux          |
         v            |
     Res_Stn_FF       |
         v            |
   tri-state_bus  -----
         v        v   ^
        fast_path_mux |
             |        |
             v        |
         FU_operand   |
             v        |
        FU_result_FF  |
             v        |
             ---------- result_bus


How is this different from an accumulator? (said by an ignorant software type)

Ideally you want dependent operations performed on the same or separate
function units to not stall for 1 clock because of the delay
in propagating a prior result all the way through all that logic.

Not shown on the diagram are multiple result/bypass buses,
multiple function units each with multiple reservation stations.
So there is more logic in the paths than shown.

This was me looking for a way to have my cake and eat it too,
having the operand muxes and reservation station logic be present,
but not totally paying for it in critical path delay.

Ultimately it may not be possible to avoid since wire delay would dominate.
Or maybe its only possible for operations within a single FU like ALU.
So A+B+C can be back-to-back but A*B+C has to stall 1 extra clock while
the result moves from the MUL unit to ALU unit. But that could also
impact loads and stores so the cost is going to add up quick.





Subject: Re: Bit Swizzling
From: olcott
Newsgroups: comp.lang.c, comp.arch, comp.arch.fpga, comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Tue, 8 Sep 2020 16:50 UTC
References: 1 2
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: NoO...@nospicedham.NoWhere.com (olcott)
Newsgroups: comp.lang.c,comp.arch,comp.arch.fpga,comp.lang.asm.x86
Subject: Re: Bit Swizzling
Date: Tue, 8 Sep 2020 11:50:03 -0500
Organization: A noiseless patient Spider
Lines: 70
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <RJOdnXFJ37ukK8rCnZ2dnUU7-cWdnZ2d@giganews.com>
References: <riumcj$3j9$1@dont-email.me>
<lv-dndvc_vUJLcrCnZ2dnUU7-QXNnZ2d@giganews.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="977adf50e65fb70f12ad8b1d87377ca3";
logging-data="2293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/K/lb3RNg2hdk5zVYmRIJUJyUIY53AoTo="
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
Thunderbird/68.12.0
Cancel-Lock: sha1:3QZdNbICLD9bh6j4QMUjC8LQcak=
View all headers
On 9/8/2020 11:25 AM, olcott wrote:
On 9/4/2020 7:33 PM, Rick C. Hodgin wrote:

I'm not sure where to ask this question, so I pushed it out to several
groups.  You need not reply to all of them if you don't think it is a
topical subject.

I included comp.compilers in a previous message, but it apparently holds
the message until it passes moderation.  So, I've not included it here.
A duplicate message may post if/when the comp.compilers moderator John
Levine approves it.

-----
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

     Input:   07 06 05 04 03 02 01 00
    Output:   05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

     v = get_value();
     o = 0;
     swizzle1(o, v, 0, 6);
     swizzle1(o, v, 1, 0);
     swizzle1(o, v, 2, 3);
     swizzle1(o, v, 3, 1);
     swizzle1(o, v, 4, 2);
     swizzle1(o, v, 5, 7);
     swizzle1(o, v, 6, 4);
     swizzle1(o, v, 7, 5);

     // Untested, off the top of my head
     void swizzle(unsigned char& o, unsigned char v, int s, int d)
     {
         o |= (((v & (1 << s)) >> s) << d);
     }

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence.  It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.

Are there any existing algorithms which examine the operations that
must be conducted and the optimized / minimal sequence of steps to
conduct it?

Thank you in advance.


https://www.techopedia.com/definition/22413/swizzling



glTexParameteri(Target, GL_TEXTURE_SWIZZLE_R, Format.Swizzle[0]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_G, Format.Swizzle[1]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_B, Format.Swizzle[2]);
glTexParameteri(Target, GL_TEXTURE_SWIZZLE_A, Format.Swizzle[3]);
https://gli.g-truc.net/0.7.0/index.html

--
Copyright 2020 Pete Olcott



1
rocksolid light 0.7.2
clearneti2ptor