Welcome to novaBBS (click a section below)

mail files register newsreader groups login

Message-ID:

Gravity brings me down.

Re: portable way to get highest bit set?

Subject	Author
portable way to get highest bit set?	candycanearter07
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	candycanearter07
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Ian C
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	candycanearter07
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Scott Lurndal
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Branimir Maksimovic
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Chris M. Thomasson
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Keith Thompson
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	jak
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	jak
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	jak
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	jak
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Kaz Kylheku
Re: portable way to get highest bit set?	Bart
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Bart
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	David Brown
Re: portable way to get highest bit set?	Michael S
Re: portable way to get highest bit set?	Michael S
False positives in spam filter? (was: Re: portable way to get	Ray Banana
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Anton Shepelev
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Richard Harnden
Re: portable way to get highest bit set?	candycanearter07
Re: portable way to get highest bit set?	Bart
Re: portable way to get highest bit set?	Keith Thompson
Re: portable way to get highest bit set?	candycanearter07
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Ben Bacarisse
Re: portable way to get highest bit set?	Lew Pitcher
Re: portable way to get highest bit set?	Lew Pitcher
Re: portable way to get highest bit set?	Lew Pitcher
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Bart
Re: portable way to get highest bit set?	Tim Rentsch
Re: portable way to get highest bit set?	Kaz Kylheku
Re: portable way to get highest bit set?	jak
Re: portable way to get highest bit set?	Blue-Maned_Hawk
Re: portable way to get highest bit set?	Kaz Kylheku

Pages:1 2 3 4 5 678

Re: portable way to get highest bit set?

<20231018174110.00003717@yahoo.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29977&group=comp.lang.c#29977

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 17:41:10 +0300
Organization: A noiseless patient Spider
Lines: 364
Message-ID: <20231018174110.00003717@yahoo.com>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me>
<86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com>
<20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com>
<20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="95acdbf19cb030e28ad29123ca01f64e";
logging-data="3760360"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18rZtZsfwiSzN0J4vm9zW6RBcFF2lclqHs="
Cancel-Lock: sha1:QaRdvj7MdsPEJsx7gVIt2LXP0/4=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)

by: Michael S - Wed, 18 Oct 2023 14:41 UTC

On Wed, 18 Oct 2023 05:45:27 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Michael S <already5chosen@yahoo.com> writes:
>
> > On Thu, 12 Oct 2023 08:05:40 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Michael S <already5chosen@yahoo.com> writes:
> >>
> >>> On Thu, 12 Oct 2023 07:01:47 -0700
> >>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >>>
> >>>> [non-working example]
> >>>
> >>> That code does not look like working.
> >>
> >> Yes, deliberately so. I didn't want to give working
> >> code yet, so other people could work on the problem.
> >>
> >>> That is:
> >>>
> >>> T
> >>> highest_bit_set1( T u ){
> >>> T r = 0;
> >>> while (u > r) r = r*2+1;
> >>> return r ^ (r>>1);
> >>> }
> >>
> >> As expected, this works. I like the way it naturally
> >> accommodates a zero input.
> >>
> >> Can you find a different solution that works in
> >> logarithmic time rather than linear time?
> >
> > Naturally, I can. But the code is not as easy to comprehend as I
> > would like. And I am not sure that for typical distributions with
> > prevalence of small numbers it is faster than above variant.
>
> I agree that whether a particular method is acceptably fast
> depends on the expected distribution of inputs, and also on
> the sizes of the types involved. For 8 or 16 bit inputs,
> probably even very simple schemes are just fine. But since
> what was asked for is not to depend on the sizes, and nothing
> was said about what the expected inputs would be, it's natural
> to look for logarithmic solution, and try to make it as easy
> to understand as we can.
>
> > T
> > highest_bit_set2( T u ){
> > T v;
> > for (int rshift = 0; (v = (u >> rshift) >> 1) != 0; rshift =
> > rshift*2+1) u |= v;
> > return u ^ (u>>1);
> > }
>
> I want to look at a lot of different approaches, but let me say
> at the start that this method is very much like one I had coded
> earlier, and the one you give here is pretty good. We'll get
> back to this case later but let's start at the simple answers and
> move on to the more elaborate ones.
>
> Generally speaking there are two ways of getting a result,
> namely, either look for the value or construct the value. Linear
> methods tend to look for the result, either starting at the top
> and scanning down, or starting at the bottom and scanning up
> (which is what your first answer did). The poster jak gave a
> nice function that is linear but of the constructive variety.
> The next code is jak's except the layout is slightly different
> and it uses the type T (stipulated to be unsigned) rather than a
> putting in a particular definite type.
>
> T
> hbs_jak( T val ){
> int i;
> for( i = 0; val > 1; val /= 2, i++ );
> return val << i;
> }
>
> This function nicely and naturally handles an input value of
> zero, which is something to look for (that is, how a zero input
> is handled) when considering high-bit-determining functions (for
> the obvious reason that zero is kind of an oddball because it has
> no bits set).
>
> As mentioned before your first answer is linear, and uses an
> upward scan, and deals naturally with a zero input.
>
> Thinking about the choice of scan direction, it occurred to me
> that rather than start at one end or the other, we could start in
> the middle, and do either an upward scan or a downward scan
> depending on an initial test. I wrote this:
>
> T
> hbs_split( T u ){
> unsigned n = MASK_WIDTH( 0?u:-1 )/2;
> T v = u>>1;
> T bit = (0?bit:1) << n;
> if( bit > u ) do bit >>= 1; while( bit > u );
> else while( bit <= v ) bit <<= 1;
> return bit;
> }
>
> If the inputs are evenly distributed this function should on average
> be about twice as fast as a single-direction scan.
>
> In between linear and logarithmic, there is a well-known sublinear
> technique that I was half expecting someone would post.

I did think about almost identical method.
Mine was u |= u+1 instead of u &= u-1.
But since in the worst case this method is not faster than more obvious
linear search, I didn't post it here.

> Besides
> being a little faster than a straight linear scan, it has the nice
> property that the code is very simple:
>
> T
> hbs_zap_low_bit( T u ){
> while( u & u-1 ) u &= u-1;
> return u;
> }
>
> The loop takes time proportional to the number of one bits (so
> strictly speaking it is not sublinear, but still faster than
> other linear methods). This function runs as fast or faster than
> all the other non-logarithmic methods I tried. (I did simple
> performance measurements with an input set of all values between
> 0 and 300,000,000, for each of three type sizes, those being 32,
> 64, and 128 bits.)
>
> On the subject of performance, the various functions being posted
> are given roughly in performance order, slowest first and getting
> faster as we go on.
>
> Now we have arrived at the logarithmic methods. Anton posted a
> method using binary search (but using a specific type), which I
> revised to be type-agnostic:
>
> T
> hbs_binary_search( T n ){
> T const ones = -1;
> unsigned least = 0, limit = MASK_WIDTH( ones );
>
> do {
> unsigned k = least+limit >> 1;
> if( n & ones<<k ) least = k;
> else limit = k;
> } while( least+1 != limit );
>
> return n & ones<<least;
> }
>
> After doing this version I had the idea of doing a more careful
> binary search, which would detect the case of having found an
> result and exit the loop immediately. That idea yielded this
> function:
>
> T
> hbs_binary_search_with_early_exit( T x ){
> unsigned t = MASK_WIDTH( (T)-1 ) +1, s = t/2, u = t-s-1;
> T y = (0?y:1) << u-1, z = x >> 1;;
>
> do {} while(
> t < 2 ? 0 :
> y <= z ? t = s, u = t/2, s = t-u-1, y <<= u+1 :
> y > x ? t = u, s = t/2, u = t-s-1, y >>= s+1 :
> /********/ 0
> );
>
> return y;
> }
>
> To see how this works, at each step there are 't' bits total, with
> 's' bits to the left of the bit under test, and 'u' bits to the
> right, so t = s+u+1. The variable 'x' holds the input value, 'z'
> holds the input value shifted right one, and 'y' holds the one-bit
> value being checked. Thus if x >= y and y > z, y is the answer.
> Writing the function was an exercise in careful bookkeeping, and
> not easy to write - it took me about half a dozen tries before I
> got it right. And, sadly, not really any better than the simple
> binary search. But it was fun to try.
>
> Next we get to your first logarithmic version (namely, the one given
> in the post being responded to here), which gives appreciably better
> speed than the binary searches. This function uses the "expanding
> bit-smearing" technique, and is similar to a method I had written,
> which went through several revisions, as follows:
>
> T
> hbs_expanding_smearing_1( T u ){
> unsigned shift = 1;
> while( (u & u>>1) != u>>1 ) u |= u>>shift, shift += shift;
> return u ^ u>>1;
> }
>
> T
> hbs_expanding_smearing_2( T u ){
> unsigned shift = 1;
> do u |= u>>shift, shift <<= 1; while( (u & u>>1) != u>>1
> ); return u ^ u>>1;
> }
>
> T
> hbs_expanding_smearing_3( T u ){
> T r;
>
> for(
> unsigned s = 1;
> u |= u>>s, r = u ^ u>>1, r & r-1;
> s <<= 1
> ){}
>
> return r;
> }
>
> One can see the progression of putting the smearing earlier in the
> sequence of operations. The motivation to change from a 'do' to a
> 'for' is so changing the shift amount could be done only after we
> know we are getting to the next (half) loop iteration. All of these
> ran a bit faster than your first logarithmic versioon. (I should
> say again that my performance measurements were done more casually
> than carefully.)
>
> Then you posted a revised version of your first logarithmic effort:
>
> T
> hbs_michael_s_2_revised( T u ){
> for (int rshift = 1; ((u+1) & u) != 0; rshift += rshift)
> u |= (u >> rshift);
> return u ^ (u>>1);
> }
>
> Nice! This revision runs about the same speed as the fastest of the
> three functions above (the third one). I think a lot of that
> improvement is a consequence of the simple exit test in the for()
> loop. Seeing that inspired me to re-do my earlier version, giving
> this:
>
> T
> hbs_expanding_smearing_with_michael_s_exit_test( T u ){
> for(
> unsigned s = 1;
> u |= u>>s, u+1 & u;
> s <<= 1
> ){}
>
> return u ^ u>>1;
> }
>
> This revision runs about 15% faster than the _3 version above. I
> attribute all of the gain to using your simplified loop exit test.
>
> Out of intellectual curiosity, I tried some variations of some of
> the earlier versions. First it occurred to me that the jak version
> could be sped up by shifting larger amounts at first
>
> T
> hbs_jak_fast( T v ){
> unsigned k = 0;
>
> while( v > 0xFF ) v >>= 8, k += 8;
> if( v > 0xF ) v >>= 4, k += 4;
> if( v > 0x3 ) v >>= 2, k += 2;
> if( v > 0x1 ) v >>= 1, k += 1;
>
> return v << k;
> }
>
> This change gives a substantial improvement. Of course that is at
> least partly the result of cherry picking and tuning to the input
> set. Still it was interesting to see how much faster this version
> ran.
>
> Next, I thought that if splitting once is good then surely
> splitting twice is better. Eventually that idea let to this
> revision
>
> T
> hbs_split_more( T u ){
> unsigned n = MASK_WIDTH( 0?u:-1 )/2;
> T v = u>>1;
> T bit = (0?bit:1) << n;
>
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> if( n > 10 ){
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> }
>
> if( bit > u ) do bit >>= 1; while( bit > u );
> else while( bit <= v ) bit <<= 1;
>
> return bit;
> }
>
> Rather ugly and ad hoc. However it runs faster than all of the
> previous versions in the particular case of T being a 128 bit
> type and using clang to compile. (There were sometimes large
> differences between using gcc and using clang, which I did not
> investigate systematically.)
>
> There were a couple of type-specific functions posted, but I don't
> include them here because they are solving a different problem.
>
> Not quite finished yet. It occurred to me to take the speedup ideas
> used in the last two investigations and try them in the context of
> the best expanding-smearing scheme. That idea gave this function:
>
> T
> hbs_expanding_smearing_special( T u ){
> u |= u>>1;
> u |= u>>2;
>
> for(
> unsigned s = 4;
> u |= u>>s, u+1 & u;
> s <<= 1
> ){}
>
> return u ^ u>>1;
> }
>
> This revision gives about 20% improvement over the previous best.
> Here again I am taking advantage of your simple loop exit test.
>
> Last but not least, I explored using the approach of converting to
> floating-point, to see how it stacks up against the other schemes
> (with the admission that it is more limited than the type-agnostic
> methods). Here is the terminus of that exploration:
>
> T
> hbs_using_IEEE_floating_point( T u ){
> union { float f; struct { unsigned m:23, e:8, sign: 1; } b; }
> x; return u ? x.f = u & ~(u>>1), (T)1 << x.b.e - 127 : 0;
> }
>

Click here to read the complete article

Re: portable way to get highest bit set?

<ugotjo$1vmr0$2@i2pn2.org>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29978&group=comp.lang.c#29978

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!.POSTED!not-for-mail
From: rich...@damon-family.org (Richard Damon)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 11:28:56 -0400
Organization: i2pn2 (i2pn.org)
Message-ID: <ugotjo$1vmr0$2@i2pn2.org>
References: <ug5gvh$1jkar$3@dont-email.me>
<pan$e0805$faa9a91a$a60476dc$4a90cfa8@invalid.invalid>
<87h6mwu9rk.fsf@nosuchdomain.example.com>
<pan$325f3$70a576a8$3150c04a$26db7b31@invalid.invalid>
<ugagv6$2vs64$1@dont-email.me>
<pan$21acb$49c8a2a6$a1bbf038$1e40c8f0@invalid.invalid>
<ugck5i$3faav$1@dont-email.me>
<pan$19122$eaff5f26$f64222cd$d70e555c@invalid.invalid>
<ugd37f$3ln97$2@dont-email.me>
<pan$c64d0$8c0ed5b8$62596621$789ec9e7@invalid.invalid>
<86zg0k8u3l.fsf@linuxsc.com>
<pan$b29a$e4923427$117477d2$1f9519d0@invalid.invalid>
<ugic2h$vne1$2@dont-email.me>
<pan$65378$afa1dfdf$4758d80c$ecf133ab@invalid.invalid>
<ugjmsh$1herf$1@dont-email.me>
<pan$c4193$19d997f9$f1b9f7e2$7c08b1c1@invalid.invalid>
<ugk7mh$2eq4g$1@dont-email.me>
<pan$2b50f$3034e623$4c245e91$a3f4773@invalid.invalid>
<ugoddv$3jojl$1@dont-email.me> <ugolge$3loqk$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 18 Oct 2023 15:28:56 -0000 (UTC)
Injection-Info: i2pn2.org;
logging-data="2087776"; mail-complaints-to="usenet@i2pn2.org";
posting-account="diqKR1lalukngNWEqoq9/uFtbkm5U+w3w6FQ0yesrXg";
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <ugolge$3loqk$1@dont-email.me>

by: Richard Damon - Wed, 18 Oct 2023 15:28 UTC

On 10/18/23 9:10 AM, Bart wrote:
>
> So what does a fork of C look like?
>

C++ or C# maybe?

Re: portable way to get highest bit set?

<20231018184537.000002fb@yahoo.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29979&group=comp.lang.c#29979

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 18:45:37 +0300
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <20231018184537.000002fb@yahoo.com>
References: <ug5gvh$1jkar$3@dont-email.me>
<pan$e0805$faa9a91a$a60476dc$4a90cfa8@invalid.invalid>
<87h6mwu9rk.fsf@nosuchdomain.example.com>
<pan$325f3$70a576a8$3150c04a$26db7b31@invalid.invalid>
<ugagv6$2vs64$1@dont-email.me>
<pan$21acb$49c8a2a6$a1bbf038$1e40c8f0@invalid.invalid>
<ugck5i$3faav$1@dont-email.me>
<pan$19122$eaff5f26$f64222cd$d70e555c@invalid.invalid>
<ugd37f$3ln97$2@dont-email.me>
<pan$c64d0$8c0ed5b8$62596621$789ec9e7@invalid.invalid>
<86zg0k8u3l.fsf@linuxsc.com>
<pan$b29a$e4923427$117477d2$1f9519d0@invalid.invalid>
<ugic2h$vne1$2@dont-email.me>
<pan$65378$afa1dfdf$4758d80c$ecf133ab@invalid.invalid>
<ugjmsh$1herf$1@dont-email.me>
<pan$c4193$19d997f9$f1b9f7e2$7c08b1c1@invalid.invalid>
<ugk7mh$2eq4g$1@dont-email.me>
<pan$2b50f$3034e623$4c245e91$a3f4773@invalid.invalid>
<ugoddv$3jojl$1@dont-email.me>
<ugolge$3loqk$1@dont-email.me>
<ugotjo$1vmr0$2@i2pn2.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="95acdbf19cb030e28ad29123ca01f64e";
logging-data="3760360"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18e64oD+2RK338nr823jXBE/yN6eqgy744="
Cancel-Lock: sha1:0xDwoJt2399pTBZn6iaEtM1h7hE=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)

by: Michael S - Wed, 18 Oct 2023 15:45 UTC

On Wed, 18 Oct 2023 11:28:56 -0400
Richard Damon <richard@damon-family.org> wrote:

> On 10/18/23 9:10 AM, Bart wrote:
> >
> > So what does a fork of C look like?
> >
>
> C++ or C# maybe?

C++ - may be
C# is a fork of Java

Re: portable way to get highest bit set?

<861qdr7rbz.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29980&group=comp.lang.c#29980

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 10:50:24 -0700
Organization: A noiseless patient Spider
Lines: 18
Message-ID: <861qdr7rbz.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231013021504.12623444fc7d7fdaab87f1e0@gmail.moc> <86v8b67vbp.fsf@linuxsc.com> <20231016135642.0000274b@yahoo.com> <20231017173943.c944730832b28cc34125cb2e@gmail.moc>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="1a94926632e87d909ba2d595d724d7ca";
logging-data="3976812"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+AlIErbLD+PtB9y1OWTHQZMNqEDJMUeek="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:a9SRSmQAzSxZP7/0ANGhywJZ5gM=
sha1:RsgH4YoKrAxQ655fGx1PU52dBB4=

by: Tim Rentsch - Wed, 18 Oct 2023 17:50 UTC

Anton Shepelev <anton.txt@gmail.moc> writes:

> Michael S to Tim Rentsch:
>
>> First, you postulated that zero-run-time MASK_WIDTH()
>> exist without actually writing such function. That sounds
>> like cheating.
>
> Or an exercise for the reader.
>
>> Second, your loop is 2-3 times slower then loops based
>> Kaz's approach.
>
> I believe Tim wanted to improve my approach without
> completely redesigning it.

Right. I was trying to follow your outline, changing
only minor or incidental details.

Re: portable way to get highest bit set?

<ugp766$3ppas$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29982&group=comp.lang.c#29982

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!rocksolid2!news.neodome.net!usenet.network!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 20:12:21 +0200
Organization: A noiseless patient Spider
Lines: 361
Message-ID: <ugp766$3ppas$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 18 Oct 2023 18:12:22 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ee167f63fb9ab1d9c931b7e7a6b1af82";
logging-data="3990876"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19YXbycxhnn3irw0ERDeIvu"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:TlPT5Ht0q6GwVYISeNAYPUWQAQ8=
In-Reply-To: <865y346qvs.fsf@linuxsc.com>

by: jak - Wed, 18 Oct 2023 18:12 UTC

Tim Rentsch ha scritto:
> Michael S <already5chosen@yahoo.com> writes:
>
>> On Thu, 12 Oct 2023 08:05:40 -0700
>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>
>>> Michael S <already5chosen@yahoo.com> writes:
>>>
>>>> On Thu, 12 Oct 2023 07:01:47 -0700
>>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>>
>>>>> [non-working example]
>>>>
>>>> That code does not look like working.
>>>
>>> Yes, deliberately so. I didn't want to give working
>>> code yet, so other people could work on the problem.
>>>
>>>> That is:
>>>>
>>>> T
>>>> highest_bit_set1( T u ){
>>>> T r = 0;
>>>> while (u > r) r = r*2+1;
>>>> return r ^ (r>>1);
>>>> }
>>>
>>> As expected, this works. I like the way it naturally
>>> accommodates a zero input.
>>>
>>> Can you find a different solution that works in
>>> logarithmic time rather than linear time?
>>
>> Naturally, I can. But the code is not as easy to comprehend as I
>> would like. And I am not sure that for typical distributions with
>> prevalence of small numbers it is faster than above variant.
>
> I agree that whether a particular method is acceptably fast
> depends on the expected distribution of inputs, and also on
> the sizes of the types involved. For 8 or 16 bit inputs,
> probably even very simple schemes are just fine. But since
> what was asked for is not to depend on the sizes, and nothing
> was said about what the expected inputs would be, it's natural
> to look for logarithmic solution, and try to make it as easy
> to understand as we can.
>
>> T
>> highest_bit_set2( T u ){
>> T v;
>> for (int rshift = 0; (v = (u >> rshift) >> 1) != 0; rshift = rshift*2+1)
>> u |= v;
>> return u ^ (u>>1);
>> }
>
> I want to look at a lot of different approaches, but let me say
> at the start that this method is very much like one I had coded
> earlier, and the one you give here is pretty good. We'll get
> back to this case later but let's start at the simple answers and
> move on to the more elaborate ones.
>
> Generally speaking there are two ways of getting a result,
> namely, either look for the value or construct the value. Linear
> methods tend to look for the result, either starting at the top
> and scanning down, or starting at the bottom and scanning up
> (which is what your first answer did). The poster jak gave a
> nice function that is linear but of the constructive variety.
> The next code is jak's except the layout is slightly different
> and it uses the type T (stipulated to be unsigned) rather than a
> putting in a particular definite type.
>
> T
> hbs_jak( T val ){
> int i;
> for( i = 0; val > 1; val /= 2, i++ );
> return val << i;
> }
>
> This function nicely and naturally handles an input value of
> zero, which is something to look for (that is, how a zero input
> is handled) when considering high-bit-determining functions (for
> the obvious reason that zero is kind of an oddball because it has
> no bits set).
>
> As mentioned before your first answer is linear, and uses an
> upward scan, and deals naturally with a zero input.
>
> Thinking about the choice of scan direction, it occurred to me
> that rather than start at one end or the other, we could start in
> the middle, and do either an upward scan or a downward scan
> depending on an initial test. I wrote this:
>
> T
> hbs_split( T u ){
> unsigned n = MASK_WIDTH( 0?u:-1 )/2;
> T v = u>>1;
> T bit = (0?bit:1) << n;
> if( bit > u ) do bit >>= 1; while( bit > u );
> else while( bit <= v ) bit <<= 1;
> return bit;
> }
>
> If the inputs are evenly distributed this function should on average
> be about twice as fast as a single-direction scan.
>
> In between linear and logarithmic, there is a well-known sublinear
> technique that I was half expecting someone would post. Besides
> being a little faster than a straight linear scan, it has the nice
> property that the code is very simple:
>
> T
> hbs_zap_low_bit( T u ){
> while( u & u-1 ) u &= u-1;
> return u;
> }
>
> The loop takes time proportional to the number of one bits (so
> strictly speaking it is not sublinear, but still faster than
> other linear methods). This function runs as fast or faster than
> all the other non-logarithmic methods I tried. (I did simple
> performance measurements with an input set of all values between
> 0 and 300,000,000, for each of three type sizes, those being 32,
> 64, and 128 bits.)
>
> On the subject of performance, the various functions being posted
> are given roughly in performance order, slowest first and getting
> faster as we go on.
>
> Now we have arrived at the logarithmic methods. Anton posted a
> method using binary search (but using a specific type), which I
> revised to be type-agnostic:
>
> T
> hbs_binary_search( T n ){
> T const ones = -1;
> unsigned least = 0, limit = MASK_WIDTH( ones );
>
> do {
> unsigned k = least+limit >> 1;
> if( n & ones<<k ) least = k;
> else limit = k;
> } while( least+1 != limit );
>
> return n & ones<<least;
> }
>
> After doing this version I had the idea of doing a more careful
> binary search, which would detect the case of having found an
> result and exit the loop immediately. That idea yielded this
> function:
>
> T
> hbs_binary_search_with_early_exit( T x ){
> unsigned t = MASK_WIDTH( (T)-1 ) +1, s = t/2, u = t-s-1;
> T y = (0?y:1) << u-1, z = x >> 1;;
>
> do {} while(
> t < 2 ? 0 :
> y <= z ? t = s, u = t/2, s = t-u-1, y <<= u+1 :
> y > x ? t = u, s = t/2, u = t-s-1, y >>= s+1 :
> /********/ 0
> );
>
> return y;
> }
>
> To see how this works, at each step there are 't' bits total, with
> 's' bits to the left of the bit under test, and 'u' bits to the
> right, so t = s+u+1. The variable 'x' holds the input value, 'z'
> holds the input value shifted right one, and 'y' holds the one-bit
> value being checked. Thus if x >= y and y > z, y is the answer.
> Writing the function was an exercise in careful bookkeeping, and
> not easy to write - it took me about half a dozen tries before I
> got it right. And, sadly, not really any better than the simple
> binary search. But it was fun to try.
>
> Next we get to your first logarithmic version (namely, the one given
> in the post being responded to here), which gives appreciably better
> speed than the binary searches. This function uses the "expanding
> bit-smearing" technique, and is similar to a method I had written,
> which went through several revisions, as follows:
>
> T
> hbs_expanding_smearing_1( T u ){
> unsigned shift = 1;
> while( (u & u>>1) != u>>1 ) u |= u>>shift, shift += shift;
> return u ^ u>>1;
> }
>
> T
> hbs_expanding_smearing_2( T u ){
> unsigned shift = 1;
> do u |= u>>shift, shift <<= 1; while( (u & u>>1) != u>>1 );
> return u ^ u>>1;
> }
>
> T
> hbs_expanding_smearing_3( T u ){
> T r;
>
> for(
> unsigned s = 1;
> u |= u>>s, r = u ^ u>>1, r & r-1;
> s <<= 1
> ){}
>
> return r;
> }
>
> One can see the progression of putting the smearing earlier in the
> sequence of operations. The motivation to change from a 'do' to a
> 'for' is so changing the shift amount could be done only after we
> know we are getting to the next (half) loop iteration. All of these
> ran a bit faster than your first logarithmic versioon. (I should
> say again that my performance measurements were done more casually
> than carefully.)
>
> Then you posted a revised version of your first logarithmic effort:
>
> T
> hbs_michael_s_2_revised( T u ){
> for (int rshift = 1; ((u+1) & u) != 0; rshift += rshift)
> u |= (u >> rshift);
> return u ^ (u>>1);
> }
>
> Nice! This revision runs about the same speed as the fastest of the
> three functions above (the third one). I think a lot of that
> improvement is a consequence of the simple exit test in the for()
> loop. Seeing that inspired me to re-do my earlier version, giving
> this:
>
> T
> hbs_expanding_smearing_with_michael_s_exit_test( T u ){
> for(
> unsigned s = 1;
> u |= u>>s, u+1 & u;
> s <<= 1
> ){}
>
> return u ^ u>>1;
> }
>
> This revision runs about 15% faster than the _3 version above. I
> attribute all of the gain to using your simplified loop exit test.
>
> Out of intellectual curiosity, I tried some variations of some of
> the earlier versions. First it occurred to me that the jak version
> could be sped up by shifting larger amounts at first
>
> T
> hbs_jak_fast( T v ){
> unsigned k = 0;
>
> while( v > 0xFF ) v >>= 8, k += 8;
> if( v > 0xF ) v >>= 4, k += 4;
> if( v > 0x3 ) v >>= 2, k += 2;
> if( v > 0x1 ) v >>= 1, k += 1;
>
> return v << k;
> }
>
> This change gives a substantial improvement. Of course that is at
> least partly the result of cherry picking and tuning to the input
> set. Still it was interesting to see how much faster this version
> ran.
>
> Next, I thought that if splitting once is good then surely
> splitting twice is better. Eventually that idea let to this
> revision
>
> T
> hbs_split_more( T u ){
> unsigned n = MASK_WIDTH( 0?u:-1 )/2;
> T v = u>>1;
> T bit = (0?bit:1) << n;
>
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> if( n > 10 ){
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> n /= 2, bit = bit > u ? bit >> n : bit << n;
> }
>
> if( bit > u ) do bit >>= 1; while( bit > u );
> else while( bit <= v ) bit <<= 1;
>
> return bit;
> }
>
> Rather ugly and ad hoc. However it runs faster than all of the
> previous versions in the particular case of T being a 128 bit
> type and using clang to compile. (There were sometimes large
> differences between using gcc and using clang, which I did not
> investigate systematically.)
>
> There were a couple of type-specific functions posted, but I don't
> include them here because they are solving a different problem.
>
> Not quite finished yet. It occurred to me to take the speedup ideas
> used in the last two investigations and try them in the context of
> the best expanding-smearing scheme. That idea gave this function:
>
> T
> hbs_expanding_smearing_special( T u ){
> u |= u>>1;
> u |= u>>2;
>
> for(
> unsigned s = 4;
> u |= u>>s, u+1 & u;
> s <<= 1
> ){}
>
> return u ^ u>>1;
> }
>
> This revision gives about 20% improvement over the previous best.
> Here again I am taking advantage of your simple loop exit test.
>
> Last but not least, I explored using the approach of converting to
> floating-point, to see how it stacks up against the other schemes
> (with the admission that it is more limited than the type-agnostic
> methods). Here is the terminus of that exploration:
>
> T
> hbs_using_IEEE_floating_point( T u ){
> union { float f; struct { unsigned m:23, e:8, sign: 1; } b; } x;
> return u ? x.f = u & ~(u>>1), (T)1 << x.b.e - 127 : 0;
> }
>
> This short function gives exact results for T up to 128 bits wide.
> It runs about the same speed as the best conventional method,
> except that under gcc it runs about 40% faster on 128 bit inputs.
>
> An interesting journey. A hat tip to you for your two versions,
> including especially the fast simple loop exit test.
>

Click here to read the complete article

Re: portable way to get highest bit set?

<ugp83h$3pur8$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29983&group=comp.lang.c#29983

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 11:28:00 -0700
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <ugp83h$3pur8$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me>
<pan$e0805$faa9a91a$a60476dc$4a90cfa8@invalid.invalid>
<87h6mwu9rk.fsf@nosuchdomain.example.com>
<pan$325f3$70a576a8$3150c04a$26db7b31@invalid.invalid>
<ugagv6$2vs64$1@dont-email.me>
<pan$21acb$49c8a2a6$a1bbf038$1e40c8f0@invalid.invalid>
<ugck5i$3faav$1@dont-email.me>
<pan$19122$eaff5f26$f64222cd$d70e555c@invalid.invalid>
<ugd37f$3ln97$2@dont-email.me>
<pan$c64d0$8c0ed5b8$62596621$789ec9e7@invalid.invalid>
<86zg0k8u3l.fsf@linuxsc.com>
<pan$b29a$e4923427$117477d2$1f9519d0@invalid.invalid>
<ugic2h$vne1$2@dont-email.me>
<pan$65378$afa1dfdf$4758d80c$ecf133ab@invalid.invalid>
<ugjmsh$1herf$1@dont-email.me>
<pan$c4193$19d997f9$f1b9f7e2$7c08b1c1@invalid.invalid>
<ugk7mh$2eq4g$1@dont-email.me>
<pan$2b50f$3034e623$4c245e91$a3f4773@invalid.invalid>
<ugoddv$3jojl$1@dont-email.me> <ugolge$3loqk$1@dont-email.me>
<ugotjo$1vmr0$2@i2pn2.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 18 Oct 2023 18:28:01 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="8b8431092374d68dacb83a2b382dcc61";
logging-data="3996520"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18aoT2g5Hu5V+jw7fPq2Z+Rr41fAykldkI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:+Ma1b+QN15MpV0wuy6YHZDo/Gg4=
Content-Language: en-US
In-Reply-To: <ugotjo$1vmr0$2@i2pn2.org>

by: Chris M. Thomasson - Wed, 18 Oct 2023 18:28 UTC

On 10/18/2023 8:28 AM, Richard Damon wrote:
> On 10/18/23 9:10 AM, Bart wrote:
>>
>> So what does a fork of C look like?
>>
>
> C++ or C# maybe?

LOL! :^)

Re: portable way to get highest bit set?

<01WXM.8912$igw9.4124@fx37.iad>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29984&group=comp.lang.c#29984

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx37.iad.POSTED!not-for-mail
Newsgroups: comp.lang.c
From: branimir...@icloud.com (Branimir Maksimovic)
Subject: Re: portable way to get highest bit set?
References: <ug5gvh$1jkar$3@dont-email.me>
<pan$e0805$faa9a91a$a60476dc$4a90cfa8@invalid.invalid>
<87h6mwu9rk.fsf@nosuchdomain.example.com>
<pan$325f3$70a576a8$3150c04a$26db7b31@invalid.invalid>
<ugagv6$2vs64$1@dont-email.me>
<pan$21acb$49c8a2a6$a1bbf038$1e40c8f0@invalid.invalid>
<ugck5i$3faav$1@dont-email.me>
<pan$19122$eaff5f26$f64222cd$d70e555c@invalid.invalid>
<ugd37f$3ln97$2@dont-email.me>
<pan$c64d0$8c0ed5b8$62596621$789ec9e7@invalid.invalid>
<86zg0k8u3l.fsf@linuxsc.com>
<pan$b29a$e4923427$117477d2$1f9519d0@invalid.invalid>
<ugic2h$vne1$2@dont-email.me>
<pan$65378$afa1dfdf$4758d80c$ecf133ab@invalid.invalid>
<ugjmsh$1herf$1@dont-email.me>
<pan$c4193$19d997f9$f1b9f7e2$7c08b1c1@invalid.invalid>
<ugk7mh$2eq4g$1@dont-email.me>
<pan$2b50f$3034e623$4c245e91$a3f4773@invalid.invalid>
<ugoddv$3jojl$1@dont-email.me> <ugolge$3loqk$1@dont-email.me>
<ugotjo$1vmr0$2@i2pn2.org> <ugp83h$3pur8$1@dont-email.me>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 16
Message-ID: <01WXM.8912$igw9.4124@fx37.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 18 Oct 2023 19:06:36 UTC
Organization: usenet-news.net
Date: Wed, 18 Oct 2023 19:06:36 GMT
X-Received-Bytes: 1811

by: Branimir Maksimovic - Wed, 18 Oct 2023 19:06 UTC

On 2023-10-18, Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
> On 10/18/2023 8:28 AM, Richard Damon wrote:
>> On 10/18/23 9:10 AM, Bart wrote:
>>>
>>> So what does a fork of C look like?
>>>
>>
>> C++ or C# maybe?
>
> LOL! :^)
There are attempts, to replace C, but all I see, they can replace C++, only :P

7-77-777, Evil Sinner!
https://www.linkedin.com/in/branimir-maksimovic-6762bbaa/M

Re: portable way to get highest bit set?

<86wmvj5gkc.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29986&group=comp.lang.c#29986

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 22:25:55 -0700
Organization: A noiseless patient Spider
Lines: 90
Message-ID: <86wmvj5gkc.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com> <865y346qvs.fsf@linuxsc.com> <ugp766$3ppas$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="c39f9e0eb637dc2d4d061ce94725fd35";
logging-data="195688"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+YDIwM1sukYDvKbN2/mokj3kYdtW+TSI8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:qMWjCCAF2fHE9B4Ej9+xsMielKw=
sha1:YVTi5f7ut8lUoQqOOVHQhx9DXhA=

by: Tim Rentsch - Thu, 19 Oct 2023 05:25 UTC

jak <nospam@please.ty> writes:

> Tim Rentsch ha scritto:
>
>> [...] The poster jak gave a
>> nice function that is linear but of the constructive variety.
>> The next code is jak's except the layout is slightly different
>> and it uses the type T (stipulated to be unsigned) rather than a
>> putting in a particular definite type.
>>
>> T
>> hbs_jak( T val ){
>> int i;
>> for( i = 0; val > 1; val /= 2, i++ );
>> return val << i;
>> }
>>
>> [...]
>>
>> Out of intellectual curiosity, I tried some variations of some of
>> the earlier versions. First it occurred to me that the jak version
>> could be sped up by shifting larger amounts at first
>>
>> T
>> hbs_jak_fast( T v ){
>> unsigned k = 0;
>>
>> while( v > 0xFF ) v >>= 8, k += 8;
>> if( v > 0xF ) v >>= 4, k += 4;
>> if( v > 0x3 ) v >>= 2, k += 2;
>> if( v > 0x1 ) v >>= 1, k += 1;
>>
>> return v << k;
>> }
>>
>> This change gives a substantial improvement. [...]
>
> I improved the function you mentioned. Now the research is "almost"
> dichotomous.
>
> #include <limits.h>
>
> typedef unsigned long T;
>
> T h_bit(T val)
> {
> T mask;
> int tbits = 0, bits = (sizeof(mask) * CHAR_BIT) / 2;
> for(; bits > 0; bits /= 2)
> {
> if((mask = ~0UL << bits) & val)
> {
> val >>= bits;
> tbits += bits;
> }
> }
> return val << tbits;
> }

Right. We can generalize the ad hoc few extra shifts by writing
a loop. Actually I already wrote a function nearly identical to
the one above, as part of my exploration. Performance is decent.

The function above has two problems. There's an obvious one that
concerns the type being shifted in the expression for computing
the value of mask. Rather than using '~0UL', this operand needs
to be of type T, so for example it could be '(T)-1'. As it
stands the expression '~0UL << bits' can have undefined behavior
it the type T is wider than unsigned long.

The second problem is more subtle. The above code works fine if
the width of type T is a power of two, but doesn't work properly
if it isn't. Consider for example an implementation where the
value of CHAR_BIT is 12, and the function is called with an
argument value of 7. Assuming sizeof (T) is a power of two,
eventually we will get to a point where the shift amount is 3.
There is no overlap between (T)-1 << 3 and 7, so the if() will
not execute its controlled block. But on the next iteration
the shift amount will be 1, so val will be 7>>1 == 3 coming
out of the loop, and the return value will have two bits set
rather than one.

The solution to this problem is to pick the initial shift amount
to be the largest power of two less than or equal to the maximal
value of T (that is, the log base 2 of that power). This initial
shift value can be computed by means of an expression that gives
a compile-time constant value, for any practical value of shift
amount. (Naturally we would encapsulate the complications in a
macro definition or two.) If someone gets stuck I can post code
for that in a followup.

Re: portable way to get highest bit set?

<86sf675fir.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29987&group=comp.lang.c#29987

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Wed, 18 Oct 2023 22:48:28 -0700
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <86sf675fir.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com> <865y346qvs.fsf@linuxsc.com> <20231018174110.00003717@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="c39f9e0eb637dc2d4d061ce94725fd35";
logging-data="203493"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18lK1IAMWp0V+bdNetkahYdIhS8bTuybnc="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:/ebro5KEgO/rF1kU1lygjQP8faE=
sha1:BlOmQyKrqgWSw9Xe6IbLR1wVZaM=

by: Tim Rentsch - Thu, 19 Oct 2023 05:48 UTC

Michael S <already5chosen@yahoo.com> writes:

> On Wed, 18 Oct 2023 05:45:27 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
[...]

>> Last but not least, I explored using the approach of converting to
>> floating-point, to see how it stacks up against the other schemes
>> (with the admission that it is more limited than the type-agnostic
>> methods). Here is the terminus of that exploration:
>>
>> T
>> hbs_using_IEEE_floating_point( T u ){
>> union { float f; struct { unsigned m:23, e:8, sign: 1; } b; }
>> x; return u ? x.f = u & ~(u>>1), (T)1 << x.b.e - 127 : 0;
>> }
>
> u & ~(u>>1) is a brilliant trick! [...]

Thank you for the kind description. I must admit I was pleased
when I discovered it.

>> This short function gives exact results for T up to 128 bits wide.
>> It runs about the same speed as the best conventional method,
>> except that under gcc it runs about 40% faster on 128 bit inputs.
>
> That's surprising.
> Intuitively I would expect this method, despite all elegance, to be
> slower than loopless method of Kaz. I'd expect the later to run in 25-30
> clocks for 128-bit inputs.

Instinctively it seems like straight line should be faster, but
actually the looping version scales better. Here's why I think
that is. The costly operations are the full-width shifts and
ors. The looping version has some bookeeping overhead (for
changing the shift value), but these operations are cheap and can
be done in parallel. Also the looping version has the expression
for the exit test but this too can be overlapped with (the next
iteration of) shifting and oring. But here is the key: the
straight line code always has to do ALL the shifts and ors; the
looping version does only as many as are needed to produce the
answer. Taking an early exit saves iterations on the vast
majority of input values.

> Sounds like the author of gcc run-time support function
> __floatuntisf did an excellent job.

Yes, and better than clang. In many or most cases clang does
better on the 128 bit types, but here gcc outperforms clang by
almost a factor of two. It might be interesting to see a more
comprehensive set of results using different versions of these
compilers (and libraries) - I'm just using whatever versions
happen to be on the colo server here.

Re: portable way to get highest bit set?

<86o7gv5bk1.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29988&group=comp.lang.c#29988

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!news.chmurka.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Thu, 19 Oct 2023 00:14:06 -0700
Organization: A noiseless patient Spider
Lines: 76
Message-ID: <86o7gv5bk1.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231013021504.12623444fc7d7fdaab87f1e0@gmail.moc> <86mswm9mwk.fsf@linuxsc.com> <20231014015035.a51cbb621de8eea5ac6a8651@gmail.moc> <20231017183345.26e2afd8641d5c63f8e8e630@gmail.moc>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="c39f9e0eb637dc2d4d061ce94725fd35";
logging-data="235262"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19iUj46CGF5yRl5h3tJhVnT8iDrPj6+5dE="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:OLLAuMdhWs5FBTaw13viKDdjv8A=
sha1:oNRsEAzyKZ5l1Q299YcrBaqWZrs=

by: Tim Rentsch - Thu, 19 Oct 2023 07:14 UTC

Anton Shepelev <anton.txt@gmail.moc> writes:

> I wrote:
>
>> reflecting on my solution later I decided that a direct
>> comparison of the analysed value with a power of two might
>> be better, especially because it is already in the form
>> required for the return value.

[a long line wrapped]

> Here it is:
>
> unsigned long int
> high_bit_mask( unsigned long n )
> { const unsigned long M =
> (unsigned long)1 << (CHAR_BIT * sizeof( long ) - 1);
> unsigned long mask = 0;
> int l = -1,
> r = CHAR_BIT * sizeof( long );
>
> while( 1 )
> { unsigned long C;
> const int m = (l + r)/2;
>
> C = M >> m;
> if( n >= C ) { r = m; mask = C; }
> else { l = m; }
>
> if( r - l == 1 ) break;
> }
> return mask;
> }

Is there some reason you are using a while(1) with a conditional
'break;' statement at the end rather than just using do...while;?
I think most people would agree that using a do...while is better
style.

> It does not work with `long long' in my environment, so I post
> the code for `long'.

It seems you still do not get the spirit of the problem. Part of
the goal is to make the code not depend on a specific choice of
type for the values being worked on. The code above has 'long'
and 'unsigned long' all over the place. Rather than do that,
write the function in terms of an abstract type T, and then put a
typedef for T before the function for whatever type you think is
appropriate for your environment. I would like to take any code
posted and incorporate it into my little test bench here so I can
test it and see how it does, but having to go through this code
and change all the 'unsigned long's and so forth makes that a big
pain in the ass.

> This is the cleanest binary search vesion that I can think of.

The following binary search code is simpler and runs faster.

T
hbs_binary( T u ){
const T one = 1, v = u>>1;
unsigned least = 0, limit = MASK_WIDTH( -one );

do {
const unsigned k = least+limit >> 1;
const T bit = one << k;

/**/ if( bit <= v ) least = k+1;
else if( bit > u ) limit = k;
else return bit;

} while( limit > 0 );

return 0;
}

Re: portable way to get highest bit set?

<ugqm71$7e0u$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29989&group=comp.lang.c#29989

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Thu, 19 Oct 2023 09:34:57 +0200
Organization: A noiseless patient Spider
Lines: 110
Message-ID: <ugqm71$7e0u$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com> <ugp766$3ppas$1@dont-email.me>
<86wmvj5gkc.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 19 Oct 2023 07:34:57 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="62db6fbee6be907b3fee3d4cdaa41af9";
logging-data="243742"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+cLJGLt1m6OPgnI4MFSeUK"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:U1kdUuuDl/IDU3rMSHtQhKGtu8E=
In-Reply-To: <86wmvj5gkc.fsf@linuxsc.com>

by: jak - Thu, 19 Oct 2023 07:34 UTC

Tim Rentsch ha scritto:
> jak <nospam@please.ty> writes:
>
>> Tim Rentsch ha scritto:
>>
>>> [...] The poster jak gave a
>>> nice function that is linear but of the constructive variety.
>>> The next code is jak's except the layout is slightly different
>>> and it uses the type T (stipulated to be unsigned) rather than a
>>> putting in a particular definite type.
>>>
>>> T
>>> hbs_jak( T val ){
>>> int i;
>>> for( i = 0; val > 1; val /= 2, i++ );
>>> return val << i;
>>> }
>>>
>>> [...]
>>>
>>> Out of intellectual curiosity, I tried some variations of some of
>>> the earlier versions. First it occurred to me that the jak version
>>> could be sped up by shifting larger amounts at first
>>>
>>> T
>>> hbs_jak_fast( T v ){
>>> unsigned k = 0;
>>>
>>> while( v > 0xFF ) v >>= 8, k += 8;
>>> if( v > 0xF ) v >>= 4, k += 4;
>>> if( v > 0x3 ) v >>= 2, k += 2;
>>> if( v > 0x1 ) v >>= 1, k += 1;
>>>
>>> return v << k;
>>> }
>>>
>>> This change gives a substantial improvement. [...]
>>
>> I improved the function you mentioned. Now the research is "almost"
>> dichotomous.
>>
>> #include <limits.h>
>>
>> typedef unsigned long T;
>>
>> T h_bit(T val)
>> {
>> T mask;
>> int tbits = 0, bits = (sizeof(mask) * CHAR_BIT) / 2;
>> for(; bits > 0; bits /= 2)
>> {
>> if((mask = ~0UL << bits) & val)
>> {
>> val >>= bits;
>> tbits += bits;
>> }
>> }
>> return val << tbits;
>> }
>
> Right. We can generalize the ad hoc few extra shifts by writing
> a loop. Actually I already wrote a function nearly identical to
> the one above, as part of my exploration. Performance is decent.
>

Of course and this is the reason why I thought of sending you the new
version.

>
> The function above has two problems. There's an obvious one that
> concerns the type being shifted in the expression for computing
> the value of mask. Rather than using '~0UL', this operand needs
> to be of type T, so for example it could be '(T)-1'. As it
> stands the expression '~0UL << bits' can have undefined behavior
> it the type T is wider than unsigned long.
>

In fact, in that line there are errors by distraction. For example, not
even the 'mask' variable is necessary. It only existed for debug and can
be removed. The same is for ~0UL which in the local version is already
(T)-1 but it would not be an UB even for the compiler (no warning).
Autocast helps.

>
> The second problem is more subtle. The above code works fine if
> the width of type T is a power of two, but doesn't work properly
> if it isn't. Consider for example an implementation where the
> value of CHAR_BIT is 12, and the function is called with an
> argument value of 7. Assuming sizeof (T) is a power of two,
> eventually we will get to a point where the shift amount is 3.
> There is no overlap between (T)-1 << 3 and 7, so the if() will
> not execute its controlled block. But on the next iteration
> the shift amount will be 1, so val will be 7>>1 == 3 coming
> out of the loop, and the return value will have two bits set
> rather than one.
>
> The solution to this problem is to pick the initial shift amount
> to be the largest power of two less than or equal to the maximal
> value of T (that is, the log base 2 of that power). This initial
> shift value can be computed by means of an expression that gives
> a compile-time constant value, for any practical value of shift
> amount. (Naturally we would encapsulate the complications in a

> macro definition or two.) If someone gets stuck I can post code
> for that in a followup.
>
This would be really interesting. Personally I would agree to hang the
first version of the function to the second :^)

Greetings

Re: portable way to get highest bit set?

<20231019122318.424cc40a22971a045ef11be5@gmail.moc>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29990&group=comp.lang.c#29990

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!news.chmurka.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton....@gmail.moc (Anton Shepelev)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Thu, 19 Oct 2023 12:23:18 +0300
Organization: A noiseless patient Spider
Lines: 120
Message-ID: <20231019122318.424cc40a22971a045ef11be5@gmail.moc>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me>
<86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com>
<20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com>
<20231013021504.12623444fc7d7fdaab87f1e0@gmail.moc>
<86mswm9mwk.fsf@linuxsc.com>
<20231014015035.a51cbb621de8eea5ac6a8651@gmail.moc>
<20231017183345.26e2afd8641d5c63f8e8e630@gmail.moc>
<86o7gv5bk1.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="ed6e1a73370f4dfe93b336d1fe2714af";
logging-data="292059"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+zIeomH8vy3e12ARRtOgyPeopkxqh0f1E="
Cancel-Lock: sha1:BSnCLNHw6Fhrbmbpd7GTqKuwbiw=
X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)

by: Anton Shepelev - Thu, 19 Oct 2023 09:23 UTC

Tim Rentsch to Anton Shepelev:

> > unsigned long int
> > high_bit_mask( unsigned long n )
> > { const unsigned long M = (unsigned long)1 << (CHAR_BIT * sizeof( long ) - 1);
> > unsigned long mask = 0;
> > int l = -1,
> > r = CHAR_BIT * sizeof( long );
> >
> > while( 1 )
> > { unsigned long C;
> > const int m = (l + r)/2;
> >
> > C = M >> m;
> > if( n >= C ) { r = m; mask = C; }
> > else { l = m; }
> >
> > if( r - l == 1 ) break;
> > }
> > return mask;
> > }
>
> Is there some reason you are using a while(1) with a
> conditional 'break;' statement at the end rather than just
> using do...while;? I think most people would agree that
> using a do...while is better style.

None at all. I started designing the loop in the general
form, and then failed to notice that the final version had a
single `break' at the end, in which case do..while is indeed
better.

> It seems you still do not get the spirit of the problem.
> Part of the goal is to make the code not depend on a
> specific choice of type for the values being worked on.

Since C does not have generics, I thought the actual
function must work with /some/ specific type, which can be
the largest integer type available.

> The code above has 'long' and 'unsigned long' all over the
> place. Rather than do that, write the function in terms
> of an abstract type T, and then put a typedef for T before
> the function for whatever type you think is appropriate
> for your environment.

Using #typedef thus to emulate generics simply did not occur
to me. A truly generic solution would be a macro that
generated a function for the type passed in the parameter,
yet nobody has written such a macro.

> I would like to take any code posted and incorporate it
> into my little test bench here so I can test it and see
> how it does, but having to go through this code and change
> all the 'unsigned long's and so forth makes that a big
> pain in the ass.

OK:

T
high_bit_mask( T n )
{ const T M = (T)1 << (CHAR_BIT * sizeof( T ) - 1);
T mask = 0;
int l = -1,
r = CHAR_BIT * sizeof( T );

do
{ const int m = (l + r)/2;
T C = M >> m;

if( n >= C ) { r = m; mask = C; }
else { l = m; }
} while( r - l > 1 );

return mask;
}

> The following binary search code is simpler and runs faster.
>
> T
> hbs_binary( T u ){
> const T one = 1, v = u>>1;
> unsigned least = 0, limit = MASK_WIDTH( -one );
>
> do {
> const unsigned k = least+limit >> 1;
> const T bit = one << k;
>
> /**/ if( bit <= v ) least = k+1;
> else if( bit > u ) limit = k;
> else return bit;
>
> } while( limit > 0 );
>
> return 0;
> }

I disagree about the higher simplicity of your code, because
it

1. has two return statements,

2. employs the /clever trick/ of comparing the mask to
different values in different branches.

3. uses a three-way `if'.

4. uses conditions that are not mutually exclusive.

5. uses `least' and `limit' instead of intuitive `left'
and `right' or `min' and `max'. My version makes a
point of giving its left and right indices the
intuitive meaning that they have in the standard
positional notation.

My version, however, is still not a plain-vanilla binary search.

--
() ascii ribbon campaign -- against html e-mail
/\ www.asciiribbon.org -- against proprietary attachments

Re: portable way to get highest bit set?

<Ld8YM.79215$0UVe.56593@fx17.iad>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29992&group=comp.lang.c#29992

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer03.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx17.iad.POSTED!not-for-mail
Newsgroups: comp.lang.c
From: branimir...@icloud.com (Branimir Maksimovic)
Subject: Re: portable way to get highest bit set?
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com> <20231018174110.00003717@yahoo.com>
<86sf675fir.fsf@linuxsc.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 55
Message-ID: <Ld8YM.79215$0UVe.56593@fx17.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Thu, 19 Oct 2023 11:15:55 UTC
Organization: usenet-news.net
Date: Thu, 19 Oct 2023 11:15:55 GMT
X-Received-Bytes: 3685

by: Branimir Maksimovic - Thu, 19 Oct 2023 11:15 UTC

On 2023-10-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Michael S <already5chosen@yahoo.com> writes:
>
>> On Wed, 18 Oct 2023 05:45:27 -0700 Tim Rentsch <tr.17687@z991.linuxsc.com>
>> wrote:
>>
> [...]
>
>>> Last but not least, I explored using the approach of converting to
>>> floating-point, to see how it stacks up against the other schemes (with the
>>> admission that it is more limited than the type-agnostic methods). Here is
>>> the terminus of that exploration:
>>>
>>> T hbs_using_IEEE_floating_point( T u ){ union { float f; struct {
>>> unsigned m:23, e:8, sign: 1; } b; } x; return u ? x.f = u & ~(u>>1),
>>> (T)1 << x.b.e - 127 : 0; }
>>
>> u & ~(u>>1) is a brilliant trick! [...]
>
> Thank you for the kind description. I must admit I was pleased when I
> discovered it.
>
>>> This short function gives exact results for T up to 128 bits wide. It runs
>>> about the same speed as the best conventional method, except that under gcc
>>> it runs about 40% faster on 128 bit inputs.
>>
>> That's surprising. Intuitively I would expect this method, despite all
>> elegance, to be slower than loopless method of Kaz. I'd expect the later to
>> run in 25-30 clocks for 128-bit inputs.
>
> Instinctively it seems like straight line should be faster, but actually the
> looping version scales better. Here's why I think that is. The costly
> operations are the full-width shifts and ors. The looping version has some
> bookeeping overhead (for changing the shift value), but these operations are
> cheap and can be done in parallel. Also the looping version has the
> expression for the exit test but this too can be overlapped with (the next
> iteration of) shifting and oring. But here is the key: the straight line
> code always has to do ALL the shifts and ors; the looping version does only
> as many as are needed to produce the answer. Taking an early exit saves
> iterations on the vast majority of input values.
>
>> Sounds like the author of gcc run-time support function __floatuntisf did an
>> excellent job.
>
> Yes, and better than clang. In many or most cases clang does better on the
> 128 bit types, but here gcc outperforms clang by almost a factor of two. It
> might be interesting to see a more comprehensive set of results using
> different versions of these compilers (and libraries) - I'm just using
> whatever versions happen to be on the colo server here.
gcc is also faster on Apple ARM...

7-77-777, Evil Sinner!
https://www.linkedin.com/in/branimir-maksimovic-6762bbaa/M

Re: portable way to get highest bit set?

<20231020001455.00006ea4@yahoo.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=29997&group=comp.lang.c#29997

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Fri, 20 Oct 2023 00:14:55 +0300
Organization: A noiseless patient Spider
Lines: 248
Message-ID: <20231020001455.00006ea4@yahoo.com>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me>
<86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com>
<20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com>
<20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com>
<20231018174110.00003717@yahoo.com>
<86sf675fir.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="eb59e3fe0e7c270fa6ca41745e99a63c";
logging-data="611495"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18QVhUA7SeW6qqA/ibVWPQMTHNDVEMadJA="
Cancel-Lock: sha1:iRwgI/pMcz7V0stGKcMVirKZuPk=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)

by: Michael S - Thu, 19 Oct 2023 21:14 UTC

On Wed, 18 Oct 2023 22:48:28 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Michael S <already5chosen@yahoo.com> writes:
>
> > On Wed, 18 Oct 2023 05:45:27 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> [...]
>
> >> Last but not least, I explored using the approach of converting to
> >> floating-point, to see how it stacks up against the other schemes
> >> (with the admission that it is more limited than the type-agnostic
> >> methods). Here is the terminus of that exploration:
> >>
> >> T
> >> hbs_using_IEEE_floating_point( T u ){
> >> union { float f; struct { unsigned m:23, e:8, sign: 1; }
> >> b; } x; return u ? x.f = u & ~(u>>1), (T)1 << x.b.e - 127 : 0;
> >> }
> >
> > u & ~(u>>1) is a brilliant trick! [...]
>
> Thank you for the kind description. I must admit I was pleased
> when I discovered it.
>
> >> This short function gives exact results for T up to 128 bits wide.
> >> It runs about the same speed as the best conventional method,
> >> except that under gcc it runs about 40% faster on 128 bit inputs.
> >
> > That's surprising.
> > Intuitively I would expect this method, despite all elegance, to be
> > slower than loopless method of Kaz. I'd expect the later to run in
> > 25-30 clocks for 128-bit inputs.
>
> Instinctively it seems like straight line should be faster, but
> actually the looping version scales better. Here's why I think
> that is. The costly operations are the full-width shifts and
> ors. The looping version has some bookeeping overhead (for
> changing the shift value), but these operations are cheap and can
> be done in parallel. Also the looping version has the expression
> for the exit test but this too can be overlapped with (the next
> iteration of) shifting and oring. But here is the key: the
> straight line code always has to do ALL the shifts and ors; the
> looping version does only as many as are needed to produce the
> answer. Taking an early exit saves iterations on the vast
> majority of input values.
>
> > Sounds like the author of gcc run-time support function
> > __floatuntisf did an excellent job.
>
> Yes, and better than clang. In many or most cases clang does
> better on the 128 bit types, but here gcc outperforms clang by
> almost a factor of two. It might be interesting to see a more
> comprehensive set of results using different versions of these
> compilers (and libraries) - I'm just using whatever versions
> happen to be on the colo server here.

According to my measurements, Kaz's code is faster.
More precisly, for 32-bit and 64-bit it is approximately identical in
speed to FP-based variants, but easily faster for 128 bits.
The following double precision FP-based variant was closer to Kaz's
but still the difference is significant:

T
highest_bit_set( T u ){
if (u == 0)
return 0;
double d = u & ~(u>>1);
uint64_t ur;
memcpy(&ur, &d, sizeof(ur));
return (T)1 << ((ur >> 52) - 0x3ff);
}

May be, the difference between our results is due to environment.
Mine was x86-64 Windows, gcc 12.2.0. Flags '-Wall -O3 -march=native'.

May be, it's due to my relatively old CPUs: Intel Ive Bridge, Haswell
and Skylake.

But most likely the reason is a difference in test vectors.
My test vector was pseudo-random with log-uniform distribution. I.e.
all 128 possible results appear with equal probability.

Here is my testbench. It is written in C++. Hopefully, since it's just
a testbench, it would not be considered horribly off topic in
comp.lang.c.
Very few comments. Sorry about it.

#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>
#include <chrono>
#include <algorithm>

typedef test_uint_t test_t;
enum {
N_BITS = sizeof(test_t)*8,
N_POINTS = 10000000,
N_ITER = 13,
RND_LSHIFT = N_BITS > 64 ? 64 : 0,
};

extern "C" test_t highest_bit_set(test_t u);

// dst[2*i+0] - input value
// dst[2*i+1] - result
static int fill_nonrandom(test_t* dst)
{ // 0..255 - all values
dst[0*2+0] = dst[0*2+1] = 0;
for (int nb = 1; nb <= 8; ++nb) {
const int MSBit = 1 << (nb-1);
for (int i = 0; i < MSBit; ++i) {
dst[(MSBit+i)*2+0] = test_t(MSBit+i);
dst[(MSBit+i)*2+1] = test_t(MSBit);
}
}
// for nbits > 8 fill all values with 1 and 2 bits set + all values
with nbits-1 and nbits bit set
int idx = 1 << 8;
for (int nb = 9; nb <= N_BITS; ++nb) {
const test_t MSK = test_t(-1) >> (N_BITS - nb);
const test_t MSBit = MSK ^ (MSK >> 1);
int idx0 = idx;
dst[idx*2] = MSBit; ++idx;
dst[idx*2] = MSK; ++idx;
for (int bit_i = 0; bit_i < nb-1; ++bit_i)
dst[(idx+bit_i)*2] = MSBit | (test_t(1) << bit_i);
idx += nb - 1;
for (int bit_i = 0; bit_i < nb-1; ++bit_i)
dst[(idx+bit_i)*2] = ~(test_t(1) << bit_i) & MSK;
idx += nb - 1;
for (int i = idx0; i < idx; ++i)
dst[i*2+1] = MSBit;
}
return idx;
}

// dst[2*i+0] - input value
// dst[2*i+1] - result
static void fill_random(test_t* dst, int len)
{ const test_t MSK = test_t(-1);
const test_t MSBit = MSK ^ (MSK >> 1);
std::mt19937_64 rndGen;
for (int i = 0; i < len; ++i) {
int nbits = int(rndGen() % (N_BITS+1));
test_t res = 0;
test_t val = 0;
if (nbits > 0) {
test_t rndVal = rndGen();
if (N_BITS > 64)
rndVal = (rndVal << RND_LSHIFT) + rndGen();
val = ((rndVal | MSBit) & MSK) >> (N_BITS - nbits);
res = (MSBit) >> (N_BITS - nbits);
}
dst[i*2+0] = val;
dst[i*2+1] = res;
// if (i < 5)
// printf("%2d %016llx %016llx %016llx\n", nbits, (unsigned long
long)val, (unsigned long long)res, (unsigned long long)MSBit);
}
}

int main()
{ auto t0 = std::chrono::steady_clock::now();
printf("Generating %d test points...", N_POINTS);
std::vector<test_t> vec(N_POINTS*2);
int fill_i = fill_nonrandom(vec.data());
// printf("%d => %d\n", N_BITS, fill_i);
fill_random(vec.data()+fill_i*2, N_POINTS-fill_i);
auto t1 = std::chrono::steady_clock::now();
std::chrono::duration<long long, std::nano> dur = t1 - t0;
printf("done. %.3f msec.\n", dur.count()*1e-6);

long long dt[N_ITER];
for (int it = 0; it < N_ITER; ++it) {
t0 = std::chrono::steady_clock::now();
for (int i = 0; i < N_POINTS; ++i) {
test_t val = vec[i*2+0];
test_t ref = vec[i*2+1];
test_t res = highest_bit_set(val);
if (res != ref) {
printf("%016llx => %016llx != %016llx\n"
,(unsigned long long)val
,(unsigned long long)res
,(unsigned long long)ref
);
return 1;
}
}
t1 = std::chrono::steady_clock::now();
dur = t1 - t0;
dt[it] = dur.count();
}

long long dtSum = 0;
for (int it = 0; it < N_ITER; ++it)
dtSum += dt[it];

std::nth_element(dt, dt + N_ITER/2, dt + N_ITER); // median
long long dtMed = dt[N_ITER/2];

printf("Throughput time (nsec) mean/med %.3f / %.3f\n"
, double(dtSum)/N_ITER/N_POINTS
, double(dtMed)/N_POINTS
);

test_t diff = 0;
for (int it = 0; it < N_ITER; ++it) {
t0 = std::chrono::steady_clock::now();
for (int i = 0; i < N_POINTS; ++i) {
test_t val = vec[i*2+0];
test_t ref = vec[i*2+1];
test_t res = highest_bit_set(val | diff);
diff |= ref ^ res;
}
t1 = std::chrono::steady_clock::now();
dur = t1 - t0;
dt[it] = dur.count();
}

if (diff) {
printf("latency test failed.\n");
}

dtSum = 0;
for (int it = 0; it < N_ITER; ++it)
dtSum += dt[it];

std::nth_element(dt, dt + N_ITER/2, dt + N_ITER); // median
dtMed = dt[N_ITER/2];

printf("Latency time (nsec) mean/med %.3f / %.3f\n"
, double(dtSum)/N_ITER/N_POINTS
, double(dtMed)/N_POINTS
);

return 0;
}

Re: portable way to get highest bit set?

<86fs255ox5.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30001&group=comp.lang.c#30001

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Fri, 20 Oct 2023 07:49:58 -0700
Organization: A noiseless patient Spider
Lines: 68
Message-ID: <86fs255ox5.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <ug6cv8$1r3g4$1@dont-email.me> <ug9vrc$2ov0t$1@dont-email.me> <uga0in$2ov0t$2@dont-email.me> <86pm1e7nxs.fsf@linuxsc.com> <ugjc0o$1dsv9$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="76468ba083774df7f57ee167b65bbb38";
logging-data="1218036"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19RkUlXQ0axvhRCzsNCV2o3a7PXPS5A7as="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:tRrSOolMajsozZSwAdGqFresiyA=
sha1:b7Bn1HOPk7xs8RigvsfWCu9IgeY=

by: Tim Rentsch - Fri, 20 Oct 2023 14:49 UTC

Bart <bc@freeuk.com> writes:

> On 16/10/2023 13:26, Tim Rentsch wrote:
>
>> Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
>>
>>> Well, I'll be go to COBOL. A minor tweak just occurred to me.
>>>
>>> unsigned long long int msb_mask(unsigned long long int valu)
>>> {
>>> unsigned long long int mask;
>>>
>>> for (mask = ((ULLONG_MAX) ^ (ULLONG_MAX >> 1));
>>> mask && !(mask&valu);
>>> mask >>= 1) continue;
>>> return mask; /* don't need to AND with valu now */
>>> }
>>
>> This function depends on the type by using ULLONG_MAX. It is
>> easy though to make it not depend on that, and so to work
>> with an arbitrary unsigned type T:
>
> If T has to be an unsigned type then it is not arbitrary!

T is stipulated to be an unsigned integer type, with no other
constraining conditions given. That makes T an arbitrary
unsigned type. Don't be obtuse.

> Presumably you also want T to be a narrow type like u8 or u16, but
> I can't see the point of this. It may mean extra masking
> operations to truncate intermediate results. (Does it mean a
> shorter number of loop iterations?)

I expect most people assumed T would be at least as big as
unsigned int, which I think is reasonable. Clearly the point of
the request is to deal with larger types, not smaller types.

> However, how will a function such as the one below be used in
> practice: will there be only the one such function in a program?
> Say, for T set to unsigned short.
>
> But then suppose the need comes up elsewhere for it to work on
> unsigned long long; will there now be two versions, differently
> named, or just one that does both?

The OP asked for a function that didn't depend on width. The
point of writing a type-agnostic function is to show him or
her how to do that. And I think some other interesting results
have come out of the investigation.

> I would be inclined to just write the one function anyway, working
> on u64 that will accept any integer types, and possibly have a
> separate version for a narrower type if that is considered to
> confer advantages.

Writing a function just for a 64-bit type means larger types will
not be supported. Both gcc and clang support 128 bit integer
types (and there is some indication that even larger types will
be supported in the future). By the way, does your compiler
support 128 bit integer types?

> In that case, there is no need for type-agnostic code.

You're reaching conclusions without knowing any facts. It may
very well be the case that having a type-agnostic template
available satisfies some other requirement that we don't know
about. At the very least, for me even just the educational
value has been worth it.

Re: portable way to get highest bit set?

<ugu8jl$16cbp$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30002&group=comp.lang.c#30002

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Fri, 20 Oct 2023 17:07:17 +0100
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <ugu8jl$16cbp$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me> <ug6cv8$1r3g4$1@dont-email.me>
<ug9vrc$2ov0t$1@dont-email.me> <uga0in$2ov0t$2@dont-email.me>
<86pm1e7nxs.fsf@linuxsc.com> <ugjc0o$1dsv9$1@dont-email.me>
<86fs255ox5.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 20 Oct 2023 16:07:17 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1f0cc092caff22549cb46b3d5c868eba";
logging-data="1257849"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qQxTiStMoIbS1TVKKDxI8OX9TCBTn/IY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Oj/XoyR+JN2UFMw+jRLwSffpxpc=
In-Reply-To: <86fs255ox5.fsf@linuxsc.com>
Content-Language: en-GB

by: Bart - Fri, 20 Oct 2023 16:07 UTC

On 20/10/2023 15:49, Tim Rentsch wrote:
> Bart <bc@freeuk.com> writes:
>
>> I would be inclined to just write the one function anyway, working
>> on u64 that will accept any integer types, and possibly have a
>> separate version for a narrower type if that is considered to
>> confer advantages.
>
> Writing a function just for a 64-bit type means larger types will
> not be supported. Both gcc and clang support 128 bit integer
> types (and there is some indication that even larger types will
> be supported in the future).

You mean C23's _BitInt types?

The rationale for arbitrary-width types below 64 bits appears to be for
use with FPGAs with custom word widths, but many seem excited about
using them for other purposes on ordinary machines. I just have 100
questions about how exactly they would work.

I would rather C acquired proper bit and bitfield operations.

While much bigger types specified to the exact bit (like 138,717 bits)
just seems totally the wrong approach. Outside power-of-two types of
sizes 128, 256 and possibly 1024 bits, which might have dedicated
support functions, completely arbitrary sizes really need a big-integer
type where you don't specify the exact size.

But at the minute, C is suffering from having an 'int' type and 'int'
constant that is typically 32 bits. That doesn't look like it will
change any time soon.

> By the way, does your compiler
> support 128 bit integer types?

The C compiler? No. That would be awkward now anyway because the new IL
is based on a 32/64-bit VM.

I did have a quite decent 128-bit implementation on one of my languages,
better than gcc's __int128_t effort, with:

* i128 and u128 variations

* i128 and u128 literals

* The ability to print either type

* Most arithmetic ops supported

The main thing missing was a full 128/128-bit divide (I had only
128/64-bit.)

I dropped it because it wasn't used enough to justify the extensive
support needed, which also over-complicated the IL and meant by-passing
the ABI. The main application of it was supporting 128-bit types in the
compiler!

Re: portable way to get highest bit set?

<87bkctqgo4.fsf@nosuchdomain.example.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30003&group=comp.lang.c#30003

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!nntp.comgw.net!paganini.bofh.team!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Fri, 20 Oct 2023 11:42:35 -0700
Organization: None to speak of
Lines: 35
Message-ID: <87bkctqgo4.fsf@nosuchdomain.example.com>
References: <ug5gvh$1jkar$3@dont-email.me> <ug6cv8$1r3g4$1@dont-email.me>
<ug9vrc$2ov0t$1@dont-email.me> <uga0in$2ov0t$2@dont-email.me>
<86pm1e7nxs.fsf@linuxsc.com> <ugjc0o$1dsv9$1@dont-email.me>
<86fs255ox5.fsf@linuxsc.com> <ugu8jl$16cbp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="dd146bd732f164bacf9a56efa646b3b2";
logging-data="1328364"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/vO+6yH+KkH/Xjuz8LgCYU"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:IFy1MeUAaLIGryYG6cgC7pOrtBg=
sha1:QhvLLQlk1rkR48jQ31fvA8okuJ8=

by: Keith Thompson - Fri, 20 Oct 2023 18:42 UTC

Bart <bc@freeuk.com> writes:
> On 20/10/2023 15:49, Tim Rentsch wrote:
>> Bart <bc@freeuk.com> writes:
>>> I would be inclined to just write the one function anyway, working
>>> on u64 that will accept any integer types, and possibly have a
>>> separate version for a narrower type if that is considered to
>>> confer advantages.
>>
>> Writing a function just for a 64-bit type means larger types will
>> not be supported. Both gcc and clang support 128 bit integer
>> types (and there is some indication that even larger types will
>> be supported in the future).
>
> You mean C23's _BitInt types?

I believe Tim is referring to `__int128` and `unsigned __int128`,
an extension provided (on some but not all target systems) by gcc
and clang.

Note that these are not true integer types, and are not treated
as extended integer types. In particular, there are no integer
constants for these types. The [u]intmax_t types are 64 bits,
not 128. (And GNU libc doesn't support them.)

C23's _BitInt types are not of unlimited width. <limits.h> defines
BITINT_MAXWIDTH, which is required to be >= ULLONG_WIDTH (at least
64 bits). gcc and clang do not yet support _BitInt, and I don't
know whether they'll support more than 64 bits.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: portable way to get highest bit set?

<uh0a4o$1n5gr$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30008&group=comp.lang.c#30008

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bc...@freeuk.com (Bart)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sat, 21 Oct 2023 11:45:45 +0100
Organization: A noiseless patient Spider
Lines: 46
Message-ID: <uh0a4o$1n5gr$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me> <ug6cv8$1r3g4$1@dont-email.me>
<ug9vrc$2ov0t$1@dont-email.me> <uga0in$2ov0t$2@dont-email.me>
<86pm1e7nxs.fsf@linuxsc.com> <ugjc0o$1dsv9$1@dont-email.me>
<86fs255ox5.fsf@linuxsc.com> <ugu8jl$16cbp$1@dont-email.me>
<87bkctqgo4.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 21 Oct 2023 10:45:45 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="d4adf88480e0ce8019c23c5afe1f2494";
logging-data="1807899"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/kpTQRmsTDoGBep9rSf70Q8oLLZT5c87w="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:gUR4XCSXMn/E1iklNe6ZWxUMgg4=
In-Reply-To: <87bkctqgo4.fsf@nosuchdomain.example.com>
Content-Language: en-GB

by: Bart - Sat, 21 Oct 2023 10:45 UTC

On 20/10/2023 19:42, Keith Thompson wrote:
> Bart <bc@freeuk.com> writes:
>> On 20/10/2023 15:49, Tim Rentsch wrote:
>>> Bart <bc@freeuk.com> writes:
>>>> I would be inclined to just write the one function anyway, working
>>>> on u64 that will accept any integer types, and possibly have a
>>>> separate version for a narrower type if that is considered to
>>>> confer advantages.
>>>
>>> Writing a function just for a 64-bit type means larger types will
>>> not be supported. Both gcc and clang support 128 bit integer
>>> types (and there is some indication that even larger types will
>>> be supported in the future).
>>
>> You mean C23's _BitInt types?
>
> I believe Tim is referring to `__int128` and `unsigned __int128`,
> an extension provided (on some but not all target systems) by gcc
> and clang.

Even where he says 'even larger types'?

> Note that these are not true integer types, and are not treated
> as extended integer types. In particular, there are no integer
> constants for these types. The [u]intmax_t types are 64 bits,
> not 128. (And GNU libc doesn't support them.)
>
> C23's _BitInt types are not of unlimited width. <limits.h> defines
> BITINT_MAXWIDTH, which is required to be >= ULLONG_WIDTH (at least
> 64 bits). gcc and clang do not yet support _BitInt, and I don't
> know whether they'll support more than 64 bits.

"The clang implementation of _BitInt provides support for bit widths up
to 10**24"

from: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

(I don't know if 10**24 was a typo for 2**24, which was the limit for
LLVM on which Clang is based, itself raised from 2**23.

Either way, doing the equivalent of adding an int474646_t to a
uint1377812_t type is ridiculous.)

Re: portable way to get highest bit set?

<20231021225010.00001606@yahoo.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30010&group=comp.lang.c#30010

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sat, 21 Oct 2023 22:50:10 +0300
Organization: A noiseless patient Spider
Lines: 290
Message-ID: <20231021225010.00001606@yahoo.com>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me>
<86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com>
<20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com>
<20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com>
<20231018174110.00003717@yahoo.com>
<86sf675fir.fsf@linuxsc.com>
<20231020001455.00006ea4@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="22e793304a2d0b864fbd05624b1170a8";
logging-data="2030291"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+L09AoFILa2XPPBfHJUMQOcI9M+Qgw0V4="
Cancel-Lock: sha1:47x66IJBEfMHK7dQ7W003SDTmDA=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)

by: Michael S - Sat, 21 Oct 2023 19:50 UTC

On Fri, 20 Oct 2023 00:14:55 +0300
Michael S <already5chosen@yahoo.com> wrote:

> On Wed, 18 Oct 2023 22:48:28 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
> > Michael S <already5chosen@yahoo.com> writes:
> >
> > > On Wed, 18 Oct 2023 05:45:27 -0700
> > > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> > >
> > [...]
> >
> > >> Last but not least, I explored using the approach of converting
> > >> to floating-point, to see how it stacks up against the other
> > >> schemes (with the admission that it is more limited than the
> > >> type-agnostic methods). Here is the terminus of that
> > >> exploration:
> > >>
> > >> T
> > >> hbs_using_IEEE_floating_point( T u ){
> > >> union { float f; struct { unsigned m:23, e:8, sign: 1; }
> > >> b; } x; return u ? x.f = u & ~(u>>1), (T)1 << x.b.e - 127 : 0;
> > >> }
> > >
> > > u & ~(u>>1) is a brilliant trick! [...]
> >
> > Thank you for the kind description. I must admit I was pleased
> > when I discovered it.
> >
> > >> This short function gives exact results for T up to 128 bits
> > >> wide. It runs about the same speed as the best conventional
> > >> method, except that under gcc it runs about 40% faster on 128
> > >> bit inputs.
> > >
> > > That's surprising.
> > > Intuitively I would expect this method, despite all elegance, to
> > > be slower than loopless method of Kaz. I'd expect the later to
> > > run in 25-30 clocks for 128-bit inputs.
> >
> > Instinctively it seems like straight line should be faster, but
> > actually the looping version scales better. Here's why I think
> > that is. The costly operations are the full-width shifts and
> > ors. The looping version has some bookeeping overhead (for
> > changing the shift value), but these operations are cheap and can
> > be done in parallel. Also the looping version has the expression
> > for the exit test but this too can be overlapped with (the next
> > iteration of) shifting and oring. But here is the key: the
> > straight line code always has to do ALL the shifts and ors; the
> > looping version does only as many as are needed to produce the
> > answer. Taking an early exit saves iterations on the vast
> > majority of input values.
> >
> > > Sounds like the author of gcc run-time support function
> > > __floatuntisf did an excellent job.
> >
> > Yes, and better than clang. In many or most cases clang does
> > better on the 128 bit types, but here gcc outperforms clang by
> > almost a factor of two. It might be interesting to see a more
> > comprehensive set of results using different versions of these
> > compilers (and libraries) - I'm just using whatever versions
> > happen to be on the colo server here.
>
> According to my measurements, Kaz's code is faster.
> More precisly, for 32-bit and 64-bit it is approximately identical in
> speed to FP-based variants, but easily faster for 128 bits.
> The following double precision FP-based variant was closer to Kaz's
> but still the difference is significant:
>
> T
> highest_bit_set( T u ){
> if (u == 0)
> return 0;
> double d = u & ~(u>>1);
> uint64_t ur;
> memcpy(&ur, &d, sizeof(ur));
> return (T)1 << ((ur >> 52) - 0x3ff);
> }
>

Here is a modification of FP-based functions that according to my
measurements that is faster than or equal to Kaz's method for 32, 64
and 128 bits.

T
highest_bit_set( T u ){
if (u == 0)
return 0;
const uint64_t DBL_MAX_EXACT_INT = (uint64_t)-1 >> (64 - 53);
double d;
int lshift = 0;
if ((T)-1 <= DBL_MAX_EXACT_INT) {
d = u;
} else {
const int rshift = (T)-1 > (uint64_t)-1 ? 64 : 0;
while (u > (uint64_t)-1) {
u >>= rshift;
lshift += 64;
}
uint64_t u64 = u;
d = u64 & ~(u64>>1);
}
uint64_t ur;
memcpy(&ur, &d, sizeof(ur));
return (T)1 << (((ur >> 52) - 0x3ff)+lshift);
}

When the number of bits is greater than 128, this method, assuming
64-bit underlaying hardware, become O(N*N) while Kaz's method remains
O(N*log(N)) so I expect this method to be equal in speed to Kaz's for
256 bit and slower for 512 bits. But those aree just guesses.

> May be, the difference between our results is due to environment.
> Mine was x86-64 Windows, gcc 12.2.0. Flags '-Wall -O3 -march=native'.
>
> May be, it's due to my relatively old CPUs: Intel Ive Bridge, Haswell
> and Skylake.
>
> But most likely the reason is a difference in test vectors.
> My test vector was pseudo-random with log-uniform distribution. I.e.
> all 128 possible results appear with equal probability.
>
> Here is my testbench. It is written in C++. Hopefully, since it's just
> a testbench, it would not be considered horribly off topic in
> comp.lang.c.
> Very few comments. Sorry about it.
>
> #include <cstdint>
> #include <cstdio>
> #include <random>
> #include <vector>
> #include <chrono>
> #include <algorithm>
>
> typedef test_uint_t test_t;
> enum {
> N_BITS = sizeof(test_t)*8,
> N_POINTS = 10000000,
> N_ITER = 13,
> RND_LSHIFT = N_BITS > 64 ? 64 : 0,
> };
>
> extern "C" test_t highest_bit_set(test_t u);
>
>
> // dst[2*i+0] - input value
> // dst[2*i+1] - result
> static int fill_nonrandom(test_t* dst)
> {
> // 0..255 - all values
> dst[0*2+0] = dst[0*2+1] = 0;
> for (int nb = 1; nb <= 8; ++nb) {
> const int MSBit = 1 << (nb-1);
> for (int i = 0; i < MSBit; ++i) {
> dst[(MSBit+i)*2+0] = test_t(MSBit+i);
> dst[(MSBit+i)*2+1] = test_t(MSBit);
> }
> }
> // for nbits > 8 fill all values with 1 and 2 bits set + all values
> with nbits-1 and nbits bit set
> int idx = 1 << 8;
> for (int nb = 9; nb <= N_BITS; ++nb) {
> const test_t MSK = test_t(-1) >> (N_BITS - nb);
> const test_t MSBit = MSK ^ (MSK >> 1);
> int idx0 = idx;
> dst[idx*2] = MSBit; ++idx;
> dst[idx*2] = MSK; ++idx;
> for (int bit_i = 0; bit_i < nb-1; ++bit_i)
> dst[(idx+bit_i)*2] = MSBit | (test_t(1) << bit_i);
> idx += nb - 1;
> for (int bit_i = 0; bit_i < nb-1; ++bit_i)
> dst[(idx+bit_i)*2] = ~(test_t(1) << bit_i) & MSK;
> idx += nb - 1;
> for (int i = idx0; i < idx; ++i)
> dst[i*2+1] = MSBit;
> }
> return idx;
> }
>
> // dst[2*i+0] - input value
> // dst[2*i+1] - result
> static void fill_random(test_t* dst, int len)
> {
> const test_t MSK = test_t(-1);
> const test_t MSBit = MSK ^ (MSK >> 1);
> std::mt19937_64 rndGen;
> for (int i = 0; i < len; ++i) {
> int nbits = int(rndGen() % (N_BITS+1));
> test_t res = 0;
> test_t val = 0;
> if (nbits > 0) {
> test_t rndVal = rndGen();
> if (N_BITS > 64)
> rndVal = (rndVal << RND_LSHIFT) + rndGen();
> val = ((rndVal | MSBit) & MSK) >> (N_BITS - nbits);
> res = (MSBit) >> (N_BITS - nbits);
> }
> dst[i*2+0] = val;
> dst[i*2+1] = res;
> // if (i < 5)
> // printf("%2d %016llx %016llx %016llx\n", nbits, (unsigned long
> long)val, (unsigned long long)res, (unsigned long long)MSBit);
> }
> }
>
> int main()
> {
> auto t0 = std::chrono::steady_clock::now();
> printf("Generating %d test points...", N_POINTS);
> std::vector<test_t> vec(N_POINTS*2);
> int fill_i = fill_nonrandom(vec.data());
> // printf("%d => %d\n", N_BITS, fill_i);
> fill_random(vec.data()+fill_i*2, N_POINTS-fill_i);
> auto t1 = std::chrono::steady_clock::now();
> std::chrono::duration<long long, std::nano> dur = t1 - t0;
> printf("done. %.3f msec.\n", dur.count()*1e-6);
>
> long long dt[N_ITER];
> for (int it = 0; it < N_ITER; ++it) {
> t0 = std::chrono::steady_clock::now();
> for (int i = 0; i < N_POINTS; ++i) {
> test_t val = vec[i*2+0];
> test_t ref = vec[i*2+1];
> test_t res = highest_bit_set(val);
> if (res != ref) {
> printf("%016llx => %016llx != %016llx\n"
> ,(unsigned long long)val
> ,(unsigned long long)res
> ,(unsigned long long)ref
> );
> return 1;
> }
> }
> t1 = std::chrono::steady_clock::now();
> dur = t1 - t0;
> dt[it] = dur.count();
> }
>
> long long dtSum = 0;
> for (int it = 0; it < N_ITER; ++it)
> dtSum += dt[it];
>
> std::nth_element(dt, dt + N_ITER/2, dt + N_ITER); // median
> long long dtMed = dt[N_ITER/2];
>
> printf("Throughput time (nsec) mean/med %.3f / %.3f\n"
> , double(dtSum)/N_ITER/N_POINTS
> , double(dtMed)/N_POINTS
> );
>
> test_t diff = 0;
> for (int it = 0; it < N_ITER; ++it) {
> t0 = std::chrono::steady_clock::now();
> for (int i = 0; i < N_POINTS; ++i) {
> test_t val = vec[i*2+0];
> test_t ref = vec[i*2+1];
> test_t res = highest_bit_set(val | diff);
> diff |= ref ^ res;
> }
> t1 = std::chrono::steady_clock::now();
> dur = t1 - t0;
> dt[it] = dur.count();
> }
>
> if (diff) {
> printf("latency test failed.\n");
> }
>
> dtSum = 0;
> for (int it = 0; it < N_ITER; ++it)
> dtSum += dt[it];
>
> std::nth_element(dt, dt + N_ITER/2, dt + N_ITER); // median
> dtMed = dt[N_ITER/2];
>
> printf("Latency time (nsec) mean/med %.3f / %.3f\n"
> , double(dtSum)/N_ITER/N_POINTS
> , double(dtMed)/N_POINTS
> );
>
> return 0;
> }
>
>
>
>

Click here to read the complete article

Re: portable way to get highest bit set?

<87bkcrlju1.fsf@nosuchdomain.example.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30013&group=comp.lang.c#30013

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Keith.S....@gmail.com (Keith Thompson)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sat, 21 Oct 2023 14:57:42 -0700
Organization: None to speak of
Lines: 48
Message-ID: <87bkcrlju1.fsf@nosuchdomain.example.com>
References: <ug5gvh$1jkar$3@dont-email.me> <ug6cv8$1r3g4$1@dont-email.me>
<ug9vrc$2ov0t$1@dont-email.me> <uga0in$2ov0t$2@dont-email.me>
<86pm1e7nxs.fsf@linuxsc.com> <ugjc0o$1dsv9$1@dont-email.me>
<86fs255ox5.fsf@linuxsc.com> <ugu8jl$16cbp$1@dont-email.me>
<87bkctqgo4.fsf@nosuchdomain.example.com>
<uh0a4o$1n5gr$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="60672f50a3fccb4dfda591533ddd6cb2";
logging-data="2099074"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+l4hdkKhRnxdwTHQkihIVb"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Cancel-Lock: sha1:UT19jFmt8Z6AIssZyVDSLk8sQOk=
sha1:BLxUMw2wS1Xszn/5NDRLPjAzNrA=

by: Keith Thompson - Sat, 21 Oct 2023 21:57 UTC

Bart <bc@freeuk.com> writes:
> On 20/10/2023 19:42, Keith Thompson wrote:
>> Bart <bc@freeuk.com> writes:
>>> On 20/10/2023 15:49, Tim Rentsch wrote:
>>>> Bart <bc@freeuk.com> writes:
>>>>> I would be inclined to just write the one function anyway, working
>>>>> on u64 that will accept any integer types, and possibly have a
>>>>> separate version for a narrower type if that is considered to
>>>>> confer advantages.
>>>>
>>>> Writing a function just for a 64-bit type means larger types will
>>>> not be supported. Both gcc and clang support 128 bit integer
>>>> types (and there is some indication that even larger types will
>>>> be supported in the future).
>>>
>>> You mean C23's _BitInt types?
>> I believe Tim is referring to `__int128` and `unsigned __int128`,
>> an extension provided (on some but not all target systems) by gcc
>> and clang.
>
> Even where he says 'even larger types'?

Obviously not, but I don't know what he's referring to there.

>> Note that these are not true integer types, and are not treated
>> as extended integer types. In particular, there are no integer
>> constants for these types. The [u]intmax_t types are 64 bits,
>> not 128. (And GNU libc doesn't support them.)
>> C23's _BitInt types are not of unlimited width. <limits.h> defines
>> BITINT_MAXWIDTH, which is required to be >= ULLONG_WIDTH (at least
>> 64 bits). gcc and clang do not yet support _BitInt, and I don't
>> know whether they'll support more than 64 bits.
>
> "The clang implementation of _BitInt provides support for bit widths
> up to 10**24"
>
> from: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

My mistake. gcc 13.2.0 doesn't support _BitInt, but clang 17.0.3 does
(I haven't checked earlier versions), and it has BITINT_MAXWIDTH =
8388608 (2**23).

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Re: portable way to get highest bit set?

<86y1fu4rfm.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30022&group=comp.lang.c#30022

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sun, 22 Oct 2023 08:17:49 -0700
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <86y1fu4rfm.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com> <865y346qvs.fsf@linuxsc.com> <ugp766$3ppas$1@dont-email.me> <86wmvj5gkc.fsf@linuxsc.com> <ugqm71$7e0u$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="6dc747ee13145ffb1c38bba62e6e987b";
logging-data="2629165"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/dm2eTX23BrEnZQHf7jIS/cy3IVWUFvf8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:l0PT60VcHnENJUtie+LozVD1J7I=
sha1:bmJo8YLEo429G54e7Y0Q9KfbNmE=

by: Tim Rentsch - Sun, 22 Oct 2023 15:17 UTC

jak <nospam@please.ty> writes:

> Tim Rentsch ha scritto:
>
>> jak <nospam@please.ty> writes:
>>
>>> [an improvement on an earlier function]
>>>
>>> #include <limits.h>
>>>
>>> typedef unsigned long T;
>>>
>>> T h_bit(T val)
>>> {
>>> T mask;
>>> int tbits = 0, bits = (sizeof(mask) * CHAR_BIT) / 2;
>>> for(; bits > 0; bits /= 2)
>>> {
>>> if((mask = ~0UL << bits) & val)
>>> {
>>> val >>= bits;
>>> tbits += bits;
>>> }
>>> }
>>> return val << tbits;
>>> }
>>
>> Right. We can generalize the ad hoc few extra shifts by writing
>> a loop. [...]
>>
>> The function above has two problems. [...]
>>
>> The second problem is more subtle. The above code works fine if
>> the width of type T is a power of two, but doesn't work properly
>> if it isn't. Consider for example an implementation where the
>> value of CHAR_BIT is 12, and the function is called with an
>> argument value of 7. Assuming sizeof (T) is a power of two,
>> eventually we will get to a point where the shift amount is 3.
>> There is no overlap between (T)-1 << 3 and 7, so the if() will
>> not execute its controlled block. But on the next iteration
>> the shift amount will be 1, so val will be 7>>1 == 3 coming
>> out of the loop, and the return value will have two bits set
>> rather than one.
>>
>> The solution to this problem is to pick the initial shift amount
>> to be the largest power of two less than or equal to the maximal
>> value of T (that is, the log base 2 of that power). This initial
>> shift value can be computed by means of an expression that gives
>> a compile-time constant value, for any practical value of shift
>> amount. (Naturally we would encapsulate the complications in a
>> macro definition or two.) If someone gets stuck I can post code
>> for that in a followup.
>
> This would be really interesting. Personally I would agree to
> hang the first version of the function to the second :^)

If you want to see some code for the "best power of two" macro let
me know. I should warn you that there is nothing especially
interesting about it, except perhaps for the MASK_WIDTH() macro,
which was already posted.

I should also say that, although this scheme has a nice simplicity
to it, I don't think it's going to lead anywhere, because there are
other methods that are faster and not a lot more difficult to
understand. It is a nice solution, and you deserve recognition for
that; it seems unlikely though to be the method of choice for this
particular problem.

Re: portable way to get highest bit set?

<86ttqi4oae.fsf@linuxsc.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30023&group=comp.lang.c#30023

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: tr.17...@z991.linuxsc.com (Tim Rentsch)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sun, 22 Oct 2023 09:25:45 -0700
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <86ttqi4oae.fsf@linuxsc.com>
References: <ug5gvh$1jkar$3@dont-email.me> <20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com> <ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com> <20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com> <86cyxkb2ka.fsf@linuxsc.com> <20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com> <864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com> <86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com> <865y346qvs.fsf@linuxsc.com> <20231018174110.00003717@yahoo.com> <86sf675fir.fsf@linuxsc.com> <20231020001455.00006ea4@yahoo.com> <20231021225010.00001606@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="6dc747ee13145ffb1c38bba62e6e987b";
logging-data="2664480"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19AsQunH/W/pAHKLiBp4BPtTrC1pCdhON4="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:XM4ZaYjJGA1ehBaqGVdb+I35Sg4=
sha1:tUuoaX100zDzlp2KO5Gg3aXnNlI=

by: Tim Rentsch - Sun, 22 Oct 2023 16:25 UTC

Michael S <already5chosen@yahoo.com> writes:

> On Fri, 20 Oct 2023 00:14:55 +0300
> Michael S <already5chosen@yahoo.com> wrote:
>
>> On Wed, 18 Oct 2023 22:48:28 -0700
>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>
>>> Michael S <already5chosen@yahoo.com> writes:
>>>
>>>> On Wed, 18 Oct 2023 05:45:27 -0700
>>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>
>>> [...]

I am responding here to both of your last two postings. My
comments are mostly just isolated statements, sometimes in the
general area of some of what you said, sometimes of a more general
nature.

My measurements of the expanding-shift-oring method give results
that are sometimes better and sometimes worse than other methods
already posted. Results depend a lot on the platform (I have
tried two) and on the compiler used (gcc and clang). Ditto for
the FP methods, including the elaborate one posted in your last
message. I am deliberately non-specific about what the platforms
are and which versions of the compilers were used, partly because
I don't know, but mostly because my focus is on comparing the
different algorithms, not on comparing the compilers or platforms.

Your measuring tool is great (and certainly appropriate for the
group here, given that it furthers a conversation about C code).
I used it to try maybe half a dozen different variations, and
found three that consistently beat the straight-line shift-oring
code (with both gcc and clang). There were major differences
between gcc and clang, but I always did apples-and-apples
comparisons (and using -O3 -march=native, as you suggested).
Some methods were better on only one of the two measures (either
Throughput or Latency, I don't remember which), some were better
on both, but there was a tradeoff between those two results:
either a little bit better on both measures, or a lot better on
only one (and somewhat worse on the other).

Incidentally, I found the default number of iterations in the tool
to be too low to give reliable indicators of performance. I ended
up using about 2000 iterations, which gave pretty steady results.

After seeing your second posting, I tried a lot of variations of
your FP method. I feel like most of my time went into reverse
engineering what the compilers were doing. There were big
variations between gcc and clang, which I sort of expected, but
the larger problem was that minor changes that seemed like they
should make no difference sometimes caused huge differences.
Weird. Also some changes were fine on clang but awful on gcc
(I didn't find any going the other direction, but maybe that's
because I wasn't looking very hard there).

End result, my sense is that the fastest method is likely to be
platform specific, and probably needs to be coded in assembly.
Also it may depend on the expected distribution of inputs, which
I didn't explore at all (I simply used your testbench code without
trying to understand it or change it in any significant way).

Much thanks for the testbench code. I would not have gotten nearly
as far without it.

Oh, by the way, one result I did find is that using a union is
just as good as using memcpy(). That did /not/ hold for using
bitfields to extract the exponent, but just for switching between
representations a union was fine. (Maybe if I had been a little
smarter I could have gotten bitfields to work also, but I didn't
get that far.)

> When the number of bits is greater than 128, this method, assuming
> 64-bit underlaying hardware, become O(N*N) while Kaz's method
> remains O(N*log(N)) so I expect this method to be equal in speed
> to Kaz's for 256 bit and slower for 512 bits. But those aree just
> guesses.

My comment upthread is about scaling. As the size gets bigger,
the method of oring with ever-increasing shift amounts is doomed,
because it's an O(N*log(N)) algorithm, but the problem can be
solved in O(N). My sense is that 128 bits is close to the
crossover, and maybe even is the crossover if we were writing in
assembly rather than C. And now that I've said that, it's time
to abandon the discussion, since it is now no longer topical in
a C newsgroup. :)

Re: portable way to get highest bit set?

<uh43ee$2lgq1$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30029&group=comp.lang.c#30029

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: nos...@please.ty (jak)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sun, 22 Oct 2023 23:15:58 +0200
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <uh43ee$2lgq1$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com> <ugp766$3ppas$1@dont-email.me>
<86wmvj5gkc.fsf@linuxsc.com> <ugqm71$7e0u$1@dont-email.me>
<86y1fu4rfm.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 22 Oct 2023 21:15:58 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="b301a23c3eea6a37273b705753873c94";
logging-data="2802497"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Z9cXPB+CGtaxcKgMN5RdN"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17.1
Cancel-Lock: sha1:nufaqTIp9dqEbNjp+5RmXZ/Pmvc=
In-Reply-To: <86y1fu4rfm.fsf@linuxsc.com>

by: jak - Sun, 22 Oct 2023 21:15 UTC

Tim Rentsch ha scritto:
> jak <nospam@please.ty> writes:
>
>> Tim Rentsch ha scritto:
>>
>>> jak <nospam@please.ty> writes:
>>>
>>>> [an improvement on an earlier function]
>>>>
>>>> #include <limits.h>
>>>>
>>>> typedef unsigned long T;
>>>>
>>>> T h_bit(T val)
>>>> {
>>>> T mask;
>>>> int tbits = 0, bits = (sizeof(mask) * CHAR_BIT) / 2;
>>>> for(; bits > 0; bits /= 2)
>>>> {
>>>> if((mask = ~0UL << bits) & val)
>>>> {
>>>> val >>= bits;
>>>> tbits += bits;
>>>> }
>>>> }
>>>> return val << tbits;
>>>> }
>>>
>>> Right. We can generalize the ad hoc few extra shifts by writing
>>> a loop. [...]
>>>
>>> The function above has two problems. [...]
>>>
>>> The second problem is more subtle. The above code works fine if
>>> the width of type T is a power of two, but doesn't work properly
>>> if it isn't. Consider for example an implementation where the
>>> value of CHAR_BIT is 12, and the function is called with an
>>> argument value of 7. Assuming sizeof (T) is a power of two,
>>> eventually we will get to a point where the shift amount is 3.
>>> There is no overlap between (T)-1 << 3 and 7, so the if() will
>>> not execute its controlled block. But on the next iteration
>>> the shift amount will be 1, so val will be 7>>1 == 3 coming
>>> out of the loop, and the return value will have two bits set
>>> rather than one.
>>>
>>> The solution to this problem is to pick the initial shift amount
>>> to be the largest power of two less than or equal to the maximal
>>> value of T (that is, the log base 2 of that power). This initial
>>> shift value can be computed by means of an expression that gives
>>> a compile-time constant value, for any practical value of shift
>>> amount. (Naturally we would encapsulate the complications in a
>>> macro definition or two.) If someone gets stuck I can post code
>>> for that in a followup.
>>
>> This would be really interesting. Personally I would agree to
>> hang the first version of the function to the second :^)
>
> If you want to see some code for the "best power of two" macro let
> me know. I should warn you that there is nothing especially
> interesting about it, except perhaps for the MASK_WIDTH() macro,
> which was already posted.
>

ok. Thank you.

>
> I should also say that, although this scheme has a nice simplicity
> to it, I don't think it's going to lead anywhere, because there are
> other methods that are faster and not a lot more difficult to
> understand. It is a nice solution, and you deserve recognition for
> that; it seems unlikely though to be the method of choice for this
> particular problem.
>

I aimed at simplicity to make it as portable as possible. I tried what I
told you:

#include <limits.h>

typedef unsigned long T;

T h_bit(T val)
{ int tbits = 0, bits = (sizeof(val) * CHAR_BIT) / 2;
for(; bits > 0; bits /= 2)
if((((T)-1) << bits) & val)
{
val >>= bits;
tbits += bits;
}
for(; val > 1; val /= 2, tbits++);
return val << tbits;
}

In this way it also works with char_bit == 12 and for any value it
performs:
3 Interactions per data 1 byte wide.
+1 When the width doubles (e.g. 6 for 8 bytes).
+1 if CHAR_BIT == 12.
If there is no something that halves these timing, given its simplicity
I remain happy.

Re: portable way to get highest bit set?

<20231023011138.000078ba@yahoo.com>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30033&group=comp.lang.c#30033

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: already5...@yahoo.com (Michael S)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Mon, 23 Oct 2023 01:11:38 +0300
Organization: A noiseless patient Spider
Lines: 116
Message-ID: <20231023011138.000078ba@yahoo.com>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me>
<86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com>
<20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com>
<20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com>
<20231018174110.00003717@yahoo.com>
<86sf675fir.fsf@linuxsc.com>
<20231020001455.00006ea4@yahoo.com>
<20231021225010.00001606@yahoo.com>
<86ttqi4oae.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="c1487f2676524522ee2f938c97b0b269";
logging-data="2806238"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19MZY3pUipxe3/nkTcruMEuKIaoLw+S/mY="
Cancel-Lock: sha1:IMvw/HjUXO1ra3l/CueK1TrL49k=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)

by: Michael S - Sun, 22 Oct 2023 22:11 UTC

On Sun, 22 Oct 2023 09:25:45 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Michael S <already5chosen@yahoo.com> writes:
>
> > On Fri, 20 Oct 2023 00:14:55 +0300
> > Michael S <already5chosen@yahoo.com> wrote:
> >
> >> On Wed, 18 Oct 2023 22:48:28 -0700
> >> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >>
> >>> Michael S <already5chosen@yahoo.com> writes:
> >>>
> >>>> On Wed, 18 Oct 2023 05:45:27 -0700
> >>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >>>
> >>> [...]
>
> I am responding here to both of your last two postings. My
> comments are mostly just isolated statements, sometimes in the
> general area of some of what you said, sometimes of a more general
> nature.
>
> My measurements of the expanding-shift-oring method give results
> that are sometimes better and sometimes worse than other methods
> already posted. Results depend a lot on the platform (I have
> tried two) and on the compiler used (gcc and clang). Ditto for
> the FP methods, including the elaborate one posted in your last
> message. I am deliberately non-specific about what the platforms
> are and which versions of the compilers were used, partly because
> I don't know, but mostly because my focus is on comparing the
> different algorithms, not on comparing the compilers or platforms.
>
> Your measuring tool is great (and certainly appropriate for the
> group here, given that it furthers a conversation about C code).
> I used it to try maybe half a dozen different variations, and
> found three that consistently beat the straight-line shift-oring
> code (with both gcc and clang). There were major differences
> between gcc and clang, but I always did apples-and-apples
> comparisons (and using -O3 -march=native, as you suggested).
> Some methods were better on only one of the two measures (either
> Throughput or Latency, I don't remember which), some were better
> on both, but there was a tradeoff between those two results:
> either a little bit better on both measures, or a lot better on
> only one (and somewhat worse on the other).
>
> Incidentally, I found the default number of iterations in the tool
> to be too low to give reliable indicators of performance. I ended
> up using about 2000 iterations, which gave pretty steady results.
>

Most likely, you computer was heavily loaded with other jobs.
On idle computer median should stabilaze much sooner. And mean can be
simply ignored.
Alternatevely, on some new server CPUs it could be an opposite effect
in action - due to aggresive power saving it takes relatively long time
until CPU decides to run at maximal frequency. This particular problem
can be mitigated by binding your process to specific CPU core (on Linux
- via taskset command, on Windows with start /affinity).

> After seeing your second posting, I tried a lot of variations of
> your FP method. I feel like most of my time went into reverse
> engineering what the compilers were doing. There were big
> variations between gcc and clang, which I sort of expected, but
> the larger problem was that minor changes that seemed like they
> should make no difference sometimes caused huge differences.
> Weird. Also some changes were fine on clang but awful on gcc
> (I didn't find any going the other direction, but maybe that's
> because I wasn't looking very hard there).
>
> End result, my sense is that the fastest method is likely to be
> platform specific, and probably needs to be coded in assembly.
> Also it may depend on the expected distribution of inputs, which
> I didn't explore at all (I simply used your testbench code without
> trying to understand it or change it in any significant way).
>
> Much thanks for the testbench code. I would not have gotten nearly
> as far without it.
>
> Oh, by the way, one result I did find is that using a union is
> just as good as using memcpy(). That did /not/ hold for using
> bitfields to extract the exponent, but just for switching between
> representations a union was fine. (Maybe if I had been a little
> smarter I could have gotten bitfields to work also, but I didn't
> get that far.)
>
>
> > When the number of bits is greater than 128, this method, assuming
> > 64-bit underlaying hardware, become O(N*N) while Kaz's method
> > remains O(N*log(N)) so I expect this method to be equal in speed
> > to Kaz's for 256 bit and slower for 512 bits. But those aree just
> > guesses.
>
> My comment upthread is about scaling. As the size gets bigger,
> the method of oring with ever-increasing shift amounts is doomed,
> because it's an O(N*log(N)) algorithm, but the problem can be
> solved in O(N). My sense is that 128 bits is close to the
> crossover, and maybe even is the crossover if we were writing in
> assembly rather than C. And now that I've said that, it's time
> to abandon the discussion, since it is now no longer topical in
> a C newsgroup. :)

Intrinsic functions should be as fast as asm.
With gcc - __builtin_clz() family. It does not belong to the realm of
portable C, but still is much more portable than asm. And arguably still
on topic in this group, if not in this thread.

As to your suggestion that the problem can be solved in O(N) time, right
now, at night, I can only think of sort of O(N) solutions that exploit
knowledge of internal representation of wide integer number, e.g.
knowledge of its endiannes, either Little or Big. Which is not exactly
portable.

Re: portable way to get highest bit set?

<uh49nm$2mrvj$1@dont-email.me>

copy mid

https://www.novabbs.com/devel/article-flat.php?id=30034&group=comp.lang.c#30034

copy link Newsgroups: comp.lang.c

Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.lang.c
Subject: Re: portable way to get highest bit set?
Date: Sun, 22 Oct 2023 16:03:18 -0700
Organization: A noiseless patient Spider
Lines: 67
Message-ID: <uh49nm$2mrvj$1@dont-email.me>
References: <ug5gvh$1jkar$3@dont-email.me>
<20231011102714.44a870af4dfe68f756974953@g{oogle}mail.com>
<ug6huc$1rvp1$1@dont-email.me> <86h6mxawqq.fsf@linuxsc.com>
<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
<86cyxkb2ka.fsf@linuxsc.com>
<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
<864jiwaqic.fsf@linuxsc.com> <20231012173021.0000149c@yahoo.com>
<86v8bbanjv.fsf@linuxsc.com> <20231012191641.00007493@yahoo.com>
<865y346qvs.fsf@linuxsc.com> <20231018174110.00003717@yahoo.com>
<86sf675fir.fsf@linuxsc.com> <20231020001455.00006ea4@yahoo.com>
<20231021225010.00001606@yahoo.com> <86ttqi4oae.fsf@linuxsc.com>
<20231023011138.000078ba@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 22 Oct 2023 23:03:18 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="efc60c55f8725e1e11e7eb2f0ab7c611";
logging-data="2846707"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18IlgM+TMkNKxCt4BSwQLqOwFqs9i2uVOc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:GxJFFnjMoPYq8LFobKOB82QwjxE=
Content-Language: en-US
In-Reply-To: <20231023011138.000078ba@yahoo.com>

by: Chris M. Thomasson - Sun, 22 Oct 2023 23:03 UTC

On 10/22/2023 3:11 PM, Michael S wrote:
> On Sun, 22 Oct 2023 09:25:45 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>>
>>> On Fri, 20 Oct 2023 00:14:55 +0300
>>> Michael S <already5chosen@yahoo.com> wrote:
>>>
>>>> On Wed, 18 Oct 2023 22:48:28 -0700
>>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>>
>>>>> Michael S <already5chosen@yahoo.com> writes:
>>>>>
>>>>>> On Wed, 18 Oct 2023 05:45:27 -0700
>>>>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>>>
>>>>> [...]
>>
>> I am responding here to both of your last two postings. My
>> comments are mostly just isolated statements, sometimes in the
>> general area of some of what you said, sometimes of a more general
>> nature.
>>
>> My measurements of the expanding-shift-oring method give results
>> that are sometimes better and sometimes worse than other methods
>> already posted. Results depend a lot on the platform (I have
>> tried two) and on the compiler used (gcc and clang). Ditto for
>> the FP methods, including the elaborate one posted in your last
>> message. I am deliberately non-specific about what the platforms
>> are and which versions of the compilers were used, partly because
>> I don't know, but mostly because my focus is on comparing the
>> different algorithms, not on comparing the compilers or platforms.
>>
>> Your measuring tool is great (and certainly appropriate for the
>> group here, given that it furthers a conversation about C code).
>> I used it to try maybe half a dozen different variations, and
>> found three that consistently beat the straight-line shift-oring
>> code (with both gcc and clang). There were major differences
>> between gcc and clang, but I always did apples-and-apples
>> comparisons (and using -O3 -march=native, as you suggested).
>> Some methods were better on only one of the two measures (either
>> Throughput or Latency, I don't remember which), some were better
>> on both, but there was a tradeoff between those two results:
>> either a little bit better on both measures, or a lot better on
>> only one (and somewhat worse on the other).
>>
>> Incidentally, I found the default number of iterations in the tool
>> to be too low to give reliable indicators of performance. I ended
>> up using about 2000 iterations, which gave pretty steady results.
>>
>
> Most likely, you computer was heavily loaded with other jobs.

Could be. Try to shut down as many processes as possible to get to a
"bare bones" state, before running any benchmark...

> On idle computer median should stabilaze much sooner. And mean can be
> simply ignored.
> Alternatevely, on some new server CPUs it could be an opposite effect
> in action - due to aggresive power saving it takes relatively long time
> until CPU decides to run at maximal frequency. This particular problem
> can be mitigated by binding your process to specific CPU core (on Linux
> - via taskset command, on Windows with start /affinity).

devel / comp.lang.c / Re: portable way to get highest bit set?

Pages:1 2 3 4 5 678

server_pubkey.txt

rocksolid light 0.9.81
clearnet tor