
Re: Losing my mind: results change with/without printf() statements

NNTP-Posting-Date: Mon, 12 Jul 2021 10:57:05 -0500
Subject: Re: Losing my mind: results change with/without printf() statements
Newsgroups: comp.lang.c
References: <vW0DI.53$h45.18@fx16.iad> <87a6n0c1ux.fsf@bsb.me.uk>
<2pmdnZmxQ42VOn79nZ2dnUU7-S_NnZ2d@giganews.com>
<c7bfb14a-2084-41c3-beea-438139f73393n@googlegroups.com>
<a_CdnUJgh4wOD3n9nZ2dnUU7-XvNnZ2d@giganews.com>
<bdc09058-f92c-4084-92c2-5186bfea63bcn@googlegroups.com>
<ad435f65-f4c1-4db7-a565-5404767814ebn@googlegroups.com>
<06a1aff4-365a-4f91-91d1-56dead415871n@googlegroups.com>
<-72dnah_H75PjXb9nZ2dnUU78YHNnZ2d@brightview.co.uk>
<4a43a654-a828-420f-a5aa-b9cecee34140n@googlegroups.com>
From: news.dea...@darjeeling.plus.com (Mike Terry)
Date: Mon, 12 Jul 2021 16:57:03 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101
Firefox/60.0 SeaMonkey/2.53.7.1
MIME-Version: 1.0
In-Reply-To: <4a43a654-a828-420f-a5aa-b9cecee34140n@googlegroups.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <pJqdneLcdcXM-3H9nZ2dnUU78cXNnZ2d@brightview.co.uk>
Lines: 223
 by: Mike Terry - Mon, 12 Jul 2021 15:57 UTC

On 12/07/2021 10:51, Michael S wrote:
> On Sunday, July 11, 2021 at 6:40:14 PM UTC+3, Mike Terry wrote:
>> On 09/07/2021 13:52, Michael S wrote:
<snip>
>>>
>>> Solved it, finally.
>>> 6 seconds of intentionally non-optimized compute + ~30 hours of thinking and trying, 10 of each due to being stubborn.
>>>
>>> 1 9 9
>>> 2 20 29
>>> 3 360 389
>>> 4 2701 3090
>>> 5 4096 7186
>>> 6 109466 116652
>>> 7 161022 277674
>>> 8 13068583 13346257
>>> 9 2136960 15483217
>>> 10 0 15483217
>>> 11 71101800 86585017
>>> 12 55121700430 55208285447
>>> 13 1057516028 56265801475
>>> 14 1023436651875 1079702453350
>>> 15 30731637609504 31811340062854
>>> 16 719883432165805 751694772228659
>>> 17 195741994241 751890514222900
>>> 18 2325274798835849 3077165313058749
>>> 19 2253334981970 3079418648040719
>>>
>>> But I cheated - did it in C++ with use of STL containers.
>>> Of course, I can do it in C, but then the bulk of the code would be dealing with infrastructure instead of the problem at hand.
>> I've also worked out a "solution", and it agrees with your table above.
>> I'm not sure it counts, though, as it runs MUCH slower than yours! E.g.
>> for the 19-digit calculation it takes around 4 minutes per digit (so
>> over an hour for the whole calculation). I have two big tables, which
>> are copied, and just copying them (once per digit) takes perhaps 20
>> seconds. Well I was being lazy and I don't have to copy them really,
>> but anyway that's only 5 mins of the run time, so perhaps my algorithm
>> is different from yours. (Or perhaps it's basically the same but there
>> are some implementation optimisations I could be using.)
>>
>>
>> Mike.

Right, what you say below is effectively the logic for my solution, but
my implementation was a bit crappy. I was using a number of huge arrays
where you were using associative containers. At the time I thought the
associative containers would be slower than just looping the tables, due
to key searches and (naively?) memory allocations for the nodes.

I had four arrays, each of 3^d entries [d= #digits = divisor], each
entry being an 8 byte counter. So total virtual memory 32 * 3^d bytes.
This was "manageable" up to around d=18 (run time about 5 minutes) with
run times multiplying by 3 as expected for each increment of d, but d=18
exhausted all physical ram (32 G on my laptop, as I like to run several
VMs without suffering) and d=19 went up by a factor of about 40 rather
than 3! Yesterday I checked paging and it had shot up (as you can
imagine), but nowadays that's not immediately obvious as the disk is
solid state. In the old days I would have heard the paging straight
away, and also Windows 10 seems not to mind so much in terms of
impacting other processes - I hadn't really noticed any slowdown for
other processes at the time!
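(For what it's worth, the blow-up at d=19 is exactly what the arithmetic predicts, assuming the four-array layout above: 32 * 3^18 bytes is about 11.5 GiB, which fits in 32 GB, while 32 * 3^19 bytes is about 34.6 GiB, which doesn't. A throwaway check:)

```cpp
#include <cstdint>

// Total bytes for the scheme above: four arrays of 3^d entries,
// each entry an 8-byte counter, i.e. 32 * 3^d bytes.
static uint64_t table_bytes(int d) {
    uint64_t entries = 1;
    for (int i = 0; i < d; ++i)
        entries *= 3;                // 3^d possible keys per array
    return 32ull * entries;
}
// table_bytes(18) is ~11.5 GiB and fits in 32 GB of RAM;
// table_bytes(19) is ~34.6 GiB and does not, hence the paging.
```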

So I've already started to experiment with an associative container to
see how that goes. (I'm thinking I might need one or more custom
allocators to give "nearly free" allocations per "table". Don't know
about that yet...)

>
>
>
> Let's see.
> My solution is based on the following observations:
>
> Observation 1.
> When the numbers are split into an n-digit prefix and an s-digit suffix, the
> number of 1-child numbers over all possible suffixes of a specific prefix
> depends on the number of children of the prefix itself (0 or 1; prefixes with
> 2 or more children are obviously irrelevant) and on the n remainder values of
> the "right-edge facing" substrings, i.e. substrings [1:n], [2:n], .., [n:n].
> All prefixes that share this set of characteristics can be processed
> together as a group. For the rest of the discussion I will call such a set of
> characteristics simply "a key".
> So far this gains us nothing*, because the possible number of keys = 2 * nDigits**n,
> i.e. for "interesting" values of nDigits the exploration space is bigger than
> the original space of "raw" prefixes, which contains only 10**n elements.
>
> Observation 2.
> The order of remainders in the key does not matter.
> That's the most important step: it vastly reduces the key space and
> increases the # of elements in each group.
>
> Observation 3.
> If several remainders in the key have the same value, then only the fact
> of repetition matters; the number of repetitions of a given remainder
> does not.
> This observation is probably not very important for increasing the # of
> elements in a group (== reducing the # of groups), but it allows a relatively
> compact and convenient representation of the keys as bit patterns of
> 2*nDigits bits, where one half of the bits marks the presence of particular
> remainders in the set and the other half marks remainders that appear more
> than once.
> Alternatively, one can use an even more compact base-3 representation, where
> trit value=0 means that the respective remainder does not appear in the set,
> value=1 means that it appears once, and value=2 means that it appears more
> than once. I didn't try this representation, because I feel that for
> nDigits <= 32 it would not make calculations faster. I may consider it for
> nDigits > 32. But for nDigits=33 to 36 I can think of a few other compaction
> ideas, and nDigits=37 so far looks clearly out of reach, so I might as well
> not care about it.
>
> Observation 4.
> For prefixes with 1 child, as opposed to prefixes with no children, not only
> does the number of repetitions of a particular remainder not matter, even the
> fact of repetition does not matter. So the key representation for this case
> could be shrunk to nDigits bits.
> That observation is new to me; I don't use it yet, so I don't know how much
> it helps.
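As an aside, the 2*nDigits-bit key from Observation 3 could be sketched like this (my own illustrative code with invented names, not the poster's actual implementation):

```cpp
#include <cstdint>

// Illustrative sketch (invented names): a key is 2*d bits, where d is the
// divisor / # of digits. The low d bits mark remainders present at least
// once; the next d bits mark remainders present more than once, so counts
// effectively saturate at "more than once".
static uint64_t add_remainder(uint64_t key, unsigned r, unsigned d) {
    uint64_t once  = 1ull << r;       // "seen at least once" bit for r
    uint64_t again = once << d;       // "seen more than once" bit for r
    return (key & once) ? (key | again) : (key | once);
}
```

E.g. with d=4, folding in remainders 2, 2, 0 sets the bits for "one 0, two 2s", which is the <0201> case in the worked example further down.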

OK, that sounds like my thinking, more or less. I iterate along the
string digit by digit, counting the various prefix categories. E.g. for
a string like "1103" mod 4, the successive categories for each round
go like this:

Notation: <abcd> means
d "occurrences" of 0 (mod 4)
c "occurrences" of 1 (mod 4)
b "occurrences" of 2 (mod 4)
a "occurrences" of 3 (mod 4)
with a,b,c,d each being 0, 1, or 2. I think this is effectively your
"key" (there's never any need to track a,b,c,d above 2, as more
occurrences behave the same as 2 occurrences).

"1" : <0010>
"11" : <1010>
"110" : <0201> // 1 mod=0 here
"1103" : <0002>

So counting "occurences" for e.g. "110" I'm interested in the
right-to-left suffixes of the string, which happen to be "0", "10",
"110" having mod 4 remainders 0, 2, 2 respectively, hence "one 0, two
2s" : <0201>. This "key" is always enough to calculate the next line
key given the next digit number to add in. (I think that's the key
insight, reducing the space that needs to be tracked...) The "two 2s"
become "two 0s" due to multiplication by 10 (the number base), and then
"two 3s" due to adding digit 3. And the "one 0" also ends up as "one
3", and finally the digit 3 itself gives us "one 3" : total "four 3s"
but saturation reduces this to "two 3s" key <2000>. All the different
ways <0201> might have arisen from various input strings don't matter
when adding the next digit, just <2000> is all the data we need. Oh,
and I track whether or not a previous mod=0 has occured, independently
of each of the keys, so I have two tables which I call "open" and
"closed" string counts, and key counts transfer from the open to the
closed array when a mod=0 occurs. [Straight away there's an
inefficiency in that some of my array entries are never incremented, yet
I still iterate over them. I'm thinking this becomes a non-issue
switching to an associative array.

So there's a kind of "key arithmetic" going on, with "multiplication
by 10" and "adding a new digit", and I might phrase this in my program's
terms as <0201> x 10 + 3 = <2000>, but of course that's not regular
integer multiplication and addition...
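For concreteness, that "key arithmetic" could be sketched as follows (an illustrative sketch over the saturating <abcd> counts, with assumed names):

```cpp
#include <vector>

// Illustrative sketch: cnt[r] holds the saturated (0/1/2) count of
// right-facing suffixes with remainder r mod d, where d = cnt.size().
// Appending a digit maps each old suffix remainder r to (r*10 + digit) % d,
// adds the new one-digit suffix, and saturates all counts at 2.
static std::vector<int> append_digit(const std::vector<int>& cnt, int digit) {
    int d = (int)cnt.size();
    std::vector<int> next(d, 0);
    for (int r = 0; r < d; ++r)
        next[(r * 10 + digit) % d] += cnt[r];   // "x 10" then "+ digit"
    next[digit % d] += 1;                       // the new single-digit suffix
    for (int r = 0; r < d; ++r)
        if (next[r] > 2) next[r] = 2;           // saturate: >2 behaves like 2
    return next;
}
```

Feeding in the digits 1, 1, 0, 3 with d=4 reproduces the walk-through: <0010>, <1010>, <0201>, and finally <2000> after saturation.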

>
>
> Now to implementation.
> As mentioned in the post above, for the associative array at the heart of the
> algorithm I use a ready-made STL container.
> Specifically, I use std::unordered_map. For this particular job unordered_map
> is better than std::map - both a little faster and less memory-hungry.
> I use unsigned 64-bit integers both for the key (as explained above) and for
> the value == number of elements in the prefix group. Since the # of elements
> could sometimes exceed 2**64, I use a little trick to handle it - numbers in
> the range [0:2**63-1] are stored directly, while bigger numbers, which are
> extremely rare, are stored as a pointer (actually, an index) into secondary
> storage.
> Originally, I was holding both the "current" n-digit groups and the "next"
> (n+1)-digit groups in associative arrays, but with that organization, on a
> 64GB machine, for nDigits=29 at stages n=14,15,16 I was running out of memory.
> So, in the up-to-date variant of the program, at the end of each sub-stage the
> associative array is copied to a flat array of (key,value) pairs. The latter
> is ~2.5 times more compact. The associative array is then freed. This process
> takes some time (not a lot) and only reduces peak memory consumption by a
> factor of 1.5, but on this particular machine it made the difference between
> finishing nDigits=29 and running out of memory (or into swapping, which is
> unacceptably slow).
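The value-side trick described above (direct storage below 2**63, otherwise a tagged index into secondary storage) might look roughly like this - a sketch with invented names, not the actual code:

```cpp
#include <cstdint>
#include <vector>

// Sketch of the tagged-count trick: counts below 2**63 are stored directly
// in the 64-bit map value; larger (rare) counts set the top bit and store
// an index into a secondary table instead. All names here are invented.
struct WideCount { uint64_t hi, lo; };      // assumed >64-bit representation

static std::vector<WideCount> secondary;    // overflow storage

static uint64_t encode_count(uint64_t hi, uint64_t lo) {
    if (hi == 0 && lo < (1ull << 63))
        return lo;                           // small: store the value itself
    secondary.push_back({hi, lo});           // large: store a tagged index
    return (1ull << 63) | (uint64_t)(secondary.size() - 1);
}

static WideCount decode_count(uint64_t v) {
    if (v >> 63)
        return secondary[v & ~(1ull << 63)]; // tagged: fetch from secondary
    return WideCount{0, v};                  // untagged: the value itself
}
```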
>
> There is one more trick in the final version that I borrowed from the
> brute-force implementations of last week - the use of small look-up tables
> for calculating remainders. Unlike in brute-force, in the "smart" variants
> this trick is not a major source of speed-up, because the majority of the
> time is spent accessing associative arrays rather than in the "inner loop".
> But when run time is measured in hours, even a 20% speed-up is welcome.
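The look-up-table trick for remainders presumably amounts to something like this (my guess at its shape; the divisor 19 is chosen only as an example):

```cpp
#include <cstdint>

// step_tab[r][dig] = (r*10 + dig) % DIV, precomputed once, so the hot loop
// replaces an integer divide with a table load. DIV=19 is just an example.
enum { DIV = 19 };
static uint8_t step_tab[DIV][10];

static void init_step_tab(void) {
    for (int r = 0; r < DIV; ++r)
        for (int dig = 0; dig < 10; ++dig)
            step_tab[r][dig] = (uint8_t)((r * 10 + dig) % DIV);
}

// Remainder of a digit string mod DIV using only table look-ups.
static int mod_by_table(const int* digits, int n) {
    int r = 0;
    for (int i = 0; i < n; ++i)
        r = step_tab[r][digits[i]];
    return r;
}
```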

I haven't "studied" the above, as I'm still thinking about changing my
own program to use associative arrays, and having got to this point I
want to get a finished result myself. My first (working) solution had
only simple array(s) to track counts for each key (which I call
"indexes" which seemed natural for use of arrays, LOL) and simply loop
over the whole array when iterating for the next digit. So for length 4
strings I have (naively) 3^4 = 81 keys <abcd> so I had array length 81
and looped over it 4 times (once for each digit) - well, you can see
what I was doing, I expect. I hadn't properly considered how SPARSE
these arrays would be, which is what I started wondering about yesterday.

Also, I'm worried about the memory allocations used by the STL
containers, wondering if I will need a custom allocator... (but I
haven't got to that point yet, as I need to check that the unoptimised
container-logic version works as expected, and anyway I'm unsure how it
all works with different container types.) Then there are still other
implementation choices I made with no real thought at the time. For
example, I precalculate the x10 operation on indexes, thinking this
would be needed multiple times, but again, how SPARSE are my arrays? As
the string length increases, just iterating once over EVERY key value to
calculate the x10 result might be worse than calculating it only when
required. (Probably not the biggest problem I have, though, as this is
just a one-dimensional loop.)

So I'm going to carry on on my own for a bit as I've become "hooked" by
the whole thing, and it's more fun than reading Peter Olcott posts. :)

Anyway, thanks for taking the time to explain your implementation
details - I'll certainly come back to look at them in more detail when I
get to my own end point...

Mike.

Subject: Losing my mind: results change with/without printf() statements
Started by: DFS on Wed, 30 Jun 2021 (63 replies)