Re: Losing my mind: results change with/without printf() statements

Newsgroups: comp.lang.c
Date: Mon, 12 Jul 2021 02:51:32 -0700 (PDT)
In-Reply-To: <-72dnah_H75PjXb9nZ2dnUU78YHNnZ2d@brightview.co.uk>
References: <vW0DI.53$h45.18@fx16.iad> <87a6n0c1ux.fsf@bsb.me.uk>
<2pmdnZmxQ42VOn79nZ2dnUU7-S_NnZ2d@giganews.com> <c7bfb14a-2084-41c3-beea-438139f73393n@googlegroups.com>
<a_CdnUJgh4wOD3n9nZ2dnUU7-XvNnZ2d@giganews.com> <bdc09058-f92c-4084-92c2-5186bfea63bcn@googlegroups.com>
<ad435f65-f4c1-4db7-a565-5404767814ebn@googlegroups.com> <06a1aff4-365a-4f91-91d1-56dead415871n@googlegroups.com>
<-72dnah_H75PjXb9nZ2dnUU78YHNnZ2d@brightview.co.uk>
Message-ID: <4a43a654-a828-420f-a5aa-b9cecee34140n@googlegroups.com>
Subject: Re: Losing my mind: results change with/without printf() statements
From: already5...@yahoo.com (Michael S)
 by: Michael S - Mon, 12 Jul 2021 09:51 UTC

On Sunday, July 11, 2021 at 6:40:14 PM UTC+3, Mike Terry wrote:
> On 09/07/2021 13:52, Michael S wrote:
> > On Thursday, July 8, 2021 at 11:03:08 AM UTC+3, Michael S wrote:
> >> On Wednesday, July 7, 2021 at 2:02:53 AM UTC+3, Michael S wrote:
> >>> On Tuesday, July 6, 2021 at 8:28:02 PM UTC+3, Kent Dickey wrote:
> >>>> In article <c7bfb14a-2084-41c3...@googlegroups.com>,
> >>>> Michael S <already...@yahoo.com> wrote:
> >>>>>
> >>>>> Those are still variations of the theme that I outlined in the post from
> >>>>> Jul 4, 2021, 1:21:16 PM.
> >>>>> As proven in the post from Jul 5, 2021, 12:44:48 PM, as far as the
> >>>>> original 1:1e19 challenge is concerned, it's a dead end.
> >>>>> In order to tackle the original challenge, the positive (one-child)
> >>>>> results have to somehow be discovered and counted in bunches of at
> >>>>> least thousands (preferably hundreds of thousands) rather than one by one.
> >>>>> Just rejecting negatives in bunches is not sufficient.
> >>>> I was just posting code I wrote before your posts.
> >>>>
> >>>> I did not see any "proofs", there was a post with some numbers, with no
> >>>> explanation of where they came from or why we should believe it.
> >>> There was an explanation: the numbers are estimates found by the Monte Carlo method.
> >>> For each number of digits I tested 1E7 randomly distributed points and extrapolated.
> >>> Later on I repeated the tests with 1E8 points per range and got approximately the same
> >>> result. That is, the estimates could be off by a few percent, but that wouldn't change
> >>> the conclusion: the total number of one-child numbers in the range 1:1e19 is too huge
> >>> for any method that rejects candidates in bunches but confirms them individually.
> >>> Why should you believe it? Because I said so, and I tend to be reliable in this sort of thing.
> >>> I didn't post the Monte Carlo code here because
> >>> (a) it is trivial - every decent pro should be able to code something like that in less than 2 hours;
> >>> (b) it is coded in C++ and the group is about C.
> >>> I can put it in my github account if you wish to see it.
> >>>
> >>> BTW, here are full results:
> >>> 1: 10 / 10 * 10 = 1.000000000000000e+01
> >>> 2: 20 / 90 * 90 = 2.000000000000000e+01
> >>> 3: 360 / 900 * 900 = 3.600000000000000e+02
> >>> 4: 2701 / 9000 * 9000 = 2.701000000000000e+03
> >>> 5: 4096 / 90000 * 90000 = 4.096000000000000e+03
> >>> 6: 109466 / 900000 * 900000 = 1.094660000000000e+05
> >>> 7: 161022 / 9000000 * 9000000 = 1.610220000000000e+05
> >>> 8: 13068583 / 90000000 * 90000000 = 1.306858300000000e+07
> >>> 9: 237707 / 100000000 * 900000000 = 2.139363000000000e+06
> >>> 10: 0 / 100000000 * 9000000000 = 0.000000000000000e+00
> >>> 11: 78609 / 100000000 * 90000000000 = 7.074810000000000e+07
> >>> 12: 6126028 / 100000000 * 900000000000 = 5.513425200000000e+10
> >>> 13: 11791 / 100000000 * 9000000000000 = 1.061190000000000e+09
> >>> 14: 1137562 / 100000000 * 90000000000000 = 1.023805800000000e+12
> >>> 15: 3414651 / 100000000 * 900000000000000 = 3.073185900000000e+13
> >>> 16: 7992302 / 100000000 * 9000000000000000 = 7.193071800000000e+14
> >>> 17: 198 / 100000000 * 90000000000000000 = 1.782000000000000e+11
> >>> 18: 258937 / 100000000 * 900000000000000000 = 2.330433000000000e+15
> >>> 19: 30 / 100000000 * 9000000000000000000 = 2.700000000000000e+12
> >>> Total 3.084430326475721e+15
> >>>> I wrote
> >>>> some code to get some idea if pruning would work--and for some ranges, it's
> >>>> very effective, for other ranges, it's not effective at all. Basically,
> >>>> to see the timing differences for different ranges. My pruning code showed:
> >>>>
> >>>> size   runtime       factor    number of 1-childs
> >>>> 10^6:  0.046 secs    =         116652
> >>>> 10^7:  0.231 secs    = 5.02x   277674
> >>>> 10^8:  5.374 secs    = 23.2x   13346257
> >>>> 10^9:  11.463 secs   = 2.13x   15483217
> >>>>
> >>> I have exact results for up to 13 digits.
> >>> Ndigits N-one-childs
> >>> 1 - 10
> >>> 2 - 20
> >>> 3 - 360
> >>> 4 - 2701
> >>> 5 - 4096
> >>> 6 - 109466
> >>> 7 - 161022
> >>> 8 - 13068583
> >>> 9 - 2136960
> >>> 10 - 0
> >>> 11 - 71101800
> >>> 12 - 55121700430
> >>> 13 - 1057516028
> >>> Note that the format is slightly different from yours: yours is accumulated, mine is per specific number of digits.
> >>> 12 and 13 digits took relatively long compute times to find - 38 min and 57 min respectively, i.e. much more than the time allowed by the informal rules of the challenge.
> >> Update:
> >> With better code (posted on comp.arch) I managed to brute-force 14 digits.
> >> It took almost 4 hours on a single core, far from the timing requirements of the challenge.
> >> I expect that 17 digits can be ripped through in a similar time. Maybe 19 digits too; not sure about that.
> >> For 15 digits I expect results in 4-5 days.
> >> But ripping through 16 or 18 digits with this code would take many months of single-core compute time.
> >> Current results:
> >> 1 - 9
> >> 2 - 20
> >> 3 - 360
> >> 4 - 2701
> >> 5 - 4096
> >> 6 - 109466
> >> 7 - 161022
> >> 8 - 13068583
> >> 9 - 2136960
> >> 10 - 0
> >> 11 - 71101800
> >> 12 - 55121700430
> >> 13 - 1057516028
> >> 14 - 1023436651875
> >> 15 - ???
> >> 16 - ???
> >> 17 - ???
> >> 18 - ???
> >> 19 - ???
> >>>> This shows pruning working very effectively for 10^9. But 8-digit numbers
> >>>> have so many 1-childs, that pruning the negative just doesn't cut out
> >>>> enough work. It needs to do something to count groups of 1-childs positively.
> >>>>
> >> I am starting to get an idea of how to count in bunches.
> >> Still have one algorithmic difficulty to figure out before I start coding.
> >>>> It's easy to brute force the digits through 10^9. However, something happens
> >>>> when the digits are >= 10. I already reported there are no 1-childs when
> >>>> digits == 10.
> >>> Yes, once the fact is known, it's relatively easy to understand.
> >>>> The problem changes for digits >= 11 since individual digits
> >>>> can no longer be 1-childs (other than 0). Rather than trying them all, there's
> >>>> a way to show what patterns result in 1-childs (and not 2-childs or 0-childs),
> >>>> and then just counting those patterns. This is not obvious to me and would
> >>>> require a great deal more study, since it seems to revolve around 19-digit
> >>>> numbers and factors of 19 (and 18, etc.).
> >>>>
> >>>> I don't see how to partition this problem easily. Take 18-digit numbers. You
> >>>> can try all 9-digit numbers and calculate for all 10^9 combos whether that
> >>>> sequence is a 0-child, 1-child, or 2+child. So, all 18-digit numbers of
> >>>> 0-child 9-digit numbers and then 1-child 9-digit numbs, or 1-child 9-digit
> >>>> numbers then 0-child 9-digit numbers concatenated together would give you a
> >>>> 1-child. But that doesn't work: there are new patterns formed which need to
> >>>> be accounted for which could generate another 1-child, making the 18-digit
> >>>> number no longer a 1-child. I don't see how to detect that without forming
> >>>> them.
> >>>>
> >>> Me too.
> >>> And yes, 18 digits is the most problematic by far, but 16 digits would also
> >>> take months of single-core compute time with this sort of algorithm.
> >>>> I suspect there's a pattern, and that if you take a given divisor (say,
> >>>> 18), and form all strings of length 1 through 9, there's a pattern to the
> >>>> number of 1-childs for a given length (and we don't see this solving the
> >>>> problem directly since the divisor is changing as well as the length). And
> >>>> that this pattern results in a formula for 1-childs of 18 digits, something
> >>>> like: raw 1-childs for 10^18 - (raw 1-childs for 10^17) - (a few '18' cases
> >>>> caused by the extra digit going from 17 to 18 digits).
> >>>>
> >>>> Kent
> >
> > Solved it, finally.
> > 6 seconds of intentionally non-optimized compute + ~30 hours of thinking and trying, 10 of each due to being stubborn.
> >
> > 1 9 9
> > 2 20 29
> > 3 360 389
> > 4 2701 3090
> > 5 4096 7186
> > 6 109466 116652
> > 7 161022 277674
> > 8 13068583 13346257
> > 9 2136960 15483217
> > 10 0 15483217
> > 11 71101800 86585017
> > 12 55121700430 55208285447
> > 13 1057516028 56265801475
> > 14 1023436651875 1079702453350
> > 15 30731637609504 31811340062854
> > 16 719883432165805 751694772228659
> > 17 195741994241 751890514222900
> > 18 2325274798835849 3077165313058749
> > 19 2253334981970 3079418648040719
> >
> > But I cheated - I did it in C++ with STL containers.
> > Of course, I can do it in C, but then the bulk of the code would be dealing with infrastructure instead of the problem at hand.
> I've also worked out a "solution", and it agrees with your table above.
> I'm not sure it counts, though, as it runs MUCH slower than yours! E.g.
> for the 19-digit calculation it takes around 4 minutes per digit (so
> over an hour for the whole calculation). I have two big tables, which
> are copied, and just copying them (once per digit) takes perhaps 20
> seconds. Well I was being lazy and I don't have to copy them really,
> but anyway that's only 5 mins of the run time, so perhaps my algorithm
> is different from yours. (Or perhaps it's basically the same but there
> are some implementation optimisations I could be using.)
>
>
> Mike.

Let's see.
My solution is based on the following observations:

Observation 1.
When the numbers are split into an n-digit prefix and an s-digit suffix, the
count of 1-child numbers over all possible suffixes of a specific prefix
depends only on the number of children of the prefix itself (0 or 1; prefixes
with 2 or more children are obviously irrelevant) and on the n remainder
values of the "right-edge-facing" substrings, i.e. the substrings [1:n],
[2:n], ..., [n:n].
All prefixes that share these characteristics can be processed together as a
group. For the rest of the discussion I will call such a set of
characteristics simply "a key".
So far, that gains us nothing*, because the possible number of keys is
2 * nDigits**n, i.e. for "interesting" values of nDigits the exploration space
is bigger than the original space of "raw" prefixes, which contains only
10**n elements.

Observation 2.
The order of the remainders in the key does not matter.
That is the most important step: it vastly reduces the key space and
increases the number of elements in each group.

Observation 3.
If several remainders in the key have the same value, only the fact of
repetition matters; the number of repetitions of a given remainder does not.
This observation is probably not very important for increasing the number of
elements per group (i.e. reducing the number of groups), but it allows a
relatively compact and convenient representation of the keys as bit patterns
of 2*nDigits bits, where one half of the bits marks the presence of
particular remainders in the set and the other half marks the remainders
that appear more than once.
Alternatively, one can use an even more compact base-3 representation, where
a trit value of 0 means that the respective remainder does not appear in the
set, 1 means that it appears once, and 2 means that it appears more than
once. I didn't try this representation, because I feel that for nDigits <= 32
it would not make the calculations faster. I may consider it for nDigits > 32.
But for nDigits = 33 to 36 I can think of a few other compaction ideas, and
nDigits = 37 so far looks obviously out of reach, so I might as well not care
about it.

Observation 4.
For prefixes with 1 child, as opposed to prefixes with no children, not only
does the number of repetitions of a particular remainder not matter - even
the fact of repetition does not matter. So the key representation for this
case can be shrunk to nDigits bits.
This observation is new to me; I don't use it yet, so I don't know how much
it helps.

Now to the implementation.
As mentioned in the post above, for the associative array at the heart of the
algorithm I use a ready-made STL container, specifically std::unordered_map.
For this particular job unordered_map is better than std::map - both a little
faster and less memory-hungry.
I use unsigned 64-bit integers both for the key (as explained above) and for
the value, i.e. the number of elements in the prefix group. Since the number
of elements can sometimes exceed 2**64, I use a little trick to handle it:
numbers in the range [0 : 2**63-1] are stored directly, while bigger numbers,
which are extremely rare, are stored as a pointer (actually, an index) into
secondary storage.
Originally I held both the "current" n-digit groups and the "next"
(n+1)-digit groups in associative arrays, but with that organization, on a
64 GB machine, I was running out of memory for nDigits=29 at the stages
n=14,15,16.
So, in the up-to-date variant of the program, at the end of each sub-stage
the associative array is copied to a flat array of (key, value) pairs. The
latter is ~2.5 times more compact. The associative array is then freed. This
process takes some time (not a lot) and only reduces peak memory consumption
by a factor of 1.5, but on this particular machine it made the difference
between finishing nDigits=29 and running out of memory (or into swapping,
which is unacceptably slow).

There is one more trick in the final version, borrowed from last week's
brute-force implementations: the use of small look-up tables for the
calculation of remainders. Unlike in brute force, in the "smart" variants
this trick is not a major source of speed-up, because the majority of the
time is spent accessing the associative arrays rather than in the "inner
loop". But when the run time is measured in hours, even a 20% speed-up is
welcome.

-------------
* - at least for the harder cases of nDigits that are not divisible by either 2 or 5.
