
Re: My experience with Apple M1 chip

<187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18440&group=comp.arch#18440

Newsgroups: comp.arch
Date: Tue, 6 Jul 2021 16:16:49 -0700 (PDT)
In-Reply-To: <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 06 Jul 2021 23:16:49 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Tue, 6 Jul 2021 23:16 UTC

On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
> In article <80ac50a0-dde2-4a66...@googlegroups.com>,
> Michael S <already...@yahoo.com> wrote:
> >On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
> >> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> >> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
> >fast as i3-8100B but have trouble believing that it could match i7-8700B
> >either in single thread or in multithread throughput. Unless, of course,
> >absolute majority of run time spent in native libraries.
> >> You don't count that M1 is ~25-33% faster single core than any x86 :P
> >
> >I took it into account.
> >
> >Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
> >for *any* x86.
> >M1 is slower than top Zen3 bins and about the same or a little slower
> >than top Comet Lake.
> >Probably somewhat slower than top Tiger Lake, but that comparison is
> >rather close.
> >Probably, measurably slower than top Rocket Lake, but I didn't look at
> >Rocket Lake closely.
> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> predict branches, or working set in the 100-200KB range). It is not the
> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> relatively short benchmark (say, one file, C or C++, can be run from the Unix
> command line, doesn't require me to install anything else, should run in less
> than 5 minutes), I can compile it and run it for you, and then you can compare
> those results to any system you like. I don't think comparing optimized AVX
> is going to be useful, but simple integer or floating point algorithms would
> be best.
>
> Kent

The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
But who is going to test on the fastest Zen3?
I have a Ryzen 7 5800H at work. It's pretty fast and in single thread easily beats a 4.25GHz Skylake, but it's much slower than the likes of the 5900X.

Code:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long oneChildsInRange(int nDigits);

int main(int argz, char** argv)
{
  if (argz < 2) {
    fprintf(stderr, "Usage:\n%s nDigits\n", argv[0]);
    return 1;
  }

  char* endp;
  int nDigits = strtol(argv[1], &endp, 0);
  if (endp == argv[1]) {
    fprintf(stderr, "Bad nDigits argument '%s'. Not a number.\n", argv[1]);
    return 1;
  }

  if (nDigits < 5 || nDigits > 19) {
    fprintf(stderr, "Please specify nDigits argument in range [5:19].\n");
    return 1;
  }

  printf("%2d %20llu\n", nDigits, oneChildsInRange(nDigits));
  return 0;
}

// count substrings of digits[] whose value is divisible by nDigits;
// remTab[r*10+d] is the remainder after appending digit d to remainder r
static int countChilds(const uint8_t *digits, int nDigits, const uint8_t *remTab)
{
  int nChilds = 0;
  for (int beg = 0; beg < nDigits; ++beg) {
    unsigned r = 0;
    for (int end = beg; end < nDigits; ++end) {
      r = remTab[r*10+digits[end]];
      nChilds += (r == 0);
    }
    if (nChilds > 1)
      break; // we don't try to distinguish between cases of (nChilds > 1)
  }
  return nChilds;
}

// count substrings that start in the prefix and end in the suffix, starting
// from nChilds0; prefix[] holds the remainders produced by preprocessPrefix()
static int countChilds2(
  const uint8_t *prefix, int prefixlen,
  const uint8_t *suffix, int suffixlen,
  const uint8_t *remTab,
  int nChilds0)
{
  int nChilds = nChilds0;
  for (int prefix_i = 0; prefix_i < prefixlen; ++prefix_i) {
    unsigned r = prefix[prefix_i];
    for (int i = 0; i < suffixlen; ++i) {
      r = remTab[r*10+suffix[i]];
      nChilds += (r == 0);
    }
    if (nChilds > 1)
      break; // we don't try to distinguish between cases of (nChilds > 1)
  }
  return nChilds;
}

// dst[beg] = remainder of the substring src[beg..nDigits-1]
static void preprocessPrefix(uint8_t *dst, const uint8_t *src, int nDigits, const uint8_t *remTab)
{
  for (int beg = 0; beg < nDigits; ++beg) {
    unsigned r = 0;
    for (int end = beg; end < nDigits; ++end)
      r = remTab[r*10+src[end]];
    dst[beg] = r;
  }
}

static unsigned long long intpow(unsigned base, int pow)
{
  unsigned long long prod = 1;
  for (int k = 0; k < pow; ++k)
    prod *= base;
  return prod;
}

static void to_digits(uint8_t* dst, unsigned long long x, int nDigits)
{
  for (int k = 0; k < nDigits; ++k) {
    dst[nDigits-1-k] = x % 10;
    x /= 10;
  }
}

static unsigned long long oneChildsInRange(int nDigits)
{
  // initialize look-up table
  uint8_t remTab[200];
  for (int i = 0; i < nDigits*10; ++i)
    remTab[i] = i % nDigits;

  // initialize table of suffixes
  uint8_t suffixes[10000][4];
  int nSuff0 = 0;
  int nSuff1 = 0;
  for (int i = 0; i < 10000; ++i) {
    uint8_t suffix[4];
    to_digits(suffix, i, 4); // convert suffix to array of digits
    int nc = countChilds(suffix, 4, remTab);
    if (nc < 2) {
      if (nc == 0) {
        memcpy(suffixes[nSuff0], suffix, sizeof(suffixes[0]));
        ++nSuff0;
      } else { // nc==1
        memcpy(suffixes[9999-nSuff1], suffix, sizeof(suffixes[0])); // store starting from the end of array
        ++nSuff1;
      }
    }
  }
  if (nSuff1 > 0) // make suffixes[] array continuous
    memmove(suffixes[nSuff0], suffixes[10000-nSuff1], nSuff1*sizeof(suffixes[0]));

  unsigned long long cnt = 0;
  unsigned long long pref0 = intpow(10, nDigits-5);
  for (unsigned long long pref = pref0; pref < pref0*10; ++pref) {
    uint8_t prefix[20];
    to_digits(prefix, pref, nDigits-4); // convert prefix to array of digits
    int nc = countChilds(prefix, nDigits-4, remTab);
    if (nc < 2) {
      uint8_t processed_prefix[20];
      preprocessPrefix(processed_prefix, prefix, nDigits-4, remTab);
      for (int i = 0; i < nSuff0; ++i) // concatenate suffix with 0 children to prefix with 0 or 1 children
        cnt += (countChilds2(processed_prefix, nDigits-4, suffixes[i], 4, remTab, nc)==1);
      if (nc == 0) {
        for (int i = nSuff0; i < nSuff0+nSuff1; ++i) // concatenate suffix with 1 child to prefix with 0 children
          cnt += (countChilds2(processed_prefix, nDigits-4, suffixes[i], 4, remTab, 1)==1);
      }
    }
  }

  return cnt;
}

Re: My experience with Apple M1 chip

<zQ5FI.779$VU3.610@fx46.iad>

https://www.novabbs.com/devel/article-flat.php?id=18441&group=comp.arch#18441

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad>
<19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad>
<80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
<187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 49
Message-ID: <zQ5FI.779$VU3.610@fx46.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Tue, 06 Jul 2021 23:39:11 UTC
Organization: usenet-news.net
Date: Tue, 06 Jul 2021 23:39:11 GMT
X-Received-Bytes: 3130
 by: Branimir Maksimovic - Tue, 6 Jul 2021 23:39 UTC

On 2021-07-06, Michael S <already5chosen@yahoo.com> wrote:
> On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
>> In article <80ac50a0-dde2-4a66...@googlegroups.com>,
>> Michael S <already...@yahoo.com> wrote:
>> >On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
>> >> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
>> >> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
>> >fast as i3-8100B but have trouble believing that it could match i7-8700B
>> >either in single thread or in multithread throughput. Unless, of course,
>> >absolute majority of run time spent in native libraries.
>> >> You don't count that M1 is ~25-33% faster single core than any x86 :P
>> >
>> >I took it into account.
>> >
>> >Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
>> >for *any* x86.
>> >M1 is slower than top Zen3 bins and about the same or a little slower
>> >than top Comet Lake.
>> >Probably somewhat slower than top Tiger Lake, but that comparison is
>> >rather close.
>> >Probably, measurably slower than top Rocket Lake, but I didn't look at
>> >Rocket Lake closely.
>> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
>> predict branches, or working set in the 100-200KB range). It is not the
>> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
>> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
>> relatively short benchmark (say, one file, C or C++, can be run from the Unix
>> command line, doesn't require me to install anything else, should run in less
>> than 5 minutes), I can compile it and run it for you, and then you can compare
>> those results to any system you like. I don't think comparing optimized AVX
>> is going to be useful, but simple integer or floating point algorithms would
>> be best.
>>
>> Kent
>
>
> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.

bmaxa@Branimirs-Air euler % time ./euler413 11
11 71101800
./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total

Minute on M1...

> But who is going to test on the fastest Zen3 ?

Someone with fastest Zen3? :P

Re: My experience with Apple M1 chip

<0a28f293-79cc-41fe-a830-029e4e7c14aan@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18442&group=comp.arch#18442

Newsgroups: comp.arch
Date: Tue, 6 Jul 2021 16:50:04 -0700 (PDT)
In-Reply-To: <zQ5FI.779$VU3.610@fx46.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<zQ5FI.779$VU3.610@fx46.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <0a28f293-79cc-41fe-a830-029e4e7c14aan@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Tue, 06 Jul 2021 23:50:05 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Tue, 6 Jul 2021 23:50 UTC

On Wednesday, July 7, 2021 at 2:39:14 AM UTC+3, Branimir Maksimovic wrote:
> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> > On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
> >> In article <80ac50a0-dde2-4a66...@googlegroups.com>,
> >> Michael S <already...@yahoo.com> wrote:
> >> >On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
> >> >> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> >> >> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
> >> >fast as i3-8100B but have trouble believing that it could match i7-8700B
> >> >either in single thread or in multithread throughput. Unless, of course,
> >> >absolute majority of run time spent in native libraries.
> >> >> You don't count that M1 is ~25-33% faster single core than any x86 :P
> >> >
> >> >I took it into account.
> >> >
> >> >Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
> >> >for *any* x86.
> >> >M1 is slower than top Zen3 bins and about the same or a little slower
> >> >than top Comet Lake.
> >> >Probably somewhat slower than top Tiger Lake, but that comparison is
> >> >rather close.
> >> >Probably, measurably slower than top Rocket Lake, but I didn't look at
> >> >Rocket Lake closely.
> >> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> >> predict branches, or working set in the 100-200KB range). It is not the
> >> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
> >> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> >> relatively short benchmark (say, one file, C or C++, can be run from the Unix
> >> command line, doesn't require me to install anything else, should run in less
> >> than 5 minutes), I can compile it and run it for you, and then you can compare
> >> those results to any system you like. I don't think comparing optimized AVX
> >> is going to be useful, but simple integer or floating point algorithms would
> >> be best.
> >>
> >> Kent
> >
> >
> > The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
> bmaxa@Branimirs-Air euler % time ./euler413 11
> 11 71101800
> ./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
>

Sorry, my mistake.
I had another variant of the algorithm that took 3 minutes. This one is indeed much faster.
1m15.640s on Xeon E-2176G underclocked to 4.25GHz.

>
> Minute on M1...
> > But who is going to test on the fastest Zen3 ?
> Someone with fastest Zen3? :P

Re: My experience with Apple M1 chip

<446FI.6468$Vv6.2664@fx45.iad>

https://www.novabbs.com/devel/article-flat.php?id=18443&group=comp.arch#18443

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad>
<19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad>
<80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
<187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<zQ5FI.779$VU3.610@fx46.iad>
<0a28f293-79cc-41fe-a830-029e4e7c14aan@googlegroups.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 55
Message-ID: <446FI.6468$Vv6.2664@fx45.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Tue, 06 Jul 2021 23:55:44 UTC
Organization: usenet-news.net
Date: Tue, 06 Jul 2021 23:55:44 GMT
X-Received-Bytes: 3648
 by: Branimir Maksimovic - Tue, 6 Jul 2021 23:55 UTC

On 2021-07-06, Michael S <already5chosen@yahoo.com> wrote:
> On Wednesday, July 7, 2021 at 2:39:14 AM UTC+3, Branimir Maksimovic wrote:
>> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
>> > On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
>> >> In article <80ac50a0-dde2-4a66...@googlegroups.com>,
>> >> Michael S <already...@yahoo.com> wrote:
>> >> >On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
>> >> >> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
>> >> >> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
>> >> >fast as i3-8100B but have trouble believing that it could match i7-8700B
>> >> >either in single thread or in multithread throughput. Unless, of course,
>> >> >absolute majority of run time spent in native libraries.
>> >> >> You don't count that M1 is ~25-33% faster single core than any x86 :P
>> >> >
>> >> >I took it into account.
>> >> >
>> >> >Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
>> >> >for *any* x86.
>> >> >M1 is slower than top Zen3 bins and about the same or a little slower
>> >> >than top Comet Lake.
>> >> >Probably somewhat slower than top Tiger Lake, but that comparison is
>> >> >rather close.
>> >> >Probably, measurably slower than top Rocket Lake, but I didn't look at
>> >> >Rocket Lake closely.
>> >> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
>> >> predict branches, or working set in the 100-200KB range). It is not the
>> >> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
>> >> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
>> >> relatively short benchmark (say, one file, C or C++, can be run from the Unix
>> >> command line, doesn't require me to install anything else, should run in less
>> >> than 5 minutes), I can compile it and run it for you, and then you can compare
>> >> those results to any system you like. I don't think comparing optimized AVX
>> >> is going to be useful, but simple integer or floating point algorithms would
>> >> be best.
>> >>
>> >> Kent
>> >
>> >
>> > The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
>> bmaxa@Branimirs-Air euler % time ./euler413 11
>> 11 71101800
>> ./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
>>
>
> Sorry, my mistake.
> I had another variant of algorithm that took 3 minutes. This one is indeed much faster.
> 1m15.640s on Xeon E-2176G underclocked to 4.25GHz.

M1 runs at 3.2GHz max. This one consumes 6W when executing your proggy :P
>
>>

--
something dumb

Re: My experience with Apple M1 chip

<sc2qtq$5la$1@gioia.aioe.org>

https://www.novabbs.com/devel/article-flat.php?id=18444&group=comp.arch#18444

From: chris.m....@gmail.com (Chris M. Thomasson)
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
Date: Tue, 6 Jul 2021 17:02:04 -0700
Organization: Aioe.org NNTP Server
Lines: 18
Message-ID: <sc2qtq$5la$1@gioia.aioe.org>
References: <muhEI.26121$P64.11471@fx47.iad>
NNTP-Posting-Host: NBiuIU74OKL7NpIOsbuNjQ.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.11.0
X-Notice: Filtered by postfilter v. 0.9.2
Content-Language: en-US
 by: Chris M. Thomasson - Wed, 7 Jul 2021 00:02 UTC

On 7/4/2021 5:05 AM, Branimir Maksimovic wrote:
> Fantastic chip, blows away my 2700X in single thread.
> 3950 scimark2 score vs 3050 2700X.
> But, chip scales badly. Tried WCG and when running 2 threads no slowdown.
> However with 3 threads slowdown is 25%, with 4 threads slowdown is ~50%.
> So chip is actually like i3 with two cores+HT.
> But it is much faster as single thread performance is fantastic.
> What is puzzling is that 5-7 thread does not have such slowdown,
> that is on low power cores performance loss per thread is more like 10-15%.
>

Can you try to run the following program, and post your results? C++17...

https://pastebin.com/raw/CYZ78gVj

Should compile right up. It's a poor man's RCU, with debugging variables
turned on for a sanity check. Basically, a proxy collector to be able to
get RCU-like abilities using 100% standard C++.

Re: My experience with Apple M1 chip

<1f6FI.1230$0Z7.715@fx39.iad>

https://www.novabbs.com/devel/article-flat.php?id=18445&group=comp.arch#18445

Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <muhEI.26121$P64.11471@fx47.iad> <sc2qtq$5la$1@gioia.aioe.org>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 44
Message-ID: <1f6FI.1230$0Z7.715@fx39.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 07 Jul 2021 00:07:25 UTC
Organization: usenet-news.net
Date: Wed, 07 Jul 2021 00:07:25 GMT
X-Received-Bytes: 2072
 by: Branimir Maksimovic - Wed, 7 Jul 2021 00:07 UTC

On 2021-07-07, Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
> On 7/4/2021 5:05 AM, Branimir Maksimovic wrote:
>> Fantastic chip, blows away my 2700X in single thread.
>> 3950 scimark2 score vs 3050 2700X.
>> But, chip scales badly. Tried WCG and when running 2 threads no slowdown.
>> However with 3 threads slowdown is 25%, with 4 threads slowdown is ~50%.
>> So chip is actually like i3 with two cores+HT.
>> But it is much faster as single thread performance is fantastic.
>> What is puzzling is that 5-7 thread does not have such slowdown,
>> that is on low power cores performance loss per thread is more like 10-15%.
>>
>
> Can you try to run the following program, and post your results? C++17...
>
> https://pastebin.com/raw/CYZ78gVj
>
> Should compile right up. Its a poor mans RCU, with debugging variables
> turned on for a sanity check. Basically, a proxy collector to be able to
> get RCU like abilities using 100% standard C++.
bmaxa@Branimirs-Air Chris % time ./test1
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 249854
release_collect = 905
quiesce_complete = 250759
quiesce_begin = 250759
quiesce_complete_nodes = 92400000

Test Completed!

./test1 136.73s user 1.26s system 657% cpu 20.992 total

--
something dumb

Re: My experience with Apple M1 chip

<QrydnTqz5rfCr3j9nZ2dnUU7-N_NnZ2d@giganews.com>

https://www.novabbs.com/devel/article-flat.php?id=18448&group=comp.arch#18448

NNTP-Posting-Date: Tue, 06 Jul 2021 23:49:35 -0500
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
Organization: provalid.com
X-Newsreader: trn 4.0-test76 (Apr 2, 2001)
From: keg...@provalid.com (Kent Dickey)
Originator: kegs@provalid.com (Kent Dickey)
Message-ID: <QrydnTqz5rfCr3j9nZ2dnUU7-N_NnZ2d@giganews.com>
Date: Tue, 06 Jul 2021 23:49:35 -0500
Lines: 94
 by: Kent Dickey - Wed, 7 Jul 2021 04:49 UTC

In article <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>,
Michael S <already5chosen@yahoo.com> wrote:
>On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
>> I have a Mac Mini M1, and it seems fast--very fast for some workloads
>(hard to
>> predict branches, or working set in the 100-200KB range). It is not the
>> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
>> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
>> relatively short benchmark (say, one file, C or C++, can be run from the Unix
>> command line, doesn't require me to install anything else, should run in less
>> than 5 minutes), I can compile it and run it for you, and then you can
>compare
>> those results to any system you like. I don't think comparing optimized AVX
>> is going to be useful, but simple integer or floating point algorithms would
>> be best.
>>
>> Kent
>
>
>The Euler-413 challenge that we discussed today in comp.lang.c for
>nDigits=11 should run in about 3 minutes.
>But who is going to test on the fastest Zen3 ?
>I have Ryzen 7 5800H at work, it's pretty fast and with single thread
>easily blows away 4.25GHz Skylake, but it's much slower than likes of
>5900X.
>
>Code:
[snipped].

I compiled and ran it on my Mac Mini M1 with 8GB of memory (it's 3.2GHz,
I've measured it by reading the CPU timers, Apple doesn't publicly list
the frequency):

---
m1-mini-bash$ cc -O2 -o michaels1 michaels1.c
m1-mini-bash$ time ./michaels1 11
11 71101800

real 0m48.764s
user 0m48.507s
sys 0m0.071s
---

(I've also run it multiple times, later runs get as low as 48.493s real).

The same code on my iMac Pro (3.0GHz 10-core Intel Xeon W, Turbo to 4.5GHz,
thought to be Intel Xeon W-2150B, so not new, but not really old either, and
not a laptop CPU):

---
imacpro$ time ./michaels1 11
11 71101800

real 1m39.070s
user 1m38.530s
sys 0m0.305s
---

I tried compiling -O3, made no real difference (-O3 was slightly slower on
my Intel Mac).

So the Mac Mini M1 is almost exactly twice as fast as my iMac Pro for this
code.

I tried running 4 copies at once in the simplest way possible:

time ./michaels1 11 & time ./michaels1 11 & time ./michaels1 11 & \
time ./michaels1 11

Each run took about 52.08sec real time. No fans, I don't feel any heat.
On my iMac Pro, doing the same 4 runs, each run took about 1m41.5sec,
and the fans came on.

And I copied over the x86_64 binary which I compiled on my iMacPro to the
Mac Mini M1.

---
m1-mini-bash$ time ./michaels1x86 11
11 71101800

real 1m14.517s
user 1m13.896s
sys 0m0.287s
---

This is one of the biggest slowdowns I've seen with the JIT translation. It
still runs x86 faster than my actual x86.

Summary results:
Mac Mini M1: 48.764s
Mac Mini M1 JIT x86: 74.517s
iMac Pro 4.5GHz: 99.070s
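
For anyone who wants the summary as ratios rather than raw seconds, here is a small sketch. The dictionary keys are just labels chosen for this example; the times are the three "real" figures listed above.

```python
# Wall-clock times from the summary, in seconds.
times = {
    "M1 native": 48.764,
    "M1 translated x86": 74.517,
    "iMac Pro": 99.070,
}


def ratio(slower, faster):
    """How many times longer the `slower` run took than the `faster` one."""
    return times[slower] / times[faster]


print(f"iMac Pro vs M1 native:         {ratio('iMac Pro', 'M1 native'):.2f}x")
print(f"translated vs native on M1:    {ratio('M1 translated x86', 'M1 native'):.2f}x")
print(f"iMac Pro vs translated on M1:  {ratio('iMac Pro', 'M1 translated x86'):.2f}x")
```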

Kent

Re: My experience with Apple M1 chip

<UIKdnbTyjLEbq3j9nZ2dnUU7-f_NnZ2d@giganews.com>

https://www.novabbs.com/devel/article-flat.php?id=18450&group=comp.arch#18450
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!buffer2.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Wed, 07 Jul 2021 00:07:18 -0500
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
References: <muhEI.26121$P64.11471@fx47.iad> <sc2qtq$5la$1@gioia.aioe.org>
Organization: provalid.com
X-Newsreader: trn 4.0-test76 (Apr 2, 2001)
From: keg...@provalid.com (Kent Dickey)
Originator: kegs@provalid.com (Kent Dickey)
Message-ID: <UIKdnbTyjLEbq3j9nZ2dnUU7-f_NnZ2d@giganews.com>
Date: Wed, 07 Jul 2021 00:07:18 -0500
Lines: 89
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-JbGqaWHMtRaBJLZNC7jcB4nP7hd43J67UL4A+ixoKf0Cy750yo05CERqvJigHyo0dt2jrljRI2gs0Ki!p8TGqm3cnZQdzzNg+4giK7Yu4RsuEBRN5bi8Ki0uV5WzTEATMNUrZv7nBupH62faZrpQYJeRMXc=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 3480
 by: Kent Dickey - Wed, 7 Jul 2021 05:07 UTC

In article <sc2qtq$5la$1@gioia.aioe.org>,
Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>On 7/4/2021 5:05 AM, Branimir Maksimovic wrote:
>> Fantastic chip, blows away my 2700X in single thread.
>> 3950 scimark2 score vs 3050 2700X.
>> But, chip scales badly. Tried WCG and when running 2 threads no slowdown.
>> However with 3 threads slowdown is 25%, with 4 threads slowdown is ~50%.
>> So chip is actually like i3 with two cores+HT.
>> But it is much faster as single thread performance is fantastic.
>> What is puzzling is that 5-7 thread does not have such slowdown,
>> that is on low power cores performance loss per thread is more like 10-15%.
>>
>
>Can you try to run the following program, and post your results? C++17...
>
>https://pastebin.com/raw/CYZ78gVj
>
>Should compile right up. It's a poor man's RCU, with debugging variables
>turned on for a sanity check. Basically, a proxy collector to be able to
>get RCU like abilities using 100% standard C++.

This looks like a program which spawns dozens of threads. The M1 chip only
has 8 cores (4 fast, 4 slow), so I'm not really sure what this benchmark
shows. It may do better with fewer cores (less contention), or
it may do worse.

Here's my output:

---
m1-mini-bash$ c++ -O2 -std=c++17 -o thomassons.rcu thomassons.rcu.cpp
m1-mini-bash$ time ./thomassons.rcu
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 3
release_collect = 169
quiesce_complete = 172
quiesce_begin = 172
quiesce_complete_nodes = 92400000

Test Completed!

real 0m24.240s
user 3m11.190s
sys 0m0.857s
---

So 24.240 seconds real time.

I also ran it on my iMac Pro x86 (4.5GHz Turbo, 10 cores):

---
imacpro-bash$ clang++ -O2 -std=c++17 -o thomassons.rcu thomassons.rcu.cpp
imacpro-bash$ time ./thomassons.rcu
Chris M. Thomassons Proxy Collector Port ver .0.0.2...
_______________________________________

Booting threads...
Threads running...
Threads completed!

node_allocations = 92400000
node_deallocations = 92400000

dtor_collect = 11
release_collect = 231
quiesce_complete = 242
quiesce_begin = 242
quiesce_complete_nodes = 92400000

Test Completed!

real 0m27.370s
user 8m46.058s
sys 0m3.432s
---
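
One thing that can be read off the two transcripts above is roughly how many hardware threads were kept busy: divide CPU time (user + sys) by wall-clock time. The sketch below assumes the bash `time` format shown above; the helper names to_seconds and busy_threads are invented for this example.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` stamp such as '3m11.190s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def busy_threads(real, user, system):
    """Average number of hardware threads kept busy over the run."""
    return (to_seconds(user) + to_seconds(system)) / to_seconds(real)


# Figures copied from the two transcripts above.
print(f"Mac Mini M1: {busy_threads('0m24.240s', '3m11.190s', '0m0.857s'):.1f}")
print(f"iMac Pro:    {busy_threads('0m27.370s', '8m46.058s', '0m3.432s'):.1f}")
```

The M1 figure comes out close to its 8 cores. The iMac Pro figure is close to 20, which would fit 10 cores with Hyper-Threading enabled (an assumption, since the post only says 10 cores).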

Kent

Re: My experience with Apple M1 chip

<8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18453&group=comp.arch#18453
X-Received: by 2002:a05:620a:4110:: with SMTP id j16mr2368535qko.37.1625649309766;
Wed, 07 Jul 2021 02:15:09 -0700 (PDT)
X-Received: by 2002:a9d:5f19:: with SMTP id f25mr5915613oti.206.1625649309504;
Wed, 07 Jul 2021 02:15:09 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!4.us.feeder.erje.net!2.eu.feeder.erje.net!feeder.erje.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Jul 2021 02:15:09 -0700 (PDT)
In-Reply-To: <QrydnTqz5rfCr3j9nZ2dnUU7-N_NnZ2d@giganews.com>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <T6VEI.159$VU3.17@fx46.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<QrydnTqz5rfCr3j9nZ2dnUU7-N_NnZ2d@giganews.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 07 Jul 2021 09:15:09 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
 by: Michael S - Wed, 7 Jul 2021 09:15 UTC

On Wednesday, July 7, 2021 at 7:49:42 AM UTC+3, Kent Dickey wrote:
> In article <187875de-0cd7-4e6e...@googlegroups.com>,
> Michael S <already...@yahoo.com> wrote:
> >On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
> >> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> >> predict branches, or working set in the 100-200KB range). It is not the
> >> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
> >> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> >> relatively short benchmark (say, one file, C or C++, can be run from the Unix
> >> command line, doesn't require me to install anything else, should run in less
> >> than 5 minutes), I can compile it and run it for you, and then you can compare
> >> those results to any system you like. I don't think comparing optimized AVX
> >> is going to be useful, but simple integer or floating point algorithms would
> >> be best.
> >>
> >> Kent
> >
> >
> >The Euler-413 challenge that we discussed today in comp.lang.c for
> >nDigits=11 should run in about 3 minutes.
> >But who is going to test on the fastest Zen3 ?
> >I have Ryzen 7 5800H at work, it's pretty fast and with single thread
> >easily blows away 4.25GHz Skylake, but it's much slower than likes of
> >5900X.
> >
> >Code:
> [snipped].
>
> I compiled and ran it on my Mac Mini M1 with 8GB of memory (it's 3.2GHz,
> I've measured it by reading the CPU timers, Apple doesn't publicly list
> the frequency):
>
> ---
> m1-mini-bash$ cc -O2 -o michaels1 michaels1.c
> m1-mini-bash$ time ./michaels1 11
> 11 71101800
>
> real 0m48.764s
> user 0m48.507s
> sys 0m0.071s
> ---
>
> (I've also run it multiple times, later runs get as low as 48.493s real).
>

Fantastic.
It seems that power management on your Mac Mini is seriously better than on Branimir's MacBook Air.
Judging by this, you can run this program with nDigits=12 and it will take less than 25min.

> The same code on my iMac Pro (3.0GHz 10-core Intel Xeon W, Turbo to 4.5GHz,
> thought to be Intel Xeon W-2150B, so not new, but not really old either, and
> not a laptop CPU):
>
> ---
> imacpro$ time ./michaels1 11
> 11 71101800
>
> real 1m39.070s
> user 1m38.530s
> sys 0m0.305s
> ---
>

That's disappointing.
Since the 3 inner loop levels of this program run completely out of the L1D cache, I would
expect your Cascade Lake based Xeon-W to be an exact Hz-for-Hz match for my Coffee Lake Xeon-E.
So, I expected 1m15.640 / 4.5 * 4.25 = 1m11.440s.
But then I remembered that the compiler is not the same: mine is gcc and yours is clang (clang9 ?).
I recompiled with clang10.1 and it's indeed slower that way: 1m44.682s, i.e. Hz-for-Hz exactly the same as your result.
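
The Hz-for-Hz estimate above is just a time scaled by the ratio of two clock frequencies. A minimal sketch of that arithmetic follows. It assumes run time is inversely proportional to clock (plausible only because the hot loops stay in L1D) and that both machines held 4.25GHz and 4.5GHz for the whole run; scale_by_clock is a name invented here.

```python
def scale_by_clock(seconds, measured_ghz, target_ghz):
    """Predict run time at target_ghz from a time measured at measured_ghz,
    assuming run time is inversely proportional to clock frequency."""
    return seconds * measured_ghz / target_ghz


# gcc build on the 4.25GHz Xeon-E, projected to a 4.5GHz Xeon-W.
gcc_projected = scale_by_clock(75.640, 4.25, 4.5)
# clang 10 build on the same Xeon-E, projected the same way.
clang_projected = scale_by_clock(104.682, 4.25, 4.5)

print(f"gcc projected to 4.5GHz:   {gcc_projected:.2f}s")
print(f"clang projected to 4.5GHz: {clang_projected:.2f}s (iMac Pro measured 99.070s)")
```

The clang projection lands within about a quarter of a second of the 1m39.070s measured on the iMac Pro, which supports the compiler explanation.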

BTW, clang does a slightly better job on this code if you specify -march=native, -march=skx, -march=cascadelake, -march=skylake, -march=haswell, -march=sandybridge, -march=nehalem or -march=znver2. Any of these flags produces very similar output.

With any of them it's ~12% faster, which is still much slower than gcc.

> I tried compiling -O3, made no real difference (-O3 was slightly slower on
> my Intel Mac).
>
> So the Mac Mini M1 is almost exactly twice as fast as my iMac Pro for this
> code.
>
> I tried running 4 copies at once in the simplest way possible:
>
> time ./michaels1 11 & time ./michaels1 11 & time ./michaels1 11 & \
> time ./michaels1 11
>
> Each run took about 52.08sec real time. No fans, I don't feel any heat.

Again, fantastic. Seems like cooling in Mac Mini is very good. I think it's not "no fans" but rather "slow quiet fans".

> On my iMac Pro, doing the same 4 runs, each run took about 1m41.5sec,
> and the fans came on.
>

Frequency went down to 4.35GHz. Not too bad for Cascade Lake.
It would be more interesting to see what happens with all 10 cores running.

On my Xeon-E/gcc
1p - 75.592s
2p - 75.619s
3p - 75.700s
4p - 75.775s
5p - 75.991s
6p - 78.076s

But I did it slightly differently, with a script:
---- begin r6
./uut 11 &
./uut 11 &
./uut 11 &
./uut 11 &
./uut 11 &
./uut 11 &
wait
-- end r6
So, I have one copy of 'time' instead of 6.
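
The same "start N copies, wait for all, time the whole batch once" idea can be sketched in Python for any number of copies. This is an illustration only, not the script used for the numbers above; run_copies is a hypothetical helper, and ./uut is the binary name from the script.

```python
import subprocess
import time


def run_copies(cmd, copies):
    """Start `copies` instances of cmd at once, wait for all of them,
    and return (elapsed wall-clock seconds, list of exit codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes
```

For example, run_copies(["./uut", "11"], 6) would correspond to the r6 script: one timer around the whole batch instead of one `time` per copy.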

> And I copied over the x86_64 binary which I compiled on my iMacPro to the
> Mac Mini M1.
>
> ---
> m1-mini-bash$ time ./michaels1x86 11
> 11 71101800
>
> real 1m14.517s
> user 1m13.896s
> sys 0m0.287s
> ---
>
> This is one of the biggest slowdowns I've seen with the JIT translation. It
> still runs x86 code faster than my actual x86 machine.
>

Very interesting. Actually, for me it's the most interesting part.
A relatively bigger slowdown should probably be expected, because none of the time is spent in native libraries.

> Summary results:
> Mac Mini M1: 48.764s
> Mac Mini M1 JIT x86: 74.517s
> iMac Pro 4.5GHz: 99.070s
>
> Kent

Re: My experience with Apple M1 chip

<VveFI.461$qL.162@fx14.iad>

https://www.novabbs.com/devel/article-flat.php?id=18455&group=comp.arch#18455
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!peer02.ams4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx14.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad>
<80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
<187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<QrydnTqz5rfCr3j9nZ2dnUU7-N_NnZ2d@giganews.com>
<8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 154
Message-ID: <VveFI.461$qL.162@fx14.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 07 Jul 2021 09:31:33 UTC
Organization: usenet-news.net
Date: Wed, 07 Jul 2021 09:31:33 GMT
X-Received-Bytes: 6597
 by: Branimir Maksimovic - Wed, 7 Jul 2021 09:31 UTC

On 2021-07-07, Michael S <already5chosen@yahoo.com> wrote:
> On Wednesday, July 7, 2021 at 7:49:42 AM UTC+3, Kent Dickey wrote:
>> In article <187875de-0cd7-4e6e...@googlegroups.com>,
>> Michael S <already...@yahoo.com> wrote:
>> >On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
>> >> I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
>> >> predict branches, or working set in the 100-200KB range). It is not the
>> >> fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
>> >> at the AC plug it compares pretty favorably to 60W CPUs. If you have a
>> >> relatively short benchmark (say, one file, C or C++, can be run from the Unix
>> >> command line, doesn't require me to install anything else, should run in less
>> >> than 5 minutes), I can compile it and run it for you, and then you can compare
>> >> those results to any system you like. I don't think comparing optimized AVX
>> >> is going to be useful, but simple integer or floating point algorithms would
>> >> be best.
>> >>
>> >> Kent
>> >
>> >
>> >The Euler-413 challenge that we discussed today in comp.lang.c for
>> >nDigits=11 should run in about 3 minutes.
>> >But who is going to test on the fastest Zen3 ?
>> >I have Ryzen 7 5800H at work, it's pretty fast and with single thread
>> >easily blows away 4.25GHz Skylake, but it's much slower than likes of
>> >5900X.
>> >
>> >Code:
>> [snipped].
>>
>> I compiled and ran it on my Mac Mini M1 with 8GB of memory (it's 3.2GHz,
>> I've measured it by reading the CPU timers, Apple doesn't publicly list
>> the frequency):
>>
>> ---
>> m1-mini-bash$ cc -O2 -o michaels1 michaels1.c
>> m1-mini-bash$ time ./michaels1 11
>> 11 71101800
>>
>> real 0m48.764s
>> user 0m48.507s
>> sys 0m0.071s
>> ---
>>
>> (I've also run it multiple times, later runs get as low as 48.493s real).
>>
>
> Fantastic.
> It seems that power management on your Mac Mini is seriously better than on Branimir's MacBook Air.
> Judging by this, you can run this program with nDigits=12 and it will take less than 25min.
Now I get:
bmaxa@Branimirs-Air euler % time ./euler413 11
11 71101800
./euler413 11 48.99s user 0.12s system 99% cpu 49.446 total
My CPU is busy non-stop :P
It will probably be even faster if I shut down everything else that is running :P

>
>> The same code on my iMac Pro (3.0GHz 10-core Intel Xeon W, Turbo to 4.5GHz,
>> thought to be Intel Xeon W-2150B, so not new, but not really old either, and
>> not a laptop CPU):
>>
>> ---
>> imacpro$ time ./michaels1 11
>> 11 71101800
>>
>> real 1m39.070s
>> user 1m38.530s
>> sys 0m0.305s
>> ---
>>
>
> That's disappointing.
> Since the 3 inner loop levels of this program run completely out of the L1D cache, I would
> expect your Cascade Lake based Xeon-W to be an exact Hz-for-Hz match for my Coffee Lake Xeon-E.
> So, I expected 1m15.640 / 4.5 * 4.25 = 1m11.440s.
> But then I remembered that the compiler is not the same: mine is gcc and yours is clang (clang9 ?).
> I recompiled with clang10.1 and it's indeed slower that way: 1m44.682s, i.e. Hz-for-Hz exactly the same as your result.
>
> BTW, clang does a slightly better job on this code if you specify -march=native, -march=skx, -march=cascadelake, -march=skylake, -march=haswell, -march=sandybridge, -march=nehalem or -march=znver2. Any of these flags produces very similar output.
>
> With any of them it's ~12% faster, which is still much slower than gcc.
>
>> I tried compiling -O3, made no real difference (-O3 was slightly slower on
>> my Intel Mac).
>>
>> So the Mac Mini M1 is almost exactly twice as fast as my iMac Pro for this
>> code.
>>
>> I tried running 4 copies at once in the simplest way possible:
>>
>> time ./michaels1 11 & time ./michaels1 11 & time ./michaels1 11 & \
>> time ./michaels1 11
>>
>> Each run took about 52.08sec real time. No fans, I don't feel any heat.
>
> Again, fantastic. Seems like cooling in Mac Mini is very good. I think it's not "no fans" but rather "slow quiet fans".
>
>> On my iMac Pro, doing the same 4 runs, each run took about 1m41.5sec,
>> and the fans came on.
>>
>
> Frequency went down to 4.35GHz. Not too bad for Cascade Lake.
> It would be more interesting to see what happens with all 10 cores running.
>
> On my Xeon-E/gcc
> 1p - 75.592s
> 2p - 75.619s
> 3p - 75.700s
> 4p - 75.775s
> 5p - 75.991s
> 6p - 78.076s
>
> But I did it slightly differently, with a script:
> ---- begin r6
> ./uut 11 &
> ./uut 11 &
> ./uut 11 &
> ./uut 11 &
> ./uut 11 &
> ./uut 11 &
> wait
> -- end r6
> So, I have one copy of 'time' instead of 6.
>
>> And I copied over the x86_64 binary which I compiled on my iMacPro to the
>> Mac Mini M1.
>>
>> ---
>> m1-mini-bash$ time ./michaels1x86 11
>> 11 71101800
>>
>> real 1m14.517s
>> user 1m13.896s
>> sys 0m0.287s
>> ---
>>
>> This is one of the biggest slowdowns I've seen with the JIT translation. It
>> still runs x86 code faster than my actual x86 machine.
bmaxa@Branimirs-Air euler % g++ -O3 euler413.c -o euler413 -target x86_64-apple-darwin
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
bmaxa@Branimirs-Air euler % time ./euler413 11
11 71101800
./euler413 11 80.40s user 0.20s system 99% cpu 1:20.95 total
And this is with a busy CPU :P
But I got a better time with -O2:
bmaxa@Branimirs-Air euler % gcc -O2 euler413.c -o euler413 -target x86_64-apple-darwin
bmaxa@Branimirs-Air euler % time ./euler413 11
11 71101800
./euler413 11 75.92s user 0.18s system 96% cpu 1:18.69 total

--
something dumb

Re: My experience with Apple M1 chip

<2021Jul7.122809@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=18457&group=comp.arch#18457
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
Date: Wed, 07 Jul 2021 10:28:09 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 35
Message-ID: <2021Jul7.122809@mips.complang.tuwien.ac.at>
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com> <zQ5FI.779$VU3.610@fx46.iad>
Injection-Info: reader02.eternal-september.org; posting-host="117eb1bd3d54c618d512f5c15e0b9b13";
logging-data="24197"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+h/PTEdWtd14wOzk6q+Xqa"
Cancel-Lock: sha1:wEmBB7UGQzj4iI/pV7yBs2nlpvo=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Wed, 7 Jul 2021 10:28 UTC

Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
>
>bmaxa@Branimirs-Air euler % time ./euler413 11
>11 71101800
>./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
>
>
>Minute on M1...
>
>> But who is going to test on the fastest Zen3 ?
>
>Someone with fastest Zen3? :P

On a 5800X (and the code compiled with gcc-10.2 -O3):

perf stat -e cycles a.out 11
11 71101800

Performance counter stats for 'a.out 11':

236931852739 cycles

49.974309558 seconds time elapsed

49.971624000 seconds user
0.000000000 seconds sys

The 5800X runs this benchmark at 4741MHz (on average), slightly above
the turbo frequency of 4700MHz.
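
The 4741MHz average follows directly from the two perf stat figures above: cycles divided by elapsed time. A short sketch of that division (average_mhz is a name chosen for this example):

```python
def average_mhz(cycles, seconds):
    """Average core clock in MHz over a run, from a cycle count and elapsed time."""
    return cycles / seconds / 1e6


# Figures from the perf stat output above.
cycles = 236_931_852_739
elapsed_s = 49.974309558

print(f"average clock: {average_mhz(cycles, elapsed_s):.0f} MHz")
```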

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: My experience with Apple M1 chip

<memo.20210707115529.6784V@jgd.cix.co.uk>

https://www.novabbs.com/devel/article-flat.php?id=18458&group=comp.arch#18458
Path: i2pn2.org!rocksolid2!news.neodome.net!news.mixmin.net!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: jgd...@cix.co.uk (John Dallman)
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
Date: Wed, 7 Jul 2021 11:55 +0100 (BST)
Organization: A noiseless patient Spider
Lines: 10
Message-ID: <memo.20210707115529.6784V@jgd.cix.co.uk>
References: <8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>
Reply-To: jgd@cix.co.uk
Injection-Info: reader02.eternal-september.org; posting-host="c72ec11a06505549f1a61a74645d26a4";
logging-data="13947"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+GJt+DgIZGWLZvHjmiAbQq1PdQKjukg8U="
Cancel-Lock: sha1:0lg8fUok3NL0W2AUEFf4GvHziIQ=
 by: John Dallman - Wed, 7 Jul 2021 10:55 UTC

In article <8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>,
already5chosen@yahoo.com (Michael S) wrote:

> It seems that power management on your Mac Mini is seriously better
> than on Branimir's MacBook Air.

Mac Minis have mains power, and fans. If you're going to run anything
computationally intensive on an M1, they're the best option at present.

John

Re: My experience with Apple M1 chip

<YigFI.3408$Nq7.1210@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=18461&group=comp.arch#18461
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!feeder.usenetexpress.com!tr1.eu1.usenetexpress.com!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com> <zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 52
Message-ID: <YigFI.3408$Nq7.1210@fx33.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 07 Jul 2021 11:34:16 UTC
Organization: usenet-news.net
Date: Wed, 07 Jul 2021 11:34:16 GMT
X-Received-Bytes: 2695
 by: Branimir Maksimovic - Wed, 7 Jul 2021 11:34 UTC

On 2021-07-07, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>>> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
>>
>>bmaxa@Branimirs-Air euler % time ./euler413 11
>>11 71101800
>>./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
>>
>>
>>Minute on M1...
>>
>>> But who is going to test on the fastest Zen3 ?
>>
>>Someone with fastest Zen3? :P
>
> On a 5800X (and the code compiled with gcc-10.2 -O3):
>
> perf stat -e cycles a.out 11
> 11 71101800
>
> Performance counter stats for 'a.out 11':
>
> 236931852739 cycles
>
> 49.974309558 seconds time elapsed
>
> 49.971624000 seconds user
> 0.000000000 seconds sys
>
> The 5800X runs this benchmark at 4741MHz (on average), slightly above
> the turbo frequency of 4700MHz.
>
> - anton
Please can you run scimark2?
https://math.nist.gov/scimark2/download.html
bmaxa@Branimirs-Air scimark2 % gcc-11 -O3 *.c -o scimark2
bmaxa@Branimirs-Air scimark2 % ./scimark2
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 3993.31
FFT Mflops: 3299.58 (N=1024)
SOR Mflops: 3191.51 (100 x 100)
MonteCarlo: Mflops: 445.95
Sparse matmult Mflops: 4330.37 (N=1000, nz=5000)
LU Mflops: 8699.13 (M=100, N=100)

--
something dumb

Re: My experience with Apple M1 chip

<LjgFI.3409$Nq7.1364@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=18462&group=comp.arch#18462
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsreader4.netcologne.de!news.netcologne.de!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>
<memo.20210707115529.6784V@jgd.cix.co.uk>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 18
Message-ID: <LjgFI.3409$Nq7.1364@fx33.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 07 Jul 2021 11:35:07 UTC
Organization: usenet-news.net
Date: Wed, 07 Jul 2021 11:35:07 GMT
X-Received-Bytes: 1156
 by: Branimir Maksimovic - Wed, 7 Jul 2021 11:35 UTC

On 2021-07-07, John Dallman <jgd@cix.co.uk> wrote:
> In article <8eb074ff-9443-4fb6-90e7-33e7c749a0fen@googlegroups.com>,
> already5chosen@yahoo.com (Michael S) wrote:
>
>> It seems that power management on your Mac Mini is seriously better
>> than on Branimir's MacBook Air.
>
> Mac Minis have mains power, and fans. If you're going to run anything
> computationally intensive on an M1, they're the best option at present.
>

Of course, who would run WCG 24/7 on a fanless laptop :P

> John

--
something dumb

Re: My experience with Apple M1 chip

<c4Cdnfwkbpj0LHj9nZ2dnUU7-f_NnZ2d@giganews.com>

https://www.novabbs.com/devel/article-flat.php?id=18465&group=comp.arch#18465
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!buffer2.nntp.dca1.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Wed, 07 Jul 2021 08:51:37 -0500
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad> <zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at> <YigFI.3408$Nq7.1210@fx33.iad>
Organization: provalid.com
X-Newsreader: trn 4.0-test76 (Apr 2, 2001)
From: keg...@provalid.com (Kent Dickey)
Originator: kegs@provalid.com (Kent Dickey)
Message-ID: <c4Cdnfwkbpj0LHj9nZ2dnUU7-f_NnZ2d@giganews.com>
Date: Wed, 07 Jul 2021 08:51:37 -0500
Lines: 94
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-L163I1riY048jo4mcHTRzjbI0PYqaEm7EJzq66/RIk2T8bal7sFUyJCOnI5bfzduxEiT74E4CNRUYas!H4FuV2PqaUR+AU37zXhBAbYGGF+y8g0s3f98meyutD95nHEBho79pE6mlNcAYeqwcMt3LYM2n4w=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Original-Bytes: 4964
 by: Kent Dickey - Wed, 7 Jul 2021 13:51 UTC

In article <YigFI.3408$Nq7.1210@fx33.iad>,
Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
>Please can you run scimark2?
>https://math.nist.gov/scimark2/download.html
>bmaxa@Branimirs-Air scimark2 % gcc-11 -O3 *.c -o scimark2
>bmaxa@Branimirs-Air scimark2 % ./scimark2
>** **
>** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>** for details. (Results can be submitted to pozo@nist.gov) **
>** **
>Using 2.00 seconds min time per kenel.
>Composite Score: 3993.31
>FFT Mflops: 3299.58 (N=1024)
>SOR Mflops: 3191.51 (100 x 100)
>MonteCarlo: Mflops: 445.95
>Sparse matmult Mflops: 4330.37 (N=1000, nz=5000)
>LU Mflops: 8699.13 (M=100, N=100)
>
>
>--
>something dumb

I get different numbers on my Mac Mini M1 using the default clang compiler:

---
m1-mini-bash$ unzip scimark2_1c.zip
[ snip ]
m1-mini-bash$ cc -O3 -o scimark2.O3 *.c
m1-mini-bash$ ./scimark2.O3
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **
Using 2.00 seconds min time per kenel.
Composite Score: 4008.07
FFT Mflops: 2972.29 (N=1024)
SOR Mflops: 2330.35 (100 x 100)
MonteCarlo: Mflops: 475.59
Sparse matmult Mflops: 3992.39 (N=1000, nz=5000)
LU Mflops: 10269.74 (M=100, N=100)

The MonteCarlo result is low, relative to my x86 machine.

There's a scimark4 benchmark which tries to avoid over-optimizing compilers
(I guess this benchmark didn't use all of its results sometimes):

I get basically the same results.

---
m1-mini-bash$ unzip ../scimark4_c.zip
[ snip ]
m1-mini-bash$ cd scimark4/
m1-mini-bash$ make
Called make:
cc -O3 -funroll-loops -Wall -pedantic -ansi -c FFT.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c kernel.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c Stopwatch.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c Random.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c SOR.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c SparseCompRow.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c array.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c MonteCarlo.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c LU.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -c scimark4.c
cc -O3 -funroll-loops -Wall -pedantic -ansi -o scimark4 FFT.o kernel.o Stopwatch.o Random.o SOR.o SparseCompRow.o array.o MonteCarlo.o LU.o scimark4.o -lm
clang: warning: argument unused during compilation: '-ansi' [-Wunused-command-line-argument]
m1-mini-bash$ ./scimark4
** **
** SciMark4 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **
Using 2.00 seconds min time per kenel.

FFT Mflops: 2999.29 (N=1024)
SOR Mflops: 2330.24 (100 x 100)
MonteCarlo: Mflops: 475.55
Sparse matmult Mflops: 3994.09 (N=1000, nz=5000)
LU Mflops: 10160.98 (M=100, N=100)

************************************
Composite Score: 3992.03
************************************

FFT reps: 131072
SOR reps: 131072
Montel Carlo reps: 268435456
Sparse MatMult repss: 1048576
LU reps: 32768

checksum: 4.3411644672159821e+05
---

Kent

Re: My experience with Apple M1 chip

<OskFI.3416$Nq7.1937@fx33.iad>

https://www.novabbs.com/devel/article-flat.php?id=18470&group=comp.arch#18470
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx33.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad> <zQ5FI.779$VU3.610@fx46.iad>
<2021Jul7.122809@mips.complang.tuwien.ac.at> <YigFI.3408$Nq7.1210@fx33.iad>
<c4Cdnfwkbpj0LHj9nZ2dnUU7-f_NnZ2d@giganews.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 80
Message-ID: <OskFI.3416$Nq7.1937@fx33.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Wed, 07 Jul 2021 16:17:50 UTC
Organization: usenet-news.net
Date: Wed, 07 Jul 2021 16:17:50 GMT
X-Received-Bytes: 3717
 by: Branimir Maksimovic - Wed, 7 Jul 2021 16:17 UTC

On 2021-07-07, Kent Dickey <kegs@provalid.com> wrote:
> In article <YigFI.3408$Nq7.1210@fx33.iad>,
> Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
>>Please can you run scimark2?
>>https://math.nist.gov/scimark2/download.html
>>bmaxa@Branimirs-Air scimark2 % gcc-11 -O3 *.c -o scimark2
>>bmaxa@Branimirs-Air scimark2 % ./scimark2
>>** **
>>** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>** for details. (Results can be submitted to pozo@nist.gov) **
>>** **
>>Using 2.00 seconds min time per kenel.
>>Composite Score: 3993.31
>>FFT Mflops: 3299.58 (N=1024)
>>SOR Mflops: 3191.51 (100 x 100)
>>MonteCarlo: Mflops: 445.95
>>Sparse matmult Mflops: 4330.37 (N=1000, nz=5000)
>>LU Mflops: 8699.13 (M=100, N=100)
>>
>>
>>--
>>something dumb
>
> I get different numbers on my Mac Mini M1 using the default clang compiler:
>
> ---
> m1-mini-bash$ unzip scimark2_1c.zip
> [ snip ]
> m1-mini-bash$ cc -O3 -o scimark2.O3 *.c
> m1-mini-bash$ ./scimark2.O3
> ** **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov) **
> ** **
> Using 2.00 seconds min time per kenel.
> Composite Score: 4008.07
> FFT Mflops: 2972.29 (N=1024)
> SOR Mflops: 2330.35 (100 x 100)
> MonteCarlo: Mflops: 475.59
> Sparse matmult Mflops: 3992.39 (N=1000, nz=5000)
> LU Mflops: 10269.74 (M=100, N=100)
>
> The MonteCarlo result is low, relative to my x86 machine.

That is a quirk of clang/gcc.
My Rust bench:
https://github.com/bmaxa/scimark2rust

bmaxa@Branimirs-Air scimark2rust % ./target/release/scimark2
** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **
Using 2.00 seconds min time per kernel
Composite Score: 3228.17
FFT Mflops: 2325.65 (N=1024)
SOR Mflops: 963.48 (100 x 100)
MonteCarlo Mflops: 2147.48
Sparse matmult Mflops: 5242.88 (N=1000, nz=5000)
LU Nflops: 5461.33 (M=100, N=100)

Overall lower score, but Monte Carlo is high.

>
> There's a scimark4

I know, it has unoptimised versions of the files...

> benchmark which tries to avoid over-optimizing compilers
> (I guess this benchmark didn't use all of it's results sometimes):
>
> I get basically the same results.

The default make used the same versions as in scimark2.
>
>

--
something dumb

Re: My experience with Apple M1 chip

<c5a7429b-b7e6-4fbc-ac98-bf5160c3a87dn@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18476&group=comp.arch#18476

Newsgroups: comp.arch
X-Received: by 2002:a05:620a:1026:: with SMTP id a6mr26310181qkk.331.1625684057409;
Wed, 07 Jul 2021 11:54:17 -0700 (PDT)
X-Received: by 2002:a4a:d781:: with SMTP id c1mr19334913oou.23.1625684057173;
Wed, 07 Jul 2021 11:54:17 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Jul 2021 11:54:17 -0700 (PDT)
In-Reply-To: <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c5a7429b-b7e6-4fbc-ac98-bf5160c3a87dn@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 07 Jul 2021 18:54:17 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 270
 by: Michael S - Wed, 7 Jul 2021 18:54 UTC

On Wednesday, July 7, 2021 at 2:16:50 AM UTC+3, Michael S wrote:
> On Tuesday, July 6, 2021 at 10:08:27 PM UTC+3, Kent Dickey wrote:
> > In article <80ac50a0-dde2-4a66...@googlegroups.com>,
> > Michael S <already...@yahoo.com> wrote:
> > >On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
> > >> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> > >> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
> > >fast as i3-8100B but have trouble believing that it could match i7-8700B
> > >either in single thread or in multithread throughput. Unless, of course,
> > >absolute majority of run time spent in native libraries.
> > >> You don't count that M1 is ~25-33% faster single core than any x86 :P
> > >
> > >I took it into account.
> > >
> > >Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
> > >for *any* x86.
> > >M1 is slower than top Zen3 bins and about the same or a little slower
> > >than top Comet Lake.
> > >Probably somewhat slower than top Tiger Lake, but that comparison is
> > >rather close.
> > >Probably, measurably slower than top Rocket Lake, but I didn't look at
> > >Rocket Lake closely.
> > I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> > predict branches, or working set in the 100-200KB range). It is not the
> > fastest CPU on the planet, but it likely is the fastest laptop CPU. At < 10W
> > at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> > relatively short benchmark (say, one file, C or C++, can be run from the Unix
> > command line, doesn't require me to install anything else, should run in less
> > than 5 minutes), I can compile it and run it for you, and then you can compare
> > those results to any system you like. I don't think comparing optimized AVX
> > is going to be useful, but simple integer or floating point algorithms would
> > be best.
> >
> > Kent
> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
> But who is going to test on the fastest Zen3 ?
> I have Ryzen 7 5800H at work, it's pretty fast and with single thread easily blows away 4.25GHz Skylake, but it's much slower than likes of 5900X.
>
> Code:
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> static unsigned long long oneChildsInRange(int nDigits);
> int main(int argz, char** argv)
> {
>   if (argz < 2) {
>     fprintf(stderr, "Usage:\n%s nDigits\n", argv[0]);
>     return 1;
>   }
>
>   char* endp;
>   int nDigits = strtol(argv[1], &endp, 0);
>   if (endp == argv[1]) {
>     fprintf(stderr, "Bad nDigits argument '%s'. Not a number.\n", argv[1]);
>     return 1;
>   }
>
>   if (nDigits < 5 || nDigits > 19) {
>     fprintf(stderr, "Please specify nDigits argument in range [5:19].\n");
>     return 1;
>   }
>
>   printf("%2d %20llu\n", nDigits, oneChildsInRange(nDigits));
>   return 0;
> }
>
> static int countChilds(const uint8_t *digits, int nDigits, const uint8_t *remTab)
> {
>   int nChilds = 0;
>   for (int beg = 0; beg < nDigits; ++beg) {
>     unsigned r = 0;
>     for (int end = beg; end < nDigits; ++end) {
>       r = remTab[r*10+digits[end]];
>       nChilds += (r == 0);
>     }
>     if (nChilds > 1)
>       break; // we don't try to distinguish between cases of (nChilds > 1)
>   }
>   return nChilds;
> }
>
> static int countChilds2(
>   const uint8_t *prefix, int prefixlen,
>   const uint8_t *suffix, int suffixlen,
>   const uint8_t *remTab,
>   int nChilds0)
> {
>   int nChilds = nChilds0;
>   for (int prefix_i = 0; prefix_i < prefixlen; ++prefix_i) {
>     unsigned r = prefix[prefix_i];
>     for (int i = 0; i < suffixlen; ++i) {
>       r = remTab[r*10+suffix[i]];
>       nChilds += (r == 0);
>     }
>     if (nChilds > 1)
>       break; // we don't try to distinguish between cases of (nChilds > 1)
>   }
>   return nChilds;
> }
>
> static void preprocessPrefix(uint8_t *dst, const uint8_t *src, int nDigits, const uint8_t *remTab)
> {
>   for (int beg = 0; beg < nDigits; ++beg) {
>     unsigned r = 0;
>     for (int end = beg; end < nDigits; ++end)
>       r = remTab[r*10+src[end]];
>     dst[beg] = r;
>   }
> }
>
>
> static unsigned long long intpow(unsigned base, int pow)
> {
>   unsigned long long prod = 1;
>   for (int k = 0; k < pow; ++k)
>     prod *= base;
>   return prod;
> }
>
> static void to_digits(uint8_t* dst, unsigned long long x, int nDigits)
> {
>   for (int k = 0; k < nDigits; ++k) {
>     dst[nDigits-1-k] = x % 10;
>     x /= 10;
>   }
> }
>
> static unsigned long long oneChildsInRange(int nDigits)
> {
>   // initialize look-up table
>   uint8_t remTab[200];
>   for (int i = 0; i < nDigits*10; ++i)
>     remTab[i] = i % nDigits;
>
>   // initialize table of suffixes
>   uint8_t suffixes[10000][4];
>   int nSuff0 = 0;
>   int nSuff1 = 0;
>   for (int i = 0; i < 10000; ++i) {
>     uint8_t suffix[4];
>     to_digits(suffix, i, 4); // convert suffix to array of digits
>     int nc = countChilds(suffix, 4, remTab);
>     if (nc < 2) {
>       if (nc == 0) {
>         memcpy(suffixes[nSuff0], suffix, sizeof(suffixes[0]));
>         ++nSuff0;
>       } else { // nc==1
>         memcpy(suffixes[9999-nSuff1], suffix, sizeof(suffixes[0])); // store starting from the end of array
>         ++nSuff1;
>       }
>     }
>   }
>   if (nSuff1 > 0) // make suffixes[] array continuous
>     memmove(suffixes[nSuff0], suffixes[10000-nSuff1], nSuff1*sizeof(suffixes[0]));
>
>   unsigned long long cnt = 0;
>   unsigned long long pref0 = intpow(10, nDigits-5);
>   for (unsigned long long pref = pref0; pref < pref0*10; ++pref) {
>     uint8_t prefix[20];
>     to_digits(prefix, pref, nDigits-4); // convert prefix to array of digits
>     int nc = countChilds(prefix, nDigits-4, remTab);
>     if (nc < 2) {
>       uint8_t processed_prefix[20];
>       preprocessPrefix(processed_prefix, prefix, nDigits-4, remTab);
>       for (int i = 0; i < nSuff0; ++i) // concatenate suffix with 0 children to prefix with 0 or 1 children
>         cnt += (countChilds2(processed_prefix, nDigits-4, suffixes[i], 4, remTab, nc)==1);
>       if (nc == 0) {
>         for (int i = nSuff0; i < nSuff0+nSuff1; ++i) // concatenate suffix with 1 child to prefix with 0 children
>           cnt += (countChilds2(processed_prefix, nDigits-4, suffixes[i], 4, remTab, 1)==1);
>       }
>     }
>   }
>
>   return cnt;
> }

In the meantime I simplified and improved this program.
With the new variant, nDigits=11 is no longer interesting as a benchmark (too fast), but nDigits=12 and nDigits=13 are now well suited.
On my Xeon-E they took, respectively, 9m47.099s and 1m50.575s.
Code:

//-- beg
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long oneChildsInRange(int nDigits);
int main(int argz, char** argv)
{
  if (argz < 2) {
    fprintf(stderr, "Usage:\n%s nDigits\n", argv[0]);
    return 1;
  }

  char* endp;
  int nDigits = strtol(argv[1], &endp, 0);
  if (endp == argv[1]) {
    fprintf(stderr, "Bad nDigits argument '%s'. Not a number.\n", argv[1]);
    return 1;
  }

  if (nDigits < 11 || nDigits > 19) {
    fprintf(stderr, "Please specify nDigits argument in range [11:19].\n");
    return 1;
  }

  printf("%2d %20llu\n", nDigits, oneChildsInRange(nDigits));
  return 0;
}

typedef struct {
  uint8_t isChildTab[19+10]; // [i] = i % nDigits == 0
  uint8_t x10remTab [19+10]; // [i] = (10*i) % nDigits
} tabs_t;

static unsigned long long countChildsRecursive(
  int prefix_nChilds, // 0 or 1
  const uint8_t prefixRem[],
  int prefixlen,
  const tabs_t* tabs)
{
  unsigned long long cnt = 0;
  for (int suffix = prefix_nChilds; suffix < 10; ++suffix) {
    int nChilds = suffix ? prefix_nChilds : 1;
    const uint8_t *isChild = &tabs->isChildTab[suffix];
    for (int i = 0; i < prefixlen; ++i)
      nChilds += isChild[prefixRem[i]];

    if (nChilds < 2) {
      if (tabs->isChildTab[prefixlen+1]) { // all digits processed
        cnt += nChilds;
      } else {
        // extend prefix
        uint8_t prefixRemEx[20];
        for (int i = 0; i < prefixlen; ++i)
          prefixRemEx[i] = tabs->x10remTab[prefixRem[i]+suffix];
        prefixRemEx[prefixlen] = tabs->x10remTab[suffix];
        cnt += countChildsRecursive(nChilds, prefixRemEx, prefixlen+1, tabs);
      }
    }
  }
  return cnt;
}


Re: My experience with Apple M1 chip

<292a5785-5c1e-4345-b9d2-dad5aefb5817n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18477&group=comp.arch#18477

Newsgroups: comp.arch
X-Received: by 2002:a05:622a:1a1d:: with SMTP id f29mr24026214qtb.200.1625684174004;
Wed, 07 Jul 2021 11:56:14 -0700 (PDT)
X-Received: by 2002:a9d:5603:: with SMTP id e3mr14442882oti.178.1625684173811;
Wed, 07 Jul 2021 11:56:13 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Wed, 7 Jul 2021 11:56:13 -0700 (PDT)
In-Reply-To: <2021Jul7.122809@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=87.68.182.191; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 87.68.182.191
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <292a5785-5c1e-4345-b9d2-dad5aefb5817n@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Wed, 07 Jul 2021 18:56:13 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Wed, 7 Jul 2021 18:56 UTC

On Wednesday, July 7, 2021 at 1:34:55 PM UTC+3, Anton Ertl wrote:
> Branimir Maksimovic <branimir....@gmail.com> writes:
> >> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
> >
> >bmaxa@Branimirs-Air euler % time ./euler413 11
> >11 71101800
> >./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
> >
> >
> >Minute on M1...
> >
> >> But who is going to test on the fastest Zen3 ?
> >
> >Someone with fastest Zen3? :P
> On a 5800X (and the code compiled with gcc-10.2 -O3):
>
> perf stat -e cycles a.out 11
> 11 71101800
>
> Performance counter stats for 'a.out 11':
>
> 236931852739 cycles
>
> 49.974309558 seconds time elapsed
>
> 49.971624000 seconds user
> 0.000000000 seconds sys
>
> The 5800X runs this benchmark at 4741MHz (on average), slightly above
> the turbo frequency of 4700MHz.
>

5800X is certainly fast, but according to my understanding, 5900X and 5950X are faster.

> - anton
> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: My experience with Apple M1 chip

<2021Jul8.084725@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=18490&group=comp.arch#18490

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
Date: Thu, 08 Jul 2021 06:47:25 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 42
Message-ID: <2021Jul8.084725@mips.complang.tuwien.ac.at>
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com> <zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at> <YigFI.3408$Nq7.1210@fx33.iad>
Injection-Info: reader02.eternal-september.org; posting-host="4046eb382dfb208213332e8b4156df24";
logging-data="29361"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19C2RQRJa2kOBuWZCOvKl0e"
Cancel-Lock: sha1:02jCwyNhgPWoIWAG5nyqmxr05Yg=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 8 Jul 2021 06:47 UTC

Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>Please can you run scimark2?
>https://math.nist.gov/scimark2/download.html
>bmaxa@Branimirs-Air scimark2 % gcc-11 -O3 *.c -o scimark2
>bmaxa@Branimirs-Air scimark2 % ./scimark2
>** **
>** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>** for details. (Results can be submitted to pozo@nist.gov) **
>** **
>Using 2.00 seconds min time per kenel.
>Composite Score: 3993.31
>FFT Mflops: 3299.58 (N=1024)
>SOR Mflops: 3191.51 (100 x 100)
>MonteCarlo: Mflops: 445.95
>Sparse matmult Mflops: 4330.37 (N=1000, nz=5000)
>LU Mflops: 8699.13 (M=100, N=100)

A few variations:

Ryzen 7 5800x
https://math.nist.gov/scimark2/scimark2_1c.zip
gcc -O3 *.c -lm -o scimark2-gcc10 or clang -O3 *.c -lm -o scimark2-clang11

** **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov) **
** **
Using 2.00 seconds min time per kenel.
gcc10 clang11
Composite Score: 3938.53 3969.17
FFT Mflops: 3534.28 3308.27 (N=1024)
SOR Mflops: 2792.49 2797.77 (100 x 100)
MonteCarlo: Mflops: 864.78 853.07
Sparse matmult Mflops: 4646.79 5167.34 (N=1000, nz=5000)
LU Mflops: 7854.31 7719.41 (M=100, N=100)

The M1 is pretty good. Too bad it's only available in Apple products.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: My experience with Apple M1 chip

<2021Jul8.091724@mips.complang.tuwien.ac.at>

https://www.novabbs.com/devel/article-flat.php?id=18492&group=comp.arch#18492

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: ant...@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: My experience with Apple M1 chip
Date: Thu, 08 Jul 2021 07:17:24 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 43
Message-ID: <2021Jul8.091724@mips.complang.tuwien.ac.at>
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com> <zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at> <292a5785-5c1e-4345-b9d2-dad5aefb5817n@googlegroups.com>
Injection-Info: reader02.eternal-september.org; posting-host="4046eb382dfb208213332e8b4156df24";
logging-data="29361"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18wQQ2U1f5YwggWAaZ9fwGn"
Cancel-Lock: sha1:vHoGk2YlI2KAF+nfaFGoqc5qM5w=
X-newsreader: xrn 10.00-beta-3
 by: Anton Ertl - Thu, 8 Jul 2021 07:17 UTC

Michael S <already5chosen@yahoo.com> writes:
>On Wednesday, July 7, 2021 at 1:34:55 PM UTC+3, Anton Ertl wrote:
>> Branimir Maksimovic <branimir....@gmail.com> writes:
>> >> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
>> >
>> >bmaxa@Branimirs-Air euler % time ./euler413 11
>> >11 71101800
>> >./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
>> >
>> >
>> >Minute on M1...
>> >
>> >> But who is going to test on the fastest Zen3 ?
>> >
>> >Someone with fastest Zen3? :P
>> On a 5800X (and the code compiled with gcc-10.2 -O3):
>>
>> perf stat -e cycles a.out 11
>> 11 71101800
>>
>> Performance counter stats for 'a.out 11':
>>
>> 236931852739 cycles
>>
>> 49.974309558 seconds time elapsed
>>
>> 49.971624000 seconds user
>> 0.000000000 seconds sys
>>
>> The 5800X runs this benchmark at 4741MHz (on average), slightly above
>> the turbo frequency of 4700MHz.
>>
>
>5800X is certainly fast, but according to my understanding, 5900X and 5950X are faster.

The 5950X has 4900MHz official boost clock, so one can expect ~47.x s
for this binary (unless it is L3 or memory bound, in which case I
would not expect a speedup from the 5950X).

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Re: My experience with Apple M1 chip

<fbfc0035-9ce6-423a-ba60-a2fdc27ab080n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18493&group=comp.arch#18493

Newsgroups: comp.arch
X-Received: by 2002:ac8:5cd5:: with SMTP id s21mr26791113qta.192.1625729995883; Thu, 08 Jul 2021 00:39:55 -0700 (PDT)
X-Received: by 2002:a4a:e923:: with SMTP id a3mr21196505ooe.45.1625729995554; Thu, 08 Jul 2021 00:39:55 -0700 (PDT)
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed9.news.xs4all.nl!tr2.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 8 Jul 2021 00:39:55 -0700 (PDT)
In-Reply-To: <2021Jul8.091724@mips.complang.tuwien.ac.at>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com> <zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at> <292a5785-5c1e-4345-b9d2-dad5aefb5817n@googlegroups.com> <2021Jul8.091724@mips.complang.tuwien.ac.at>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fbfc0035-9ce6-423a-ba60-a2fdc27ab080n@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Thu, 08 Jul 2021 07:39:55 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 47
 by: Michael S - Thu, 8 Jul 2021 07:39 UTC

On Thursday, July 8, 2021 at 10:22:59 AM UTC+3, Anton Ertl wrote:
> Michael S <already...@yahoo.com> writes:
> >On Wednesday, July 7, 2021 at 1:34:55 PM UTC+3, Anton Ertl wrote:
> >> Branimir Maksimovic <branimir....@gmail.com> writes:
> >> >> The Euler-413 challenge that we discussed today in comp.lang.c for nDigits=11 should run in about 3 minutes.
> >> >
> >> >bmaxa@Branimirs-Air euler % time ./euler413 11
> >> >11 71101800
> >> >./euler413 11 62.07s user 0.16s system 99% cpu 1:02.39 total
> >> >
> >> >
> >> >Minute on M1...
> >> >
> >> >> But who is going to test on the fastest Zen3 ?
> >> >
> >> >Someone with fastest Zen3? :P
> >> On a 5800X (and the code compiled with gcc-10.2 -O3):
> >>
> >> perf stat -e cycles a.out 11
> >> 11 71101800
> >>
> >> Performance counter stats for 'a.out 11':
> >>
> >> 236931852739 cycles
> >>
> >> 49.974309558 seconds time elapsed
> >>
> >> 49.971624000 seconds user
> >> 0.000000000 seconds sys
> >>
> >> The 5800X runs this benchmark at 4741MHz (on average), slightly above
> >> the turbo frequency of 4700MHz.
> >>
> >
> >5800X is certainly fast, but according to my understanding, 5900X and 5950X are faster.
> The 5950X has 4900MHz official boost clock, so one can expect ~47.x s
> for this binary (unless it is L3 or memory bound, in which case I
> would not expect a speedup from the 5950X).
> - anton

Computations in this program are heavily L1D & arithmetic bound.
Even L2 is barely touched on Zen3 and likely not touched at all on M1.
So, I expect perfect performance scaling with CPU frequency. But expectations are not the same as measurements.

> --
> 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
> Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Re: My experience with Apple M1 chip

<_9CFI.2808$Nz.1791@fx22.iad>

https://www.novabbs.com/devel/article-flat.php?id=18497&group=comp.arch#18497

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.dns-netz.com!news.freedyn.net!newsfeed.xs4all.nl!newsfeed7.news.xs4all.nl!feeder1.feed.usenet.farm!feed.usenet.farm!news-out.netnews.com!news.alt.net!fdc3.netnews.com!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx22.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad>
<19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad>
<80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com>
<187875de-0cd7-4e6e-b4a9-71a9eb1f5527n@googlegroups.com>
<zQ5FI.779$VU3.610@fx46.iad> <2021Jul7.122809@mips.complang.tuwien.ac.at>
<YigFI.3408$Nq7.1210@fx33.iad> <2021Jul8.084725@mips.complang.tuwien.ac.at>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 50
Message-ID: <_9CFI.2808$Nz.1791@fx22.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Thu, 08 Jul 2021 12:26:34 UTC
Organization: usenet-news.net
Date: Thu, 08 Jul 2021 12:26:34 GMT
X-Received-Bytes: 3092
 by: Branimir Maksimovic - Thu, 8 Jul 2021 12:26 UTC

On 2021-07-08, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Branimir Maksimovic <branimir.maksimovic@gmail.com> writes:
>>Please can you run scimark2?
>>https://math.nist.gov/scimark2/download.html
>>bmaxa@Branimirs-Air scimark2 % gcc-11 -O3 *.c -o scimark2
>>bmaxa@Branimirs-Air scimark2 % ./scimark2
>>** **
>>** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>** for details. (Results can be submitted to pozo@nist.gov) **
>>** **
>>Using 2.00 seconds min time per kenel.
>>Composite Score: 3993.31
>>FFT Mflops: 3299.58 (N=1024)
>>SOR Mflops: 3191.51 (100 x 100)
>>MonteCarlo: Mflops: 445.95
>>Sparse matmult Mflops: 4330.37 (N=1000, nz=5000)
>>LU Mflops: 8699.13 (M=100, N=100)
>
> A few variations:
>
> Ryzen 7 5800x
> https://math.nist.gov/scimark2/scimark2_1c.zip
> gcc -O3 *.c -lm -o scimark2-gcc10 or clang -O3 *.c -lm -o scimark2-clang11
>
> ** **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov) **
> ** **
> Using 2.00 seconds min time per kenel.
> gcc10 clang11
> Composite Score: 3938.53 3969.17
> FFT Mflops: 3534.28 3308.27 (N=1024)
> SOR Mflops: 2792.49 2797.77 (100 x 100)
> MonteCarlo: Mflops: 864.78 853.07
> Sparse matmult Mflops: 4646.79 5167.34 (N=1000, nz=5000)
> LU Mflops: 7854.31 7719.41 (M=100, N=100)
>
> The M1 is pretty good. Too bad it's only available in Apple products.

What's worse is that I can't install Linux at this time, so I must suffer macOS :(
Well, I can live with that, because there is Homebrew :p
I work from the command line as I'm used to, alacritty+tmux :P

>
> - anton

--
something dumb

Re: My experience with Apple M1 chip

<d5a8db38-b2b8-403d-9692-b4bdbd4388f0n@googlegroups.com>

https://www.novabbs.com/devel/article-flat.php?id=18498&group=comp.arch#18498

Newsgroups: comp.arch
X-Received: by 2002:a05:6214:410f:: with SMTP id kc15mr9272326qvb.50.1625751447857;
Thu, 08 Jul 2021 06:37:27 -0700 (PDT)
X-Received: by 2002:a9d:6b0b:: with SMTP id g11mr11576836otp.240.1625751447599;
Thu, 08 Jul 2021 06:37:27 -0700 (PDT)
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!proxad.net!feeder1-2.proxad.net!209.85.160.216.MISMATCH!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 8 Jul 2021 06:37:27 -0700 (PDT)
In-Reply-To: <A74FI.172$Yv3.30@fx41.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <A74FI.172$Yv3.30@fx41.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <d5a8db38-b2b8-403d-9692-b4bdbd4388f0n@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Thu, 08 Jul 2021 13:37:27 +0000
Content-Type: text/plain; charset="UTF-8"
 by: Michael S - Thu, 8 Jul 2021 13:37 UTC

On Wednesday, July 7, 2021 at 12:43:00 AM UTC+3, Branimir Maksimovic wrote:
> On 2021-07-06, Kent Dickey <ke...@provalid.com> wrote:
> > In article <80ac50a0-dde2-4a66...@googlegroups.com>,
> > Michael S <already...@yahoo.com> wrote:
> >>On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
> >>> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> >>> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
> >>fast as i3-8100B but have trouble believing that it could match i7-8700B
> >>either in single thread or in multithread throughput. Unless, of course,
> >>absolute majority of run time spent in native libraries.
> >>> You don't count that M1 is ~25-33% faster single core than any x86 :P
> >>
> >>I took it into account.
> >>
> >>Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
> >>for *any* x86.
> >>M1 is slower than top Zen3 bins and about the same or a little slower
> >>than top Comet Lake.
> >>Probably somewhat slower than top Tiger Lake, but that comparison is
> >>rather close.
> >>Probably, measurably slower than top Rocket Lake, but I didn't look at
> >>Rocket Lake closely.
> >
> > I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> > predict branches,
> Yes, my assembler doesn't work as well as on x86 ;p
> likes loop unrolling very much :p
> Got 10% just converting a subroutine into a macro and calling it several times :P
> > or working set in the 100-200KB range). It is not the
> > fastest CPU on the planet, but it likely is the fastest laptop CPU.
> Sorry, can't agree. Blows away x86 alright.
> > At < 10W
> > at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> > relatively short benchmark (say, one file, C or C++, can be run from the Unix
> > command line, doesn't require me to install anything else, should run in less
> > than 5 minutes), I can compile it and run it for you, and then you can compare
> > those results to any system you like. I don't think comparing optimized AVX
> > is going to be useful, but simple integer or floating point algorithms would
> > be best.
> It can compare with optimised AVX alright :P
>

Do you want to test it?
I have a linear algebra core (Cholesky decomposition) coded in optimised AVX (Intel intrinsics) and the same algorithm in
plain C++ that can be, in theory, vectorized by a good compiler.

The most interesting would be compiling for Intel Mac and then running binary through Rosetta, as Kent did for my first test.
I don't know if you can do it without Intel Mac.

> >
> > Kent
>
>
> --
> something dumb

Re: My experience with Apple M1 chip

<xmDFI.1310$6U5.249@fx02.iad>

https://www.novabbs.com/devel/article-flat.php?id=18499&group=comp.arch#18499

Newsgroups: comp.arch
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!news-out.netnews.com!news.alt.net!fdc2.netnews.com!peer02.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!fx02.iad.POSTED!not-for-mail
Newsgroups: comp.arch
From: branimir...@gmail.com (Branimir Maksimovic)
Subject: Re: My experience with Apple M1 chip
References: <T6VEI.159$VU3.17@fx46.iad>
<19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com>
<2l%EI.8$gE.2@fx21.iad>
<80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com>
<SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <A74FI.172$Yv3.30@fx41.iad>
<d5a8db38-b2b8-403d-9692-b4bdbd4388f0n@googlegroups.com>
User-Agent: slrn/1.0.3 (Darwin)
Lines: 68
Message-ID: <xmDFI.1310$6U5.249@fx02.iad>
X-Complaints-To: abuse@usenet-news.net
NNTP-Posting-Date: Thu, 08 Jul 2021 13:48:13 UTC
Organization: usenet-news.net
Date: Thu, 08 Jul 2021 13:48:13 GMT
X-Received-Bytes: 3894
 by: Branimir Maksimovic - Thu, 8 Jul 2021 13:48 UTC

On 2021-07-08, Michael S <already5chosen@yahoo.com> wrote:
> On Wednesday, July 7, 2021 at 12:43:00 AM UTC+3, Branimir Maksimovic wrote:
>> On 2021-07-06, Kent Dickey <ke...@provalid.com> wrote:
>> > In article <80ac50a0-dde2-4a66...@googlegroups.com>,
>> > Michael S <already...@yahoo.com> wrote:
>> >>On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
>> >>> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
>> >>> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
>> >>fast as i3-8100B but have trouble believing that it could match i7-8700B
>> >>either in single thread or in multithread throughput. Unless, of course,
>> >>absolute majority of run time spent in native libraries.
>> >>> You don't count that M1 is ~25-33% faster single core than any x86 :P
>> >>
>> >>I took it into account.
>> >>
>> >>Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
>> >>for *any* x86.
>> >>M1 is slower than top Zen3 bins and about the same or a little slower
>> >>than top Comet Lake.
>> >>Probably somewhat slower than top Tiger Lake, but that comparison is
>> >>rather close.
>> >>Probably, measurably slower than top Rocket Lake, but I didn't look at
>> >>Rocket Lake closely.
>> >
>> > I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
>> > predict branches,
>> Yes, my assembler doesn't work as well as on x86 ;p
>> likes loop unrolling very much :p
>> Got 10% just coknverting suboutine in macro and calling several times :P
>> or working set in the 100-200KB range). It is not the
>> > fastest CPU on the planet, but it likely is the fastest laptop CPU.
>> Sorry, can't agree. Blows away x86 alright.
>> At < 10W
>> > at the AC plug it compares pretty favorably to 60W CPUs. If you have a
>> > relatively short benchmark (say, one file, C or C++, can be run from the Unix
>> > command line, doesn't require me to install anything else, should run in less
>> > than 5 minutes), I can compile it and run it for you, and then you can compare
>> > those results to any system you like. I don't think comparing optimized AVX
>> > is going to be useful, but simple integer or floating point algorithms would
>> > be best.
>> It can compare with optimised AVX alright :P
>>
>
> Do you want to test it?
> I have a linear algebra core (Cholesky decomposition) coded in optimised AVX (Intel intrinsics) and the same algorithm in
> plain C++ that can be, in theory, vectorized by good compiler.

Of course, it would be interesting to see how the M1 does that.

>
> The most interesting would be compiling for Intel Mac and then running binary through Rosetta, as Kent did for my first test.
> I don't know if you can do it without Intel Mac.

Apple's gcc supports cross compilation. I build x86 binaries with no problem. The only thing is that if they contain AVX code, they won't work,
as Rosetta does not support AVX...
>
>
>
>> >
>> > Kent
>>
>>
>> --
>> something dumb

--
something dumb

Re: My experience with Apple M1 chip

<84dc1c7c-4db9-4699-9fb0-a97a6cc3cf27n@googlegroups.com>


https://www.novabbs.com/devel/article-flat.php?id=18501&group=comp.arch#18501

X-Received: by 2002:a05:622a:1c4:: with SMTP id t4mr17946632qtw.140.1625753926293; Thu, 08 Jul 2021 07:18:46 -0700 (PDT)
X-Received: by 2002:a05:6808:1313:: with SMTP id y19mr22649075oiv.37.1625753926045; Thu, 08 Jul 2021 07:18:46 -0700 (PDT)
Path: i2pn2.org!i2pn.org!aioe.org!news.uzoreto.com!tr1.eu1.usenetexpress.com!feeder.usenetexpress.com!tr2.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.arch
Date: Thu, 8 Jul 2021 07:18:45 -0700 (PDT)
In-Reply-To: <xmDFI.1310$6U5.249@fx02.iad>
Injection-Info: google-groups.googlegroups.com; posting-host=199.203.251.52; posting-account=ow8VOgoAAAAfiGNvoH__Y4ADRwQF1hZW
NNTP-Posting-Host: 199.203.251.52
References: <T6VEI.159$VU3.17@fx46.iad> <19dcc459-6eb5-4191-a186-c50d12ed347fn@googlegroups.com> <2l%EI.8$gE.2@fx21.iad> <80ac50a0-dde2-4a66-b09c-62663cd5b4aan@googlegroups.com> <SJGdnUhI6OG5N3n9nZ2dnUU7-ffNnZ2d@giganews.com> <A74FI.172$Yv3.30@fx41.iad> <d5a8db38-b2b8-403d-9692-b4bdbd4388f0n@googlegroups.com> <xmDFI.1310$6U5.249@fx02.iad>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <84dc1c7c-4db9-4699-9fb0-a97a6cc3cf27n@googlegroups.com>
Subject: Re: My experience with Apple M1 chip
From: already5...@yahoo.com (Michael S)
Injection-Date: Thu, 08 Jul 2021 14:18:46 +0000
Content-Type: text/plain; charset="UTF-8"
Lines: 79
 by: Michael S - Thu, 8 Jul 2021 14:18 UTC

On Thursday, July 8, 2021 at 4:48:16 PM UTC+3, Branimir Maksimovic wrote:
> On 2021-07-08, Michael S <already...@yahoo.com> wrote:
> > On Wednesday, July 7, 2021 at 12:43:00 AM UTC+3, Branimir Maksimovic wrote:
> >> On 2021-07-06, Kent Dickey <ke...@provalid.com> wrote:
> >> > In article <80ac50a0-dde2-4a66...@googlegroups.com>,
> >> > Michael S <already...@yahoo.com> wrote:
> >> >>On Tuesday, July 6, 2021 at 7:16:01 PM UTC+3, Branimir Maksimovic wrote:
> >> >>> On 2021-07-06, Michael S <already...@yahoo.com> wrote:
> >> >>> > I can believe that M1@3.2 GHz/Rosetta is able to run x64 software as
> >> >>fast as i3-8100B but have trouble believing that it could match i7-8700B
> >> >>either in single thread or in multithread throughput. Unless, of course,
> >> >>absolute majority of run time spent in native libraries.
> >> >>> You don't count that M1 is ~25-33% faster single core than any x86 :P
> >> >>
> >> >>I took it into account.
> >> >>
> >> >>Besides, while it's true for x86 CPUs in prev-gen Mac-Mini it's not true
> >> >>for *any* x86.
> >> >>M1 is slower than top Zen3 bins and about the same or a little slower
> >> >>than top Comet Lake.
> >> >>Probably somewhat slower than top Tiger Lake, but that comparison is
> >> >>rather close.
> >> >>Probably, measurably slower than top Rocket Lake, but I didn't look at
> >> >>Rocket Lake closely.
> >> >
> >> > I have a Mac Mini M1, and it seems fast--very fast for some workloads (hard to
> >> > predict branches,
> >> Yes, my assembler doesn't work as well as on x86 ;p
> >> likes loop unrolling very much :p
> >> Got 10% just by converting a subroutine into a macro and calling it several times :P
> >> or working set in the 100-200KB range). It is not the
> >> > fastest CPU on the planet, but it likely is the fastest laptop CPU.
> >> Sorry, can't agree. Blows away x86 alright.
> >> At < 10W
> >> > at the AC plug it compares pretty favorably to 60W CPUs. If you have a
> >> > relatively short benchmark (say, one file, C or C++, can be run from the Unix
> >> > command line, doesn't require me to install anything else, should run in less
> >> > than 5 minutes), I can compile it and run it for you, and then you can compare
> >> > those results to any system you like. I don't think comparing optimized AVX
> >> > is going to be useful, but simple integer or floating point algorithms would
> >> > be best.
> >> It can compare with optimised AVX alright :P
> >>
> >
> > Do you want to test it?
> > I have a linear algebra core (Cholesky decomposition) coded in optimised AVX (Intel intrinsics) and the same algorithm in
> > plain C++ that can be, in theory, vectorized by good compiler.
> Of course, it would be interesting to see how the M1 does that.

It's in my public GitHub repo already5chosen/others, under the directory cholesky_solver.
You can try it yourself, e.g. outer_product_c2x2hiv for the intrinsic-based variant vs outer_product_c2x2hi
for the same algorithm in plain C++.
For speed measurement, I was mostly concerned with N=85.
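
For readers who have not looked at the repo, here is a minimal sketch of what "Cholesky decomposition in plain C++" means. It is NOT the code from cholesky_solver: it is the textbook row-major algorithm with no blocking and no intrinsics, and the names cholesky, make_spd and seconds_per_factorization are made up for illustration. The timing helper assumes that a factorization of size N=85 repeated many times is what one would want to measure.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// In-place Cholesky factorization A = L * L^T of a symmetric positive-definite
// n x n matrix stored row-major. On success the lower triangle of `a` holds L
// (the strict upper triangle is left untouched). Returns false if a
// non-positive pivot shows up, i.e. the matrix is not positive definite.
bool cholesky(std::vector<double>& a, int n) {
    for (int j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (int k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (d <= 0.0) return false;
        d = std::sqrt(d);
        a[j * n + j] = d;
        for (int i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            // Dot product of two rows: the loop a compiler may auto-vectorize.
            for (int k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / d;
        }
    }
    return true;
}

// Hypothetical test input: symmetric and strictly diagonally dominant with a
// positive diagonal, hence positive definite.
std::vector<double> make_spd(int n) {
    std::vector<double> a(static_cast<std::size_t>(n) * n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i * n + j] = (i == j) ? n + 1.0 : 1.0 / (1.0 + std::abs(i - j));
    return a;
}

// Average wall-clock seconds per factorization. The copy of the input matrix
// is included in the measured time.
double seconds_per_factorization(int n, int reps) {
    const std::vector<double> src = make_spd(n);
    std::vector<double> work;
    double checksum = 0.0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        work = src;
        cholesky(work, n);
        checksum += work[static_cast<std::size_t>(n) * n - 1];
    }
    const auto t1 = std::chrono::steady_clock::now();
    volatile double sink = checksum;  // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

Calling seconds_per_factorization(85, 100000) from main and printing the result gives a number that can be compared across machines, provided the same compiler flags are used on each.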

Unfortunately, there is a big chance that without additional explanations you will not be able to proceed.
The repo was intended for myself, so it doesn't contain a comprehensive readme.
Tomorrow, or much later today, I could answer a few questions, but right now I have to work. :(

> >
> > The most interesting would be compiling for Intel Mac and then running binary through Rosetta, as Kent did for my first test.
> > I don't know if you can do it without Intel Mac.
> Apple's gcc supports cross compilation. I build x86 binaries with no problem. The only thing is that if they contain AVX code, they won't work,
> as Rosetta does not support AVX...

That's a pity.
I heard about it a year ago, but completely forgot.
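
Given that limitation, one way to keep a single source tree usable for both a native build and an x86-64 build meant for Rosetta is to select the code path at compile time. The sketch below only reports which path a build would take. It relies on the predefined macros __AVX__, __x86_64__ and __aarch64__ that GCC and Clang set (plus the MSVC equivalents _M_X64 and _M_ARM64); the function name compiled_simd_path is hypothetical.

```cpp
#include <string>

// Reports which instruction-set path this translation unit was compiled for.
// An x86-64 binary intended to run under Rosetta should land in the
// "without AVX" branch, i.e. be built without -mavx.
std::string compiled_simd_path() {
#if defined(__AVX__)
    return "x86 with AVX";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86-64 without AVX";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64";
#else
    return "other";
#endif
}
```

The same #if ladder can wrap the intrinsic-based variant, with the plain C++ variant as the fallback in every other branch.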

> >
> >
> >
> >> >
> >> > Kent
> >>
> >>
> >> --
> >> something dumb
>
>
> --
> something dumb
