Newsgroups: sci.math
Date: Mon, 29 Nov 2021 13:12:13 -0800 (PST)
Injection-Info: google-groups.googlegroups.com; posting-host=173.178.84.155; posting-account=R-6XjwoAAACnHXTO3L-lyPW6wRsSmYW9
NNTP-Posting-Host: 173.178.84.155
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <95141a1c-a91c-4cde-b284-6c6781a42270n@googlegroups.com>
Subject: More of philosophy about China and about USA and about Exascale computers..
From: amine...@gmail.com (Amine Moulay Ramdane)
Injection-Date: Mon, 29 Nov 2021 21:12:14 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Lines: 592
 by: Amine Moulay Ramdane - Mon, 29 Nov 2021 21:12 UTC

Hello,

More of philosophy about China and about USA and about Exascale computers..

I am a white Arab from Morocco, and I think I am smart, since I have
also invented many scalable algorithms..

China has already reached Exascale - on two separate systems

Read more here:

https://www.nextplatform.com/2021/10/26/china-has-already-reached-exascale-on-two-separate-systems/

And in the USA, Intel's Aurora supercomputer is now expected to exceed 2 ExaFLOPS of performance.

Read more here:

https://www.anandtech.com/show/17037/aurora-supercomputer-now-expected-to-exceed-2-exaflops-performance

But Exascale supercomputers will also make it possible to construct an
accurate map of the brain, which would allow us to "reverse engineer" or
understand the brain; read the following to notice it:

“If we don’t improve today’s technology, the compute time for a whole
mouse brain would be something like 1,000,000 days of work on current
supercomputers. Using all of Aurora, if everything worked beautifully,
it could still take 1,000 days.” Nicola Ferrier, Argonne senior computer
scientist

Read more here to understand:

https://www.anl.gov/article/preparing-for-exascale-argonnes-aurora-supercomputer-to-drive-brain-map-construction
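
A quick sanity check of the quoted figures (a sketch; the day counts come from the quote above, the conversion to years is mine):

```python
# Quoted compute-time estimates for mapping a whole mouse brain.
current_days = 1_000_000   # on current supercomputers (from the quote)
aurora_days = 1_000        # using all of Aurora (from the quote)

# Implied speedup and wall-clock times in years.
speedup = current_days / aurora_days
print(f"speedup: {speedup:.0f}x")                     # 1000x
print(f"current: {current_days / 365.25:.0f} years")  # ~2738 years
print(f"Aurora:  {aurora_days / 365.25:.1f} years")   # ~2.7 years
```

So even the optimistic Aurora estimate is nearly three years of wall-clock time, which is why the quote stresses improving today's technology.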

Also Exascale supercomputers will allow researchers to tackle problems
which were impossible to simulate using the previous generation of
machines, due to the massive amounts of data and calculations involved.

Small modular nuclear reactor (SMR) design, wind farm optimization and
cancer drug discovery are just a few of the applications that are
priorities of the U.S. Department of Energy (DOE) Exascale Computing
Project. The outcomes of this project will have a broad impact and
promise to fundamentally change society, both in the U.S. and abroad.

Read more here:

https://www.cbc.ca/news/opinion/opinion-exascale-computing-1.5382505

Also, the goal of delivering safe, abundant, cheap energy from fusion is
just one of many challenges in which exascale computing's power may
prove decisive. That is the hope and expectation. To learn more about
the other benefits of Exascale computing power, read here:

https://www.hpcwire.com/2019/05/07/ten-great-reasons-among-many-more-to-build-the-1-5-exaflops-frontier/

More of my philosophy about 3D stacking in CPUs and more..

3D stacking offers an extension of Moore's Law, but heat removal is the
big problem with 3D stacking; this is why current technologies, such as
Intel's 3D stacking, are limited to just two or a few layers.

More of my philosophy about Moore's Law and EUV (extreme ultraviolet
lithography)..

Researchers have proposed successors to EUV, including e-beam and
nanoimprint lithography, but have not found any of them to be reliable
enough to justify substantial investment.

And I think that by also using EUV (extreme ultraviolet lithography) to
create CPUs we will extend Moore's Law by around 15 years, which
corresponds to around 100x scalability in performance, and I think that
it is the same 100x performance gain as the graphene invention below.
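
The 15-year figure can be checked against a doubling cadence (a sketch; the 2-year doubling period is the usual Moore's-law assumption, not a figure from this post):

```python
import math

doubling_period_years = 2.0  # classic Moore's-law cadence (assumption)
years = 15

# How much gain do 15 years of doublings buy?
doublings = years / doubling_period_years
gain = 2 ** doublings
print(f"{doublings:.1f} doublings -> ~{gain:.0f}x")   # ~181x

# Conversely, a 100x gain needs log2(100) ~ 6.6 doublings.
years_for_100x = math.log2(100) * doubling_period_years
print(f"100x needs ~{years_for_100x:.1f} years")      # ~13.3 years
```

So "around 15 years" and "around 100x" are mutually consistent to within the uncertainty in the doubling period.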

About graphene and about unlocking Moore’s Law..

I think that graphene can now be mass produced, you can read about it here:

We May Finally Have a Way of Mass Producing Graphene

It's as simple as one, two, three.

Read more here:

https://futurism.com/we-may-finally-have-a-way-of-mass-producing-graphene

So the following invention will be possible:

Physicists Create Microchip 100 Times Faster Than Conventional Ones

Read more here:

https://interestingengineering.com/graphene-microchip-100-times-fast?fbclid=IwAR3wG09QxtQciuku4KUGBVRQPNRSbhnodPcnDySLWeXN9RCnvb0GqRAyM-4

More philosophy about the Microchips that are 100 Times or 1000 times
Faster Than Conventional Ones..

I think that the following invention of microchips that are 100 or 1000
times faster than conventional ones has a weakness: cache-coherence
traffic between cores takes time, so I think they are speaking about
100x or 1000x more speed in single-core performance. Parallelism is
therefore still necessary, and you need scalable algorithms so that you
can scale much more on multicore CPUs..

Physicists Create Microchip 100 Times Faster Than Conventional Ones

Read more here:

https://interestingengineering.com/graphene-microchip-100-times-fast?fbclid=IwAR3wG09QxtQciuku4KUGBVRQPNRSbhnodPcnDySLWeXN9RCnvb0GqRAyM-4

More of my philosophy about the knee of an M/M/n queue and more..

Here is the mathematical equation of the knee of an M/M/n queue in
queuing theory in operational research:

1/(n+1)^(1/n)

n is the number of servers.

So an M/M/1 queue has a knee at 50% utilization, and the knee of an
M/M/2 is about 0.577 (57.7%).
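
The knee formula above can be evaluated directly (a minimal sketch):

```python
def mm_n_knee(n: int) -> float:
    """Knee utilization of an M/M/n queue: 1 / (n + 1)**(1/n)."""
    return 1.0 / (n + 1) ** (1.0 / n)

for n in (1, 2, 4, 8):
    print(f"M/M/{n}: knee at {mm_n_knee(n):.1%} utilization")
# M/M/1 -> 50.0%, M/M/2 -> 57.7%, and the knee keeps rising with n,
# i.e. more servers let you run at higher utilization before response
# time degrades sharply.
```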

More of my philosophy about the network topology in multicores CPUs..

I invite you to look at the following video:

Ring or Mesh, or other? AMD's Future on CPU Connectivity

https://www.youtube.com/watch?v=8teWvMXK99I&t=904s

And i invite you to read the following article:

Does an AMD Chiplet Have a Core Count Limit?

Read more here:

https://www.anandtech.com/show/16930/does-an-amd-chiplet-have-a-core-count-limit

I think I am smart, and I say that the above video and the above article
are not so smart, so I will talk about a very important thing, which is
the following; read this:

Performance Scalability of a Multi-core Web Server

https://www.researchgate.net/publication/221046211_Performance_scalability_of_a_multi-core_web_server

So notice carefully that it is saying the following:

"..we determined that performance scaling was limited by the capacity of
the address bus, which became saturated on all eight cores. If this key
obstacle is addressed, commercial web server and systems software are
well-positioned to scale to a large number of cores."

So as you notice, they were using an 8-core Intel Xeon, and the
application could scale to 8x, but the hardware could not: it scaled
only to 4.8x. This was caused by bus saturation, since address-bus
saturation causes poor scaling. The address bus carries requests and
responses for data, called snoops, and more caches mean more sources and
more destinations for snoops, which is what causes the poor scaling. So
a network topology of a ring bus or a plain bus was not sufficient to
scale to 8x on an 8-core Intel Xeon, and I think the newer
architectures, like the Epyc and Threadripper CPUs, can use a faster bus
and/or a different network topology that ensures full scalability both
locally within a node and globally between nodes.

We can then notice that a sophisticated mesh network topology not only
reduces the number of hops inside the CPU, which gives good latency, but
is also good for reliability, thanks to its redundancy, and is faster
than previous topologies like the ring bus or the plain bus, since, for
example, lookups on the address bus become parallelized. It looks like
the internet, which uses a mesh topology of routers, and so it
parallelizes.

I also think that using a more sophisticated topology like a mesh
network is related to queuing theory: in operational research, the
mathematics says that we can make a queue like M/M/1 more efficient by
making the server more powerful, but the knee of an M/M/1 queue is only
around 50% utilization. By parallelizing more, in a mesh topology like
the internet or inside a CPU, you enhance both the knee of the queue and
the speed of executing transactions; it is like using many servers in
queuing theory, and it permits better scaling inside a CPU or on the
internet.
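
The 4.8x-on-8-cores result can be modeled with the Universal Scalability Law, C(n) = n / (1 + a(n-1) + b*n(n-1)), where a captures contention (e.g. bus saturation) and b captures coherency traffic. The coefficient below is a hypothetical pick chosen to reproduce the paper's number, not a value fitted to their data:

```python
def usl_speedup(n: int, alpha: float, beta: float = 0.0) -> float:
    """Universal Scalability Law: speedup on n cores, given a
    contention coefficient (alpha) and a coherency coefficient (beta)."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# alpha = 0.095 is a hypothetical contention level chosen so that
# 8 cores yield roughly the 4.8x scaling reported in the paper.
alpha = 0.095
for n in (1, 2, 4, 8):
    print(f"{n} cores -> {usl_speedup(n, alpha):.2f}x")
# With this much contention, 8 cores land near 4.8x instead of 8x.
```

Reducing alpha (a faster bus or a mesh topology that parallelizes snoop traffic) moves the curve back toward the ideal n-times speedup, which is the point of the paragraph above.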

More of my philosophy about Machine programming and about oneAPI from
Intel company..

I will say that if you know C and C++ moderately well, it will not be so
difficult to program OpenCL (read about OpenCL here:
https://en.wikipedia.org/wiki/OpenCL) or CUDA. But the important
question is: what is the difference between an FPGA and a GPU? So I
invite you to read the following interesting paper comparing GPU vs FPGA
performance:

https://www.bertendsp.com/pdf/whitepaper/BWP001_GPU_vs_FPGA_Performance_Comparison_v1.0.pdf

So I think, from the paper above, that a GPU is the right choice when
you want both performance and cost efficiency.

So I think that oneAPI from Intel, which is meant to do all the heavy
lifting for you so that you can focus on the algorithm rather than on
writing OpenCL calls, is not such a smart way of doing things, since, as
I said above, OpenCL and CUDA programming is not so difficult. And as
you will notice below, oneAPI from Intel lets you program FPGAs in a
higher-level manner, but here again, from the paper above, we can notice
that a GPU is the right choice when you want performance and cost
efficiency; so, to approximate the efficiency and usefulness of oneAPI,
you can still use efficient and useful libraries.

Here is the new oneAPI from Intel company, read about it:

https://codematters.online/intel-oneapi-faq-part-1-what-is-oneapi/

