Newsgroups: comp.arch
Date: Mon, 3 May 2021 12:30:51 -0700 (PDT)
Message-ID: <1f439ca8-367a-4cc4-8e4b-544882436e38n@googlegroups.com>
Subject: More about my inventions of scalable algorithms..
From: amine...@gmail.com (Amine Moulay Ramdane)

Hello...

More about my inventions of scalable algorithms..

I am a white Arab, and I think I am smart, since I have also
invented many scalable algorithms and other algorithms..

More details about my new inventions of scalable algorithms..

And look at my powerful inventions below, LW_Fast_RWLockX and Fast_RWLockX: two powerful scalable RWLocks that are FIFO fair and starvation-free and costless on the reader side (that means no atomics and no fences on the reader side). They use sys_membarrier expedited on Linux and FlushProcessWriteBuffers() on Windows. If you look at the source code of my LW_Fast_RWLockX.pas and Fast_RWLockX.pas inside the zip file, you will notice that on Linux they call two functions, membarrier1() and membarrier2(): membarrier1() registers the process's intent to use MEMBARRIER_CMD_PRIVATE_EXPEDITED, and membarrier2() executes a memory barrier on each running thread belonging to the same process as the calling thread.

Read more here to understand:

https://man7.org/linux/man-pages/man2/membarrier.2.html
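
To make that description concrete, here is a minimal C++ sketch of what those two calls boil down to on Linux. The function names membarrier1() and membarrier2() mirror the description above; the actual Pascal source in the zip file may differ.

---
// Minimal C++ sketch of the two membarrier calls described above.
// membarrier1() is called once at startup; membarrier2() is called on
// the writer's slow path. Assumes Linux >= 4.14 for the expedited command.
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags)
{
    // glibc has no wrapper for this system call, so invoke it directly.
    return syscall(SYS_membarrier, cmd, flags);
}

// Register the process's intent to use MEMBARRIER_CMD_PRIVATE_EXPEDITED.
int membarrier1(void)
{
    return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

// Execute a memory barrier on each running thread belonging to the
// same process as the calling thread.
int membarrier2(void)
{
    return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}
---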

Here are my new powerful inventions of scalable algorithms..

I have just updated my powerful inventions of LW_Fast_RWLockX and Fast_RWLockX, two powerful scalable RWLocks that are FIFO fair and starvation-free and costless on the reader side (that means no atomics and no fences on the reader side). They use sys_membarrier expedited on Linux and FlushProcessWriteBuffers() on Windows, and now they work on both Linux and Windows. And I think my inventions are really smart, since the following PhD researcher
says this:

"Until today, there is no known efficient reader-writer lock with starvation-freedom guarantees;"

Read more here:

http://concurrencyfreaks.blogspot.com/2019/04/onefile-and-tail-latency.html

So as you have just noticed, he says the following:

"Until today, there is no known efficient reader-writer lock with starvation-freedom guarantees;"

So I think that my above powerful inventions of scalable reader-writer locks are efficient and FIFO fair and starvation-free.

LW_Fast_RWLockX is a lightweight scalable reader-writer mutex that uses a technique that looks like Seqlock, but without looping on the reader side like Seqlock does, and this has permitted the reader side to be costless. It is fair and of course starvation-free, and it does spin-wait. Fast_RWLockX is the same kind of lightweight scalable reader-writer mutex, but it does not spin-wait: it waits on my SemaMonitor, so it is energy efficient.

You can read about them and download them from my website here:

https://sites.google.com/site/scalable68/scalable-rwlock
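
For readers who want the general idea, here is a generic C++ sketch of the asymmetric pattern that such reader-writer locks rely on: plain stores on the reader side made safe by a process-wide barrier on the writer side. This is not the source of LW_Fast_RWLockX (that is in the zip file above), and it omits the FIFO-fairness and starvation-freedom machinery; it is an illustration only.

---
// Generic sketch (C++) of an asymmetric reader-writer lock: readers use
// only plain (relaxed) loads and stores, and the writer compensates by
// issuing a process-wide barrier (membarrier2() from the sketch above,
// or FlushProcessWriteBuffers() on Windows). Not the actual
// LW_Fast_RWLockX algorithm; illustration only.
#include <atomic>

int membarrier2();  // process-wide barrier, as sketched earlier

struct AsymmetricRWLock
{
    static const int MaxReaders = 64;
    std::atomic<int> reader_active[MaxReaders];
    std::atomic<int> writer_pending;

    AsymmetricRWLock() : writer_pending(0) {
        for (int i = 0; i < MaxReaders; ++i) reader_active[i].store(0);
    }

    void read_lock(int slot) {
        for (;;) {
            // Plain store, no fence: the writer's membarrier2() below
            // plays the role of the missing fence on this side.
            reader_active[slot].store(1, std::memory_order_relaxed);
            if (writer_pending.load(std::memory_order_relaxed) == 0)
                return;  // no writer: enter the critical section
            reader_active[slot].store(0, std::memory_order_relaxed);
            while (writer_pending.load(std::memory_order_relaxed) != 0)
                ;        // back off until the writer is gone
        }
    }

    void read_unlock(int slot) {
        reader_active[slot].store(0, std::memory_order_release);
    }

    void write_lock() {
        writer_pending.store(1, std::memory_order_seq_cst);
        membarrier2();   // make all readers' plain stores visible
        for (int i = 0; i < MaxReaders; ++i)
            while (reader_active[i].load(std::memory_order_acquire) != 0)
                ;        // wait until every active reader has drained
    }

    void write_unlock() {
        writer_pending.store(0, std::memory_order_release);
    }
};
---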

About the Linux sys_membarrier() expedited and the Windows FlushProcessWriteBuffers()..
I have just read the following webpage:
https://lwn.net/Articles/636878/

And it is interesting and it says:
---
Results in liburcu:
Operations in 10s, 6 readers, 2 writers:
memory barriers in reader: 1701557485 reads, 3129842 writes
signal-based scheme: 9825306874 reads, 5386 writes
sys_membarrier expedited: 6637539697 reads, 852129 writes
sys_membarrier non-expedited: 7992076602 reads, 220 writes
---

Look at how "sys_membarrier expedited" is powerful.
Cache-coherency protocols do not use IPIs, and as a user-space developer you do not care about IPIs at all; one is mostly interested in the cost of cache coherency itself. However, the Win32 API provides a function that issues IPIs to all processors (in the affinity mask of the current process): FlushProcessWriteBuffers(). You can use it to investigate the cost of IPIs. When I did a simple synthetic test on a dual-core machine, I obtained the following numbers:

420 cycles is the minimum cost of the FlushProcessWriteBuffers() function on the issuing core.
1600 cycles is the mean cost of the FlushProcessWriteBuffers() function on the issuing core.
1300 cycles is the mean cost of the FlushProcessWriteBuffers() function on the remote core.

Note that, as far as I understand, the function issues an IPI to the remote core, then the remote core acks it with another IPI, and the issuing core waits for the ack IPI and then returns. And the IPIs have the indirect cost of flushing the processor pipeline.
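
Here is a minimal C++ sketch of how such numbers can be measured on Windows, timing FlushProcessWriteBuffers() with the TSC. The iteration count is arbitrary, and the numbers will of course differ per machine.

---
// Minimal sketch (C++, Windows) of timing FlushProcessWriteBuffers()
// on the issuing core, along the lines of the measurement quoted above.
#include <windows.h>
#include <intrin.h>
#include <cstdio>

int main()
{
    const int iters = 10000;
    unsigned long long best = ~0ULL, sum = 0;
    for (int i = 0; i < iters; ++i) {
        unsigned long long t0 = __rdtsc();
        FlushProcessWriteBuffers();  // IPIs every core the process runs on
        unsigned long long dt = __rdtsc() - t0;
        if (dt < best) best = dt;
        sum += dt;
    }
    std::printf("FlushProcessWriteBuffers: min %llu cycles, mean %llu cycles\n",
                best, sum / iters);
    return 0;
}
---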

More about WaitAny() and WaitAll() and more..

Look at the following concurrency abstractions of Microsoft:

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitany?view=netframework-4.8

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.8

They look like the following WaitForAny() and WaitForAll() of Delphi; here they are:

http://docwiki.embarcadero.com/Libraries/Sydney/en/System.Threading.TTask.WaitForAny

http://docwiki.embarcadero.com/Libraries/Sydney/en/System.Threading.TTask.WaitForAll

So the WaitForAll() is easy, and I have implemented it in my Threadpool engine that scales very well and that I have invented. You can read my HTML tutorial inside its zip file to learn how to do it, and you can download it from my website here:

https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well

And about the WaitForAny(), you can also do it using my SemaMonitor,
and I will soon give you an example of how to do it (a generic sketch follows below). You can download my SemaMonitor invention from my website here:

https://sites.google.com/site/scalable68/semacondvar-semamonitor
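
In the meantime, here is a generic C++ sketch of the idea behind both calls: every task signals one shared monitor when it finishes, and the waiter blocks on it. A std::condition_variable stands in for the SemaMonitor here; the names CompletionMonitor, wait_any() and wait_all() are made up for this illustration.

---
// Generic sketch (C++) of WaitForAny/WaitForAll built on one shared
// monitor. A std::condition_variable stands in for the SemaMonitor.
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>
#include <cstdio>

struct CompletionMonitor
{
    std::mutex m;
    std::condition_variable cv;
    int done = 0;

    void signal() {                  // called by each finishing task
        { std::lock_guard<std::mutex> g(m); ++done; }
        cv.notify_all();
    }
    void wait_any() {                // returns once >= 1 task is done
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [&] { return done >= 1; });
    }
    void wait_all(int n) {           // returns once all n tasks are done
        std::unique_lock<std::mutex> g(m);
        cv.wait(g, [&] { return done >= n; });
    }
};

int main()
{
    CompletionMonitor mon;
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back([&] { /* ... do the task's work ... */ mon.signal(); });
    mon.wait_any();                  // WaitForAny
    mon.wait_all(4);                 // WaitForAll
    for (auto &t : pool) t.join();
    std::printf("all tasks finished\n");
}
---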

Here are my other new software inventions..

I have just looked at the source code of the following multiplatform pevents library:

https://github.com/neosmart/pevents

And notice that its WaitForMultipleEvents() is implemented with pthreads
but is not scalable on multicores. So I have just invented a WaitForMultipleObjects() that looks like the Windows WaitForMultipleObjects() and that is fully "scalable" on multicores, that works on Windows and Linux and macOS, and that blocks when waiting for the objects, as WaitForMultipleObjects() does, so it doesn't consume CPU cycles when waiting, and it works with events and futures and tasks.

Here are my other new software inventions..

I have just invented a fully "scalable" on multicores latch and a
fully scalable on multicores thread barrier; they are really powerful.

Read about the C++ latches and thread barriers, which are not scalable on
multicores, here:

https://www.modernescpp.com/index.php/latches-and-barriers
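
For reference, here is what the centralized C++20 facilities from the linked article look like in use. Every thread decrements or arrives at a single shared counter, and that single hot counter is the point the "scalable" claim above is about. This is standard C++20, not the author's scalable versions.

---
// The centralized C++20 latch and barrier from the linked article
// (compile with -std=c++20). Every thread contends on one counter.
#include <latch>
#include <barrier>
#include <thread>
#include <vector>
#include <cstdio>

int main()
{
    const int n = 4;
    std::latch start(n);             // single-use countdown
    std::barrier sync(n);            // reusable rendezvous point

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&, i] {
            start.count_down();      // all threads hit the same counter
            start.wait();
            std::printf("phase 1 done on thread %d\n", i);
            sync.arrive_and_wait();  // same centralized hot spot
        });
    for (auto &t : threads) t.join();
}
---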

Here are my other software inventions:

More about my scalable math Linear System Solver Library...

As you have just noticed, I have just spoken about my Linear System Solver Library (read below). Right now it scales very well, but I will
soon make it "fully" scalable on multicores using one of the scalable algorithms that I have invented, and I will extend it much more to also support efficient scalable on multicores matrix operations and more. And since it will come with one of my scalable algorithms that I have invented, I think I will sell it too.

More about mathematics and about scalable Linear System Solver Libraries and more..

I have just noticed that a software architect from Austria
called Michael Rabatscher has designed and implemented the MrMath library, which is also a parallelized library:

Here he is:

https://at.linkedin.com/in/michael-rabatscher-6821702b

And here is his MrMath Library for Delphi and Freepascal:

https://github.com/mikerabat/mrmath

But I think that he is not so smart, and I think I am smart like
a genius, and I say that his MrMath library is not scalable on multicores. Notice that the Linear System Solver of his MrMath library is not scalable on multicores either, and neither are the threaded matrix operations of his library. This is why I have invented a scalable on multicores Conjugate Gradient Linear System Solver Library for C++ and Delphi and FreePascal, and here it is; read about it in my following thoughts (I will also soon extend my library to support scalable matrix operations):

About SOR and Conjugate gradient mathematical methods..

I have just looked at SOR (the Successive Over-Relaxation method),
and I think it is much less powerful than the Conjugate Gradient method;
read the following to notice it:

COMPARATIVE PERFORMANCE OF THE CONJUGATE GRADIENT AND SOR METHODS
FOR COMPUTATIONAL THERMAL HYDRAULICS

https://inis.iaea.org/collection/NCLCollectionStore/_Public/19/055/19055644.pdf?r=1&r=1

This is why I have implemented, in both C++ and Delphi, my Parallel Conjugate Gradient Linear System Solver Library that scales very well. Read my following thoughts about it to understand more:

About the convergence properties of the conjugate gradient method

The conjugate gradient method can theoretically be viewed as a direct method, as it produces the exact solution after a finite number of iterations, which is not larger than the size of the matrix, in the absence of round-off error. However, the conjugate gradient method is unstable with respect to even small perturbations; e.g., most directions are not in practice conjugate, and the exact solution is never obtained. Fortunately, the conjugate gradient method can be used as an iterative method, as it provides monotonically improving approximations to the exact solution, which may reach the required tolerance after a relatively small (compared to the problem size) number of iterations. The improvement is typically linear and its speed is determined by the condition number κ(A) of the system matrix A: the larger κ(A) is, the slower the improvement.
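
The standard bound behind that last sentence is ||e_k||_A <= 2 * ((sqrt(κ)-1)/(sqrt(κ)+1))^k * ||e_0||_A, which is why a large κ(A) slows convergence. And here is a textbook sequential conjugate-gradient sketch in C++: the plain method, not the author's parallel library, with A assumed symmetric positive-definite and stored dense for simplicity.

---
// Textbook sequential conjugate gradient (C++): solve A x = b for
// symmetric positive-definite A, stopping when the residual norm
// falls below tol. Illustration only, not the parallel library.
#include <vector>
#include <cmath>
#include <cstddef>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
    return y;
}

Vec conjugate_gradient(const Mat& A, const Vec& b, double tol = 1e-10) {
    Vec x(b.size(), 0.0);
    Vec r = b;                       // residual: b - A*x with x = 0
    Vec p = r;                       // first search direction
    double rs = dot(r, r);
    for (std::size_t k = 0; k < b.size(); ++k) {  // at most n steps exactly
        Vec Ap = matvec(A, p);
        double alpha = rs / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rs_new = dot(r, r);
        if (std::sqrt(rs_new) < tol) break;
        double beta = rs_new / rs;   // keep directions A-conjugate
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + beta * p[i];
        rs = rs_new;
    }
    return x;
}
---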

