Subject: Prologue and epilogue
From: antis...@nospicedham.math.uni.wroc.pl
Newsgroups: comp.lang.asm.x86
Organization: Politechnika Wroclawska
Date: Fri, 12 Jul 2019 01:15 UTC
There was recently a discussion about the speed of various
ways of saving and restoring registers.  I did a little
microbenchmark, saving and restoring 7 registers in
different ways:

a - save and restore using moves in ascending order
b - save using moves in descending order, restore in ascending order
c - save using pushes, restore using moves in ascending order
d - save and restore using pushes and pops

The called function had 17 instructions (two arithmetic instructions
for stack adjustment, 14 save/restore instructions and a return).
Version b (with descending stores and ascending loads):

foo1:
   subq  $0x78, %rsp
   movq  %rbp,  0x70(%rsp)
   movq  %r10,  0x68(%rsp)
   movq  %r11,  0x60(%rsp)
   movq  %r12,  0x58(%rsp)
   movq  %r13,  0x50(%rsp)
   movq  %r14,  0x48(%rsp)
   movq  %r15,  0x40(%rsp)

   movq  0x40(%rsp), %r15
   movq  0x48(%rsp), %r14
   movq  0x50(%rsp), %r13
   movq  0x58(%rsp), %r12
   movq  0x60(%rsp), %r11
   movq  0x68(%rsp), %r10
   movq  0x70(%rsp), %rbp
   addq  $0x78, %rsp
   ret

Version a had the stores in the opposite order, version c replaced
the stores by pushes and moved the stack adjustment after the pushes,
version d additionally replaced the loads by pops and moved the stack
adjustment before the pushes.
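
For illustration, versions c and d might look roughly like the sketch
below.  It is reconstructed from the description above and is not the
code that was actually measured: the labels foo_c and foo_d are
hypothetical, and the 0x40 adjustment is an assumption (the 0x78 frame
of version b minus the 7 * 8 = 0x38 bytes the pushes allocate).

foo_c:                          # saves by push, restores by move
   pushq %rbp                   # ends up at 0x70(%rsp) after the subq
   pushq %r10
   pushq %r11
   pushq %r12
   pushq %r13
   pushq %r14
   pushq %r15                   # ends up at 0x40(%rsp) after the subq
   subq  $0x40, %rsp            # assumed size of the rest of the frame
   movq  0x40(%rsp), %r15
   movq  0x48(%rsp), %r14
   movq  0x50(%rsp), %r13
   movq  0x58(%rsp), %r12
   movq  0x60(%rsp), %r11
   movq  0x68(%rsp), %r10
   movq  0x70(%rsp), %rbp
   addq  $0x78, %rsp            # 0x38 (pushes) + 0x40 (subq)
   ret

foo_d:                          # saves by push, restores by pop
   subq  $0x40, %rsp            # assumed adjustment, done before the pushes
   pushq %rbp
   pushq %r10
   pushq %r11
   pushq %r12
   pushq %r13
   pushq %r14
   pushq %r15
   popq  %r15                   # pops run in reverse order of the pushes
   popq  %r14
   popq  %r13
   popq  %r12
   popq  %r11
   popq  %r10
   popq  %rbp
   addq  $0x40, %rsp
   ret

Both sketches keep the count at 17 instructions, matching the
description of the called function.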

This function was called from a loop consisting of 3 instructions:

  4003d0:       e8 2b 01 00 00          callq  400500 <foo1>
  4003d5:       48 83 eb 01             sub    $0x1,%rbx
  4003d9:       75 f5                   jne    4003d0 <main+0x10>

(that was actually generated from C code).  So the critical loop has
20 instructions and 16 memory transfers (7 data stores, pushing the
return address, 7 data loads + reading the return address).

I tested on a 1.7 GHz i5 and on a 1.60 GHz Celeron N3060.
On the i5 all versions took 8 clocks per loop iteration with
small error (less than 2%).  On the Celeron, versions a, b and c
each take 16 clocks (with very small error).  Version d
needs 18 clocks.  Removing the stack adjustments from version d
reduced the time to 17 clocks.  So, at least on modern Intel
processors, the differences between moves and pushes are very
small.

Of course this is a very naive benchmark and only covers
two processor types.

--
                              Waldek Hebisch



Subject: Re: Prologue and epilogue
From: James Harris
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Fri, 12 Jul 2019 11:52 UTC
On 12/07/2019 02:15, antispam@nospicedham.math.uni.wroc.pl wrote:
> There was recently a discussion about the speed of various
> ways of saving and restoring registers.  I did a little
> microbenchmark, saving and restoring 7 registers in
> different ways:
>
> a - save and restore using moves in ascending order
> b - save using moves in descending order, restore in ascending order
> c - save using pushes, restore using moves in ascending order
> d - save and restore using pushes and pops

.... tests snipped

> So, at least on modern Intel
> processors differences between moves and pushes are very
> small.
>
> Of course this is a very naive benchmark and only covers
> two processor types.

Thanks for posting the info. It's something I would have eventually needed to test.

Of course, moves take up a lot more code space than pushes and pops, so unless further info is forthcoming, your timings suggest that using push/pop where possible is a good rule of thumb.
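
To put rough numbers on the code-space point, here is a sketch of the encodings for the registers and offsets used in the original post. The byte values are worked out by hand from the standard x86-64 encodings, on the assumption that an assembler such as GNU as picks the short forms; they are not taken from the thread.

   pushq %rbp                # 55               1 byte
   pushq %r12                # 41 54            2 bytes (REX prefix for r8-r15)
   popq  %rbp                # 5d               1 byte
   popq  %r12                # 41 5c            2 bytes
   movq  %rbp, 0x70(%rsp)    # 48 89 6c 24 70   5 bytes
   movq  %r12, 0x58(%rsp)    # 4c 89 64 24 58   5 bytes
   movq  0x58(%rsp), %r12    # 4c 8b 64 24 58   5 bytes

On those assumptions, saving rbp and r10-r15 takes 1 + 6 * 2 = 13 bytes with pushes against 7 * 5 = 35 bytes with moves, and the same again for the restores.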

As for how push and pop would behave in earlier CPU generations, I haven't checked but I suspect the stack adjustments and transfers inherent in those instructions have been carried out in parallel since long, long ago. Maybe Pentium Pro. Maybe Pentium. Maybe 386. Maybe even earlier.


--
James Harris



Subject: Re: Prologue and epilogue
From: George Neuner
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Sat, 13 Jul 2019 04:09 UTC
On Fri, 12 Jul 2019 01:15:45 +0000 (UTC),
antispam@nospicedham.math.uni.wroc.pl wrote:

--- an interesting test ...

> So, at least on modern Intel processors differences between
> moves and pushes are very small.

On modern Intel and AMD processors, pushes and pops both are converted
into the equivalent move operations - and they can be executed
out-of-order.
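
As a rough sketch of what "equivalent" means here: a push or pop
has the same visible effect as a stack pointer adjustment plus a
move.  This shows the architectural effect only, not the actual
micro-ops, and it ignores special cases such as pushing %rsp
itself.  leaq is used for the adjustment because, unlike subq or
addq, it leaves the flags untouched, as push and pop do.

   # pushq %rbp has the same visible effect as:
   leaq  -8(%rsp), %rsp      # make room for 8 bytes
   movq  %rbp, (%rsp)        # store at the new top of stack

   # popq %rbp has the same visible effect as:
   movq  (%rsp), %rbp        # load from the top of stack
   leaq  8(%rsp), %rsp       # release the 8 bytes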

George


