Subject: I need to decode x86 machine language for control flow instructions
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Tue, 24 Nov 2020 21:29 UTC
I need to know ALL of the numerical values for every aspect of x86 control flow machine language bytes so that I can fully decode all of these bytes.

I posted some good online documentation (see below) yet some of the numerical values are not listed in this documentation.

I don't understand exactly what numeric values I need to look for where the documentation says cb, cw, cd, cp, or iw.

I don't know what this means: /2, /3, /4, /5

Jump if Condition Is Met
https://c9x.me/x86/html/file_module_x86_id_146.html

Jump
https://c9x.me/x86/html/file_module_x86_id_147.html

Call Procedure
https://c9x.me/x86/html/file_module_x86_id_26.html

Return from Procedure
https://c9x.me/x86/html/file_module_x86_id_280.html



--
Copyright 2020 Pete Olcott

"Great spirits have always encountered violent opposition from mediocre minds." Einstein



Subject: Re: I need to decode x86 machine language for control flow
From: Rick C. Hodgin
Newsgroups: comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Tue, 24 Nov 2020 21:46 UTC
References: 1
There are also the INT* and IRET* instructions. Later CPUs also have fast system-call instructions (SYSENTER/SYSEXIT and SYSCALL/SYSRET) to access the core OS functions.
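
A minimal sketch of how a decoder might recognize these extra control-flow opcodes. The opcode bytes are the ones given in the standard opcode maps; the table and function names are hypothetical and exist only for this illustration.

```python
# Opcode bytes for software interrupts, interrupt returns and fast system calls.
# TRAP_OPCODES and trap_name are illustrative names, not from any library.
TRAP_OPCODES = {
    b"\xcc": "int3",
    b"\xcd": "int imm8",      # followed by one immediate byte
    b"\xce": "into",
    b"\xcf": "iret",
    b"\x0f\x34": "sysenter",
    b"\x0f\x35": "sysexit",
    b"\x0f\x05": "syscall",
    b"\x0f\x07": "sysret",
}

def trap_name(code):
    """Return the mnemonic if code starts with one of the opcodes above, else None."""
    for length in (2, 1):                     # try two-byte opcodes first
        name = TRAP_OPCODES.get(bytes(code[:length]))
        if name:
            return name
    return None

print(trap_name(bytes.fromhex("cd80")))       # int imm8
```

Only the leading opcode bytes are examined here; prefixes are ignored.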

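x86 encodes opcode bits in the primary opcode, plus, for some instructions, additional bits in the Reg field of the Mod/Reg/RM byte. That is what /2, /3, /4 and /5 mean: the digit is the value of that 3-bit field.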
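
A rough sketch of that idea, assuming the standard field layout of the Mod/Reg/RM byte (Mod in bits 7-6, Reg in bits 5-3, R/M in bits 2-0). For opcode 0xFF the Reg field acts as an opcode extension. The helper names below are hypothetical.

```python
def split_modrm(byte):
    """Split a Mod/Reg/RM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

# For opcode 0xFF the reg field selects the instruction ("/digit").
FF_GROUP = {
    0: "inc r/m",
    1: "dec r/m",
    2: "call near r/m",    # FF /2
    3: "call far m16:32",  # FF /3
    4: "jmp near r/m",     # FF /4
    5: "jmp far m16:32",   # FF /5
    6: "push r/m",
}

def ff_operation(modrm):
    """Name the instruction selected by the byte that follows opcode 0xFF."""
    _, reg, _ = split_modrm(modrm)
    return FF_GROUP.get(reg, "(undefined)")

print(ff_operation(0xD1))   # call near r/m
```

So a decoder cannot classify 0xFF as a call or a jump until it has read the next byte.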

If you search online for an older IA-32 manual set (Pentium or 486) you'll find a small manual that will get you the flow control instructions and describe how the x86 opcode decoding unit operates.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf

Specifically, see page 31 of Volume 2: Instruction Set Reference. You can see samples of the Intel instruction encoding syntax on page 42 and in the CMC example.

--
Rick C. Hodgin



Subject: Re: I need to decode x86 machine language for control flow
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Wed, 25 Nov 2020 05:40 UTC
References: 1 2
That was very helpful; I almost have what I need. I found a small inconsistency:

The text seems to indicate that the SIB byte always follows a ModR/M byte:

     Certain encodings of the ModR/M byte require a second
     addressing byte, the SIB byte,

Yet the instructions below seem to match only Table 2-3 and do not seem to have a ModR/M byte:

                 01 010 101 55
                 00 010 101 15
                 11 010 001 D1

[0000057a](03)  ff5508              call dword [ebp+08]
[00000589](06)  ff150b020000        call dword [0000020b]
[0000059d](02)  ffd1                call ecx

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
Table 2-3. 32-Bit Addressing Forms with the SIB Byte  (Page 2-7)
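
For comparison, here is a small sketch that reads the same three bytes as ModR/M bytes, following the usual 32-bit addressing rules (the ones tabulated in Table 2-2). The function name and the output strings are hypothetical; only the field layout and the register numbering are standard.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def describe_rm(modrm):
    """Describe the r/m operand of a ModR/M byte under 32-bit addressing."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return REG32[rm]                 # register operand, no memory access
    if rm == 4:
        return "[SIB byte follows]"      # only r/m = 100 asks for a SIB byte
    if mod == 0 and rm == 5:
        return "[disp32]"                # absolute address, no base register
    disp = {0: "", 1: "+disp8", 2: "+disp32"}[mod]
    return "[" + REG32[rm] + disp + "]"

for b in (0x55, 0x15, 0xD1):
    fields = f"{b >> 6:02b} {(b >> 3) & 7:03b} {b & 7:03b}"
    print(f"{b:02X} = {fields} -> {describe_rm(b)}")
```

Read this way, the three bytes give [ebp+disp8], [disp32] and ecx, which agrees with the disassembly listing above, and none of them has r/m = 100.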





Subject: Re: I need to decode x86 machine language for control flow
From: Rick C. Hodgin
Newsgroups: comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Wed, 25 Nov 2020 13:30 UTC
References: 1 2 3
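Whether a SIB byte can appear depends on the addressing mode in effect (the mode the CPU is in, or an address-size override prefix), and its presence is signaled by a special encoding in the Mod/Reg/RM byte (R/M = 100 with Mod other than 11 in 32-bit addressing). See page 36 and its reference to that encoding in the Mod/Reg/RM byte.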

x86 is a little hairy to decode.  But, it does follow rules.  If you get a handle on how they work, you can decode anything.

Prefix bytes override defaults for the current cpu mode
Opcode bytes drive the base instruction
Mod/Reg/RM bytes convey follow-on information, including signaling if a SIB follows
SIB is optional
Offset data is optional
Immediate data is optional

An instruction can be at most 15 bytes long; a longer encoding raises a fault. And when you get into AMD64 mode, there are REX prefixes, which take over some opcodes from their 32-bit meanings and add extra bits that extend the Reg and R/M fields to 4 bits (16 registers instead of 8).
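
The layout rules above can be sketched as a length calculation. This is a simplified illustration that assumes 32-bit addressing, no prefixes, and the single opcode byte 0xFF (which takes no immediate). The function names are hypothetical.

```python
def modrm_tail_length(code, pos):
    """Bytes taken by Mod/Reg/RM + optional SIB + optional displacement,
    starting at code[pos], assuming 32-bit addressing."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    length = 1                                   # the Mod/Reg/RM byte itself
    if mod == 3:
        return length                            # register operand: nothing follows
    sib_base_is_disp32 = False
    if rm == 4:                                  # r/m = 100 signals a SIB byte
        sib = code[pos + 1]
        length += 1
        sib_base_is_disp32 = (mod == 0 and (sib & 7) == 5)
    if mod == 1:
        length += 1                              # disp8
    elif mod == 2 or (mod == 0 and rm == 5) or sib_base_is_disp32:
        length += 4                              # disp32
    return length

def ff_instruction_length(code):
    """Length of an unprefixed FF /digit instruction."""
    if code[0] != 0xFF:
        raise ValueError("this sketch only handles opcode 0xFF")
    return 1 + modrm_tail_length(code, 1)

for text in ("ff5508", "ff150b020000", "ffd1"):
    print(text, ff_instruction_length(bytes.fromhex(text)))
```

For the three call instructions quoted earlier in the thread this gives 3, 6 and 2 bytes, matching the lengths shown in the disassembly listing.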

--
Rick C. Hodgin



Subject: Re: I need to decode x86 machine language for control flow
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Wed, 25 Nov 2020 14:41 UTC
References: 1 2 3 4
My issue is that the handbook seems to say that you can't have a SIB byte unless a ModR/M byte precedes it. The three call instructions listed above do have a SIB byte (55, 15, D1) without a ModR/M byte preceding it. How can this be? Is the handbook wrong?




Subject: Re: I need to decode x86 machine language for control flow
From: Rick C. Hodgin
Newsgroups: comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Wed, 25 Nov 2020 15:02 UTC
References: 1 2 3 4 5
Look at the encoding for the call instruction with the 0xff format on page 93. It shows that byte two is the Mod/Reg/RM byte. In the first two instructions it is followed by displacement data, and in the call ecx instruction no other data follows.
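
A small sketch of reading the bytes that follow the Mod/Reg/RM byte in the first two instructions, assuming they are little-endian signed displacements as in standard x86 encoding. The helper name is hypothetical.

```python
def read_disp(code, pos, size):
    """Read a little-endian signed displacement of `size` bytes at code[pos]."""
    return int.from_bytes(code[pos:pos + size], "little", signed=True)

first = bytes.fromhex("ff5508")          # call dword [ebp+08]
second = bytes.fromhex("ff150b020000")   # call dword [0000020b]

# Byte 0 is the opcode (FF) and byte 1 is the Mod/Reg/RM byte,
# so the displacement starts at offset 2.
print(hex(read_disp(first, 2, 1)))       # 0x8
print(hex(read_disp(second, 2, 4)))      # 0x20b
```

The third instruction, ffd1, has nothing left to read after its second byte.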

--
Rick C. Hodgin



Subject: Re: I need to decode x86 machine language for control flow
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Wed, 25 Nov 2020 15:34 UTC
References: 1 2 3 4 5 6
That is really weird.
Last night this table was giving me the wrong decode values for the above call instructions and this morning they are correct:
Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte





Subject: Re: I need to decode x86 machine language for control flow
From: Andrew Cooper
Newsgroups: comp.lang.asm.x86
Organization: Student-Run Computing Facility (SRCF)
Date: Wed, 25 Nov 2020 15:07 UTC
References: 1 2 3 4 5
A ModRM byte is optional, determined by the opcode.
A SIB byte is optional, determined by the ModRM encoding.

SIB is generally only used for the more complicated memory addressing
modes.  Simple instructions tend not to use them.
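
As a hedged illustration of "determined by the opcode", the sketch below tabulates the common control-flow opcodes for 32-bit mode without prefixes. For each it records whether a ModRM byte follows and how many bytes of offset or immediate data come after the opcode, using the manual's cb (1 byte), cd (4 bytes), cp (6 bytes) and iw (2 bytes) notation. The table and function names are hypothetical.

```python
# opcode -> (description, has_modrm, trailing data bytes); 32-bit operand size.
ONE_BYTE = {
    0xE8: ("call rel32", False, 4),          # cd
    0xE9: ("jmp rel32", False, 4),           # cd
    0xEB: ("jmp rel8", False, 1),            # cb
    0x9A: ("call far ptr16:32", False, 6),   # cp
    0xEA: ("jmp far ptr16:32", False, 6),    # cp
    0xC3: ("ret", False, 0),
    0xC2: ("ret imm16", False, 2),           # iw
    0xCB: ("retf", False, 0),
    0xCA: ("retf imm16", False, 2),          # iw
    0xFF: ("call/jmp r/m (by /digit)", True, 0),
}

def classify(code):
    """Return (description, has_modrm, trailing_bytes), or None if the
    leading opcode is not a control-flow opcode known to this sketch."""
    op = code[0]
    if 0x70 <= op <= 0x7F:
        return ("jcc rel8", False, 1)        # cb
    if op == 0x0F and len(code) > 1 and 0x80 <= code[1] <= 0x8F:
        return ("jcc rel32", False, 4)       # cd, two-byte opcode
    return ONE_BYTE.get(op)

print(classify(bytes.fromhex("e8fbffffff")))
print(classify(bytes.fromhex("ffd1")))
```

Only the 0xFF entry carries a ModRM byte here. Opcode 0xFF also covers inc, dec and push, so a decoder still has to check the /digit before treating it as control flow.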

~Andrew



Subject: Re: I need to decode x86 machine language for control flow
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Wed, 25 Nov 2020 20:33 UTC
References: 1 2 3 4 5 6
On 11/25/2020 9:07 AM, Andrew Cooper wrote:
On 25/11/2020 14:41, olcott wrote:
On 11/25/2020 7:30 AM, Rick C. Hodgin wrote:
On 11/25/20 12:40 AM, olcott wrote:
On 11/24/2020 3:46 PM, Rick C. Hodgin wrote:
On 11/24/20 4:29 PM, olcott wrote:
I need to know ALL of the numerical values for every aspect of x86
control flow machine language bytes so that I can fully decode all
of these bytes.

There's also INT* instructions, and IRET* instructions.  Later CPUs
also have a fast call to access the core OS functions.

I posted some good online documentation (see below) yet some of the
numerical values are not listed in this documentation.

I don't understand exactly what numeric values that I need to look for
cb, cw, cd, cp, iw,

I don't know what this means: /2, /3, /4, /5

Jump if Condition Is Met
https://c9x.me/x86/html/file_module_x86_id_146.html

Jump
https://c9x.me/x86/html/file_module_x86_id_147.html

Call Procedure
https://c9x.me/x86/html/file_module_x86_id_26.html

Return from Procedure
https://c9x.me/x86/html/file_module_x86_id_280.html

x86 encodes opcode bits in the primary opcode, plus some additional
bits in the Mod/Reg/RM and SIB bits.

If you search online for an older IA-32 manual set (Pentium or 486)
you'll find a small manual that will get you the flow control
instructions and describe how the x86 opcode decoding unit operates.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf

Specifically, see Volume 2: Instruction Set Reference, page 31.
You can see samples of the Intel instruction encoding syntax on
page 42 and in the CMC example.

That was very helpful; I almost have what I need. I found a little
inconsistency:

The text seems to indicate that the SIB byte always follows a ModR/M
byte:

      Certain encodings of the ModR/M byte require a second
      addressing byte, the SIB byte,

Yet the instructions below only match Table 2-3 and do not seem to
have a MODR/M BYTE

                  01 010 101 55
                  00 010 101 15
                  11 010 001 D1

[0000057a](03)  ff5508              call dword [ebp+08]
[00000589](06)  ff150b020000        call dword [0000020b]
[0000059d](02)  ffd1                call ecx

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
Table 2-3. 32-Bit Addressing Forms with the SIB Byte  (Page 2-7)

The SIB byte is indicated by the mode the CPU's in, or prefix override
bytes, and is signaled by a special encoding in the Mod/Reg/RM byte.
See page 36 and the reference to the encoding in the Mod/Reg/RM byte.

x86 is a little hairy to decode.  But, it does follow rules.  If you
get a handle on how they work, you can decode anything.

Prefix bytes override defaults for the current cpu mode
Opcode bytes drive the base instruction
Mod/Reg/RM bytes convey follow-on information, including signaling if
a SIB follows
SIB is optional
Offset data is optional
Immediate data is optional

An instruction can be at most 15 bytes long; anything longer signals a
fault.  And when you get into AMD64 mode, they have REX prefixes which
remove some opcodes from their 32-bit meanings, and add additional bits
and flags making the Reg/RM portions be 4 bits (16 registers instead of 8).


My issue is that the handbook seems to say that you can't have a SIB
byte unless you have a ModR/M byte preceding it. The three call
instructions listed above do have a SIB byte (55, 15, D1) without a ModR/M
byte preceding it. How can this be? Is the handbook wrong?

A ModRM byte is optional, determined by the opcode.
A SIB byte is optional, determined by the ModRM encoding.

SIB is generally only used for the more complicated memory addressing
modes.  Simple instructions tend not to use them.

~Andrew


I can't understand the specified notational conventions well enough to know whether to look for a ModRM byte, a SIB byte, both, or neither.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use CALL on page 3-53 (PDF page 93) as one concrete example.



--
Copyright 2020 Pete Olcott

"Great spirits have always encountered violent opposition from mediocre minds." Einstein



Subject: Re: I need to decode x86 machine language for control flow
From: Andrew Cooper
Newsgroups: comp.lang.asm.x86
Organization: Student-Run Computing Facility (SRCF)
Date: Wed, 25 Nov 2020 21:18 UTC
References: 1 2 3 4 5 6 7
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: and...@nospicedham.cam.ac.uk (Andrew Cooper)
Newsgroups: comp.lang.asm.x86
Subject: Re: I need to decode x86 machine language for control flow
Date: Wed, 25 Nov 2020 21:18:33 +0000
Organization: Student-Run Computing Facility (SRCF)
Lines: 54
Sender: amc96@news.srcf.net
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <b27f2617-49da-e87d-6689-c44d6fa0b16a@cam.ac.uk>
References: <z5ydnab1J-au5iDCnZ2dnUU7-KPNnZ2d@giganews.com>
<rpjv0c$oh8$1@dont-email.me> <DvydnaiWvOnKcyDCnZ2dnUU7-VednZ2d@giganews.com>
<rplm9r$hi9$1@dont-email.me> <udGdnVZLVPyT8CPCnZ2dnUU7-S_NnZ2d@giganews.com>
<rplrva$q3a$1@flame.srcf.net> <2tGdnUu3fNULIiPCnZ2dnUU7-UfNnZ2d@giganews.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="4f3ebb8cc63ef343ede044b15adf6a07";
logging-data="6600"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/7bTW6HL8amN6TdALpe9/1niT2KOzPCU4="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:UNeuPU5oP075XivIFKM3TH0qGfI=
On 25/11/2020 20:33, olcott wrote:
On 11/25/2020 9:07 AM, Andrew Cooper wrote:
I can't understand the specified notational conventions well enough to
know whether to look for a ModRM byte, a SIB byte, both, or neither.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use CALL on page 3-53 (PDF page 93) as one concrete example.

(As a tangent, that is an exceedingly obsolete version of the spec.  You
can obtain up-to-date ones from https://intel.com/sdm/ or
https://developer.amd.com/resources/developer-guides-manuals/ but it
doesn't matter for this specific purpose.)

As for `CALL indirect`, the general form looks like this:

FF <modrm>

There is *always* a ModRM byte.  You know this because you know the
opcode FF always has one.

From the examples given previously:

FF D1 - CALL ecx
The ModRM byte is D1, encoding just ecx.

FF 55 08 - CALL DWORD [ebp + 8]
The ModRM byte is 55, encoding ebp, and also that a single byte
displacement (08) follows.

FF 15 0b020000 - CALL DWORD [0000020b]
The ModRM byte is 15, encoding a memory address composed only of a
4-byte displacement.
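
The three ModRM values above can be split into their bit fields mechanically. The following is only a minimal sketch in Python; `split_modrm` is a hypothetical helper name chosen for illustration, not something taken from the Intel manual.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # bits 7..6: addressing mode
    reg = (byte >> 3) & 0b111   # bits 5..3: register, or the /digit opcode extension
    rm = byte & 0b111           # bits 2..0: register or memory operand selector
    return mod, reg, rm


# Print the fields in binary for the three CALL examples.
for b in (0xD1, 0x55, 0x15):
    mod, reg, rm = split_modrm(b)
    print(f"{b:02X}: mod={mod:02b} reg={reg:03b} rm={rm:03b}")
```

All three bytes carry 010 in the middle field, which is the /2 of `FF /2`; only the mod and rm fields differ.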

If you want to start getting into SIB, then you want the more
complicated addressing modes, such as:

FF 14 03 - CALL DWORD [ebx + eax]
The ModRM byte is 14 (doesn't encode any registers, but does encode a
presence of SIB), and the SIB byte is 03, encoding a base of ebx, index
of eax, and a scale of 1.
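
The SIB byte has the same 2-3-3 bit layout as ModRM, with the fields read as scale, index and base. A rough Python sketch follows; the name `split_sib` and the `REG32` table are assumptions made for this example, and it covers 32-bit addressing only.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_sib(byte):
    """Split a SIB byte into (scale factor, index register or None, base register)."""
    scale = 1 << ((byte >> 6) & 0b11)   # bits 7..6 encode a factor of 1, 2, 4 or 8
    index = (byte >> 3) & 0b111         # bits 5..3
    base = byte & 0b111                 # bits 2..0
    # Index 100 means "no index register"; esp cannot be used as an index.
    index_name = None if index == 0b100 else REG32[index]
    # Caveat: base 101 means ebp only when ModRM.mod is 01 or 10.
    # With mod 00 it means "disp32 with no base register".
    return scale, index_name, REG32[base]


print(split_sib(0x03))
```

For the SIB byte 03 used above, this gives a scale of 1, an index of eax and a base of ebx.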


In all cases, the value of the ModRM byte controls what follows.  It
might be nothing, or it might be a displacement (1, 2 or 4 bytes), and/or
there might be a SIB.

An example using both might be:

FF 54 03 08 - CALL DWORD [ebx + eax + 8]
The ModRM byte is 54, encoding both a SIB byte (03) and a 1-byte
displacement (08) to follow.
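
To make the "ModRM controls what follows" rule concrete, here is a hedged sketch that computes the total length of an `FF /digit` instruction. It assumes 32-bit addressing and no prefix bytes, and `ff_instruction_length` is a hypothetical name used only for this illustration.

```python
def ff_instruction_length(code):
    """Length in bytes of an FF /digit instruction (32-bit addressing, no prefixes)."""
    if code[0] != 0xFF:
        raise ValueError("this sketch only handles opcode FF")
    modrm = code[1]
    mod, rm = modrm >> 6, modrm & 0b111
    length = 2                       # opcode byte + ModRM byte
    if mod == 0b11:
        return length                # register operand: nothing follows
    has_sib = rm == 0b100            # rm 100 with mod != 11 signals a SIB byte
    if has_sib:
        length += 1
    if mod == 0b01:
        length += 1                  # 8-bit displacement
    elif mod == 0b10:
        length += 4                  # 32-bit displacement
    elif rm == 0b101:
        length += 4                  # mod 00, rm 101: bare 32-bit displacement
    elif has_sib and (code[2] & 0b111) == 0b101:
        length += 4                  # mod 00, SIB base 101: disp32 with no base
    return length


for text in ("FFD1", "FF5508", "FF150B020000", "FF1403", "FF540308"):
    print(text, ff_instruction_length(bytes.fromhex(text)))
```

The five byte strings are the examples from this post. Note that the decoder has to peek at the SIB byte in one case, because a base field of 101 with mod 00 adds a 32-bit displacement.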

~Andrew



Subject: Re: I need to decode x86 machine language for control flow
From: olcott
Newsgroups: comp.lang.asm.x86
Organization: A noiseless patient Spider
Date: Wed, 25 Nov 2020 22:18 UTC
References: 1 2 3 4 5 6
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: NoO...@nospicedham.NoWhere.com (olcott)
Newsgroups: comp.lang.asm.x86
Subject: Re: I need to decode x86 machine language for control flow
Date: Wed, 25 Nov 2020 16:18:43 -0600
Organization: A noiseless patient Spider
Lines: 44
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <p6CdnffvEIGgRSPCnZ2dnUU7-cmdnZ2d@giganews.com>
References: <z5ydnab1J-au5iDCnZ2dnUU7-KPNnZ2d@giganews.com>
<rpjv0c$oh8$1@dont-email.me> <DvydnaiWvOnKcyDCnZ2dnUU7-VednZ2d@giganews.com>
<rplm9r$hi9$1@dont-email.me> <udGdnVZLVPyT8CPCnZ2dnUU7-S_NnZ2d@giganews.com>
<rplrle$rh6$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="4f3ebb8cc63ef343ede044b15adf6a07";
logging-data="30906"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/6kmRLb4S0n763htvgYPY1PcdOagzCSTc="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.5.0
Cancel-Lock: sha1:remHURUBG2dTyAfF2aI72M2i7uA=
On 11/25/2020 9:02 AM, Rick C. Hodgin wrote:
On 11/25/20 9:41 AM, olcott wrote:
On 11/25/2020 7:30 AM, Rick C. Hodgin wrote:
On 11/25/20 12:40 AM, olcott wrote:
Yet the instructions below only match Table 2-3 and do not seem to have a MODR/M BYTE

                 01 010 101 55
                 00 010 101 15
                 11 010 001 D1

[0000057a](03)  ff5508              call dword [ebp+08]
[00000589](06)  ff150b020000        call dword [0000020b]
[0000059d](02)  ffd1                call ecx

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
Table 2-3. 32-Bit Addressing Forms with the SIB Byte  (Page 2-7)

The SIB byte is indicated by the mode the CPU's in, or prefix override bytes, and is signaled by a special encoding in the Mod/Reg/RM byte. See page 36 and the reference to the encoding in the Mod/Reg/RM byte.

My issue is that the handbook seems to say that you can't have a SIB byte unless you have a ModR/M byte preceding it. The three call instructions listed above do have a SIB byte (55, 15, D1) without a ModR/M byte preceding it. How can this be? Is the handbook wrong?

Look at the encoding for the call instruction with the 0xff format on page 93.  It shows that there is a Mod/Reg/RM byte there in byte two,

I can't see how it says that. I don't understand the codes.

and then the immediate data afterwards in the first two, and no other data following in the call ecx instruction.



--
Copyright 2020 Pete Olcott

"Great spirits have always encountered violent opposition from mediocre minds." Einstein



Subject: Re: I need to decode x86 machine language for control flow
From: Rick C. Hodgin
Newsgroups: comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Wed, 25 Nov 2020 23:13 UTC
References: 1 2 3 4 5 6 7 8 9
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: rick.c.h...@nospicedham.gmail.com (Rick C. Hodgin)
Newsgroups: comp.lang.asm.x86
Subject: Re: I need to decode x86 machine language for control flow
Date: Wed, 25 Nov 2020 18:13:35 -0500
Organization: Liberty Software Foundation
Lines: 85
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <rpmof0$gnd$1@dont-email.me>
References: <z5ydnab1J-au5iDCnZ2dnUU7-KPNnZ2d@giganews.com>
<rpjv0c$oh8$1@dont-email.me> <DvydnaiWvOnKcyDCnZ2dnUU7-VednZ2d@giganews.com>
<rplm9r$hi9$1@dont-email.me> <udGdnVZLVPyT8CPCnZ2dnUU7-S_NnZ2d@giganews.com>
<rplrva$q3a$1@flame.srcf.net> <2tGdnUu3fNULIiPCnZ2dnUU7-UfNnZ2d@giganews.com>
<rpmhoj$ud1$2@flame.srcf.net> <LM-dnfhJTcijSiPCnZ2dnUU7-fvNnZ2d@giganews.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="0f00861eb5f9607b5f89b8d711026c02";
logging-data="20645"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/JnRjg5LgJkNzSj6hkQRyiF7WmtCSAiAE="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:uotzWC9qq82mrwtKPem0mCR3XAk=
On 11/25/20 5:14 PM, olcott wrote:
As for `CALL indirect`, the general form looks like this:

FF <modrm>

There is *always* a ModRM byte.  You know this because you know the
opcode FF always has one.

 From the examples given previously:

FF D1 - CALL ecx
The ModRM byte is D1, encoding just ecx.

FF 55 08 - CALL DWORD [ebp + 8]
The ModRM byte is 55, encoding ebp, and also that a single byte
displacement (08) follows.

FF 15 0b020000 - CALL DWORD [0000020b]
The ModRM byte is 15, encoding a memory address composed only of a
4-byte displacement.

If you want to start getting into SIB, then you want the more
complicated addressing modes, such as:

FF 14 03 - CALL DWORD [ebx + eax]
The ModRM byte is 14 (doesn't encode any registers, but does encode a
presence of SIB), and the SIB byte is 03, encoding a base of ebx, index
of eax, and a scale of 1.


In all cases, the value of the ModRM byte controls what follows.  It
might be nothing, or it might be a displacement (1, 2 or 4 bytes), and/or
there might be a SIB.

An example using both might be:

FF 54 03 08 - CALL DWORD [ebx + eax + 8]
The ModRM byte is 54, encoding both a SIB byte (03) and a 1-byte
displacement (08) to follow.

~Andrew


I need to know how to decode this so that I can tell, for every instruction, whether to look for a ModRM byte, a SIB byte, both, or neither.

FF /2 CALL r/m16 Call near, absolute indirect, address given in r/m16
FF /2 CALL r/m32 Call near, absolute indirect, address given in r/m32

Is the /2 somehow supposed to tell us this?

It's part of the opcode.  It overflows from the 8 bits in the first byte into those 3 bits from the Mod/Reg/RM byte.

On page 93 it shows two separate encodings for the 0xff opcode.  The first is the /2, and the second is the /3.  That means you'll find the bit pattern 010 (for /2) or 011 (for /3) in the Reg bits.

If you look on page 42, it says:  "/digit - A digit between 0 and 7 indicates that the Mod/Reg/RM byte of the instruction uses only the RM (register or memory) operand.  The Reg field contains the digit that provides an extension to the instruction's opcode."

In cases where you see the "/r" encoding, that means the Reg field actually does contain a register.  This would be for two-operand sources, like "mov eax,ebx".  In that case, it would use both the Reg and RM components to indicate the two registers.  In the case of "call ecx" you're only using one register, so the Reg bits are opened up as not being in use, and the x86 designers decided to use those bits to allow for additional encodings.

For the CALL instruction, they added /2 and /3, which yields two completely different call operations, such as call r/m32, and call m16:16, or call m16:32 depending on which mode you're in, either natively, or due to override prefixes.

Fast-forward to page 186, and you see the DEC instruction uses the 0xff /1 encoding, meaning the same 0xff opcode, but the /1 indicates it's not a CALL instruction, but rather a DEC instruction.
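
As a small illustration of the /digit idea, the sketch below looks up which instruction an `FF` opcode selects from the Reg field of its ModRM byte. The table contents follow the standard IA-32 opcode group for `FF`; the names `FF_GROUP` and `classify_ff` are made up for this example.

```python
# Opcode FF is a "group" opcode: the Reg field of ModRM selects the instruction.
FF_GROUP = {
    0: "INC r/m",
    1: "DEC r/m",
    2: "CALL near, absolute indirect",
    3: "CALL far, absolute indirect",
    4: "JMP near, absolute indirect",
    5: "JMP far, absolute indirect",
    6: "PUSH r/m",
}  # /7 is not a defined encoding


def classify_ff(modrm):
    """Return the instruction selected by the /digit bits of an FF opcode."""
    digit = (modrm >> 3) & 0b111    # bits 5..3 of the ModRM byte
    return FF_GROUP.get(digit, "undefined")


for modrm in (0xD1, 0x55, 0x15, 0xE1, 0x08):
    print(f"FF {modrm:02X}: {classify_ff(modrm)}")
```

The first three ModRM bytes are the CALL examples from earlier in the thread; E1 and 08 are extra bytes added here to show a /4 (JMP) and a /1 (DEC) case.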

Make sense?

--
Rick C. Hodgin



Subject: Re: I need to decode x86 machine language for control flow
From: Rick C. Hodgin
Newsgroups: comp.lang.asm.x86
Organization: Liberty Software Foundation
Date: Wed, 25 Nov 2020 23:17 UTC
References: 1 2 3 4 5 6 7 8
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: rick.c.h...@nospicedham.gmail.com (Rick C. Hodgin)
Newsgroups: comp.lang.asm.x86
Subject: Re: I need to decode x86 machine language for control flow
Date: Wed, 25 Nov 2020 18:17:11 -0500
Organization: Liberty Software Foundation
Lines: 29
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <rpmolp$i1f$1@dont-email.me>
References: <z5ydnab1J-au5iDCnZ2dnUU7-KPNnZ2d@giganews.com>
<rpjv0c$oh8$1@dont-email.me> <DvydnaiWvOnKcyDCnZ2dnUU7-VednZ2d@giganews.com>
<rplm9r$hi9$1@dont-email.me> <udGdnVZLVPyT8CPCnZ2dnUU7-S_NnZ2d@giganews.com>
<rplrva$q3a$1@flame.srcf.net> <2tGdnUu3fNULIiPCnZ2dnUU7-UfNnZ2d@giganews.com>
<rpmhoj$ud1$2@flame.srcf.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Info: reader02.eternal-september.org; posting-host="0f00861eb5f9607b5f89b8d711026c02";
logging-data="20661"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18PgjcwP0C/ZJ2GWg+eJYGzkvibaMAL5iA="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
Thunderbird/68.10.0
Cancel-Lock: sha1:BWXAlAa1bS7jdUG+6cA13gXw1IE=
On 11/25/20 4:19 PM, Andrew Cooper wrote:
On 25/11/2020 20:33, olcott wrote:
https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use CALL on page 3-53 (PDF page 93) as one concrete example.

(As a tangent, that is an exceedingly obsolete version of the spec.  You
can obtain up-to-date ones from https://intel.com/sdm/ or
https://developer.amd.com/resources/developer-guides-manuals/ but it
doesn't matter for this specific purpose.)

I gave him that reference because it's an old and simple x86 ISA manual.   The newer ones have that information as well, but they're also convoluted with 64-bit support, new instructions, different operating modes, etc.

When you're learning, I find it easiest to start with something simple. In fact, the MASM 6.x Reference manual that came with MASM 6.1 when I bought it was my introduction to 80386 programming.  I wrote my first assemblers, debuggers, and compilers there.  I have different colored highlighter markers all the way through it for where I would code and test and validate the various instructions.

I just thought it would be easier to start out with something easier, like what existed back in the 90s before 64-bit, before virtualization, before extended ISAs (beyond early SIMD anyway).

--
Rick C. Hodgin



Subject: Re: I need to decode x86 machine language for control flow
From: wolfgang kern
Newsgroups: comp.lang.asm.x86
Organization: KESYS-development
Date: Thu, 26 Nov 2020 13:48 UTC
References: 1 2 3 4 5 6 7
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!suck
Funky-Path: eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: nowh...@nospicedham.never.at (wolfgang kern)
Newsgroups: comp.lang.asm.x86
Subject: Re: I need to decode x86 machine language for control flow
Date: Thu, 26 Nov 2020 14:48:01 +0100
Organization: KESYS-development
Lines: 39
Approved: fbkotler@myfairpoint.net - comp.lang.asm.x86 moderation team.
Message-ID: <rpobnq$sh1$1@gioia.aioe.org>
References: <z5ydnab1J-au5iDCnZ2dnUU7-KPNnZ2d@giganews.com>
<rpjv0c$oh8$1@dont-email.me> <DvydnaiWvOnKcyDCnZ2dnUU7-VednZ2d@giganews.com>
<rplm9r$hi9$1@dont-email.me> <udGdnVZLVPyT8CPCnZ2dnUU7-S_NnZ2d@giganews.com>
<rplrva$q3a$1@flame.srcf.net> <2tGdnUu3fNULIiPCnZ2dnUU7-UfNnZ2d@giganews.com>
Reply-To: nowhere@never.at
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Funky-Injection-Info: reader02.eternal-september.org; posting-host="0f00861eb5f9607b5f89b8d711026c02";
logging-data="13420"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19rDMhH7h9bOdpqJGRQEg8OQ4r4ub+3OIY="
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
Thunderbird/78.5.0
Cancel-Lock: sha1:Jx7OObTbMM+R8F77OB5sRPMJ0m4=
Funky-Xref: reader02.eternal-september.org comp.lang.asm.x86:12356
On 25.11.2020 21:33, olcott wrote:

I can't understand the specified notational conventions well enough to know whether to look for a ModRM byte, a SIB byte, both, or neither.

https://www.cs.cmu.edu/~410/doc/intel-isr.pdf
We can use CALL on page 3-53 (PDF page 93) as one concrete example.

look at Appendix B1 for instruction format first.

so what can't you understand here:

E8 xx xx         ;call near signed relative 16bit to next (+3)
E8 xx xx xx xx   ;call near signed relative 32bit to next (+5)

FF /2 CALL r/m16 Call near, absolute indirect, address given in r/m16
FF /2 CALL r/m32 Call near, absolute indirect, address given in r/m32

this /2 means the 3 bits |7|6|x|x|x|2|1|0| are 010 for a CALL absolute

9A xx xx:xx xx
9A xx xx xx xx:xx xx

these two are CALL FAR by immediate value pointer SEG:Offset

FF /3 CALL m16:16 Call far, absolute indirect, address given in m16:16
FF /3 CALL m16:32 Call far, absolute indirect, address given in m16:32

this /3 means the 3 bits |7|6|x|x|x|2|1|0| are 011 for FAR CALL absolute

the mode bits:          |x|x|5|4|3|2|1|0| ;mode 3 is always by register
the RM bits:            |7|6|5|4|3|x|x|x| ;RM4 may* mean an SIB follows

*SIB bytes can only be used in 32 bit addressing modes,
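
For the relative form (E8 with a 32-bit displacement), the call target is the address of the next instruction plus the signed displacement. A minimal Python sketch, assuming 32-bit mode and no prefixes; `call_rel32_target` is a hypothetical helper name used only here.

```python
def call_rel32_target(address, code):
    """Target of an E8 rel32 CALL located at `address` (32-bit mode, no prefixes)."""
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("expected E8 followed by four displacement bytes")
    # The displacement is stored little-endian and is signed.
    rel = int.from_bytes(code[1:5], "little", signed=True)
    next_ip = address + 5            # relative to the instruction after the CALL
    return (next_ip + rel) & 0xFFFFFFFF


# A CALL at 0x57A with displacement +0x10 lands 0x10 bytes past its own end.
print(hex(call_rel32_target(0x57A, bytes.fromhex("E810000000"))))
```

The address 0x57A and the displacement bytes are made-up inputs for illustration. A displacement of FB FF FF FF (-5) would make the instruction call itself.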

hth,
__
wolfgang


