x86

History

graph LR
1[Intel \n 8088 Series]
2[Intel Itanium \n IA64 Arch.]
3[AMD \n AMD64 Arch.]
1 --"backward \n incompatible"--> 2
1 --"backward \n compatible" --> 3
3 --> 4[x86-64 Arch.]
2 -. "discontinued" .-> 4

Registers

  • %rsp - stack pointer, which points to the last value pushed onto the stack.
  • %rbp - frame pointer/base pointer
  • The lower part of each register can be accessed under a different name (e.g. %eax, %ax and %al for the low 32, 16 and 8 bits of %rax). This keeps backward compatibility with 32-bit programs.
  • %rax (the accumulator) holds the return value by convention.
  • Some string instructions (e.g. MOVS, STOS) implicitly use %rsi and %rdi as the source and destination pointers.
  • Remember to save the state of the registers you overwrite (by pushing them onto the stack and popping them back afterwards)
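
A minimal sketch of the sub-register naming, written in C rather than assembly so it can be run anywhere. The helper name low_bits is hypothetical; it only models which bits of a 64-bit register a narrower name such as %eax, %ax or %al refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Return the low `width` bits of a 64-bit register value.
 * width 64 ~ %rax, 32 ~ %eax, 16 ~ %ax, 8 ~ %al (hypothetical helper). */
uint64_t low_bits(uint64_t reg, unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    if (width == 64) {
        return reg;                             /* the full register */
    }
    return reg & ((UINT64_C(1) << width) - 1);  /* mask off the upper bits */
}
```

With 0x1122334455667788 as the register value, this gives 0x55667788 for the %eax view, 0x7788 for %ax and 0x88 for %al.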

Watch for the %!

  • AT&T syntax puts a % before register names and orders operands as src, dst. (Used by gcc)
  • Intel syntax has no % prefix and orders operands as dst, src.

Addressing

  • Global value - main
  • Immediate value - $32
  • Register value - %rsp refers to value stored in %rsp
  • Indirect value - (%rsp) refers to value pointed to by %rsp
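  • Base-relative - -16(%rcx) refers to the value at address %rcx - 16
  • Complex - -16(%rbx, %rcx, 8) refers to the value at address %rbx + %rcx * 8 - 16, in which %rbx is the base of the array, %rcx is the index, 8 is the size of each item, and -16 is an offset relative to the selected item.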
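
The address arithmetic behind the complex mode can be sketched in C. The function name effective_address and its parameter names are assumptions made for illustration; the formula itself is base + index * scale + displacement, and the hardware only accepts scales of 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stdint.h>

/* Address selected by disp(base, index, scale) in AT&T syntax
 * (hypothetical helper that models the computation only). */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps around, just like address arithmetic
     * in the CPU, so a negative displacement works as expected. */
    return base + index * scale + (uint64_t)disp;
}
```

For -16(%rbx, %rcx, 8) with %rbx = 0x1000 and %rcx = 3, this yields 0x1000 + 24 - 16 = 0x1008.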

Instructions

Data Movement

  • Move Data
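    • MOV{B,W,L,Q} %rbx, %rax (MOV source, destination)
    • The suffixes stand for BYTE (8 bits), WORD (16 bits), LONG (32 bits), QUADWORD (64 bits)
  • Load effective address: LEA computes the address of its source operand and stores that address in the destination, without reading memory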
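
The difference between LEA and MOV resembles the difference between &a[i] and a[i] in C. The sketch below assumes 8-byte items and uses the hypothetical names load_address and load_value; the instruction in each comment is what a compiler might plausibly emit, not a guaranteed output.

```c
#include <stdint.h>

/* Like LEAQ (%rdi,%rsi,8), %rax: compute the address only, no memory access. */
int64_t *load_address(int64_t *base, int64_t index) {
    return &base[index];
}

/* Like MOVQ (%rdi,%rsi,8), %rax: read the 8-byte value stored at that address. */
int64_t load_value(const int64_t *base, int64_t index) {
    return base[index];
}
```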

Arithmetic

  • Add
  • Multiply
    • One-operand IMUL takes an operand and multiplies it by the value in %rax
    • Leaves the low 64 bits in %rax, and high 64 bits in %rdx
  • Division
    • IDIV takes the 128-bit integer in %rdx:%rax and divides it by the operand
    • Quotient is placed in %rax, remainder in %rdx
  • IDIV and IMUL are the signed forms (DIV and MUL are the unsigned ones); IMUL also has two- and three-operand forms that do not write %rdx
  • Use CQO (cqto in AT&T naming) to sign-extend %rax into %rdx:%rax before a quad-word IDIV
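
A rough C model of the register pairing used by multiply and divide. It is shown at 32-bit width (IMULL, IDIVL and CLTD with %edx:%eax) so that a plain int64_t can hold the double-width value; the 64-bit forms follow the same pattern with %rdx:%rax. The type regs32 and the function names are hypothetical, and the narrowing casts assume the usual two's-complement behaviour of gcc and clang.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t eax; int32_t edx; } regs32;

/* One-operand IMULL: %edx:%eax = %eax * operand (signed). */
regs32 imul32(int32_t eax, int32_t operand) {
    int64_t product = (int64_t)eax * (int64_t)operand;
    regs32 out;
    out.eax = (int32_t)(uint32_t)product;                    /* low half  */
    out.edx = (int32_t)(uint32_t)((uint64_t)product >> 32);  /* high half */
    return out;
}

/* CLTD: sign-extend %eax into %edx:%eax (CQO is the 64-bit analogue). */
regs32 cltd(int32_t eax) {
    regs32 out = { eax, eax < 0 ? -1 : 0 };
    return out;
}

/* IDIVL: divide %edx:%eax by operand; quotient -> %eax, remainder -> %edx. */
regs32 idiv32(regs32 in, int32_t operand) {
    /* Reassemble the signed 64-bit dividend from the two halves. */
    int64_t dividend =
        (int64_t)(((uint64_t)(uint32_t)in.edx << 32) | (uint32_t)in.eax);
    /* The hardware raises a divide error in the cases rejected below. */
    assert(operand != 0 && !(dividend == INT64_MIN && operand == -1));
    int64_t quotient = dividend / operand;   /* truncates toward zero */
    assert(quotient >= INT32_MIN && quotient <= INT32_MAX);
    regs32 out = { (int32_t)quotient, (int32_t)(dividend % operand) };
    return out;
}
```

Dividing -7 by 2 via idiv32(cltd(-7), 2) leaves -3 in the quotient slot and -1 in the remainder slot, which is why the sign extension has to come before the divide.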

Conditional

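  • CMPQ performs a subtraction, discards the result, and records the outcome in the flag bits (zero, sign, carry, overflow) of the EFLAGS register (RFLAGS in 64-bit mode). All flags are set by the one comparison; each conditional jump then tests the flags it needs.
  • JLE, JL, JE, JNE, JGE, JG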
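
How one comparison can serve every jump can be sketched in C. The flags struct and the function names are hypothetical; only the zero, sign and overflow flags are modelled, since those are the ones the signed jumps listed above read.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, of; } flags;

/* CMPQ src, dst (AT&T order): compute dst - src and keep only the flags. */
flags cmpq(int64_t src, int64_t dst) {
    uint64_t diff = (uint64_t)dst - (uint64_t)src;  /* wraps like the hardware */
    flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 63) != 0;
    /* Signed overflow: the operands differ in sign and the result's
     * sign differs from dst's sign. */
    f.of = ((dst < 0) != (src < 0)) && ((dst < 0) != f.sf);
    return f;
}

/* Each jump condition is a fixed test on the recorded flags. */
bool je(flags f)  { return f.zf; }
bool jne(flags f) { return !f.zf; }
bool jl(flags f)  { return f.sf != f.of; }
bool jle(flags f) { return f.zf || f.sf != f.of; }
bool jg(flags f)  { return !f.zf && f.sf == f.of; }
bool jge(flags f) { return f.sf == f.of; }
```

After cmpq(5, 3), which compares 3 against 5, jl and jle report true while jg and jge report false, all from the same set of flags.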

Function Call

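32-bit systems use a stack calling convention: arguments are passed on the stack. The call itself behaves roughly like this (conceptual only, since %eip cannot be pushed or popped directly):

PUSH %eip   # CALL f: save the return address ...
JMP  f      # ... and jump to f
POP  %eip   # RET (at the end of f): resume at the saved address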

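64-bit systems use a register calling convention, the System V ABI calling convention: the first six integer or pointer arguments are passed in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and the result comes back in %rax.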

name: .string "Peter"

LEAQ name, %rdi # first argument is the address of the string, so LEAQ instead of MOVQ

Function Definition

int compute(int a, int b) {
    return a + b * 3;
}
.global compute
compute:
    MOVQ  %rdi, %rbx   # %rbx is callee-saved, so it should be saved first
    MOVQ  %rsi, %rax   # %rax is scratch
    MOVQ  $3,   %rcx   # %rcx is scratch
    IMULQ %rcx         # implicit operand: %rdx:%rax = %rax * %rcx
    ADDQ  %rbx, %rax
    RET

Nest-able Functions

compute:
    PUSH %rbp
    MOVQ %rsp, %rbp
 
    PUSH %rdi
    PUSH %rsi
    SUBQ $24, %rsp # 3 local vars of 8 bytes
 
    PUSH %rbx
    ... # Save other registers
 
    MOVQ  -8(%rbp), %r10 # arg0
    MOVQ -16(%rbp), %r11 # arg1
    MOVQ $3, %r12
    MOVQ %r11, %rax
    IMULQ %r12
    ADDQ %r10, %rax
    MOVQ %rax, -24(%rbp) # put results into a local var
 
    POP ... # pop the saved regs
    # local vars and args are bypassed by resetting %rsp below
    MOVQ %rbp, %rsp # discard the frame before returning to the caller
    POP %rbp
    RET

  • The first two instructions are an idiom (the function prologue) found at the start of functions compiled from any language.
  • Save the arguments
  • Reserve space for local variables, so that they are easy to reference relative to %rbp
  • Save the local variables (which are already on the stack)

Program Elements

  • Directives, begin with a .
    • .file
    • .data
    • .text
  • Labels
    • Naming them with a leading . is ideal. These are meant to be put in the data section.
    • .global main makes the label main visible outside the file, so that it can be called from elsewhere.

Debugging

  • Compile with the -g flag (gcc) to include debug information
  • In gdb, use stepi to step through individual instructions
  • Use print on $rax, $eflags, $rip, or info registers to view all registers