Assembly Fundamentals

Core concepts for x86-64 assembly programming.

Reference

The Mental Model

Assembly is the human-readable form of machine code. Every program the CPU runs is ultimately machine code, whether a compiler produced it ahead of time (C, Rust) or a JIT generated it at runtime. Understanding assembly means understanding:

Registers - CPU’s fast storage (like variables, but fixed and few)
Memory - RAM accessed by address (stack, heap, data sections)
Instructions - Operations the CPU executes one at a time
Flags - Single-bit results of operations (zero, negative, overflow)

High-Level Code          Assembly              Machine Code
─────────────────────────────────────────────────────────────
int x = 5;        →      mov eax, 5      →     B8 05 00 00 00
x = x + 3;        →      add eax, 3      →     83 C0 03
if (x == 8)       →      cmp eax, 8      →     83 F8 08
                         je  equal       →     74 XX

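x86-64 General Purpose Registers

Argument positions in this table follow the System V AMD64 ABI used on Linux, macOS, and the BSDs. Windows x64 passes the first four integer arguments in RCX, RDX, R8, and R9 instead.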

64-bit	32-bit	16-bit	8-bit	Primary Purpose
RAX	EAX	AX	AL	Accumulator, return values, multiply/divide
RBX	EBX	BX	BL	Base register (callee-saved)
RCX	ECX	CX	CL	Counter for loops, 4th argument
RDX	EDX	DX	DL	Data, I/O, 3rd argument, multiply high bits
RSI	ESI	SI	SIL	Source index, 2nd argument
RDI	EDI	DI	DIL	Destination index, 1st argument
RBP	EBP	BP	BPL	Base pointer (stack frame, callee-saved)
RSP	ESP	SP	SPL	Stack pointer (top of stack)
R8	R8D	R8W	R8B	5th argument
R9	R9D	R9W	R9B	6th argument
R10	R10D	R10W	R10B	Temporary (caller-saved)
R11	R11D	R11W	R11B	Temporary (caller-saved)
R12	R12D	R12W	R12B	Callee-saved
R13	R13D	R13W	R13B	Callee-saved
R14	R14D	R14W	R14B	Callee-saved
R15	R15D	R15W	R15B	Callee-saved

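Key insight: Writing a 32-bit register such as EAX zero-extends the result into the full 64-bit register, clearing the upper 32 bits of RAX. Writing a 16-bit or 8-bit register (AX, AL) leaves all other bits unchanged. This matters for correctness and security: stale upper bits can leak data (info leaks) or corrupt a value that is later used as a 64-bit pointer or index.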

; Register size examples
mov rax, 0x123456789ABCDEF0   ; 64-bit: RAX = 0x123456789ABCDEF0
mov eax, 0x11111111           ; 32-bit: RAX = 0x0000000011111111 (upper zeroed!)
mov ax, 0x2222                ; 16-bit: RAX = 0x0000000011112222 (upper preserved)
mov al, 0x33                  ;  8-bit: RAX = 0x0000000011112233 (upper preserved)

Special Registers

; RIP - Instruction Pointer (Program Counter)
; Points to the NEXT instruction to execute
; You cannot mov to RIP directly; jmp, call, and ret change it

; RFLAGS - Status Flags Register
; Individual bits set by operations:
;   ZF (Zero Flag)     - Set if result is zero
;   SF (Sign Flag)     - Set if result is negative (MSB = 1)
;   CF (Carry Flag)    - Set on unsigned overflow
;   OF (Overflow Flag) - Set on signed overflow
;   PF (Parity Flag)   - Set if low byte has even number of 1s

; Example: How flags work
mov eax, 5
sub eax, 5          ; Result is 0
                    ; ZF=1 (zero), SF=0 (not negative), CF=0, OF=0

mov eax, 0
sub eax, 1          ; Result is -1 (0xFFFFFFFF)
                    ; ZF=0, SF=1 (negative), CF=1 (borrow), OF=0

mov al, 127
add al, 1           ; Result is 128 (0x80)
                    ; ZF=0, SF=1, CF=0, OF=1 (signed overflow: +128 does not fit in a signed byte)

Flag	When It’s Set
ZF	Result equals zero
SF	Result’s MSB (most significant bit) is 1 (negative in signed)
CF	Unsigned operation overflowed/underflowed (carry out of MSB)
OF	Signed operation overflowed (result wrong sign)

Process Memory Layout

High Address (0x7FFF...)
┌─────────────────────────┐
│        Stack            │  ← RSP points here (grows DOWN)
│    (local variables,    │
│     return addresses)   │
├─────────────────────────┤
│          ↓              │
│     (free space)        │
│          ↑              │
├─────────────────────────┤
│        Heap             │  ← malloc() allocates here (grows UP)
│   (dynamic allocation)  │
├─────────────────────────┤
│   BSS (uninitialized)   │  ← Global/static vars initialized to 0
├─────────────────────────┤
│   Data (initialized)    │  ← Global/static vars with initial values
├─────────────────────────┤
│   Text (code)           │  ← Your compiled instructions (read-only)
└─────────────────────────┘
Low Address (0x400000 for non-PIE executables; PIE executables load at a randomized base, often 0x55... on Linux)

# See actual memory layout of a running process
cat /proc/self/maps

# Example output (simplified; addresses vary per run because of ASLR):
# 00400000-00401000 r-xp  /bin/cat     ← Text (executable)
# 00601000-00602000 rw-p  /bin/cat     ← Data
# 01a2b000-01a4c000 rw-p  [heap]       ← Heap (just above BSS)
# 7ffc9e800000-7ffc9e821000 rw-p  [stack]  ← Stack

Memory Addressing Modes

; IMMEDIATE - Value is in the instruction itself
mov eax, 42             ; eax = 42 (decimal)
mov eax, 0x2A           ; eax = 42 (hex)
mov eax, 0b101010       ; eax = 42 (binary)

; REGISTER - Value is in a register
mov eax, ebx            ; eax = ebx

; DIRECT - Address is literal in instruction
mov eax, [0x601000]     ; eax = value at memory address 0x601000

; REGISTER INDIRECT - Address is in a register
mov eax, [rbx]          ; eax = value at address stored in rbx

; DISPLACEMENT - Register + offset
mov eax, [rbx + 8]      ; eax = value at (rbx + 8)
mov eax, [rbp - 4]      ; eax = local variable (stack)

; INDEXED - Base + (index * scale)
mov eax, [rbx + rcx*4]  ; Array access: rbx=base, rcx=index, 4=element size

; FULL FORM - Base + (index * scale) + displacement
mov eax, [rbx + rcx*4 + 16]  ; e.g. s->vals[i]: int array 16 bytes into the struct at rbx

Array Access Pattern:

// C code
int arr[10];
int x = arr[3];

// Assembly equivalent
; Assume arr base address in rbx
mov eax, [rbx + 3*4]    ; 3 = index, 4 = sizeof(int)
; Or with index in register:
mov ecx, 3              ; writing ECX also zeroes the upper half of RCX
mov eax, [rbx + rcx*4]

Data Sizes and Suffixes

Size	Bytes	Intel Size Keyword	AT&T Suffix
Byte	1	BYTE PTR	b (movb)
Word	2	WORD PTR	w (movw)
Double Word	4	DWORD PTR	l (movl)
Quad Word	8	QWORD PTR	q (movq)

; Intel syntax as printed by objdump -M intel and accepted by GAS .intel_syntax
; (NASM omits PTR and writes e.g. mov byte [rax], 0x41)
mov BYTE PTR [rax], 0x41      ; Store 1 byte (ASCII 'A')
mov WORD PTR [rax], 0x4142    ; Store 2 bytes
mov DWORD PTR [rax], 0x41424344 ; Store 4 bytes
mov QWORD PTR [rax], rbx      ; Store 8 bytes

; AT&T syntax (GAS, objdump default) - source, destination reversed!
movb $0x41, (%rax)            ; Same as BYTE PTR above
movl $0x41424344, (%rax)      ; Same as DWORD PTR above

CRITICAL: Intel vs AT&T Syntax

Intel (NASM):     mov  destination, source     ; dst = src
AT&T (GAS):       movl source, destination     ; src → dst

Intel:  mov eax, [rbx + rcx*4 + 8]
AT&T:   movl 8(%rbx,%rcx,4), %eax

Use objdump -d -M intel to disassemble in Intel syntax; without -M intel, objdump prints AT&T.

Essential Instructions

; DATA MOVEMENT
mov rax, rbx        ; rax = rbx (copy)
mov rax, [rbx]      ; rax = *rbx (load from memory)
mov [rax], rbx      ; *rax = rbx (store to memory)
lea rax, [rbx + 8]  ; rax = rbx + 8 (address calculation, NOT memory access)
xchg rax, rbx       ; Swap rax and rbx

; LEA is powerful for math without memory access:
lea rax, [rbx + rcx*2]  ; rax = rbx + rcx*2 (no memory read!)
lea rax, [rax + rax*4]  ; rax = rax * 5 (multiply by 5)
lea rax, [rax*8 + rax]  ; rax = rax * 9

; ZEROING (same result in RAX; they differ in size and flag effects)
xor eax, eax        ; 2 bytes, clears all of RAX, preferred (sets ZF=1, clears CF/OF)
mov eax, 0          ; 5 bytes, leaves flags untouched
sub eax, eax        ; 2 bytes, also works (updates flags like xor)

; SIGN/ZERO EXTENSION
movzx eax, BYTE PTR [rbx]   ; Zero-extend byte to 32-bit
movsx eax, BYTE PTR [rbx]   ; Sign-extend byte to 32-bit
movsxd rax, DWORD PTR [rbx] ; Sign-extend 32-bit to 64-bit
cdqe                        ; Sign-extend EAX to RAX

; STACK OPERATIONS
push rax            ; RSP -= 8; [RSP] = RAX
pop rax             ; RAX = [RSP]; RSP += 8
push QWORD PTR [rbx] ; Push value from memory

Endianness (Little-Endian)

x86 is little-endian: least significant byte at lowest address.

Value: 0x12345678

Memory Address:    0x100  0x101  0x102  0x103
                   ┌──────┬──────┬──────┬──────┐
Little-Endian:     │ 0x78 │ 0x56 │ 0x34 │ 0x12 │  ← x86
                   └──────┴──────┴──────┴──────┘
                   (LSB)                 (MSB)

                   ┌──────┬──────┬──────┬──────┐
Big-Endian:        │ 0x12 │ 0x34 │ 0x56 │ 0x78 │  ← Network byte order
                   └──────┴──────┴──────┴──────┘
                   (MSB)                 (LSB)

; Demonstrating endianness
mov DWORD PTR [rax], 0x41424344  ; Store "DCBA" (reversed!)

; In memory:
; [rax+0] = 0x44 ('D')
; [rax+1] = 0x43 ('C')
; [rax+2] = 0x42 ('B')
; [rax+3] = 0x41 ('A')

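Why it matters:

Text stored in memory reads in order byte by byte, but viewed as 4- or 8-byte words (e.g. in a debugger) it appears reversed
Network protocols use big-endian byte order, so multi-byte values need conversion (htonl/ntohl)
In a byte-by-byte hex dump, read multi-byte values right-to-left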

Two’s Complement (Signed Integers)

The Math: To negate a number, invert all bits and add 1.

8-bit examples:

 127 = 0111 1111  (largest positive)
   1 = 0000 0001
   0 = 0000 0000
  -1 = 1111 1111  (invert 1 → 1111 1110, add 1 → 1111 1111)
  -2 = 1111 1110
-128 = 1000 0000  (most negative)

Key insight: Same bit pattern, different interpretation
0xFF = 255 (unsigned) = -1 (signed)
0x80 = 128 (unsigned) = -128 (signed)

; CPU doesn't know if you mean signed or unsigned
; YOU choose by which instructions/jumps you use

mov al, 0xFF        ; Is this 255 or -1? Depends on context.

; Unsigned comparison
cmp al, 0           ; Compare
ja label            ; Jump if Above (unsigned: 255 > 0, jumps)

; Signed comparison
cmp al, 0           ; Compare
jg label            ; Jump if Greater (signed: -1 < 0, doesn't jump)

Overflow Examples:

; Signed overflow (OF flag)
mov al, 127         ; Maximum positive signed byte
add al, 1           ; Result: 128 (0x80) = -128 signed
                    ; OF=1 (overflow), SF=1 (negative), CF=0

; Unsigned overflow (CF flag)
mov al, 255         ; Maximum unsigned byte
add al, 1           ; Result: 0 (wrapped around)
                    ; CF=1 (carry), ZF=1 (zero), OF=0

Common Gotchas

; WRONG: Expecting 64-bit operation with 32-bit registers
mov eax, -1         ; EAX = 0xFFFFFFFF, but RAX = 0x00000000FFFFFFFF
                    ; (32-bit mov zero-extends to 64-bit!)

; CORRECT: Use full 64-bit register for signed values
mov rax, -1         ; RAX = 0xFFFFFFFFFFFFFFFF

; WRONG: Forgetting memory requires size specifier
mov [rax], 5        ; Error: How many bytes? 1? 4? 8?

; CORRECT: Specify size
mov DWORD PTR [rax], 5  ; Store 4 bytes

; WRONG: Using LEA like MOV
lea rax, [rbx]      ; Works, but it is only a copy; mov rax, rbx says the same thing more clearly

; CORRECT: LEA is for address calculation
lea rax, [rbx + rcx*4 + 8]  ; Complex address math in one instruction

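; WRONG: Expecting a character constant to match the string's hex bytes in order
mov eax, 'ABCD'     ; NASM: EAX = 0x44434241, so storing EAX writes bytes 41 42 43 44 ("ABCD")
cmp eax, 0x41424344 ; Not equal! 0x41424344 stored to memory reads "DCBA"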

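; WRONG: Keeping a value in a caller-saved register across a call
mov rcx, important_value
call some_function
; RCX may now be garbage: RAX, RCX, RDX, RSI, RDI, R8-R11 are caller-saved

; CORRECT: Use a callee-saved register (RBX, RBP, R12-R15)
; (the called function must restore it before returning)
mov rbx, important_value
call some_function
; RBX still holds important_value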

Compile and Examine

# Write simple C to see assembly
cat << 'EOF' > /tmp/test.c
int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 5;
    int y = 3;
    int z = add(x, y);
    return z;
}
EOF

# Compile with debug info, no optimization
gcc -g -O0 -o /tmp/test /tmp/test.c

# Disassemble with Intel syntax
objdump -d -M intel /tmp/test | grep -A20 '<add>:'

# Example output:
# 0000000000001129 <add>:
#     1129:   push   rbp              ; Save old base pointer
#     112a:   mov    rbp,rsp          ; Set up stack frame
#     112d:   mov    DWORD PTR [rbp-0x4],edi  ; Store first arg (a)
#     1131:   mov    DWORD PTR [rbp-0x8],esi  ; Store second arg (b)
#     1135:   mov    edx,DWORD PTR [rbp-0x4]  ; Load a
#     1138:   mov    eax,DWORD PTR [rbp-0x8]  ; Load b
#     113b:   add    eax,edx          ; eax = a + b
#     113d:   pop    rbp              ; Restore base pointer
#     113e:   ret                     ; Return (result in eax)

# Compile with optimization to see efficient code
gcc -O2 -o /tmp/test_opt /tmp/test.c
objdump -d -M intel /tmp/test_opt | grep -A5 '<add>:'

# Optimized output:
# <add>:
#     lea    eax,[rdi+rsi*1]   ; Single instruction! eax = edi + esi
#     ret

Practice Exercises

; Exercise 1: What's in RAX after each instruction?
mov rax, 0x123456789ABCDEF0
mov eax, 0x11111111      ; RAX = ?  (Answer: 0x0000000011111111)
mov ax, 0x2222           ; RAX = ?  (Answer: 0x0000000011112222)
mov al, 0x33             ; RAX = ?  (Answer: 0x0000000011112233)

; Exercise 2: Calculate the effective address
; Given: RBX = 0x1000, RCX = 5
lea rax, [rbx + rcx*4 + 16]  ; RAX = ?  (Answer: 0x1000 + 5*4 + 16 = 0x1000 + 0x24 = 0x1024)

; Exercise 3: What flags are set?
mov al, 0x7F             ; 127 decimal
add al, 1                ; Result = 0x80 (128)
; ZF = ?  (Answer: 0 - result is not zero)
; SF = ?  (Answer: 1 - MSB is 1)
; OF = ?  (Answer: 1 - signed overflow: +128 does not fit in a signed byte, so 0x80 reads as -128)
; CF = ?  (Answer: 0 - no unsigned overflow)

; Exercise 4: Memory layout
; At address 0x1000, we store: mov DWORD PTR [0x1000], 0x41424344
; What byte is at each address?
; 0x1000 = ?  (Answer: 0x44 = 'D')
; 0x1001 = ?  (Answer: 0x43 = 'C')
; 0x1002 = ?  (Answer: 0x42 = 'B')
; 0x1003 = ?  (Answer: 0x41 = 'A')