Assembly Fundamentals
Core concepts for x86-64 assembly programming.
The Mental Model
Assembly is the human-readable form of machine code. Compiled languages such as C are translated down to it, and interpreters and JIT runtimes are themselves machine code. Understanding assembly means understanding:
- Registers - CPU’s fast storage (like variables, but fixed and few)
- Memory - RAM accessed by address (stack, heap, data sections)
- Instructions - Operations the CPU executes one at a time
- Flags - Single-bit results of operations (zero, negative, overflow)
High-Level Code       Assembly          Machine Code
─────────────────────────────────────────────────────────────
int x = 5;        →   mov eax, 5    →   B8 05 00 00 00
x = x + 3;        →   add eax, 3    →   83 C0 03
if (x == 8)       →   cmp eax, 8    →   83 F8 08
                      je equal      →   74 XX
x86-64 General Purpose Registers
| 64-bit | 32-bit | 16-bit | 8-bit | Primary Purpose |
|---|---|---|---|---|
| RAX | EAX | AX | AL | Accumulator, return values, multiply/divide |
| RBX | EBX | BX | BL | Base register (callee-saved) |
| RCX | ECX | CX | CL | Counter for loops, 4th argument |
| RDX | EDX | DX | DL | Data, I/O, 3rd argument, multiply high bits |
| RSI | ESI | SI | SIL | Source index, 2nd argument |
| RDI | EDI | DI | DIL | Destination index, 1st argument |
| RBP | EBP | BP | BPL | Base pointer (stack frame) |
| RSP | ESP | SP | SPL | Stack pointer (top of stack) |
| R8 | R8D | R8W | R8B | 5th argument |
| R9 | R9D | R9W | R9B | 6th argument |
| R10 | R10D | R10W | R10B | Temporary (caller-saved) |
| R11 | R11D | R11W | R11B | Temporary (caller-saved) |
| R12 | R12D | R12W | R12B | Callee-saved |
| R13 | R13D | R13W | R13B | Callee-saved |
| R14 | R14D | R14W | R14B | Callee-saved |
| R15 | R15D | R15W | R15B | Callee-saved |
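Key insight: Writing a 32-bit register (EAX) automatically zeros the upper 32 bits of RAX. Writing a 16-bit or 8-bit sub-register (AX, AL) does NOT touch the remaining upper bits. This matters for security: stale upper bits that are never cleared can carry leftover data into a later 64-bit use (info leaks).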
; Register size examples
mov rax, 0x123456789ABCDEF0 ; 64-bit: RAX = 0x123456789ABCDEF0
mov eax, 0x11111111 ; 32-bit: RAX = 0x0000000011111111 (upper zeroed!)
mov ax, 0x2222 ; 16-bit: RAX = 0x0000000011112222 (upper preserved)
mov al, 0x33 ; 8-bit: RAX = 0x0000000011112233 (upper preserved)
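The write rules above can be modeled in a few lines of C. This is only a sketch of the semantics, not how the CPU implements them, and the function names `write_eax`, `write_ax`, and `write_al` are illustrative, not part of any library.

```c
#include <stdint.h>

/* Writing EAX: the low 32 bits are replaced and the upper 32 bits are zeroed. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;                  /* the old contents are discarded entirely */
    return (uint64_t)value;
}

/* Writing AX: only bits 0-15 change; bits 16-63 are preserved. */
uint64_t write_ax(uint64_t rax, uint16_t value) {
    return (rax & ~UINT64_C(0xFFFF)) | value;
}

/* Writing AL: only bits 0-7 change; bits 8-63 are preserved. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~UINT64_C(0xFF)) | value;
}
```

Chaining the three calls with the values from the register size examples reproduces the same RAX contents step by step.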
Special Registers
; RIP - Instruction Pointer (Program Counter)
; Points to the NEXT instruction to execute
; You cannot mov to RIP directly; jumps/calls change it
; RFLAGS - Status Flags Register
; Individual bits set by operations:
; ZF (Zero Flag) - Set if result is zero
; SF (Sign Flag) - Set if result is negative (MSB = 1)
; CF (Carry Flag) - Set on unsigned overflow
; OF (Overflow Flag) - Set on signed overflow
; PF (Parity Flag) - Set if low byte has even number of 1s
; Example: How flags work
mov eax, 5
sub eax, 5 ; Result is 0
; ZF=1 (zero), SF=0 (not negative), CF=0, OF=0
mov eax, 0
sub eax, 1 ; Result is -1 (0xFFFFFFFF)
; ZF=0, SF=1 (negative), CF=1 (borrow), OF=0
mov al, 127
add al, 1 ; Result is 128 (0x80)
; ZF=0, SF=1, CF=0, OF=1 (signed overflow: 127 + 1 wraps to -128 in a signed byte)
| Flag | When It’s Set |
|---|---|
| ZF | Result equals zero |
| SF | Result’s MSB (most significant bit) is 1 (negative in signed) |
| CF | Unsigned operation overflowed/underflowed (carry out of MSB) |
| OF | Signed operation overflowed (result wrong sign) |
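To make the flag rules concrete, here is a minimal C sketch that computes ZF, SF, CF, and OF for an 8-bit ADD. The `Add8Result` struct and `add8` function are hypothetical names chosen for illustration, and only ADD is modeled (SUB sets CF on borrow instead).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t result;
    bool zf, sf, cf, of;
} Add8Result;

/* Models "add al, b" where al initially holds a. */
Add8Result add8(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;  /* sum with the carry bit kept */
    uint8_t r = (uint8_t)wide;                  /* what actually lands in AL */
    Add8Result out;
    out.result = r;
    out.zf = (r == 0);                          /* result is zero */
    out.sf = (r & 0x80) != 0;                   /* MSB of the result */
    out.cf = wide > 0xFF;                       /* carry out of bit 7 */
    /* Signed overflow: both inputs share a sign that the result does not. */
    out.of = ((a ^ r) & (b ^ r) & 0x80) != 0;
    return out;
}
```

Calling `add8(127, 1)` and `add8(255, 1)` shows that OF and CF are independent: the first overflows only in the signed view, the second only in the unsigned view.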
Process Memory Layout
High Address (0x7FFF...)
┌─────────────────────────┐
│ Stack │ ← RSP points here (grows DOWN)
│ (local variables, │
│ return addresses) │
├─────────────────────────┤
│ ↓ │
│ (free space) │
│ ↑ │
├─────────────────────────┤
│ Heap │ ← malloc() allocates here (grows UP)
│ (dynamic allocation) │
├─────────────────────────┤
│ BSS (uninitialized) │ ← Global/static vars initialized to 0
├─────────────────────────┤
│ Data (initialized) │ ← Global/static vars with initial values
├─────────────────────────┤
│ Text (code) │ ← Your compiled instructions (read-only)
└─────────────────────────┘
Low Address (0x400000...)
# See actual memory layout of a running process
cat /proc/self/maps
# Example output (simplified):
# 00400000-00401000 r-xp /bin/cat ← Text (executable)
# 00601000-00602000 rw-p /bin/cat ← Data
# 7f8a12000000-7f8a12021000 rw-p ← Heap
# 7ffc9e800000-7ffc9e821000 rw-p ← Stack
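A program can also inspect its own layout. The sketch below prints one sample address per region; `print_layout` and the two globals are hypothetical names for illustration. The exact addresses, and even their relative order, depend on the OS, ASLR, and whether the binary is position-independent, so treat the output as something to compare against /proc/self/maps rather than a fixed result.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* has an initial value: Data section */
int zeroed_global;             /* no initializer: BSS */

/* Prints one address from each region; returns 0 on success, -1 if malloc fails. */
int print_layout(FILE *out) {
    int local = 0;                            /* lives on the stack */
    int *dynamic = malloc(sizeof *dynamic);   /* lives on the heap */
    if (dynamic == NULL) {
        return -1;
    }
    /* Function pointers go through uintptr_t before the cast to void *. */
    fprintf(out, "text  %p\n", (void *)(uintptr_t)&print_layout);
    fprintf(out, "data  %p\n", (void *)&initialized_global);
    fprintf(out, "bss   %p\n", (void *)&zeroed_global);
    fprintf(out, "heap  %p\n", (void *)dynamic);
    fprintf(out, "stack %p\n", (void *)&local);
    free(dynamic);
    return 0;
}
```

Calling `print_layout(stdout)` from `main` on a typical Linux system usually shows the stack address far above the others.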
Memory Addressing Modes
; IMMEDIATE - Value is in the instruction itself
mov eax, 42 ; eax = 42 (decimal)
mov eax, 0x2A ; eax = 42 (hex)
mov eax, 0b101010 ; eax = 42 (binary)
; REGISTER - Value is in a register
mov eax, ebx ; eax = ebx
; DIRECT - Address is literal in instruction
mov eax, [0x601000] ; eax = value at memory address 0x601000
; REGISTER INDIRECT - Address is in a register
mov eax, [rbx] ; eax = value at address stored in rbx
; DISPLACEMENT - Register + offset
mov eax, [rbx + 8] ; eax = value at (rbx + 8)
mov eax, [rbp - 4] ; eax = local variable (stack)
; INDEXED - Base + (index * scale)
mov eax, [rbx + rcx*4] ; Array access: rbx=base, rcx=index, 4=element size
; FULL FORM - Base + (index * scale) + displacement
mov eax, [rbx + rcx*4 + 16] ; int array at offset 16 inside a struct: p->arr[i] (rbx = p, rcx = i)
Array Access Pattern:
// C code
int arr[10];
int x = arr[3];
// Assembly equivalent
; Assume arr base address in rbx
mov eax, [rbx + 3*4] ; 3 = index, 4 = sizeof(int)
; Or with index in register:
mov ecx, 3
mov eax, [rbx + rcx*4]
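The address arithmetic behind these modes is plain integer math. The following C sketch computes the effective address for the full form; `effective_address` and `int_element_address` are illustrative names, and the code assumes the caller passes a scale of 1, 2, 4, or 8 (the only values the instruction encoding allows).

```c
#include <stdint.h>

/* Effective address for [base + index*scale + disp].
 * Assumption: scale is 1, 2, 4, or 8. Arithmetic wraps modulo 2^64. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int32_t disp) {
    /* The displacement is sign-extended to 64 bits before it is added. */
    uint64_t disp64 = (uint64_t)(int64_t)disp;
    return base + index * scale + disp64;
}

/* Address of arr[i] for an int array: base + i * sizeof(int). */
uint64_t int_element_address(const int *arr, uint64_t i) {
    return effective_address((uint64_t)(uintptr_t)arr, i, sizeof(int), 0);
}
```

For example, `effective_address(0x1000, 5, 4, 16)` gives 0x1024, and `int_element_address(arr, 3)` matches the address the compiler computes for `&arr[3]`.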
Data Sizes and Suffixes
| Size | Bytes | Intel Size Specifier | AT&T Suffix |
|---|---|---|---|
| Byte | 1 | BYTE PTR | b (movb) |
| Word | 2 | WORD PTR | w (movw) |
| Double Word | 4 | DWORD PTR | l (movl) |
| Quad Word | 8 | QWORD PTR | q (movq) |
; Intel syntax (used in this guide, as printed by objdump -M intel)
; Note: NASM omits PTR, e.g. mov byte [rax], 0x41
mov BYTE PTR [rax], 0x41 ; Store 1 byte (ASCII 'A')
mov WORD PTR [rax], 0x4142 ; Store 2 bytes
mov DWORD PTR [rax], 0x41424344 ; Store 4 bytes
mov QWORD PTR [rax], rbx ; Store 8 bytes
; AT&T syntax (GAS, objdump default) - source, destination reversed!
movb $0x41, (%rax) ; Same as BYTE PTR above
movl $0x41424344, (%rax) ; Same as DWORD PTR above
CRITICAL: Intel vs AT&T Syntax
Intel (NASM): mov destination, source ; dst = src
AT&T (GAS): movl source, destination ; src → dst
Intel: mov eax, [rbx + rcx*4 + 8]
AT&T: movl 8(%rbx,%rcx,4), %eax
Use objdump -M intel to get Intel syntax from disassembly.
Essential Instructions
; DATA MOVEMENT
mov rax, rbx ; rax = rbx (copy)
mov rax, [rbx] ; rax = *rbx (load from memory)
mov [rax], rbx ; *rax = rbx (store to memory)
lea rax, [rbx + 8] ; rax = rbx + 8 (address calculation, NOT memory access)
xchg rax, rbx ; Swap rax and rbx
; LEA is powerful for math without memory access:
lea rax, [rbx + rcx*2] ; rax = rbx + rcx*2 (no memory read!)
lea rax, [rax + rax*4] ; rax = rax * 5 (multiply by 5)
lea rax, [rax*8 + rax] ; rax = rax * 9
; ZEROING (same result, different encodings and flag effects)
xor eax, eax ; eax = 0 (2 bytes, the preferred idiom; modifies flags)
mov eax, 0 ; eax = 0 (5 bytes; leaves flags untouched)
sub eax, eax ; eax = 0 (2 bytes; modifies flags)
; SIGN/ZERO EXTENSION
movzx eax, BYTE PTR [rbx] ; Zero-extend byte to 32-bit
movsx eax, BYTE PTR [rbx] ; Sign-extend byte to 32-bit
movsxd rax, DWORD PTR [rbx] ; Sign-extend 32-bit to 64-bit
cdqe ; Sign-extend EAX to RAX
; STACK OPERATIONS
push rax ; RSP -= 8; [RSP] = RAX
pop rax ; RAX = [RSP]; RSP += 8
push QWORD PTR [rbx] ; Push value from memory
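The difference between zero extension and sign extension is easy to check in C. The sketch below models movzx, movsx, and movsxd/cdqe with explicit bit operations rather than casts, so the behavior does not depend on how a compiler converts out-of-range values to signed types. The function names are illustrative.

```c
#include <stdint.h>

/* movzx eax, BYTE PTR [..]: upper 24 bits are filled with zeros. */
uint32_t zero_extend_8_to_32(uint8_t b) {
    return (uint32_t)b;
}

/* movsx eax, BYTE PTR [..]: upper 24 bits are copies of bit 7. */
uint32_t sign_extend_8_to_32(uint8_t b) {
    return (b & 0x80) ? (UINT32_C(0xFFFFFF00) | b) : (uint32_t)b;
}

/* movsxd rax, DWORD PTR [..] or cdqe: upper 32 bits are copies of bit 31. */
uint64_t sign_extend_32_to_64(uint32_t d) {
    return (d & UINT32_C(0x80000000))
               ? (UINT64_C(0xFFFFFFFF00000000) | d)
               : (uint64_t)d;
}
```

The byte 0x80 becomes 0x00000080 under zero extension (128) and 0xFFFFFF80 under sign extension (-128), which is why picking the wrong instruction silently changes the value.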
Endianness (Little-Endian)
x86 is little-endian: least significant byte at lowest address.
Value: 0x12345678
Memory Address: 0x100 0x101 0x102 0x103
┌──────┬──────┬──────┬──────┐
Little-Endian: │ 0x78 │ 0x56 │ 0x34 │ 0x12 │ ← x86
└──────┴──────┴──────┴──────┘
(LSB) (MSB)
Big-Endian: │ 0x12 │ 0x34 │ 0x56 │ 0x78 │ ← Network byte order
(MSB) (LSB)
; Demonstrating endianness
mov DWORD PTR [rax], 0x41424344 ; Store "DCBA" (reversed!)
; In memory:
; [rax+0] = 0x44 ('D')
; [rax+1] = 0x43 ('C')
; [rax+2] = 0x42 ('B')
; [rax+3] = 0x41 ('A')
Why it matters:
- Multi-byte integers appear byte-reversed in memory dumps (byte strings themselves are stored in order)
- Network protocols use big-endian (need htonl/ntohl)
- When debugging, read hex dumps right-to-left for multi-byte values
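Byte order can be made explicit in C by writing the bytes one at a time. This sketch, with the illustrative names `store_le32`, `store_be32`, and `load_le32`, shows what a DWORD store does on x86 and what network byte order would look like instead. It behaves the same on any host because it never relies on the machine's own byte order.

```c
#include <stdint.h>

/* Little-endian store: least significant byte at the lowest address (x86). */
void store_le32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Big-endian store: most significant byte first (network byte order). */
void store_be32(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)((value >> 24) & 0xFF);
    out[1] = (uint8_t)((value >> 16) & 0xFF);
    out[2] = (uint8_t)((value >> 8) & 0xFF);
    out[3] = (uint8_t)(value & 0xFF);
}

/* Little-endian load: reassemble the value from its four bytes. */
uint32_t load_le32(const uint8_t in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Storing 0x41424344 with `store_le32` yields the bytes 44 43 42 41 ("DCBA"), while `store_be32` yields 41 42 43 44 ("ABCD").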
Two’s Complement (Signed Integers)
The Math: To negate a number, invert all bits and add 1.
8-bit examples:
127 = 0111 1111 (largest positive)
1 = 0000 0001
0 = 0000 0000
-1 = 1111 1111 (invert 1 → 1111 1110, add 1 → 1111 1111)
-2 = 1111 1110
-128 = 1000 0000 (most negative)
Key insight: Same bit pattern, different interpretation
0xFF = 255 (unsigned) = -1 (signed)
0x80 = 128 (unsigned) = -128 (signed)
; CPU doesn't know if you mean signed or unsigned
; YOU choose by which instructions/jumps you use
mov al, 0xFF ; Is this 255 or -1? Depends on context.
; Unsigned comparison
cmp al, 0 ; Compare
ja label ; Jump if Above (unsigned: 255 > 0, jumps)
; Signed comparison
cmp al, 0 ; Compare
jg label ; Jump if Greater (signed: -1 < 0, doesn't jump)
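The same ideas can be checked in C. The sketch below negates a byte with the invert-and-add-one rule, reinterprets a bit pattern as signed, and mirrors the two comparisons: `above` plays the role of ja and `greater` the role of jg. All four function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two's complement negation on 8 bits: invert all bits, then add 1. */
uint8_t negate8(uint8_t x) {
    return (uint8_t)(~x + 1u);
}

/* Read an 8-bit pattern as a signed value in the range -128..127. */
int as_signed8(uint8_t bits) {
    return bits >= 0x80 ? (int)bits - 256 : (int)bits;
}

/* Unsigned comparison, the condition ja tests after "cmp a, b". */
bool above(uint8_t a, uint8_t b) {
    return a > b;
}

/* Signed comparison, the condition jg tests after "cmp a, b". */
bool greater(uint8_t a, uint8_t b) {
    return as_signed8(a) > as_signed8(b);
}
```

With the pattern 0xFF, `above(0xFF, 0)` is true (255 > 0) while `greater(0xFF, 0)` is false (-1 < 0): same bits, different answer.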
Overflow Examples:
; Signed overflow (OF flag)
mov al, 127 ; Maximum positive signed byte
add al, 1 ; Result: 128 (0x80) = -128 signed
; OF=1 (overflow), SF=1 (negative), CF=0
; Unsigned overflow (CF flag)
mov al, 255 ; Maximum unsigned byte
add al, 1 ; Result: 0 (wrapped around)
; CF=1 (carry), ZF=1 (zero), OF=0
Common Gotchas
; WRONG: Expecting 64-bit operation with 32-bit registers
mov eax, -1 ; EAX = 0xFFFFFFFF, but RAX = 0x00000000FFFFFFFF
; (32-bit mov zero-extends to 64-bit!)
; CORRECT: Use full 64-bit register for signed values
mov rax, -1 ; RAX = 0xFFFFFFFFFFFFFFFF
; WRONG: Forgetting memory requires size specifier
mov [rax], 5 ; Error: How many bytes? 1? 4? 8?
; CORRECT: Specify size
mov DWORD PTR [rax], 5 ; Store 4 bytes
; WRONG: Using LEA like MOV
lea rax, [rbx] ; Works but wasteful, just use mov rax, rbx
; CORRECT: LEA is for address calculation
lea rax, [rbx + rcx*4 + 8] ; Complex address math in one instruction
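; WRONG: Forgetting little-endian when packing strings into integers
mov eax, "ABCD" ; NASM encodes this as 0x44434241 ('A' in the low byte), so memory reads "ABCD"
; The register value looks reversed; comparing against 0x41424344 would fail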
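; WRONG: Assuming caller-saved registers survive a call
mov rcx, important_value
call some_function
; RCX may be DESTROYED here (as may RAX, RDX, RSI, RDI, R8-R11)
; CORRECT: Keep the value in a callee-saved register (RBX, RBP, R12-R15)
mov rbx, important_value
call some_function
; RBX is still intact (your own function must save and restore RBX for its caller)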
Compile and Examine
# Write simple C to see assembly
cat << 'EOF' > /tmp/test.c
int add(int a, int b) {
return a + b;
}
int main() {
int x = 5;
int y = 3;
int z = add(x, y);
return z;
}
EOF
# Compile with debug info, no optimization
gcc -g -O0 -o /tmp/test /tmp/test.c
# Disassemble with Intel syntax
objdump -d -M intel /tmp/test | grep -A20 '<add>:'
# Example output (addresses vary between builds; newer GCC may also emit endbr64 first):
# 0000000000001129 <add>:
# 1129: push rbp ; Save old base pointer
# 112a: mov rbp,rsp ; Set up stack frame
# 112d: mov DWORD PTR [rbp-0x4],edi ; Store first arg (a)
# 1131: mov DWORD PTR [rbp-0x8],esi ; Store second arg (b)
# 1135: mov edx,DWORD PTR [rbp-0x4] ; Load a
# 1138: mov eax,DWORD PTR [rbp-0x8] ; Load b
# 113b: add eax,edx ; eax = a + b
# 113d: pop rbp ; Restore base pointer
# 113e: ret ; Return (result in eax)
# Compile with optimization to see efficient code
gcc -O2 -o /tmp/test_opt /tmp/test.c
objdump -d -M intel /tmp/test_opt | grep -A5 '<add>:'
# Optimized output:
# <add>:
# lea eax,[rdi+rsi*1] ; Single instruction! eax = edi + esi
# ret
Practice Exercises
; Exercise 1: What's in RAX after each instruction?
mov rax, 0x123456789ABCDEF0
mov eax, 0x11111111 ; RAX = ? (Answer: 0x0000000011111111)
mov ax, 0x2222 ; RAX = ? (Answer: 0x0000000011112222)
mov al, 0x33 ; RAX = ? (Answer: 0x0000000011112233)
; Exercise 2: Calculate the effective address
; Given: RBX = 0x1000, RCX = 5
lea rax, [rbx + rcx*4 + 16] ; RAX = ? (Answer: 0x1000 + 20 + 16 = 0x1024)
; Exercise 3: What flags are set?
mov al, 0x7F ; 127 decimal
add al, 1 ; Result = 0x80 (128)
; ZF = ? (Answer: 0 - result is not zero)
; SF = ? (Answer: 1 - MSB is 1)
; OF = ? (Answer: 1 - signed overflow: 127 + 1 does not fit in a signed byte)
; CF = ? (Answer: 0 - no unsigned overflow)
; Exercise 4: Memory layout
; At address 0x1000, we store: mov DWORD PTR [0x1000], 0x41424344
; What byte is at each address?
; 0x1000 = ? (Answer: 0x44 = 'D')
; 0x1001 = ? (Answer: 0x43 = 'C')
; 0x1002 = ? (Answer: 0x42 = 'B')
; 0x1003 = ? (Answer: 0x41 = 'A')