ARM64 Architecture
ARM vs x86-64
Architecture comparison
Feature            x86-64 (CISC)           ARM64 (RISC)
──────────────────────────────────────────────────────────────
Design             Complex instructions    Simple, fixed-length
Instruction size   Variable (1-15 bytes)   Fixed (4 bytes)
Registers          16 GPR                  31 GPR (X0-X30)
Endianness         Little-endian           Bi-endian (usually LE)
Power              Higher per core         Lower per core
Prevalence         Desktops, servers       Mobile, embedded, Apple
ISA style          Register-memory         Load-store (reg-reg only)
Load-store architecture:
x86:  add eax, [rbx]     ; Memory operand in arithmetic
ARM:  ldr w0, [x1]       ; Load from memory to register
      add w0, w0, w2     ; Arithmetic only between registers
      str w0, [x1]       ; Store back to memory
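The same split can be observed from C. The sketch below is a minimal illustration (the function name is hypothetical): one C statement updates a value in memory. An x86-64 compiler can fold this into a single add with a memory operand, while an AArch64 compiler would be expected to emit a load, an add, and a store. The exact output depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: adds `delta` to the 32-bit value stored at `p`. */
void add_in_place(int32_t *p, int32_t delta)
{
    /* One statement in C, but on a load-store ISA this takes three steps:
       load *p into a register, add delta, store the result back. */
    *p += delta;
}
```

Compiling this with `aarch64-linux-gnu-gcc -S -O2` and reading the generated assembly is a quick way to check what a given toolchain actually produces.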
ARM64 Registers
General purpose registers
64-bit      32-bit      Purpose (AAPCS64)
──────────────────────────────────────────
X0-X7       W0-W7       Arguments + return values
X8          W8          Indirect result location
X9-X15      W9-W15      Scratch (caller-saved)
X16-X17     W16-W17     Intra-procedure-call scratch
X18         W18         Platform register (reserved)
X19-X28     W19-W28     Callee-saved
X29 (FP)    W29         Frame pointer (like RBP)
X30 (LR)    W30         Link register (return address)
SP                      Stack pointer (separate from GPRs)
PC                      Program counter (like RIP)
XZR/WZR                 Zero register (reads as 0, writes discarded)
Key differences from x86-64:
- 31 GPRs vs 16 (less register pressure, fewer spills to the stack)
- Zero register (no need for xor eax, eax)
- Link register stores return address (no implicit push by CALL)
- Stack pointer is a dedicated register, not one of the GPRs
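To make the AAPCS64 register roles concrete, here is a small C sketch (the struct and function names are hypothetical). Under AAPCS64 the four integer arguments would arrive in X0-X3, and because the returned struct is larger than 16 bytes, the caller is expected to pass the address of the result buffer in X8, the indirect result location register.

```c
#include <stdint.h>

/* Hypothetical 32-byte struct: too large to come back in X0/X1,
   so AAPCS64 returns it through memory addressed by X8. */
struct quad {
    int64_t v[4];
};

/* a, b, c, d are the first four integer arguments (X0-X3 under AAPCS64). */
struct quad make_quad(int64_t a, int64_t b, int64_t c, int64_t d)
{
    struct quad q = { { a, b, c, d } };
    return q;   /* written to the caller-provided buffer */
}
```

A struct of 16 bytes or less would instead be returned directly in X0 and X1.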
NEON/SVE (SIMD)
V0-V31: 128-bit SIMD registers (NEON)
Can be accessed as:
Bn (8-bit), Hn (16-bit), Sn (32-bit), Dn (64-bit), Qn (128-bit)
SVE: Scalable Vector Extension (128-2048 bit in 128-bit steps, varies by implementation)
Found in server ARM chips (e.g. AWS Graviton3)
ARM64 Instructions
Data movement
; Load/Store (ARM is load-store architecture)
ldr x0, [x1] ; Load 64-bit from address in x1
ldr w0, [x1] ; Load 32-bit
ldrb w0, [x1] ; Load byte
ldrh w0, [x1] ; Load halfword (16-bit)
ldrsw x0, [x1] ; Load 32-bit, sign-extend to 64-bit
str x0, [x1] ; Store 64-bit
str w0, [x1] ; Store 32-bit
strb w0, [x1] ; Store byte
; Addressing modes
ldr x0, [x1] ; Simple: base
ldr x0, [x1, #8] ; Offset: base + immediate
ldr x0, [x1, x2] ; Register offset: base + reg
ldr x0, [x1, x2, lsl #3] ; Scaled: base + reg × 8
; Move
mov x0, #42 ; Immediate to register
mov x0, x1 ; Register to register
movz x0, #0x1234 ; Move wide with zero: x0 = 0x1234, all other bits cleared
movk x0, #0x5678, lsl #16 ; Move wide with keep: sets bits 16-31, other bits unchanged
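Since every instruction is 4 bytes, a 64-bit constant cannot fit in one instruction and is built 16 bits at a time with movz followed by movk. The C model below is a sketch of that semantics only; the function names mirror the mnemonics but are otherwise hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MOVZ: place a 16-bit immediate at `shift`, zero every other bit. */
uint64_t movz(uint16_t imm16, unsigned shift)
{
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    return (uint64_t)imm16 << shift;
}

/* Model of MOVK: replace only the 16-bit field at `shift`, keep the rest. */
uint64_t movk(uint64_t reg, uint16_t imm16, unsigned shift)
{
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    uint64_t mask = (uint64_t)0xFFFF << shift;
    return (reg & ~mask) | ((uint64_t)imm16 << shift);
}

/* Build a full 64-bit value the way a MOVZ + 3 x MOVK sequence would. */
uint64_t load_imm64(uint64_t value)
{
    uint64_t reg = movz((uint16_t)(value & 0xFFFF), 0);
    for (unsigned shift = 16; shift < 64; shift += 16)
        reg = movk(reg, (uint16_t)((value >> shift) & 0xFFFF), shift);
    return reg;
}
```

With this model, `movk(movz(0x1234, 0), 0x5678, 16)` yields 0x56781234, matching the two-instruction sequence in the listing. Real assemblers usually skip the movk steps for 16-bit fields that are already zero.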
Arithmetic
add x0, x1, x2 ; x0 = x1 + x2
add x0, x1, #10 ; x0 = x1 + 10
sub x0, x1, x2 ; x0 = x1 - x2
mul x0, x1, x2 ; x0 = x1 × x2 (low 64 bits)
sdiv x0, x1, x2 ; x0 = x1 / x2 (signed)
udiv x0, x1, x2 ; x0 = x1 / x2 (unsigned)
madd x0, x1, x2, x3 ; x0 = x3 + x1 × x2 (multiply-add in one instruction)
msub x0, x1, x2, x3 ; x0 = x3 - x1 × x2
; With flags (S suffix)
adds x0, x1, x2 ; Same as add but sets flags
subs x0, x1, x2 ; Same as sub but sets flags
cmp x1, x2 ; Alias for subs xzr, x1, x2 (sets flags, result discarded)
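ARM64 has no integer remainder instruction, so a remainder is normally computed as a division followed by a multiply-subtract, which is where msub earns its place. The C sketch below spells out that two-step computation; the function name is hypothetical, and the mapping to sdiv and msub is what a compiler would typically choose rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Signed remainder written as the two steps ARM64 needs:
   a divide (sdiv) followed by a multiply-subtract (msub). */
int64_t rem_via_msub(int64_t a, int64_t b)
{
    assert(b != 0);        /* division by zero is undefined in C */
    int64_t q = a / b;     /* sdiv: quotient, truncated toward zero */
    return a - q * b;      /* msub: a - q × b */
}
```

Because the quotient truncates toward zero, the result carries the sign of the dividend: `rem_via_msub(-17, 5)` is -2, the same as the C `%` operator.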
Conditional execution
; Unlike 32-bit ARM, ARM64 cannot predicate arbitrary instructions;
; it branches on the flags set by cmp (like x86 Jcc)
cmp x0, #10
b.eq label ; Branch if equal
b.ne label ; Branch if not equal
b.gt label ; Branch if greater (signed)
b.lt label ; Branch if less (signed)
b.hi label ; Branch if higher (unsigned)
b.lo label ; Branch if lower (unsigned)
; Conditional select (branchless, like CMOV)
csel x0, x1, x2, eq ; x0 = (EQ condition holds) ? x1 : x2
csinc x0, x1, x2, ne ; x0 = (NE condition holds) ? x1 : x2+1
; Function call
bl function ; Branch with Link (saves return addr in LR)
ret ; Return (branch to LR)
; Unlike x86 CALL, bl puts the return address in LR (X30), not on the stack
; A function that makes further calls must save LR itself first,
; e.g. stp x29, x30, [sp, #-16]! on entry and ldp x29, x30, [sp], #16 before ret
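Conditional select is easiest to recognise from the C patterns that tend to produce it. The two functions below are a sketch with hypothetical names: the first is a natural candidate for cmp + csel, the second for cmp + csinc. Whether a compiler actually picks these instructions depends on the toolchain and optimization level.

```c
#include <stdint.h>

/* Candidate for: cmp x0, x1 ; csel x0, x0, x1, gt */
int64_t max_i64(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

/* Candidate for: cmp x2, #0 ; csinc x0, x0, x1, ne
   Returns a when key is nonzero, otherwise b + 1. */
int64_t pick_or_next(int64_t a, int64_t b, int64_t key)
{
    return key != 0 ? a : b + 1;
}
```

Neither function needs a branch, which avoids misprediction cost when the condition is hard to predict.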
ARM64 Linux Syscalls
Syscall convention
Register   Purpose
──────────────────────────
X8         Syscall number
X0-X5      Arguments 1-6
X0         Return value (negative errno on failure)
Invoke: svc #0 (supervisor call)
Hello world (ARM64)
// hello.s - ARM64 Linux
// Assemble: as hello.s -o hello.o   (on an ARM64 host; on x86-64 use the aarch64-linux-gnu- tools)
// Link: ld hello.o -o hello
.data
msg: .ascii "Hello from ARM64!\n"
len = . - msg
.text
.global _start
_start:
mov x8, #64 // sys_write
mov x0, #1 // stdout
ldr x1, =msg // buffer
mov x2, #len // length
svc #0 // syscall
mov x8, #93 // sys_exit
mov x0, #0 // status
svc #0
Key syscall numbers (ARM64 vs x86-64)
Syscall       ARM64   x86-64
───────────────────────────────
read          63      0
write         64      1
open          -       2
openat        56      257
close         57      3
exit          93      60
exit_group    94      231
fork          -       57
clone         220     56
execve        221     59
Note: ARM64 uses the newer generic Linux syscall table, so its numbers do not match x86-64, and legacy calls such as open and fork do not exist (use openat and clone instead).
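Because the numbers differ per architecture, C code that issues raw syscalls should take them from the system headers instead of hard-coding them. The sketch below assumes Linux with glibc (or a compatible libc); the function name is hypothetical. `SYS_write` expands to 64 when built for ARM64 and to 1 when built for x86-64, and the `syscall()` wrapper places the number and arguments in the right registers for the target.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write: resolved per target architecture */
#include <unistd.h>        /* syscall() */

/* Writes the greeting to stdout via a raw write syscall.
   Returns the number of bytes written, or -1 with errno set on failure. */
long write_hello(void)
{
    static const char msg[] = "Hello from ARM64!\n";
    /* sizeof counts the trailing NUL, so subtract 1 to send only the text. */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}
```

The same source builds unchanged with both `gcc` and `aarch64-linux-gnu-gcc`; only the constant behind `SYS_write` changes.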
Cross-Compilation and Emulation
Working with ARM64 on x86-64
# Install cross-compiler and QEMU
# Arch: pacman -S aarch64-linux-gnu-gcc qemu-user-static
# Debian/Ubuntu: apt install gcc-aarch64-linux-gnu qemu-user-static gdb-multiarch
# Cross-assemble
aarch64-linux-gnu-as hello.s -o hello.o
aarch64-linux-gnu-ld hello.o -o hello
# Run with QEMU user-mode emulation
qemu-aarch64-static ./hello
# Cross-compile C to ARM64 assembly
aarch64-linux-gnu-gcc -S -O2 -o test.s test.c
# Disassemble
aarch64-linux-gnu-objdump -d hello
# Debug under QEMU
qemu-aarch64-static -g 1234 ./hello &
gdb-multiarch -ex "target remote :1234" ./hello