ARM64 Architecture

ARM vs x86-64

Architecture comparison
Feature           x86-64 (CISC)          ARM64 (RISC)
──────────────────────────────────────────────────────────────
Design            Complex instructions    Simple, fixed-length
Instruction size  Variable (1-15 bytes)   Fixed (4 bytes)
Registers         16 GPR                  31 GPR (X0-X30)
Endianness        Little-endian           Bi-endian (usually LE)
Power             Typically higher        Typically lower
Prevalence        Desktops, servers       Mobile, embedded, Apple silicon
ISA style         Register-memory         Load-store (reg-reg only)
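
Because every A64 instruction is exactly 4 bytes, decoding comes down to masking fixed bit positions in a 32-bit word. The sketch below uses made-up names (`add_fields`, `decode_add_shifted`) and handles only one instruction class, ADD (shifted register), ignoring its shift-type and shift-amount fields. It illustrates the fixed-width layout and is not a complete decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; not taken from any real disassembler. */
typedef struct {
    unsigned sf;  /* 1 = 64-bit X registers, 0 = 32-bit W registers */
    unsigned rm;  /* second source register number (bits 20..16) */
    unsigned rn;  /* first source register number  (bits 9..5)   */
    unsigned rd;  /* destination register number   (bits 4..0)   */
} add_fields;

/* Returns 1 and fills *out when word is an ADD (shifted register)
 * instruction; otherwise returns 0 and leaves *out untouched. */
int decode_add_shifted(uint32_t word, add_fields *out)
{
    assert(out != NULL);

    /* Bits 30..24 must read 0001011 and bit 21 must be 0. */
    if ((word & 0x7F200000u) != 0x0B000000u)
        return 0;

    out->sf = (word >> 31) & 1u;
    out->rm = (word >> 16) & 31u;
    out->rn = (word >> 5) & 31u;
    out->rd = word & 31u;
    return 1;
}
```

For example, the word 0x8B020020 encodes `add x0, x1, x2`, so the decoder reports sf=1, rm=2, rn=1, rd=0, while 0xD65F03C0 (`ret`) is rejected.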

Load-store architecture:
  x86:  add [rbx], eax        ; Read-modify-write on memory in one instruction
  ARM:  ldr w0, [x1]          ; Load from memory to register
        add w0, w0, w2        ; Arithmetic only between registers
        str w0, [x1]          ; Store back to memory
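
In C the same read-modify-write is a single expression, and on ARM64 a compiler has to split it into the three steps above. A minimal sketch, assuming a made-up function name (`add_in_memory`); the registers in the comments are only one plausible allocation, and real compiler output varies:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds delta to the 32-bit value stored at p and returns the new value.
 * Unsigned arithmetic is used so that wrap-around is well defined in C. */
uint32_t add_in_memory(uint32_t *p, uint32_t delta)
{
    assert(p != NULL);
    uint32_t tmp = *p;  /* load:  ldr w8, [x0]     */
    tmp += delta;       /* add:   add w8, w8, w1   */
    *p = tmp;           /* store: str w8, [x0]     */
    return tmp;
}
```

On x86-64 the body could collapse into one `add` with a memory destination; on ARM64 the load and the store are always separate instructions.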

ARM64 Registers

General purpose registers
64-bit    32-bit    Purpose (AAPCS64)
──────────────────────────────────────────
X0-X7     W0-W7    Arguments + return values
X8        W8       Indirect result location
X9-X15    W9-W15   Scratch (caller-saved)
X16-X17   W16-W17  Intra-procedure-call scratch
X18       W18      Platform register (reserved)
X19-X28   W19-W28  Callee-saved
X29 (FP)  W29      Frame pointer (like RBP)
X30 (LR)  W30      Link register (return address)
SP                 Stack pointer (separate from GPRs)
PC                 Program counter (like RIP)
XZR/WZR            Zero register (reads as 0, writes discarded)

Key differences from x86-64:
  - 31 GPRs vs 16 (less register pressure, fewer spills)
  - Zero register (no need for the xor eax, eax idiom)
  - Link register holds the return address (BL does not push it on the stack, unlike CALL)
  - Stack pointer is a dedicated register, not one of the numbered GPRs

NEON/SVE (SIMD)
V0-V31:   128-bit SIMD registers (NEON)
           Can be accessed as:
           Bn (8-bit), Hn (16-bit), Sn (32-bit), Dn (64-bit), Qn (128-bit)

SVE:      Scalable Vector Extension (128-2048 bit in 128-bit steps, varies by implementation)
           Found mainly in server-class ARM chips (e.g. AWS Graviton3)
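
The Bn/Hn/Sn/Dn names are narrower views of the low bits of the same Vn register, not separate registers. The sketch below models one 128-bit register as two 64-bit halves to make that overlap concrete; the type `vreg128` and the `view_*` helpers are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Vn register: lo holds bits 63..0,
 * hi holds bits 127..64. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vreg128;

/* Each scalar view reads the low bits of the same register. */
uint8_t view_b(const vreg128 *v)
{
    assert(v != NULL);
    return (uint8_t)v->lo;      /* Bn: bits 7..0  */
}

uint16_t view_h(const vreg128 *v)
{
    assert(v != NULL);
    return (uint16_t)v->lo;     /* Hn: bits 15..0 */
}

uint32_t view_s(const vreg128 *v)
{
    assert(v != NULL);
    return (uint32_t)v->lo;     /* Sn: bits 31..0 */
}

uint64_t view_d(const vreg128 *v)
{
    assert(v != NULL);
    return v->lo;               /* Dn: bits 63..0 */
}
```

With lo = 0x1122334455667788, the views read 0x88, 0x7788, 0x55667788, and the full low half; Qn corresponds to the whole structure.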

ARM64 Instructions

Data movement
; Load/Store (ARM is a load-store architecture)
ldr  x0, [x1]          ; Load 64-bit from address in x1
ldr  w0, [x1]          ; Load 32-bit
ldrb w0, [x1]          ; Load byte
ldrh w0, [x1]          ; Load halfword (16-bit)
ldrsw x0, [x1]         ; Load 32-bit, sign-extend to 64-bit

str  x0, [x1]          ; Store 64-bit
str  w0, [x1]          ; Store 32-bit
strb w0, [x1]          ; Store byte

; Addressing modes
ldr x0, [x1]           ; Simple: base
ldr x0, [x1, #8]       ; Offset: base + immediate
ldr x0, [x1, x2]       ; Register offset: base + reg
ldr x0, [x1, x2, lsl #3]  ; Scaled: base + reg × 8

; Move
mov  x0, #42           ; Immediate to register
mov  x0, x1            ; Register to register
movz x0, #0x1234       ; Move 16-bit immediate, zero all other bits
movk x0, #0x5678, lsl #16  ; Move keep: write bits 16-31, keep the rest (x0 = 0x56781234)
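
Since instructions are only 32 bits wide, a 64-bit constant is usually assembled 16 bits at a time with one movz followed by up to three movk instructions. The following C model of that sequence is a sketch; the function names (`a64_movz`, `a64_movk`, `a64_load_imm64`) are invented here and only mirror the instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* movz: place a 16-bit immediate at bit position shift, zero the rest. */
uint64_t a64_movz(uint64_t imm16, unsigned shift)
{
    assert(imm16 <= 0xFFFFu);
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    return imm16 << shift;
}

/* movk: overwrite one 16-bit field of reg, keep the other 48 bits. */
uint64_t a64_movk(uint64_t reg, uint64_t imm16, unsigned shift)
{
    assert(imm16 <= 0xFFFFu);
    assert(shift == 0 || shift == 16 || shift == 32 || shift == 48);
    uint64_t field_mask = (uint64_t)0xFFFFu << shift;
    return (reg & ~field_mask) | (imm16 << shift);
}

/* Builds an arbitrary 64-bit constant with one movz and three movk steps. */
uint64_t a64_load_imm64(uint64_t value)
{
    uint64_t reg = a64_movz(value & 0xFFFFu, 0);
    for (unsigned shift = 16; shift < 64; shift += 16)
        reg = a64_movk(reg, (value >> shift) & 0xFFFFu, shift);
    return reg;
}
```

`a64_movk(a64_movz(0x1234, 0), 0x5678, 16)` reproduces the two-instruction example above and yields 0x56781234. A real assembler or compiler would skip movk steps for fields that are already zero.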

Arithmetic
add  x0, x1, x2        ; x0 = x1 + x2
add  x0, x1, #10       ; x0 = x1 + 10
sub  x0, x1, x2        ; x0 = x1 - x2
mul  x0, x1, x2        ; x0 = x1 × x2 (low 64 bits)
sdiv x0, x1, x2        ; x0 = x1 / x2 (signed)
udiv x0, x1, x2        ; x0 = x1 / x2 (unsigned)
madd x0, x1, x2, x3    ; x0 = x3 + x1 × x2 (multiply-add)
msub x0, x1, x2, x3    ; x0 = x3 - x1 × x2

; With flags (S suffix)
adds x0, x1, x2        ; Same as add but sets flags
subs x0, x1, x2        ; Same as sub but sets flags
cmp  x1, x2            ; Alias of subs xzr, x1, x2 (sets flags, discards result)
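
A few behaviours of these instructions differ from both C and x86: sdiv and udiv do not trap on a zero divisor (the result is 0), sdiv of the most negative value by -1 wraps instead of faulting, and there is no remainder instruction, so a remainder is normally computed with a divide followed by msub. The C model below is a sketch of those semantics with invented function names (`a64_sdiv` and so on), not code from any library.

```c
#include <assert.h>
#include <stdint.h>

/* sdiv: truncates toward zero; a zero divisor gives 0, and
 * INT64_MIN / -1 wraps to INT64_MIN instead of trapping. */
int64_t a64_sdiv(int64_t n, int64_t d)
{
    if (d == 0)
        return 0;
    if (n == INT64_MIN && d == -1)
        return INT64_MIN;
    return n / d;
}

/* udiv: a zero divisor gives 0. */
uint64_t a64_udiv(uint64_t n, uint64_t d)
{
    return d == 0 ? 0 : n / d;
}

/* madd xd, xn, xm, xa: xd = xa + xn * xm (wraps modulo 2^64). */
uint64_t a64_madd(uint64_t n, uint64_t m, uint64_t a)
{
    return a + n * m;
}

/* msub xd, xn, xm, xa: xd = xa - xn * xm (wraps modulo 2^64). */
uint64_t a64_msub(uint64_t n, uint64_t m, uint64_t a)
{
    return a - n * m;
}

/* Remainder idiom: udiv followed by msub. */
uint64_t a64_urem(uint64_t n, uint64_t d)
{
    uint64_t q = a64_udiv(n, d);
    uint64_t r = a64_msub(q, d, n);
    assert(d == 0 || r < d);  /* a true remainder whenever d is nonzero */
    return r;
}
```

Note that with a zero divisor the idiom returns the dividend unchanged, because the quotient is 0.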

Conditional execution
; ARM64: conditional branch (like x86)
cmp  x0, #10
b.eq label              ; Branch if equal
b.ne label              ; Branch if not equal
b.gt label              ; Branch if greater (signed)
b.lt label              ; Branch if less (signed)
b.hi label              ; Branch if higher (unsigned)
b.lo label              ; Branch if lower (unsigned)

; Conditional select (branchless, like CMOV)
csel x0, x1, x2, eq    ; x0 = (flags==eq) ? x1 : x2
csinc x0, x1, x2, ne   ; x0 = (flags==ne) ? x1 : x2+1

; Function call
bl  function            ; Branch with Link (saves return addr in LR)
ret                     ; Return (branch to LR)

; Unlike x86 CALL, bl stores the return address in LR (X30), not on the stack.
; A function that calls another function must save LR first,
; typically with stp x29, x30, [sp, #-16]! in its prologue.
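
The condition suffixes used by the branches and by csel/csinc in the listing above are all tests on the N, Z, C, V flags that cmp leaves behind. The C model below is a sketch of how those flags are derived and tested for the six conditions shown; the names (`nzcv`, `a64_cmp`, `cond_holds`, `a64_csel`, `a64_csinc`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv;

typedef enum { COND_EQ, COND_NE, COND_GT, COND_LT, COND_HI, COND_LO } a64_cond;

/* cmp a, b: compute a - b, keep only the flags. */
nzcv a64_cmp(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    nzcv f;
    f.n = (r >> 63) != 0;                     /* result negative        */
    f.z = (r == 0);                           /* result zero            */
    f.c = (a >= b);                           /* carry set = no borrow  */
    f.v = (((a ^ b) & (a ^ r)) >> 63) != 0;   /* signed overflow        */
    return f;
}

/* Evaluates one condition code against a set of flags. */
bool cond_holds(a64_cond cond, nzcv f)
{
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_GT: return !f.z && f.n == f.v;  /* signed greater   */
    case COND_LT: return f.n != f.v;          /* signed less      */
    case COND_HI: return f.c && !f.z;         /* unsigned higher  */
    case COND_LO: return !f.c;                /* unsigned lower   */
    }
    assert(0 && "unknown condition");
    return false;
}

/* csel xd, xn, xm, cond */
uint64_t a64_csel(uint64_t n, uint64_t m, a64_cond cond, nzcv f)
{
    return cond_holds(cond, f) ? n : m;
}

/* csinc xd, xn, xm, cond */
uint64_t a64_csinc(uint64_t n, uint64_t m, a64_cond cond, nzcv f)
{
    return cond_holds(cond, f) ? n : m + 1;
}
```

Comparing the bit pattern of -1 with 1 shows why signed and unsigned conditions are separate: `lt` holds (as signed, -1 < 1) and `hi` also holds (as unsigned, the same pattern is the largest value).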

ARM64 Linux Syscalls

Syscall convention
Register   Purpose
──────────────────────────
X8         Syscall number
X0-X5      Arguments 1-6
X0         Return value

Invoke: svc #0     (supervisor call)
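
From C, the same convention can be exercised without writing assembly. The sketch below assumes Linux with a libc that provides the `syscall()` wrapper (glibc and musl do); that wrapper loads the syscall number and arguments into the registers listed above and executes the trap instruction. The function name `write_stdout` is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Writes text to stdout with a raw write syscall.
 * Returns the number of bytes written, or -1 on error. */
long write_stdout(const char *text)
{
    assert(text != NULL);
    /* On ARM64 this becomes: x8 = SYS_write, x0 = 1,
     * x1 = text, x2 = length, then svc #0. */
    return syscall(SYS_write, 1, text, strlen(text));
}
```

Because `SYS_write` comes from the target's headers, the same source compiles to syscall number 64 on ARM64 and 1 on x86-64.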

Hello world (ARM64)
// hello.s - ARM64 Linux (build on an ARM64 host, or use the aarch64-linux-gnu- cross tools)
// Assemble: as hello.s -o hello.o
// Link:     ld hello.o -o hello

.data
msg:    .ascii "Hello from ARM64!\n"
len = . - msg

.text
.global _start

_start:
    mov  x8, #64          // sys_write
    mov  x0, #1           // stdout
    ldr  x1, =msg         // buffer
    mov  x2, #len         // length
    svc  #0               // syscall

    mov  x8, #93          // sys_exit
    mov  x0, #0           // status
    svc  #0

Key syscall numbers (ARM64 vs x86-64)
Syscall         ARM64     x86-64
───────────────────────────────
read              63        0
write             64        1
open               -        2
openat            56      257
close             57        3
exit              93       60
exit_group        94      231
fork               -       57
clone            220       56
execve           221       59

Note: ARM64 uses the generic Linux syscall table, so its numbers differ from
x86-64, and legacy calls such as open and fork do not exist (use openat and clone).
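
Code that issues raw syscalls for more than one architecture needs some form of per-architecture lookup. The sketch below hard-codes the numbers as I understand them from the Linux headers (openat and clone are listed in place of open and fork, which have no ARM64 number) and, when built on a Linux ARM64 or x86-64 host, cross-checks two entries against `<sys/syscall.h>`. All names (`arch_id`, `syscall_row`, `syscall_number`, `table_matches_host`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__linux__)
#include <sys/syscall.h>  /* SYS_write, SYS_exit for the host architecture */
#endif

typedef enum { ARCH_ARM64, ARCH_X86_64 } arch_id;

typedef struct {
    const char *name;
    int arm64;
    int x86_64;
} syscall_row;

static const syscall_row rows[] = {
    { "read",        63,   0 },
    { "write",       64,   1 },
    { "openat",      56, 257 },
    { "close",       57,   3 },
    { "exit",        93,  60 },
    { "exit_group",  94, 231 },
    { "clone",      220,  56 },
    { "execve",     221,  59 },
};

/* Returns the syscall number for name on arch, or -1 if not in the table. */
int syscall_number(const char *name, arch_id arch)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        if (strcmp(rows[i].name, name) == 0)
            return arch == ARCH_ARM64 ? rows[i].arm64 : rows[i].x86_64;
    }
    return -1;
}

/* Returns 1 if the table agrees with the host headers (or if the host is
 * neither Linux ARM64 nor Linux x86-64, so nothing can be compared). */
int table_matches_host(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return syscall_number("write", ARCH_ARM64) == SYS_write
        && syscall_number("exit", ARCH_ARM64) == SYS_exit;
#elif defined(__linux__) && defined(__x86_64__) && !defined(__ILP32__)
    return syscall_number("write", ARCH_X86_64) == SYS_write
        && syscall_number("exit", ARCH_X86_64) == SYS_exit;
#else
    return 1;
#endif
}
```

In practice the `SYS_*` macros make such a table unnecessary for code compiled per target; it matters mainly for tools that inspect or emit code for a foreign architecture.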

Cross-Compilation and Emulation

Working with ARM64 on x86-64
# Install cross-compiler and QEMU
# Arch: pacman -S aarch64-linux-gnu-gcc qemu-user-static

# Cross-assemble
aarch64-linux-gnu-as hello.s -o hello.o
aarch64-linux-gnu-ld hello.o -o hello

# Run with QEMU user-mode emulation
qemu-aarch64-static ./hello

# Cross-compile C to ARM64 assembly
aarch64-linux-gnu-gcc -S -O2 -o test.s test.c

# Disassemble
aarch64-linux-gnu-objdump -d hello

# Debug under QEMU
qemu-aarch64-static -g 1234 ./hello &
gdb-multiarch -ex "target remote :1234" ./hello
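
The commands above refer to a test.c that is not shown. A small input such as the following is enough to study generated ARM64 code; this is a hypothetical file (the function `sum_u64` is invented here), chosen because an indexed 8-byte load in a loop tends to exercise the load addressing modes, though the exact instructions depend on compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums n 64-bit values starting at v. */
uint64_t sum_u64(const uint64_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];  /* element address is v + i * 8 */
    return total;
}
```

Compiling it with the `-S -O2` command above and reading test.s next to the source is a quick way to see the argument registers (x0 for v, x1 for n) and the return value in x0.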