Phase 0: Compiler Setup
Objective
Write, compile, and run your first C program. Understand what gcc does: preprocessing, compilation, assembly, linking. No IDE: terminal + nvim only.
Step 1: Verify Toolchain
for cmd in gcc strace ltrace xxd gdb; do
    printf "%-10s %s\n" "$cmd" "$(command -v "$cmd" >/dev/null 2>&1 && echo 'OK' || echo 'MISSING')"
done
If anything is MISSING:
sudo pacman -S strace ltrace gdb tinyxxd
xxd comes from tinyxxd on Arch, not vim. Use pacman -F xxd to find which package owns a binary.
Step 2: Write Your First Program
Use tee with a heredoc to write the file and see the contents at the same time:
tee /tmp/hello.c << 'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello from C\n");
    return 0;
}
EOF
What each line does:
- #include <stdio.h>: tells the preprocessor to paste in the standard I/O header. Without it, printf is undeclared.
- int main(void): entry point. Every C program starts here. Returns an int (exit code).
- printf("Hello from C\n"): writes to stdout. \n is a newline. This is a library function (man 3 printf), not a syscall.
- return 0: exit code 0 = success. The shell reads this with $?.
Step 3: Compile and Run
gcc /tmp/hello.c -o /tmp/hello && /tmp/hello
Hello from C
That’s one command line: gcc compiles, -o /tmp/hello names the binary, and && runs the binary only if compilation succeeds.
What the Flags Mean
| Flag | What it does |
|---|---|
| (no flags) | Compile and link; the binary is written to a.out |
| -o | Name the output binary instead of a.out |
| -Wall | Enable all common warnings. Always use this. Catches bugs the compiler sees but doesn’t normally tell you about. |
| -g | Add debug symbols for gdb |
| -Wall -g | The combo you should always use during development. |
gcc -Wall -g /tmp/hello.c -o /tmp/hello && /tmp/hello
If there are no warnings, gcc prints nothing. Silence means success.
Step 4: See What Your Program Actually Does
strace: Every Kernel Call
strace /tmp/hello
This shows every system call your program makes. Look near the bottom for:
write(1, "Hello from C\n", 13) = 13
That’s your printf at the kernel level:
- write: the syscall (not printf; printf is a library wrapper around write)
- 1: file descriptor 1 = stdout
- "Hello from C\n": the string
- 13: number of bytes to write
- = 13: return value (13 bytes successfully written)
Everything above that line is the OS loading your program: finding libc.so.6, mapping memory, setting up the stack.
strace -c /tmp/hello
strace -e trace=write /tmp/hello
ltrace: Library Calls
One level above syscalls, ltrace shows C library function calls:
ltrace /tmp/hello
You’ll see the library call behind your printf("Hello from C\n") line (GCC may emit it as puts). strace shows the write() syscall underneath it. Two different layers of the same action.
xxd: Raw Bytes
xxd /tmp/hello | head -20
This is the compiled binary as raw hex. Every file is bytes, and xxd shows you that truth.
file: What Type Is It?
file /tmp/hello
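Should say ELF 64-bit LSB executable (or ELF 64-bit LSB pie executable where gcc builds position-independent executables by default). Either way, that’s a Linux binary.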
ldd: Linked Libraries
ldd /tmp/hello
Shows which shared libraries your program depends on. You’ll see libc.so.6, which is where printf lives.
Step 5: See What gcc Does Internally
gcc runs 4 stages. You can stop at each one:
# 1. Preprocess: expands #include, replaces #define macros
gcc -E /tmp/hello.c -o /tmp/hello.i
head -20 /tmp/hello.i
# The full file is hundreds of lines: that's stdio.h pasted in
# 2. Compile to assembly: human-readable CPU instructions
gcc -S /tmp/hello.c -o /tmp/hello.s
cat /tmp/hello.s
# Find "printf" or "call": that's your function call in assembly
# 3. Assemble to object file: machine code, not yet linked
gcc -c /tmp/hello.c -o /tmp/hello.o
file /tmp/hello.o
# "ELF 64-bit LSB relocatable": not executable yet
# 4. Link: connect your code to libc, produce the final executable
gcc /tmp/hello.c -o /tmp/hello
file /tmp/hello
# "ELF 64-bit LSB executable": ready to run
Understanding the Byte Count
strace showed write(1, "Hello from C\n", 13). Why 13?
Counting Bytes
Every character is one byte. \n looks like two characters in source code but compiles to one byte (newline, ASCII 10):
H  e  l  l  o  ·  f  r  o  m  ·  C  \n
1  2  3  4  5  6  7  8  9  10 11 12 13
printf 'Hello from C\n' | wc -c
# → 13
printf 'Hello from C\n' | xxd
# → 00000000: 4865 6c6c 6f20 6672 6f6d 2043 0a Hello from C.
printf 'Hello from C\n' | od -An -td1
# → 72 101 108 108 111 32 102 114 111 109 32 67 10
Hex, Nibbles, Bytes: the Same Math as Networking
Each hex digit is a nibble (4 bits). Place values: 8, 4, 2, 1.
Nibble:    8  4  2  1
           |  |  |  |
0100   =   0  1  0  0   = 4
1000   =   1  0  0  0   = 8
1010   =   1  0  1  0   = 8+2 = 10 = 0x0A (your newline)
1111   =   1  1  1  1   = 8+4+2+1 = 15 = 0x0F
Two nibbles = one byte (8 bits). Range: 0x00 (0) to 0xFF (255).
Hex 48 = 0100 1000 = "H" (ASCII 72)
         4    8
This Is the Same System You Already Know
| Context | Example | Bytes | Same math |
|---|---|---|---|
| IP address | | 4 bytes: each octet is 0-255 (one byte) | Subnet: /24 = first 24 bits (3 bytes) are network |
| MAC address | | 6 bytes: each pair is one hex byte | OUI = first 3 bytes |
| C string | "Hello from C\n" | 13 bytes: each character is one byte | |
| Subnet mask | | | |
The hex in xxd output is the same hex you see in MAC addresses and subnet masks. One system. You already know it: you’ve been subnetting with it for years. C just shows you the bytes that the network abstracts away.
Exercises
1. [x] Write hello.c, compile, run
tee /tmp/hello.c << 'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello from C\n");
    return 0;
}
EOF
gcc -Wall -g /tmp/hello.c -o /tmp/hello && /tmp/hello
# → Hello from C
2. [x] strace β find the write syscall
strace /tmp/hello
# Near the bottom:
# write(1, "Hello from C\n", 13) = 13
Answer: 13 bytes. printf becomes write(1, …) at the kernel level. File descriptor 1 = stdout.
3. [x] strace -c β which syscall is called most?
strace -c /tmp/hello
Answer: strace -c sorts by time spent, not call count. Look at the calls column: mmap and pread64 have the highest counts (loading shared libraries into memory). execve runs only once (starting the program). The "inconsistency" is that time and count are different axes.
4. [x] ltrace β find the printf call
ltrace /tmp/hello
puts("Hello from C"Hello from C
) = 13
+++ exited (status 0) +++
Answer: Yes, puts IS the printf call. GCC optimizes printf("string\n") to puts("string") when the format string has no format specifiers (%s, %d, etc.) and ends with a newline. puts is faster: it just writes a string and appends a newline. No format parsing needed. Same result, fewer CPU instructions. Verify with man 3 puts.
This is your first encounter with compiler optimization: the compiler doesn’t translate your code literally. It finds a faster equivalent. You wrote printf, the binary runs puts. The output is identical.
5. [x] ldd β what library provides printf?
ldd /tmp/hello
linux-vdso.so.1 (0x00007fe7c2cff000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fe7c2ad8000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fe7c2d01000)
Answer: libc.so.6, the C standard library. This is where printf, puts, fopen, malloc, and most other standard functions live. The other two:
- linux-vdso.so.1: virtual DSO. Kernel shortcuts for fast syscalls (clock_gettime, gettimeofday). Not a real file.
- ld-linux-x86-64.so.2: the dynamic linker. It loads libc.so.6 into memory before your program starts.
6. [x] gcc -E β preprocessor output
gcc -E /tmp/hello.c | wc -l
# → 844
Answer: Your 6-line program becomes 844 lines because #include <stdio.h> is a literal copy-paste instruction. The preprocessor replaces that line with the entire contents of stdio.h, plus everything stdio.h itself includes (features.h, bits/types.h, bits/wordsize.h, etc.). The # 1 "/usr/include/stdio.h" lines are preprocessor markers showing which file each block came from. Your actual code is at the very bottom, around line 840.
This is why headers exist: you write one #include line, the preprocessor pastes hundreds of function declarations so the compiler knows what printf looks like.
7. [x] gcc -S β find printf in assembly
gcc -S /tmp/hello.c -o /tmp/hello.s && cat /tmp/hello.s
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello from C"        (1)
        .text
        .globl  main
main:
        pushq   %rbp
        movq    %rsp, %rbp
        leaq    .LC0(%rip), %rax      (2)
        movq    %rax, %rdi            (3)
        call    puts@PLT              (4)
        movl    $0, %eax              (5)
        popq    %rbp
        ret
| 1 | Your string "Hello from C" stored in read-only data (.rodata). No \n, because puts adds it. |
| 2 | leaq loads the address of the string into register %rax. |
| 3 | movq copies it to %rdi β the first argument register in x86-64 calling convention. |
| 4 | call puts@PLT: calls puts via the Procedure Linkage Table (dynamic linking). This is your printf after GCC optimized it. |
| 5 | movl $0, %eax: sets return value to 0. This is your return 0. |
The Full Journey: One Line of C Through Every Layer
printf("Hello from C\n");          ← you wrote this (source)
    ↓
844 lines of preprocessed code     ← gcc -E (preprocessor expands #include)
    ↓
call puts@PLT                      ← gcc -S (compiler optimizes printf → puts)
    ↓
ELF 64-bit executable              ← gcc (linker connects to libc.so.6)
    ↓
puts("Hello from C")               ← ltrace (library call)
    ↓
write(1, "Hello from C\n", 13)     ← strace (kernel syscall)
    ↓
13 bytes on stdout                 ← your terminal
Phase 0 complete. You traced one line of C from source through every abstraction layer to the kernel. Every concept here (hex, bytes, syscalls, library calls, assembly) comes back in every phase that follows.
Related
- man gcc: full compiler documentation
- man 3 printf: the C library function you just used
- man 2 write: the syscall underneath printf
- man strace: system call tracer
Next: Man Pages as Curriculum