Phase 0: Compiler Setup

Objective

Write, compile, and run your first C program. Understand what gcc does: preprocessing, compilation, assembly, and linking. No IDE; terminal and nvim only.

Step 1: Verify Toolchain

for cmd in gcc strace ltrace xxd gdb; do
  printf "%-10s %s\n" "$cmd" "$(command -v "$cmd" >/dev/null 2>&1 && echo 'OK' || echo 'MISSING')"
done

If anything is MISSING:

sudo pacman -S strace ltrace gdb tinyxxd

On Arch, xxd comes from the tinyxxd package, not vim. Use pacman -F xxd to find which package owns a binary (run sudo pacman -Fy once first to sync the file database).

Step 2: Write Your First Program

Use tee with a heredoc to write the file and see its contents at the same time:

tee /tmp/hello.c << 'EOF'
#include <stdio.h>

int main(void) {
    printf("Hello from C\n");
    return 0;
}
EOF

What each line does:

  • #include <stdio.h>: tells the preprocessor to paste in the standard I/O header. Without it, the compiler has no declaration for printf.

  • int main(void): the entry point. Every C program starts here. It returns an int (the exit code).

  • printf("Hello from C\n"): writes to stdout. \n is a newline. This is a library function (man 3 printf), not a syscall.

  • return 0: exit code 0 means success. The shell exposes it as $?.

Step 3: Compile and Run

gcc /tmp/hello.c -o /tmp/hello && /tmp/hello
Expected output:
Hello from C

That’s one command: gcc compiles, -o /tmp/hello names the binary, && runs it only if compilation succeeds.

What the Flags Mean

Flag         What it does
-----------  ------------------------------------------------------------
(no flags)   gcc hello.c produces a.out in the current directory
-o hello     Name the output binary instead of a.out
-Wall        Enable all common warnings. Always use this. Catches bugs the compiler sees but doesn’t normally tell you about.
-g           Add debug symbols for gdb. Makes the binary larger but lets you step through line by line.
-Wall -g     The combo you should always use during development.

# Compile with all warnings and debug info
gcc -Wall -g /tmp/hello.c -o /tmp/hello && /tmp/hello

If there are no warnings, gcc prints nothing. Silence means success.

Step 4: See What Your Program Actually Does

strace: Every Kernel Call

strace /tmp/hello

This shows every system call your program makes. Look near the bottom for:

write(1, "Hello from C\n", 13)          = 13

That’s your printf at the kernel level:

  • write: the syscall. printf is not a syscall; it is a library function that ends up calling write.

  • 1: file descriptor 1 = stdout

  • "Hello from C\n": the buffer to write (your string)

  • 13: the number of bytes to write

  • = 13: the return value (13 bytes successfully written)

Everything above that line is the OS loading your program: finding libc.so.6, mapping memory, setting up the stack.

# Summary view: count syscalls by type
strace -c /tmp/hello
# Filter to only write syscalls
strace -e trace=write /tmp/hello

ltrace: Library Calls

One level above syscalls, ltrace shows C library function calls:

ltrace /tmp/hello

You’ll see the library call that prints the string. It may appear as puts("Hello from C") rather than printf; exercise 4 explains why. strace shows the write() syscall underneath it. Two different layers of the same action.

xxd: Raw Bytes

xxd /tmp/hello | head -20

This is the compiled binary as raw hex. Every file is bytes, and xxd shows you exactly that.

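file: What Type Is It?

file /tmp/hello

Should say ELF 64-bit LSB executable (or ELF 64-bit LSB pie executable on toolchains that build position-independent executables by default). Either way, that’s a Linux binary.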

ldd: Linked Libraries

ldd /tmp/hello

Shows which shared libraries your program depends on. You’ll see libc.so.6, which is where printf lives.

Step 5: See What gcc Does Internally

gcc runs 4 stages. You can stop at each one:

# 1. Preprocess: expands #include, replaces #define macros
gcc -E /tmp/hello.c -o /tmp/hello.i
head -20 /tmp/hello.i
# The full file is hundreds of lines: that's stdio.h pasted in

# 2. Compile to assembly: human-readable CPU instructions
gcc -S /tmp/hello.c -o /tmp/hello.s
cat /tmp/hello.s
# Find "call": that's your function call in assembly
# (the target may be puts rather than printf; see exercise 4)

# 3. Assemble to object file: machine code, not yet linked
gcc -c /tmp/hello.c -o /tmp/hello.o
file /tmp/hello.o
# "ELF 64-bit LSB relocatable": not executable yet

# 4. Link: connect the object file to libc, produce the final executable
gcc /tmp/hello.o -o /tmp/hello
file /tmp/hello
# "ELF 64-bit LSB executable" (or "pie executable"): ready to run

Understanding the Byte Count

strace showed write(1, "Hello from C\n", 13). Why 13?

Counting Bytes

Every character in this string is one byte (it is plain ASCII). \n looks like two characters in source code but compiles to one byte (newline, ASCII 10):

H   e   l   l   o   ·   f   r   o   m   ·   C   \n
1   2   3   4   5   6   7   8   9   10  11  12  13
# Prove it
printf 'Hello from C\n' | wc -c
# → 13

printf 'Hello from C\n' | xxd
# → 00000000: 4865 6c6c 6f20 6672 6f6d 2043 0a   Hello from C.

printf 'Hello from C\n' | od -An -td1
# → 72 101 108 108 111 32 102 114 111 109 32 67 10

Hex, Nibbles, Bytes: the Same Math as Networking

Each hex digit is a nibble (4 bits). Place values: 8, 4, 2, 1.

Nibble:   8  4  2  1
          ─  ─  ─  ─
0100    = 0  1  0  0  = 4
1000    = 1  0  0  0  = 8
1010    = 1  0  1  0  = 8+2 = 10 = 0x0A (your newline)
1111    = 1  1  1  1  = 8+4+2+1 = 15 = 0x0F

Two nibbles = one byte (8 bits). Range: 0x00 (0) to 0xFF (255).

Hex 48  = 0100 1000  = "H" (ASCII 72)
            4    8

This Is the Same System You Already Know

Context      Example            Bytes                                    Same math
-----------  -----------------  ---------------------------------------  ---------
IP address   192.168.1.10       4 bytes: each octet is 0-255 (one byte)  Subnet: /24 = first 24 bits (3 bytes) are network
MAC address  40:AC:8D:00:93:E5  6 bytes: each pair is one hex byte       OUI = first 3 bytes (40:AC:8D = TCP clock vendor)
C string     "Hello from C\n"   13 bytes: each character is one byte     0x0A at the end = newline
Subnet mask  255.255.255.0      FF:FF:FF:00 in hex                       11111111.11111111.11111111.00000000 in binary, same 8-4-2-1 math

The hex in xxd output is the same hex you see in MAC addresses and in subnet masks. One system. You already know it: you’ve been subnetting with it for years. C just shows you the bytes that the network abstracts away.

Exercises

1. [x] Write hello.c, compile, run

tee /tmp/hello.c << 'EOF'
#include <stdio.h>

int main(void) {
    printf("Hello from C\n");
    return 0;
}
EOF

gcc -Wall -g /tmp/hello.c -o /tmp/hello && /tmp/hello
# → Hello from C

2. [x] strace: find the write syscall

strace /tmp/hello
# Near the bottom:
# write(1, "Hello from C\n", 13) = 13

Answer: 13 bytes. printf becomes write(1, ...) at the kernel level. File descriptor 1 = stdout.

3. [x] strace -c: which syscall is called most?

strace -c /tmp/hello

Answer: strace -c sorts by time spent, not call count. Look at the calls column: mmap and pread64 have the highest counts (loading shared libraries into memory). execve runs only once (starting the program). The "inconsistency" is that time and count are different axes. To sort by call count instead, use strace -c -S calls /tmp/hello.

4. [x] ltrace: find the printf call

ltrace /tmp/hello
puts("Hello from C"Hello from C
)                                = 13
+++ exited (status 0) +++

Answer: Yes, puts IS the printf call. GCC optimizes printf("string\n") to puts("string") when the format string has no format specifiers (%s, %d, etc.) and ends in a newline. puts is faster: it just writes a string and appends a newline. No format parsing needed. Same result, fewer CPU instructions. Verify with man 3 puts.

This is your first encounter with compiler optimization: the compiler doesn’t translate your code literally. It finds a faster equivalent. You wrote printf, the binary runs puts. The output is identical.

5. [x] ldd: what library provides printf?

ldd /tmp/hello
	linux-vdso.so.1 (0x00007fe7c2cff000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fe7c2ad8000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fe7c2d01000)

Answer: libc.so.6, the C standard library. This is where printf, puts, fopen, malloc, and most other standard functions live. The other two:

  • linux-vdso.so.1: virtual DSO. Kernel shortcuts for fast syscalls (clock_gettime, gettimeofday). Not a real file.

  • ld-linux-x86-64.so.2: the dynamic linker. It loads libc.so.6 into memory before your program starts.

6. [x] gcc -E: preprocessor output

gcc -E /tmp/hello.c | wc -l
# → 844

Answer: Your 6-line program becomes 844 lines because #include <stdio.h> is a literal copy-paste instruction. The preprocessor replaces that line with the entire contents of stdio.h, plus everything stdio.h itself includes (features.h, bits/types.h, bits/wordsize.h, etc.). The # 1 "/usr/include/stdio.h" lines are preprocessor markers showing which file each block came from. Your actual code is at the very bottom, around line 840. The exact count varies with the glibc version.

This is why headers exist: you write one #include line, the preprocessor pastes hundreds of function declarations so the compiler knows what printf looks like.

7. [x] gcc -S: find printf in assembly

gcc -S /tmp/hello.c -o /tmp/hello.s && cat /tmp/hello.s
Output:
	.file	"hello.c"
	.section	.rodata
.LC0:
	.string	"Hello from C"           (1)
	.text
	.globl	main
main:
	pushq	%rbp
	movq	%rsp, %rbp
	leaq	.LC0(%rip), %rax         (2)
	movq	%rax, %rdi               (3)
	call	puts@PLT                 (4)
	movl	$0, %eax                 (5)
	popq	%rbp
	ret
(1) Your string "Hello from C" stored in read-only data (.rodata). No \n: puts adds it.
(2) leaq loads the address of the string into register %rax.
(3) movq copies it to %rdi, the first argument register in the x86-64 calling convention.
(4) call puts@PLT calls puts via the Procedure Linkage Table (dynamic linking). This is your printf after GCC optimized it.
(5) movl $0, %eax sets the return value to 0. This is your return 0.

The Full Journey: One Line of C Through Every Layer

printf("Hello from C\n");          ← you wrote this (source)
         ↓
844 lines of preprocessed code     ← gcc -E (preprocessor expands #include)
         ↓
call puts@PLT                      ← gcc -S (compiler optimizes printf → puts)
         ↓
ELF 64-bit executable              ← gcc (linker connects to libc.so.6)
         ↓
puts("Hello from C")               ← ltrace (library call)
         ↓
write(1, "Hello from C\n", 13)     ← strace (kernel syscall)
         ↓
13 bytes on stdout                 ← your terminal

Phase 0 complete. You traced one line of C from source through every abstraction layer to the kernel. Every concept here (hex, bytes, syscalls, library calls, assembly) comes back in every phase that follows.

  • man gcc: full compiler documentation

  • man 3 printf: the C library function you just used

  • man 2 write: the syscall underneath printf

  • man strace: system call tracer

  • Next: Man Pages as Curriculum