Phase 1: Reading Files — fopen, fgetc, fclose
Objective
Read a file one byte at a time. Understand what fopen, fgetc, and fclose actually do. Handle errors. This is the foundation for reading any data — config files, packets, certificates, ISE exports.
The Three Functions
| Function | What it does | Man page |
|---|---|---|
| fopen | Opens a file. Returns a FILE * pointer. | man 3 fopen |
| fgetc | Reads one byte from the file. Returns the byte as int. | man 3 fgetc |
| fclose | Closes the file. Frees the resources. | man 3 fclose |
These are library functions (section 3) — not syscalls. They wrap the real syscalls (open, read, close) with buffering and convenience. You’ll see the raw versions in Phase 2.
Step 1: Create a Test File
printf 'Hello from C\nSecond line\n' > /tmp/test.txt
xxd /tmp/test.txt
00000000: 4865 6c6c 6f20 6672 6f6d 2043 0a53 6563  Hello from C.Sec
00000010: 6f6e 6420 6c69 6e65 0a                   ond line.
25 bytes. Two 0a newlines — one after each line.
Step 2: Read It Byte by Byte
Create /tmp/reader.c:
#include <stdio.h>
int main(void) {
    FILE *fp = fopen("/tmp/test.txt", "r");    /* (1) */
    if (fp == NULL) {                          /* (2) */
        printf("Error: cannot open file\n");
        return 1;
    }
    int c;                                     /* (3) */
    while ((c = fgetc(fp)) != EOF) {           /* (4) */
        printf("%c", c);                       /* (5) */
    }
    fclose(fp);                                /* (6) */
    return 0;
}
| 1 | fopen opens the file for reading ("r"). Returns a FILE * pointer — a handle to the open file. |
| 2 | Always check for NULL. If the file doesn’t exist or you don’t have permission, fopen returns NULL. Without this check, the NULL pointer is passed to fgetc, which is undefined behavior and usually ends in a segmentation fault. |
| 3 | int c — not char. This is critical. fgetc returns an int because it needs to return 256 possible byte values (0-255) PLUS EOF (-1). A char can’t hold all of those. |
| 4 | The idiom: (c = fgetc(fp)) != EOF. Assigns the byte to c AND checks if it’s EOF in one expression. This is the standard C pattern — you’ll see it everywhere. |
| 5 | %c prints the byte as a character. %d would print the ASCII number. %02x would print it as hex. |
| 6 | fclose — always close what you open. Frees the FILE struct and flushes any buffered data. |
gcc -Wall -g /tmp/reader.c -o /tmp/reader && /tmp/reader
Hello from C
Second line
Step 3: See the Bytes — Hex Mode
Change %c to %02x (followed by a space) to see what fgetc actually returns:
#include <stdio.h>
int main(void) {
    FILE *fp = fopen("/tmp/test.txt", "r");
    if (fp == NULL) {
        printf("Error: cannot open file\n");
        return 1;
    }
    int c;
    int count = 0;
    while ((c = fgetc(fp)) != EOF) {
        printf("%02x ", c);                /* (1) */
        count++;
        if (count % 16 == 0)               /* (2) */
            printf("\n");
    }
    printf("\n(%d bytes)\n", count);       /* (3) */
    fclose(fp);
    return 0;
}
| 1 | %02x — print as 2-digit hex, zero-padded. Same format as xxd, MAC addresses, and packet captures. |
| 2 | Newline every 16 bytes — just like xxd output. |
| 3 | Total byte count — should match wc -c /tmp/test.txt. |
Save as /tmp/hexreader.c:
gcc -Wall -g /tmp/hexreader.c -o /tmp/hexreader && /tmp/hexreader
48 65 6c 6c 6f 20 66 72 6f 6d 20 43 0a 53 65 63
6f 6e 64 20 6c 69 6e 65 0a
(25 bytes)
You just built a simplified xxd, with the same hex output you’ve been reading. Byte 0a is the newline. Byte 48 is H. Same system, but now you’re constructing the tool instead of using it.
Step 4: Error Handling — What If the File Doesn’t Exist?
# Try to read a file that doesn't exist
cat > /tmp/reader_error.c << 'EOF'
#include <stdio.h>
int main(void) {
    FILE *fp = fopen("/tmp/nonexistent.txt", "r");
    if (fp == NULL) {
        perror("fopen");    /* (1) */
        return 1;
    }
    fclose(fp);
    return 0;
}
EOF
gcc -Wall -g /tmp/reader_error.c -o /tmp/reader_error && /tmp/reader_error
| 1 | perror prints the system error message — "fopen: No such file or directory". More informative than your own error string. man 3 perror. |
fopen: No such file or directory
echo $?
# → 1 (error — your return 1)
This is what $? holds in bash: the exit status of the last command, which for a C program is the value returned from main(). return 0 = success. return 1 = error. Every bash script you’ve written with && echo OK || echo FAIL is checking this value.
Step 5: strace Your Reader
strace /tmp/reader 2>&1 | grep -E 'open|read|write|close'
You’ll see:
- openat(AT_FDCWD, "/tmp/test.txt", O_RDONLY): fopen becomes the openat syscall
- read(3, "Hello from C\nSecond line\n", 4096): fgetc triggers a 4096-byte buffer read (stdio buffers!)
- write(1, "Hello from C\nSecond line\n", 25): printf writes to stdout
- close(3): fclose becomes the close syscall
Notice: fgetc reads one byte at a time in YOUR code, but stdio reads 4096 bytes at once from the kernel and serves them to you one at a time from a buffer. That’s the abstraction layer. Phase 2 strips it away.
Understanding EOF
EOF is not a character in the file. It’s not a byte. It’s not a signal.
EOF is the return value of fgetc() when there’s nothing left to read (it is also returned if a read fails). The C standard only requires it to be a negative int; glibc defines it in <stdio.h> as -1, written #define EOF (-1).
That’s why c must be int, not char:
| Type | Range | Problem |
|---|---|---|
| char (signed) | -128 to 127 | Byte 0xFF becomes -1, which equals EOF. Your loop stops early. |
| unsigned char | 0 to 255 | Can hold all bytes, but can never equal -1. Your loop never stops. |
| int | -2³¹ to 2³¹-1 | Holds all 256 byte values (0-255) AND -1 (EOF). Correct. |
This is why C uses int for single-byte reads. It’s not obvious until you understand the byte values.
printf '\xff' > /tmp/binary_test.txt
xxd /tmp/binary_test.txt
# → 00000000: ff
# One byte: 0xFF (255 decimal)
If c were a char (signed on most x86 systems), reading 0xFF would give you -1, which equals EOF, so your program would think it had reached the end of this one-byte file before printing anything. With int, 0xFF is 255, which is NOT -1. The loop continues.
Exercises
- [ ] Write reader.c, compile, run against /tmp/test.txt
- [ ] Modify to hex mode (%02x) and compare the output with xxd /tmp/test.txt
- [ ] Try reading a nonexistent file. What does perror print?
- [ ] Run strace /tmp/reader and find the read syscall. How many bytes does it read at once?
- [ ] Create a binary file: printf '\x00\x01\xff\xfe' > /tmp/binary.bin and read it with your hex reader. What values do you see?
- [ ] Change int c to char c in reader.c. What happens when reading /tmp/binary.bin? Does the loop end early?
Notes
Write your observations here.
Related
- man 3 fopen: open modes "r", "w", "a", "rb"
- man 3 fgetc: returns int, not char
- man 3 perror: print system error messages
- Previous: Phase 0: Compiler Setup
- Next: Syscall Tracing