Phase 1: Reading Files — fopen, fgetc, fclose

Objective

Read a file one byte at a time. Understand what fopen, fgetc, and fclose actually do. Handle errors. This is the foundation for reading any data — config files, packets, certificates, ISE exports.

The Three Functions

Function           What it does                                                              Man page
fopen(path, mode)  Opens a file. Returns a FILE * pointer, or NULL on failure.               man 3 fopen
fgetc(fp)          Reads one byte. Returns it as an int, or EOF at end of file or on error.  man 3 fgetc
fclose(fp)         Flushes buffered data, closes the file, and frees its resources.          man 3 fclose

These are library functions (section 3) — not syscalls. They wrap the real syscalls (open, read, close) with buffering and convenience. You’ll see the raw versions in Phase 2.

Step 1: Create a Test File

printf 'Hello from C\nSecond line\n' > /tmp/test.txt
xxd /tmp/test.txt
expected output
00000000: 4865 6c6c 6f20 6672 6f6d 2043 0a53 6563  Hello from C.Sec
00000010: 6f6e 6420 6c69 6e65 0a                     ond line.

25 bytes. Two 0a newlines — one after each line.

Step 2: Read It Byte by Byte

Create /tmp/reader.c:

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("/tmp/test.txt", "r");  /* (1) */
    if (fp == NULL) {                         /* (2) */
        printf("Error: cannot open file\n");
        return 1;
    }

    int c;                                    /* (3) */
    while ((c = fgetc(fp)) != EOF) {          /* (4) */
        printf("%c", c);                      /* (5) */
    }

    fclose(fp);                               /* (6) */
    return 0;
}
1 fopen opens the file for reading ("r") and returns a FILE * pointer, a handle to the open file.
2 Always check for NULL. If the file doesn’t exist or you don’t have permission, fopen returns NULL. Passing NULL to fgetc is undefined behavior; in practice the program usually crashes with a segmentation fault.
3 int c, not char. This is critical. fgetc must be able to return 256 possible byte values (0-255) plus EOF (-1), and a char can’t hold all 257 of those.
4 The idiom: (c = fgetc(fp)) != EOF. It assigns the byte to c and checks for EOF in one expression. The inner parentheses are required because != binds tighter than =. This is the standard C pattern; you’ll see it everywhere.
5 %c prints the byte as a character. %d would print its numeric value (the ASCII code, for text). %02x would print it as hex.
6 fclose: always close what you open. It flushes any buffered data and frees the FILE struct.
compile and run
gcc -Wall -g /tmp/reader.c -o /tmp/reader && /tmp/reader
expected output
Hello from C
Second line

Step 3: See the Bytes — Hex Mode

Change %c to %02x to see what fgetc actually returns:

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("/tmp/test.txt", "r");
    if (fp == NULL) {
        printf("Error: cannot open file\n");
        return 1;
    }

    int c;
    int count = 0;
    while ((c = fgetc(fp)) != EOF) {
        printf("%02x ", c);      /* (1) */
        count++;
        if (count % 16 == 0)     /* (2) */
            printf("\n");
    }
    printf("\n(%d bytes)\n", count);  /* (3) */

    fclose(fp);
    return 0;
}
1 %02x — print as 2-digit hex, zero-padded. Same format as xxd, MAC addresses, and packet captures.
2 Newline every 16 bytes — just like xxd output.
3 Total byte count — should match wc -c /tmp/test.txt.

Save as /tmp/hexreader.c, then compile and run:

gcc -Wall -g /tmp/hexreader.c -o /tmp/hexreader && /tmp/hexreader
expected output
48 65 6c 6c 6f 20 66 72 6f 6d 20 43 0a 53 65 63
6f 6e 64 20 6c 69 6e 65 0a
(25 bytes)

You just built a simplified xxd. It produces the same hex output you’ve been reading: byte 48 is H, byte 0a is the newline. Same system, except now you’re constructing the tool instead of using it.

Step 4: Error Handling — What If the File Doesn’t Exist?

# Try to read a file that doesn't exist
cat > /tmp/reader_error.c << 'EOF'
#include <stdio.h>

int main(void) {
    FILE *fp = fopen("/tmp/nonexistent.txt", "r");
    if (fp == NULL) {
        perror("fopen");    /* (1) */
        return 1;
    }
    fclose(fp);
    return 0;
}
EOF

gcc -Wall -g /tmp/reader_error.c -o /tmp/reader_error && /tmp/reader_error
1 perror prints the system error message — "fopen: No such file or directory". More informative than your own error string. man 3 perror.
expected output
fopen: No such file or directory
verify the exit code
echo $?
# → 1 (error — your return 1)

This is what $? is in bash: the exit status of the last command, which for a C program is the value returned from main(). return 0 means success; any nonzero value (here 1) means failure. Every bash script you’ve written with && echo OK || echo FAIL is checking this value.

Step 5: strace Your Reader

strace /tmp/reader 2>&1 | grep -E 'open|read|write|close'

Among the matching lines (the dynamic loader’s own openat and read calls for libc come first), you’ll see:

  • openat(AT_FDCWD, "/tmp/test.txt", O_RDONLY): fopen becomes the openat syscall

  • read(3, "Hello from C\nSecond line\n", 4096): fgetc triggers a 4096-byte buffer read (stdio buffers!)

  • write(1, "Hello from C\nSecond line\n", 25): printf writes to stdout

  • close(3): fclose becomes the close syscall

Notice: fgetc reads one byte at a time in YOUR code, but stdio reads 4096 bytes at once from the kernel and serves them to you one at a time from a buffer. That’s the abstraction layer. Phase 2 strips it away.

Understanding EOF

EOF is not a character in the file. It’s not a byte. It’s not a signal.

EOF is the return value of fgetc() when there’s nothing left to read (or when a read fails). The C standard only requires it to be a negative int; glibc defines it in <stdio.h> as #define EOF (-1).

That’s why c must be int, not char:

Type           Range          Problem
char (signed)  -128 to 127    Byte 0xFF (255) wraps to -1, the same as EOF. Your loop stops early on binary files.
unsigned char  0 to 255       Can hold all bytes, but can never equal -1. Your loop never stops.
int            -2³¹ to 2³¹-1  Holds all 256 byte values (0-255) AND -1 (EOF). Correct.

This is why C uses int for single-byte reads. It’s not obvious until you understand the byte values.

prove it
printf '\xff' > /tmp/binary_test.txt
xxd /tmp/binary_test.txt
# → 00000000: ff
# One byte: 0xFF (255 decimal)

If c were a char (which is signed on x86 Linux), reading 0xFF would store -1, which equals EOF, so your program would think the file is empty. With int, 0xFF is 255, which is NOT -1. The loop continues.

Exercises

  1. [ ] Write reader.c, compile, run against /tmp/test.txt

  2. [ ] Modify to hex mode (%02x) — compare output with xxd /tmp/test.txt

  3. [ ] Try reading a nonexistent file — what does perror print?

  4. [ ] Run strace /tmp/reader — find the read syscall. How many bytes does it read at once?

  5. [ ] Create a binary file: printf '\x00\x01\xff\xfe' > /tmp/binary.bin. Point your hex reader at it (change the path in fopen and recompile). What values do you see?

  6. [ ] Change int c to char c in reader.c and point it at /tmp/binary.bin. Does the loop end early?

Notes

Write your observations here.