Chapter 10: Files and Exceptions

Source: Python Crash Course, 3rd Edition by Eric Matthes

Now that you’ve mastered the basic skills you need to write organized programs that are easy to use, it’s time to think about making your programs even more relevant and usable. In this chapter, you’ll learn to work with files so your programs can quickly analyze lots of data.

You’ll learn to handle errors so your programs don’t crash when they encounter unexpected situations. You’ll learn about exceptions, which are special objects Python creates to manage errors that arise while a program is running. You’ll also learn about the json module, which allows you to save user data so it isn’t lost when your program stops running.

Learning to work with files and save data will make your programs easier for people to use. Users will be able to choose what data to enter and when to enter it. People will be able to run your program, do some work, and then close the program and pick up where they left off. Learning to handle exceptions will help you deal with situations in which files don’t exist and deal with other problems that can cause your programs to crash. This will make your programs more robust when they encounter bad data, whether it comes from innocent mistakes or from malicious attempts to break your programs.

Reading from a File

An incredible amount of data is available in text files. Text files can contain weather data, traffic data, socioeconomic data, literary works, and more. Reading from a file is particularly useful in data analysis applications, but it’s also applicable to any situation in which you want to analyze or modify information stored in a file.

When you want to work with the information in a text file, the first step is to read the file into memory. You can then work through all of the file’s contents at once or work through the contents line by line.

Reading the Contents of a File

To begin, we need a file with a few lines of text in it. Let’s start with a file that contains pi to 30 decimal places, with 10 decimal places per line:

pi_digits.txt
3.1415926535
  8979323846
  2643383279

To try the following examples yourself, you can enter these lines in an editor and save the file as pi_digits.txt, or you can download the file from the book’s resources through ehmatthes.github.io/pcc_3e. Save the file in the same directory where you’ll store this chapter’s programs.

Here’s a program that opens this file, reads it, and prints the contents of the file to the screen:

file_reader.py
from pathlib import Path

path = Path('pi_digits.txt')  (1)
contents = path.read_text()   (2)
print(contents)
1 We build a Path object representing the file pi_digits.txt, which we assign to the variable path. Since this file is saved in the same directory as the .py file we’re writing, the filename is all that Path needs to access the file.
2 We use the read_text() method to read the entire contents of the file. The contents of the file are returned as a single string, which we assign to the variable contents.

To work with the contents of a file, we need to tell Python the path to the file. A path is the exact location of a file or folder on a system. Python provides a module called pathlib that makes it easier to work with files and directories, no matter which operating system you or your program’s users are working with. A module that provides specific functionality like this is often called a library, hence the name pathlib.

We start by importing the Path class from pathlib. There’s a lot you can do with a Path object that points to a file. For example, you can check that the file exists before working with it, read the file’s contents, or write new data to the file.

When we print the value of contents, we see the entire contents of the text file:

3.1415926535
  8979323846
  2643383279

The only difference between this output and the original file is the extra blank line at the end of the output. The blank line appears because the string that read_text() returns ends with the newline character from the last line of the file, and print() adds a newline of its own after it.

We can remove the extra blank line by using rstrip() on the contents string:

from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()
contents = contents.rstrip()
print(contents)

Recall from Chapter 2 that Python’s rstrip() method removes, or strips, any whitespace characters from the right side of a string. Now the output matches the contents of the original file exactly:

3.1415926535
  8979323846
  2643383279

We can strip the trailing newline character when we read the contents of the file, by applying the rstrip() method immediately after calling read_text():

contents = path.read_text().rstrip()

This line tells Python to call the read_text() method on the file we’re working with. Then it applies the rstrip() method to the string that read_text() returns. The cleaned-up string is then assigned to the variable contents. This approach is called method chaining, and you’ll see it used often in programming.
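Here’s a minimal sketch that compares the unchained and chained calls side by side. It isn’t one of the chapter’s listings: to keep it runnable anywhere, it writes a stand-in copy of pi_digits.txt into a temporary folder first, and the tempfile usage and the variable names raw and chained are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

# Assumption: a throwaway folder stands in for your chapter directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'pi_digits.txt'
    path.write_text("3.1415926535\n  8979323846\n  2643383279\n")

    # Without chaining: the string still ends with a newline.
    raw = path.read_text()

    # With chaining: rstrip() runs on the string that read_text() returns.
    chained = path.read_text().rstrip()

# repr() makes the trailing newline visible instead of printing it.
print(repr(raw[-4:]))
print(repr(chained[-4:]))
```

Using repr() here is just a convenient way to see whitespace that print() would otherwise render invisibly.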

VS Code looks for files in the folder that was most recently opened. If you’re using VS Code, start by opening the folder where you’re storing this chapter’s programs. For example, if you’re saving your program files in a folder called chapter_10, press Ctrl+O (Cmd+O on macOS), and open that folder.

Relative and Absolute File Paths

When you pass a simple filename like pi_digits.txt to Path, Python looks in the current working directory. When you run a program from the folder where it’s saved, that’s the directory where the file that’s currently being executed (that is, your .py program file) is stored.

Sometimes, depending on how you organize your work, the file you want to open won’t be in the same directory as your program file. For example, you might store your program files in a folder called python_work; inside python_work, you might have another folder called text_files to distinguish your program files from the text files they’re manipulating. Even though text_files is in python_work, just passing Path the name of a file in text_files won’t work, because Python will only look in python_work and stop there. To get Python to open files from a directory other than the one where your program file is stored, you need to provide the correct path.

There are two main ways to specify paths in programming. A relative file path tells Python to look for a given location relative to the directory where the currently running program file is stored. Here’s how to build this path:

path = Path('text_files/filename.txt')

You can also tell Python exactly where the file is on your computer, regardless of where the program that’s being executed is stored. This is called an absolute file path. Absolute paths are usually longer than relative paths, because they start at your system’s root folder:

path = Path('/home/eric/data_files/text_files/filename.txt')

Using absolute paths, you can read files from any location on your system. For now it’s easiest to store files in the same directory as your program files, or in a folder such as text_files within the directory that stores your program files.

Windows systems use a backslash (\) instead of a forward slash (/) when displaying file paths, but you should use forward slashes in your code, even on Windows. The pathlib library will automatically use the correct representation of the path when it interacts with your system, or any user’s system.
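The following sketch shows one way to inspect relative and absolute paths. It only builds Path objects and never opens a file, so text_files/filename.txt doesn’t need to exist. One detail worth knowing: a relative path is resolved against the current working directory, which Path.cwd() reports; this matches your program’s folder when you run the program from that folder.

```python
from pathlib import Path

# A relative path, written with forward slashes on every operating system.
relative_path = Path('text_files/filename.txt')
print(relative_path.is_absolute())

# Joining it to the current working directory gives an absolute path.
absolute_path = Path.cwd() / relative_path
print(absolute_path.is_absolute())

# pathlib splits the path into parts for you, whatever the separator.
print(relative_path.parts)
print(relative_path.name)
```

The / operator joins path pieces, so you never have to type a backslash yourself.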

Accessing a File’s Lines

When you’re working with a file, you’ll often want to examine each line of the file. You might be looking for certain information in the file, or you might want to modify the text in the file in some way.

You can use the splitlines() method to turn a long string into a list of lines, and then use a for loop to examine each line from a file, one at a time:

file_reader.py
from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()         (1)

lines = contents.splitlines()       (2)
for line in lines:
    print(line)
1 We start out by reading the entire contents of the file. If you’re planning to work with the individual lines in a file, you don’t need to strip any whitespace when reading the file.
2 The splitlines() method returns a list of all lines in the file, and we assign this list to the variable lines. We then loop over these lines and print each one.

Since we haven’t modified any of the lines, the output matches the original text file exactly:

3.1415926535
  8979323846
  2643383279
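To see why no stripping is needed before calling splitlines(), here’s a small sketch that uses a string literal as a stand-in for what read_text() would return. The comparison with split('\n') is an addition for illustration and isn’t part of file_reader.py.

```python
# Stand-in for the string read_text() returns for pi_digits.txt.
contents = "3.1415926535\n  8979323846\n  2643383279\n"

# splitlines() drops the line endings and adds no empty item at the end.
lines = contents.splitlines()
print(lines)

# Splitting on '\n' by hand leaves an empty string after the final newline.
manual_lines = contents.split('\n')
print(manual_lines)
```

The leading spaces on the second and third lines are preserved in both cases; only the newline characters are removed.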

Working with a File’s Contents

After you’ve read the contents of a file into memory, you can do whatever you want with that data. Let’s briefly explore the digits of pi. First, we’ll attempt to build a single string containing all the digits in the file with no whitespace in it:

pi_string.py
from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ''
for line in lines:     (1)
    pi_string += line

print(pi_string)
print(len(pi_string))
1 We write a loop that adds each line of digits to pi_string.

3.1415926535  8979323846  2643383279
36

The variable pi_string contains the whitespace that was on the left side of the digits in each line, but we can get rid of that by using lstrip() on each line:

# --snip--
for line in lines:
    pi_string += line.lstrip()

print(pi_string)
print(len(pi_string))

Now we have a string containing pi to 30 decimal places. The string is 32 characters long because it also includes the leading 3 and a decimal point:

3.141592653589793238462643383279
32

When Python reads from a text file, it interprets all text in the file as a string. If you read in a number and want to work with that value in a numerical context, you’ll have to convert it to an integer using the int() function or a float using the float() function.
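As a quick sketch of that conversion, the code below starts from the 32-character string built earlier and turns it into numbers. The variable names pi_float and first_decimal are hypothetical. Note that a float keeps only about 15 to 17 significant digits, so most of the 30 decimal places are lost in the conversion.

```python
# The string of digits assembled from pi_digits.txt.
pi_string = "3.141592653589793238462643383279"

# float() converts the text to a number you can do arithmetic with.
pi_float = float(pi_string)
print(pi_float)        # 3.141592653589793
print(pi_float * 2)

# int() works on strings that contain only digits.
first_decimal = int(pi_string[2])
print(first_decimal + 1)
```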

Large Files: One Million Digits

So far, we’ve focused on analyzing a text file that contains only three lines, but the code in these examples would work just as well on much larger files. If we start with a text file that contains pi to 1,000,000 decimal places, instead of just 30, we can create a single string containing all these digits. We don’t need to change our program at all, except to pass it a different file. We’ll also print just the first 50 decimal places, so we don’t have to watch a million digits scroll by in the terminal:

pi_string.py
from pathlib import Path

path = Path('pi_million_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ''
for line in lines:
    pi_string += line.lstrip()

print(f"{pi_string[:52]}...")
print(len(pi_string))

The output shows that we do indeed have a string containing pi to 1,000,000 decimal places:

3.14159265358979323846264338327950288419716939937510...
1000002

Python has no inherent limit to how much data you can work with; you can work with as much data as your system’s memory can handle.

To run this program (and many of the examples that follow), you’ll need to download the resources available at ehmatthes.github.io/pcc_3e.

Is Your Birthday Contained in Pi?

Let’s use the program we just wrote to find out if someone’s birthday appears anywhere in the first million digits of pi. We can do this by expressing each birthday as a string of digits and seeing if that string appears anywhere in pi_string:

pi_birthday.py
# --snip--
for line in lines:
    pi_string += line.strip()

birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

We first prompt for the user’s birthday, and then check if that string is in pi_string. Let’s try it:

Enter your birthday, in the form mmddyy: 120372
Your birthday appears in the first million digits of pi!

Once you’ve read from a file, you can analyze its contents in just about any way you can imagine.
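For example, you might want to know where a sequence of digits appears, not just whether it appears. Here’s a minimal sketch using the string method find(), which returns -1 when the text isn’t found. The function find_in_digits() is a hypothetical helper, and the sketch uses the 30-decimal-place string rather than the million-digit file so it runs without any downloads.

```python
def find_in_digits(digits, target):
    """Return the index where target first appears in digits, or None."""
    position = digits.find(target)
    if position == -1:
        return None
    return position

pi_string = "3.141592653589793238462643383279"

print(find_in_digits(pi_string, "1592"))
print(find_in_digits(pi_string, "120372"))
```

The index counts from the start of the string, so it includes the leading 3 and the decimal point.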

Try It Yourself

10-1. Learning Python: Open a blank file in your text editor and write a few lines summarizing what you’ve learned about Python so far. Start each line with the phrase "In Python you can…". Save the file as learning_python.txt in the same directory as your exercises from this chapter. Write a program that reads the file and prints what you wrote two times: print the contents once by reading in the entire file, and once by storing the lines in a list and then looping over each line.

10-2. Learning C: You can use the replace() method to replace any word in a string with a different word. Here’s a quick example showing how to replace 'dog' with 'cat' in a sentence:

>>> message = "I really like dogs."
>>> message.replace('dog', 'cat')
'I really like cats.'

Read in each line from the file you just created, learning_python.txt, and replace the word Python with the name of another language, such as C. Print each modified line to the screen.

10-3. Simpler Code: The program file_reader.py in this section uses a temporary variable, lines, to show how splitlines() works. You can skip the temporary variable and loop directly over the list that splitlines() returns:

for line in contents.splitlines():

Remove the temporary variable from each of the programs in this section, to make them more concise.

Writing to a File

One of the simplest ways to save data is to write it to a file. When you write text to a file, the output will still be available after you close the terminal containing your program’s output. You can examine output after a program finishes running, and you can share the output files with others as well. You can also write programs that read the text back into memory and work with it again later.

Writing a Single Line

Once you have a path defined, you can write to a file using the write_text() method. To see how this works, let’s write a simple message and store it in a file instead of printing it to the screen:

write_message.py
from pathlib import Path

path = Path('programming.txt')
path.write_text("I love programming.")

The write_text() method takes a single argument: the string that you want to write to the file. This program has no terminal output, but if you open the file programming.txt, you’ll see one line:

programming.txt
I love programming.

This file behaves like any other file on your computer. You can open it, write new text in it, copy from it, paste to it, and so forth.

Python can only write strings to a text file. If you want to store numerical data in a text file, you’ll have to convert the data to string format first using the str() function.
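Here’s a short sketch of that round trip: a number is converted with str() before it’s written, then converted back with int() after it’s read. The filename number.txt and the temporary folder are assumptions added so the example doesn’t leave files behind.

```python
from pathlib import Path
import tempfile

favorite_number = 42

# Assumption: a throwaway folder, so nothing is written to your project.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'number.txt'

    # write_text() needs a string, so convert the number first.
    path.write_text(str(favorite_number))

    # Reading gives back a string; convert it to use it as a number again.
    stored = path.read_text()
    restored = int(stored)

print(repr(stored))
print(restored + 1)
```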

Writing Multiple Lines

The write_text() method does a few things behind the scenes. If the file that path points to doesn’t exist, it creates that file. Also, after writing the string to the file, it makes sure the file is closed properly. Files that aren’t closed properly can lead to missing or corrupted data.

To write more than one line to a file, you need to build a string containing the entire contents of the file, and then call write_text() with that string. Let’s write several lines to the programming.txt file:

from pathlib import Path

contents = "I love programming.\n"
contents += "I love creating new games.\n"
contents += "I also love working with data.\n"

path = Path('programming.txt')
path.write_text(contents)

We define a variable called contents that will hold the entire contents of the file. On the next two lines, we use the += operator to add to this string. We include newline characters at the end of each line, to make sure each statement appears on its own line.

If you run this and then open programming.txt, you’ll see each of these lines in the text file:

I love programming.
I love creating new games.
I also love working with data.
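If your lines are already stored in a list, another way to build the same string is with the join() method. This is an alternative sketch, not the approach used in the listing above, and the list name lines is hypothetical.

```python
lines = [
    "I love programming.",
    "I love creating new games.",
    "I also love working with data.",
]

# join() places a newline between items; add one more for the final line.
contents = "\n".join(lines) + "\n"
print(contents)
```

The resulting string is identical to the one built with +=, so it can be passed to write_text() in the same way.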

Be careful when calling write_text() on a path object. If the file already exists, write_text() will erase the current contents of the file and write new contents to the file. Later in this chapter, you’ll learn to check whether a file exists using pathlib.
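As a preview of that check, here’s one possible sketch that refuses to overwrite an existing file. The function write_if_new() is a hypothetical helper, and the temporary folder is an assumption that keeps the example from touching your own programming.txt.

```python
from pathlib import Path
import tempfile

def write_if_new(path, text):
    """Write text to path only if no file exists there yet."""
    if path.exists():
        return False      # leave the existing file untouched
    path.write_text(text)
    return True

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'programming.txt'

    first_try = write_if_new(path, "I love programming.\n")
    second_try = write_if_new(path, "This would have replaced the first line.\n")
    final_contents = path.read_text()

print(first_try, second_try)
print(final_contents)
```

The second call returns False and the original line survives, because the file already exists by then.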

Try It Yourself

10-4. Guest: Write a program that prompts the user for their name. When they respond, write their name to a file called guest.txt.

10-5. Guest Book: Write a while loop that prompts users for their name. Collect all the names that are entered, and then write these names to a file called guest_book.txt. Make sure each entry appears on a new line in the file.

Exceptions

Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure of what to do next, it creates an exception object. If you write code that handles the exception, the program will continue running. If you don’t handle the exception, the program will halt and show a traceback, which includes a report of the exception that was raised.

Exceptions are handled with try-except blocks. A try-except block asks Python to do something, but it also tells Python what to do if an exception is raised. When you use try-except blocks, your programs will continue running even if things start to go wrong. Instead of tracebacks, which can be confusing for users to read, users will see friendly error messages that you’ve written.

Handling the ZeroDivisionError Exception

Let’s look at a simple error that causes Python to raise an exception. You probably know that it’s impossible to divide a number by zero, but let’s ask Python to do it anyway:

division_calculator.py
print(5/0)

Python can’t do this, so we get a traceback:

Traceback (most recent call last):
  File "division_calculator.py", line 1, in <module>
    print(5/0)
          ~^~
ZeroDivisionError: division by zero  (1)
1 The error reported in the traceback, ZeroDivisionError, is an exception object. Python creates this kind of object in response to a situation where it can’t do what we ask it to.

When this happens, Python stops the program and tells us the kind of exception that was raised. We can use this information to modify our program. We’ll tell Python what to do when this kind of exception occurs; that way, if it happens again, we’ll be prepared.

Using try-except Blocks

When you think an error may occur, you can write a try-except block to handle the exception that might be raised. You tell Python to try running some code, and you tell it what to do if the code results in a particular kind of exception.

Here’s what a try-except block for handling the ZeroDivisionError exception looks like:

try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

We put print(5/0), the line that caused the error, inside a try block. If the code in a try block works, Python skips over the except block. If the code in the try block causes an error, Python looks for an except block whose error matches the one that was raised, and runs the code in that block.

In this example, the code in the try block produces a ZeroDivisionError, so Python looks for an except block telling it how to respond. Python then runs the code in that block, and the user sees a friendly error message instead of a traceback:

You can't divide by zero!

If more code followed the try-except block, the program would continue running because we told Python how to handle the error. Let’s look at an example where catching an error can allow a program to continue running.

Using Exceptions to Prevent Crashes

Handling errors correctly is especially important when the program has more work to do after the error occurs. This happens often in programs that prompt users for input. If the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.

Let’s create a simple calculator that does only division:

division_calculator.py
print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
    first_number = input("\nFirst number: ")  (1)
    if first_number == 'q':
        break
    second_number = input("Second number: ")   (2)
    if second_number == 'q':
        break
    answer = int(first_number) / int(second_number)  (3)
    print(answer)
1 This program prompts the user to input a first_number.
2 If the user does not enter q to quit, they input a second_number.
3 We then divide these two numbers to get an answer.

This program does nothing to handle errors, so asking it to divide by zero causes it to crash:

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
Traceback (most recent call last):
  File "division_calculator.py", line 11, in <module>
    answer = int(first_number) / int(second_number)
             ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

It’s bad that the program crashed, but it’s also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than you want them to. For example, they’ll know the name of your program file, and they’ll see a part of your code that isn’t working properly.

The else Block

We can make this program more error resistant by wrapping the line that might produce errors in a try-except block. The error occurs on the line that performs the division, so that’s where we’ll put the try-except block. This example also includes an else block. Any code that depends on the try block executing successfully goes in the else block:

# --snip--
while True:
    # --snip--
    if second_number == 'q':
        break
    try:                                              (1)
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:                        (2)
        print("You can't divide by 0!")
    else:                                            (3)
        print(answer)
1 We ask Python to try to complete the division operation in a try block, which includes only the code that might cause an error.
2 The except block tells Python how to respond when a ZeroDivisionError arises.
3 If the division operation is successful, we use the else block to print the result. Any code that depends on the try block succeeding is added to the else block.

The program continues to run, and the user never sees a traceback:

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
You can't divide by 0!

First number: 5
Second number: 2
2.5

First number: q

The only code that should go in a try block is code that might cause an exception to be raised. Sometimes you’ll have additional code that should run only if the try block was successful; this code goes in the else block. The except block tells Python what to do in case a certain exception arises when it tries to run the code in the try block.

By anticipating likely sources of errors, you can write robust programs that continue to run even when they encounter invalid data and missing resources.
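The same try-except-else structure can be sketched as a function, which makes it easy to test without typing input at a prompt. This version goes one step beyond the calculator above: it adds a second except block for ValueError, which int() raises when the text isn’t a whole number. The function name divide() and the wording of the second message are assumptions for illustration.

```python
def divide(first, second):
    """Divide two numbers given as strings, returning a message string."""
    try:
        answer = int(first) / int(second)
    except ZeroDivisionError:
        return "You can't divide by 0!"
    except ValueError:
        # Assumption: this message isn't part of the original calculator.
        return "Please enter whole numbers only."
    else:
        # This runs only if the try block raised no exception.
        return str(answer)

print(divide("5", "2"))
print(divide("5", "0"))
print(divide("five", "2"))
```

Python checks the except blocks in order and runs the first one that matches the exception that was raised.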

Handling the FileNotFoundError Exception

One common issue when working with files is handling missing files. The file you’re looking for might be in a different location, the filename might be misspelled, or the file might not exist at all. You can handle all of these situations with a try-except block.

Let’s try to read a file that doesn’t exist. The following program tries to read in the contents of Alice in Wonderland, but the file alice.txt is not saved in the same directory as alice.py:

alice.py
from pathlib import Path

path = Path('alice.txt')
contents = path.read_text(encoding='utf-8')

Note that we’re using read_text() in a slightly different way here than what you saw earlier. The encoding argument is needed when your system’s default encoding doesn’t match the encoding of the file that’s being read. This is most likely to happen when reading from a file that wasn’t created on your system.

Python can’t read from a missing file, so it raises an exception:

Traceback (most recent call last):
  File "alice.py", line 4, in <module>         (1)
    contents = path.read_text(encoding='utf-8') (2)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../pathlib.py", line 1056, in read_text
    ...
FileNotFoundError: [Errno 2] No such file or directory: 'alice.txt'  (3)
1 Looking near the beginning of the traceback, we can see that the error occurred at line 4 in the file alice.py.
2 The next line shows the line of code that caused the error.
3 On the last line, we can see that a FileNotFoundError exception was raised. This is important because it tells us what kind of exception to use in the except block that we’ll write.

It’s often best to start at the very end of a traceback. To handle the error that’s being raised, the try block will begin with the line that was identified as problematic in the traceback, the line that contains read_text():

from pathlib import Path

path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:                          (1)
    print(f"Sorry, the file {path} does not exist.")
1 We write an except block that matches the FileNotFoundError. Python then runs the code in that block when the file can’t be found, and the result is a friendly error message instead of a traceback.

Sorry, the file alice.txt does not exist.

Analyzing Text

You can analyze text files containing entire books. Many classic works of literature are available as simple text files because they are in the public domain. The texts used in this section come from Project Gutenberg (gutenberg.org). Project Gutenberg maintains a collection of literary works that are available in the public domain, and it’s a great resource if you’re interested in working with literary texts in your programming projects.

Let’s pull in the text of Alice in Wonderland and try to count the number of words in the text. To do this, we’ll use the string method split(), which by default splits a string wherever it finds any whitespace:

from pathlib import Path

path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:
    print(f"Sorry, the file {path} does not exist.")
else:
    # Count the approximate number of words in the file.
    words = contents.split()      (1)
    num_words = len(words)        (2)
    print(f"The file {path} has about {num_words} words.")
1 We take the string contents and use split() to produce a list of all the words in the book.
2 Using len() on this list gives us a good approximation of the number of words in the original text.

The output tells us how many words are in alice.txt:

The file alice.txt has about 29594 words.
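To see why the count is only approximate, here’s a small sketch of what split() does with a short piece of text. The sample sentence is a stand-in chosen for illustration, not a quotation checked against alice.txt.

```python
# Stand-in text with a newline, a double space, and a hyphenated word.
text = "Down the rabbit-hole.\nAlice was  beginning to get very tired"

# With no argument, split() breaks on any run of whitespace.
words = text.split()
print(words)
print(len(words))
```

Punctuation stays attached to the neighboring word, and a hyphenated word such as rabbit-hole counts once, so the result is a reasonable estimate rather than an exact figure.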

Working with Multiple Files

Let’s add more books to analyze, but before we do, let’s move the bulk of this program to a function called count_words(). This will make it easier to run the analysis for multiple books:

word_count.py
from pathlib import Path

def count_words(path):
    """Count the approximate number of words in a file."""  (1)
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f"Sorry, the file {path} does not exist.")
    else:
        # Count the approximate number of words in the file.
        words = contents.split()
        num_words = len(words)
        print(f"The file {path} has about {num_words} words.")

path = Path('alice.txt')
count_words(path)
1 The code has only been indented and moved into the body of count_words(). The comment has also been changed to a docstring.

Now we can write a short loop to count the words in any text we want to analyze. We’ll try to count the words for Alice in Wonderland, Siddhartha, Moby Dick, and Little Women, which are all available in the public domain. I’ve intentionally left siddhartha.txt out of the directory containing word_count.py, so we can see how well our program handles a missing file:

from pathlib import Path

def count_words(path):
    # --snip--

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']
for filename in filenames:
    path = Path(filename)  (1)
    count_words(path)
1 The names of the files are stored as simple strings. Each string is then converted to a Path object before the call to count_words().

The missing siddhartha.txt file has no effect on the rest of the program’s execution:

The file alice.txt has about 29594 words.
Sorry, the file siddhartha.txt does not exist.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.

Using the try-except block in this example provides two significant advantages. We prevent our users from seeing a traceback, and we let the program continue analyzing the texts it’s able to find. If we didn’t catch the FileNotFoundError that’s raised when the program tries to read siddhartha.txt, the user would see a full traceback, and the program would stop running after trying to analyze Siddhartha. It would never analyze Moby Dick or Little Women.

Failing Silently

In the previous example, we informed our users that one of the files was unavailable. But you don’t need to report every exception you catch. Sometimes, you’ll want the program to fail silently when an exception occurs and continue on as if nothing happened. To make a program fail silently, you write a try block as usual, but you explicitly tell Python to do nothing in the except block. Python has a pass statement that tells it to do nothing in a block:

def count_words(path):
    """Count the approximate number of words in a file."""
    try:
        # --snip--
    except FileNotFoundError:
        pass
    else:
        # --snip--

The only difference between this listing and the previous one is the pass statement in the except block. Now when a FileNotFoundError is raised, the code in the except block runs, but nothing happens. No traceback is produced, and there’s no output in response to the error that was raised. Users see the word counts for each file that exists, but they don’t see any indication that a file wasn’t found:

The file alice.txt has about 29594 words.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.

The pass statement also acts as a placeholder. It’s a reminder that you’re choosing to do nothing at a specific point in your program’s execution and that you might want to do something there later. For example, in this program we might decide to write any missing filenames to a file called missing_files.txt.
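Here’s one way that idea might look. This is a sketch under several assumptions rather than the chapter’s own code: count_words() takes a second, hypothetical parameter called missing, it returns the count instead of printing it, and a temporary folder with a tiny stand-in alice.txt replaces the real book files.

```python
from pathlib import Path
import tempfile

def count_words(path, missing):
    """Return the approximate word count, or None if the file is missing."""
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        # Instead of pass, remember the name of the missing file.
        missing.append(path.name)
        return None
    else:
        return len(contents.split())

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / 'alice.txt').write_text("Down the rabbit-hole", encoding='utf-8')

    missing = []
    counts = {}
    for filename in ['alice.txt', 'siddhartha.txt']:
        counts[filename] = count_words(folder / filename, missing)

    # Write one missing filename per line.
    log_path = folder / 'missing_files.txt'
    log_path.write_text("\n".join(missing) + "\n")
    logged = log_path.read_text()

print(counts)
print(logged)
```

The program still says nothing to the user about the missing file, but the information is kept in case someone needs it later.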

Deciding Which Errors to Report

How do you know when to report an error to your users and when to let your program fail silently? If users know which texts are supposed to be analyzed, they might appreciate a message informing them why some texts were not analyzed. If users expect to see some results but don’t know which books are supposed to be analyzed, they might not need to know that some texts were unavailable. Giving users information they aren’t looking for can decrease the usability of your program. Python’s error-handling structures give you fine-grained control over how much to share with users when things go wrong; it’s up to you to decide how much information to share.

Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time your program depends on something external such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help you know where to include exception-handling blocks in your program and how much to report to users about errors that arise.

Try It Yourself

10-6. Addition: One common problem when prompting for numerical input occurs when people provide text instead of numbers. When you try to convert the input to an int, you’ll get a ValueError. Write a program that prompts for two numbers. Add them together and print the result. Catch the ValueError if either input value is not a number, and print a friendly error message. Test your program by entering two numbers and then by entering some text instead of a number.

10-7. Addition Calculator: Wrap your code from Exercise 10-6 in a while loop so the user can continue entering numbers, even if they make a mistake and enter text instead of a number.

10-8. Cats and Dogs: Make two files, cats.txt and dogs.txt. Store at least three names of cats in the first file and three names of dogs in the second file. Write a program that tries to read these files and print the contents of the file to the screen. Wrap your code in a try-except block to catch the FileNotFoundError, and print a friendly message if a file is missing. Move one of the files to a different location on your system, and make sure the code in the except block executes properly.

10-9. Silent Cats and Dogs: Modify your except block in Exercise 10-8 to fail silently if either file is missing.

10-10. Common Words: Visit Project Gutenberg (gutenberg.org) and find a few texts you’d like to analyze. Download the text files for these works, or copy the raw text from your browser into a text file on your computer. You can use the count() method to find out how many times a word or phrase appears in a string. For example, the following code counts the number of times 'row' appears in a string:

>>> line = "Row, row, row your boat"
>>> line.count('row')
2
>>> line.lower().count('row')
3

Notice that converting the string to lowercase using lower() catches all appearances of the word you’re looking for, regardless of how it’s formatted. Write a program that reads the files you found at Project Gutenberg and determines how many times the word 'the' appears in each text. Try counting ' the ' with a space in the string, and see how much lower your count is.

Storing Data

Many of your programs will ask users to input certain kinds of information. You might allow users to store preferences in a game or provide data for a visualization. Whatever the focus of your program is, you’ll store the information users provide in data structures such as lists and dictionaries. When users close a program, you’ll almost always want to save the information they entered. A simple way to do this involves storing your data using the json module.

The json module allows you to convert simple Python data structures into JSON-formatted strings, which you can write to a file and then load back into your program the next time it runs. You can also use json to share data between different Python programs. Even better, the JSON data format is not specific to Python, so you can share data you store in the JSON format with people who work in many other programming languages. It’s a useful and portable format, and it’s easy to learn.

The JSON (JavaScript Object Notation) format was originally developed for JavaScript. However, it has since become a common format used by many languages, including Python.

Using json.dumps() and json.loads()

Let’s write a short program that stores a set of numbers and another program that reads these numbers back into memory. The first program will use json.dumps() to store the set of numbers, and the second program will use json.loads().

In its simplest form, the json.dumps() function takes one argument: a piece of data that should be converted to the JSON format. The function returns a string, which we can then write to a data file:

number_writer.py
from pathlib import Path
import json

numbers = [2, 3, 5, 7, 11, 13]

path = Path('numbers.json')   (1)
contents = json.dumps(numbers)  (2)
path.write_text(contents)
1 We choose a filename in which to store the list of numbers. It’s customary to use the file extension .json to indicate that the data in the file is stored in the JSON format.
2 We use json.dumps() to generate a string containing the JSON representation of the data we’re working with. Once we have this string, we write it to the file using write_text().

This program has no output, but if you open the file numbers.json, you’ll see the data is stored in a format that looks just like Python:

[2, 3, 5, 7, 11, 13]
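A list of integers happens to look identical in both formats, but JSON is not Python syntax. The short sketch below uses a made-up settings dictionary to show a few places where the two differ:

```python
import json

# A hypothetical dictionary, used only to show how values are converted.
settings = {'name': 'eric', 'admin': True, 'nickname': None, 'scores': (1, 2)}

contents = json.dumps(settings)
print(contents)

# Loading the string gives back Python values, but the tuple is now a list.
restored = json.loads(contents)
print(restored)
```

The JSON string uses double quotes, true, and null, and the tuple is stored as a JSON array:

```
{"name": "eric", "admin": true, "nickname": null, "scores": [1, 2]}
{'name': 'eric', 'admin': True, 'nickname': None, 'scores': [1, 2]}
```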

Now we’ll write a separate program that uses json.loads() to read the list back into memory:

number_reader.py
from pathlib import Path
import json

path = Path('numbers.json')   (1)
contents = path.read_text()   (2)
numbers = json.loads(contents)  (3)

print(numbers)
1 We make sure to read from the same file we wrote to.
2 Since the data file is just a text file with specific formatting, we can read it with the read_text() method.
3 We then pass the contents of the file to json.loads(). This function takes in a JSON-formatted string and returns a Python object (in this case, a list), which we assign to numbers.
[2, 3, 5, 7, 11, 13]

This is a simple way to share data between two programs.

Saving and Reading User-Generated Data

Saving data with json is useful when you’re working with user-generated data, because if you don’t store your user’s information somehow, you’ll lose it when the program stops running. Let’s look at an example where we prompt the user for their name the first time they run a program and then remember their name when they run the program again.

Let’s start by storing the user’s name:

remember_me.py
from pathlib import Path
import json

username = input("What is your name? ")  (1)

path = Path('username.json')             (2)
contents = json.dumps(username)
path.write_text(contents)

print(f"We'll remember you when you come back, {username}!")  (3)
1 We first prompt for a username to store.
2 We write the data we just collected to a file called username.json.
3 We print a message informing the user that we’ve stored their information.

Now let’s write a new program that greets a user whose name has already been stored:

greet_user.py
from pathlib import Path
import json

path = Path('username.json')
contents = path.read_text()      (1)
username = json.loads(contents)  (2)

print(f"Welcome back, {username}!")
1 We read the contents of the data file.
2 We use json.loads() to assign the recovered data to the variable username. Since we’ve recovered the username, we can welcome the user back with a personalized greeting.

We need to combine these two programs into one file. When someone runs remember_me.py, we want to retrieve their stored username from username.json if possible; if not, we’ll prompt for a username and store it in username.json for next time. We’ll use a handy method from the pathlib module:

remember_me.py
from pathlib import Path
import json

path = Path('username.json')
if path.exists():                   (1)
    contents = path.read_text()
    username = json.loads(contents)
    print(f"Welcome back, {username}!")
else:                               (2)
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    print(f"We'll remember you when you come back, {username}!")
1 The exists() method returns True if a file or folder exists and False if it doesn’t. Here we use path.exists() to find out if a username has already been stored. If username.json exists, we load the username and print a personalized greeting to the user.
2 If the file username.json doesn’t exist, we prompt for a username and store the value that the user enters.

If this is the first time the program runs, this is the output:

What is your name? Eric
We'll remember you when you come back, Eric!

Otherwise:

Welcome back, Eric!
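Checking path.exists() is one way to handle a missing file. Another option, in keeping with the exception handling covered earlier in this chapter, is to attempt the read and catch the errors that can occur. The sketch below is one possible variation, not the chapter’s version; the function name load_username() and the temporary folder are assumptions made for illustration. It also catches json.JSONDecodeError, which json.loads() raises when the file’s contents aren’t valid JSON.

```python
from pathlib import Path
import json
import tempfile

def load_username(path):
    """Return the stored username, or None if it can't be loaded."""
    try:
        contents = path.read_text()
    except FileNotFoundError:
        # No username has been stored yet.
        return None
    try:
        return json.loads(contents)
    except json.JSONDecodeError:
        # The file exists, but it doesn't contain valid JSON.
        return None

# Work in a temporary folder so the sketch doesn't touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'username.json'

    first = load_username(path)           # The file doesn't exist yet.

    path.write_text(json.dumps('Eric'))
    second = load_username(path)          # The file holds a valid username.

    path.write_text('{not valid json')
    third = load_username(path)           # The file is damaged.

print(first, second, third)
```

This prints None Eric None. A program built on this function could prompt for a new username whenever it receives None.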

Refactoring

Often, you’ll come to a point where your code will work, but you’ll recognize that you could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes your code cleaner, easier to understand, and easier to extend.

We can refactor remember_me.py by moving the bulk of its logic into one or more functions. The focus of remember_me.py is on greeting the user, so let’s move all of our existing code into a function called greet_user():

remember_me.py
from pathlib import Path
import json

def greet_user():
    """Greet the user by name."""  (1)
    path = Path('username.json')
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")

greet_user()
1 Because we’re using a function now, we describe its purpose in a docstring that reflects how the program currently works.

The function greet_user() is doing more than just greeting the user: it’s also retrieving a stored username if one exists and prompting for a new username if one doesn’t. Let’s refactor greet_user() so it’s not doing so many different tasks. We’ll start by moving the code for retrieving a stored username to a separate function:

from pathlib import Path
import json

def get_stored_username(path):
    """Get stored username if available."""  (1)
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        return username
    else:
        return None                          (2)

def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
    username = get_stored_username(path)
    if username:                             (3)
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")

greet_user()
1 The new function get_stored_username() has a clear purpose, as stated in the docstring. This function retrieves a stored username and returns the username if it finds one.
2 If the path that’s passed to get_stored_username() doesn’t exist, the function returns None. This is good practice: a function should either return the value you’re expecting, or it should return None.
3 We print a welcome back message to the user if the attempt to retrieve a username is successful, and if it isn’t, we prompt for a new username.
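One detail worth noticing is that the test if username: treats any empty value as a failed lookup, not just None. The sketch below compares that test with a stricter one; the two helper functions are hypothetical and exist only to show the difference.

```python
def check_truthy(username):
    """Mirror the `if username:` test used in greet_user()."""
    return bool(username)

def check_not_none(username):
    """A stricter test that rejects only None."""
    return username is not None

# Compare both tests for a normal name, an empty string, and None.
for value in ['Eric', '', None]:
    print(repr(value), check_truthy(value), check_not_none(value))
```

```
'Eric' True True
'' False True
None False False
```

For this program the simpler test is probably what you want, since an empty stored name is no more useful than a missing one.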

We should factor one more block of code out of greet_user(). If the username doesn’t exist, we should move the code that prompts for a new username to a function dedicated to that purpose:

from pathlib import Path
import json

def get_stored_username(path):
    """Get stored username if available."""
    # --snip--

def get_new_username(path):
    """Prompt for a new username."""
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    return username

def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
    username = get_stored_username(path)  (1)
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = get_new_username(path)  (2)
        print(f"We'll remember you when you come back, {username}!")

greet_user()
1 We call get_stored_username(), which is responsible only for retrieving a stored username if one exists.
2 If necessary, greet_user() calls get_new_username(), which is responsible only for getting a new username and storing it.

Each function in this final version of remember_me.py has a single, clear purpose. We call greet_user(), and that function prints an appropriate message: it either welcomes back an existing user or greets a new user. This compartmentalization of work is an essential part of writing clear code that will be easy to maintain and extend.
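Because the helper functions receive the path as an argument, you can also try them against a throwaway file instead of the real username.json. The following is a minimal sketch under a few assumptions: store_username() is a hypothetical helper that saves a name without calling input(), and the temporary folder is used only for this demonstration.

```python
from pathlib import Path
import json
import tempfile

def get_stored_username(path):
    """Get stored username if available."""
    if path.exists():
        contents = path.read_text()
        return json.loads(contents)
    return None

def store_username(path, username):
    """Hypothetical helper: save a username without prompting for it."""
    path.write_text(json.dumps(username))
    return username

# Exercise both functions against a file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'username.json'

    before = get_stored_username(path)   # Nothing has been stored yet.
    store_username(path, 'Eric')
    after = get_stored_username(path)    # Now the name can be retrieved.

print(before)
print(after)
```

The first call returns None and the second returns 'Eric', which suggests that small, single-purpose functions are easy to check in isolation.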

Try It Yourself

10-11. Favorite Number: Write a program that prompts for the user’s favorite number. Use json.dumps() to store this number in a file. Write a separate program that reads in this value and prints the message I know your favorite number! It’s _.

10-12. Favorite Number Remembered: Combine the two programs you wrote in Exercise 10-11 into one file. If the number is already stored, report the favorite number to the user. If not, prompt for the user’s favorite number and store it in a file. Run the program twice to see that it works.

10-13. User Dictionary: The remember_me.py example only stores one piece of information, the username. Expand this example by asking for two more pieces of information about the user, then store all the information you collect in a dictionary. Write this dictionary to a file using json.dumps(), and read it back in using json.loads(). Print a summary showing exactly what your program remembers about the user.

10-14. Verify User: The final listing for remember_me.py assumes either that the user has already entered their username or that the program is running for the first time. We should modify it in case the current user is not the person who last used the program. Before printing a welcome back message in greet_user(), ask the user if this is the correct username. If it’s not, call get_new_username() to get the correct username.

Summary

In this chapter, you learned how to work with files. You learned to read the entire contents of a file, and then work through the contents one line at a time if you need to. You learned to write as much text as you want to a file. You also read about exceptions and how to handle the exceptions you’re likely to see in your programs. Finally, you learned how to store Python data structures so you can save information your users provide, preventing them from having to start over each time they run a program.

In Chapter 11, you’ll learn efficient ways to test your code. This will help you trust that the code you develop is correct, and it will help you identify bugs that are introduced as you continue to build on the programs you’ve written.

Applied Exercises: Ch 10 - Files and Exceptions

These exercises cover the same concepts as the chapter but use context from real infrastructure, network security, and language learning work. Save each as a separate .py file using lowercase and underscores, e.g. ise_log_reader.py.

Domus Digitalis / Homelab

D10-1. Node Log Reader: Create a text file called node_events.txt with at least five lines describing simulated node events (e.g., kvm-01: disk usage at 82%). Write a program that reads the file using pathlib and read_text(), strips trailing whitespace, and prints each line.

D10-2. VLAN Config Writer: Write a program that builds a multiline string representing a VLAN configuration (at least four VLANs with IDs and names). Write the string to a file called vlan_config.txt using write_text(). Open the file and confirm the output.

D10-3. Service Status Parser: Create a file service_status.txt with at least six lines in the format <service>: <status>. Write a program that reads the file line by line using splitlines(). For each line, split on ': ' (a colon followed by a space) and print a formatted message like Service wazuh is active.

D10-4. BGP Peer State Store: Write a program that prompts the user to enter a BGP peer name and its state (up or down). Store the data as a dictionary in a JSON file called bgp_peers.json using json.dumps() and write_text(). Write a second program that reads bgp_peers.json and prints the peer state.

D10-5. Missing Config Handler: Write a program that tries to read a file called domus_config.json. If the file doesn’t exist, catch the FileNotFoundError and print a friendly message. If the file does exist, read and print its contents. Test both branches.

CHLA / ISE / Network Security

C10-1. ISE Log Reader: Create a text file called ise_events.txt with at least five simulated ISE syslog entries (e.g., 5200: Authentication succeeded for user jdoe). Write a program that reads the file, strips whitespace, and prints each log entry using splitlines().

C10-2. Policy Config Writer: Write a program that builds a multiline string representing at least three ISE policy set configurations (name, protocol, result per line). Write the string to policy_config.txt using write_text(). Confirm the output by reading the file back and printing it.

C10-3. Syslog Severity Counter: Create a file syslog_entries.txt with at least 10 simulated syslog lines, some containing CRITICAL and some containing INFO. Write a program that reads the file and uses count() to count how many times each of these two words appears. Print the results.

C10-4. Endpoint Blacklist Store: Write a program that stores a list of blocked MAC addresses as a JSON file called blacklist.json using json.dumps() and write_text(). Write a second program that reads the file back using json.loads() and checks whether a user-entered MAC address is in the blacklist. Print an appropriate message.

C10-5. Multi-Source Log Analyzer: Create two log files: ise_logs.txt and ftd_logs.txt. Write a count_events(path) function that reads each file and counts the number of lines. Use try-except with FileNotFoundError and pass so the program continues silently if a file is missing. Intentionally remove one file and confirm the program still runs.

General Sysadmin / Linux

L10-1. Service Log Reader: Create a text file called services.txt with at least five lines in the format <service>: <state>. Write a program that reads the file using pathlib, splits it into lines, and prints each service and state with a formatted message.

L10-2. Backup Manifest Writer: Write a program that builds a multiline string listing at least five file paths to be backed up. Write the string to backup_manifest.txt using write_text(). Read the file back and print each path.

L10-3. Error Log Word Counter: Create a file error_log.txt with at least 10 lines containing simulated error messages. Write a program that reads the file and counts how many times the word error appears (use lower() to normalize). Print the count.

L10-4. Package State Store: Write a program that prompts the user to enter package names and their installed state (installed or missing). Store the data as a dictionary in packages.json using json.dumps(). Write a second program that reads packages.json using json.loads() and prints a summary of all packages and their states.

L10-5. Refactored Config Manager: Write a program modeled on remember_me.py that stores a hostname and IP address in a JSON file. Refactor the logic into three functions: get_stored_config(path), get_new_config(path), and greet_operator(). The main function should check whether config exists and either welcome the operator back or prompt for new config.

Spanish / DELE C2

E10-1. Vocabulary File Reader: Create a text file called vocabulario.txt with at least six lines in the format <palabra>: <definicion>. Write a program that reads the file, splits it into lines, and prints each entry formatted as Palabra: <word> - Definición: <def>.

E10-2. Chapter Notes Writer: Write a program that builds a multiline string with study notes for at least three Don Quijote chapters (one line per chapter with number and topic). Write the string to notas_capitulos.txt using write_text(). Read the file back and print each line.

E10-3. Vocabulary Counter: Create a text file with at least 15 lines of Spanish vocabulary words. Write a program that reads the file and counts how many lines contain a specific letter combination (e.g., ue for diphthong practice) using count(). Print the result.

E10-4. Progress Store: Write a program that prompts the user to enter their current DELE level and last chapter read. Store the data as a dictionary in progreso.json using json.dumps(). Write a second program that reads progreso.json using json.loads() and prints a personalized progress summary.

E10-5. Refactored Study Tracker: Write a program modeled on the refactored remember_me.py that stores a learner’s name and current study goal in a JSON file. Refactor the logic into three functions: get_stored_progress(path), get_new_progress(path), and greet_learner(). The main function should check whether a progress file exists and either welcome the learner back (printing their stored goal) or prompt for new information.