Chapter 10: Files and Exceptions
|
Source: Python Crash Course, 3rd Edition by Eric Matthes |
Now that you’ve mastered the basic skills you need to write organized programs that are easy to use, it’s time to think about making your programs even more relevant and usable. In this chapter, you’ll learn to work with files so your programs can quickly analyze lots of data.
You’ll learn to handle errors so your programs don’t crash when they
encounter unexpected situations. You’ll learn about exceptions, which
are special objects Python creates to manage errors that arise while a
program is running. You’ll also learn about the json module, which
allows you to save user data so it isn’t lost when your program stops
running.
Learning to work with files and save data will make your programs easier for people to use. Users will be able to choose what data to enter and when to enter it. People will be able to run your program, do some work, and then close the program and pick up where they left off. Learning to handle exceptions will help you deal with situations in which files don’t exist and deal with other problems that can cause your programs to crash. This will make your programs more robust when they encounter bad data, whether it comes from innocent mistakes or from malicious attempts to break your programs.
Reading from a File
An incredible amount of data is available in text files. Text files can contain weather data, traffic data, socioeconomic data, literary works, and more. Reading from a file is particularly useful in data analysis applications, but it’s also applicable to any situation in which you want to analyze or modify information stored in a file.
When you want to work with the information in a text file, the first step is to read the file into memory. You can then work through all of the file’s contents at once or work through the contents line by line.
Reading the Contents of a File
To begin, we need a file with a few lines of text in it. Let’s start with a file that contains pi to 30 decimal places, with 10 decimal places per line:
3.1415926535
  8979323846
  2643383279
To try the following examples yourself, you can enter these lines in an
editor and save the file as pi_digits.txt, or you can download the
file from the book’s resources through
ehmatthes.github.io/pcc_3e. Save the file in the same directory
where you’ll store this chapter’s programs.
Here’s a program that opens this file, reads it, and prints the contents of the file to the screen:
from pathlib import Path
path = Path('pi_digits.txt') (1)
contents = path.read_text() (2)
print(contents)
| 1 | We build a Path object representing the file pi_digits.txt,
which we assign to the variable path. Since this file is saved in
the same directory as the .py file we’re writing, the filename is
all that Path needs to access the file. |
| 2 | We use the read_text() method to read the entire contents of the
file. The contents of the file are returned as a single string,
which we assign to the variable contents. |
To work with the contents of a file, we need to tell Python the path to
the file. A path is the exact location of a file or folder on a
system. Python provides a module called pathlib that makes it easier
to work with files and directories, no matter which operating system you
or your program’s users are working with. A module that provides
specific functionality like this is often called a library, hence the
name pathlib.
We start by importing the Path class from pathlib. There’s a lot you
can do with a Path object that points to a file. For example, you can
check that the file exists before working with it, read the file’s
contents, or write new data to the file.
When we print the value of contents, we see the entire contents of the
text file:
3.1415926535
  8979323846
  2643383279
The only difference between this output and the original file is the
extra blank line at the end of the output. The blank line appears
because the string that read_text() returns ends with the newline
character that closes the file’s last line, and print() then adds a
newline of its own.
We can remove the extra blank line by using rstrip() on the contents
string:
from pathlib import Path
path = Path('pi_digits.txt')
contents = path.read_text()
contents = contents.rstrip()
print(contents)
Recall from Chapter 2 that Python’s rstrip() method removes, or
strips, any whitespace characters from the right side of a string. Now
the output matches the contents of the original file exactly:
3.1415926535
  8979323846
  2643383279
We can strip the trailing newline character when we read the contents of
the file, by applying the rstrip() method immediately after calling
read_text():
contents = path.read_text().rstrip()
This line tells Python to call the read_text() method on the file
we’re working with. Then it applies the rstrip() method to the string
that read_text() returns. The cleaned-up string is then assigned to
the variable contents. This approach is called method chaining, and
you’ll see it used often in programming.
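To see what rstrip() actually removes, it can help to look at the repr() of the string, which shows newline characters explicitly. The following is only a sketch, not one of the chapter’s listings: it writes a throwaway copy of the digits into a temporary directory (the temporary directory and the names raw and cleaned are assumptions made for illustration) so that it can run anywhere.

```python
from pathlib import Path
import tempfile

# Hypothetical setup: create a small sample file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'pi_digits.txt'
    path.write_text("3.1415926535\n  8979323846\n  2643383279\n")

    raw = path.read_text()               # keeps the trailing newline
    cleaned = path.read_text().rstrip()  # method chaining removes it

# repr() displays escape characters such as \n instead of rendering them.
print(repr(raw[-5:]))
print(repr(cleaned[-5:]))
```

Comparing the two printed values shows that the only thing the chained rstrip() call changed is the whitespace at the very end of the string.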
|
VS Code looks for files in the folder that was most recently opened. If
you’re using VS Code, start by opening the folder where you’re storing
this chapter’s programs. For example, if you’re saving your program
files in a folder called |
Relative and Absolute File Paths
When you pass a simple filename like pi_digits.txt to Path, Python
looks in the directory where the file that’s currently being executed
(that is, your .py program file) is stored.
Sometimes, depending on how you organize your work, the file you want to
open won’t be in the same directory as your program file. For example,
you might store your program files in a folder called python_work;
inside python_work, you might have another folder called text_files
to distinguish your program files from the text files they’re
manipulating. Even though text_files is in python_work, just passing
Path the name of a file in text_files won’t work, because Python
will only look in python_work and stop there. To get Python to open
files from a directory other than the one where your program file is
stored, you need to provide the correct path.
There are two main ways to specify paths in programming. A relative file path tells Python to look for a given location relative to the directory where the currently running program file is stored. Here’s how to build this path:
path = Path('text_files/filename.txt')
You can also tell Python exactly where the file is on your computer, regardless of where the program that’s being executed is stored. This is called an absolute file path. Absolute paths are usually longer than relative paths, because they start at your system’s root folder:
path = Path('/home/eric/data_files/text_files/filename.txt')
Using absolute paths, you can read files from any location on your
system. For now it’s easiest to store files in the same directory as
your program files, or in a folder such as text_files within the
directory that stores your program files.
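The difference between the two kinds of paths can be checked directly on a Path object. The sketch below is an illustration under assumptions: text_files/filename.txt is a placeholder that doesn’t need to exist, and the variable names are made up. Strictly speaking, Python resolves a relative path against the current working directory, which is the program’s folder when you run the program from that folder.

```python
from pathlib import Path

# A relative path: no root folder at the start. (Placeholder name.)
relative_path = Path('text_files/filename.txt')
print(relative_path.is_absolute())

# Joining the relative path onto the current working directory
# produces an absolute path to the same location.
absolute_path = Path.cwd() / relative_path
print(absolute_path.is_absolute())
print(absolute_path)
```

Printing the absolute version is a quick way to find out where Python will actually look when a relative path doesn’t seem to work.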
|
Windows systems use a backslash ( |
Accessing a File’s Lines
When you’re working with a file, you’ll often want to examine each line of the file. You might be looking for certain information in the file, or you might want to modify the text in the file in some way.
You can use the splitlines() method to turn a long string into a list
of lines, and then use a for loop to examine each line from a file,
one at a time:
from pathlib import Path
path = Path('pi_digits.txt')
contents = path.read_text() (1)
lines = contents.splitlines() (2)
for line in lines:
    print(line)
| 1 | We start out by reading the entire contents of the file. If you’re planning to work with the individual lines in a file, you don’t need to strip any whitespace when reading the file. |
| 2 | The splitlines() method returns a list of all lines in the file,
and we assign this list to the variable lines. We then loop over
these lines and print each one. |
Since we haven’t modified any of the lines, the output matches the original text file exactly:
3.1415926535
  8979323846
  2643383279
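To make the behavior of splitlines() concrete without needing a file on disk, here is a small sketch that works on a string literal standing in for the file’s contents (the literal is an assumption that mirrors pi_digits.txt). It also compares the result with split('\n'), which is a different method and leaves an empty string at the end when the text finishes with a newline.

```python
# Stand-in for the text that read_text() would return.
contents = "3.1415926535\n  8979323846\n  2643383279\n"

lines = contents.splitlines()
print(lines)
print(len(lines))

# split('\n') keeps an empty final item after the last newline.
split_on_newline = contents.split('\n')
print(split_on_newline)
```

This is one reason there is no need to strip the trailing newline before calling splitlines().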
Working with a File’s Contents
After you’ve read the contents of a file into memory, you can do whatever you want with that data. Let’s briefly explore the digits of pi. First, we’ll attempt to build a single string containing all the digits in the file with no whitespace in it:
from pathlib import Path
path = Path('pi_digits.txt')
contents = path.read_text()
lines = contents.splitlines()
pi_string = ''
for line in lines: (1)
    pi_string += line
print(pi_string)
print(len(pi_string))
| 1 | We write a loop that adds each line of digits to pi_string. |
3.1415926535  8979323846  2643383279
36
The variable pi_string contains the whitespace that was on the left
side of the digits in each line, but we can get rid of that by using
lstrip() on each line:
# --snip--
for line in lines:
    pi_string += line.lstrip()
print(pi_string)
print(len(pi_string))
Now we have a string containing pi to 30 decimal places. The string is 32 characters long because it also includes the leading 3 and a decimal point:
3.141592653589793238462643383279
32
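The loop above is the approach this chapter uses. As an alternative sketch (not the book’s listing), the same string can be built in one step with the string method join(), which concatenates every item it is given. The list of lines is written out by hand here so the example runs on its own.

```python
# Stand-in for the list that contents.splitlines() would return.
lines = ['3.1415926535', '  8979323846', '  2643383279']

# Strip the leading whitespace from each line, then join the pieces
# together with nothing in between.
pi_string = ''.join(line.lstrip() for line in lines)

print(pi_string)
print(len(pi_string))
```

Both versions produce the same result; join() is simply a more compact way to express "glue all of these pieces together."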
|
When Python reads from a text file, it interprets all text in the file
as a string. If you read in a number and want to work with that value in
a numerical context, you’ll have to convert it to an integer using the
|
Large Files: One Million Digits
So far, we’ve focused on analyzing a text file that contains only three lines, but the code in these examples would work just as well on much larger files. If we start with a text file that contains pi to 1,000,000 decimal places, instead of just 30, we can create a single string containing all these digits. We don’t need to change our program at all, except to pass it a different file. We’ll also print just the first 50 decimal places, so we don’t have to watch a million digits scroll by in the terminal:
from pathlib import Path
path = Path('pi_million_digits.txt')
contents = path.read_text()
lines = contents.splitlines()
pi_string = ''
for line in lines:
    pi_string += line.lstrip()
print(f"{pi_string[:52]}...")
print(len(pi_string))
The output shows that we do indeed have a string containing pi to 1,000,000 decimal places:
3.14159265358979323846264338327950288419716939937510...
1000002
Python has no inherent limit to how much data you can work with; you can work with as much data as your system’s memory can handle.
|
To run this program (and many of the examples that follow), you’ll need to download the resources available at ehmatthes.github.io/pcc_3e. |
Is Your Birthday Contained in Pi?
Let’s use the program we just wrote to find out if someone’s birthday
appears anywhere in the first million digits of pi. We can do this by
expressing each birthday as a string of digits and seeing if that string
appears anywhere in pi_string:
# --snip--
for line in lines:
    pi_string += line.strip()
birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")
We first prompt for the user’s birthday, and then check if that string
is in pi_string. Let’s try it:
Enter your birthday, in the form mmddyy: 120372
Your birthday appears in the first million digits of pi!
Once you’ve read from a file, you can analyze its contents in just about any way you can imagine.
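As one example of further analysis, the string method find() reports where a match starts rather than only whether it exists. The sketch below uses the 30-decimal-place string instead of the million-digit file, and the search values are made up for illustration. find() returns -1 when there is no match.

```python
pi_string = "3.141592653589793238462643383279"

# Hypothetical digit sequences to search for.
found_position = pi_string.find("5926")
missing_position = pi_string.find("000000")

print(found_position)
print(missing_position)
```

The position counts from 0 and includes the leading 3 and the decimal point, so it is an index into the string, not a count of decimal places.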
Try It Yourself
10-1. Learning Python: Open a blank file in your text editor and write
a few lines summarizing what you’ve learned about Python so far. Start
each line with the phrase In Python you can…. Save the file as
learning_python.txt in the same directory as your exercises from this
chapter. Write a program that reads the file and prints what you wrote
two times: print the contents once by reading in the entire file, and
once by storing the lines in a list and then looping over each line.
10-2. Learning C: You can use the replace() method to replace any
word in a string with a different word. Here’s a quick example showing
how to replace 'dog' with 'cat' in a sentence:
>>> message = "I really like dogs."
>>> message.replace('dog', 'cat')
'I really like cats.'
Read in each line from the file you just created, learning_python.txt,
and replace the word Python with the name of another language, such as
C. Print each modified line to the screen.
10-3. Simpler Code: The program file_reader.py in this section uses
a temporary variable, lines, to show how splitlines() works. You can
skip the temporary variable and loop directly over the list that
splitlines() returns:
for line in contents.splitlines():
Remove the temporary variable from each of the programs in this section, to make them more concise.
Writing to a File
One of the simplest ways to save data is to write it to a file. When you write text to a file, the output will still be available after you close the terminal containing your program’s output. You can examine output after a program finishes running, and you can share the output files with others as well. You can also write programs that read the text back into memory and work with it again later.
Writing a Single Line
Once you have a path defined, you can write to a file using the
write_text() method. To see how this works, let’s write a simple
message and store it in a file instead of printing it to the screen:
from pathlib import Path
path = Path('programming.txt')
path.write_text("I love programming.")
The write_text() method takes a single argument: the string that you
want to write to the file. This program has no terminal output, but if
you open the file programming.txt, you’ll see one line:
I love programming.
This file behaves like any other file on your computer. You can open it, write new text in it, copy from it, paste to it, and so forth.
|
Python can only write strings to a text file. If you want to store
numerical data in a text file, you’ll have to convert the data to string
format first using the |
Writing Multiple Lines
The write_text() method does a few things behind the scenes. If the
file that path points to doesn’t exist, it creates that file. Also,
after writing the string to the file, it makes sure the file is closed
properly. Files that aren’t closed properly can lead to missing or
corrupted data.
To write more than one line to a file, you need to build a string
containing the entire contents of the file, and then call write_text()
with that string. Let’s write several lines to the programming.txt
file:
from pathlib import Path
contents = "I love programming.\n"
contents += "I love creating new games.\n"
contents += "I also love working with data.\n"
path = Path('programming.txt')
path.write_text(contents)
We define a variable called contents that will hold the entire
contents of the file. On the next line, we use the += operator to add
to this string. We include newline characters at the end of each line,
to make sure each statement appears on its own line.
If you run this and then open programming.txt, you’ll see each of
these lines in the text file:
I love programming.
I love creating new games.
I also love working with data.
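It is worth knowing that write_text() replaces whatever the file already contains. The sketch below demonstrates this in a temporary directory (an assumption made so the example doesn’t touch your own files) and shows one possible way to add to a file instead: opening it in append mode with the path’s open() method. The variable names are illustrative only.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'programming.txt'

    path.write_text("I love programming.\n")
    # A second call to write_text() replaces the first line entirely.
    path.write_text("I love creating new games.\n")
    after_overwrite = path.read_text()

    # Opening the file in append mode ('a') adds to the end instead.
    with path.open('a') as file_object:
        file_object.write("I also love working with data.\n")
    after_append = path.read_text()

print(after_overwrite)
print(after_append)
```

If you want to keep existing text, either append as shown or read the old contents first and include them in the string you write.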
|
Be careful when calling |
Try It Yourself
10-4. Guest: Write a program that prompts the user for their name.
When they respond, write their name to a file called guest.txt.
10-5. Guest Book: Write a while loop that prompts users for their
name. Collect all the names that are entered, and then write these names
to a file called guest_book.txt. Make sure each entry appears on a new
line in the file.
Exceptions
Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure of what to do next, it creates an exception object. If you write code that handles the exception, the program will continue running. If you don’t handle the exception, the program will halt and show a traceback, which includes a report of the exception that was raised.
Exceptions are handled with try-except blocks. A try-except block asks Python to do something, but it also tells Python what to do if an exception is raised. When you use try-except blocks, your programs will continue running even if things start to go wrong. Instead of tracebacks, which can be confusing for users to read, users will see friendly error messages that you’ve written.
Handling the ZeroDivisionError Exception
Let’s look at a simple error that causes Python to raise an exception. You probably know that it’s impossible to divide a number by zero, but let’s ask Python to do it anyway:
print(5/0)
Python can’t do this, so we get a traceback:
Traceback (most recent call last):
  File "division_calculator.py", line 1, in <module>
    print(5/0)
          ~^~
ZeroDivisionError: division by zero (1)
| 1 | The error reported in the traceback, ZeroDivisionError, is an
exception object. Python creates this kind of object in response to
a situation where it can’t do what we ask it to. |
When this happens, Python stops the program and tells us the kind of exception that was raised. We can use this information to modify our program. We’ll tell Python what to do when this kind of exception occurs; that way, if it happens again, we’ll be prepared.
Using try-except Blocks
When you think an error may occur, you can write a try-except block to handle the exception that might be raised. You tell Python to try running some code, and you tell it what to do if the code results in a particular kind of exception.
Here’s what a try-except block for handling the ZeroDivisionError
exception looks like:
try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")
We put print(5/0), the line that caused the error, inside a try
block. If the code in a try block works, Python skips over the
except block. If the code in the try block causes an error, Python
looks for an except block whose error matches the one that was raised,
and runs the code in that block.
In this example, the code in the try block produces a
ZeroDivisionError, so Python looks for an except block telling it
how to respond. Python then runs the code in that block, and the user
sees a friendly error message instead of a traceback:
You can't divide by zero!
If more code followed the try-except block, the program would continue running because we told Python how to handle the error. Let’s look at an example where catching an error can allow a program to continue running.
Using Exceptions to Prevent Crashes
Handling errors correctly is especially important when the program has more work to do after the error occurs. This happens often in programs that prompt users for input. If the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.
Let’s create a simple calculator that does only division:
print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")
while True:
    first_number = input("\nFirst number: ") (1)
    if first_number == 'q':
        break
    second_number = input("Second number: ") (2)
    if second_number == 'q':
        break
    answer = int(first_number) / int(second_number) (3)
    print(answer)
| 1 | This program prompts the user to input a first_number. |
| 2 | If the user does not enter q to quit, they input a second_number. |
| 3 | We then divide these two numbers to get an answer. |
This program does nothing to handle errors, so asking it to divide by zero causes it to crash:
Give me two numbers, and I'll divide them.
Enter 'q' to quit.
First number: 5
Second number: 0
Traceback (most recent call last):
  File "division_calculator.py", line 11, in <module>
    answer = int(first_number) / int(second_number)
             ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
It’s bad that the program crashed, but it’s also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than you want them to. For example, they’ll know the name of your program file, and they’ll see a part of your code that isn’t working properly.
The else Block
We can make this program more error resistant by wrapping the line that
might produce errors in a try-except block. The error occurs on the line
that performs the division, so that’s where we’ll put the try-except
block. This example also includes an else block. Any code that depends
on the try block executing successfully goes in the else block:
# --snip--
while True:
    # --snip--
    if second_number == 'q':
        break
    try: (1)
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError: (2)
        print("You can't divide by 0!")
    else: (3)
        print(answer)
| 1 | We ask Python to try to complete the division operation in a try
block, which includes only the code that might cause an error. |
| 2 | The except block tells Python how to respond when a
ZeroDivisionError arises. |
| 3 | If the division operation is successful, we use the else block to
print the result. Any code that depends on the try block
succeeding is added to the else block. |
The program continues to run, and the user never sees a traceback:
Give me two numbers, and I'll divide them.
Enter 'q' to quit.
First number: 5
Second number: 0
You can't divide by 0!
First number: 5
Second number: 2
2.5
First number: q
The only code that should go in a try block is code that might cause
an exception to be raised. Sometimes you’ll have additional code that
should run only if the try block was successful; this code goes in the
else block. The except block tells Python what to do in case a
certain exception arises when it tries to run the code in the try
block.
By anticipating likely sources of errors, you can write robust programs that continue to run even when they encounter invalid data and missing resources.
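A try block can be followed by more than one except block, each matching a different kind of exception. The sketch below is an illustration rather than the chapter’s calculator: it moves the division into a hypothetical function, divide_text(), so it can run without keyboard input, and it adds a second except block for the ValueError that int() raises when given text that isn’t a whole number.

```python
def divide_text(first_number, second_number):
    """Divide two numbers given as strings; return a result or a message."""
    try:
        answer = int(first_number) / int(second_number)
    except ZeroDivisionError:
        return "You can't divide by 0!"
    except ValueError:
        return "Please enter whole numbers only."
    else:
        # Runs only if the try block raised no exception.
        return answer

print(divide_text('5', '2'))
print(divide_text('5', '0'))
print(divide_text('five', '2'))
```

Python checks the except blocks in order and runs the first one that matches; the else block runs only when none of them were needed.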
Handling the FileNotFoundError Exception
One common issue when working with files is handling missing files. The file you’re looking for might be in a different location, the filename might be misspelled, or the file might not exist at all. You can handle all of these situations with a try-except block.
Let’s try to read a file that doesn’t exist. The following program tries
to read in the contents of Alice in Wonderland, but the file alice.txt
is not saved in the same directory as alice.py:
from pathlib import Path
path = Path('alice.txt')
contents = path.read_text(encoding='utf-8')
Note that we’re using read_text() in a slightly different way here
than what you saw earlier. The encoding argument is needed when your
system’s default encoding doesn’t match the encoding of the file that’s
being read. This is most likely to happen when reading from a file that
wasn’t created on your system.
Python can’t read from a missing file, so it raises an exception:
Traceback (most recent call last):
  File "alice.py", line 4, in <module> (1)
    contents = path.read_text(encoding='utf-8') (2)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../pathlib.py", line 1056, in read_text
    ...
FileNotFoundError: [Errno 2] No such file or directory: 'alice.txt' (3)
| 1 | Looking near the beginning of the traceback, we can see that the
error occurred at line 4 in the file alice.py. |
| 2 | The next line shows the line of code that caused the error. |
| 3 | On the last line, we can see that a FileNotFoundError exception was
raised. This is important because it tells us what kind of exception
to use in the except block that we’ll write. |
It’s often best to start at the very end of a traceback. To handle the
error that’s being raised, the try block will begin with the line that
was identified as problematic in the traceback, which is the line that
contains read_text():
from pathlib import Path
path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError: (1)
    print(f"Sorry, the file {path} does not exist.")
| 1 | We write an except block that matches the FileNotFoundError.
Python then runs the code in that block when the file can’t be
found, and the result is a friendly error message instead of a
traceback. |
Sorry, the file alice.txt does not exist.
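The same pattern can be wrapped in a small helper so the rest of a program can tell whether the read succeeded. This is a sketch under assumptions: the function name read_or_none() and the deliberately nonexistent folder name are invented for illustration, and returning None for a missing file is just one possible convention.

```python
from pathlib import Path

def read_or_none(path):
    """Return the file's text, or None if the file can't be found."""
    try:
        return path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f"Sorry, the file {path} does not exist.")
        return None

# Hypothetical path that is assumed not to exist on your system.
contents = read_or_none(Path('no_such_folder_for_this_sketch/alice.txt'))
print(contents is None)
```

Code that calls the helper can then check for None before trying to work with the contents.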
Analyzing Text
You can analyze text files containing entire books. Many classic works of literature are available as simple text files because they are in the public domain. The texts used in this section come from Project Gutenberg (gutenberg.org). Project Gutenberg maintains a collection of literary works that are available in the public domain, and it’s a great resource if you’re interested in working with literary texts in your programming projects.
Let’s pull in the text of Alice in Wonderland and try to count the
number of words in the text. To do this, we’ll use the string method
split(), which by default splits a string wherever it finds any
whitespace:
from pathlib import Path
path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:
    print(f"Sorry, the file {path} does not exist.")
else:
    # Count the approximate number of words in the file.
    words = contents.split() (1)
    num_words = len(words) (2)
    print(f"The file {path} has about {num_words} words.")
| 1 | We take the string contents and use split() to produce a list of
all the words in the book. |
| 2 | Using len() on this list gives us a good approximation of the
number of words in the original text. |
The output tells us how many words are in alice.txt:
The file alice.txt has about 29594 words.
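To see how split() arrives at such a count, here is a small sketch on a short string instead of a whole book. The sample sentence is only a stand-in for the file’s contents. With no argument, split() treats any run of spaces or newlines as a single separator, so extra whitespace never produces empty items.

```python
# Stand-in text with a newline and some doubled spaces.
text = "Alice was beginning to get very tired\n  of sitting by her sister  on the bank"

words = text.split()
print(words)
print(len(words))
```

The count is approximate for real books because punctuation stays attached to words and things like chapter numbers or license text are counted too.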
Working with Multiple Files
Let’s add more books to analyze, but before we do, let’s move the bulk
of this program to a function called count_words(). This will make it
easier to run the analysis for multiple books:
from pathlib import Path
def count_words(path):
    """Count the approximate number of words in a file.""" (1)
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f"Sorry, the file {path} does not exist.")
    else:
        # Count the approximate number of words in the file.
        words = contents.split()
        num_words = len(words)
        print(f"The file {path} has about {num_words} words.")
path = Path('alice.txt')
count_words(path)
| 1 | The code has only been indented and moved into the body of
count_words(). The comment has also been changed to a docstring. |
Now we can write a short loop to count the words in any text we want to
analyze. We’ll try to count the words for Alice in Wonderland,
Siddhartha, Moby Dick, and Little Women, which are all available in the
public domain. I’ve intentionally left siddhartha.txt out of the
directory containing word_count.py, so we can see how well our program
handles a missing file:
from pathlib import Path
def count_words(path):
    # --snip--
filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 'little_women.txt']
for filename in filenames:
    path = Path(filename) (1)
    count_words(path)
| 1 | The names of the files are stored as simple strings. Each string is
then converted to a Path object before the call to count_words(). |
The missing siddhartha.txt file has no effect on the rest of the
program’s execution:
The file alice.txt has about 29594 words.
Sorry, the file siddhartha.txt does not exist.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.
Using the try-except block in this example provides two significant
advantages. We prevent our users from seeing a traceback, and we let the
program continue analyzing the texts it’s able to find. If we didn’t
catch the FileNotFoundError raised when siddhartha.txt can’t be found,
the user would see a full traceback, and the program would stop running
after trying to analyze Siddhartha. It would never analyze Moby Dick or
Little Women.
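A variation worth considering is to have the function return the count instead of printing it, so the calling code decides what to show. The sketch below is not the book’s word_count.py: it creates one tiny sample file in a temporary directory (the file contents and the results dictionary are assumptions for illustration) and leaves the second file missing on purpose.

```python
from pathlib import Path
import tempfile

def count_words(path):
    """Return the approximate word count, or None if the file is missing."""
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        return None
    else:
        return len(contents.split())

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    # Hypothetical sample text; siddhartha.txt is deliberately not created.
    (folder / 'alice.txt').write_text("Down the rabbit hole", encoding='utf-8')

    results = {}
    for filename in ['alice.txt', 'siddhartha.txt']:
        results[filename] = count_words(folder / filename)

print(results)
```

The loop still finishes for every filename, and the missing file shows up as None in the results rather than stopping the program.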
Failing Silently
In the previous example, we informed our users that one of the files was
unavailable. But you don’t need to report every exception you catch.
Sometimes, you’ll want the program to fail silently when an exception
occurs and continue on as if nothing happened. To make a program fail
silently, you write a try block as usual, but you explicitly tell
Python to do nothing in the except block. Python has a pass
statement that tells it to do nothing in a block:
def count_words(path):
    """Count the approximate number of words in a file."""
    try:
        # --snip--
    except FileNotFoundError:
        pass
    else:
        # --snip--
The only difference between this listing and the previous one is the
pass statement in the except block. Now when a FileNotFoundError
is raised, the code in the except block runs, but nothing happens. No
traceback is produced, and there’s no output in response to the error
that was raised. Users see the word counts for each file that exists,
but they don’t see any indication that a file wasn’t found:
The file alice.txt has about 29594 words.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.
The pass statement also acts as a placeholder. It’s a reminder that
you’re choosing to do nothing at a specific point in your program’s
execution and that you might want to do something there later. For
example, in this program we might decide to write any missing filenames
to a file called missing_files.txt.
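The idea of recording missing filenames could be sketched as follows. This is one possible approach under stated assumptions, not the book’s code: count_words() takes an extra list, missing, to collect names, and everything happens in a temporary directory with a made-up sample file so the example is safe to run.

```python
from pathlib import Path
import tempfile

def count_words(path, missing):
    """Print a word count, or record the filename if the file is missing."""
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        # Stay silent for the user, but remember which file was missing.
        missing.append(path.name)
    else:
        num_words = len(contents.split())
        print(f"The file {path.name} has about {num_words} words.")

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / 'alice.txt').write_text("Down the rabbit hole", encoding='utf-8')

    missing = []
    for filename in ['alice.txt', 'siddhartha.txt']:
        count_words(folder / filename, missing)

    # Write one missing filename per line.
    log_path = folder / 'missing_files.txt'
    log_path.write_text('\n'.join(missing) + '\n')
    logged = log_path.read_text()

print(logged)
```

Users still see only the successful counts, while the log file keeps a record that someone can check later.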
Deciding Which Errors to Report
How do you know when to report an error to your users and when to let your program fail silently? If users know which texts are supposed to be analyzed, they might appreciate a message informing them why some texts were not analyzed. If users expect to see some results but don’t know which books are supposed to be analyzed, they might not need to know that some texts were unavailable. Giving users information they aren’t looking for can decrease the usability of your program. Python’s error-handling structures give you fine-grained control over how much to share with users when things go wrong; it’s up to you to decide how much information to share.
Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time your program depends on something external such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help you know where to include exception-handling blocks in your program and how much to report to users about errors that arise.
Try It Yourself
10-6. Addition: One common problem when prompting for numerical input
occurs when people provide text instead of numbers. When you try to
convert the input to an int, you’ll get a ValueError. Write a
program that prompts for two numbers. Add them together and print the
result. Catch the ValueError if either input value is not a number,
and print a friendly error message. Test your program by entering two
numbers and then by entering some text instead of a number.
10-7. Addition Calculator: Wrap your code from Exercise 10-6 in a
while loop so the user can continue entering numbers, even if they
make a mistake and enter text instead of a number.
10-8. Cats and Dogs: Make two files, cats.txt and dogs.txt. Store
at least three names of cats in the first file and three names of dogs
in the second file. Write a program that tries to read these files and
print the contents of the file to the screen. Wrap your code in a
try-except block to catch the FileNotFoundError, and print a friendly
message if a file is missing. Move one of the files to a different
location on your system, and make sure the code in the except block
executes properly.
10-9. Silent Cats and Dogs: Modify your except block in Exercise
10-8 to fail silently if either file is missing.
10-10. Common Words: Visit Project Gutenberg (gutenberg.org)
and find a few texts you’d like to analyze. Download the text files for
these works, or copy the raw text from your browser into a text file on
your computer. You can use the count() method to find out how many
times a word or phrase appears in a string. For example, the following
code counts the number of times 'row' appears in a string:
>>> line = "Row, row, row your boat"
>>> line.count('row')
2
>>> line.lower().count('row')
3
Notice that converting the string to lowercase using lower() catches
all appearances of the word you’re looking for, regardless of how it’s
formatted. Write a program that reads the files you found at Project
Gutenberg and determines how many times the word 'the' appears in each
text. Try counting ' the ' with a space in the string, and see how
much lower your count is.
Storing Data
Many of your programs will ask users to input certain kinds of
information. You might allow users to store preferences in a game or
provide data for a visualization. Whatever the focus of your program is,
you’ll store the information users provide in data structures such as
lists and dictionaries. When users close a program, you’ll almost always
want to save the information they entered. A simple way to do this
involves storing your data using the json module.
The json module allows you to convert simple Python data structures
into JSON-formatted strings, which you can write to a file and then load
back the next time the program runs. You can also use json to share data
between different Python programs. Even better, the JSON data format is
not specific to Python, so you can share data you store in the JSON
format with people who work in many other programming languages. It’s a
useful and portable format, and it’s easy to learn.
Note: The JSON (JavaScript Object Notation) format was originally developed for JavaScript. However, it has since become a common format used by many languages, including Python.
Using json.dumps() and json.loads()
Let’s write a short program that stores a set of numbers and another
program that reads these numbers back into memory. The first program
will use json.dumps() to store the set of numbers, and the second
program will use json.loads().
The json.dumps() function takes one required argument: a piece of data
that should be converted to the JSON format. The function returns a string,
which we can then write to a data file:
from pathlib import Path
import json
numbers = [2, 3, 5, 7, 11, 13]
path = Path('numbers.json') (1)
contents = json.dumps(numbers) (2)
path.write_text(contents)
| 1 | We choose a filename in which to store the list of numbers. It’s
customary to use the file extension .json to indicate that the
data in the file is stored in the JSON format. |
| 2 | We use json.dumps() to generate a string containing the JSON
representation of the data we’re working with. Once we have this
string, we write it to the file using write_text(). |
This program has no output, but if you open the file numbers.json,
you’ll see the data is stored in a format that looks just like Python:
[2, 3, 5, 7, 11, 13]
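The file looks just like Python here because a list of integers is written the same way in both notations. As a rough illustration, other kinds of values show where JSON text and Python syntax diverge. The settings dictionary below is made-up data, not part of the chapter's programs:

```python
import json

# Hypothetical data, chosen only to show how Python values map to JSON.
settings = {'name': 'eric', 'active': True, 'nickname': None, 'scores': (1, 2)}

contents = json.dumps(settings)
print(contents)

# Loading the string gives back a dictionary, but the tuple returns as a list.
restored = json.loads(contents)
print(restored['scores'])
```

In the printed string, the keys and text values are wrapped in double quotes, True becomes true, None becomes null, and the tuple is written as a JSON array.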
Now we’ll write a separate program that uses json.loads() to read the
list back into memory:
from pathlib import Path
import json
path = Path('numbers.json') (1)
contents = path.read_text() (2)
numbers = json.loads(contents) (3)
print(numbers)
| 1 | We make sure to read from the same file we wrote to. |
| 2 | Since the data file is just a text file with specific formatting, we
can read it with the read_text() method. |
| 3 | We then pass the contents of the file to json.loads(). This
function takes in a JSON-formatted string and returns a Python object
(in this case, a list), which we assign to numbers. |
[2, 3, 5, 7, 11, 13]
This is a simple way to share data between two programs.
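If you want to check that nothing is lost along the way, one possible sketch runs both steps in a single script and compares the result with the original list. The temporary folder is an assumption made here so the example doesn't touch your real numbers.json:

```python
from pathlib import Path
import json
import tempfile

numbers = [2, 3, 5, 7, 11, 13]

# A temporary directory keeps this sketch from touching your real files.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'numbers.json'

    # What the first program does: convert to JSON and write the string.
    path.write_text(json.dumps(numbers))

    # What the second program does: read the string and convert it back.
    loaded = json.loads(path.read_text())

# True means the list survived the trip to the file and back.
print(loaded == numbers)
```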
Saving and Reading User-Generated Data
Saving data with json is useful when you’re working with
user-generated data, because if you don’t store your user’s information
somehow, you’ll lose it when the program stops running. Let’s look at an
example where we prompt the user for their name the first time they run
a program and then remember their name when they run the program again.
Let’s start by storing the user’s name:
from pathlib import Path
import json
username = input("What is your name? ") (1)
path = Path('username.json') (2)
contents = json.dumps(username)
path.write_text(contents)
print(f"We'll remember you when you come back, {username}!") (3)
| 1 | We first prompt for a username to store. |
| 2 | We write the data we just collected to a file called username.json. |
| 3 | We print a message informing the user that we’ve stored their information. |
Now let’s write a new program that greets a user whose name has already been stored:
from pathlib import Path
import json
path = Path('username.json')
contents = path.read_text() (1)
username = json.loads(contents) (2)
print(f"Welcome back, {username}!")
| 1 | We read the contents of the data file. |
| 2 | We use json.loads() to assign the recovered data to the variable
username. Since we’ve recovered the username, we can welcome the
user back with a personalized greeting. |
We need to combine these two programs into one file. When someone runs
remember_me.py, we want to retrieve their username from the stored file if
possible; if not, we’ll prompt for a username and store it in
username.json for next time. We’ll use a handy method from the
pathlib module:
from pathlib import Path
import json
path = Path('username.json')
if path.exists(): (1)
    contents = path.read_text()
    username = json.loads(contents)
    print(f"Welcome back, {username}!")
else: (2)
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    print(f"We'll remember you when you come back, {username}!")
| 1 | The exists() method returns True if a file or folder exists and
False if it doesn’t. Here we use path.exists() to find out if a
username has already been stored. If username.json exists, we load
the username and print a personalized greeting to the user. |
| 2 | If the file username.json doesn’t exist, we prompt for a username
and store the value that the user enters. |
If this is the first time the program runs, this is the output:
What is your name? Eric
We'll remember you when you come back, Eric!
Otherwise:
Welcome back, Eric!
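The listing above assumes that username.json, if it exists, holds valid JSON. As a hedged sketch that goes a step beyond the chapter's program, the loading step could use a try-except block and also catch json.JSONDecodeError, the exception json.loads() raises for malformed text. The helper name load_username() and the temporary folder are assumptions made for illustration:

```python
from pathlib import Path
import json
import tempfile

def load_username(path):
    """Return the stored username, or None if it can't be read."""
    # Hypothetical helper; it is not part of remember_me.py.
    try:
        contents = path.read_text()
        return json.loads(contents)
    except FileNotFoundError:
        # No one has run the program yet.
        return None
    except json.JSONDecodeError:
        # The file exists, but its contents aren't valid JSON.
        return None

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'username.json'
    missing = load_username(path)

    path.write_text('{not valid json')
    damaged = load_username(path)

    path.write_text(json.dumps('Eric'))
    stored = load_username(path)

print(missing, damaged, stored)
```

Whether to check path.exists() first or to catch the exception afterward is a design choice; the exception-based version also covers a file that exists but has been damaged.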
Refactoring
Often, you’ll come to a point where your code will work, but you’ll recognize that you could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes your code cleaner, easier to understand, and easier to extend.
We can refactor remember_me.py by moving the bulk of its logic into
one or more functions. The focus of remember_me.py is on greeting the
user, so let’s move all of our existing code into a function called
greet_user():
from pathlib import Path
import json
def greet_user():
    """Greet the user by name.""" (1)
    path = Path('username.json')
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")
greet_user()
| 1 | Because we’re using a function now, we rewrite the comments as a docstring that reflects how the program currently works. |
The function greet_user() is doing more than just greeting the user;
it’s also retrieving a stored username if one exists and prompting for a
new username if one doesn’t. Let’s refactor greet_user() so it’s not
doing so many different tasks. We’ll start by moving the code for
retrieving a stored username to a separate function:
from pathlib import Path
import json
def get_stored_username(path):
    """Get stored username if available.""" (1)
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        return username
    else:
        return None (2)
def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
    username = get_stored_username(path)
    if username: (3)
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")
greet_user()
| 1 | The new function get_stored_username() has a clear purpose, as
stated in the docstring. This function retrieves a stored username
and returns the username if it finds one. |
| 2 | If the path that’s passed to get_stored_username() doesn’t exist,
the function returns None. This is good practice: a function
should either return the value you’re expecting, or it should return
None. |
| 3 | We print a welcome back message to the user if the attempt to retrieve a username is successful, and if it isn’t, we prompt for a new username. |
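The check if username: relies on truthiness, so it treats None and an empty string the same way. A small sketch of the difference, using made-up values and a hypothetical describe() helper rather than anything read from a file:

```python
def describe(username):
    """Describe how a stored value would be treated (hypothetical helper)."""
    if username:
        return 'greet'
    elif username is None:
        return 'nothing stored'
    else:
        return 'stored but empty'

# Made-up values standing in for what get_stored_username() might return.
for value in [None, '', 'Eric']:
    print(repr(value), '->', describe(value))
```

For remember_me.py this behavior is convenient: a missing file and an empty stored name both lead to a fresh prompt. If a program ever needs to tell the two cases apart, comparing with is None is the more precise test.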
We should factor one more block of code out of greet_user(). If the
username doesn’t exist, we should move the code that prompts for a new
username to a function dedicated to that purpose:
from pathlib import Path
import json
def get_stored_username(path):
    """Get stored username if available."""
    # --snip--
def get_new_username(path):
    """Prompt for a new username."""
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    return username
def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
    username = get_stored_username(path) (1)
    if username:
        print(f"Welcome back, {username}!")
    else:
        username = get_new_username(path) (2)
        print(f"We'll remember you when you come back, {username}!")
greet_user()
| 1 | We call get_stored_username(), which is responsible only for
retrieving a stored username if one exists. |
| 2 | If necessary, greet_user() calls get_new_username(), which is
responsible only for getting a new username and storing it. |
Each function in this final version of remember_me.py has a single,
clear purpose. We call greet_user(), and that function prints an
appropriate message: it either welcomes back an existing user or greets
a new user. This compartmentalization of work is an essential part of
writing clear code that will be easy to maintain and extend.
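One practical payoff of single-purpose functions is that each can be tried on its own. As a minimal sketch, the code below calls get_stored_username() directly, without going through greet_user() or prompting anyone. The temporary folder is an assumption made here so the real username.json is left alone:

```python
from pathlib import Path
import json
import tempfile

def get_stored_username(path):
    """Get stored username if available."""
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        return username
    else:
        return None

# Exercise the function by itself in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'username.json'

    # No file has been written yet, so nothing can be retrieved.
    before = get_stored_username(path)

    # Store a name the same way get_new_username() does, then retrieve it.
    path.write_text(json.dumps('Eric'))
    after = get_stored_username(path)

print(before)
print(after)
```

Because the function takes the path as a parameter, the same code works with any file location, which is what makes this kind of isolated check possible.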
Try It Yourself
10-11. Favorite Number: Write a program that prompts for the user’s
favorite number. Use json.dumps() to store this number in a file.
Write a separate program that reads in this value and prints the message
I know your favorite number! It’s _.
10-12. Favorite Number Remembered: Combine the two programs you wrote in Exercise 10-11 into one file. If the number is already stored, report the favorite number to the user. If not, prompt for the user’s favorite number and store it in a file. Run the program twice to see that it works.
10-13. User Dictionary: The remember_me.py example only stores one
piece of information, the username. Expand this example by asking for
two more pieces of information about the user, then store all the
information you collect in a dictionary. Write this dictionary to a file
using json.dumps(), and read it back in using json.loads(). Print a
summary showing exactly what your program remembers about the user.
10-14. Verify User: The final listing for remember_me.py assumes
either that the user has already entered their username or that the
program is running for the first time. We should modify it in case the
current user is not the person who last used the program. Before
printing a welcome back message in greet_user(), ask the user if this
is the correct username. If it’s not, call get_new_username() to get
the correct username.
Summary
In this chapter, you learned how to work with files. You learned to read the entire contents of a file, and then work through the contents one line at a time if you need to. You learned to write as much text as you want to a file. You also read about exceptions and how to handle the exceptions you’re likely to see in your programs. Finally, you learned how to store Python data structures so you can save information your users provide, preventing them from having to start over each time they run a program.
In Chapter 11, you’ll learn efficient ways to test your code. This will help you trust that the code you develop is correct, and it will help you identify bugs that are introduced as you continue to build on the programs you’ve written.
Applied Exercises: Ch 10 - Files and Exceptions
These exercises cover the same concepts as the chapter but use context
from real infrastructure, network security, and language learning work.
Save each as a separate .py file using lowercase and underscores, e.g.
ise_log_reader.py.
Domus Digitalis / Homelab
D10-1. Node Log Reader: Create a text file called node_events.txt
with at least five lines describing simulated node events (e.g.,
kvm-01: disk usage at 82%). Write a program that reads the file using
pathlib and read_text(), strips trailing whitespace, and prints each
line.
D10-2. VLAN Config Writer: Write a program that builds a multiline
string representing a VLAN configuration (at least four VLANs with IDs
and names). Write the string to a file called vlan_config.txt using
write_text(). Open the file and confirm the output.
D10-3. Service Status Parser: Create a file service_status.txt with
at least six lines in the format <service>: <status>. Write a program
that reads the file line by line using splitlines(). For each line,
split on ': ' (a colon followed by a space) and print a formatted message like
Service wazuh is active.
D10-4. BGP Peer State Store: Write a program that prompts the user to
enter a BGP peer name and its state (up or down). Store the data as
a dictionary in a JSON file called bgp_peers.json using json.dumps()
and write_text(). Write a second program that reads bgp_peers.json
and prints the peer state.
D10-5. Missing Config Handler: Write a program that tries to read a
file called domus_config.json. If the file doesn’t exist, catch the
FileNotFoundError and print a friendly message. If the file does
exist, read and print its contents. Test both branches.
CHLA / ISE / Network Security
C10-1. ISE Log Reader: Create a text file called ise_events.txt with
at least five simulated ISE syslog entries (e.g.,
5200: Authentication succeeded for user jdoe). Write a program that
reads the file, strips whitespace, and prints each log entry using
splitlines().
C10-2. Policy Config Writer: Write a program that builds a multiline
string representing at least three ISE policy set configurations (name,
protocol, result per line). Write the string to policy_config.txt
using write_text(). Confirm the output by reading the file back and
printing it.
C10-3. Syslog Severity Counter: Create a file syslog_entries.txt
with at least 10 simulated syslog lines, some containing CRITICAL and
some containing INFO. Write a program that reads the file and counts
how many times each word appears using count(). Print the results.
C10-4. Endpoint Blacklist Store: Write a program that stores a list of
blocked MAC addresses as a JSON file called blacklist.json using
json.dumps() and write_text(). Write a second program that reads the
file back using json.loads() and checks whether a user-entered MAC
address is in the blacklist. Print an appropriate message.
C10-5. Multi-Source Log Analyzer: Create two log files:
ise_logs.txt and ftd_logs.txt. Write a count_events(path) function
that reads each file and counts the number of lines. Use try-except
with FileNotFoundError and pass so the program continues silently if
a file is missing. Intentionally remove one file and confirm the program
still runs.
General Sysadmin / Linux
L10-1. Service Log Reader: Create a text file called services.txt
with at least five lines in the format <service>: <state>. Write a
program that reads the file using pathlib, splits it into lines, and
prints each service and state with a formatted message.
L10-2. Backup Manifest Writer: Write a program that builds a multiline
string listing at least five file paths to be backed up. Write the
string to backup_manifest.txt using write_text(). Read the file back
and print each path.
L10-3. Error Log Word Counter: Create a file error_log.txt with at
least 10 lines containing simulated error messages. Write a program that
reads the file and counts how many times the word error appears (use
lower() to normalize). Print the count.
L10-4. Package State Store: Write a program that prompts the user to
enter package names and their installed state (installed or missing).
Store the data as a dictionary in packages.json using json.dumps().
Write a second program that reads packages.json using json.loads()
and prints a summary of all packages and their states.
L10-5. Refactored Config Manager: Write a program modeled on
remember_me.py that stores a hostname and IP address in a JSON file.
Refactor the logic into three functions: get_stored_config(path),
get_new_config(path), and greet_operator(). The main function should
check whether a stored config exists and either welcome the operator back or
prompt for a new config.
Spanish / DELE C2
E10-1. Vocabulary File Reader: Create a text file called
vocabulario.txt with at least six lines in the format
<palabra>: <definicion>. Write a program that reads the file, splits
it into lines, and prints each entry formatted as
Palabra: <word> - Definición: <def>.
E10-2. Chapter Notes Writer: Write a program that builds a multiline
string with study notes for at least three Don Quijote chapters (one
line per chapter with number and topic). Write the string to
notas_capitulos.txt using write_text(). Read the file back and print
each line.
E10-3. Vocabulary Counter: Create a text file with at least 15 lines
of Spanish vocabulary words. Write a program that reads the file and
counts how many lines contain a specific letter combination (e.g., ue
for diphthong practice) using count(). Print the result.
E10-4. Progress Store: Write a program that prompts the user to enter
their current DELE level and last chapter read. Store the data as a
dictionary in progreso.json using json.dumps(). Write a second
program that reads progreso.json using json.loads() and prints a
personalized progress summary.
E10-5. Refactored Study Tracker: Write a program modeled on the
refactored remember_me.py that stores a learner’s name and current
study goal in a JSON file. Refactor the logic into three functions:
get_stored_progress(path), get_new_progress(path), and
greet_learner(). The main function should check whether a progress
file exists and either welcome the learner back (printing their stored
goal) or prompt for new information.