Chapter 17: Working with APIs

Source: Python Crash Course, 3rd Edition by Eric Matthes

In this chapter, you’ll learn how to write a self-contained program that generates a visualization based on data it retrieves. Your program will use an application programming interface (API) to automatically request specific information from a website and then use that information to generate a visualization. Because programs like this always work with current data, even when that data changes rapidly, their visualizations are always up to date.

Using an API

An API is a part of a website designed to interact with programs. Those programs use very specific URLs to request certain information. This kind of request is called an API call. The requested data will be returned in an easily processed format, such as JSON or CSV. Most apps that use external data sources, such as apps that integrate with social media sites, rely on API calls.

Git and GitHub

We’ll base our visualization on information from GitHub (github.com), a site that allows programmers to collaborate on coding projects. We’ll use GitHub’s API to request information about Python projects on the site, and then generate an interactive visualization of the relative popularity of these projects using Plotly.

GitHub takes its name from Git, a distributed version control system. Git helps people manage their work on a project in a way that prevents changes made by one person from interfering with changes other people are making. When you implement a new feature in a project, Git tracks the changes you make to each file. When your new code works, you commit the changes you’ve made, and Git records the new state of your project. If you make a mistake and want to revert your changes, you can easily return to any previously working state. (To learn more about version control using Git, see Appendix D.)

Projects on GitHub are stored in repositories, which contain everything associated with the project: its code, information on its collaborators, any issues or bug reports, and so on.

When users on GitHub like a project, they can "star" it to show their support and keep track of projects they might want to use. In this chapter, we’ll write a program to automatically download information about the most-starred Python projects on GitHub, and then we’ll create an informative visualization of these projects.

Requesting Data Using an API Call

GitHub’s API lets you request a wide range of information through API calls. To see what an API call looks like, enter the following into your browser’s address bar and press ENTER:

https://api.github.com/search/repositories?q=language:python+sort:stars

This call returns the number of Python projects currently hosted on GitHub, as well as information about the most popular Python repositories. Let’s examine the call. The first part, api.github.com/, directs the request to the part of GitHub that responds to API calls. The next part, search/repositories, tells the API to conduct a search through all the repositories on GitHub.

The question mark after repositories signals that we’re about to pass an argument. The q stands for query, and the equal sign (=) lets us begin specifying a query (q=). By using language:python, we indicate that we want information only on repositories that have Python as the primary language. The final part, +sort:stars, sorts the projects by the number of stars they’ve been given.
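The parts described above can be assembled in code as well; since an API call is just a URL, a minimal sketch only needs string construction:

```python
# Assemble the API call from its parts, as described above.
base_url = "https://api.github.com/search/repositories"

# Search qualifiers are joined with '+' in GitHub's search syntax.
qualifiers = ["language:python", "sort:stars"]
query = "+".join(qualifiers)

# The '?q=' introduces the query string.
url = f"{base_url}?q={query}"
print(url)
```

Building the URL from a list of qualifiers like this makes it easy to add or remove search conditions later.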

The following snippet shows the first few lines of the response:

{
    "total_count": 8961993,          (1)
    "incomplete_results": true,      (2)
    "items": [                       (3)
        {
            "id": 54346799,
            "node_id": "MDEwOlJlcG9zaXRvcnk1NDM0Njc5OQ==",
            "name": "public-apis",
            "full_name": "public-apis/public-apis",
            --snip--
1 GitHub found just under nine million Python projects as of this writing.
2 The value for "incomplete_results" is true, which tells us that GitHub didn’t fully process the query. GitHub limits how long each query can run, in order to keep the API responsive for all users. In this case it found some of the most popular Python repositories, but it didn’t have time to find all of them; we’ll fix that in a moment.
3 The "items" returned are displayed in the list that follows, which contains details about the most popular Python projects on GitHub.

Installing Requests

The Requests package allows a Python program to easily request information from a website and examine the response. Use pip to install Requests:

$ python -m pip install --user requests

If you use a command other than python to run programs or start a terminal session, such as python3, your command will look like this:

$ python3 -m pip install --user requests

Processing an API Response

Now we’ll write a program to automatically issue an API call and process the results:

python_repos.py
import requests

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"   (1)

headers = {"Accept": "application/vnd.github.v3+json"}  (2)
r = requests.get(url, headers=headers)                   (3)
print(f"Status code: {r.status_code}")                   (4)

# Convert the response object to a dictionary.
response_dict = r.json()                                 (5)

# Process results.
print(response_dict.keys())
1 We assign the URL of the API call to the url variable. This is a long URL, so we break it into two lines. The first line is the main part of the URL, and the second line is the query string. We’ve included one more condition: stars:>10000, which tells GitHub to only look for Python repositories that have more than 10,000 stars. This should allow GitHub to return a complete, consistent set of results.
2 GitHub is currently on the third version of its API, so we define headers for the API call that explicitly ask for this version of the API and request the results in JSON format.
3 We use requests to make the call to the API. We call get() and pass it the URL and the header that we defined, and we assign the response object to the variable r.
4 The response object has an attribute called status_code, which tells us whether the request was successful. A status code of 200 indicates a successful response. We print the value of status_code so we can make sure the call went through successfully.
5 We asked the API to return the information in JSON format, so we use the json() method to convert the information to a Python dictionary. We assign the resulting dictionary to response_dict.

Finally, we print the keys from response_dict and see the following output:

Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])

Because the status code is 200, we know that the request was successful. The response dictionary contains only three keys: 'total_count', 'incomplete_results', and 'items'. Let’s take a look inside the response dictionary.
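If you're curious what the json() method does, it's roughly equivalent to parsing the response body with the standard library's json module. This sketch uses a hard-coded body rather than a live response:

```python
import json

# Roughly what r.json() does: parse the JSON response body into
# Python data structures (dicts, lists, numbers, booleans).
body = '{"total_count": 8961993, "incomplete_results": true, "items": []}'
response_dict = json.loads(body)

print(response_dict.keys())
print(response_dict['total_count'])
```

Note that JSON's true becomes Python's True during this conversion, which is why we can use not on the 'incomplete_results' value later.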

Working with the Response Dictionary

With the information from the API call represented as a dictionary, we can work with the data stored there. Let’s generate some output that summarizes the information. This is a good way to make sure we received the information we expected, and to start examining the information we’re interested in:

python_repos.py
import requests

# Make an API call and store the response.
# --snip--

# Convert the response object to a dictionary.
response_dict = r.json()
print(f"Total repositories: {response_dict['total_count']}")     (1)
print(f"Complete results: {not response_dict['incomplete_results']}")

# Explore information about the repositories.
repo_dicts = response_dict['items']                              (2)
print(f"Repositories returned: {len(repo_dicts)}")

# Examine the first repository.
repo_dict = repo_dicts[0]                                        (3)
print(f"Keys: {len(repo_dict)}")                                 (4)
for key in sorted(repo_dict.keys()):                             (5)
    print(key)
1 We start exploring the response dictionary by printing the value associated with 'total_count', which represents the total number of Python repositories returned by this API call. We also use the value associated with 'incomplete_results', so we’ll know if GitHub was able to fully process the query. Rather than printing this value directly, we print its opposite: a value of True will indicate that we received a complete set of results.
2 The value associated with 'items' is a list containing a number of dictionaries, each of which contains data about an individual Python repository. We assign this list of dictionaries to repo_dicts. We then print the length of repo_dicts to see how many repositories we have information for.
3 To look closer at the information returned about each repository, we pull out the first item from repo_dicts and assign it to repo_dict.
4 We then print the number of keys in the dictionary to see how much information we have.
5 Finally, we print all the dictionary’s keys to see what kind of information is included.

The results give us a clearer picture of the actual data:

Status code: 200
Total repositories: 248           (1)
Complete results: True            (2)
Repositories returned: 30

Keys: 78                          (3)
allow_forking
archive_url
archived
--snip--
url
visibility
watchers
watchers_count
1 At the time of this writing, there are only 248 Python repositories with over 10,000 stars.
2 We can see that GitHub was able to fully process the API call. In this response, GitHub returned information about the first 30 repositories that match the conditions of our query. If we want more repositories, we can request additional pages of data.
3 GitHub’s API returns a lot of information about each repository: there are 78 keys in repo_dict. When you look through these keys, you’ll get a sense of the kind of information you can extract about a project.
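The additional pages mentioned above are requested through the search API's page and per_page parameters. The page_url() helper here is a hypothetical name, not part of Requests or GitHub's API; it just appends those parameters to the query string:

```python
base_url = "https://api.github.com/search/repositories"
query = "q=language:python+sort:stars+stars:>10000"

def page_url(page, per_page=30):
    """Build the URL for one page of search results (hypothetical helper)."""
    return f"{base_url}?{query}&page={page}&per_page={per_page}"

# The second page of results, 30 repositories per page.
print(page_url(2))
```

A loop over increasing page numbers would then let you collect all 248 repositories rather than just the first 30.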

Let’s pull out the values for some of the keys in repo_dict:

python_repos.py
# --snip--
# Examine the first repository.
repo_dict = repo_dicts[0]

print("Selected information about first repository:")
print(f"Name: {repo_dict['name']}")                            (1)
print(f"Owner: {repo_dict['owner']['login']}")                 (2)
print(f"Stars: {repo_dict['stargazers_count']}")               (3)
print(f"Repository: {repo_dict['html_url']}")
print(f"Created: {repo_dict['created_at']}")                   (4)
print(f"Updated: {repo_dict['updated_at']}")                   (5)
print(f"Description: {repo_dict['description']}")
1 We start with the name of the project.
2 An entire dictionary represents the project’s owner, so we use the key owner to access the dictionary representing the owner, and then use the key login to get the owner’s login name.
3 Next, we print how many stars the project has earned and the URL for the project’s GitHub repository.
4 We then show when it was created.
5 And when it was last updated. Finally, we print the repository’s description.

The output should look something like this:

Status code: 200
Total repositories: 248
Complete results: True
Repositories returned: 30

Selected information about first repository:
Name: public-apis
Owner: public-apis
Stars: 191493
Repository: https://github.com/public-apis/public-apis
Created: 2016-03-20T23:49:42Z
Updated: 2022-05-12T06:37:11Z
Description: A collective list of free APIs

We can see that the most-starred Python project on GitHub as of this writing is public-apis. Its owner is an organization with the same name, and it has been starred by almost 200,000 GitHub users.
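The created and updated values are ISO 8601 timestamps. If you want to work with them as dates, one approach uses the standard library's datetime module; before Python 3.11, fromisoformat() doesn't accept the trailing Z, so we replace it with an explicit UTC offset first:

```python
from datetime import datetime

created = "2016-03-20T23:49:42Z"  # value from the sample output above

# Convert GitHub's ISO 8601 timestamp to a datetime object.
created_dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
print(f"Created in {created_dt.year}")
```

Once you have a datetime object, you can sort repositories by age or compute how long a project has been maintained.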

Summarizing the Top Repositories

When we make a visualization for this data, we’ll want to include more than one repository. Let’s write a loop to print selected information about each repository the API call returns so we can include them all in the visualization:

python_repos.py
# --snip--
# Explore information about the repositories.
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

print("\nSelected information about each repository:")   (1)
for repo_dict in repo_dicts:                             (2)
    print(f"Name: {repo_dict['name']}")
    print(f"Owner: {repo_dict['owner']['login']}")
    print(f"Stars: {repo_dict['stargazers_count']}")
    print(f"Repository: {repo_dict['html_url']}")
    print(f"Description: {repo_dict['description']}")
1 We first print an introductory message.
2 Then we loop through all the dictionaries in repo_dicts. Inside the loop, we print the name of each project, its owner, how many stars it has, its URL on GitHub, and the project’s description.

Some interesting projects appear in these results, and it might be worth looking at a few. But don’t spend too much time here, because we’re about to create a visualization that will make the results much easier to read.

Monitoring API Rate Limits

Most APIs have rate limits, which means there’s a limit to how many requests you can make in a certain amount of time. To see if you’re approaching GitHub’s limits, enter api.github.com/rate_limit into a web browser. You should see a response that begins like this:

{
    "resources": {
        --snip--
        "search": {              (1)
            "limit": 10,         (2)
            "remaining": 9,      (3)
            "reset": 1652338832, (4)
            "used": 1,
            "resource": "search"
        },
        --snip--
1 The information we’re interested in is the rate limit for the search API.
2 We see that the limit is 10 requests per minute.
3 We have 9 requests remaining for the current minute.
4 The value associated with the key "reset" represents the time in Unix or epoch time (the number of seconds since midnight on January 1, 1970) when our quota will reset. If you reach your quota, you’ll get a short response that lets you know you’ve reached the API limit. If you reach the limit, just wait until your quota resets.

Many APIs require you to register and obtain an API key or access token to make API calls. As of this writing, GitHub has no such requirement, but if you obtain an access token, your limits will be much higher.
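If you want the reset time in a human-readable form, you can convert the epoch value with the datetime module. This sketch uses the "reset" value from the sample response above:

```python
from datetime import datetime, timezone

reset = 1652338832  # "reset" value from the sample rate-limit response

# Convert epoch seconds to a readable UTC timestamp.
reset_dt = datetime.fromtimestamp(reset, tz=timezone.utc)
print(f"Quota resets at {reset_dt:%Y-%m-%d %H:%M:%S} UTC")
```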

Visualizing Repositories Using Plotly

Let’s make a visualization using the data we’ve gathered to show the relative popularity of Python projects on GitHub. We’ll make an interactive bar chart: the height of each bar will represent the number of stars the project has acquired, and you’ll be able to click the bar’s label to go to that project’s home on GitHub.

Save a copy of the program we’ve been working on as python_repos_visual.py, then modify it so it reads as follows:

python_repos_visual.py
import requests
import plotly.express as px

# Make an API call and check the response.
url = "https://api.github.com/search/repositories"
url += "?q=language:python+sort:stars+stars:>10000"

headers = {"Accept": "application/vnd.github.v3+json"}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")   (1)

# Process overall results.
response_dict = r.json()
print(f"Complete results: {not response_dict['incomplete_results']}")  (2)

# Process repository information.
repo_dicts = response_dict['items']
repo_names, stars = [], []               (3)
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make visualization.
fig = px.bar(x=repo_names, y=stars)     (4)
fig.show()
1 We import Plotly Express and then make the API call as we have been doing. We continue to print the status of the API call response so we’ll know if there is a problem.
2 When we process the overall results, we continue to print the message confirming that we got a complete set of results. We remove the rest of the print() calls because we’re no longer in the exploratory phase; we know we have the data we want.
3 We then create two empty lists to store the data we’ll include in the initial chart. We’ll need the name of each project to label the bars (repo_names) and the number of stars to determine the height of the bars (stars). In the loop, we append the name of each project and the number of stars it has to these lists.
4 We make the initial visualization with just two lines of code. This is consistent with Plotly Express’s philosophy that you should be able to see your visualization as quickly as possible before refining its appearance. Here we use the px.bar() function to create a bar chart.

We can see that the first few projects are significantly more popular than the rest, but all of them are important projects in the Python ecosystem.

Styling the Chart

Plotly supports a number of ways to style and customize the plots, once you know the information in the plot is correct. We’ll make some changes in the initial px.bar() call and then make some further adjustments to the fig object after it’s been created.

We’ll start styling the chart by adding a title and labels for each axis:

python_repos_visual.py
# --snip--
# Make visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_names, y=stars, title=title, labels=labels)

fig.update_layout(title_font_size=28, xaxis_title_font_size=20,  (1)
    yaxis_title_font_size=20)

fig.show()
1 We first add a title and labels for each axis, as we did in Chapters 15 and 16. We then use the fig.update_layout() method to modify specific elements of the chart. Plotly uses a convention where aspects of a chart element are connected by underscores. As you become familiar with Plotly’s documentation, you’ll start to see consistent patterns in how different elements of a chart are named and modified. Here we set the title font size to 28 and the font size for each axis title to 20.

Adding Custom Tooltips

In Plotly, you can hover the cursor over an individual bar to show the information the bar represents. This is commonly called a tooltip, and in this case, it currently shows the number of stars a project has. Let’s create a custom tooltip to show each project’s description as well as the project’s owner.

We need to pull some additional data to generate the tooltips:

python_repos_visual.py
# --snip--
# Process repository information.
repo_dicts = response_dict['items']
repo_names, stars, hover_texts = [], [], []   (1)
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

    # Build hover texts.
    owner = repo_dict['owner']['login']        (2)
    description = repo_dict['description']
    hover_text = f"{owner}<br />{description}" (3)
    hover_texts.append(hover_text)

# Make visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_names, y=stars, title=title, labels=labels,
    hover_name=hover_texts)                    (4)

fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
    yaxis_title_font_size=20)

fig.show()
1 We first define a new empty list, hover_texts, to hold the text we want to display for each project.
2 In the loop where we process the data, we pull the owner and the description for each project.
3 Plotly allows you to use HTML code within text elements, so we generate a string for the label with a line break (<br />) between the project owner’s username and the description. We then append this label to the list hover_texts.
4 In the px.bar() call, we add the hover_name argument and pass it hover_texts. This is the same approach we used to customize the label for each dot in the map of global earthquake activity. As Plotly creates each bar, it will pull labels from this list and only display them when the viewer hovers over a bar.

Because Plotly allows you to use HTML on text elements, we can easily add links to a chart. Let’s use the x-axis labels as a way to let the viewer visit any project’s home page on GitHub. We need to pull the URLs from the data and use them when generating the x-axis labels:

python_repos_visual.py
# --snip--
# Process repository information.
repo_dicts = response_dict['items']
repo_links, stars, hover_texts = [], [], []    (1)
for repo_dict in repo_dicts:
    # Turn repo names into active links.
    repo_name = repo_dict['name']
    repo_url = repo_dict['html_url']           (2)
    repo_link = f"<a href='{repo_url}'>{repo_name}</a>"  (3)
    repo_links.append(repo_link)

    stars.append(repo_dict['stargazers_count'])
    # --snip--

# Make visualization.
title = "Most-Starred Python Projects on GitHub"
labels = {'x': 'Repository', 'y': 'Stars'}
fig = px.bar(x=repo_links, y=stars, title=title, labels=labels,
    hover_name=hover_texts)

fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
    yaxis_title_font_size=20)

fig.show()
1 We update the name of the list we’re creating from repo_names to repo_links to more accurately communicate the kind of information we’re putting together for the chart.
2 We then pull the URL for the project from repo_dict and assign it to the temporary variable repo_url.
3 Next, we generate a link to the project. We use the HTML anchor tag, which has the form <a href='url'>link text</a>, to generate the link. We then append this link to repo_links.

When we call px.bar(), we use repo_links for the x-values in the chart. The result looks the same as before, but now the viewer can click any of the project names at the bottom of the chart to visit that project’s home page on GitHub. Now we have an interactive, informative visualization of data retrieved through an API!

Customizing Marker Colors

Once a chart has been created, almost any aspect of the chart can be customized through an update method. We’ve used the update_layout() method previously. Another method, update_traces(), can be used to customize the data that’s represented on a chart.

Let’s change the bars to a darker blue, with some transparency:

python_repos_visual.py
# --snip--
fig.update_layout(title_font_size=28, xaxis_title_font_size=20,
    yaxis_title_font_size=20)

fig.update_traces(marker_color='SteelBlue', marker_opacity=0.6)

fig.show()

In Plotly, a trace refers to a collection of data on a chart. The update_traces() method can take a number of different arguments; any argument that starts with marker_ affects the markers on the chart. Here we set each marker’s color to 'SteelBlue'; any named CSS color will work here. We also set the opacity of each marker to 0.6. An opacity of 1.0 will be entirely opaque, and an opacity of 0 will be entirely invisible.

More About Plotly and the GitHub API

Plotly’s documentation is extensive and well organized; however, it can be hard to know where to start reading. A good place to start is with the article Plotly Express in Python at plotly.com/python/plotly-express. This is an overview of all the plots you can make with Plotly Express, and you can find links to longer articles about each individual chart type.

If you want to understand how to customize Plotly charts better, the article Styling Plotly Express Figures in Python will expand on what you’ve seen in Chapters 15-17. You can find this article at plotly.com/python/styling-plotly-express.

For more about the GitHub API, refer to its documentation at docs.github.com/en/rest. Here you’ll learn how to pull a wide variety of information from GitHub. If you have a GitHub account, you can work with your own data as well as the publicly available data from other users' repositories.

The Hacker News API

To explore how to use API calls on other sites, let’s take a quick look at Hacker News (news.ycombinator.com). On Hacker News, people share articles about programming and technology and engage in lively discussions about those articles. The Hacker News API provides access to data about all submissions and comments on the site, and you can use the API without having to register for a key.

The following call returns information about the current top article as of this writing:

https://hacker-news.firebaseio.com/v0/item/31353677.json

When you enter this URL in a browser, you’ll see that the text on the page is enclosed by braces, meaning it’s a dictionary. But the response is difficult to examine without some better formatting. Let’s run this URL through the json.dumps() method, like we did in the earthquake project in Chapter 16, so we can explore the kind of information that’s returned about an article:

hn_article.py
import requests
import json

# Make an API call, and store the response.
url = "https://hacker-news.firebaseio.com/v0/item/31353677.json"
r = requests.get(url)
print(f"Status code: {r.status_code}")

# Explore the structure of the data.
response_dict = r.json()
response_string = json.dumps(response_dict, indent=4)
print(response_string)    (1)
1 Everything in this program should look familiar, because we’ve used it all in the previous two chapters. The main difference here is that we can print the formatted response string instead of writing it to a file, because the output is not particularly long.

The output is a dictionary of information about the article with the ID 31353677:

{
    "by": "sohkamyung",
    "descendants": 302,     (1)
    "id": 31353677,
    "kids": [               (2)
        31354987,
        31354235,
        --snip--
    ],
    "score": 785,
    "time": 1652361401,
    "title": "Astronomers reveal first image of the black hole at the heart of our galaxy",  (3)
    "type": "story",
    "url": "https://public.nrao.edu/news/.../",   (4)
}
1 The key "descendants" tells us the number of comments the article has received.
2 The key "kids" provides the IDs of all comments made directly in response to this submission. Each of these comments might have comments of its own as well, so the number of descendants a submission has is usually greater than its number of kids.
3 We can see the title of the article being discussed.
4 And a URL for the article being discussed as well.
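The difference between kids and descendants is easiest to see with a small simulated comment tree: counting descendants means walking the tree recursively. This sketch uses made-up IDs rather than live API calls:

```python
# A simulated comment tree: each ID maps to the IDs of its direct replies.
tree = {
    1: [2, 3],  # submission 1 has two kids (direct comments)
    2: [4, 5],  # comment 2 has two replies of its own
    3: [],
    4: [],
    5: [6],     # comment 5 has one reply
    6: [],
}

def count_descendants(item_id):
    """Count all comments below an item, at any depth."""
    kids = tree[item_id]
    return len(kids) + sum(count_descendants(k) for k in kids)

print(count_descendants(1))  # 5 descendants, but only 2 kids
```

Fetching a real comment tree would work the same way, except each lookup in tree would become an API call for that item's ID.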

The following URL returns a simple list of all the IDs of the current top articles on Hacker News:

https://hacker-news.firebaseio.com/v0/topstories.json

We can use this call to find out which articles are on the home page right now, and then generate a series of API calls similar to the one we just examined. With this approach, we can print a summary of all the articles on the front page of Hacker News at the moment:

hn_submissions.py
from operator import itemgetter

import requests

# Make an API call and check the response.
url = "https://hacker-news.firebaseio.com/v0/topstories.json"   (1)
r = requests.get(url)
print(f"Status code: {r.status_code}")

# Process information about each submission.
submission_ids = r.json()                                        (2)
submission_dicts = []                                            (3)
for submission_id in submission_ids[:30]:
    # Make a new API call for each submission.
    url = f"https://hacker-news.firebaseio.com/v0/item/{submission_id}.json"  (4)
    r = requests.get(url)
    print(f"id: {submission_id}\tstatus: {r.status_code}")
    response_dict = r.json()

    # Build a dictionary for each article.
    submission_dict = {                                          (5)
        'title': response_dict['title'],
        'hn_link': f"https://news.ycombinator.com/item?id={submission_id}",
        'comments': response_dict['descendants'],
    }
    submission_dicts.append(submission_dict)                     (6)

submission_dicts = sorted(submission_dicts,                      (7)
    key=itemgetter('comments'), reverse=True)

for submission_dict in submission_dicts:                         (8)
    print(f"\nTitle: {submission_dict['title']}")
    print(f"Discussion link: {submission_dict['hn_link']}")
    print(f"Comments: {submission_dict['comments']}")
1 First, we make an API call and print the status of the response. This API call returns a list containing the IDs of up to 500 of the most popular articles on Hacker News at the time the call is issued.
2 We then convert the response object to a Python list, which we assign to submission_ids. We’ll use these IDs to build a set of dictionaries, each of which contains information about one of the current submissions.
3 We set up an empty list called submission_dicts to store these dictionaries. We then loop through the IDs of the top 30 submissions.
4 We make a new API call for each submission by generating a URL that includes the current value of submission_id. We print the status of each request along with its ID, so we can see whether it’s successful.
5 Next, we create a dictionary for the submission currently being processed. We store the title of the submission, a link to the discussion page for that item, and the number of comments the article has received so far.
6 Then we append each submission_dict to the list submission_dicts.
7 Each submission on Hacker News is ranked according to an overall score based on a number of factors, including how many times it’s been voted on, how many comments it’s received, and how recent the submission is. We want to sort the list of dictionaries by the number of comments. To do this, we use a function called itemgetter(), which comes from the operator module. We pass this function the key 'comments', and it pulls the value associated with that key from each dictionary in the list. The sorted() function then uses this value as its basis for sorting the list. We sort the list in reverse order, to place the most-commented stories first.
8 Once the list is sorted, we loop through the list and print out three pieces of information about each of the top submissions: the title, a link to the discussion page, and the number of comments the submission currently has.
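To see the itemgetter() sort in isolation, here's the same pattern applied to a small hard-coded list, using the three submissions from the sample output that follows:

```python
from operator import itemgetter

submission_dicts = [
    {'title': 'Modern JavaScript Tutorial', 'comments': 20},
    {'title': "Fly.io: The reclaimer of Heroku's magic", 'comments': 134},
    {'title': 'The weird Hewlett Packard FreeDOS option', 'comments': 64},
]

# itemgetter('comments') pulls the value for 'comments' from each dict;
# sorted() then orders the list by that value, largest first.
submission_dicts = sorted(submission_dicts,
    key=itemgetter('comments'), reverse=True)

print([d['comments'] for d in submission_dicts])
```

Back in hn_submissions.py, the same two lines order the live results the moment the loop finishes.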

The output should look something like this:

Status code: 200
id: 31390506    status: 200
id: 31389893    status: 200
id: 31390742    status: 200
--snip--

Title: Fly.io: The reclaimer of Heroku's magic
Discussion link: https://news.ycombinator.com/item?id=31390506
Comments: 134

Title: The weird Hewlett Packard FreeDOS option
Discussion link: https://news.ycombinator.com/item?id=31389893
Comments: 64

Title: Modern JavaScript Tutorial
Discussion link: https://news.ycombinator.com/item?id=31390742
Comments: 20
--snip--

You would use a similar process to access and analyze information with any API. With this data, you could make a visualization showing which submissions have inspired the most active recent discussions. This is also the basis for apps that provide a customized reading experience for sites like Hacker News. To learn more about what kind of information you can access through the Hacker News API, visit the documentation page at github.com/HackerNews/API.

Hacker News sometimes allows companies it supports to make special hiring posts, and comments are disabled on these posts. If you run this program while one of these posts is present, you’ll get a KeyError. If this causes an issue, you can wrap the code that builds submission_dict in a try-except block and skip over these posts.
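That try-except block might look something like the following sketch, shown here with a simulated response dict that's missing the 'descendants' key, as it would be for a hiring post:

```python
# Simulated response for a hiring post with comments disabled:
# there is no 'descendants' key.
response_dict = {'title': 'Example hiring post', 'id': 12345}
submission_id = response_dict['id']

submission_dicts = []
try:
    submission_dict = {
        'title': response_dict['title'],
        'hn_link': f"https://news.ycombinator.com/item?id={submission_id}",
        'comments': response_dict['descendants'],
    }
except KeyError:
    # Comments are disabled on this post; skip it.
    print(f"Skipping {submission_id}: no comment data available.")
else:
    submission_dicts.append(submission_dict)

print(f"Kept {len(submission_dicts)} submissions.")
```

In hn_submissions.py, the same try-except would wrap the dictionary-building step inside the loop, so one unusual post doesn't crash the whole run.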

Try It Yourself

17-1. Other Languages: Modify the API call in python_repos.py so it generates a chart showing the most popular projects in other languages. Try languages such as JavaScript, Ruby, C, Java, Perl, Haskell, and Go.

17-2. Active Discussions: Using the data from hn_submissions.py, make a bar chart showing the most active discussions currently happening on Hacker News. The height of each bar should correspond to the number of comments each submission has. The label for each bar should include the submission’s title and act as a link to the discussion page for that submission. If you get a KeyError when creating a chart, use a try-except block to skip over the promotional posts.

17-3. Testing python_repos.py: In python_repos.py, we printed the value of status_code to make sure the API call was successful. Write a program called test_python_repos.py that uses pytest to assert that the value of status_code is 200. Figure out some other assertions you can make: for example, that the number of items returned is expected and that the total number of repositories is greater than a certain amount.

17-4. Further Exploration: Visit the documentation for Plotly and either the GitHub API or the Hacker News API. Use some of the information you find there to either customize the style of the plots we’ve already made or pull some different information and create your own visualizations. If you’re curious about exploring other APIs, take a look at the APIs mentioned in the GitHub repository at github.com/public-apis.

Summary

In this chapter, you learned how to use APIs to write self-contained programs that automatically gather the data they need and use that data to create a visualization. You used the GitHub API to explore the most-starred Python projects on GitHub, and you also looked briefly at the Hacker News API. You learned how to use the Requests package to automatically issue an API call and how to process the results of that call. You also learned about some Plotly settings that further customize the appearance of the charts you generate.

In the next chapter, you’ll use Django to build a web application as your final project.

Applied Exercises: Ch 17 — Working with APIs

These exercises apply the chapter’s patterns — API calls with requests, JSON response parsing, rate limit awareness, nested dict extraction, list building, itemgetter() sorting, and bar chart generation — to infrastructure, security, and language learning contexts. Where live API access isn’t available, exercises use simulated response dicts.
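Most of the exercises below reuse the same skeleton: a simulated response dict shaped like a real API's JSON, list building via extraction, and itemgetter() sorting. A minimal template, with field names mirroring the GitHub API and made-up repo data:

```python
from operator import itemgetter

# Hypothetical simulated response; field names mirror the GitHub API.
response_dict = {
    'total_count': 2,
    'incomplete_results': False,
    'items': [
        {'name': 'proxmox-scripts', 'stargazers_count': 120,
         'open_issues_count': 4},
        {'name': 'pihole-tools', 'stargazers_count': 310,
         'open_issues_count': 9},
    ],
}

# Nested dict extraction and list building, as in python_repos.py.
names = [item['name'] for item in response_dict['items']]
stars = [item['stargazers_count'] for item in response_dict['items']]

# itemgetter() sorting, descending, as in hn_submissions.py.
by_issues = sorted(response_dict['items'],
                   key=itemgetter('open_issues_count'), reverse=True)

for repo in by_issues:
    print(f"{repo['name']}: {repo['open_issues_count']} open issues")
```

Each exercise swaps in its own field names and sort key but keeps this overall structure.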

Domus Digitalis / Homelab

D17-1. Simulated GitHub API Response: Create a Python dict that mimics the GitHub API response structure — a 'total_count' integer, 'incomplete_results' boolean, and 'items' list of repo dicts (each with 'name', 'stargazers_count', 'html_url', 'owner' with 'login', and 'description'). Use at least 8 items representing homelab projects. Write a program that extracts name, stars, and description into separate lists and prints a summary for each.

D17-2. Homelab Repo Sorting: Extend D17-1: add 'open_issues_count' and 'updated_at' fields to each repo dict. Use itemgetter() to sort the list by 'open_issues_count' in descending order. Print the sorted list showing name, stars, and open issues.

D17-3. Rate Limit Simulator: Write a class called APIRateLimit with limit = 10, remaining = 10, and reset_time (Unix timestamp 60 seconds from now). Add a make_call() method that decrements remaining and raises a RuntimeError with a friendly message when remaining reaches 0. Add a time_until_reset() method. Simulate 12 calls and handle the error gracefully.

D17-4. Hover Text Builder: Using your simulated repo list from D17-1, build a hover_texts list with one entry per repo in the format "{owner}<br />{description}". Also build a repo_links list using the HTML anchor tag format <a href='{url}'>{name}</a>. Print the first 3 items from each list.

D17-5. Multi-API Aggregator: Simulate two API responses — one for GitHub repos and one for Hacker News submissions — each as a Python dict. Write a function merge_results(github_items, hn_items) that returns a combined list of dicts with a source field ('github' or 'hn'), a title field, and a score field. Sort the combined list by score descending using itemgetter(). Print the merged results.

CHLA / ISE / Network Security

C17-1. Simulated ISE API Response: Create a Python dict that mimics a JSON REST API response from ISE — a 'total' count, 'page' info, and 'SearchResult' with a 'resources' list of endpoint dicts (each with 'id', 'name', 'mac', 'status'). Use at least 8 items. Write a program that extracts name, mac, and status into separate lists and prints a summary for each endpoint.

C17-2. Endpoint Sorting: Extend C17-1: add 'last_seen' (ISO timestamp) and 'auth_failures' integer fields to each endpoint dict. Use itemgetter() to sort the list by 'auth_failures' in descending order. Print the sorted list.

C17-3. Rate Limit for Security APIs: Write a class called SecAPILimit with limit = 100, remaining = 100, and a reset_time timestamp. Add make_call(endpoint) that decrements remaining, prints the endpoint called, and raises RuntimeError when exhausted. Simulate 12 calls to different ISE endpoints and handle the error.

C17-4. Alert Hover Text Builder: Using a simulated list of security alerts (each with 'name', 'severity', 'source', 'detail_url'), build a hover_texts list in the format "{severity}<br />{source}" and an alert_links list using the HTML anchor tag format. Print the first 3 items from each list.

C17-5. Monad Pipeline API Aggregator: Simulate two API responses — one for Monad pipeline stages and one for Sentinel alert rules — each as a Python dict. Write merge_pipeline_data(monad_items, sentinel_items) that returns a combined list with source, name, and event_count fields. Sort by event_count descending. Print the merged results.

General Sysadmin / Linux

L17-1. Simulated Package Repo API: Create a Python dict that mimics a package repository API response — a 'total_count' integer and a 'results' list of package dicts (each with 'name', 'version', 'downloads', 'description', 'maintainer'). Use at least 8 packages. Extract name, downloads, and description into separate lists. Print a summary for each.

L17-2. Package Sorting: Extend L17-1: add 'open_bugs' and 'last_updated' fields to each package dict. Use itemgetter() to sort by 'open_bugs' descending. Print the sorted list.

L17-3. Rate Limit for Package API: Write a class called PkgAPILimit with limit = 30, remaining = 30, and a reset_time. Add make_call(pkg_name) that decrements remaining and raises RuntimeError when exhausted. Simulate 35 calls and handle the error.

L17-4. Package Hover Text Builder: Using your simulated package list, build hover_texts in the format "{maintainer}<br />{description}" and pkg_links using HTML anchor tags. Print the first 3 items from each list.

L17-5. Multi-Repo Aggregator: Simulate two package repo API responses (e.g., PyPI and a private repo). Write merge_pkg_results(pypi_items, private_items) that returns a combined list with source, name, and downloads fields. Sort by downloads descending. Print the merged results.

Spanish / DELE C2

E17-1. Simulated Vocabulary API Response: Create a Python dict that mimics a Spanish vocabulary API response — a 'total_count' integer and a 'words' list of word dicts (each with 'palabra', 'tipo', 'definicion', 'ejemplo', 'frecuencia'). Use at least 8 words from Don Quijote. Extract palabra, frecuencia, and definicion into separate lists. Print a summary for each word.

E17-2. Vocabulary Sorting: Extend E17-1: add 'dificultad' (1–5) and 'capitulo' integer fields to each word dict. Use itemgetter() to sort by 'dificultad' descending. Print the sorted list.

E17-3. Rate Limit for Language API: Write a class called LangAPILimit with limit = 20, remaining = 20, and a reset_time. Add make_call(word) that decrements remaining and raises RuntimeError when exhausted. Simulate 25 vocabulary lookups and handle the error gracefully with a user-friendly message.

E17-4. Word Hover Text Builder: Using your simulated word list, build hover_texts in the format "{tipo}<br />{ejemplo}" and word_links using HTML anchor tags pointing to a dictionary URL. Print the first 3 items from each list.

E17-5. Multi-Source Vocabulary Aggregator: Simulate two API responses — one for Don Quijote vocabulary and one for DELE C2 exam word lists. Write merge_vocab(quijote_items, dele_items) that returns a combined list with source, palabra, and frecuencia fields. Sort by frecuencia descending using itemgetter(). Print the merged results.