About: This short document demonstrates how to retrieve structured data from public APIs using R. It gives two examples: we get information about Parliamentarians using the UK Parliament’s APIs, and we plot the hourly temperature forecast for Washington DC using the US National Weather Service’s API. Both examples use the httr2 package and retrieve data in JSON format.
Note: this document was designed for training purposes, is non-exhaustive, and is in progress.
Application Programming Interfaces (APIs) are powerful tools that enable communication between different software systems. In the context of the web, APIs allow users to access structured information from websites without manually scraping or copying content. By making requests to an API, users can efficiently retrieve data in formats easily read by a computer.
One common format is JavaScript Object Notation (JSON), a lightweight and flexible data format. JSON represents data as key-value pairs and supports nested structures, allowing for complex yet well-organized data sets. Below is an example of data in JSON format:
{
  "name": "Keir Starmer",
  "role": "Prime Minister",
  "address": {
    "street": "10 Downing Street",
    "city": "London",
    "postcode": "SW1A 2AA"
  },
  "hobbies": ["politics", "football"]
}
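To show how this structure maps onto R objects, the sketch below parses a copy of the example with the jsonlite package. We assume jsonlite is installed; httr2 relies on it to parse JSON, but this document does not otherwise load it. Each JSON object becomes a named R list. With simplifyVector = FALSE, each JSON array becomes an unnamed list.

```r
# Parse the example JSON into nested R lists (assumes jsonlite is installed)
library(jsonlite)

example_json <- '{
  "name": "Keir Starmer",
  "role": "Prime Minister",
  "address": {
    "street": "10 Downing Street",
    "city": "London",
    "postcode": "SW1A 2AA"
  },
  "hobbies": ["politics", "football"]
}'

# simplifyVector = FALSE keeps every JSON object and array as an R list,
# which is also how httr2's resp_body_json() parses responses by default
person <- fromJSON(example_json, simplifyVector = FALSE)

# Keys become list names; the nested object becomes a nested list
person$name              # "Keir Starmer"
person$address$city      # "London"
person[["hobbies"]][[2]] # "football"
```

The same `$` and `[[ ]]` navigation is used later to dig into the Parliament API's responses.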
The code below uses R’s httr2 package. The package helps users send and handle HTTP requests, making it easier to interact with APIs and web services. The package further provides tools for authentication, request building, response parsing, and error handling.
# Load libraries into memory
library(httr2)
library(dplyr)
In this example, we will retrieve information about the Parliamentarian speaking in the main chamber of the House of Commons or House of Lords at a specific point in time.
Parliament has a number of APIs. We will use two: one for Parliament’s annunciator system (the screens around Parliament showing what is happening in its two chambers) and one which provides information on MPs and Lords.
The annunciator API’s URLs begin with https://now-api.parliament.uk/api/. By appending different paths to this base URL, we can retrieve different information.
# Set base URL
base_url <- 'https://now-api.parliament.uk/api/'
By reading the annunciator API’s documentation, we can see that the base URL should be appended with /Message/message/{annunciator}/{date}.
{annunciator} can be substituted with CommonsMain or LordsMain for the Commons and Lords chambers respectively. {date} accepts either current, for the live status of a chamber, or a specific date and time in ISO 8601 format (e.g. 2025-05-21T09:38:00Z) for its status at that point in time.
For example, the following URL would provide information on what was happening in the House of Lords Chamber at 15:00 on 20 May 2025: https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z
By clicking this link, you can see the data returned by the URL in JSON format. Tip: if you view this data in Microsoft Edge, ticking ‘Pretty-print’ (top left) shows the returned data in a more human-readable format.
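Rather than typing timestamps by hand, the URL can be built from an R date-time. The sketch below uses a hypothetical helper, annunciator_request(), which is not part of httr2 or the Parliament API. It formats a POSIXct time as an ISO 8601 string in UTC, or uses current when no time is given. It only builds the request, so the resulting URL can be checked before anything is sent.

```r
library(httr2)

# Hypothetical helper: build (but do not send) an annunciator request.
# chamber is "CommonsMain" or "LordsMain"; time is a POSIXct date-time,
# or NULL to ask for the live ("current") status
annunciator_request <- function(chamber, time = NULL) {
  date_part <- if (is.null(time)) {
    "current"
  } else {
    # ISO 8601 in UTC, e.g. "2025-05-20T15:00:00Z"
    format(time, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  }
  request("https://now-api.parliament.uk/api/") |>
    req_url_path_append("Message", "message", chamber, date_part)
}

# Rebuild the example URL from a date-time object
past_req <- annunciator_request("LordsMain", as.POSIXct("2025-05-20 15:00:00", tz = "UTC"))
past_req$url

# Live status of the Commons chamber
live_req <- annunciator_request("CommonsMain")
live_req$url
```

Either request can then be sent with req_perform(), as in the code below.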
Using the httr2 package, we append these path segments to the base URL, make an HTTP request, and retrieve the data. If the request succeeds, the response contains a status code of 200 (OK), stored here in parl_response$status_code.
# Appends the URL, makes a request, and retrieves data
parl_response <- request(base_url) |>
req_url_path_append(
'/Message/message',
'LordsMain',
'2025-05-20T15:00:00Z'
) |> req_perform()
# View response
glimpse(parl_response)
## List of 7
## $ method : chr "GET"
## $ url : chr "https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z"
## $ status_code: int 200
## $ headers : <httr2_headers>
## ..$ Date : chr "Wed, 28 May 2025 10:14:37 GMT"
## ..$ Content-Type : chr "application/json; charset=utf-8"
## ..$ Transfer-Encoding : chr "chunked"
## ..$ Connection : chr "keep-alive"
## ..$ Cache-Control : chr "no-store, public, no-cache"
## ..$ Content-Encoding : chr "gzip"
## ..$ Vary : chr "Accept-Encoding,Accept-Encoding"
## ..$ Request-Context : chr "appId=cid-v1:f22bc8d6-2658-4889-af2e-335887f7eed2"
## ..$ CF-Cache-Status : chr "EXPIRED"
## ..$ Last-Modified : chr "Wed, 28 May 2025 10:14:37 GMT"
## ..$ Set-Cookie : chr "__cf_bm=eKvxLp1umpMkxXL7s0W06_QgKGn65kLATnOLMY7k_xE-1748427277-1.0.1.1-aIaVRxrcnBa4Bax1dxnE37XB3ceos0GCLfRcw6Qr"| __truncated__
## ..$ Permissions-Policy : chr "accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), usb=()"
## ..$ Referrer-Policy : chr "strict-origin"
## ..$ Strict-Transport-Security: chr "max-age=2592000"
## ..$ X-Content-Type-Options : chr "nosniff"
## ..$ X-Frame-Options : chr "SAMEORIGIN"
## ..$ X-XSS-Protection : chr "1; mode=block"
## ..$ Server : chr "cloudflare"
## ..$ CF-RAY : chr "946cfc761892cd81-LHR"
## ..$ alt-svc : chr "h3=\":443\"; ma=86400"
## $ body : raw [1:2206] 7b 22 61 6e ...
## $ request :List of 8
## ..$ url : chr "https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z"
## ..$ method : NULL
## ..$ headers : list()
## ..$ body : NULL
## ..$ fields : list()
## ..$ options : list()
## ..$ policies: list()
## ..$ state :<environment: 0x00000290ee5b6ca8>
## ..- attr(*, "class")= chr "httr2_request"
## $ cache :<environment: 0x00000290ee90c688>
## - attr(*, "class")= chr "httr2_response"
In the response object above, the status code of 200 (OK) can be seen in the status_code element. The data we want to retrieve (i.e. details on the status of the Chamber) is stored in the body element as raw bytes, which are not human-readable. Before doing anything with this data, we must first convert it to another format.
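Both elements can be explored offline. The sketch below builds a fake response with httr2's response() function, so no network is needed; its short JSON body is invented for illustration. httr2 also offers helper functions for reading the status. Note that, by default, req_perform() raises an R error for 4xx and 5xx status codes, so code that runs after it has usually received a successful response. The body's raw bytes are simply the JSON text encoded as UTF-8.

```r
library(httr2)

# A fake response built locally; its body holds JSON text as raw bytes,
# like parl_response$body above (the JSON itself is invented)
json_text <- '{"annunciatorType": "LordsMain", "showLordsBell": false}'
fake_resp <- response(
  status_code = 200,
  headers = list("Content-Type" = "application/json; charset=utf-8"),
  body = charToRaw(json_text)
)

# Status helpers: these read the same information as fake_resp$status_code
resp_status(fake_resp)      # 200
resp_status_desc(fake_resp) # "OK"
resp_is_error(fake_resp)    # FALSE

# The first four bytes (7b 22 61 6e) are the characters {"an
fake_resp$body[1:4]

# Decoding the bytes gives back the original JSON text
rawToChar(fake_resp$body)
resp_body_string(fake_resp)
```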
req_perform() above returns the server’s response as an httr2_response object. We parse the JSON content of this response into a nested R list using the resp_body_json() function as follows:
parl_response_processed <- parl_response |>
resp_body_json() |>
glimpse()
## List of 9
## $ annunciatorDisabled: logi FALSE
## $ id : int 143768
## $ slides :List of 1
## ..$ :List of 8
## .. ..$ lines :List of 4
## .. ..$ type : chr "Generic"
## .. ..$ carouselOrder : int 1
## .. ..$ carouselDisplaySeconds: NULL
## .. ..$ speakerTime : chr "2025-05-20T14:57:00"
## .. ..$ slideTime : chr "2025-05-20T14:57:59.746965"
## .. ..$ soundToPlay : chr "NewSlide"
## .. ..$ id : int 0
## $ scrollingMessages : list()
## $ annunciatorType : chr "LordsMain"
## $ publishTime : chr "2025-05-20T14:57:59.369"
## $ isSecurityOverride : logi FALSE
## $ showCommonsBell : logi FALSE
## $ showLordsBell : logi FALSE
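By default, resp_body_json() keeps every JSON array as an R list. This is why slides and lines above appear as lists that must be navigated with [[ ]]. Extra arguments are passed on to jsonlite, and setting simplifyVector = TRUE asks it to simplify arrays, turning an array of objects into a data frame. The sketch below shows the difference on an invented JSON body wrapped in a fake response.

```r
library(httr2)

# Hypothetical JSON: an array of two objects
json_text <- '[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]'
fake_resp <- response(
  headers = list("Content-Type" = "application/json"),
  body = charToRaw(json_text)
)

# Default: the array stays a list of lists
as_list <- resp_body_json(fake_resp)
as_list[[2]][["name"]] # "B"

# simplifyVector = TRUE: the array of objects becomes a data frame
as_df <- resp_body_json(fake_resp, simplifyVector = TRUE)
as_df
```

Simplification is convenient for flat, table-like data. For deeply nested responses like the annunciator's, the default list form is often easier to navigate predictably.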
The parsed data contains many fields. In this example, we are interested in the Peer speaking in the Chamber at the specified time and their unique ID number. By exploring the parsed list and reading the API’s documentation, we can navigate the nested data and extract these fields as follows:
# Get Peer name and unique ID (held here in the second line of the first slide)
peer_name <- parl_response_processed[["slides"]][[1]][["lines"]][[2]][["member"]][["nameFullTitle"]]
peer_id <- parl_response_processed[["slides"]][[1]][["lines"]][[2]][["member"]][["id"]]
# Output name and ID to screen
cat("Name: ", peer_name, ". ID: ", peer_id, ".", sep ="")
## Name: The Rt Hon. the Baroness Anelay of St Johns DBE. ID: 3474.
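Note: the pluck() function from the purrr package provides an alternative, safer way to extract this information. It returns the .default value, rather than NULL or an error, if part of the path is missing, e.g.:
purrr::pluck(parl_response_processed, "slides", 1, "lines", 2, "member", "id", .default = "Value not found")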
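The code above also assumes the member's details always sit in the second line of the first slide. The sketch below makes this more robust with purrr's detect(), which returns the first line that actually has a member field. It runs on a small hypothetical list that copies the shape used above (slides, lines, member, nameFullTitle, id). The line contents, name and ID are invented.

```r
library(purrr)

# Hypothetical parsed response with the same shape as parl_response_processed
mock <- list(
  slides = list(
    list(lines = list(
      list(content = "Business"), # invented non-member line
      list(member = list(nameFullTitle = "Example Peer", id = 1234L))
    ))
  )
)

# pluck() returns .default instead of failing when a path is missing
pluck(mock, "slides", 1, "lines", 2, "member", "id", .default = NA) # 1234
pluck(mock, "slides", 1, "lines", 5, "member", "id", .default = NA) # NA

# Take the first line that contains a member, wherever it appears
member_line <- pluck(mock, "slides", 1, "lines") |>
  detect(\(line) !is.null(line[["member"]]))
member_line$member$nameFullTitle # "Example Peer"
```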
Using this unique ID, we can get further information on the Peer using a second API, the Members API. As per the API’s documentation, there are ~20 API routes available, each beginning https://members-api.parliament.uk/api/Members.
The route {id}/WrittenQuestions takes a Parliamentarian’s unique ID and returns a list of the written questions tabled by that member.
# Set base URL
member_base_url <- "https://members-api.parliament.uk/api/Members"
# Append base URL, perform request, parse response
questions <- request(member_base_url) |>
req_url_path_append(
as.character(peer_id),
'WrittenQuestions'
) |>
req_perform() |>
resp_body_json()
# Get total number of questions from JSON response
total_questions <- questions$totalResults
# Print Peer name and total number of questions to screen
cat(peer_name, "has asked", total_questions, "written questions.")
## The Rt Hon. the Baroness Anelay of St Johns DBE has asked 151 written questions.
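To fetch this count for any member, the two steps can be wrapped in functions. The sketch below uses two hypothetical helpers. written_questions_request() only builds the request, so its URL can be checked offline. count_written_questions() sends the request and reads totalResults, as in the code above. It needs network access.

```r
library(httr2)
library(purrr)

# Hypothetical helper: build the WrittenQuestions request for a member ID
written_questions_request <- function(member_id) {
  request("https://members-api.parliament.uk/api/Members") |>
    req_url_path_append(as.character(member_id), "WrittenQuestions")
}

# Hypothetical helper: send the request and return totalResults (NA if absent)
count_written_questions <- function(member_id) {
  written_questions_request(member_id) |>
    req_perform() |>
    resp_body_json() |>
    pluck("totalResults", .default = NA)
}

# Build only (no network call) and inspect the URL
written_questions_request(3474)$url

# Requires network access:
# count_written_questions(3474)
```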
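In this second example, hourly forecast data for Washington DC is retrieved from the US National Weather Service (NWS) API using httr2, processed using tidyverse packages, and plotted using ggplot2. This takes two requests. A request to the points route for a latitude and longitude returns, among other things, the URL of that location’s hourly forecast, which we then request separately.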
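# Load required libraries
library(httr2)
library(tidyverse) # purrr, tibble, dplyr, ggplot2 and more
library(lubridate) # for ymd_hms(); attached by tidyverse only from version 2.0.0
library(ggplot2)   # already attached by tidyverse; listed for clarity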
# Set base URL
NWS_base_url <- 'https://api.weather.gov'
# Append base URL, make request, retrieve data, convert to R object, extract associated API URL
forecast_url <- request(NWS_base_url) |>
req_url_path_append(
'points',
'38.8894,-77.0352'
) |> req_perform() |>
resp_body_json() |>
pluck('properties', 'forecastHourly')
# Create and perform a new request using extracted API URL
forecast <- request(forecast_url) |>
req_perform() |>
resp_body_json() |>
# Extract time course data
pluck('properties', 'periods') |>
# Convert list into a structured dataframe with columns for time, temp, etc.
map_dfr(
\(x) {
tibble(
time = x |> pluck('startTime'),
temp_F = x |> pluck('temperature'),
rain_prob = x |> pluck('probabilityOfPrecipitation', 'value'),
forecast = x |> pluck('shortForecast')
)
}
) |>
# Convert the ISO 8601 time strings into date-time (POSIXct) objects
mutate(
time = time |> ymd_hms()
)
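The list-to-data-frame step can be tried offline. The sketch below applies the same per-period extraction to two hypothetical periods. Their values are invented, and their time strings use an ISO 8601 format with a UTC offset, which we assume matches what the API returns. It uses map() followed by list_rbind(), which purrr 1.0.0 recommends in place of the superseded map_dfr().

```r
library(purrr)
library(tibble)
library(dplyr)
library(lubridate)

# Two hypothetical hourly periods containing the fields used above
periods <- list(
  list(startTime = "2025-05-28T06:00:00-04:00", temperature = 61L,
       probabilityOfPrecipitation = list(value = 10L), shortForecast = "Cloudy"),
  list(startTime = "2025-05-28T07:00:00-04:00", temperature = 63L,
       probabilityOfPrecipitation = list(value = 20L), shortForecast = "Light Rain")
)

forecast_demo <- periods |>
  # One single-row tibble per period
  map(\(x) tibble(
    time = pluck(x, "startTime"),
    temp_F = pluck(x, "temperature"),
    rain_prob = pluck(x, "probabilityOfPrecipitation", "value"),
    forecast = pluck(x, "shortForecast")
  )) |>
  # Stack the rows into one data frame
  list_rbind() |>
  # Parse the time strings into POSIXct date-times
  mutate(time = ymd_hms(time))

forecast_demo
```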
# Plot data using ggplot
ggplot(forecast, aes(x = time, y = temp_F)) +
geom_line(color = "blue") +
labs(title = "Washington DC Temperature",
x = "Date",
y = "Temperature (°F)") +
theme_minimal()
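The NWS API documentation asks clients to identify their application with a User-Agent header, ideally including contact details. httr2 sends a default user agent, which is why the code above works, but you can set your own with req_user_agent(). In the sketch below, the identification string is a placeholder to replace with your own. The request is built but not sent.

```r
library(httr2)

# Placeholder identification string: replace with your own app name and contact
nws_request <- request("https://api.weather.gov") |>
  req_user_agent("my-weather-demo (me@example.com)") |>
  req_url_path_append("points", "38.8894,-77.0352")

# Pass the request to req_perform() to send it
nws_request$url
```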