Getting Data via Web APIs with R

Background
- APIs and JSON
- httr2
Example 1: Getting Parliamentary data
Example 2: plotting weather data from an API
Further reading

About: This short document demonstrates how to retrieve structured data from public APIs using R. In the first example, we get information about Parliamentarians using the UK Parliament’s API; in the second, we plot the forecast hourly temperature in Washington DC using the National Weather Service’s API. Both examples use the httr2 package and retrieve data in JSON format.

Note: this document was designed for training purposes, is non-exhaustive, and is in progress.

Background

APIs and JSON

Application Programming Interfaces (APIs) are powerful tools that enable communication between different software systems. In the context of the web, APIs allow users to access structured information from websites without manually scraping or copying content. By making requests to an API, users can efficiently retrieve data in formats easily read by a computer.

One common format is JavaScript Object Notation (JSON) - a lightweight and flexible data format. JSON represents data as key-value pairs and supports nested structures, allowing for complex yet well-organized data sets. Below is an example of data in JSON format:

{
  "name": "Keir Starmer",
  "role": "Prime Minister",
  "address": {
    "street": "10 Downing Street",
    "city": "London",
    "postcode": "SW1A 2AA"
  },
  "hobbies": ["politics", "football"]
}
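
As a minimal sketch of how such JSON maps onto R objects, the snippet below parses a shortened copy of the example with the jsonlite package (assumed to be installed; it is not otherwise loaded in this document). Keys become list names and nested objects become nested lists.

```r
library(jsonlite)

# A shortened copy of the JSON example, held as a string
json_text <- '{
  "role": "Prime Minister",
  "address": {
    "street": "10 Downing Street",
    "city": "London",
    "postcode": "SW1A 2AA"
  },
  "hobbies": ["politics", "football"]
}'

# simplifyVector = FALSE keeps JSON arrays as R lists rather than vectors
person <- fromJSON(json_text, simplifyVector = FALSE)

person$role              # a top-level value
person$address$postcode  # a value inside the nested "address" object
person$hobbies[[2]]      # the second element of the "hobbies" array
```

Keeping arrays as lists mirrors the nested-list structure worked with later in this document.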

httr2

The code below uses R’s httr2 package. The package helps users send and handle HTTP requests, making it easier to interact with APIs and web services. The package further provides tools for authentication, request building, response parsing, and error handling.

# Load libraries into memory
library(httr2)
library(dplyr)
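
To give a rough feel for the request-building tools mentioned above, the sketch below assembles a request step by step without sending it. The example.org URL, the query parameter, and the contact address in the user agent are placeholders, not part of any real API.

```r
library(httr2)

req <- request("https://example.org/api") |>
  req_url_path_append("items", "42") |>                 # add path segments
  req_url_query(format = "json") |>                     # add ?format=json
  req_user_agent("api-training (me@example.org)") |>    # identify the client
  req_timeout(10) |>                                    # give up after 10 seconds
  req_retry(max_tries = 3)                              # allow up to 3 attempts

# Nothing is sent until req_perform() is called, so the request
# can be inspected safely first
req$url
```

Building the request separately from performing it makes it easier to check the URL before any network traffic occurs.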

Example 1: Getting Parliamentary data

In this example, we will retrieve information about the Parliamentarian speaking in the main chamber of the House of Commons or House of Lords at a specific point in time.

Base URL

Parliament has a number of APIs. We will use two - one for Parliament’s annunciator system (the screens around Parliament showing what is happening in its two chambers) and one which provides information on MPs and Lords.

The URLs for the annunciator API begin https://now-api.parliament.uk/api/ (the members API, used later, has its own base URL). By appending different paths to this base URL, we can retrieve different information.

# Set base URL
base_url <- 'https://now-api.parliament.uk/api/'

Appending to the base URL and making a request

By reading the annunciator API’s documentation, we can see that the path /Message/message/{annunciator}/{date} should be appended to the base URL.

{annunciator} can be substituted with CommonsMain or LordsMain for the Commons and Lords chambers respectively. {date} can accept current for the live status of a chamber or, alternatively, a specific date and time in ISO 8601 format (e.g. 2025-05-21T09:38:00Z) for its status at that point in time.

For example, the following URL would provide information on what was happening in the House of Lords Chamber at 15:00 on 20 May 2025: https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z. By clicking this link, you can see the data returned by the URL in JSON format. Tip: if viewing this data in Microsoft Edge, ticking ‘pretty-print’ (top left), shows the returned data in a more human-readable format.
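
The path template described above can be wrapped in a small helper. This is a sketch only: annunciator_request() is a hypothetical name introduced here, and the function builds the request without sending it so that the URL can be checked first.

```r
library(httr2)

# Hypothetical helper: build (but do not send) an annunciator request
annunciator_request <- function(annunciator = c("CommonsMain", "LordsMain"),
                                date = "current") {
  # Reject anything other than the two documented chamber names
  annunciator <- match.arg(annunciator)

  request("https://now-api.parliament.uk/api/") |>
    req_url_path_append("Message", "message", annunciator, date)
}

# Live status of the Commons chamber
live_commons <- annunciator_request("CommonsMain")
live_commons$url

# Status of the Lords chamber at a specific point in time;
# pass the result to req_perform() when ready to send it
lords_req <- annunciator_request("LordsMain", "2025-05-20T15:00:00Z")
```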

Using the httr2 package, we append these fields to the URL, make an HTTP request, and retrieve the data. If the request is successful, the response should contain a status of 200 (OK) (in this instance in parl_response$status_code).

# Append the path to the base URL, make a request, and retrieve data
parl_response <- request(base_url) |>
  req_url_path_append(
    '/Message/message',
    'LordsMain',
    '2025-05-20T15:00:00Z'
  ) |>
  req_perform()

# View response
glimpse(parl_response)

## List of 7
##  $ method     : chr "GET"
##  $ url        : chr "https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z"
##  $ status_code: int 200
##  $ headers    : <httr2_headers>
##   ..$ Date                     : chr "Wed, 28 May 2025 10:14:37 GMT"
##   ..$ Content-Type             : chr "application/json; charset=utf-8"
##   ..$ Transfer-Encoding        : chr "chunked"
##   ..$ Connection               : chr "keep-alive"
##   ..$ Cache-Control            : chr "no-store, public, no-cache"
##   ..$ Content-Encoding         : chr "gzip"
##   ..$ Vary                     : chr "Accept-Encoding,Accept-Encoding"
##   ..$ Request-Context          : chr "appId=cid-v1:f22bc8d6-2658-4889-af2e-335887f7eed2"
##   ..$ CF-Cache-Status          : chr "EXPIRED"
##   ..$ Last-Modified            : chr "Wed, 28 May 2025 10:14:37 GMT"
##   ..$ Set-Cookie               : chr "__cf_bm=eKvxLp1umpMkxXL7s0W06_QgKGn65kLATnOLMY7k_xE-1748427277-1.0.1.1-aIaVRxrcnBa4Bax1dxnE37XB3ceos0GCLfRcw6Qr"| __truncated__
##   ..$ Permissions-Policy       : chr "accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), usb=()"
##   ..$ Referrer-Policy          : chr "strict-origin"
##   ..$ Strict-Transport-Security: chr "max-age=2592000"
##   ..$ X-Content-Type-Options   : chr "nosniff"
##   ..$ X-Frame-Options          : chr "SAMEORIGIN"
##   ..$ X-XSS-Protection         : chr "1; mode=block"
##   ..$ Server                   : chr "cloudflare"
##   ..$ CF-RAY                   : chr "946cfc761892cd81-LHR"
##   ..$ alt-svc                  : chr "h3=\":443\"; ma=86400"
##  $ body       : raw [1:2206] 7b 22 61 6e ...
##  $ request    :List of 8
##   ..$ url     : chr "https://now-api.parliament.uk/api/Message/message/LordsMain/2025-05-20T15:00:00Z"
##   ..$ method  : NULL
##   ..$ headers : list()
##   ..$ body    : NULL
##   ..$ fields  : list()
##   ..$ options : list()
##   ..$ policies: list()
##   ..$ state   :<environment: 0x00000290ee5b6ca8> 
##   ..- attr(*, "class")= chr "httr2_request"
##  $ cache      :<environment: 0x00000290ee90c688> 
##  - attr(*, "class")= chr "httr2_response"

In the response object above, a status code of 200 (OK) can be seen in the {status_code} variable. The data we want to retrieve (i.e. details on the status of the Chamber) is stored in the {body} variable as raw bytes, which are not human-readable. Before doing anything with this data, we must first convert it to another format.
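
To illustrate what these raw bytes contain, the sketch below decodes the first four bytes shown in the output above (7b 22 61 6e) and then inspects a hand-built stand-in response. The stand-in is created with httr2's response() helper, which is intended for testing; its body is a made-up fragment, not the real API payload.

```r
library(httr2)

# The first four bytes of the body shown in the output above
first_bytes <- as.raw(c(0x7b, 0x22, 0x61, 0x6e))
rawToChar(first_bytes)  # the opening characters of a JSON object

# Hand-built stand-in response (illustrative body, not real API data)
mock_resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"annunciatorType": "LordsMain"}')
)

resp_status(mock_resp)        # the numeric status code
resp_is_error(mock_resp)      # FALSE for a 200 response
resp_content_type(mock_resp)  # the media type declared by the server
resp_body_string(mock_resp)   # the body decoded from raw bytes to text
```

The resp_*() accessors are an alternative to reading fields such as $status_code directly from the response object.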

Converting JSON to an R object

{req_perform()} above returns the server’s response as an {httr2_response} object. We convert the content of this response to parsed JSON (a nested R list) using the {resp_body_json()} function as follows:

parl_response_processed <- parl_response |>
  resp_body_json() |>
  glimpse()

## List of 9
##  $ annunciatorDisabled: logi FALSE
##  $ id                 : int 143768
##  $ slides             :List of 1
##   ..$ :List of 8
##   .. ..$ lines                 :List of 4
##   .. ..$ type                  : chr "Generic"
##   .. ..$ carouselOrder         : int 1
##   .. ..$ carouselDisplaySeconds: NULL
##   .. ..$ speakerTime           : chr "2025-05-20T14:57:00"
##   .. ..$ slideTime             : chr "2025-05-20T14:57:59.746965"
##   .. ..$ soundToPlay           : chr "NewSlide"
##   .. ..$ id                    : int 0
##  $ scrollingMessages  : list()
##  $ annunciatorType    : chr "LordsMain"
##  $ publishTime        : chr "2025-05-20T14:57:59.369"
##  $ isSecurityOverride : logi FALSE
##  $ showCommonsBell    : logi FALSE
##  $ showLordsBell      : logi FALSE

Extracting data

The retrieved data contains a lot of fields. In this example, we are interested in the Peer speaking in the Chamber at the defined time and their unique ID number. By exploring the retrieved information and reading the API’s documentation, we can dive into the nested data and extract the contents of these fields as follows:

# Get Peer name and unique ID
peer_name <- parl_response_processed[["slides"]][[1]][["lines"]][[2]][["member"]][["nameFullTitle"]]
peer_id <- parl_response_processed[["slides"]][[1]][["lines"]][[2]][["member"]][["id"]]

# Output name and ID to screen
cat("Name: ", peer_name, ". ID: ", peer_id, ".", sep = "")

## Name: The Rt Hon. the Baroness Anelay of St Johns DBE. ID: 3474.

Note: the pluck() function from the purrr package (not loaded above; attach it with library(purrr) or library(tidyverse)) provides an alternative, safer way to extract this information, e.g.: pluck(parl_response_processed, "slides", 1, "lines", 2, "member", "id", .default = "Value not found").
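
The difference between the two extraction styles can be seen on a small hand-built list. The list below only mimics the shape of the parsed response (two lines, the second carrying a member); it is a stand-in rather than real API output, and purrr is assumed to be installed.

```r
library(purrr)

# Stand-in for the parsed response: same nesting, far fewer fields
mock_parsed <- list(
  slides = list(
    list(
      lines = list(
        list(member = NULL),
        list(member = list(
          nameFullTitle = "The Rt Hon. the Baroness Anelay of St Johns DBE",
          id = 3474L
        ))
      )
    )
  )
)

# The element exists, so pluck() returns it
found_id <- pluck(mock_parsed, "slides", 1, "lines", 2, "member", "id")

# Line 5 does not exist, so pluck() returns the default instead of failing
missing_id <- pluck(mock_parsed, "slides", 1, "lines", 5, "member", "id",
                    .default = NA_integer_)

# The equivalent [[ chain stops with a "subscript out of bounds" error
bracket_result <- try(
  mock_parsed[["slides"]][[1]][["lines"]][[5]],
  silent = TRUE
)
inherits(bracket_result, "try-error")
```

This matters for the annunciator data because the position of the line carrying a member may differ from one slide to the next.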

Member API

Using this unique ID, we can get further information on the Peer using a second API - the members API. As per the API’s documentation, there are ~20 API routes available, each beginning https://members-api.parliament.uk/api/Members.

The route {id}/WrittenQuestions takes a Parliamentarian’s unique ID and returns a list of written questions tabled by that member.

# Set base URL
member_base_url <- "https://members-api.parliament.uk/api/Members"

# Append base URL, perform request, parse response
questions <- request(member_base_url) |>
  req_url_path_append(
    as.character(peer_id), 
    'WrittenQuestions'
  ) |>
  req_perform() |>
  resp_body_json() 

# Get total number of questions from JSON response
total_questions <- questions$totalResults

# Print Peer name and total number of questions to screen
cat(peer_name, "has asked", total_questions, "Parliamentary questions.")

## The Rt Hon. the Baroness Anelay of St Johns DBE has asked 151 Parliamentary questions.
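
The request above stops with an error if the server returns an HTTP error status, for example for an unknown ID. Below is a hedged sketch of one way to handle that. count_written_questions() is a hypothetical helper, and the canned responses exist only so the sketch runs offline; it assumes a version of httr2 that provides with_mocked_responses() and response_json().

```r
library(httr2)

# Hypothetical helper: return the number of written questions for a member,
# or NA if the API answers with an HTTP error status
count_written_questions <- function(member_id) {
  req <- request("https://members-api.parliament.uk/api/Members") |>
    req_url_path_append(as.character(member_id), "WrittenQuestions")

  tryCatch(
    {
      parsed <- req |> req_perform() |> resp_body_json()
      parsed$totalResults
    },
    # httr2 signals HTTP errors with the condition class "httr2_http"
    httr2_http = function(cnd) NA_integer_
  )
}

# Canned responses for demonstration only (values are illustrative)
mock_api <- function(req) {
  if (grepl("/3474/WrittenQuestions$", req$url)) {
    response_json(body = list(totalResults = 151L))
  } else {
    response(status_code = 404)
  }
}

found <- with_mocked_responses(mock_api, count_written_questions(3474))
not_found <- with_mocked_responses(mock_api, count_written_questions(0))
```

Outside the mocked block, count_written_questions(peer_id) would contact the real API.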

Example 2: plotting weather data from an API

In this second example, weather data is retrieved from the US National Weather Service API. The data (an hourly forecast for Washington DC) is retrieved using httr2, processed using tidyverse packages (purrr, dplyr, and lubridate), and plotted using ggplot2.

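# Load required libraries
library(httr2)
library(tidyverse) # attaches purrr, dplyr, tibble, and ggplot2
library(lubridate) # provides ymd_hms(); only attached automatically by tidyverse >= 2.0.0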

# Set base URL
NWS_base_url <- 'https://api.weather.gov'

# Append base URL, make request, retrieve data, convert to R object, extract associated API URL
forecast_url <- request(NWS_base_url) |>
  req_url_path_append(
    'points',
    '38.8894,-77.0352'
  ) |>
  req_perform() |>
  resp_body_json() |>
  pluck('properties', 'forecastHourly')

# Create and perform a new request using extracted API URL
forecast <- request(forecast_url) |>
  req_perform() |>
  resp_body_json() |>

  # Extract time course data
  pluck('properties', 'periods') |>
  
  # Convert list into a structured dataframe with columns for time, temp, etc.
  map_dfr(
    \(x) {
      tibble(
        time = x |> pluck('startTime'),
        temp_F = x |> pluck('temperature'),
        rain_prob = x |> pluck('probabilityOfPrecipitation', 'value'),
        forecast = x |> pluck('shortForecast')
      )
    }
  ) |>
  
  # Convert the ISO 8601 time strings into date-time objects
  mutate(
    time = time |> ymd_hms()
  )

# Plot data using ggplot
ggplot(forecast, aes(x = time, y = temp_F)) +
  geom_line(color = "blue") +
  labs(title = "Washington DC Temperature",
       x = "Date",
       y = "Temperature (°F)") +
  theme_minimal()
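
The list-to-dataframe step can be tried without calling the API. The sketch below runs the same conversion on two hand-written periods that use the field names extracted above; the values are invented for illustration. It uses map() followed by list_rbind() (available in purrr 1.0.0 and later) as an alternative to map_dfr(), and supplies a default for a missing rain probability.

```r
library(purrr)
library(tibble)
library(lubridate)

# Two hand-written forecast periods (illustrative values only)
mock_periods <- list(
  list(startTime = "2025-05-28T06:00:00-04:00", temperature = 61L,
       probabilityOfPrecipitation = list(value = 10L),
       shortForecast = "Cloudy"),
  list(startTime = "2025-05-28T07:00:00-04:00", temperature = 63L,
       probabilityOfPrecipitation = list(value = NULL),
       shortForecast = "Sunny")
)

forecast_tbl <- mock_periods |>
  # Build one single-row tibble per period
  map(\(x) tibble(
    time = ymd_hms(pluck(x, "startTime")),
    temp_F = pluck(x, "temperature"),
    # Fall back to NA when the API reports no probability
    rain_prob = pluck(x, "probabilityOfPrecipitation", "value",
                      .default = NA_integer_),
    forecast = pluck(x, "shortForecast")
  )) |>
  # Stack the single-row tibbles into one dataframe
  list_rbind()

forecast_tbl
```

Supplying .default keeps the rain_prob column present in every row, which makes the combined dataframe more predictable.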