API Queries in Shell

Overview

Teaching: 50 min
Exercises: 20 min
Questions
  • How do I fetch data from API’s using the Unix shell?

  • How do I process API responses to get plain text, usable by standard Unix shell tools?

Objectives
  • Understand how to fetch data from APIs using the Unix shell

  • Choose and operate tools to convert API response to plain text

Fetching data from the web

With a URL query in hand, the Unix shell provides a powerful command line driven interface for processing and interacting with large amounts of text data. As the response from many APIs will be in a text-based format, such as JSON or XML, various shell tools will provide us with the functionality to acquire data and process the response, allowing for easy use of the large existing set of Unix tools.

Shell Tools

Building on the skill from Unix shell lesson, we add the following command line programs for working with APIs

In this lesson we will use curl command for interacting with our API endpoints and either jq or xmlstarlet to process the return based on the type response that is given by the endpoint. In both cases we use the Unix shell pipe | to redirect the output of the curl command into the next for processing.

API Get Requests

A tool for transferring data via URL, the curl command allows use to interact with API endpoints from the command line. A full-featured tool, the curl command has built-in functionality to work with most types of API endpoint and authentication schemes. While our examples using the Yale University Library Voyager API is does not require authentication, the full manual of usage options are available on the command line by entering man curl

Using the Voyager API we enter the command curl followed the URL for the Bibliographic item with the ISBN 9780415704953

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953
{"record":[{
"title":"Animism and the question of life /",
"author":"Praet, Istvan, 1974-",
"pdescription":"xiv, 198 pages ; 24 cm.",
"publisher":"Routledge,",
"pubplace":"New York :",
"pubdate":"2014.",
"isxn":"9780415704953",
"oclcmrn":"ocn853435847",
"bibid":"11736943",
"items":[
{"loccode":"sml","itemenum":"NA","itypename":"Circulating","callno":"GN471 .P73X 2014 (LC)","itemstatus":"Not Charged","barcodestatus":"Active","itemchron":"NA","itemid":"10677986","itemformat":"SPM","itypecode":"circ","itemspinelabel":"NA","itemstat":"NA","locname":"SML, Stacks, LC Classification","barcode":"39002123260565","mfhdid":"11858762","availdate":"NA"}
]
}]}

The API response can be written to a file using the angled bracket > followed by a file name:

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953 > file.txt

Alternatively, the output can be piped to another command for futher action:

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953 | wc -l

This example uses the wc command with the -l` option to count the number of lines in the response.

JSON filtering with jq

The jq command transforms JSON content through selection and filtering. Using the Unix shell “pipe” | allows us to take the API response that we receive from our curl command, and process it in various ways with jq

Pretty-print with jq .

Be default, the JSON returned in most API calls has the white-space removed, while this is more efficient for transfer and makes no operational difference, it makes the data difficult for humans to read. The jq tool can be used to pretty-print, or format the JSON in a human-readable way, using the . command.

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953  | jq '.'
{
  "record": [
    {
      "title": "Animism and the question of life /",
      "author": "Praet, Istvan, 1974-",
      "pdescription": "xiv, 198 pages ; 24 cm.",
      "publisher": "Routledge,",
      "pubplace": "New York :",
      "pubdate": "2014.",
      "isxn": "9780415704953",
      "oclcmrn": "ocn853435847",
      "bibid": "11736943",
      "items": [
        {
          "loccode": "sml",
          "itemenum": "NA",
          "itypename": "Circulating",
          "callno": "GN471 .P73X 2014 (LC)",
          "itemstatus": "Not Charged",
          "barcodestatus": "Active",
          "itemchron": "NA",
          "itemid": "10677986",
          "itemformat": "SPM",
          "itypecode": "circ",
          "itemspinelabel": "NA",
          "itemstat": "NA",
          "locname": "SML, Stacks, LC Classification",
          "barcode": "39002123260565",
          "mfhdid": "11858762",
          "availdate": "NA"
        }
      ]
    }
  ]
}

Filter by Key

jq can be used to select the values of a specified key, or keys. This allows for specific pieces of data, from the overall JSON object to be filtered and returned.

In this example, we return just the title.

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953  | jq '.record[].title'
"Animism and the question of life /"

Since the Voyager API JSON response begins with the root “record” key, with use the element key .record followed by the open and close square brackets[] to put all of the child elements in an array, the final selector is the .title to select all of the title elements for items in the this JSON response.

We can specific multiple keys in a single filter, In this example, we return the title and author by specified both keys, separated by a comma.

$ curl https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=9780415704953  | jq '.record[].title, .record[].author'

"Animism and the question of life /"
"Praet, Istvan, 1974-"

XML Processing

$ curl https://libapp.library.yale.edu/VoySearch/GetBibMarc?isxn=9780415704953 
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01082pam a2200361 i 4500</leader><controlfield tag="001">11736943</controlfield><controlfield tag="005">20131211185012.0</controlfield><controlfield tag="008">130405s2014    nyua     b    001 0 eng  </controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">  2013013852</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780415704953</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0415704952</subfield></datafield><datafield tag="024" ind1="8" ind2=" "><subfield code="a">40022828439</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)ocn864945837</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(NhCcYBP)  2013013852</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">11736943</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DLC</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="c">DLC</subfield><subfield code="d">OCLCO</subfield><subfield code="d">YDXCP</subfield><subfield code="d">NhCcYBP</subfield></datafield><datafield tag="042" ind1=" " ind2=" "><subfield code="a">pcc</subfield></datafield><datafield tag="050" ind1="0" ind2="0"><subfield code="a">GN471</subfield><subfield code="b">.P73 2014</subfield></datafield><datafield tag="079" ind1=" " ind2=" "><subfield code="a">ocn853435847</subfield></datafield><datafield tag="090" ind1=" " ind2=" "><subfield code="a">GN471</subfield><subfield code="b">.P73X 2014 (LC)</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Praet, Istvan,</subfield><subfield code="d">1974-</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Animism and the question of life /</subfield><subfield code="c">Istvan Praet.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">New York :</subfield><subfield code="b">Routledge,</subfield><subfield code="c">2014.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">xiv, 198 pages ;</subfield><subfield code="c">24 cm.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">unmediated</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">volume</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Routledge studies in anthropology ;</subfield><subfield code="v">15</subfield></datafield><datafield tag="504" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Animism.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Anthropology of religion.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Life.</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Routledge studies in anthropology ;</subfield><subfield code="v">15.</subfield></datafield></record></collection>

Converting response from Voyager API to text, select element

Pretty-print with xmlstarlet

$ curl -s https://libapp.library.yale.edu/VoySearch/GetBibMarc?isxn=9780415704953 | xmlstarlet format --indent-tab

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>01082pam a2200361 i 4500</leader>
    <controlfield tag="001">11736943</controlfield>
    <controlfield tag="005">20131211185012.0</controlfield>
    <controlfield tag="008">130405s2014    nyua     b    001 0 eng  </controlfield>
    <datafield tag="010" ind1=" " ind2=" ">
      <subfield code="a">  2013013852</subfield>
    </datafield>
    <datafield tag="020" ind1=" " ind2=" ">
      <subfield code="a">9780415704953</subfield>
    </datafield>
    <datafield tag="020" ind1=" " ind2=" ">
      <subfield code="a">0415704952</subfield>
    </datafield>
    <datafield tag="024" ind1="8" ind2=" ">
      <subfield code="a">40022828439</subfield>
    </datafield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">(OCoLC)ocn864945837</subfield>
    </datafield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">(NhCcYBP)  2013013852</subfield>
    </datafield>
    <datafield tag="035" ind1=" " ind2=" ">
      <subfield code="a">11736943</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
      <subfield code="a">DLC</subfield>
      <subfield code="b">eng</subfield>
      <subfield code="e">rda</subfield>
      <subfield code="c">DLC</subfield>
      <subfield code="d">OCLCO</subfield>
      <subfield code="d">YDXCP</subfield>
      <subfield code="d">NhCcYBP</subfield>
    </datafield>
    <datafield tag="042" ind1=" " ind2=" ">
      <subfield code="a">pcc</subfield>
    </datafield>
    <datafield tag="050" ind1="0" ind2="0">
      <subfield code="a">GN471</subfield>
      <subfield code="b">.P73 2014</subfield>
    </datafield>
    <datafield tag="079" ind1=" " ind2=" ">
      <subfield code="a">ocn853435847</subfield>
    </datafield>
    <datafield tag="090" ind1=" " ind2=" ">
      <subfield code="a">GN471</subfield>
      <subfield code="b">.P73X 2014 (LC)</subfield>
    </datafield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Praet, Istvan,</subfield>
      <subfield code="d">1974-</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Animism and the question of life /</subfield>
      <subfield code="c">Istvan Praet.</subfield>
    </datafield>
    <datafield tag="264" ind1=" " ind2="1">
      <subfield code="a">New York :</subfield>
      <subfield code="b">Routledge,</subfield>
      <subfield code="c">2014.</subfield>
    </datafield>
    <datafield tag="300" ind1=" " ind2=" ">
      <subfield code="a">xiv, 198 pages ;</subfield>
      <subfield code="c">24 cm.</subfield>
    </datafield>
    <datafield tag="336" ind1=" " ind2=" ">
      <subfield code="a">text</subfield>
      <subfield code="2">rdacontent</subfield>
    </datafield>
    <datafield tag="337" ind1=" " ind2=" ">
      <subfield code="a">unmediated</subfield>
      <subfield code="2">rdamedia</subfield>
    </datafield>
    <datafield tag="338" ind1=" " ind2=" ">
      <subfield code="a">volume</subfield>
      <subfield code="2">rdacarrier</subfield>
    </datafield>
    <datafield tag="490" ind1="1" ind2=" ">
      <subfield code="a">Routledge studies in anthropology ;</subfield>
      <subfield code="v">15</subfield>
    </datafield>
    <datafield tag="504" ind1=" " ind2=" ">
      <subfield code="a">Includes bibliographical references and index.</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Animism.</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Anthropology of religion.</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Life.</subfield>
    </datafield>
    <datafield tag="830" ind1=" " ind2="0">
      <subfield code="a">Routledge studies in anthropology ;</subfield>
      <subfield code="v">15.</subfield>
    </datafield>
  </record>
</collection>

Filter by Element Tag

XPath can be used along with xmlstarlet to select the values of a specified element. This allows for specific pieces of data, from the overall XML document to be filtered and returned.

In this example, we return just the subject headings from the MARCXML record:

$ curl -s https://libapp.library.yale.edu/VoySearch/GetBibMarc?isxn=9780415704953 | xmlstarlet  sel -t -v '//*[@tag="650"]//text()' 
Animism.
Anthropology of religion.
Life.

Automating with Loops

Loops are a powerful method for working with APIs using the shell tools. Building on the Automating the tedious with loops Lesson from yesterday, we can extract specified by making multiple API calls.

In the example below, we pipe | together several shell commands and introduce the while loop to call the Voyager API and return the title for a list of items.

$ cat ISBN.txt | while read line; 
    do curl -s https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=$line  | jq '.record[].title'; 
    done

Let’s breakdown this command and examine what function each piece is performing.

First we will need a file containing a list of ISBN numbers which we will pass to the Voyager API to identify which items we would like bibliographic information about.

$ cat isbn.txt |

We use the cat command to read the contents of the file, followed by the pipe symbol to send to the next part of our command:

while read line;

The while keyword initiates a loop to that uses the read command to assign the content of each line in the text file to the variable line, one line at a time. Similar to a for loop, the while loop repeats this action, in this case, until all lines in the file have been read.

do curl -s https://libapp.library.yale.edu/VoySearch/GetBibItem?isxn=$line |

In the body of the loop, we use our curl command to query the Voyager API endpoint. We call this command using the $line variable in place of the ISBN number in the URL, this allows for the value read from the text file be substituted in each iteration of the loop. We then pipe the API response to the final part of our command:

jq '.record[].title'; done

This portion of the command uses the jq tool to select the title for the JSON returned for each item.

Key Points

  • You can request and process data from APIs using Unix shell tools

  • JSON or XML results can be parsed to plain text, to be consumed by standard Unix shell tools