Exercise: OCLC WorldCat Search API
Overview
Teaching: 5 min
Exercises: 20 min
Questions
What are the different OCLC APIs and how are they different?
What can you query using the WorldCat Search API?
Objectives
Explore creating queries using the WorldCat Search API
Use the WorldCat API to enhance existing bibliographic data
Accomplish the exercises using the Shell or OpenRefine
Using OCLC’s WorldCat Search API
OCLC provides access to their WorldCat database via their Search API. This API lets you search by keyword or by various identifiers (ISBN, ISSN, OCLC number, etc.) and retrieve the record for each item. It can also tell you which libraries hold copies of each item.
- Documentation
- More on Library Locations
- The OCLC API requires an API key to authenticate your queries. Your instructor will provide you with an API key for the exercises below.
Creating a query for the OCLC WorldCat Search API
We want to write a query to retrieve the MARCXML records for an item. Use the link to documentation above to create your query.
- Which ‘Operation’ should you use to find multiple records in MARCXML?
- What is the base URL for the query?
- Which parameters are required? Which parameters aren’t required, but might need to be considered?
- Which query index/indices do you need to use?
- Write a query returning all records associated with the ISBN 9781442237360, test your query in a web browser, then share your query in the etherpad. What title did you find? (Hint: turn off frbrGrouping)
- Write a query returning records associated with the OCLC number 466135850, test your query in a web browser, then share your query in the etherpad. What title did you find? (Hint: turn off frbrGrouping)
Solution
- SRU
- http://www.worldcat.org/webservices/catalog/search/sru?query={CQLQuery}
- Required: query, wskey. Optional: all could be useful depending on your goals.
- ISBN, srw.bn
- Nobody expects the Spanish Inquisition
- Urgh!: a music war
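The pieces of the solution above can be assembled into a full request URL. As a minimal sketch, the hypothetical helper build_sru_url below (our own name, not part of the OCLC documentation) joins the base URL, the required wskey and query parameters, and frbrGrouping=off. YOUR-KEY is a placeholder for the key your instructor provides.

```shell
# Base URL taken from the solution above.
BASE_URL="http://www.worldcat.org/webservices/catalog/search/sru"

# build_sru_url INDEX VALUE [WSKEY]
# Hypothetical helper: prints a query URL for one index/value pair.
# If no key is passed, the placeholder YOUR-KEY is used.
build_sru_url() {
  printf '%s?wskey=%s&query=%s=%s&frbrGrouping=off\n' \
    "$BASE_URL" "${3:-YOUR-KEY}" "$1" "$2"
}

# Print the ISBN query from the exercise (nothing is sent over the network).
build_sru_url srw.bn 9781442237360
```

The helper only prints the URL, so you can paste the result into a web browser to test it. Index names other than srw.bn should be taken from the documentation.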
OpenRefine or Shell?
You can complete the last two exercises in OpenRefine or Shell. The next section starts with querying OCLC with OpenRefine; the Shell section is below.
OpenRefine
New GREL Functions
In addition to the functions we’ve already covered in the OpenRefine Lesson, these GREL functions will help you complete the exercises in this lesson. You can find the link to the documentation for each function below:
Fetch MARCXML results (OpenRefine)
We’ll start by using the query you wrote in the previous exercise, but expand it to query our csv of titles & ISBNs.
- Create a new project in OpenRefine using the provided data. You should have 1 column: ISBN.
- Add a new column called fetchOCLC that contains the full MARCXML record for each title.
- What steps did you take to create column fetchOCLC?
- How did you change your query to work in OpenRefine?
Solution
- 1 & 2. Ask your instructor for the solution.
Now that we have fetched data from the API, we need to parse the resulting XML to pull out the information we want.
Parse MARCXML results (OpenRefine)
Each MARC record retrieved can contain multiple items from WorldCat associated with the ISBN. We need to parse the XML to pull out data about each item. The steps below will walk through creating a new column that contains every OCLC number associated with each item in our file.
- In the fetchOCLC column, select ‘Edit column’ -> ‘Add column based on this column…’
- New column name: OCLCNumber
- Add the expression (don’t click OK yet):
value.parseXml().select("controlfield[tag=001]")
- We use select() to select the element with the OCLC number.
- The OCLC number is found in an element called controlfield with an attribute called tag that is equal to 001.
- The preview shows us our results. We should see an array of XML elements that contain the OCLC numbers associated with each ISBN.
- Change your expression to:
forEach(value.parseXml().select("controlfield[tag=001]"),v,v.xmlText())
- To pull out each number, we need to use a forEach loop (like the Shell for loop).
- forEach() takes 3 arguments: an array, a variable, and variable.function()
- value.parseXml().select("controlfield[tag=001]") is our array
- v is our variable
- v.xmlText() is our function
- The results give us an array of OCLC numbers.
- OpenRefine won’t save an array in a cell. What would we add so the numbers are separated by the pipe symbol | instead of being returned as an array?
- Click OK!
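For comparison, the same select, forEach, and join steps can be sketched in the Shell. This is only a rough analogue: it uses grep and sed on a small made-up sample rather than a real XML parser, so it assumes each controlfield element sits on a single line. The sample_xml data and the oclc_numbers function are hypothetical names for illustration.

```shell
# Hypothetical sample: a trimmed response with two records (not real WorldCat output).
sample_xml='<records>
<record><controlfield tag="001">111</controlfield><controlfield tag="008">x</controlfield></record>
<record><controlfield tag="001">222</controlfield></record>
</records>'

# Read XML on standard input and print every controlfield 001 value on one line.
oclc_numbers() {
  grep -o '<controlfield tag="001">[^<]*</controlfield>' |  # like select(): keep matching elements
    sed -e 's/<[^>]*>//g' |                                 # like xmlText(): strip the tags
    paste -s -d '|' -                                       # like join(): one pipe-separated line
}

printf '%s\n' "$sample_xml" | oclc_numbers   # prints 111|222
```

Each stage of the pipeline mirrors one part of the GREL expression, which may help when moving between the OpenRefine and Shell versions of the exercises.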
MORE OCLC in OpenRefine!
We worked through the previous example together; this time you’re on your own. Figure out how to answer the questions below using the provided data and the OCLC API.
- Create a column with each callNo associated with the ISBN. What expression did you use? (Bonus if you can include only the unique callNos in your results)
- Create a column with the number of records returned for each ISBN. Which title returns the most records?
- Write and run a query that will return only the OCLC MARCXML records for each item that Yale has a copy (Yale Library holdings: ‘YUS’). Record your query in the etherpad.
- Advanced: Using the results from question 3, add a new column that will contain “TRUE” if Yale has a copy of an item and “FALSE” if Yale doesn’t. Copy your expression to etherpad. (hint: try the if() function)
Solution
- forEach(value.parseXml().select("datafield[tag=050]"),v,v.xmlText()).uniques().join(" | ")
- value.parseXml().select("numberOfRecords")[0].xmlText() (“Brief history of death / W.M. Spellman.” has 12 records)
- "http://www.worldcat.org/webservices/catalog/search/sru?wskey={API-KEY}&query=srw.bn=" + value + "+AND+srw.li=YUS&frbrGrouping=off"
- if(value.parseXml().select("numberOfRecords")[0].xmlText().toNumber() > 0, "TRUE","FALSE")
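The TRUE/FALSE test on numberOfRecords can also be sketched in the Shell. The function below is a hypothetical helper (holding_flag is our own name) and assumes the response contains an unprefixed numberOfRecords element on one line; it uses sed rather than a real XML parser, so treat it as an illustration of the logic only.

```shell
# Read an SRU response on standard input and print TRUE if it reports
# at least one record, FALSE otherwise (including when no count is found).
holding_flag() {
  count=$(sed -n 's/.*<numberOfRecords>\([0-9][0-9]*\)<\/numberOfRecords>.*/\1/p' | head -n 1)
  if [ "${count:-0}" -gt 0 ]; then
    echo TRUE
  else
    echo FALSE
  fi
}

# Made-up inputs standing in for real responses.
echo '<numberOfRecords>3</numberOfRecords>' | holding_flag
echo '<numberOfRecords>0</numberOfRecords>' | holding_flag
```

As in the GREL expression, the count is compared with zero, so a query restricted to one library's holdings yields TRUE only when that library has at least one matching record.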
Shell
Fetch MARCXML results (Shell)
- Use the query written in the first exercise to pull the MARCXML for the item with the ISBN 9781442237360. Run this query in the shell and save the MARCXML results as a file. What commands did you use? (hint: put your query URL in “quotes”)
- Create a text file named 520.txt that contains the Summary field (520) for each of the records in our MARCXML. What command did you use?
Solution
curl "http://www.worldcat.org/webservices/catalog/search/sru?wskey={API.KEY}&query=srw.bn=9781442237360&frbrGrouping=off" > output.xml
curl "http://www.worldcat.org/webservices/catalog/search/sru?wskey={API.KEY}&query=srw.bn=9781442237360&frbrGrouping=off" | xmlstarlet sel -t -v '//*[@tag="520"]//text()' > 520.txt
More advanced queries in Shell
We worked on retrieving one result at a time in the Shell, but we can scale up this process using loops. Please download/use the file isbn.txt for this exercise.
- Write a loop in BASH that will open the file isbn.txt, query for each ISBN, and save each result as a unique MARCXML file.
- Write a command that will open the file isbn.txt, query for each ISBN, and save the subject headings in a single file called subjects.txt (hint: append a file).
Solution
cat isbn.txt | while read -r isbn; do curl "http://www.worldcat.org/webservices/catalog/search/sru?wskey={API.KEY}&frbrGrouping=off&query=srw.bn=$isbn" > "$isbn.xml"; done
cat isbn.txt | while read -r isbn; do curl "http://www.worldcat.org/webservices/catalog/search/sru?wskey={API.KEY}&frbrGrouping=off&query=srw.bn=$isbn" | xmlstarlet sel -t -v '//*[@tag="650"]//text()' >> subjects.txt; done
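The one-line loops above can be expanded into a small function that is easier to check before sending any requests. This is a sketch under a few assumptions: fetch_isbns, WSKEY, and DRY_RUN are hypothetical names, YOUR-KEY is a placeholder for your API key, and the one-second pause between requests is an arbitrary choice rather than a documented requirement.

```shell
# Placeholder key unless WSKEY is already set in the environment.
WSKEY="${WSKEY:-YOUR-KEY}"
BASE="http://www.worldcat.org/webservices/catalog/search/sru"

# fetch_isbns FILE
# Reads one ISBN per line and saves each response as ISBN.xml.
# With DRY_RUN=1 it prints the URLs instead of fetching them.
fetch_isbns() {
  while IFS= read -r isbn || [ -n "$isbn" ]; do
    isbn=$(printf '%s' "$isbn" | tr -d '\r ')   # drop Windows line endings and spaces
    [ -z "$isbn" ] && continue                  # skip blank lines
    url="${BASE}?wskey=${WSKEY}&frbrGrouping=off&query=srw.bn=${isbn}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "$url"
    else
      curl -s "$url" > "${isbn}.xml"
      sleep 1                                   # arbitrary pause between requests
    fi
  done < "$1"
}
```

Running DRY_RUN=1 fetch_isbns isbn.txt lists the URLs that would be requested, which is a cheap way to catch a malformed ISBN or a missing key before running the real loop.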
Key Points