Exercise: NCBI E-utilities
Overview
Teaching: 5 min
Exercises: 20 min
Questions
How can we use the Entrez Programming Utilities (E-utilities) to search across the Entrez Molecular Sequence Database System?
Objectives
Introduce the Entrez Molecular Sequence Database System (Entrez) and the databases it includes.
Provide sources for documentation and more information about using the E-utilities.
Develop API calls to answer research questions using data pulled from Entrez through the E-utilities.
Background Information
Entrez Molecular Sequence Database System (Entrez)
Entrez is a molecular biology database system that provides integrated access to nucleotide and protein sequence data, genomic mapping information, 3D structure data, PubMed MEDLINE, and more. This system is produced by the National Center for Biotechnology Information (NCBI).
Entrez is NCBI’s primary text search and retrieval system that integrates the PubMed database of biomedical literature with 38 other literature and molecular databases.
The web-based search interface for these NCBI databases is available to the public here, through the U.S. National Library of Medicine.
Databases included in Entrez
A full list of Entrez databases is available here.
Entrez Programming Utilities (E-utilities)
The E-utilities are made up of 9 programs that provide access to Entrez. These 9 programs are listed in the table below. The information in this table was taken from Eric Sayers's A General Introduction to the E-utilities.
E-utilities | Query string (base URL for the API) | Use |
---|---|---|
EInfo | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi | Provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases. |
ESearch | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi | Responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query. |
EPost | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi | Accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset. |
ESummary | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi | Responds to a list of UIDs from a given database with the corresponding document summaries. |
EFetch | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi | Responds to a list of UIDs in a given database with the corresponding data records in a specified format. |
ELink | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi | Responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database; checks for the existence of a specified link from a list of one or more UIDs; creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs. |
EGQuery | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi | Responds to a text query with the number of records matching the query in each Entrez database. |
ESpell | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi | Retrieves spelling suggestions for a text query in a given database. |
ECitMatch | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi | Retrieves PubMed IDs (PMIDs) corresponding to a set of input citation strings. |
E-utilities Documentation
Basic Searching
API string: esearch.fcgi?db=<database>&term=<query>
Input: Entrez database (&db); Any Entrez text query (&term)
Output: List of UIDs matching the Entrez query
Example: Get the PubMed IDs (PMIDs) for articles about breast cancer published in Science in 2008
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
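The ESearch call above can also be assembled in a script rather than typed by hand. The following is a minimal sketch in Python using only the standard library. The function names `esearch_url` and `parse_uids` are hypothetical helpers introduced for illustration, and the sample XML is a made-up response with placeholder IDs, assuming the UIDs arrive as `<Id>` elements inside an `<IdList>` element.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address shared by the E-utilities listed in the table above.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """Build an ESearch URL for a database and an Entrez text query."""
    # safe="[]" leaves field tags such as [journal] unescaped;
    # spaces in the query are encoded as "+".
    return BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term}, safe="[]")


def parse_uids(xml_text):
    """Return the UIDs found under <IdList> in an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_elem.text for id_elem in root.findall("./IdList/Id")]


url = esearch_url("pubmed", "science[journal] AND breast cancer AND 2008[pdat]")
print(url)

# Made-up sample response with placeholder IDs (not real PMIDs),
# used only to show how the UID list could be extracted.
sample = (
    "<eSearchResult><Count>2</Count>"
    "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
)
print(parse_uids(sample))
```

Writing the query with plain spaces and letting `urlencode` handle the encoding makes it easier to change the search term without retyping the whole URL.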
Basic Downloading
API string: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode>
Input: List of UIDs (&id); Entrez database (&db); Retrieval type (&rettype); Retrieval mode (&retmode)
Output: Formatted data records as specified
Example: Download nuccore GIs 34577062 and 24475906 in FASTA format
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=34577062,24475906&rettype=fasta&retmode=text
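As a rough illustration of the EFetch call above, the sketch below joins a list of UIDs into the comma-separated `id` parameter and splits FASTA text into records. Both `efetch_url` and `split_fasta` are hypothetical helper names, and the FASTA sample is placeholder text, not real output from nuccore.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db, uids, rettype, retmode):
    """Build an EFetch URL for a list of UIDs in one database."""
    params = {
        "db": db,
        "id": ",".join(str(uid) for uid in uids),  # comma-separated UID list
        "rettype": rettype,
        "retmode": retmode,
    }
    # safe="," keeps the commas between UIDs unescaped.
    return BASE + "efetch.fcgi?" + urlencode(params, safe=",")


def split_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


url = efetch_url("nuccore", [34577062, 24475906], "fasta", "text")
print(url)

# Placeholder FASTA text, only to show the shape of the parsed result.
sample = ">seq1 placeholder\nACGT\nTTAA\n>seq2 placeholder\nGGCC\n"
print(split_fasta(sample))
```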
Getting Database Statistics and Search Fields
API string: einfo.fcgi?db=<database>
Input: Entrez database (&db)
Output: XML containing database statistics
Note: If no database parameter is supplied, einfo will return a list of all valid Entrez databases.
Example: Find database statistics for Entrez Protein.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
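The two EInfo behaviors described above (statistics for one database, or a list of all databases when no database is given) could be captured in a small helper. This is a sketch under stated assumptions: `einfo_url` and `fetch_text` are hypothetical names, and `fetch_text` is defined but not called here because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EINFO = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def einfo_url(db=None):
    """Build an EInfo URL; with no db, the call lists all Entrez databases."""
    if db is None:
        return EINFO
    return EINFO + "?" + urlencode({"db": db})


def fetch_text(url):
    """Download a URL and return the response body as text (needs network)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


print(einfo_url("protein"))  # statistics for Entrez Protein
print(einfo_url())           # list of all valid Entrez databases

# To run the request for real, uncomment the next line:
# print(fetch_text(einfo_url("protein")))
```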
Exercises
Find PubMed articles
Use the ESearch E-utility to search PubMed for the condition cold urticaria, with results that meet the following criteria:
- The results are sorted by publication date
- The results contain cold urticaria in the title
Solution
- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cold+urticaria&field=title&sort=pub+date
Gather statistics on an Entrez database
- Identify the E-utility you would use to complete this task.
- Identify the database you would like to view the statistics for.
- Write the API call.
Solution
- Here is an example solution:
- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
- Share your solution in the etherpad!
Perform a search across all Entrez databases
Perform a global Entrez search to determine racial/ethnic representation across database contents.
- Identify which E-utility to use for this task.
- Identify how you will conceptualize and categorize racial/ethnic groupings for this task.
- NIH racial and ethnic categories include American Indian or Alaska Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian or Other Pacific Islander, and White. See NOT-OD-15-089 for more details.
- Write and run the API strings that would enable you to uncover these representations.
Solution
Here are example API calls that could provide you with summary data based on NIH racial and ethnic categories. (There are many potential solutions to this exercise!)
- https://eutils.ncbi.nlm.nih.gov/gquery?term=african+AND+black&retmode=xml
- https://eutils.ncbi.nlm.nih.gov/gquery?term=white&retmode=xml
- https://eutils.ncbi.nlm.nih.gov/gquery?term=hispanic+OR+latino&retmode=xml
- https://eutils.ncbi.nlm.nih.gov/gquery?term=native+hawaiian+OR+pacific+islander&retmode=xml
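When several global queries differ only in the search term, the URLs can be generated in a loop. The sketch below reuses the `gquery` address and the search terms from the example solution above. The helper name `gquery_url` is hypothetical, and the term list is only one possible way to operationalize the categories.

```python
from urllib.parse import urlencode

# Address used in the example solution above.
GQUERY = "https://eutils.ncbi.nlm.nih.gov/gquery"

# Search terms taken from the example solution; adjust them to match
# however you decide to categorize racial/ethnic groupings.
terms = [
    "african AND black",
    "white",
    "hispanic OR latino",
    "native hawaiian OR pacific islander",
]


def gquery_url(term):
    """Build a global query URL that asks for XML output."""
    return GQUERY + "?" + urlencode({"term": term, "retmode": "xml"})


urls = [gquery_url(term) for term in terms]
for url in urls:
    print(url)
```

Keeping the terms in one list makes it straightforward to add further categories or try alternative wordings and compare the counts.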
Design your own research question
Develop a research question about databases included in Entrez, or about the data held within the databases. Review the table of E-utilities, their query strings, and use cases above.
- Please share your research question in the class etherpad in the designated section.
- Which E-utility would provide you with the type of data you could use to answer your question?
- Which databases would you need to query with your chosen E-utility in order to answer your research question?
- Try writing the API calls that would provide you with the data to answer your research question!
Solution
- Did it work?
- If it worked, please share your API calls in the etherpad by your research question.
- If it did not work, consider whether the problem is conceptual or technical and share your thoughts in the etherpad near your research question.
Key Points
By linking to the NCBI Entrez system through the E-utilities, you can make complicated data requests across a huge dataset.