Search/Retrieve web services



refbase web service

refbase versions later than 0.8.0 support the SRU (Search/Retrieve via URL) standard search protocol for Internet search queries. SRU uses CQL (Common Query Language), a standard syntax for representing queries. Both standards are developed by the Library of Congress and provide a generic API for searching a data repository and a mechanism for returning metadata records.

About the SRU/W web services-based protocols

SRU defines a web service for searching databases. It is a companion protocol to SRW, which is a SOAP-based version of the same protocol. SRU can be regarded as a RESTful version of SRW, since all requests are simple URLs instead of XML documents sent via some sort of transport layer.

SRU defines three basic operations (explain, scan and searchRetrieve), which specify the requests and responses in an SRU interaction. Some simple examples are presented below.

refbase SRU server

Currently, the refbase SRU server (sru.php) supports explain and searchRetrieve operations (but not scan) and returns records as MODS XML wrapped into SRW XML. See the SRU web site for a description of the elements of a searchRetrieve response.
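For a client, the most useful parts of a searchRetrieve response are usually the total hit count and the records themselves. The following Python sketch pulls both out of an SRW response using only the standard library. It assumes the SRW 1.1 namespace `http://www.loc.gov/zing/srw/` and the MODS v3 namespace `http://www.loc.gov/mods/v3`; the sample document is a hypothetical, heavily trimmed response rather than real refbase output, and the function name `summarize_response` is illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: SRW 1.1 response envelope and MODS v3 record data.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "mods": "http://www.loc.gov/mods/v3",
}


def summarize_response(xml_text):
    """Return (numberOfRecords, list of top-level MODS titles) from an SRW response."""
    root = ET.fromstring(xml_text)
    # numberOfRecords counts all matches, not just the records in this response.
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = [
        t.text
        for t in root.findall(
            "srw:records/srw:record/srw:recordData/mods:mods/mods:titleInfo/mods:title",
            NS,
        )
    ]
    return total, titles


# Hypothetical, trimmed response used only for illustration.
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>mods</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <mods xmlns="http://www.loc.gov/mods/v3">
          <titleInfo><title>First example title</title></titleInfo>
        </mods>
      </srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""

if __name__ == "__main__":
    print(summarize_response(SAMPLE))
```

Because the lookups are namespace-aware, the sketch works whether a record's MODS element uses a prefix or a default namespace declaration.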

The refbase SRU server can be used as a remote back-end database for the CiteProc processor of the XBib project. CiteProc is a comprehensive solution for bibliographic and citation formatting. See here and here for a description and usage example that shows how to integrate refbase with CiteProc.

Implementation notes

The refbase SRU server allows querying all global refbase fields (from the refs MySQL table). The given index name must either match one of the 'set.index' names listed in the explain response (sru.php?operation=explain) or match a refbase field name directly. If no index name is given, the serial field is searched by default.

Note that for valid queries (i.e., if the version and query parameters are present), operation=searchRetrieve is assumed if the operation parameter is omitted. Only recordPacking=xml and recordSchema=mods are supported, and sru.php uses these settings by default if they are not given in the query. If the stylesheet parameter is not given, data are returned together with a default stylesheet. XPath, sort and result sets are currently not supported, and only SRW version 1.1 is recognized. Also note that sru.php currently supports only a limited subset of CQL; future versions may add support for the boolean CQL operators 'and', 'or' and 'not' and for parentheses.
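The constraints above translate directly into a small request builder. This Python sketch is not part of refbase; the base URL `http://example.org/refbase/sru.php` and the function name `search_retrieve_url` are hypothetical. It always sends the mandatory version and query parameters, spells out the only supported recordPacking and recordSchema values, and lets `urlencode` percent-encode the CQL query.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the address of an actual refbase install.
BASE_URL = "http://example.org/refbase/sru.php"


def search_retrieve_url(query, base=BASE_URL, start_record=None, maximum_records=None):
    """Build a searchRetrieve URL that always carries the mandatory parameters.

    operation, recordPacking and recordSchema are sent explicitly even though
    sru.php falls back to searchRetrieve / xml / mods when they are omitted.
    """
    if not query:
        raise ValueError("the query parameter is mandatory")
    params = {
        "version": "1.1",               # the only SRW version sru.php recognizes
        "operation": "searchRetrieve",
        "query": query,                 # CQL; urlencode escapes '=', spaces, etc.
        "recordPacking": "xml",         # the only supported packing
        "recordSchema": "mods",         # the only supported schema
    }
    if start_record is not None:
        params["startRecord"] = start_record
    if maximum_records is not None:
        params["maximumRecords"] = maximum_records
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_retrieve_url("dc.creator=schmid", start_record=10, maximum_records=10))
```

Sending the defaults explicitly costs nothing and keeps the request meaningful if a future server version changes its fallbacks.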

Usage examples

Below are some working examples using an online refbase SRU server:

You can call sru.php without any parameters, or with the operation=explain parameter, to retrieve a standard explain response, which tells the requesting client which settings and features the refbase web service provides:

sru.php
sru.php?
sru.php?operation=explain
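A client can read the available 'set.index' names from the explain response instead of hard-coding them. The Python sketch below assumes that the explain record follows the ZeeRex format (namespace `http://explain.z3950.org/dtd/2.0/`), with each index mapped via `<map><name set="...">...</name></map>`. The sample is a hypothetical excerpt, not the actual refbase explain output, and `list_indexes` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Assumed ZeeRex namespace used inside the explain record.
ZEEREX = "{http://explain.z3950.org/dtd/2.0/}"


def list_indexes(explain_xml):
    """Collect 'set.index' names from the indexInfo section of an explain record."""
    root = ET.fromstring(explain_xml)
    names = []
    # iter() searches the whole tree, so an enclosing SRW envelope does not matter.
    for index in root.iter(ZEEREX + "index"):
        for name in index.iter(ZEEREX + "name"):
            names.append(f"{name.get('set')}.{name.text}")
    return names


# Hypothetical excerpt of an explain record, for illustration only.
SAMPLE = """<explain xmlns="http://explain.z3950.org/dtd/2.0/">
  <indexInfo>
    <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
    <index><title>author</title><map><name set="dc">creator</name></map></index>
    <index><title>title</title><map><name set="dc">title</name></map></index>
  </indexInfo>
</explain>"""

if __name__ == "__main__":
    print(list_indexes(SAMPLE))
```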

Here's a simple query that asks for all database entries where the creator index (i.e., the refbase author field) contains "Schmid". Matching records are returned as SRW+MODS XML:

sru.php?version=1.1&query=dc.creator=schmid

Note that the version=1.1 and query parameters are mandatory, and that the value of the query parameter must be written in CQL. Mike Taylor has written a nice introduction to CQL. The standard explain response (see above example) lists all available indexes and their corresponding refbase fields.

As mentioned above, a simple default stylesheet will be returned with the response if the stylesheet parameter is omitted in the SRU request. However, you can supply your own stylesheet as follows (note that the given stylesheet just serves as an example and is of no real-world use since it simply strips all XML tags):

sru.php?version=1.1&query=dc.creator=schmid&stylesheet=xml2html.xsl

You can suppress any stylesheets by including the stylesheet parameter without a value:

sru.php?version=1.1&query=dc.creator=schmid&stylesheet=
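A client therefore has to distinguish three stylesheet cases: omitting the parameter (default stylesheet), sending it empty (no stylesheet) and sending a file name (custom stylesheet). This Python sketch, with the hypothetical helper name `sru_query_string`, uses `None` for "omit" so that an empty string can still mean "suppress"; `urlencode` keeps an empty value as `stylesheet=`.

```python
from urllib.parse import urlencode


def sru_query_string(params, stylesheet=None):
    """Return an SRU query string for the three stylesheet cases.

    None  -> parameter omitted, so sru.php adds its default stylesheet
    ""    -> empty parameter, which suppresses any stylesheet
    other -> the named stylesheet is requested
    """
    params = dict(params)  # copy so the caller's dict is not modified
    if stylesheet is not None:
        params["stylesheet"] = stylesheet
    return urlencode(params)


if __name__ == "__main__":
    base = {"version": "1.1", "query": "dc.creator=schmid"}
    for choice in (None, "", "xml2html.xsl"):
        print(repr(choice), "->", sru_query_string(base, choice))
```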

By default, refbase will return as many records as specified by the admin in the variable $defaultNumberOfRecords in initialize/ini.inc.php. You can use the startRecord and maximumRecords parameters to explicitly define the first record and the maximum number of records that shall be returned:

sru.php?version=1.1&query=dc.creator=schmid&startRecord=10&maximumRecords=10
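Since startRecord is a 1-based record position, the example above returns records 10 through 19. To page through a complete result set, a client can first obtain the total hit count (for example from the numberOfRecords element, which the maximumRecords=0 query in the list below returns without any record data) and then compute one startRecord value per page. A minimal Python sketch, with the hypothetical function name `page_starts`:

```python
def page_starts(total, page_size):
    """Return the startRecord values needed to fetch `total` records, `page_size` at a time."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    # SRU record positions are 1-based, so the first page starts at record 1.
    return list(range(1, total + 1, page_size))


if __name__ == "__main__":
    # e.g. 25 matches fetched with maximumRecords=10
    for start in page_starts(25, 10):
        print(f"startRecord={start}&maximumRecords=10")
```

Each value is then sent as startRecord together with maximumRecords set to the page size; the last page may contain fewer records.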

Here are some other SRU queries that should give you some ideas about what's possible (all given queries are valid but the last one in each set is the preferred one since it uses standard 'set.index' names):

  • return record with serial number 1:
sru.php?version=1.1&query=1
sru.php?version=1.1&query=serial=1
sru.php?version=1.1&query=rec.identifier=1
  • find all records where the title field contains either "ecology" OR "diversity":
sru.php?version=1.1&query=title any ecology diversity
sru.php?version=1.1&query=dc.title any ecology diversity
  • find all records where the author field contains both "dieckmann" AND "thomas":
sru.php?version=1.1&query=author all dieckmann thomas
sru.php?version=1.1&query=dc.creator all dieckmann thomas
  • find all records where the publication field equals EXACTLY "Marine Ecology Progress Series":
sru.php?version=1.1&query=publication exact Marine Ecology Progress Series
  • find all records where the year field is greater than or equals "2005":
sru.php?version=1.1&query=year>=2005
sru.php?version=1.1&query=dc.date>=2005
  • find records with serial numbers 1, 123, 499, 612, 21654 & 23013 but return only the last three records:
sru.php?version=1.1&query=1 123 499 612 21654 23013&startRecord=4&maximumRecords=3
  • same as above, but return just the number of found records (and not the full record data):
sru.php?version=1.1&query=1 123 499 612 21654 23013&maximumRecords=0
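The queries above follow a simple "index relation term(s)" pattern: word relations such as any, all and exact are separated by spaces, while symbolic relations such as = and >= are written directly between index and term. The Python sketch below (hypothetical helper `cql`) mirrors exactly that style. Standard CQL would usually quote multi-word terms (e.g. `dc.title any "ecology diversity"`); the sketch follows the unquoted form used in these examples and does not claim anything about whether sru.php accepts quoted terms.

```python
# Symbolic CQL relations; the examples above write these without surrounding spaces.
SYMBOLIC = {"=", "<>", "<", ">", "<=", ">="}


def cql(index, relation, *terms):
    """Compose a single 'index relation term(s)' CQL clause in the style used above."""
    term = " ".join(terms)
    if relation in SYMBOLIC:
        return f"{index}{relation}{term}"
    return f"{index} {relation} {term}"


if __name__ == "__main__":
    print(cql("dc.title", "any", "ecology", "diversity"))
    print(cql("dc.creator", "all", "dieckmann", "thomas"))
    print(cql("publication", "exact", "Marine Ecology Progress Series"))
    print(cql("dc.date", ">=", "2005"))
```

Before such a clause is placed in a URL it must be percent-encoded; for instance, `urllib.parse.quote_plus` turns `dc.title any ecology diversity` into `dc.title+any+ecology+diversity`.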