LSID Tester, a tool for testing Life Science Identifier resolution services
© Page. 2008
Received: 18 January 2008
Accepted: 18 February 2008
Published: 18 February 2008
Life Science Identifiers (LSIDs) are persistent, globally unique identifiers for biological objects. The decentralised nature of LSIDs makes them attractive for identifying distributed resources. Data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) are distributed over many different providers, and this community has adopted LSIDs as the identifier of choice.
LSID Tester is a web application written in PHP. Given a LSID the application performs seven tests, reporting the results at each step. If all tests are successful the metadata associated with the LSID is displayed, and can be viewed in a range of formats.
The software provides a tool for testing a LSID resolution service.
A key prerequisite for integrating biological information from diverse sources is the use of globally unique identifiers (GUIDs) to consistently identify objects. One approach to deploying GUIDs is to provide a central authority for assigning and resolving identifiers. This is the strategy adopted by many academic publishers through CrossRef, which manages Digital Object Identifiers (DOIs) for journal articles. In some cases a field may be dominated by a single data provider which issues de facto GUIDs; for example, the genomics community uses GenBank accession numbers to identify molecular sequences. However, neither approach works well for the biodiversity community, which has large numbers of globally distributed data providers serving diverse kinds of information such as taxonomic names, specimen records, images, and DNA sequences. At the time of writing the Global Biodiversity Information Facility (GBIF) lists some 214 biodiversity data providers, serving a total of 41,139,985 records, mostly of museum specimens. After reviewing various options for GUIDs, the Biodiversity Information Standards (TDWG) organisation has recommended the use of LSIDs.
Life Science Identifiers (LSIDs) were developed to provide globally unique identifiers for objects in biological databases. Although within mainstream bioinformatics relatively few "early adopters" have deployed LSIDs, the biodiversity informatics community has adopted LSIDs as its GUID of choice. Among the attractions are the distributed nature of the identifier (no central authority is required for registering or resolving identifiers), the low cost, and the convention that resolving a LSID returns metadata as RDF. The latter facilitates integrating information from multiple sources using tools being developed for the Semantic Web.
The widely distributed nature of biodiversity data has implications for deploying global identifiers. Providers are unlikely to run a single type of web server, nor are they likely to all use the same web application software. Consequently, there are multiple versions of LSID server software available, including Java, Perl, and .NET implementations. Developers porting servers to new computer programming languages would benefit from having a tool available to test their implementation. Data providers implementing a LSID server would benefit from having a tool to test whether their installation is functioning correctly. The LSID Tester was developed with these two audiences in mind. It is a simple web-based application that tests a LSID service and provides a detailed report on how well the service conforms to the LSID specification.
The LSID Tester is written in the PHP programming language, and makes use of the PEAR Net_DNS module written by Eric Kilfoil for LSID resolution discovery. The application caches authority WSDL files and metadata for 24 hours. Metadata is displayed in alternative formats using XSL style sheets, including Oliver Becker's XML to HTML Verbatim Formatter. Graphical displays of metadata use a RDF parser from ARC and require GraphViz to generate the graphs.
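The 24-hour caching behaviour can be illustrated with a small time-limited cache. The sketch below is written in Python rather than the PHP used by LSID Tester, and it is not the tool's own code: the class name TimedCache, the in-memory storage, and the injectable clock are all assumptions made for illustration.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, the lifetime stated in the text


class TimedCache:
    """Tiny in-memory cache whose entries expire after a fixed lifetime.

    Hypothetical helper: the real application may well cache differently.
    """

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be checked without waiting
        self._entries = {}    # key -> (stored_at, value)

    def get(self, key):
        """Return the cached value, or None if it is absent or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # stale entry: the caller should fetch again
            return None
        return value

    def put(self, key, value):
        """Store a value together with the time at which it was cached."""
        self._entries[key] = (self._clock(), value)
```

A caller would look up an authority WSDL or metadata document by its URL, and only contact the remote service when get returns None.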
Results and Discussion
A LSID client, such as LSID Tester, resolves a LSID in four steps. Firstly the client discovers the location of the service that can resolve the LSID, for example by querying the DNS service records to find the hostname and TCP/IP service port for the LSID authority. Given the LSID urn:lsid:ubio.org:namebank:11815, querying the DNS for the SRV record for _lsid._tcp.ubio.org returns animalia.ubio.org:80 as the location of the ubio.org LSID service.
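As a rough illustration of the discovery step, the following Python sketch derives the DNS name to query from a LSID. The function names are hypothetical, and the SRV lookup itself is omitted because it requires a DNS resolver library; only the construction of the query name described above is shown.

```python
def lsid_authority(lsid):
    """Return the authority component of a LSID.

    For urn:lsid:ubio.org:namebank:11815 the authority is ubio.org.
    """
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not a LSID: %r" % lsid)
    return parts[2]


def srv_query_name(lsid):
    """Build the DNS name whose SRV record advertises the LSID service."""
    return "_lsid._tcp." + lsid_authority(lsid)


if __name__ == "__main__":
    print(srv_query_name("urn:lsid:ubio.org:namebank:11815"))
```

The resulting name is what a resolver would be asked for in order to obtain the hostname and port of the authority.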
Knowing the location of the LSID service, the client appends '/authority/' to the service location, and retrieves the authority WSDL file. This file defines the LSID resolution service, including location and bindings. The LSID standard defines bindings for SOAP, HTTP GET, and FTP. The HTTP GET binding is the most widely used, and is the only one the LSID Tester supports at present. For the LSID urn:lsid:ubio.org:namebank:11815 the HTTP GET binding is http://animalia.ubio.org.
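A minimal Python sketch of how the authority WSDL address might be assembled from a discovered host and port is given below. The function name is hypothetical, and the use of the http scheme and the omission of the default port 80 are assumptions rather than details taken from the LSID Tester source.

```python
def authority_wsdl_url(host, port):
    """Return the URL of the authority WSDL for a service at host:port.

    Assumes plain HTTP; port 80 is left implicit in the resulting URL.
    """
    netloc = host if port == 80 else "%s:%d" % (host, port)
    return "http://%s/authority/" % netloc


if __name__ == "__main__":
    print(authority_wsdl_url("animalia.ubio.org", 80))
```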
Given the authority WSDL, a LSID client uses its preferred protocol (SOAP, HTTP GET, FTP) to retrieve a second WSDL file (the service WSDL) that specifies how the metadata and/or data corresponding to the LSID can be retrieved. For the LSID urn:lsid:ubio.org:namebank:11815 metadata can be obtained via HTTP GET from http://animalia.ubio.org/authority/metadata.php.
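Finding a HTTP GET location in a WSDL file amounts to locating the address element of the WSDL 1.1 HTTP binding inside a port. The Python sketch below does this with the standard library XML parser. The function name and the heavily trimmed example document are hypothetical; a real service WSDL contains more elements (messages, port types, bindings), and a real client would also check which binding each port refers to.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"            # WSDL 1.1 namespace
WSDL_HTTP_NS = "http://schemas.xmlsoap.org/wsdl/http/"  # WSDL 1.1 HTTP binding


def http_get_locations(wsdl_text):
    """Return the location of every http:address found inside a wsdl:port."""
    root = ET.fromstring(wsdl_text)
    locations = []
    for port in root.iter("{%s}port" % WSDL_NS):
        for address in port.findall("{%s}address" % WSDL_HTTP_NS):
            location = address.get("location")
            if location:
                locations.append(location)
    return locations


# Hypothetical, trimmed WSDL used only to exercise the function above.
EXAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:http="http://schemas.xmlsoap.org/wsdl/http/">
  <service name="ExampleMetadataService">
    <port name="ExampleHTTPPort" binding="ExampleHTTPBinding">
      <http:address location="http://animalia.ubio.org/authority/metadata.php"/>
    </port>
  </service>
</definitions>"""

if __name__ == "__main__":
    print(http_get_locations(EXAMPLE_WSDL))
```

An empty result would indicate that the WSDL offers no HTTP GET location, which is the kind of failure a tester needs to report.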
The client can now retrieve the metadata associated with the LSID by appending ?lsid=urn:lsid:ubio.org:namebank:11815 to this URL.
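The final request URL can be sketched in Python as follows. The function name is hypothetical, and the handling of an endpoint that already contains a query string is an assumption added for robustness; the text only describes appending ?lsid= followed by the identifier.

```python
def metadata_url(endpoint, lsid):
    """Append the lsid query parameter to a metadata endpoint URL."""
    # Assumption: if the endpoint already has a query string, extend it with '&'.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + "lsid=" + lsid


if __name__ == "__main__":
    print(metadata_url("http://animalia.ubio.org/authority/metadata.php",
                       "urn:lsid:ubio.org:namebank:11815"))
```

Fetching this URL with any HTTP client should return the metadata for the LSID, conventionally as RDF.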
The LSID Tester performs seven main tests:
1. Is the LSID correctly formed?
2. Is the resolution service discoverable?
3. Can it retrieve the authority WSDL?
4. Does the authority WSDL define a HTTP GET binding for the service WSDL?
5. Can it retrieve the service WSDL?
6. Does the service WSDL define a HTTP GET binding for the metadata?
7. Can it retrieve the metadata for the LSID?
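The sequence of tests can be pictured as an ordered list of checks with a result recorded for each. The Python sketch below shows one possible shape for the first test and for the reporting loop. It makes several assumptions: the LSID syntax is taken to be urn:lsid:authority:namespace:object with an optional revision, the function names are hypothetical, and stopping at the first failure is a design choice made here because each later test depends on the output of the one before.

```python
import re

# Assumed LSID shape: urn:lsid:authority:namespace:object[:revision]
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(:[^:\s]*)?$", re.IGNORECASE
)


def is_well_formed(lsid):
    """Test 1: does the string match the assumed LSID syntax?"""
    return LSID_PATTERN.match(lsid) is not None


def run_tests(lsid, steps):
    """Run (name, check) pairs in order and record a pass or fail for each.

    Stops after the first failing check, since later steps need its result.
    """
    report = []
    for name, check in steps:
        passed = bool(check(lsid))
        report.append((name, passed))
        if not passed:
            break
    return report
```

In a fuller client the remaining six checks would be supplied as further (name, check) pairs, each performing the corresponding network request.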
LSID Tester is a web application for testing LSID resolution services. Given a LSID the application performs seven tests, reporting the results at each step. If all tests are successful the metadata associated with the LSID is displayed, and can be viewed in a range of formats.
Availability and requirements
Project Name: LSID Tester
Project Home Page: Source code is available from http://code.google.com/p/lsid-php/, and an instance of the application can be viewed at http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester.
Operating System: Mac OS X, Linux
Programming Language: PHP
Other Requirements: Web server, GraphViz
License: GNU General Public License version 2
Any restrictions to use by non-academics: None
This work was partly funded by BBSRC grant BB/C004310/1. I thank early users of the LSID Tester for feedback, especially Damian Barnier and Nicky Nicolson, and the two anonymous reviewers for catching some errors in the original text.
- Clark T, Martin S, Liefeld T: Globally distributed object identification for biological knowledgebases. Briefings in Bioinformatics 2004, 5(1):59–70.
- CrossRef [http://www.crossref.org]
- The Digital Object Identifier system [http://www.doi.org/]
- Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Briefings in Bioinformatics 2007, 8:347–357.
- Global Biodiversity Information Facility [http://www.gbif.org]
- Biodiversity Information Standards (TDWG) [http://www.tdwg.org/]
- Martin S, Hohman MM, Liefeld T: The impact of Life Science Identifier on informatics data. Drug Discovery Today 2005, 10:1566–1572.
- Resource Description Framework (RDF) [http://www.w3.org/RDF/]
- Page RDM: Taxonomic names, metadata, and the Semantic Web. Biodiversity Informatics 2006, 3 [http://jbi.nhm.ku.edu/index.php/jbi/article/view/25]
- Universal Biological Indexer and Organizer [http://www.ubio.org]
- The LSID (Life Sciences Identifier) Project [http://lsids.sourceforge.net]
- The Object Management Group: Life Sciences Identifiers Specification, Version 1.0 [http://www.omg.org/cgi-bin/doc?formal/04-12-01] 2004.
- PEAR Net_DNS [http://pear.php.net/package/Net_DNS]
- Oliver's XSLT page [http://www2.informatik.hu-berlin.de/~obecker/XSLT/]
- ARC: RDF classes for PHP [http://arc.semsol.org/]
- Graphviz – Graph Visualization Software [http://www.graphviz.org/]
- Web Services Description Language (WSDL) 1.1 [http://www.w3.org/TR/wsdl]
- W3C RDF Validation Service [http://www.w3.org/RDF/Validator/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.