06 January 2010

Transforming Pubmed to Simile/Exhibit with XSLT

Cameron Neylon recently asked on FriendFeed:

"Advice request: What is the best approach to exposing a publications list online. Any good tools for generating html/xml/rdfa?".

In the comments, someone suggested using Simile/Exhibit. This gave me the idea of writing an XSLT stylesheet that transforms a PubMed XML result into an Exhibit file. The stylesheet is available at:



Usage

Save your PubMed result as XML in a file named pubmed_result.xml and invoke your favorite XSLT processor:
xsltproc pubmed2exhibit.xsl ~/pubmed_result.xml > file.html
That's all! You now have an interactive bibliography in an HTML file!
Note: the JSON data are embedded in the HTML file thanks to this hack.
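The stylesheet itself is XSLT. As a rough illustration of the same idea, here is a minimal Python sketch (standard library only) that maps PubMed XML records to Exhibit-style JSON items and inlines them in a page. It is not a port of pubmed2exhibit.xsl: the item properties (`journal`, `year`, `author`), the `exhibitData` variable, and the `<script>` embedding are assumptions for illustration and not necessarily what the hack above does. Telling Exhibit to load that variable is not shown.

```python
import html
import json
import xml.etree.ElementTree as ET


def pubmed_to_exhibit(xml_text):
    """Map each PubmedArticle to one Exhibit-style item (property names are hypothetical)."""
    root = ET.fromstring(xml_text)
    items = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        art = citation.find("Article")
        # Skip collective authors, which have no LastName element.
        authors = [
            f"{a.findtext('LastName')} {a.findtext('Initials', '')}".strip()
            for a in art.findall("AuthorList/Author")
            if a.findtext("LastName")
        ]
        items.append({
            "type": "Article",
            "id": citation.findtext("PMID"),
            "label": art.findtext("ArticleTitle", ""),
            "journal": art.findtext("Journal/Title", ""),
            "year": art.findtext("Journal/JournalIssue/PubDate/Year", ""),
            "author": authors,
        })
    return {"items": items}


def embed_in_html(title, data):
    """Inline the JSON in a <script> element under a hypothetical variable name."""
    # Escape "</" so a title containing "</script>" cannot close the element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<html><head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f'<script type="text/javascript">var exhibitData = {payload};</script>\n'
        "</head><body></body></html>\n"
    )


if __name__ == "__main__":
    # A made-up, minimal record following PubMed's XML element layout.
    sample = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>12345</PMID><Article><Journal><JournalIssue><PubDate><Year>2009</Year></PubDate>
</JournalIssue><Title>Bioinformatics</Title></Journal>
<ArticleTitle>A made-up title</ArticleTitle>
<AuthorList><Author><LastName>Doe</LastName><Initials>J</Initials></Author></AuthorList>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""
    print(embed_in_html("My publications", pubmed_to_exhibit(sample)))
```

Keeping the year, journal and author list as plain item properties is what would let a faceted viewer group and filter on them.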

Result

I've tested the stylesheet with the following query. The Exhibit displays the articles and lets you filter them by year, journal, or author:



as well as a timeline:



That's it!

Pierre

3 comments:

Egon Willighagen said...

What is an Exhibit file? Is it HTML+RDFa? What ontology is it using, BIBO?

Pierre Lindenbaum said...

Egon, an Exhibit file is just a regular HTML file... with a lot of JavaScript. The content is generated dynamically from a JSON object, so AFAIK there is not much of interest for the Semantic Web (nothing there to be crawled or parsed).

Unknown said...

If you get an error when trying to use the XSL and XML files together, it means one of three things:

A. You need to get the XML from PubMed the correct way, and you have not been doing so. Instead of clicking "Display Settings" and then "XML", click "Send to", then "File", then under Format choose "XML", and then click Create. It will try to save the file as "pubmed_results.txt"; rename the file extension to "xml".

B. You have been trying to process the XML by linking the files in Firefox, when the XML needs to be processed by xsltproc or some other command-line tool. For some reason, Firefox isn't doing it for me even when I add the stylesheet header. Maybe IE works.

C. You are saving the link in this blog post as an XSL file, when really the link points to an HTML page that displays the XSL file's contents and its previous versions. That is, you have to click the link, get the XSL file from there, and save that as pubmed2exhibit.xsl (or whatever you'd like to name it).
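Point A boils down to making sure the saved file really contains PubMed XML and not one of the text export formats, whatever its extension. A minimal Python sketch of that check, assuming PubMed XML exports use a `PubmedArticleSet` root element; the function names are hypothetical:

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def looks_like_pubmed_xml(data):
    """Return True if data (str or bytes) parses as XML with a PubmedArticleSet root."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Text exports (e.g. MEDLINE or summary formats) are not XML at all.
        return False
    return root.tag == "PubmedArticleSet"


def check_file(path):
    """Check a saved file by its content, regardless of a .txt or .xml extension."""
    return looks_like_pubmed_xml(Path(path).read_bytes())


if __name__ == "__main__" and len(sys.argv) > 1:
    name = sys.argv[1]
    print(name, "OK" if check_file(name) else "is not PubMed XML")
```

Run it with the saved file as the first argument before handing that file to xsltproc.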