25 November 2008

Taxonomy and Semantic Web: writing an extension for ARQ/SPARQL

In this post I'll show how I've implemented a custom function in ARQ, the SPARQL engine of Jena, for querying an RDF graph. The new function tests whether a node in the NCBI taxonomy hierarchy has a given ancestor.

Requirements


Here is a sample of the very first lines of nodes.dmp: the first column is the node-id of the taxon, the second column is its parent-id.
cat nodes.dmp | cut -c 1-20 | head
1 | 1 | no rank | |
2 | 131567 | superki
6 | 335928 | genus |
7 | 6 | species | AC
9 | 32199 | species
10 | 135621 | genus
11 | 10 | species |
13 | 203488 | genus
14 | 13 | species |
16 | 32011 | genus |



The input


Our input is an RDF file:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:tax="http://species.lindenb.org"
>

<tax:Individual rdf:about="http://fr.wikipedia.org/wiki/Tintin">
<dc:title xml:lang="fr">Tintin</dc:title>
<dc:title xml:lang="en">Tintin</dc:title>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:9606"/>
</tax:Individual>

<tax:Individual rdf:about="http://fr.wikipedia.org/wiki/Babar">
<dc:title xml:lang="fr">Babar</dc:title>
<dc:title xml:lang="en">Babar</dc:title>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:9785"/>
</tax:Individual>

<tax:Individual rdf:about="http://fr.wikipedia.org/wiki/Milou">
<dc:title xml:lang="fr">Milou</dc:title>
<dc:title xml:lang="en">Snowy</dc:title>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:9615"/>
</tax:Individual>

<tax:Individual rdf:about="http://fr.wikipedia.org/wiki/Donald_Duck">
<dc:title xml:lang="fr">Donald</dc:title>
<dc:title xml:lang="en">Donald Duck</dc:title>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:8839"/>
</tax:Individual>

<tax:Individual rdf:about="http://fr.wikipedia.org/wiki/Le_L%C3%A9zard">
<dc:title xml:lang="fr">Lezard</dc:title>
<dc:title xml:lang="en">Lizard</dc:title>
<dc:title xml:lang="fr">Curt Connors</dc:title>
<dc:title xml:lang="en">Curt Connors</dc:title>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:9606"/>
<tax:taxon rdf:resource="lsid:ncbi.nlm.nih.gov:taxonomy:8504"/>
</tax:Individual>

</rdf:RDF>

Images via wikipedia

Tintin & Snowy

Babar

Donald

The Lizard

Basically this file describes
  • 5 individuals: Tintin (human), Snowy (dog), Donald (duck), Babar (elephant) and Dr Connors/The Lizard (Spider-Man's foe)
  • Each individual is unambiguously identified by its URI in Wikipedia
  • Each individual is named in English and in French
  • For each individual, its ID in the NCBI hierarchy is specified using a simple URI (here I've tried to use an LSID, but it could have been something else, such as a URL)


A basic query


The following SPARQL query retrieves the URI, the taxon and the English name of each individual.

The query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX tax: <http://species.lindenb.org>

SELECT ?individual ?taxon ?title
{
?individual a tax:Individual .
?individual dc:title ?title .
?individual tax:taxon ?taxon .
FILTER langMatches( lang(?title), "en" )
}

Invoking ARQ


arq --query query01.rq --data taxonomy.rdf

Result


-------------------------------------------------------------------------------------------------------------
| individual | taxon | title |
=============================================================================================================
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:8504> | "Curt Connors"@en |
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Curt Connors"@en |
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:8504> | "Lizard"@en |
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Lizard"@en |
| <http://fr.wikipedia.org/wiki/Donald_Duck> | <lsid:ncbi.nlm.nih.gov:taxonomy:8839> | "Donald Duck"@en |
| <http://fr.wikipedia.org/wiki/Milou> | <lsid:ncbi.nlm.nih.gov:taxonomy:9615> | "Snowy"@en |
| <http://fr.wikipedia.org/wiki/Babar> | <lsid:ncbi.nlm.nih.gov:taxonomy:9785> | "Babar"@en |
| <http://fr.wikipedia.org/wiki/Tintin> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Tintin"@en |
-------------------------------------------------------------------------------------------------------------


Adding a custom function


Now, I want to add a new function to SPARQL. This function 'isA' takes two parameters as input: the taxon/LSID of the child and the taxon/LSID of the parent. It returns the boolean 'true' if the 'child' has the 'parent' in its lineage. The new function is implemented by extending the class com.hp.hpl.jena.sparql.function.FunctionBase2. This new class contains an associative array child2parent mapping each taxon-id to its parent. This map is loaded as described below:

Pattern pat= Pattern.compile("[ \t]*\\|[ \t]*");
String line;
BufferedReader r= new BufferedReader(new FileReader(TAXONOMY_NODES_PATH));
while((line=r.readLine())!=null)
{
String tokens[]=pat.split(line, 3);
this.child2parent.put(
Integer.parseInt(tokens[0]),
Integer.parseInt(tokens[1])
);
}
r.close();
(...)
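
The loading step can be tried without the real file. Here is a minimal sketch that applies the same regular expression to a few lines copied from the sample above; the class name NodesDmpSketch, the parse method and the guard against short lines are mine and are not part of the original class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: builds the child -> parent map from lines formatted like nodes.dmp.
final class NodesDmpSketch {
    // Same delimiter as in the post: a pipe surrounded by optional blanks or tabs.
    private static final Pattern PIPE = Pattern.compile("[ \t]*\\|[ \t]*");

    static Map<Integer, Integer> parse(List<String> lines) {
        Map<Integer, Integer> child2parent = new HashMap<>();
        for (String line : lines) {
            // Only the first two columns are needed, so the split stops after them.
            String[] tokens = PIPE.split(line, 3);
            if (tokens.length < 2) continue; // hypothetical guard: skip malformed lines
            child2parent.put(Integer.parseInt(tokens[0].trim()),
                             Integer.parseInt(tokens[1].trim()));
        }
        return child2parent;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "1 | 1 | no rank | |",
            "6 | 335928 | genus |",
            "7 | 6 | species | AC");
        System.out.println(parse(sample));
    }
}
```

Passing 3 as the limit of split keeps the rest of the line in one piece, so the many trailing columns of nodes.dmp are never tokenized.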

The function 'exec' checks that both arguments are URIs and then invokes the method isChildOf:

public NodeValue exec(NodeValue childNode, NodeValue parentNode)
{
(...check the nodes are URI)
return NodeValue.makeBoolean(isChildOf(childId,parentId));
}
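
The elided check boils down to turning an LSID into a taxon-id. A minimal sketch of that step alone, assuming the LSID prefix used in this post; the class LsidSketch and the method taxonId are hypothetical names, and the Jena NodeValue handling is left out on purpose.

```java
import java.util.OptionalInt;

// Sketch only: extracts the numeric NCBI taxon-id from an LSID-style URI.
final class LsidSketch {
    static final String LSID = "lsid:ncbi.nlm.nih.gov:taxonomy:";

    static OptionalInt taxonId(String uri) {
        // Anything that is not one of our LSIDs yields "no id".
        if (uri == null || !uri.startsWith(LSID)) return OptionalInt.empty();
        try {
            return OptionalInt.of(Integer.parseInt(uri.substring(LSID.length())));
        } catch (NumberFormatException e) {
            // The suffix was not a number.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(taxonId("lsid:ncbi.nlm.nih.gov:taxonomy:9606"));
        System.out.println(taxonId("http://fr.wikipedia.org/wiki/Tintin"));
    }
}
```

In the real function an empty result simply becomes the boolean 'false', so a malformed URI filters the row out instead of raising an error.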


The function 'isChildOf' walks up the map child2parent, from the child towards the root, to check whether the parent is an ancestor of the child:

while(true)
{
Integer id= child2parent.get(childid);
if(id==null || id==childid) return false;
if(id==parentid) return true;
childid=id;
}
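
To see the walk in action without loading the whole taxonomy, here is a small self-contained sketch. The map only holds a few rows copied from the nodes.dmp sample above, so most lineages stop early; the class name AncestorWalkSketch and the method sampleTaxonomy are mine. As in the complete class, a taxon counts as its own ancestor.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: walks the child -> parent map up to the root.
final class AncestorWalkSketch {
    // A handful of rows taken from the nodes.dmp sample, not the full taxonomy.
    static Map<Integer, Integer> sampleTaxonomy() {
        Map<Integer, Integer> m = new HashMap<>();
        m.put(1, 1);        // the root is its own parent
        m.put(6, 335928);
        m.put(7, 6);
        m.put(10, 135621);
        m.put(11, 10);
        return m;
    }

    static boolean isChildOf(Map<Integer, Integer> child2parent, int childId, int parentId) {
        if (childId == parentId) return true;
        int current = childId;
        while (true) {
            Integer parent = child2parent.get(current);
            // Unknown node, or root reached: the ancestor was not found.
            if (parent == null || parent == current) return false;
            if (parent == parentId) return true;
            current = parent;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> tax = sampleTaxonomy();
        System.out.println(isChildOf(tax, 7, 335928)); // true: 7 -> 6 -> 335928
        System.out.println(isChildOf(tax, 7, 10));     // false: 335928 is not in the sample map
    }
}
```

The loop only stops on the root or on an unknown node, so it relies on the data being a tree: a cycle other than the root pointing to itself would never end.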

Here is the complete source code of this class:

package org.lindenb.arq4taxonomy;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import com.hp.hpl.jena.sparql.expr.ExprEvalException;
import com.hp.hpl.jena.sparql.expr.NodeValue;
import com.hp.hpl.jena.sparql.function.FunctionBase2;

public class isA
extends FunctionBase2
{
public static final String LSID="lsid:ncbi.nlm.nih.gov:taxonomy:";
public static final String TAXONOMY_NODES_PATH="/home/lindenb/tmp/TAXONOMY_NCBI/nodes.dmp";
private Map<Integer, Integer> child2parent=null;

public isA()
{

}
/**
* return a associative map child.id -> parent.id
* @return
*/
private Map<Integer, Integer> getTaxonomy()
{
if(this.child2parent==null)
{
this.child2parent= new HashMap<Integer, Integer>();
try
{
Pattern pat= Pattern.compile("[ \t]*\\|[ \t]*");
String line;
BufferedReader r= new BufferedReader(new FileReader(TAXONOMY_NODES_PATH));
while((line=r.readLine())!=null)
{
String tokens[]=pat.split(line, 3);
this.child2parent.put(
Integer.parseInt(tokens[0]),
Integer.parseInt(tokens[1])
);
}
r.close();
System.err.println(this.child2parent.size());
}
catch(IOException err)
{
err.printStackTrace();
throw new ExprEvalException(err);
}
}
return this.child2parent;
}

private boolean isChildOf(int childid,int parentid)
{
if(childid==parentid) return true;
Map<Integer,Integer> map= getTaxonomy();
while(true)
{
Integer id= map.get(childid);
if(id==null || id==childid) return false;
if(id==parentid) return true;
childid=id;
}
}

@Override
public NodeValue exec(NodeValue childNode, NodeValue parentNode)
{

if( childNode.isLiteral() ||
parentNode.isLiteral() ||
childNode.asNode().isBlank() ||
parentNode.asNode().isBlank())
{
return NodeValue.makeBoolean(false);
}

String childURI = childNode.asNode().getURI();
if(!childURI.startsWith(LSID))
{
return NodeValue.makeBoolean(false);
}


String parentURI = parentNode.asNode().getURI();
if(!parentURI.startsWith(LSID))
{
return NodeValue.makeBoolean(false);
}

int childId=0;
try {
childId= Integer.parseInt(childURI.substring(LSID.length()));
}
catch (NumberFormatException e)
{
return NodeValue.makeBoolean(false);
}

int parentId=0;
try {
parentId= Integer.parseInt(parentURI.substring(LSID.length()));
}
catch (NumberFormatException e)
{
return NodeValue.makeBoolean(false);
}

return NodeValue.makeBoolean(isChildOf(childId,parentId));
}

}

This class is then compiled and packaged into the file tax.jar:

javac -cp ${ARQ_CLASSPATH}:. -sourcepath src src/org/lindenb/arq4taxonomy/isA.java
jar cvf tax.jar -C src org


and we add this jar in the classpath:
export CP=$PWD/tax.jar

To tell ARQ about this new function, we just add its package as a new PREFIX in the SPARQL query:
PREFIX fn: <java:org.lindenb.arq4taxonomy.>
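
The prefix works because the prefixed name fn:isA expands to a URI in the java: scheme, and ARQ loads the class named by the rest of that URI. The sketch below only mimics the string handling to show how the prefix and the local name meet; it does not call ARQ, and JavaUriSketch, expand and className are hypothetical names.

```java
// Sketch only: shows how "fn:isA" ends up naming a Java class.
final class JavaUriSketch {
    private static final String SCHEME = "java:";

    // A prefixed name is expanded by plain concatenation of prefix IRI and local name.
    static String expand(String prefixIri, String localName) {
        return prefixIri + localName;
    }

    // The class to load is whatever follows the "java:" scheme.
    static String className(String functionIri) {
        if (!functionIri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a java: URI: " + functionIri);
        }
        return functionIri.substring(SCHEME.length());
    }

    public static void main(String[] args) {
        String iri = expand("java:org.lindenb.arq4taxonomy.", "isA");
        System.out.println(iri);
        System.out.println(className(iri));
    }
}
```

This is also why the prefix must end with a dot: without it the expanded name would read org.lindenb.arq4taxonomyisA.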



First test


The following SPARQL query retrieves all the mammals (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=40674) in the data set.

The query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX tax: <http://species.lindenb.org>
PREFIX fn: <java:org.lindenb.arq4taxonomy.>

SELECT ?individual ?taxon ?title
{
?individual a tax:Individual .
?individual dc:title ?title .
?individual tax:taxon ?taxon .
FILTER fn:isA(?taxon,<lsid:ncbi.nlm.nih.gov:taxonomy:40674> )
FILTER langMatches( lang(?title), "en" )
}

The command line


arq --query query02.rq --data taxonomy.rdf


The result


-------------------------------------------------------------------------------------------------------------
| individual | taxon | title |
=============================================================================================================
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Curt Connors"@en |
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Lizard"@en |
| <http://fr.wikipedia.org/wiki/Milou> | <lsid:ncbi.nlm.nih.gov:taxonomy:9615> | "Snowy"@en |
| <http://fr.wikipedia.org/wiki/Babar> | <lsid:ncbi.nlm.nih.gov:taxonomy:9785> | "Babar"@en |
| <http://fr.wikipedia.org/wiki/Tintin> | <lsid:ncbi.nlm.nih.gov:taxonomy:9606> | "Tintin"@en |
-------------------------------------------------------------------------------------------------------------


Second query


The following SPARQL query retrieves all the 'Sauropsida' (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=8457) in the RDF file.

The SPARQL file


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX tax: <http://species.lindenb.org>
PREFIX fn: <java:org.lindenb.arq4taxonomy.>

SELECT ?individual ?taxon ?title
{
?individual a tax:Individual .
?individual dc:title ?title .
?individual tax:taxon ?taxon .
FILTER fn:isA(?taxon,<lsid:ncbi.nlm.nih.gov:taxonomy:8457> )
FILTER langMatches( lang(?title), "en" )
}

Command line


arq --query query03.rq --data taxonomy.rdf


The result


-------------------------------------------------------------------------------------------------------------
| individual | taxon | title |
=============================================================================================================
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:8504> | "Curt Connors"@en |
| <http://fr.wikipedia.org/wiki/Le_L%C3%A9zard> | <lsid:ncbi.nlm.nih.gov:taxonomy:8504> | "Lizard"@en |
| <http://fr.wikipedia.org/wiki/Donald_Duck> | <lsid:ncbi.nlm.nih.gov:taxonomy:8839> | "Donald Duck"@en |
-------------------------------------------------------------------------------------------------------------



Et hop ! voila ! That's it !

22 November 2008

A Web Service for ONSolubility.

This post is about the ONSolubility project (for references, search FriendFeed for Solubility) and about how I've used Egon's code to create a web service to query the solubility data. Egon has already done a great job by using the Google Java spreadsheet API to download Jean-Claude's solubility data. On his side, Rajarshi Guha wrote an HTML page querying those data using the Google Query API. Here I show how I have created a web service searching for the measurements based on their solvent/solute/concentration.

Server Side


Classes


I've added some JAXB (Java Architecture for XML Binding) annotations to Egon's Measurement.java. Those annotations help the web-service compiler (wsgen) to understand how the data will be transmitted to the client.
@javax.xml.bind.annotation.XmlRootElement(name="Measurement")
public class Measurement
implements Serializable
{
(...)

Then we create the web service ONService.java. This service is just a Java class that also contains a few annotations. First we flag the class as a web service:
@javax.jws.WebService(
name="onsolubility",
serviceName="ons"
)
public class ONService
{
Then comes the function search provided by this service. This function downloads the data from Google using Egon's API and returns a collection of Measurement selected by their solute/solvent/concentration. Again the Java annotations help the compiler to implement the service:
@WebMethod(action="urn:search",operationName="search")
public List<Measurement> search(
@WebParam(name="solute")String solute,
@WebParam(name="solvent")String solvent,
@WebParam(name="concMin")Double concMin,
@WebParam(name="concMax")Double concMax
) throws Exception
{....
The web service is launched with only 3 lines of code (!).
ONService service=new ONService();
Endpoint endpoint = Endpoint.create(service);
endpoint.publish("http://localhost:8080/onsolubility");
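
The body of search is elided above. Here is a minimal sketch of what the filtering step could look like, assuming that a null parameter means "no constraint" (this is only suggested by the test calls on the client side) and that the concentration bounds are inclusive. The nested Measurement class is a hypothetical stand-in for Egon's class with three fields only, and the sample values are copied from the output shown later in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: filters measurements by solute, solvent and concentration range.
final class SearchSketch {
    // Hypothetical stand-in for Egon's Measurement; the real class has more fields.
    static final class Measurement {
        final String solute;
        final String solvent;
        final double concentration;

        Measurement(String solute, String solvent, double concentration) {
            this.solute = solute;
            this.solvent = solvent;
            this.concentration = concentration;
        }
    }

    // Values taken from the client output in this post.
    static List<Measurement> sample() {
        return Arrays.asList(
            new Measurement("D-Glucose", "THF", 0.00222),
            new Measurement("D-Mannitol", "Methanol", 0.00548),
            new Measurement("D-Mannitol", "THF", 0.01098),
            new Measurement("4-nitrobenzaldehyde", "Methanol", 0.38));
    }

    static List<Measurement> search(List<Measurement> all, String solute, String solvent,
                                    Double concMin, Double concMax) {
        List<Measurement> hits = new ArrayList<>();
        for (Measurement m : all) {
            // Assumption: a null argument disables the corresponding test.
            if (solute != null && !solute.equals(m.solute)) continue;
            if (solvent != null && !solvent.equals(m.solvent)) continue;
            if (concMin != null && m.concentration < concMin) continue;
            if (concMax != null && m.concentration > concMax) continue;
            hits.add(m);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), "4-nitrobenzaldehyde", null, 0.3, 0.4).size());
    }
}
```

Using the wrapper type Double rather than double for the bounds is what lets a SOAP client leave them out.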

Compilation


I've created an Ant file invoking wsgen, generating the stubs and installing the web service. Here is the output:
compile-webservice:
[javac] Compiling 1 source file to /home/pierre/tmp/onssolubility/ons.solubility.data/bin
[wsgen] command line: wsgen -classpath (...) -verbose ons.solubility.ws.ONService
[wsgen] Note: ap round: 1
[wsgen] [ProcessedMethods Class: ons.solubility.ws.ONService]
[wsgen] [should process method: search hasWebMethods: true ]
[wsgen] [endpointReferencesInterface: false]
[wsgen] [declaring class has WebSevice: true]
[wsgen] [returning: true]
[wsgen] [WrapperGen - method: search(java.lang.String,java.lang.String,java.lang.Double,java.lang.Double)]
[wsgen] [method.getDeclaringType(): ons.solubility.ws.ONService]
[wsgen] [requestWrapper: ons.solubility.ws.jaxws.Search]
[wsgen] [should process method: main hasWebMethods: true ]
[wsgen] [webMethod == null]
[wsgen] [ProcessedMethods Class: java.lang.Object]
[wsgen] ons/solubility/ws/jaxws/ExceptionBean.java
[wsgen] ons/solubility/ws/jaxws/Search.java
[wsgen] ons/solubility/ws/jaxws/SearchResponse.java
[wsgen] Note: ap round: 2

publish-webservice:
[java] Publishing Service on http://localhost:8080/onsolubility?WSDL
And... that's it. When I open my browser on http://localhost:8080/onsolubility?WSDL, I can now see the WSDL description/schema of this service.

Client Side


Writing a client for this API works the same way as in a previous post about the IntAct/EBI API: the wsimport command generates the stubs from the WSDL file. I then wrote a simple test, ONServiceTest.java, invoking our service several times.
private void test(
String solute,
String solvent,
Double concMin,
Double concMax)
{
try
{
Ons service=new Ons();
Onsolubility port=service.getOnsolubilityPort();
List<Measurement> data=port.search(solute, solvent, concMin, concMax);

for(Measurement measure:data)
{
System.out.println(
" sample :\t"+measure.getSample()+"\n"+
" solute :\t"+measure.getSolute()+"\n"+
" solvent :\t"+measure.getSolvent()+"\n"+
" experiment:\t"+measure.getExperiment()+"\n"+
" reference :\t"+measure.getReference()+"\n"+
" conc :\t"+measure.getConcentration()+"\n"
);
}
} catch(Throwable err)

{
System.err.println("#error:"+err.getMessage());
}
}
private void test()
{
test(null,null,null,null);
test("4-nitrobenzaldehyde",null,null,null);
test("4-nitrobenzaldehyde",null,0.3,0.4);
}
Here is the output
ant test-webservice
Buildfile: build.xml
test-webservice
[wsimport] parsing WSDL...
[wsimport] generating code...
[javac] Compiling 1 source file to onssolubility/ons.solubility.data/bin
[java] ##Searching solute: null solvent: null conc: null-null
[java] sample : 9
[java] solute : D-Glucose
[java] solvent : THF
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.00222
[java]
[java] sample : 6
[java] solute : D-Mannitol
[java] solvent : Methanol
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.00548
[java]
(...)
[java]
[java] sample : 10
[java] solute : D-Mannitol
[java] solvent : THF
[java] experiment: 1
[java] reference : http://onschallenge.wikispaces.com/JennyHale-1
[java] conc : 0.01098
[java] ##Searching solute: 4-nitrobenzaldehyde solvent: null conc: 0.3-0.4
[java] sample : 2b
[java] solute : 4-nitrobenzaldehyde
[java] solvent : Methanol
[java] experiment: 212
[java] reference : http://usefulchem.wikispaces.com/exp212
[java] conc : 0.38

That's it and that's enough code for the week-end.

Pierre

11 November 2008

SPARQL for solubility/RDF: my notebook

In a recent thread on FriendFeed, I've transformed Jean-Claude Bradley's data about the solubility of some compounds into RDF.


The original data set looks like this:

The RDF version looks like this:
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY doap "http://usefulinc.com/ns/doap#">
<!ENTITY foaf "http://xmlns.com/foaf/0.1/">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY chem "http://blueobelisk.sourceforge.net/chemistryblogs/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:dc="&dc;"
xmlns:rdfs="&rdfs;"
xmlns:doap="&doap;"
xmlns:foaf="&foaf;"
xmlns:chem="&chem;"
>
<!--=== PERSONS ============================================================== -->
<foaf:Person rdf:about="http://www.chemistry.drexel.edu/people/bradley/bradley.asp">
<foaf:name>Jean-Claude Bradley</foaf:name>
<foaf:nick>jcbradley</foaf:nick>
<foaf:sha1_sum>b68f7dca9555a1cfe1ad18c6d2be0db6e552d678</foaf:sha1_sum>
<foaf:holdsAccount>
<foaf:OnlineAccount>
<foaf:accountServiceHomepage rdf:resource="http://www.linkedin.com/"/>
<foaf:accountProfilePage rdf:resource="http://www.linkedin.com/in/jcbradley"/>
</foaf:OnlineAccount>
</foaf:holdsAccount>
</foaf:Person>

<!--=== PROJECT ============================================================== -->
<doap:Project rdf:ID="SolubilityProject">
<doap:name>Solubility</doap:name>
<doap:homepage rdf:resource="http://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ" />
<doap:shortdesc xml:lang="en">Solubility</doap:shortdesc>
<doap:shortdesc xml:lang="fr">Solubilité</doap:shortdesc>
<doap:description xml:lang="en">Solubility</doap:description>
<doap:description xml:lang="fr">Solubilité</doap:description>
</doap:Project>
<!--=== Compound ============================================================== -->

<chem:Compound rdf:about="http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png">
<chem:name>D-Manitol</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/b/bb/D-Mannitol_structure.png"/>
<chem:smiles>O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO</chem:smiles>
</chem:Compound>

<chem:Compound rdf:about="http://en.wikipedia.org/wiki/Ethanol">
<chem:name>Ethanol</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/6/6f/Ethanol_flat_structure.png"/>
<chem:smiles>OCC</chem:smiles>
</chem:Compound>

<chem:Compound rdf:about="http://en.wikipedia.org/wiki/Sodium_chloride">
<chem:name>Sodium chloride</chem:name>
<chem:image rdf:resource="http://upload.wikimedia.org/wikipedia/commons/e/e9/Sodium-chloride-3D-ionic.png"/>
<chem:smiles>[Na+].[Cl-]</chem:smiles>
</chem:Compound>

<!--=== Experiment ============================================================== -->
<chem:Experiment rdf:about="http://usefulchem.wikispaces.com/exp207">
<dc:name>Hello, I'm Experiment 207</dc:name>
<chem:project rdf:resource="#SolubilityProject"/>
</chem:Experiment>

<chem:Experiment rdf:about="http://usefulchem.wikispaces.com/exp1">
<dc:name>Hello, I'm Experiment 1</dc:name>
<chem:project rdf:resource="#SolubilityProject"/>
</chem:Experiment>

<!--=== Sample ============================================================== -->
<chem:Sample rdf:about="sample:11">
<dc:name>Hello, I'm Sample 11</dc:name>
</chem:Sample>
<chem:Sample rdf:about="sample:3">
<dc:name>Hello, I'm Sample 3</dc:name>
</chem:Sample>
<chem:Sample rdf:about="sample:12">
<dc:name>Hello, I'm Sample 12</dc:name>
</chem:Sample>
<!--=== Experimental Data ============================================================== -->
<chem:ExperimentalData >
<dc:date>2008-01-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://usefulchem.wikispaces.com/exp207"/>
<chem:sample rdf:resource="sample:11"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

<chem:ExperimentalData>
<dc:date>2008-02-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://en.wikipedia.org/wiki/Sodium_chloride"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://onschallenge.wikispaces.com/JennyHale-1"/>
<chem:sample rdf:resource="sample:3"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

<chem:ExperimentalData>
<dc:date>2008-03-01</dc:date>
<chem:author rdf:resource="http://www.chemistry.drexel.edu/people/bradley/bradley.asp"/>
<chem:solute rdf:resource="http://en.wikipedia.org/wiki/Sodium_chloride"/>
<chem:solvent rdf:resource="http://en.wikipedia.org/wiki/Ethanol"/>
<chem:experiment-id rdf:resource="http://usefulchem.wikispaces.com/exp207"/>
<chem:sample rdf:resource="sample:12"/>
<chem:concentration rdf:datatype="chem:Molar">0.00</chem:concentration>
</chem:ExperimentalData>

</rdf:RDF>

Here I describe how I used SPARQL to retrieve Jean-Claude's original data set from this RDF file.
I've downloaded ARQ, the SPARQL engine, from http://jena.sourceforge.net/ARQ/.
Here are a few queries:

listing all the chem:Compound


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?x
{
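?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://blueobelisk.sourceforge.net/chemistryblogs/Compound>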
}

Running the query


sparql -query jeter.rq --data=solubility.rdf

Result


----------------------------------------------------------------------
| x |
======================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> |
| <http://en.wikipedia.org/wiki/Ethanol> |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> |
----------------------------------------------------------------------

The same query but using prefixes


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?x
{
?x rdf:type chem:Compound
}

result


----------------------------------------------------------------------
| x |
======================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> |
| <http://en.wikipedia.org/wiki/Ethanol> |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> |
----------------------------------------------------------------------

Listing the compounds , their names, their 'smiles'


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
}

result


-----------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles |
===================================================================================================================================
| <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" |
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" |
-----------------------------------------------------------------------------------------------------------------------------------

The same, but only the compounds with a name containing "OL"


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
FILTER regex(?compoundName, "ol", "i")
}

result


------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles |
==============================================================================================================================
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" |
------------------------------------------------------------------------------------------------------------------------------
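
For readers more at home with Java than with SPARQL, the filter behaves like an unanchored, case-insensitive regular-expression search on the name. A rough equivalent with java.util.regex, run on the three compound names of this data set; the class RegexFilterSketch is only an illustration and is not part of ARQ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: mimics FILTER regex(?compoundName, "ol", "i") on plain strings.
final class RegexFilterSketch {
    static List<String> filter(List<String> names, String regex) {
        // The "i" flag of SPARQL corresponds to CASE_INSENSITIVE here.
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            // find() matches anywhere in the string, like an unanchored SPARQL regex.
            if (p.matcher(name).find()) kept.add(name);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Sodium chloride", "Ethanol", "D-Manitol");
        System.out.println(filter(names, "ol")); // [Ethanol, D-Manitol]
    }
}
```

"Sodium chloride" is dropped because it contains "lo" but never "ol".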

the same, but add the 'chem:description', if any


query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT ?compound ?compoundName ?compoundSmiles ?description
{
?compound rdf:type chem:Compound .
?compound chem:name ?compoundName .
?compound chem:smiles ?compoundSmiles .
FILTER regex(?compoundName, "ol", "i")
OPTIONAL { ?compound chem:description ?description }

}

result


--------------------------------------------------------------------------------------------------------------------------------------------
| compound | compoundName | compoundSmiles | description |
============================================================================================================================================
| <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | |
| <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" | |
--------------------------------------------------------------------------------------------------------------------------------------------

retrieving Jean-Claude's data


The query


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX chem: <http://blueobelisk.sourceforge.net/chemistryblogs/>

SELECT
?exp
?sample
?solvent ?solventName ?solventSmiles
?solute ?soluteName ?soluteSmiles
?conc

{
?solvent rdf:type chem:Compound .
?solvent chem:name ?solventName .
?solvent chem:smiles ?solventSmiles .

?solute rdf:type chem:Compound .
?solute chem:name ?soluteName .
?solute chem:smiles ?soluteSmiles .

?expData rdf:type chem:ExperimentalData .
?expData chem:solute ?solute .
?expData chem:solvent ?solvent .
?expData chem:concentration ?conc .
?expData chem:experiment-id ?exp .
?expData chem:sample ?sample .
}

result


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| exp | sample | solvent | solventName | solventSmiles | solute | soluteName | soluteSmiles | conc |
==================================================================================================================================================================================================================================================================================================
| <http://usefulchem.wikispaces.com/exp207> | <sample:12> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" | "0.00"^^<chem:Molar> |
| <http://onschallenge.wikispaces.com/JennyHale-1> | <sample:3> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://en.wikipedia.org/wiki/Sodium_chloride> | "Sodium chloride" | "[Na+].[Cl-]" | "0.00"^^<chem:Molar> |
| <http://usefulchem.wikispaces.com/exp207> | <sample:11> | <http://en.wikipedia.org/wiki/Ethanol> | "Ethanol" | "OCC" | <http://commons.wikimedia.org/wiki/Image:D-Mannitol_structure.png> | "D-Manitol" | "O[C@H]([C@H](O)CO)[C@H](O)[C@H](O)CO" | "0.00"^^<chem:Molar> |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


That's it

Pierre

05 November 2008

IBM many eyes wikified.

I've just received my invitation to test the wikified version of ManyEyes.



(see my old post about ManyEyes [here]). This wikified version is really cool. Your data are edited in a wiki. For example, I've downloaded a count of the SNPs on the human genome from the UCSC:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'select chrom,(ROUND(chromStart/1E6)*1E6) as position ,count(*) as total from snp129 group by chrom,position'
and copied the data in the wiki. (I could not preview the page)
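
The SQL expression ROUND(chromStart/1E6)*1E6 puts each SNP in the bin of its nearest megabase, and the GROUP BY counts the SNPs per bin. The same arithmetic in Java, as a small sketch for a single chromosome; the class MegabaseBinSketch and the positions in main are made up for the illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: reproduces the ROUND(chromStart/1E6)*1E6 binning for one chromosome.
final class MegabaseBinSketch {
    // Nearest multiple of one million; positions are never negative here.
    static long bin(long chromStart) {
        return Math.round(chromStart / 1e6) * 1_000_000L;
    }

    // Equivalent of "count(*) ... group by position", with the bins kept sorted.
    static Map<Long, Integer> countPerBin(long[] chromStarts) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long start : chromStarts) {
            counts.merge(bin(start), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up positions, only to show the grouping.
        System.out.println(countPerBin(new long[] {100L, 900_000L, 1_200_000L}));
    }
}
```

Because the value is rounded and not truncated, each bin is centred on a multiple of one megabase: the bin 1000000 covers the positions from 500000 up to, but not including, 1500000.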



To create a visualization about a given page, you just add a colon ':' after the name of the data page followed by the name of your visualization. Your browser is then redirected to a new wiki page where you'll build a new visualization.

(hum... back to the data page, I could not see any link to the visualization )

Really nice !