05 February 2013

Making use of Picard Metrics files using XML and XSLT. #ngs

Many tools in the Picard package produce a "metrics file" (described at http://picard.sourceforge.net/picard-metric-definitions.shtml). The Picard API contains a Java parser, "MetricsFile", for reading those metrics files:

MetricsFile<MetricBase, Comparable<?>> metricsFile=new MetricsFile<MetricBase, Comparable<?>>();
metricsFile.read(new FileReader("metrics.txt"));
To produce custom reports from those files, I've written a tool that dumps the content of a MetricsFile as an XML file. The source code is available at: http://code.google.com/p/jvarkit/source/browse/trunk/src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java.

Compilation

$ mkdir tmp
$ javac -d tmp -cp  /path/to/picard.jar:/path/to/sam.jar \
     -sourcepath  src/main/java \
     src/main/java/fr/inserm/umr1087/jvarkit/tools/picard/metrics2xml/PicardMetricsToXML.java
$ jar vcf picardmetrics2xml.jar -C tmp .

Usage

Say you have used the tool 'CollectInsertSizeMetrics.jar' from Picard:
$ java -jar /path/to/CollectInsertSizeMetrics.jar \
 O=out.metrics \
 I=/path/to/samtools/examples/sorted.bam \
 AS=true \
 R=/path/to/samtools/ex1.fa \
 H=chart.pdf
The file out.metrics looks like this:
## net.sf.picard.metrics.StringHeader
# net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=(...)
## net.sf.picard.metrics.StringHeader
# Started on: Tue Feb 05 12:51:30 CET 2013

## METRICS CLASS net.sf.picard.analysis.InsertSizeMetrics
MEDIAN_INSERT_SIZE MEDIAN_ABSOLUTE_DEVIATION MIN_INSERT_SIZE MAX_INSERT_SIZE MEAN_INSERT_SIZE STANDARD_DEVIATION READ_PAIRS
209 10 54 243 208.857506 13.614603 4716 FR 5 9 13 17 21 25 29 35 43 

## HISTOGRAM java.lang.Integer
insert_size All_Reads.fr_count
54 3
170 3
173 9
174 3
175 3
177 6
(...)
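To make the layout above concrete, here is a rough sketch in plain Java (no Picard dependency) that pulls the rows out of the "## METRICS CLASS" section. It is not Picard's MetricsFile parser: the class name MetricsTextSketch is made up for this post, and it assumes that the columns are tab-separated and that an empty cell means "no value".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Picard: reads the metrics table by hand. */
public class MetricsTextSketch {

    /** Returns each row of the "## METRICS CLASS" section as column name -> value. */
    static List<Map<String, String>> readMetricsRows(Reader in) {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String[] columns = null;
            boolean inMetrics = false;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("## METRICS CLASS")) {
                    inMetrics = true;   // the next line holds the column names
                    columns = null;
                    continue;
                }
                if (!inMetrics) continue;
                if (line.trim().isEmpty() || line.startsWith("#")) {
                    inMetrics = false;  // a blank line or a new section ends the table
                    continue;
                }
                // Assumption: cells are tab-separated; -1 keeps trailing empty cells.
                String[] fields = line.split("\t", -1);
                if (columns == null) {
                    columns = fields;
                    continue;
                }
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < columns.length; i++) {
                    String value = i < fields.length ? fields[i] : "";
                    row.put(columns[i], value.isEmpty() ? null : value);
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MetricsTextSketch out.metrics
        for (Map<String, String> row : readMetricsRows(Files.newBufferedReader(Paths.get(args[0])))) {
            System.out.println(row);
        }
    }
}
```

For real work the MetricsFile class from the Picard API is the better choice, since it also gives access to the headers and the histogram.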
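The metrics file can be converted to XML using the following command (the main class name follows the package path of the source file):
$ java -cp /path/to/picard.jar:/path/to/sam.jar:picardmetrics2xml.jar fr.inserm.umr1087.jvarkit.tools.picard.metrics2xml.PicardMetricsToXML file.metrics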


<?xml version="1.0" encoding="UTF-8"?><picard-metrics xmlns="http://picard.sourc
eforge.net/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><metrics-file
 file="file.metrics"><headers><header class="net.sf.picard.metrics.StringHeader"
>net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=jeter2 INPUT=/ho
me/lindenb/package/samtools-0.1.18/examples/sorted.bam OUTPUT=jeter REFERENCE_SE
QUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true 
   DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_A
FTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL
=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false</header><h
eader class="net.sf.picard.metrics.StringHeader">Started on: Tue Feb 05 12:51:30
 CET 2013</header></headers><metrics><thead class="net.sf.picard.analysis.Insert
SizeMetrics"><th class="double">MEDIAN_INSERT_SIZE</th><th class="double">MEDIAN
_ABSOLUTE_DEVIATION</th><th class="int">MIN_INSERT_SIZE</th><th class="int">MAX_
INSERT_SIZE</th><th class="double">MEAN_INSERT_SIZE</th><th class="double">STAND
ARD_DEVIATION</th><th class="long">READ_PAIRS</th><th class="net.sf.picard.sam.S
amPairUtil$PairOrientation">PAIR_ORIENTATION</th><th class="int">WIDTH_OF_10_PER
CENT</th><th class="int">WIDTH_OF_20_PERCENT</th><th class="int">WIDTH_OF_30_PER
CENT</th><th class="int">WIDTH_OF_40_PERCENT</th><th class="int">WIDTH_OF_50_PER
CENT</th><th class="int">WIDTH_OF_60_PERCENT</th><th class="int">WIDTH_OF_70_PER
CENT</th><th class="int">WIDTH_OF_80_PERCENT</th><th class="int">WIDTH_OF_90_PER
CENT</th><th class="int">WIDTH_OF_99_PERCENT</th><th class="java.lang.String">SA
MPLE</th><th class="java.lang.String">LIBRARY</th><th class="java.lang.String">R
EAD_GROUP</th></thead><tbody><tr><td>209.0</td><td>10.0</td><td>54</td><td>243</
td><td>208.857506</td><td>13.614603</td><td>4716</td><td>FR</td><td>5</td><td>9<
/td><td>13</td><td>17</td><td>21</td><td>25</td><td>29</td><td>35</td><td>43</td
><td>65</td><td xsi:nil="true"/><td xsi:nil="true"/><td xsi:nil="true"/></tr></t
body></metrics><histogram class="java.lang.Integer"><thead><th>insert_size</th><
th>All_Reads.fr_count</th></thead><tbody><tr><td>54</td><td>3.0</td></tr><tr><td
>170</td><td>3.0</td></tr><tr><td>173</td><td>9.0</td></tr><tr><td>174</td><td>3
.0</td></tr><tr><td>175</td><td>3.0</td></tr><tr><td>177</td><td>6.0</td></tr><t
r><td>178</td><td>6.0</td></tr><tr><td>179</td><td>9.0</td></tr><tr><td>180</td>
<td>6.0</td></tr><tr><td>181</td><td>6.0</td></tr><tr><td>182</td><td>21.0</td><
/tr><tr><td>183</td><td>9.0</td></tr><tr><td>184</td><td>15.0</td></tr><tr><td>1
85</td><td>33.0</td></tr><tr><td>186</td><td>15.0</td></tr><tr><td>187</td><td>3
(...)
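The XML above can also be read back from Java with the standard DOM and XPath APIs. The following is only a sketch under a few assumptions: the class name MetricsXmlReader and the prefix "p" are my own choices, and the element names (metrics, thead, th, tbody, tr, td) and the namespace are taken from the output shown above. Note that the document uses a default namespace, so the XPath expressions need a prefix bound to it.

```java
import java.io.File;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads the first metrics row of the XML shown above. */
public class MetricsXmlReader {
    static final String PICARD_NS = "http://picard.sourceforge.net/";
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns column name -> value; cells flagged xsi:nil="true" map to null. */
    static Map<String, String> firstMetricsRow(InputSource in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required to match the default namespace
            Document doc = factory.newDocumentBuilder().parse(in);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the (arbitrary) prefix "p" to the namespace used by the tool.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "p".equals(prefix) ? PICARD_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList names = (NodeList) xpath.evaluate(
                "//p:metrics/p:thead/p:th", doc, XPathConstants.NODESET);
            NodeList cells = (NodeList) xpath.evaluate(
                "//p:metrics/p:tbody/p:tr[1]/p:td", doc, XPathConstants.NODESET);

            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < names.getLength() && i < cells.getLength(); i++) {
                Element cell = (Element) cells.item(i);
                boolean nil = "true".equals(cell.getAttributeNS(XSI_NS, "nil"));
                row.put(names.item(i).getTextContent(), nil ? null : cell.getTextContent());
            }
            return row;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read metrics XML", e);
        }
    }

    public static void main(String[] args) {
        // Usage: java MetricsXmlReader metrics.xml
        InputSource in = new InputSource(new File(args[0]).toURI().toString());
        firstMetricsRow(in).forEach((name, value) -> System.out.println(name + "\t" + value));
    }
}
```

The same two XPath expressions, with "histogram" in place of "metrics", should reach the histogram table, since it uses the same thead/tbody layout.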

Converting to JSON

Now we can convert the XML to whatever we want using XSLT. I wrote a stylesheet, picardmetrics2json.xsl, that converts the XML to JSON (it does not yet escape the quotes in the strings).
$ xsltproc picardmetrics2json.xsl metrics.xml


{
    "metrics.xml": {
        "headers": [
            {
                "class": "net.sf.picard.metrics.StringHeader",
                "value": "net.sf.picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=metrics.pdf INPUT=samtools-0.1.18/examples/sorted.bam OUTPUT=metrics.txt REFERENCE_SEQUENCE=/home/lindenb/package/samtools-0.1.18/examples/ex1.fa ASSUME_SORTED=true    DEVIATIONS=10.0 MINIMUM_PCT=0.05 METRIC_ACCUMULATION_LEVEL=[ALL_READS] STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false"
            },
            {
                "class": "net.sf.picard.metrics.StringHeader",
                "value": "Started on: Tue Feb 05 12:51:30 CET 2013"
            }
        ],
        "metrics": [
            {
                "MEDIAN_INSERT_SIZE": 209,
                "MEDIAN_ABSOLUTE_DEVIATION": 10,
                "MIN_INSERT_SIZE": 54,
                "MAX_INSERT_SIZE": 243,
                "MEAN_INSERT_SIZE": 208.857506,
                "STANDARD_DEVIATION": 13.614603,
                "READ_PAIRS": 4716,
                "PAIR_ORIENTATION": "FR",
                "WIDTH_OF_10_PERCENT": 5,
                "WIDTH_OF_20_PERCENT": 9,
                "WIDTH_OF_30_PERCENT": 13,
                "WIDTH_OF_40_PERCENT": 17,
                "WIDTH_OF_50_PERCENT": 21,
                "WIDTH_OF_60_PERCENT": 25,
                "WIDTH_OF_70_PERCENT": 29,
                "WIDTH_OF_80_PERCENT": 35,
                "WIDTH_OF_90_PERCENT": 43,
                "WIDTH_OF_99_PERCENT": 65,
                "SAMPLE": null,
                "LIBRARY": null,
                "READ_GROUP": null
            }
        ],
        "histogram": [
            {
                "insert_size": 54,
                "All_Reads.fr_count": 3
            },
            {
                "insert_size": 170,
                "All_Reads.fr_count": 3
            },(...)
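If xsltproc is not at hand, the same kind of transformation can be run from Java with the javax.xml.transform API that ships with the JDK. The sketch below is an illustration only: the embedded stylesheet is a much reduced, hypothetical stand-in for picardmetrics2json.xsl (it only handles the first metrics row), the class name MetricsToJsonSketch is invented, and like the real stylesheet it does not escape quotes inside strings.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: applies a small XSLT 1.0 stylesheet to the metrics XML. */
public class MetricsToJsonSketch {

    /** Illustrative stylesheet, NOT the author's picardmetrics2json.xsl. */
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:p='http://picard.sourceforge.net/'"
      + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:text>{</xsl:text>"
      + "<xsl:for-each select='//p:metrics/p:tbody/p:tr[1]/p:td'>"
      + "<xsl:variable name='i' select='position()'/>"
      + "<xsl:if test='$i &gt; 1'><xsl:text>,</xsl:text></xsl:if>"
        // the i-th <th> gives the name of the i-th <td>
      + "<xsl:text>\"</xsl:text>"
      + "<xsl:value-of select='ancestor::p:metrics/p:thead/p:th[$i]'/>"
      + "<xsl:text>\":</xsl:text>"
      + "<xsl:choose>"
        // xsi:nil cells become JSON null
      + "<xsl:when test='@xsi:nil=\"true\"'><xsl:text>null</xsl:text></xsl:when>"
        // NaN is never equal to itself, so this test is true only for numbers
      + "<xsl:when test='number(.) = number(.)'><xsl:value-of select='number(.)'/></xsl:when>"
      + "<xsl:otherwise><xsl:text>\"</xsl:text><xsl:value-of select='.'/>"
      + "<xsl:text>\"</xsl:text></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:for-each>"
      + "<xsl:text>}</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(Source stylesheet, Source xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(xml, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Usage: java MetricsToJsonSketch metrics.xml
        Source xsl = new StreamSource(new StringReader(XSL));
        System.out.println(transform(xsl, new StreamSource(new File(args[0]))));
    }
}
```

To run an existing stylesheet file instead of the embedded one, pass new StreamSource(new File("picardmetrics2json.xsl")) as the first argument of transform.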

Converting to HTML

Another stylesheet converts the XML to HTML. It also produces the JavaScript code to display the histograms using Google Charts:
$ xsltproc picardmetrics2html.xsl metrics.xml > output.html


That's it,
Pierre
