08 March 2012

My first walker for the GATK: my notebook

This is my first notebook on developing a new Walker for the Genome Analysis Toolkit (GATK). This post was mostly inspired by the following PDF: kvg_20_line_lifesavers_mad_v2.pptx.pdf.

Get the sources

git clone http://github.com/broadgsa/gatk.git GATK.dev
The javac compiler also requires the following library from Google, Contracts for Java (cofoja): http://code.google.com/p/cofoja/.

A first "Short-Reads" walker

The following class, HelloRead, is a ReadWalker: it scans the reads and prints them as FASTA. The @Output annotation tells the GATK that we're going to channel our output through a java.io.PrintStream field. This field is automatically filled by the application runtime.
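To make the shape of such a walker concrete, here is a minimal, self-contained sketch of the idea. It deliberately does not use the GATK classes, so it is not the HelloRead source: `WalkerSketch`, `Output`, `Read`, `HelloReadSketch` and `run` are all hypothetical stand-ins, and the reflection loop is only an assumption about how a runtime could fill an annotated field, not a description of the GATK's actual mechanism.

```java
import java.io.PrintStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

/** Self-contained toy: none of these types are GATK classes. */
class WalkerSketch {

    /** Hypothetical stand-in for an annotation marking the output field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Output {}

    /** Hypothetical stand-in for a read: just a name and its bases. */
    static final class Read {
        final String name;
        final String bases;
        Read(String name, String bases) { this.name = name; this.bases = bases; }
    }

    /** A walker in the map/reduce style. */
    static class HelloReadSketch {
        @Output
        PrintStream out; // never assigned here: the toy runtime below fills it

        /** Called once per read: print it as FASTA and count it as 1. */
        Integer map(Read read) {
            out.print(">" + read.name + "\n" + read.bases + "\n");
            return 1;
        }

        /** Initial value of the running total. */
        Long reduceInit() { return 0L; }

        /** Fold the result of one map() call into the running total. */
        Long reduce(Integer value, Long sum) { return sum + value; }
    }

    /** Toy runtime: inject the stream into @Output fields, then traverse the reads. */
    static long run(HelloReadSketch walker, List<Read> reads, PrintStream stream) {
        for (Field field : walker.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Output.class)) {
                try {
                    field.setAccessible(true);
                    field.set(walker, stream);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        Long sum = walker.reduceInit();
        for (Read read : reads) {
            sum = walker.reduce(walker.map(read), sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two made-up reads, printed as FASTA on stdout.
        List<Read> reads = Arrays.asList(
            new Read("read1", "ACGT"),
            new Read("read2", "TTGA"));
        long count = run(new HelloReadSketch(), reads, System.out);
        System.err.println(count + " reads printed");
    }
}
```

The walker itself never opens a stream: it only declares the annotated field and uses it in map(), while the runtime decides where the output goes. That separation is what the @Output annotation buys us.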

Compilation

javac -cp /path/to/GenomeAnalysisTK.jar:/path/to/cofoja-1.0-r139.jar:. \
 -sourcepath src \
 -d tmp src/mygatk/HelloRead.java
jar cvf HelloRead.jar -C tmp .

Running

Here I'm using a BAM from the 'examples' folder of samtools. (The GATK requires read groups, so this BAM must first be pre-processed with Picard AddOrReplaceReadGroups.) We then use our library as follows:
java -cp /path/to/GenomeAnalysisTK.jar:HelloRead.jar \
org.broadinstitute.sting.gatk.CommandLineGATK -T HelloRead \
 -I test.bam \
 -R ${SAMTOOLS}/examples/ex1.fa 

Result:

The Makefile

That's it, Pierre

1 comment:

Geraldine Van der Auwera said...

This is a great tutorial. If your readers are hungry for more ways to leverage the power of the GATK, the GATK team at the Broad Institute is planning a workshop for users this Fall. If you’re interested in attending the workshop, you can vote on the topics and activities that you’d like the workshop to include by filling in this survey: http://www.surveymonkey.com/s/T799FQK