23 July 2010

Rules Engine for Bioinformatics. Playing with Drools and the Hapmap data


Drools is a Rules Engine implementation of the Rete algorithm. (from "http://java.sys-con.com/node/45082":)A rule engine is a system for applying some set of if/then rules to the data in a system and then taking some action on the data or the system itself. Its primary purpose is to separate the business logic from the system logic - to externalize the business logic so it can be maintained separately. Use a rule engine to separate the if/then rules from system code - to externalize them so they can be maintained separately from the system code. The behavior of the system can then be modified without changing code or needing to recompile/redeploy. Rules are stored in human-readable form in a file so they can be changed with a text editor or rule editor.

I would like to know if JBOSS can be used for some common bioinformatics tasks. For example, can I use Drools when my users want to select some candidate genes with various parameters : "I want a gene with 3 SNPs in the first exon but with less than 2 microsattelites in the 3'UTR and if there is another gene close to this one I want ... etc... etc...".

In the current post, I've played with the Drools engine to search some mendelians incompatibilities in the hapmap data.

First, download some the pedigrees and the genotypes:

Here is the structure of my project:
./test
./test/Individual.java
./test/Snp.java
./test/Drools01.java
./test/Genotype.java
./test/hapmap.drl


Individual, Genotype and Snp are 3 simple POJO files.

Individual.java


package test;
import java.io.Serializable;

public class Individual implements Serializable
{
private static final long serialVersionUID = 1L;
private int familyId;
private String name;
private String fatherName;
private String motherName;
private int gender;
//constructor, getters, setters, etc...
}


Snp.java


package test;
import java.io.Serializable;

public class Snp
implements Serializable
{
private static final long serialVersionUID = 1L;
private String name;
private String alleles;
private String chrom;
private int position;
//constructor, getters, setters, etc...
}


Genotype.java


package test;
import java.io.Serializable;

public class Genotype
implements Serializable
{
private static final long serialVersionUID = 1L;
private Snp snp;
private Individual individual;
private char a1;
private char a2;

public Genotype(Snp snp, Individual individual, String observed)
{
super();
this.snp = snp;
this.individual = individual;

this.a1 = observed.charAt(0);
this.a2 = observed.charAt(1);
if(this.a1> this.a2)
{
this.a2 = a1;
this.a1=observed.charAt(1);
}
}
public boolean contains(char allele)
{
return getA1()==allele || getA2()==allele;
}

//constructor, getters, setters, etc...
}

hapmap.drl


hapmap.drl is the Drools rules file. It contains the "business logic" and can be modified without changing anything in the java program:
package test

rule "Select snps"
salience 30
when
$snp:Snp($chrom:chrom,$pos:position)
eval(!($chrom=="chr21" && $pos> 9880000 && $pos< 10043000))
then
retract($snp);
System.err.println("Removing : "+$snp.getName());
end

rule "One Parent"
salience 20

when
$children : Individual($dad:fatherName,$mum:motherName)
$parent: Individual(name==$dad || name==$mum)
$genotypeChild : Genotype(individual==$children,$snp:snp,a1!='N',a2!='N' )
$genotypeParent : Genotype(individual==$parent,snp==$snp,a1!='N',a2!='N')
eval(!(
$genotypeParent.contains($genotypeChild.getA1()) ||
$genotypeParent.contains($genotypeChild.getA2())
))
then
retract($genotypeChild);
System.out.println($snp.getName()+" : problem with "+
$children+"("+$genotypeChild.getA1()+"/"+$genotypeChild.getA2()+") incompatibility with parent:"+
$parent+"("+$genotypeParent.getA1()+"/"+$genotypeParent.getA2()+")"
);
end

rule "Both Parents"
salience 10
when
$children : Individual($dad:fatherName,$mum:motherName)
$father: Individual(name==$dad )
$mother: Individual(name==$mum )
$genotypeChild : Genotype(individual==$children,$snp:snp,a1!='N',a2!='N' )
$genotypeDad : Genotype(individual==$father,snp==$snp,a1!='N',a2!='N')
$genotypeMum : Genotype(individual==$mother,snp==$snp,a1!='N',a2!='N')
eval(!(
($genotypeDad.contains( $genotypeChild.getA1()) && $genotypeMum.contains( $genotypeChild.getA2())) ||
($genotypeDad.contains( $genotypeChild.getA2()) && $genotypeMum.contains( $genotypeChild.getA1()))
))
then
retract($genotypeChild);
System.out.println($snp.getName()+" : problem with "+
$children+"("+$genotypeChild.getA1()+"/"+$genotypeChild.getA2()+") incompatibility with parents:"+
$father+"("+$genotypeDad.getA1()+"/"+$genotypeDad.getA2()+") "+
$mother+"("+$genotypeMum.getA1()+"/"+$genotypeMum.getA2()+") "
);
end
  • The first rule "Select snps" (=with the highest priority (salience)) remove all the SNPs that are not in "chr21:9880000-10043000"
  • The second rule "One Parent" prints a message if there is an incompatibility between a children and one of his parents
  • The last rule "Both Parents" prints a message if the is an incompatibility between a children and both of his parents


Drools01.java

Drools01.java initializes the Drools engine, parses the hapmap files, put those objects in the "KnowledgeBase" and fires all the rules:
package test;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderError;
import org.drools.builder.KnowledgeBuilderErrors;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.definition.KnowledgePackage;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class Drools01
{
private void loadHapMap(StatefulKnowledgeSession session)throws Exception
{
Map<String, Individual> name2individual=new HashMap<String, Individual>();
String line;
BufferedReader in= new BufferedReader(new FileReader(
"/home/lindenb/relationships_w_pops_121708.txt"
));

while((line=in.readLine())!=null)
{
if(line.startsWith("FID") || !line.endsWith("CEU")) continue;
String tokens[]=line.split("[\t]");
Individual indi=new Individual(
Integer.parseInt(tokens[0]),
tokens[1],
tokens[2].equals("0")?null:tokens[2],
tokens[3].equals("0")?null:tokens[3],
Integer.parseInt(tokens[4])
);
name2individual.put(indi.getName(), indi);
}
in.close();

in= new BufferedReader(new FileReader("/home/lindenb/genotypes_chr21_CEU_r27_nr.b36_fwd.txt"));
int r=0;

line=in.readLine();
String tokens[]=line.split("[ ]");
Map<Integer, Individual> index2individual=new HashMap<Integer, Individual>(tokens.length);

for(int i=11;i <tokens.length;++i)
{
Individual indi=name2individual.get(tokens[i]);
session.insert(indi);
if(indi==null) throw new NullPointerException("Cannot get "+tokens[i]);
index2individual.put(i, indi);
}

while((line=in.readLine())!=null)
{
tokens=line.split("[ ]");
Snp snp=new Snp(tokens[0], tokens[1],tokens[2], Integer.parseInt(tokens[3]));
session.insert(snp);
for(int i=11;i <tokens.length;++i)
{
session.insert(new Genotype(snp, index2individual.get(i), tokens[i]));
}
//System.err.println(line);
if(r++>2000) break;//just read the first 2000 rows everything...
}
in.close();
}

private void run()throws Exception
{
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add( ResourceFactory.newClassPathResource(
"hapmap.drl",
Drools01.class ),
ResourceType.DRL );
KnowledgeBuilderErrors errors= kbuilder.getErrors();
if(!errors.isEmpty())
{
for(KnowledgeBuilderError error:errors)
{
System.err.println(error.getMessage());
}
return;
}

Collection<KnowledgePackage> pkgs = kbuilder.getKnowledgePackages();
kbase.addKnowledgePackages( pkgs );
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
loadHapMap(ksession);

ksession.fireAllRules();
}
/**
* @param args
*/
public static void main(String[] args) throws Exception
{
try {
new Drools01().run();
} catch(Exception err)
{
err.printStackTrace();
}
}
}

Compilation


Makefile::
DROOLS=/home/lindenb/package/drools-5.0
ECLIPSE=/home/lindenb/package/eclipse/plugins
CP=$(DROOLS)/drools-core-5.0.1.jar:$(DROOLS)/drools-api-5.0.1.jar:$(DROOLS)/lib/mvel2-2.0.10.jar:$(DROOLS)/lib/antlr-runtime-3.1.1.jar:$(DROOLS)/drools-compiler-5.0.1.jar:$(ECLIPSE)/org.eclipse.jdt.core_3.5.2.v_981_R35x.jar

all:
javac -cp ${CP}:. test/Drools01.java
java -cp ${CP}:. test.Drools01

Results

Removing : rs885550
Removing : rs28363862
Removing : rs28783163
rs1028272 : problem with NA10838(T/T) incompatibility with parent:NA12003(A/A)
rs9647052 : problem with NA10847(A/A) incompatibility with parent:NA12239(C/C)
rs1882882 : problem with NA07014(A/A) incompatibility with parent:NA07031(G/G)
rs12627714 : problem with NA07048(G/G) incompatibility with parent:NA07034(A/A)
rs17240368 : problem with NA07048(A/A) incompatibility with parent:NA07034(G/G)
rs2822605 : problem with NA07048(C/C) incompatibility with parent:NA07034(G/G)
rs2822593 : problem with NA07048(G/G) incompatibility with parent:NA07034(A/A)
rs7276922 : problem with NA07048(T/T) incompatibility with parent:NA07034(C/C)
rs10439653 : problem with NA07048(A/A) incompatibility with parent:NA07034(C/C)
rs10439652 : problem with NA07048(C/C) incompatibility with parent:NA07034(T/T)
rs17001769 : problem with NA10856(A/A) incompatibility with parent:NA11829(C/C)
rs9977658 : problem with NA10860(C/C) incompatibility with parent:NA11992(T/T)
rs8133625 : problem with NA10856(G/G) incompatibility with parent:NA11830(A/A)
rs12627045 : problem with NA12740(A/A) incompatibility with parent:NA12750(G/G)
rs416083 : problem with NA10843(G/G) incompatibility with parent:NA11919(A/A)
rs2822484 : problem with NA06991(A/A) incompatibility with parent:NA06985(G/G)
rs9977169 : problem with NA12865(C/C) incompatibility with parent:NA12875(T/T)
rs379724 : problem with NA07019(G/G) incompatibility with parent:NA07056(A/A)
rs13046593 : problem with NA10860(C/C) incompatibility with parent:NA11992(G/G)
rs9984592 : problem with NA10854(G/G) incompatibility with parent:NA11840(A/A)
rs2187275 : problem with NA10831(C/C) incompatibility with parent:NA12156(T/T)
rs6516605 : problem with NA12708(G/G) incompatibility with parent:NA12718(C/C)
rs7283783 : problem with NA10830(G/G) incompatibility with parent:NA12154(A/A)
rs13051673 : problem with NA12739(G/G) incompatibility with parent:NA12748(A/A)
rs3115488 : problem with NA10860(A/A) incompatibility with parent:NA11993(G/G)
rs8132413 : problem with NA10860(T/T) incompatibility with parent:NA11993(A/A)
rs2207842 : problem with NA10838(A/A) incompatibility with parent:NA12004(G/G)
rs2821973 : problem with NA12832(C/C) incompatibility with parent:NA12843(T/T)
rs469471 : problem with NA07349(A/A) incompatibility with parent:NA07347(G/G)
rs8129674 : problem with NA10839(G/G) incompatibility with parent:NA12006(A/A)
rs2257224 : problem with NA10854(G/G) incompatibility with parent:NA11840(A/A)
rs865859 : problem with NA10855(C/C) incompatibility with parent:NA11832(T/T)
rs2742158 : problem with NA10855(C/C) incompatibility with parent:NA11832(T/T)
rs4111253 : problem with NA10836(C/C) incompatibility with parent:NA12275(T/T)
rs240444 : problem with NA10861(T/T) incompatibility with parent:NA11994(C/C)
rs469812 : problem with NA06991(C/C) incompatibility with parent:NA06985(G/G)
rs210534 : problem with NA06991(T/T) incompatibility with parent:NA06985(A/A)
rs2822670 : problem with NA10861(C/T) incompatibility with parents:NA11994(C/C) NA11995(C/C)
rs9305335 : problem with NA10831(C/T) incompatibility with parents:NA12155(C/C) NA12156(C/C)
rs9305297 : problem with NA12801(A/G) incompatibility with parents:NA12812(G/G) NA12813(G/G)
rs2822641 : problem with NA07349(A/C) incompatibility with parents:NA07347(C/C) NA07346(C/C)
rs9977057 : problem with NA12801(G/T) incompatibility with parents:NA12812(T/T) NA12813(T/T)
rs2822614 : problem with NA12877(C/T) incompatibility with parents:NA12889(C/C) NA12890(C/C)
rs2178907 : problem with NA07349(A/G) incompatibility with parents:NA07347(G/G) NA07346(G/G)
rs1124322 : problem with NA10837(A/G) incompatibility with parents:NA12272(G/G) NA12273(G/G)
rs2822537 : problem with NA12336(A/G) incompatibility with parents:NA12342(G/G) NA12343(G/G)
rs386524 : problem with NA12753(C/T) incompatibility with parents:NA12762(C/C) NA12763(C/C)
rs367249 : problem with NA12802(A/G) incompatibility with parents:NA12814(G/G) NA12815(G/G)
rs17001380 : problem with NA10856(G/T) incompatibility with parents:NA11829(G/G) NA11830(G/G)
rs2155965 : problem with NA12336(A/T) incompatibility with parents:NA12342(A/A) NA12343(A/A)
rs8133601 : problem with NA10854(C/T) incompatibility with parents:NA11839(C/C) NA11840(C/C)
rs8134986 : problem with NA07014(A/C) incompatibility with parents:NA07051(C/C) NA07031(C/C)
rs3094804 : problem with NA12865(C/T) incompatibility with parents:NA12874(T/T) NA12875(T/T)
rs1929150 : problem with NA06997(C/T) incompatibility with parents:NA06986(T/T) NA07045(T/T)
rs7276195 : problem with NA07019(A/G) incompatibility with parents:NA07022(A/A) NA07056(A/A)
rs3855691 : problem with NA12877(A/G) incompatibility with parents:NA12889(G/G) NA12890(G/G)
rs17468376 : problem with NA10835(A/G) incompatibility with parents:NA12248(A/A) NA12249(A/A)
rs2747618 : problem with NA10835(C/T) incompatibility with parents:NA12248(C/C) NA12249(C/C)
rs2943900 : problem with NA10854(C/T) incompatibility with parents:NA11839(T/T) NA11840(T/T)
rs4009972 : problem with NA10835(G/T) incompatibility with parents:NA12248(G/G) NA12249(G/G)
rs8133159 : problem with NA12767(A/G) incompatibility with parents:NA12777(A/A) NA12778(A/A)


Questions

So many questions: how should I model my data ? what if those data are alread present in database) ? how drools supports a large amount of data ? etc... etc...

That's it

Pierre

11 July 2010

PDFBox: insert/extract metadata from/into a PDF document


The apache project PDFBox contains is an API for handling some PDF documents. In the current post I'll show how I've used the PDFBox API to insert and extract some XMP metadata into/from a PDFDocument.

Extracting metadata from a PDF document

Reading the metadat is as simple as:
InputStream in=new FileInputStream(pdfFile);
PDFParser parser=new PDFParser(in);
parser.parse();
PDMetadata metadata = parser.getPDDocument().getDocumentCatalog().getMetadata();
if(metadata!=null)
{
System.out.println(metadata.getInputStreamAsString());
}

Inserting metadata into a PDF document

The metadata to be inserted are stored in a XML file.
<?xml version="1.0" encoding="UTF-8"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="">
<dc:creator rdf:resource="mailto:plindenbaum@yahoo.fr"/>
<dc:title>Hello World</dc:title>
<dc:date>2010-07-11</dc:date>
</rdf:Description>
<foaf:Person rdf:about="mailto:plindenbaum@yahoo.fr">
<foaf:name>Pierre Lindenbaum</foaf:name>
<foaf:depiction rdf:resource="http://a3.twimg.com/profile_images/51679789/photoIG_bigger.jpg"/>
</foaf:Person>
</rdf:RDF>
</x:xmpmeta>

This XML file is loaded as a DOM object in memory:
DocumentBuilderFactory f= DocumentBuilderFactory.newInstance();
f.setExpandEntityReferences(true);
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
f.setValidating(false);
f.setCoalescing(true);
f.setNamespaceAware(true);
DocumentBuilder builder=f.newDocumentBuilder();
xmpDoc= builder.parse(xmpIn);
The pdf source is opened and the DOM document is inserted as a metadata. The pdf is then saved:
InputStream in=new FileInputStream(pdfIn);
PDFParser parser=new PDFParser(in);
parser.parse();
document= parser.getPDDocument();
PDDocumentCatalog cat = document.getDocumentCatalog();
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(new XMPMetadata(xmpDoc));
cat.setMetadata(metadata);
document.save(pdfOut);

Source code: ExtractXMP.java

import java.io.*;
import org.apache.pdfbox.pdfparser.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.common.*;
import org.apache.jempbox.xmp.XMPMetadata;


public class ExtractXMP
{
static private void extract(InputStream in)
throws Exception
{
PDDocument document=null;
try
{
PDFParser parser=new PDFParser(in);
parser.parse();
document= parser.getPDDocument();
if(document.isEncrypted())
{
System.err.println("Document is Encrypted!");
}
PDDocumentCatalog cat = document.getDocumentCatalog();
PDMetadata metadata = cat.getMetadata();
if(metadata!=null)
{
System.out.println(metadata.getInputStreamAsString());
}
}
catch(Exception err)
{
throw err;
}
finally
{
if(document!=null) try { document.close();} catch(Throwable err2) {}
}
}

static public void main(String args[])
{
try
{
int optind=0;
while(optind<args.length)
{
if(args[optind].equals("-h"))
{
System.err.println("Pierre Lindenbaum PhD. 2010");
System.err.println("-h this screen");
System.err.println("pdf1 pdf2 pdf3 ....");
return;
}
else if (args[optind].equals("--"))
{
++optind;
break;
}
else if (args[optind].startsWith("-"))
{
System.err.println("bad argument " + args[optind]);
System.exit(-1);
}
else
{
break;
}
++optind;
}
if(optind==args.length)
{
extract(System.in);
}
else
{
while(optind< args.length)
{
String filename=args[optind++];
InputStream in=new FileInputStream(filename);
extract(in);
in.close();
}
}


}
catch(Throwable err)
{
err.printStackTrace();
}
}
}

Source code: InsertXMP.java

import java.io.*;
import org.apache.pdfbox.pdfparser.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.common.*;
import org.apache.jempbox.xmp.XMPMetadata;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;


public class InsertXMP
{



static public void main(String args[])
{
PDDocument document=null;
InputStream in=null;
try
{
String xmpIn=null;
String pdfIn=null;
String pdfOut=null;
Document xmpDoc=null;
int optind=0;
while(optind<args.length)
{
if(args[optind].equals("-h"))
{
System.err.println("Pierre Lindenbaum PhD. 2010");
System.err.println("-h this screen");
System.err.println("-pdfin|-i <pdf-in>");
System.err.println("-xmpin|-x <xmp-in>");
System.err.println("-pdfout|-o <pdf-out>");
return;
}
else if(args[optind].equals("-xmpin") || args[optind].equals("-x"))
{
xmpIn= args[++optind];
}
else if(args[optind].equals("-pdfin") || args[optind].equals("-i"))
{
pdfIn= args[++optind];
}
else if(args[optind].equals("-pdfout") || args[optind].equals("-o"))
{
pdfOut= args[++optind];
}
else if (args[optind].equals("--"))
{
++optind;
break;
}
else if (args[optind].startsWith("-"))
{
System.err.println("bad argument " + args[optind]);
System.exit(-1);
}
else
{
break;
}
++optind;
}
if(optind!=args.length)
{
System.err.println("Illegal number of arguments");
return;
}
if(pdfIn==null)
{
System.err.println("pdf-in missing");
return;
}
if(pdfOut==null)
{
System.err.println("pdf-out missing");
return;
}
if(pdfIn.equals(pdfOut))
{
System.err.println("pdf-out is same as pdf-in");
return;
}
if(xmpIn==null)
{
System.err.println("XMP missing");
return;
}
else
{
DocumentBuilderFactory f= DocumentBuilderFactory.newInstance();
f.setExpandEntityReferences(true);
f.setIgnoringComments(true);
f.setIgnoringElementContentWhitespace(true);
f.setValidating(false);
f.setCoalescing(true);
f.setNamespaceAware(true);
DocumentBuilder builder=f.newDocumentBuilder();
xmpDoc= builder.parse(xmpIn);
}

in=new FileInputStream(pdfIn);
PDFParser parser=new PDFParser(in);
parser.parse();
document= parser.getPDDocument();
if(document.isEncrypted())
{
System.err.println("Warning ! Document is Encrypted!");
}
PDDocumentCatalog cat = document.getDocumentCatalog();
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(new XMPMetadata(xmpDoc));
cat.setMetadata(metadata);
document.save(pdfOut);
}
catch(Throwable err)
{
err.printStackTrace();
}
finally
{
if(document!=null) try { document.close();} catch(Throwable err2) {}
if(in!=null) try { in.close();} catch(Throwable err2) {}
}
}
}

Example

The following Makefile downloads a pdf file, compiles both program , inserts and extracts the metadata:
CLASSPATH=pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:.
test: InsertXMP.class ExtractXMP.class article.pdf
echo "Metadata in article"
java -cp ${CLASSPATH} ExtractXMP article.pdf
echo "Insert Metadata in article"
java -cp ${CLASSPATH} InsertXMP -i article.pdf -o article_meta.pdf -x metadata.xmp
echo "Metadata in new article"
java -cp ${CLASSPATH} ExtractXMP article_meta.pdf
InsertXMP.class:InsertXMP.java
javac -cp ${CLASSPATH} InsertXMP.java
ExtractXMP.class:ExtractXMP.java
javac -cp ${CLASSPATH} ExtractXMP.java

article.pdf:
wget -O $@ "http://www.biomedcentral.com/content/pdf/1471-2156-10-16.pdf"

Output

javac -cp pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:. InsertXMP.java
javac -cp pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:. ExtractXMP.java

wget -O article.pdf "http://www.biomedcentral.com/content/pdf/1471-2156-10-16.pdf"
--2010-07-11 13:15:10-- http://www.biomedcentral.com/content/pdf/1471-2156-10-16.pdf
Resolving www.biomedcentral.com... 213.219.33.18
Connecting to www.biomedcentral.com|213.219.33.18|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1295524 (1.2M) [application/pdf]
Saving to: `article.pdf'

100%[================================================================================>] 1,295,524 123K/s in 11s

2010-07-11 13:15:21 (113 KB/s) - `article.pdf' saved [1295524/1295524]

Metadata in article
java -cp pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:. ExtractXMP article.pdf

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="3.1-701">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Acrobat Distiller 7.0 (Windows)</pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xap="http://ns.adobe.com/xap/1.0/">
<xap:CreateDate>2009-05-05T19:28:38Z</xap:CreateDate>
<xap:CreatorTool>FrameMaker 7.1</xap:CreatorTool>
<xap:ModifyDate>2009-05-06T02:18:59+05:30</xap:ModifyDate>
<xap:MetadataDate>2009-05-06T02:18:59+05:30</xap:MetadataDate>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">1471-2156-10-16.fm</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>Ezhilan</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description rdf:about=""
xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/">
<xapMM:DocumentID>uuid:d1d0f8d9-8321-4e4b-828b-d31b75daba0f</xapMM:DocumentID>
<xapMM:InstanceID>uuid:39d7db98-d873-4b33-be85-87319547e81c</xapMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>

<?xpacket end="w"?>



Insert Metadata in article


java -cp pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:. InsertXMP -i article.pdf -o article_meta.pdf -x meta.xmp

Metadata in new article


java -cp pdfbox-app-1.2.1.jar:pdfbox-app-1.2.1.jar:. ExtractXMP article_meta.pdf
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="">
<dc:creator rdf:resource="mailto:plindenbaum@yahoo.fr"/>
<dc:title>Hello World</dc:title>
<dc:date>2010-07-11</dc:date>
</rdf:Description>
<foaf:Person rdf:about="mailto:plindenbaum@yahoo.fr">
<foaf:name>Pierre Lindenbaum</foaf:name>
<foaf:depiction rdf:resource="http://profile.ak.facebook.com/profile5/1306/97/s501154465_2583.jpg"/>
</foaf:Person>
</rdf:RDF>
</x:xmpmeta>


That's it

Pierre

10 July 2010

ANSI escape sequences: colorize this DNA !

From Wikipedia: ANSI escape sequences are used to control text formatting, color, and other output options on text terminals. So you can try to type:

echo -e '\x1b[31mRED!\x1b[32mGREEN!\x1b[33mYELLOW!\x1b[0m'
in your favorite terminal, and you should get:
RED!GREEN!YELLOW!

It is then simple to create a simple 'C' program colorizing the content of a FASTA sequence.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <ctype.h>
#include <unistd.h>
/*
from http://stackoverflow.com/questions/3219393/
http://en.wikipedia.org/wiki/ANSI_escape_code#Colors
*/
#define ANSI_COLOR_RED "\x1b[31m"
#define ANSI_COLOR_GREEN "\x1b[32m"
#define ANSI_COLOR_YELLOW "\x1b[33m"
#define ANSI_COLOR_BLUE "\x1b[34m"
#define ANSI_COLOR_MAGENTA "\x1b[35m"
#define ANSI_COLOR_CYAN "\x1b[36m"
#define ANSI_COLOR_BLACK "\x1b[0m"
#define ANSI_COLOR_RESET ANSI_COLOR_BLACK

static void fasta2term(FILE *in)
{
int prev_c=-1;
int c;

//not the terminal ? just 'cat'
if(!isatty(fileno(stdout)))
{
char buff[BUFSIZ];
int nRead;
while((nRead=fread(buff,1,BUFSIZ,in))>0)
{
fwrite(buff,1,nRead,stdout);
}
return;
}

while((c=fgetc(in))!=EOF)
{
if(c=='>')
{

if(prev_c!=-1) fputs(ANSI_COLOR_RESET,stdout);
fputc('>',stdout);
while((c=fgetc(in))!=EOF)
{
fputc(c,stdout);
if(c=='\n') break;
}
prev_c=-1;
continue;
}
else if(c==prev_c || isspace(c))
{
fputc(c,stdout);
continue;
}

switch(tolower(c))
{
case 'a': fputs(ANSI_COLOR_GREEN,stdout); break;
case 't': fputs(ANSI_COLOR_RED,stdout); break;
case 'g': fputs(ANSI_COLOR_BLACK,stdout); break;
case 'c': fputs(ANSI_COLOR_BLUE,stdout); break;
default: fputs(ANSI_COLOR_YELLOW,stdout); break;
}
fputc(c,stdout);
prev_c=c;
}
if(prev_c!=-1) fputs(ANSI_COLOR_RESET,stdout);
}

int main(int argc,char** argv)
{
int optind=1;
while(optind < argc)
{
if(strcmp(argv[optind],"-h")==0)
{
fprintf(stderr,"%s: Pierre Lindenbaum PHD. 2010.\n",argv[0]);
fprintf(stderr,"Compilation: %s at %s.\n",__DATE__,__TIME__);
exit(EXIT_FAILURE);
}
else if(strcmp(argv[optind],"--")==0)
{
++optind;
break;
}
else if(argv[optind][0]=='-')
{
fprintf(stderr,"unknown option '%s'\n",argv[optind]);
exit(EXIT_FAILURE);
}
else
{
break;
}
++optind;
}
if(optind==argc)
{
fasta2term(stdin);
}
else
{
while(optind< argc)
{
FILE* in;
errno=0;
in=fopen(argv[optind],"r");
if(in==NULL)
{
fprintf(stderr,"Cannot open %s : %s\n",argv[optind],strerror(errno));
return EXIT_FAILURE;
}
fasta2term(in);
fclose(in);
++optind;
}
}
return EXIT_SUCCESS;
}

Testing:
#compiling
gcc -o fasta2term -O3 -Wall fasta2term.c

#don't colorize when the output is piped into another program (=not a terminal)
./fasta2term ~/file.fasta | head -n 5
>gi|27592135
GGAAGGGCTGCCCCACCATTCATCCTTTTCTCGTAGTTTGTGCACGGTGCGGGAGGTTGTCTGAGTGACT
TCACGGGTCGCCTTTGTGCAGTACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCC
TGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCAGATGCCCACTGAC
TATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGC

#do colorize if the output is a terminal
head -n5 ~/file.fasta | ./fasta2term
>gi|27592135
GGAAGGGCTGCCCCACCATTCATCCTTTTCTCGTAGTTTGTGCACGGTGCGGGAGGTTGTCTGAGTGACT
TCACGGGTCGCCTTTGTGCAGTACTAGATATGCAGCAGACCTATGACATGTGGCTAAAGAAACACAATCC
TGGGAAGCCTGGAGAGGGAACACCACTCACTTCGCGAGAAGGGGAGAAACAGATCCAGATGCCCACTGAC
TATGCTGACATCATGATGGGCTACCACTGCTGGCTCTGCGGGAAGAACAGCAACAGCAAGAAGCAATGGC


That's it

Pierre

08 July 2010

Pairwise Alignments in C++ (for fun). My notebook.

With Next Generation Sequencing technologies, I have more and more opportunities to program in C++. C/C++ was my first programming language and I now enjoy my past experience with java for designing some classes, using the design patterns, etc... . Thus, I've recently played with the pairwise alignments in C++ and, in this post I'll describe the classes I've used (hey, I wrote this for fun, I didn't intend to write a formal and optimized code about this subject (I'm not a specialist of this anyway). Furthermore, I will not describe what is Dynamic Programming).

The Classes


Sequence

Sequence is an abstract sequence of characters (= string). It has a 'size' and should return the index-th character.
class Sequence
{
public:
Sequence() { }
virtual ~Sequence() { }
virtual size_type size() const=0;
virtual char at(size_type index) const=0;
}*SequencePtr;

CCSequence

CCSequence is a concrete implementation of Sequence using a std::string as the string holder.
class CCSequence:public Sequence
{
private:
std::string str;
public:
CCSequence(const std::string& str):Sequence(),str(str) { }
virtual ~CCSequence() { }
virtual size_type size() const
{
return (size_type)str.size();
}
virtual char at(size_type index) const
{
return str[index];
}
};

PtrSequence

PtrSequence is another implementation of Sequence but here, it uses a C pointer char* as the string holder.
class PtrSequence:public Sequence
{
private:
const char* str;
std::size_t length;
public:
PtrSequence(const char* str,size_t length):Sequence(),str(str),length(length) { }
PtrSequence(const char* str):str(str),length(0UL) { length=std::strlen(str); }
virtual ~PtrSequence() {}
virtual size_type size() const
{
return (size_type)this->length;
}
virtual char at(size_type index) const
{
return str[index];
}
};

PathItem

This class is a component of the pairwise alignment. It stores the positions on both sequences. A position equals to '-1' would be a gap.
typedef struct PathItem
{
Sequence::size_type x;
Sequence::size_type y;
PathItem(Sequence::size_type x,Sequence::size_type y):x(x),y(y)
{
}
Sequence::size_type get(int side) const { return (side==0?x:y);}
}*PathItemPtr;

Path

the class Path is a solution of the matrix traversal by the dynamic algorithm. It is a vector of PathItem and it also implements Sequence as the consensus of the alignment:
typedef class Path: public Sequence
{
private:
const SequencePtr seqX;
const SequencePtr seqY;
std::vector<PathItem> path;

std::auto_ptr<Sequence> _asSeq(int side) const
{
const SequencePtr seq=(side==0?seqX:seqY);
std::vector<PathItem>::const_iterator r=path.begin();
std::vector<PathItem>::const_iterator r_end=path.end();
std::ostringstream os;
while(r!=r_end)
{
Sequence::size_type n=r->get(side);
os << (n==-1?'-':seq->at(n));
++r;
}
return std::auto_ptr<Sequence>(new CCSequence(os.str()));
}
public:
Path(const SequencePtr seqX,const SequencePtr seqY,std::vector<PathItem> path):Sequence(),seqX(seqX),seqY(seqY),path(path)
{
}

virtual ~Path()
{
}

const SequencePtr X() const
{
return seqX;
}
const SequencePtr Y() const
{
return seqY;
}
std::auto_ptr<Sequence> consensusX() const
{
return _asSeq(0);
}
std::auto_ptr<Sequence> consensusY() const
{
return _asSeq(1);
}
std::auto_ptr<std::string> mid() const
{
std::vector<PathItem>::const_iterator r=path.begin();
std::vector<PathItem>::const_iterator r_end=path.end();
std::ostringstream os;
while(r!=r_end)
{
if(r->x==-1 || r->y==-1)
{
os << ' ';
}
else if(std::toupper(X()->at(r->x))==std::toupper(Y()->at(r->y)))
{
os << '|';
}
else
{
os << ' ';
}
++r;
}
return std::auto_ptr<std::string>(new std::string(os.str()));
}
virtual size_type size() const
{
return path.size();
}
virtual char at(size_type index) const
{
const PathItem& i=path.at(index);
if(i.x==-1 && i.y==-1) return '-';
if(i.x==-1) return Y()->at(i.y);
if(i.y==-1) return X()->at(i.x);
char cx= X()->at(i.x);
char cy= Y()->at(i.y);
return cx==cy?cx:'X';
}

std::ostream& print(std::ostream& out) const
{
out << *(consensusX()) << "\n"
<< *(mid()) << "\n"
<< *(consensusY()) << "\n"
;
return out;
}
}*PathPtr;

SubstitutionMatrix

SubstitutionMatrix is an abstract class. Its function compare returns the cost of the substitution of the symbol 'c1' by the symbol 'c2':
template<typename SCORE>
class SubstitutionMatrix
{
public:
virtual SCORE compare(char c1,char c2) const=0;
};

Identity

Identity is a simple implementation of SubstitutionMatrix. It returns 1 if the symbols are identical, else it returns -1.
template<typename SCORE>
class Identity:public SubstitutionMatrix<SCORE>
{
public:
virtual SCORE compare(char c1,char c2) const
{
return (SCORE)(std::toupper(c1)==std::toupper(c2)?1:-1);
}
};

Blosum

Blosum is an abstract implementation of SubstitutionMatrix for a BLOSUM matrix:
/** cf. ftp://ftp.ncbi.nih.gov/blast/matrices */
template<typename SCORE>
class Blosum:public SubstitutionMatrix<SCORE>
{
protected:
virtual int aa2index(char c) const
{
switch(std::toupper(c))
{
case 'A': return 0;
(...)
}
}
public:
virtual SCORE compare(char c1,char c2) const=0;
};

Blosum62

It is a concrete implementation of Blosum
static const signed char _blosum62[]={
4,-1,-2,-2,0,-1,-1,0,-2,-1,-1,-1,-1,-2,-1,1,0,-3,-2,0,-2,-1,0,-4,
(...)
};
/** ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62 */
template<typename SCORE>
class Blosum62:public Blosum<SCORE>
{
public:
virtual SCORE compare(char c1,char c2) const
{
return (SCORE)::_blosum62[this->aa2index(c1)*24+ this->aa2index(c2)];
}
};

Array2D

This abstract class holds the matrix of type 'T' for the dynamic algorithm. For a given matrix we need to know its width, its height , getthe value at position(x,y) and we want to set the value at position(x,y).
emplate<typename T>
class Array2D
{
public:
Array2D() {}
virtual ~Array2D() {}
virtual size_type width() const=0;
virtual size_type height() const=0;
virtual T get(size_type x,size_type y) const=0;
virtual void set(size_type x,size_type y,T val)=0;
protected:
std::size_t offset(size_type x,size_type y) const
{
return (this->width())*y + x;
}
};

DefaultArray2D

DefaultArray2D is a first concrete implementation of Array2D. It holds everything in memory:
template<typename T>
class DefaultArray2D:public Array2D<T>
{
protected:
size_type _width;
size_type _height;
T* matrix;

public:
DefaultArray2D( size_type width, size_type height):_width(width),_height(height)
{
matrix=new T[width*height];
}

virtual ~DefaultArray2D()
{
delete [] matrix;
}

virtual size_type width() const
{
return this->_width;
}

virtual size_type height() const
{
return this->_height;
}

virtual T get(size_type x,size_type y) const
{
return matrix[this->offset(x,y)];
}
virtual void set(size_type x,size_type y,T val)
{
matrix[this->offset(x,y)]=val;
}
};

StoredArray2D

StoredArray2D is second lousy concrete implementation of Array2D. It stores the matrix in a temporary file:
template<typename T>
class StoredArray2D:public Array2D<T>
{
private:
size_type _width;
size_type _height;
/** FILE pointer to the binary file */
mutable FILE* io_ptr;
void move(size_type x, size_type y) const
{
if(std::fseek (io_ptr,this->offset(x,y)*sizeof(T), SEEK_SET)!=0)
{
throw std::runtime_error("Cannot fseek");
}
}
public:
StoredArray2D( size_type width, size_type height):_width(width),_height(height),io_ptr(NULL)
{
io_ptr= std::tmpfile();
if(io_ptr==NULL) throw std::runtime_error("Cannot open tmp file");
T data;
std::memset(&data,0,sizeof(T));
for(std::size_t i=0;i< (size_t)(width*height);++i)
{
if(std::fwrite((void*)&data,sizeof(T),1,io_ptr)!=1)
{
std::fclose(io_ptr);
io_ptr=NULL;
throw std::runtime_error("write matrix");
}
}
}

virtual ~StoredArray2D()
{
if(io_ptr!=NULL) std::fclose(io_ptr);
}

virtual size_type width() const
{
return this->_width;
}

virtual size_type height() const
{
return this->_height;
}

virtual T get(size_type x,size_type y) const
{
T data;
move(x,y);
if(std::fread((void*)&data,sizeof(T),1,io_ptr)!=1)
{
throw std::runtime_error("cannot read");
}
return data;
}
virtual void set(size_type x,size_type y,T data)
{
move(x,y);
if(std::fwrite((void*)&data,sizeof(T),1,io_ptr)!=1)
{
throw std::runtime_error("cannot write");
}
}
};

Array2DFactory

Array2DFactory is a factory for an Array2D. It will allow our program to smoothly switch between any kind of Array2D:
template<typename T>
class Array2DFactory
{
private:
bool stored;
public:
//typedef typename Array2D<T>::size_type size_type;
Array2DFactory():stored(false) {}
virtual ~Array2DFactory() {}

void setStored(bool stored)
{
this->stored=stored;
}
bool isStored() const
{
return this->stored;
}

/** creates a new Array2D */
Array2D<T>* newArray(
size_type width,
size_type height
) const
{
if(isStored())
{
return new StoredArray2D<T>(width,height);
}
return new DefaultArray2D<T>(width,height);
}
};

Penaly

Penalty is an abstract class that will used to fill the edge of the matrix. It returns the score for inserting a symbol in a given sequence at a given position!
template<typename SCORE>
class Penalty
{
public:
Penalty() {}
virtual ~Penalty() {}
virtual SCORE get(const SequencePtr seq,int position) const=0;
};

DefaultPenalty

DefaultPenalty is a concrete implementation of Penalty. It uses a constant value for any position of the sequence.
template<typename SCORE>
class DefaultPenalty:public Penalty<SCORE>
{
private:
SCORE value;
public:
DefaultPenalty(SCORE value):value(value) {}
virtual ~DefaultPenalty() {}
virtual SCORE get(const SequencePtr seq,int position) const
{
return value;
}
};

Aligner

Aligner is an abstract pairwise aligner. It contains two Sequences, an Array2DFactory, a SubstitutionMatrix, etc... Its function align aligns the two sequences and path return the Path for the two aligned sequences:
template<typename T,typename SCORE>
class Aligner
{
private:
/** horizontal sequence */
SequencePtr seqX;
/** vertical sequence */
SequencePtr seqY;
/** array2D factory */
Array2DFactory<T>* array2dfactory;
/** substitution matrix */
SubstitutionMatrix<SCORE>* subsitutionMatrix;
/** penalty X */
Penalty<SCORE>* penaltyX;
/** penalty Y */
Penalty<SCORE>* penaltyY;
protected:
Aligner():seqX(NULL),seqY(NULL),
array2dfactory(NULL),
subsitutionMatrix(NULL)
{
}
public:
(...)
virtual SCORE align()=0;
virtual std::auto_ptr<Path> path()=0;
};

Needleman

Needleman is an implementation of Aligner for the Needleman & Wunsch algorithm.
template<typename T,typename SCORE>
class Needleman:public Aligner<T,SCORE>
{
private:
/** current matrix */
Array2D<T>* matrix;
public:
Needleman( ):matrix(NULL)
{

}

virtual ~Needleman()
{
}

/** clear the internal data if it exists */
virtual void clear()
{
Aligner<T,SCORE>::clear();
if(matrix!=NULL) delete matrix;
matrix=NULL;
}

virtual SCORE align()
{
/* clear the previous Array2D */
clear();
/* ask the factory for a new Array2D */
this->matrix = this->getArray2DFactory()->newArray(this->X()->size()+1, this->Y()->size()+1);

for(Sequence::size_type x=0;x<= this->X()->size();++x)
{
this->matrix->set(x,0,x*this->getPenaltyX()->get(this->X(),x) );
}
for(Sequence::size_type y=0;y<= this->Y()->size();++y)
{
this->matrix->set(0,y,y*this->getPenaltyY()->get(this->Y(),y));
}
for(Sequence::size_type x=1;x<= this->X()->size();++x)
{
for(Sequence::size_type y=1;y<= this->Y()->size();++y)
{
SCORE diag = matrix->get(x-1,y-1)+
this->compare(x-1,y-1);
SCORE delet= matrix->get(x-1,y) + this->getPenaltyX()->get(this->X(),x);
SCORE inser= matrix->get(x,y-1) + this->getPenaltyY()->get(this->Y(),y);
matrix->set(x,y,std::max(diag,std::max(delet,inser)));
}
}
return matrix->get(this->X()->size(),this->Y()->size());
}

public:

virtual std::auto_ptr<Path> path()
{
std::vector<PathItem> items;
Sequence::size_type x= this->X()->size();
Sequence::size_type y= this->Y()->size();
while (x>0 && y>0)
{
SCORE diag= matrix->get(x-1,y-1);
SCORE up = matrix->get(x, y- 1);
SCORE left = matrix->get(x-1,y);

if (diag >= up && diag >= left)
{
items.push_back(PathItem(x-1,y-1));
--x;
--y;
}
else if (left> up)
{
items.push_back(PathItem(x-1,-1));
--x;
}
else
{
items.push_back(PathItem(-1,y-1));
--y;
}
}

while (x > 0)
{
items.push_back(PathItem(x-1,-1));
--x;
}
while (y > 0)
{
items.push_back(PathItem(-1,y-1));
--y;
}
std::reverse(items.begin(),items.end());
return std::auto_ptr<Path>(new Path(this->X(),this->Y(),items));
}


};

SWaterman

SWaterman is an implementation of Aligner for the Smith & Waterman.
template<typename T,typename SCORE>
class SWaterman:public Aligner<T,SCORE>
{
private:
/** current matrix */
Array2D<T>* matrix;
SCORE best_score;
Sequence::size_type best_x;
Sequence::size_type best_y;
public:
SWaterman( ):matrix(NULL),best_score(0),best_x(0),best_y(0)
{

}

virtual ~SWaterman()
{
}

/** clear the internal data if it exists */
virtual void clear()
{
Aligner<T,SCORE>::clear();
if(matrix!=NULL) delete matrix;
matrix=NULL;
best_x=0;
best_y=0;
best_score=0;
}

virtual SCORE align()
{
/* clear the previous Array2D */
clear();
/* ask the factory for a new Array2D */
this->matrix = this->getArray2DFactory()->newArray(this->X()->size()+1, this->Y()->size()+1);

for(Sequence::size_type x=0;x<= this->X()->size();++x)
{
this->matrix->set(x,0,0);
}
for(Sequence::size_type y=0;y<= this->Y()->size();++y)
{
this->matrix->set(0,y,0);
}
for(Sequence::size_type x=1;x<= this->X()->size();++x)
{
for(Sequence::size_type y=1;y<= this->Y()->size();++y)
{
SCORE diag = matrix->get(x-1,y-1)+
this->compare(x-1,y-1);
SCORE delet= matrix->get(x-1,y) + this->getPenaltyX()->get(this->X(),x);
SCORE inser= matrix->get(x,y-1) + this->getPenaltyY()->get(this->Y(),y);

SCORE here= std::max((SCORE)0,std::max(diag,std::max(delet,inser)));
matrix->set(x,y,here);
if(best_score<here)
{
best_score=here;
best_x=x;
best_y=y;
}
}
}
return matrix->get(this->X()->size(),this->Y()->size());
}

public:

virtual std::auto_ptr<Path> path()
{
std::vector<PathItem> items;
Sequence::size_type x= best_x;
Sequence::size_type y= best_y;
while (x>0 && y>0 && matrix->get(x,y)!=0)
{
SCORE diag= matrix->get(x-1,y-1);
SCORE up = matrix->get(x, y- 1);
SCORE left = matrix->get(x-1,y);

if (diag >= up && diag >= left)
{
items.push_back(PathItem(x-1,y-1));
--x;
--y;
}
else if (left> up)
{
items.push_back(PathItem(x-1,-1));
--x;
}
else
{
items.push_back(PathItem(-1,y-1));
--y;
}
}

std::reverse(items.begin(),items.end());
return std::auto_ptr<Path>(new Path(this->X(),this->Y(),items));
}
};

AlignerFactory

AlignerFactory is a factory for an Aligner. It will allow our program to smoothly switch between any kind of Aligner:
template<typename T,typename SCORE>
class AlignerFactory
{
private:
bool local;
public:
AlignerFactory():local(false)
{
}

void setLocal(bool local) { this->local=local;}
bool isLocal() const { return local;}
std::auto_ptr<Aligner<T,SCORE> > newAligner()
{
if(isLocal())
{
return std::auto_ptr<Aligner<T,SCORE> >(new SWaterman<T,SCORE>());
}
return std::auto_ptr<Aligner<T,SCORE> >(new Needleman<T,SCORE>());
}
};

All in one, the full source code

#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <cctype>
#include <algorithm>
#include <cassert>
#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <vector>
#include <memory>
#include <sstream>

typedef signed long size_type;


typedef class Sequence
{
public:
typedef signed long size_type;
Sequence()
{
}
virtual ~Sequence()
{
}

virtual size_type size() const=0;
virtual char at(size_type index) const=0;
char operator[](size_type index) const
{
return this->at(index);
}
virtual std::ostream& print(std::ostream& out) const
{
size_type L=size();
for(size_type i=0;i< L;++i) out << at(i);
return out;
}
}*SequencePtr;

std::ostream& operator << (std::ostream& out, const Sequence& seq)
{
return seq.print(out);
}

class CCSequence:public Sequence
{
private:
std::string str;
public:
CCSequence(const std::string& str):Sequence(),str(str)
{
}
virtual ~CCSequence()
{
}
virtual size_type size() const
{
return (size_type)str.size();
}
virtual char at(size_type index) const
{
return str[index];
}
};

class PtrSequence:public Sequence
{
private:
const char* str;
std::size_t length;
public:
PtrSequence(const char* str,size_t length):Sequence(),str(str),length(length)
{
assert(str!=NULL);
assert(length>=0);
}
PtrSequence(const char* str):str(str),length(0UL)
{
assert(str!=NULL);
length=std::strlen(str);
}
virtual ~PtrSequence()
{
}
virtual size_type size() const
{
return (size_type)this->length;
}
virtual char at(size_type index) const
{
assert(index>=0);
assert(index<size());
return str[index];
}
};

typedef struct PathItem
{
Sequence::size_type x;
Sequence::size_type y;
PathItem(Sequence::size_type x,Sequence::size_type y):x(x),y(y)
{
}
Sequence::size_type get(int side) const { return (side==0?x:y);}
}*PathItemPtr;

typedef class Path: public Sequence
{
private:
const SequencePtr seqX;
const SequencePtr seqY;
std::vector<PathItem> path;

std::auto_ptr<Sequence> _asSeq(int side) const
{
const SequencePtr seq=(side==0?seqX:seqY);
std::vector<PathItem>::const_iterator r=path.begin();
std::vector<PathItem>::const_iterator r_end=path.end();
std::ostringstream os;
while(r!=r_end)
{
Sequence::size_type n=r->get(side);
os << (n==-1?'-':seq->at(n));
++r;
}
return std::auto_ptr<Sequence>(new CCSequence(os.str()));
}
public:
Path(const SequencePtr seqX,const SequencePtr seqY,std::vector<PathItem> path):Sequence(),seqX(seqX),seqY(seqY),path(path)
{
}

virtual ~Path()
{
}

const SequencePtr X() const
{
return seqX;
}
const SequencePtr Y() const
{
return seqY;
}
std::auto_ptr<Sequence> consensusX() const
{
return _asSeq(0);
}
std::auto_ptr<Sequence> consensusY() const
{
return _asSeq(1);
}
std::auto_ptr<std::string> mid() const
{
std::vector<PathItem>::const_iterator r=path.begin();
std::vector<PathItem>::const_iterator r_end=path.end();
std::ostringstream os;
while(r!=r_end)
{
if(r->x==-1 || r->y==-1)
{
os << ' ';
}
else if(std::toupper(X()->at(r->x))==std::toupper(Y()->at(r->y)))
{
os << '|';
}
else
{
os << ' ';
}
++r;
}
return std::auto_ptr<std::string>(new std::string(os.str()));
}
virtual size_type size() const
{
return path.size();
}
virtual char at(size_type index) const
{
const PathItem& i=path.at(index);
if(i.x==-1 && i.y==-1) return '-';
if(i.x==-1) return Y()->at(i.y);
if(i.y==-1) return X()->at(i.x);
char cx= X()->at(i.x);
char cy= Y()->at(i.y);
return cx==cy?cx:'X';
}

std::ostream& print(std::ostream& out) const
{
out << *(consensusX()) << "\n"
<< *(mid()) << "\n"
<< *(consensusY()) << "\n"
;
return out;
}
}*PathPtr;



std::ostream& operator << (std::ostream& out, const Path& path)
{
return path.print(out);
}


template<typename SCORE>
class SubstitutionMatrix
{
public:
virtual SCORE compare(char c1,char c2) const=0;
};



template<typename SCORE>
class Identity:public SubstitutionMatrix<SCORE>
{
public:
virtual SCORE compare(char c1,char c2) const
{
return (SCORE)(std::toupper(c1)==std::toupper(c2)?1:-1);
}
};

/** cf. ftp://ftp.ncbi.nih.gov/blast/matrices */
template<typename SCORE>
class Blosum:public SubstitutionMatrix<SCORE>
{
protected:
virtual int aa2index(char c) const
{
switch(std::toupper(c))
{
case 'A': return 0;
case 'R': return 1;
case 'N': return 2;
case 'D': return 3;
case 'C': return 4;
case 'Q': return 5;
case 'E': return 6;
case 'G': return 7;
case 'H': return 8;
case 'I': return 9;
case 'L': return 10;
case 'K': return 11;
case 'M': return 12;
case 'F': return 13;
case 'P': return 14;
case 'S': return 15;
case 'T': return 16;
case 'W': return 17;
case 'Y': return 18;
case 'V': return 19;
case 'B': return 20;
case 'Z': return 21;
case 'X': return 22;
default: return 23;
}
}
public:
virtual SCORE compare(char c1,char c2) const=0;
};


static const signed char _blosum62[]={
4,-1,-2,-2,0,-1,-1,0,-2,-1,-1,-1,-1,-2,-1,1,0,-3,-2,0,-2,-1,0,-4,
-1,5,0,-2,-3,1,0,-2,0,-3,-2,2,-1,-3,-2,-1,-1,-3,-2,-3,-1,0,-1,-4,
-2,0,6,1,-3,0,0,0,1,-3,-3,0,-2,-3,-2,1,0,-4,-2,-3,3,0,-1,-4,
-2,-2,1,6,-3,0,2,-1,-1,-3,-4,-1,-3,-3,-1,0,-1,-4,-3,-3,4,1,-1,-4,
0,-3,-3,-3,9,-3,-4,-3,-3,-1,-1,-3,-1,-2,-3,-1,-1,-2,-2,-1,-3,-3,-2,-4,
-1,1,0,0,-3,5,2,-2,0,-3,-2,1,0,-3,-1,0,-1,-2,-1,-2,0,3,-1,-4,
-1,0,0,2,-4,2,5,-2,0,-3,-3,1,-2,-3,-1,0,-1,-3,-2,-2,1,4,-1,-4,
0,-2,0,-1,-3,-2,-2,6,-2,-4,-4,-2,-3,-3,-2,0,-2,-2,-3,-3,-1,-2,-1,-4,
-2,0,1,-1,-3,0,0,-2,8,-3,-3,-1,-2,-1,-2,-1,-2,-2,2,-3,0,0,-1,-4,
-1,-3,-3,-3,-1,-3,-3,-4,-3,4,2,-3,1,0,-3,-2,-1,-3,-1,3,-3,-3,-1,-4,
-1,-2,-3,-4,-1,-2,-3,-4,-3,2,4,-2,2,0,-3,-2,-1,-2,-1,1,-4,-3,-1,-4,
-1,2,0,-1,-3,1,1,-2,-1,-3,-2,5,-1,-3,-1,0,-1,-3,-2,-2,0,1,-1,-4,
-1,-1,-2,-3,-1,0,-2,-3,-2,1,2,-1,5,0,-2,-1,-1,-1,-1,1,-3,-1,-1,-4,
-2,-3,-3,-3,-2,-3,-3,-3,-1,0,0,-3,0,6,-4,-2,-2,1,3,-1,-3,-3,-1,-4,
-1,-2,-2,-1,-3,-1,-1,-2,-2,-3,-3,-1,-2,-4,7,-1,-1,-4,-3,-2,-2,-1,-2,-4,
1,-1,1,0,-1,0,0,0,-1,-2,-2,0,-1,-2,-1,4,1,-3,-2,-2,0,0,0,-4,
0,-1,0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-2,-1,1,5,-2,-2,0,-1,-1,0,-4,
-3,-3,-4,-4,-2,-2,-3,-2,-2,-3,-2,-3,-1,1,-4,-3,-2,11,2,-3,-4,-3,-2,-4,
-2,-2,-2,-3,-2,-1,-2,-3,2,-1,-1,-2,-1,3,-3,-2,-2,2,7,-1,-3,-2,-1,-4,
0,-3,-3,-3,-1,-2,-2,-3,-3,3,1,-2,1,-1,-2,-2,0,-3,-1,4,-3,-2,-1,-4,
-2,-1,3,4,-3,0,1,-1,0,-3,-4,0,-3,-3,-2,0,-1,-4,-3,-3,4,1,-1,-4,
-1,0,0,1,-3,3,4,-2,0,-3,-3,1,-1,-3,-1,0,-1,-3,-2,-2,1,4,-1,-4,
0,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,0,0,-2,-1,-1,-1,-1,-1,-4,
-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,1
};

/** ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62 */
template<typename SCORE>
class Blosum62:public Blosum<SCORE>
{
private:

const static signed char blosum62[];
public:
virtual SCORE compare(char c1,char c2) const
{
return (SCORE)::_blosum62[this->aa2index(c1)*24+ this->aa2index(c2)];
}
};




template<typename T>
class Array2D
{
public:
//typedef unsigned int size_type;
Array2D() {}
virtual ~Array2D() {}

virtual size_type width() const=0;
virtual size_type height() const=0;
virtual T get(size_type x,size_type y) const=0;
virtual void set(size_type x,size_type y,T val)=0;
virtual std::ostream& print(std::ostream& out) const
{
for(size_type j=0;j< height();++j)
{
for(size_type i=0;i< width();++i)
{
if(i>0) out << " ";
out << std::setw(4) << get(i,j);
}
out << std::endl;
}
return out;
}
protected:
std::size_t offset(size_type x,size_type y) const
{
return (this->width())*y + x;
}
};


template<typename T>
class DefaultArray2D:public Array2D<T>
{
protected:
size_type _width;
size_type _height;
T* matrix;

public:
DefaultArray2D( size_type width, size_type height):_width(width),_height(height)
{
matrix=new T[width*height];
}

virtual ~DefaultArray2D()
{
delete [] matrix;
}

virtual size_type width() const
{
return this->_width;
}

virtual size_type height() const
{
return this->_height;
}

virtual T get(size_type x,size_type y) const
{
return matrix[this->offset(x,y)];
}
virtual void set(size_type x,size_type y,T val)
{
matrix[this->offset(x,y)]=val;
}
};

template<typename T>
class StoredArray2D:public Array2D<T>
{
private:
size_type _width;
size_type _height;
/** FILE pointer to the binary file */
mutable FILE* io_ptr;
void move(size_type x, size_type y) const
{
if(std::fseek (io_ptr,this->offset(x,y)*sizeof(T), SEEK_SET)!=0)
{
throw std::runtime_error("Cannot fseek");
}
}
public:
StoredArray2D( size_type width, size_type height):_width(width),_height(height),io_ptr(NULL)
{
io_ptr= std::tmpfile();
if(io_ptr==NULL) throw std::runtime_error("Cannot open tmp file");
T data;
std::memset(&data,0,sizeof(T));
for(std::size_t i=0;i< (size_t)(width*height);++i)
{
if(std::fwrite((void*)&data,sizeof(T),1,io_ptr)!=1)
{
std::fclose(io_ptr);
io_ptr=NULL;
throw std::runtime_error("write matrix");
}
}
}

virtual ~StoredArray2D()
{
if(io_ptr!=NULL) std::fclose(io_ptr);
}

virtual size_type width() const
{
return this->_width;
}

virtual size_type height() const
{
return this->_height;
}

virtual T get(size_type x,size_type y) const
{
T data;
move(x,y);
if(std::fread((void*)&data,sizeof(T),1,io_ptr)!=1)
{
throw std::runtime_error("cannot read");
}
return data;
}
virtual void set(size_type x,size_type y,T data)
{
move(x,y);
if(std::fwrite((void*)&data,sizeof(T),1,io_ptr)!=1)
{
throw std::runtime_error("cannot write");
}
}
};


template<typename T>
class Array2DFactory
{
private:
bool stored;
public:
//typedef typename Array2D<T>::size_type size_type;
Array2DFactory():stored(false) {}
virtual ~Array2DFactory() {}

void setStored(bool stored)
{
this->stored=stored;
}
bool isStored() const
{
return this->stored;
}

/** creates a new Array2D */
Array2D<T>* newArray(
size_type width,
size_type height
) const
{
if(isStored())
{
return new StoredArray2D<T>(width,height);
}
return new DefaultArray2D<T>(width,height);
}
};


template<typename SCORE>
class Penalty
{
public:
Penalty() {}
virtual ~Penalty() {}
virtual SCORE get(const SequencePtr seq,int position) const=0;
};

template<typename SCORE>
class DefaultPenalty:public Penalty<SCORE>
{
private:
SCORE value;
public:
DefaultPenalty(SCORE value):value(value) {}
virtual ~DefaultPenalty() {}
virtual SCORE get(const SequencePtr seq,int position) const
{
return value;
}
};

template<typename T,typename SCORE>
class Aligner
{
private:
/** horizontal sequence */
SequencePtr seqX;
/** vertical sequence */
SequencePtr seqY;
/** array2D factory */
Array2DFactory<T>* array2dfactory;
/** substitution matrix */
SubstitutionMatrix<SCORE>* subsitutionMatrix;
/** penalty X */
Penalty<SCORE>* penaltyX;
/** penalty Y */
Penalty<SCORE>* penaltyY;
protected:


Aligner():seqX(NULL),seqY(NULL),
array2dfactory(NULL),
subsitutionMatrix(NULL)
{
}
public:
/** clear the internal data if it exists */
virtual void clear()
{
}

virtual ~Aligner()
{
clear();
}
/** returns the horizontal sequence */
virtual const SequencePtr X() const
{
return this->seqX;
}
/** returns the vertical sequence */
virtual const SequencePtr Y() const
{
return this->seqY;
}
virtual void setX(SequencePtr seqX) { this->seqX=seqX; clear();}
virtual void setY(SequencePtr seqY) { this->seqY=seqY; clear();}
virtual void setArray2DFactory(Array2DFactory<T>* array2dfactory) { this->array2dfactory=array2dfactory; clear();}
virtual const Array2DFactory<T>* getArray2DFactory() const { return this->array2dfactory;}
virtual void setSubstitutionMatrix(SubstitutionMatrix<SCORE>* subsitutionMatrix) { this->subsitutionMatrix=subsitutionMatrix; clear();}
virtual const SubstitutionMatrix<SCORE>* getSubstitutionMatrix() const { return this->subsitutionMatrix;}
virtual void setPenaltyX(Penalty<SCORE>* penaltyX) { this->penaltyX=penaltyX; clear();}
virtual Penalty<SCORE>* getPenaltyX() const { return this->penaltyX;}
virtual void setPenaltyY(Penalty<SCORE>* penaltyY) { this->penaltyY=penaltyY; clear();}
virtual Penalty<SCORE>* getPenaltyY() const { return this->penaltyY;}

virtual SCORE align()=0;
virtual std::auto_ptr<Path> path()=0;
protected:
/** shortcut to compare X[x] and Y[y] */
int compare(Sequence::size_type posx,Sequence::size_type posy) const
{
return getSubstitutionMatrix()->compare(X()->at(posx),Y()->at(posy));
}
};

template<typename T,typename SCORE>
class Needleman:public Aligner<T,SCORE>
{
private:
/** current matrix */
Array2D<T>* matrix;
public:
Needleman( ):matrix(NULL)
{

}

virtual ~Needleman()
{
}

/** clear the internal data if it exists */
virtual void clear()
{
Aligner<T,SCORE>::clear();
if(matrix!=NULL) delete matrix;
matrix=NULL;
}

virtual SCORE align()
{
/* clear the previous Array2D */
clear();
/* ask the factory for a new Array2D */
this->matrix = this->getArray2DFactory()->newArray(this->X()->size()+1, this->Y()->size()+1);

for(Sequence::size_type x=0;x<= this->X()->size();++x)
{
this->matrix->set(x,0,x*this->getPenaltyX()->get(this->X(),x) );
}
for(Sequence::size_type y=0;y<= this->Y()->size();++y)
{
this->matrix->set(0,y,y*this->getPenaltyY()->get(this->Y(),y));
}
for(Sequence::size_type x=1;x<= this->X()->size();++x)
{
for(Sequence::size_type y=1;y<= this->Y()->size();++y)
{
SCORE diag = matrix->get(x-1,y-1)+
this->compare(x-1,y-1);
SCORE delet= matrix->get(x-1,y) + this->getPenaltyX()->get(this->X(),x);
SCORE inser= matrix->get(x,y-1) + this->getPenaltyY()->get(this->Y(),y);
matrix->set(x,y,std::max(diag,std::max(delet,inser)));
}
}
return matrix->get(this->X()->size(),this->Y()->size());
}

public:

virtual std::auto_ptr<Path> path()
{
std::vector<PathItem> items;
Sequence::size_type x= this->X()->size();
Sequence::size_type y= this->Y()->size();
while (x>0 && y>0)
{
SCORE diag= matrix->get(x-1,y-1);
SCORE up = matrix->get(x, y- 1);
SCORE left = matrix->get(x-1,y);

if (diag >= up && diag >= left)
{
items.push_back(PathItem(x-1,y-1));
--x;
--y;
}
else if (left> up)
{
items.push_back(PathItem(x-1,-1));
--x;
}
else
{
items.push_back(PathItem(-1,y-1));
--y;
}
}

while (x > 0)
{
items.push_back(PathItem(x-1,-1));
--x;
}
while (y > 0)
{
items.push_back(PathItem(-1,y-1));
--y;
}
std::reverse(items.begin(),items.end());
return std::auto_ptr<Path>(new Path(this->X(),this->Y(),items));
}


};


template<typename T,typename SCORE>
class SWaterman:public Aligner<T,SCORE>
{
private:
/** current matrix */
Array2D<T>* matrix;
SCORE best_score;
Sequence::size_type best_x;
Sequence::size_type best_y;
public:
SWaterman( ):matrix(NULL),best_score(0),best_x(0),best_y(0)
{

}

virtual ~SWaterman()
{
}

/** clear the internal data if it exists */
virtual void clear()
{
Aligner<T,SCORE>::clear();
if(matrix!=NULL) delete matrix;
matrix=NULL;
best_x=0;
best_y=0;
best_score=0;
}

virtual SCORE align()
{
/* clear the previous Array2D */
clear();
/* ask the factory for a new Array2D */
this->matrix = this->getArray2DFactory()->newArray(this->X()->size()+1, this->Y()->size()+1);

for(Sequence::size_type x=0;x<= this->X()->size();++x)
{
this->matrix->set(x,0,0);
}
for(Sequence::size_type y=0;y<= this->Y()->size();++y)
{
this->matrix->set(0,y,0);
}
for(Sequence::size_type x=1;x<= this->X()->size();++x)
{
for(Sequence::size_type y=1;y<= this->Y()->size();++y)
{
SCORE diag = matrix->get(x-1,y-1)+
this->compare(x-1,y-1);
SCORE delet= matrix->get(x-1,y) + this->getPenaltyX()->get(this->X(),x);
SCORE inser= matrix->get(x,y-1) + this->getPenaltyY()->get(this->Y(),y);

SCORE here= std::max((SCORE)0,std::max(diag,std::max(delet,inser)));
matrix->set(x,y,here);
if(best_score<here)
{
best_score=here;
best_x=x;
best_y=y;
}
}
}
return matrix->get(this->X()->size(),this->Y()->size());
}

public:

virtual std::auto_ptr<Path> path()
{
std::vector<PathItem> items;
Sequence::size_type x= best_x;
Sequence::size_type y= best_y;
while (x>0 && y>0 && matrix->get(x,y)!=0)
{
SCORE diag= matrix->get(x-1,y-1);
SCORE up = matrix->get(x, y- 1);
SCORE left = matrix->get(x-1,y);

if (diag >= up && diag >= left)
{
items.push_back(PathItem(x-1,y-1));
--x;
--y;
}
else if (left> up)
{
items.push_back(PathItem(x-1,-1));
--x;
}
else
{
items.push_back(PathItem(-1,y-1));
--y;
}
}

std::reverse(items.begin(),items.end());
//matrix->print(std::cout);
//std::cout << best_score << "=" << best_x<<"," << best_y << std::endl;
return std::auto_ptr<Path>(new Path(this->X(),this->Y(),items));
}


};

template<typename T,typename SCORE>
class AlignerFactory
{
private:
bool local;
public:
AlignerFactory():local(false)
{
}

void setLocal(bool local) { this->local=local;}
bool isLocal() const { return local;}
std::auto_ptr<Aligner<T,SCORE> > newAligner()
{
if(isLocal())
{
return std::auto_ptr<Aligner<T,SCORE> >(new SWaterman<T,SCORE>());
}
return std::auto_ptr<Aligner<T,SCORE> >(new Needleman<T,SCORE>());
}
};




typedef signed long penalty_t;

/**
*
* main
*
*/
int main(int argc,char **argv)
{
try
{
int optind=1;
bool align_local=false;
bool stored_matrix=false;
bool use_blosum62=true;
penalty_t gX=-5;
penalty_t gY=-5;
while(optind < argc)
{
if(std::strcmp(argv[optind],"-h")==0)
{
std::cout << argv[0] << ". Pierre Lindenbaum PhD. Compiled on " << __DATE__ << " at " << __TIME__ << "\n" <<
"options\n" <<
" -h Help (this screen)\n" <<
" -L local-alignement\n" <<
" -s stored matrix (slower)\n" <<
" -i identity matrix (instead of blosum62)\n" <<
" -x penalty X (" <<gX <<")\n" <<
" -y penalty Y (" <<gY <<")\n" <<
"seq1 seq2\n" <<
std::endl;
std::exit(EXIT_FAILURE);
}
else if(std::strcmp(argv[optind],"-L")==0)
{
align_local=true;
}
else if(std::strcmp(argv[optind],"-s")==0)
{
stored_matrix=true;
}
else if(std::strcmp(argv[optind],"-i")==0)
{
use_blosum62=false;
}
else if(std::strcmp(argv[optind],"-x")==0)
{
gX=(penalty_t)atol(argv[++optind]);
}
else if(std::strcmp(argv[optind],"-y")==0)
{
gY=(penalty_t)atol(argv[++optind]);
}
else if(std::strcmp(argv[optind],"--")==0)
{
++optind;
break;
}
else if(argv[optind][0]=='-')
{
std::cerr << "unknown option " << argv[optind] << std::endl;
std::exit(EXIT_FAILURE);
}
else
{
break;
}
++optind;
}
if(optind+2!=argc)
{
std::cerr << "expected only two args"<< std::endl;
std::exit(EXIT_FAILURE);
}
Blosum62<penalty_t> blosum;
Identity<penalty_t> identity;


Array2DFactory<int> factory;
factory.setStored(stored_matrix);

DefaultPenalty<penalty_t> penalty_x(gX);
DefaultPenalty<penalty_t> penalty_y(gY);
AlignerFactory<int,penalty_t> algofactory;
algofactory.setLocal(align_local);

std::auto_ptr<Aligner<int,penalty_t> > algo = algofactory.newAligner();

algo->setArray2DFactory(&factory);
algo->setSubstitutionMatrix(use_blosum62?(SubstitutionMatrix<penalty_t>*) &blosum :(SubstitutionMatrix<penalty_t>*) &identity);
algo->setPenaltyX(&penalty_x);
algo->setPenaltyY(&penalty_y);


PtrSequence seq1(argv[optind++]);
PtrSequence seq2(argv[optind++]);

algo->setX(&seq1);
algo->setY(&seq2);
algo->align();

std::cout << (*(algo->path())) << std::endl;

}
catch(std::exception& err)
{
std::cerr << err.what() << std::endl;
}
catch(...)
{
std::cerr << "BOUM" << std::endl;
}
return EXIT_SUCCESS;
}

Compilation


g++ -Wall -O3 needle.cpp

Examples


./a.out -h
./a.out. Pierre Lindenbaum PhD. Compiled on Jul 8 2010 at 23:03:05
options
-h Help (this screen)
-L local-alignement
-s stored matrix (slower)
-i identity matrix (instead of blosum62)
-x penalty X (-5)
-y penalty Y (-5)
seq1 seq2


time ./a.out MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR
MERQK-----RKADIEKGLQFIQSTLP--LKQEEYEAFLLKLVQNLFAEGN
| | | ||||||| | || | | ||
MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR-------


real 0m0.006s
user 0m0.004s
sys 0m0.004s


time ./a.out -s MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR

MERQK-----RKADIEKGLQFIQSTLP--LKQEEYEAFLLKLVQNLFAEGN
| | | ||||||| | || | | ||
MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR-------


real 0m0.037s
user 0m0.004s
sys 0m0.032s


./a.out -i MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR

MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN
| | | |
MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR


./a.out -L MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR

ME--RQKRKADIEKGLQFIQSTLP--LKQEEYEAFLLKLVQ
| || | ||||||| | || | | ||
VSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR


./a.out -x -5 -y -20 -L MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGN MSSVSEERRKRQQNIKEGLQFIQSPLSYPGTQEQYAVYLHALVR

ME--RQKRKADIEKGLQFIQSTL
| || | ||||||| |
VSEERRKRQQNIKEGLQFIQSPL


That's it

Pierre