NCBI Blast Tabular output format fields

2 Comments
Posted 14 Dec 2009 in BLAST, genome bioinformatics

Certainly, with the new NCBI Blast+ tools, you won’t need this anymore, but as long as we are sticking with the old blastall programm with its horrible documentation, I keep forgetting the format of the BLAST tabular reports. Tabular format is created when you specify “-m 8”. This is the most useful format for parsing blast yourself without having to learn strange libraries like BioPerl, BioJava, BioPython or BioErlang (doesn’t this exist yet, Mike?)

So here is the meaning of the fields:

queryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore

Parsing is then simple:

Python
for line in open(“myfile.blast”):
(queryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore) = line.split(“\t”)

Perl
while (<>) {
($queryId, $subjectId, $percIdentity, $alnLength, $mismatchCount, $gapOpenCount, $queryStart, $queryEnd, $subjectStart, $subjectEnd, $eVal, $bitScore) = split(/\t/)
}


2 Comments

  1. You have to pass at least “r” to open the python file.

    for line in open(“myfile.blast”, “r”):

  2. You have not, it opens by default in “r” mode, my fault.



Add Your Comment