Arun Seetharam

Bioinformatician

Sometimes it is essential to know the length distribution of your sequences. They might be newly assembled scaffolds, a genome whose chromosome sizes you want to know, or simply any multi-FASTA sequence file. A simple way to get the lengths is with Biopython.

For example, save this script as seq_length.py:

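#!/usr/bin/env python3
# Print the ID and length of every sequence in a FASTA file.
import sys
from Bio import SeqIO

# The first command-line argument is the path to the FASTA file.
for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    # One tab-separated line per record: sequence ID, then length.
    print('%s\t%i' % (seq_record.id, len(seq_record)))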

To run it, make the script executable and pass it a FASTA file:

chmod +x seq_length.py
./seq_length.py input_file.fasta

This prints one line per sequence in the file: the sequence ID and its length, separated by a tab.
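
To go from per-sequence lengths to a picture of the length distribution, the tab-separated output can be summarized with a few lines of standard-library Python. The following is only a minimal sketch: the function name summarize_lengths and the example IDs and lengths are made up for illustration, and it assumes each input line has the form ID, tab, length, as printed by the script above.

```python
import statistics


def summarize_lengths(lines):
    """Summarize 'id<TAB>length' lines (hypothetical helper, not part of Biopython)."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last tab so the length is always the final field.
        _seq_id, length = line.rsplit("\t", 1)
        lengths.append(int(length))
    if not lengths:
        raise ValueError("no sequence lengths found")
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


# Made-up example records, in the same format the script prints.
example = ["seqA\t100", "seqB\t250", "seqC\t400"]
for key, value in summarize_lengths(example).items():
    print("%s\t%s" % (key, value))
```

In practice you could redirect the script's output to a file (for example ./seq_length.py input_file.fasta > lengths.txt, where lengths.txt is just a placeholder name) and pass the open file object to summarize_lengths instead of the example list.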