Command-line Blast

Installing blast

You should already have installed BLAST during the software installation tutorial. If you need to reinstall it, copy and paste the command below:

sudo apt install ncbi-blast+
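
To confirm that the installation worked, a quick check along the following lines may help. This is only a sketch: the variable name blast_status is made up for this example, and it only looks for the blastp program.

```shell
# Check whether blastp is on the PATH (blast_status is an illustrative name)
if command -v blastp >/dev/null 2>&1; then
    blast_status="found"
    blastp -version        # prints the installed BLAST+ version
else
    blast_status="missing"
fi
echo "blastp: $blast_status"
```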

Getting data

We will download some cow and human proteins from RefSeq:

wget ftp://ftp.ncbi.nih.gov/refseq/B_taurus/mRNA_Prot/cow.1.protein.faa.gz
wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz

Both of these files are compressed. They are not tar archives like the ones we encountered earlier, but gzip files. To decompress them:

gzip -d *.gz
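
Note that gzip -d deletes the .gz file once it has been decompressed. If you would rather keep the compressed copy, or only peek at the content, something like the sketch below should work. It uses a made-up file, demo_gz.faa, so that it can run without the downloads, and it assumes a gzip version that supports the -k option (GNU gzip 1.6 or later).

```shell
# Made-up one-record file, for illustration only
printf '>demo1 example protein\nMKTAYIAKQR\n' > demo_gz.faa
gzip -f demo_gz.faa                 # creates demo_gz.faa.gz and removes demo_gz.faa

# Peek at the content without writing a decompressed file
gzip -cd demo_gz.faa.gz | head -2

# Decompress but keep the .gz file (-k), overwriting any leftover copy (-f)
gzip -dkf demo_gz.faa.gz
ls demo_gz.faa demo_gz.faa.gz
```

With the real data, the same options would apply to cow.1.protein.faa.gz and human.1.protein.faa.gz.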

Let us take a look at the human file

head human.1.protein.faa

Both files contain protein sequences in the FASTA format
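
As a reminder of what FASTA looks like, here is a small sketch using a made-up file, demo_format.faa (the names and sequences are invented for illustration and are not RefSeq entries). Each record starts with a header line beginning with ">", followed by one or more lines of sequence.

```shell
# Made-up two-record FASTA file, for illustration only
printf '>demo1 first example protein\nMKTAYIAKQR\nQISFVKSHFS\n>demo2 second example protein\nMSDNELLQ\n' > demo_format.faa

# Show the whole file, then only the header lines
cat demo_format.faa
grep '^>' demo_format.faa
```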

Question

How many sequences do I have in each file?

The files are slightly too big for a first BLAST run at the command line. Let's downsize the cow file:

head -6 cow.1.protein.faa > cow.small.faa
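
Keep in mind that head counts lines, not sequences, so depending on how the sequences are wrapped the cut can fall in the middle of a record. One possible alternative, sketched below with awk, is to keep the first N complete records instead. The file names demo_records.faa and demo_first2.faa and the variable max are made up for this example.

```shell
# Made-up FASTA file with three records, for illustration only
printf '>demo1\nMKTAYIAK\nQRQISFVK\n>demo2\nMSDNELLQ\n>demo3\nMPLLVVAA\n' > demo_records.faa

# Count ">" headers and stop as soon as record number max+1 begins
awk -v max=2 '/^>/ { n++ } n > max { exit } { print }' demo_records.faa > demo_first2.faa
cat demo_first2.faa
```

To try it on the real data, replace demo_records.faa with cow.1.protein.faa and pick the value of max you want.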

Our first blast

Now we can blast these two cow sequences against the set of human sequences.

First we need to build a blast database with our human sequences

makeblastdb -in human.1.protein.faa -dbtype prot
ls

The makeblastdb command produced several extra files. These are index files, and BLAST needs them to search the database.

Now we can run blast

blastp -query cow.small.faa -db human.1.protein.faa -out cow_vs_human_blast_results.txt

We can look at the results using less

less cow_vs_human_blast_results.txt

To see the various options that we can use with blastp:

blastp -help

For easier reading, pipe the help text into less:

blastp -help | less

Question

How could I modify the previous blast command to keep only the hits with an e-value of 1e-5 or lower?

Bigger dataset

Now that we succeeded using a small dataset of two proteins, let's try with a slightly bigger one.

head -199 cow.1.protein.faa > cow.medium.faa

Question

How many protein sequences does cow.medium.faa contain?

We run blast again

blastp -query cow.medium.faa -db human.1.protein.faa \
    -out cow_vs_human_blast_results.tab -evalue 1e-5 \
    -outfmt 6 -max_target_seqs 1

Question

What do -outfmt and -max_target_seqs do?