
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

DNA Master Blast type

| posted 26 Jan, 2021 01:54
Are DNA Master and Phamerator using BLASTx (DNA > protein) for "Blast" searches? The input is DNA but the results show amino acid alignments & scores.

How about PECAAN? What it calls 'Phagesdb BLAST' and 'NCBI BLAST' look to be reporting in amino acid lengths, not base pair lengths.
| posted 27 Jan, 2021 14:40
Hi Chris,

If you BLAST a gene in DNA Master, you're actually BLASTing the gene product. (Since DNA Master has annotated genes, it's using those annotated amino acid sequences as the queries for BLAST.) So it's a BLASTp in DNA Master. Same for PECAAN, it's running BLASTp on PhagesDB or NCBI using the currently-annotated gene product as the query.
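To make "BLASTing the gene product" concrete, here is a minimal sketch of how an annotated gene could be turned into a protein query. It is not DNA Master's or PECAAN's actual code; the function name `gene_product`, the 1-based inclusive coordinates, and the toy genome are all assumptions for illustration. It uses the standard codon table and reads ATG, GTG, or TTG at the first position as Met, as is usual for bacterial and phage genes.

```python
from itertools import product

# Standard codon table, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_product(genome, start, stop, strand):
    """Translate one annotated gene (hypothetical helper).

    start/stop are 1-based inclusive genome coordinates; strand is "+" or "-".
    """
    dna = genome[start - 1:stop]
    if strand == "-":
        dna = reverse_complement(dna)
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
    # Assumption: alternative start codons are read as Met at position 1.
    if dna[:3] in ("ATG", "GTG", "TTG"):
        protein = "M" + protein[1:]
    return protein.rstrip("*")  # drop the stop codon symbol


# Toy genome: two flanking bases, then GTG AAA CCC GGG TAA, then two more.
genome = "CCGTGAAACCCGGGTAACC"
query = gene_product(genome, 3, 17, "+")
print(query)  # this amino acid string is what a BLASTp search would receive
```

Only one frame and one start are translated here, which is why the search that follows is a BLASTp and not a BLASTx.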

The only BLAST (I think) in Phamerator is the pairwise-genome comparison that shows coloration between genomes. That's a BLASTn of the genomic nucleotide sequences.

–Dan
| posted 27 Jan, 2021 19:10
The Phamerator explanation sounds right and I probably should have remembered that, but aren’t DNA Master and PECAAN doing BLASTx searches? https://blast.ncbi.nlm.nih.gov/Blast.cgi

I was taught BLASTx takes a nucleotide sequence, virtually translates it to the 1 possible amino acid sequence and then searches protein databases. BLASTp compares amino acid sequences to protein databases, where the amino acids are usually derived from mass spectrometry of a purified protein. BLASTn takes a nucleotide sequence, derived from DNA sequence analysis, and compares it to nucleotide databases. Finally, tBLASTn is the reverse of BLASTx: it takes an amino acid sequence, again usually derived from mass spectrometry of a purified protein, virtually derives all possible nucleotide sequences accounting for codon redundancy and compares all of them against nucleotide databases.

BLASTx and BLASTp both compare to protein databases, but the former takes DNA sequence as input and the latter takes amino acid sequence. Even if DNA Master and others are doing the virtual translation and not the BLAST website, aren’t those technically still BLASTx searches since the input is DNA sequence and the output is protein hits?
| posted 27 Jan, 2021 19:20
Chris,
BLASTx takes a nucleotide sequence, translates it in all 6 reading frames, and BLASTs each translation. That is not needed if you want to know whether the gene that was predicted by Glimmer and GeneMark has any homologues. tBLASTx takes too long and would just be confusing; coding potential gives you the ORF you want to zero in on. If at any time you think you need to look elsewhere, in another reading frame, go for it.

Using nucleotide sequences to look for functional homologues is not nearly as helpful as using the protein sequence.

The ORFs identified by Glimmer and GeneMark (and in your auto-annotated DNA Master file) have already moved past looking for protein homologues across all 6 reading frames and show the most likely ORF with coding potential (those are powerful and proven algorithms). BLASTx of any kind is not routinely done. It can be used when you are just not convinced of what you are finding.

Hope that helps,
debbie
| posted 27 Jan, 2021 19:34
Hi Chris,

As Debbie said, the key difference with BLASTx is that it's checking all possible reading frames, not just a single one. This definitely has its uses, which are different from those of BLASTp. For example, if you have a gap in a phage annotation where no genes have been called, but there appear to be two or three possible ORFs in that region, you could BLASTx that nucleotide sequence and check all those ORFs at once. Or if you had a gene with a suspected intron, and wanted to check all the frames at once for presence of coding elements with matches in the database, you could use BLASTx.
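As a rough illustration of what "all possible reading frames" means, the sketch below translates a short nucleotide sequence in the three forward and three reverse frames. It only shows the translation step that a BLASTx-style search starts from, not the database search itself, and it is not BLAST's implementation; the function name `six_frame_translations` and the toy sequence are assumptions.

```python
from itertools import product

# Standard codon table, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_translations(dna):
    """Translate dna in frames +1, +2, +3, -1, -2, -3 (hypothetical helper).

    Stop codons are kept as "*" so the ORFs in each frame stay visible.
    """
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(CODON_TABLE[c] for c in codons)
    return frames


frames = six_frame_translations("ATGAAACCCGGGTAA")
for name, protein in frames.items():
    print(name, protein)
```

For an unannotated gap, each of the six strings would be compared against the protein database, so hits in any frame show up in a single search.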

DNA Master is actually sending amino acid sequence to the BLAST program, which checks that against an amino acid database, hence it's a BLASTp. If it were sending nucleotide sequence, BLAST itself would do the translation into all frames and then it would be BLASTx.
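One way to keep the program names straight is to write down what each one takes as a query, what kind of database it searches, and which side BLAST translates. The small lookup below is just a summary sketch; the dictionary layout and the helper `programs_for` are made up for illustration.

```python
# What each BLAST program expects, and which side BLAST translates
# in all six frames before comparing.
BLAST_PROGRAMS = {
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translated": ()},
    "blastp":  {"query": "protein",    "database": "protein",    "translated": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translated": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translated": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide",
                "translated": ("query", "database")},
}


def programs_for(query, database):
    """List the programs that accept this query type and database type."""
    return sorted(
        name
        for name, spec in BLAST_PROGRAMS.items()
        if spec["query"] == query and spec["database"] == database
    )


print(programs_for("protein", "protein"))     # what DNA Master and PECAAN send
print(programs_for("nucleotide", "protein"))  # what a raw DNA query would need
```

What decides the program is the sequence handed to BLAST: an amino acid query against a protein database is a BLASTp, even if that query was translated from DNA beforehand by another tool.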

Another important distinction is that when you are using BLASTp, you've already made some choices about how your nucleotide sequence is translated. Not just the frame (which is very important of course, as mentioned above!) but also where it starts. A BLASTp search will just show you matches for the portion you've already decided is the gene, whereas a BLASTx will take all the ORFs (in their largest possible versions) and do lots of searches.
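To illustrate how the choice of start changes a BLASTp query, here is a small sketch that lists the protein you would get from each in-frame start codon within one ORF. The helper `candidate_queries`, the start codon set (ATG, GTG, TTG), and the toy ORF are assumptions for illustration, not how any of the annotation tools pick starts.

```python
from itertools import product

# Standard codon table, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons


def candidate_queries(orf_dna):
    """Map each in-frame start position (1-based) to its protein query.

    orf_dna is read in frame 1 only; the start codon is written as M and
    translation ends at the first stop codon.
    """
    queries = {}
    for i in range(0, len(orf_dna) - 2, 3):
        if orf_dna[i:i + 3] in STARTS:
            tail = "".join(
                CODON_TABLE[orf_dna[j:j + 3]]
                for j in range(i + 3, len(orf_dna) - 2, 3)
            )
            queries[i + 1] = "M" + tail.split("*")[0]
    return queries


# Toy ORF: ATG AAA GTG CCC GGG TAA, with possible starts at bases 1 and 7.
queries = candidate_queries("ATGAAAGTGCCCGGGTAA")
for start, protein in queries.items():
    print(start, protein)
```

A BLASTp run sees only the one query that matches the annotated start, so a shorter call can miss alignment to the N-terminal part of a database hit.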

While checking results from mass spec is definitely one use for BLASTp, I'm sure that the vast majority of both BLASTp queries and BLASTp results are bioinformatically predicted rather than experimentally verified.

Hope that makes some sense!
–Dan
| posted 27 Jan, 2021 20:17
Makes complete sense! Thanks
 