SEA-PHAGES | DNA Master Blast type

Posted 26 Jan, 2021 01:54

cdherren

Are DNA Master and Phamerator using BLASTx (DNA > protein) for "Blast" searches? The input is DNA, but the results show amino acid alignments and scores. How about PECAAN? The searches it calls 'Phagesdb BLAST' and 'NCBI BLAST' appear to report amino acid lengths, not base pair lengths.

Posted 27 Jan, 2021 14:40

DanRussell

Hi Chris,

If you BLAST a gene in DNA Master, you're actually BLASTing the gene product. (Since DNA Master has annotated genes, it uses those annotated amino acid sequences as the BLAST queries.) So it's a BLASTp in DNA Master. The same goes for PECAAN: it runs BLASTp on PhagesDB or NCBI using the currently annotated gene product as the query.
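To make the BLASTp point concrete, here is a minimal Python sketch of the translation step that happens before such a query is submitted: the annotated gene is translated in its single annotated frame, and only that protein is used as the query. It assumes the standard genetic code (NCBI table 1; table 11, used for bacteria and their phages, has the same amino acid assignments) and renders the start codon as M, as is conventional for bacterial initiation codons such as GTG and TTG. The function name `translate_gene` is hypothetical and not part of DNA Master or PECAAN.

```python
# Standard genetic code (NCBI table 1; table 11 shares the amino acid assignments).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_gene(cds: str) -> str:
    """Translate an annotated gene (start codon through stop codon) in its one
    annotated frame, the way a BLASTp query would be built.

    Hypothetical helper for illustration only.
    """
    cds = cds.upper()
    if len(cds) < 3 or len(cds) % 3 != 0:
        raise ValueError("annotated gene length must be a positive multiple of 3")
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    # The bacterial initiator tRNA reads GTG/TTG starts as methionine.
    protein = "M" + protein[1:]
    # Drop the terminal stop so the query is just the gene product.
    return protein.rstrip("*")


query = translate_gene("GTGGCTTGA")  # made-up three-codon gene
print(query)  # MA
```

Only this one protein is searched, which is why the hits and scores come back as amino acid alignments even though the gene was annotated on DNA.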

The only BLAST (I think) in Phamerator is the pairwise-genome comparison that shows coloration between genomes. That's a BLASTn of the genomic nucleotide sequences.

–Dan

Posted 27 Jan, 2021 19:10

cdherren

The Phamerator explanation sounds right, and I probably should have remembered that, but aren't DNA Master and PECAAN doing BLASTx searches? https://blast.ncbi.nlm.nih.gov/Blast.cgi

I was taught BLASTx takes a nucleotide sequence, virtually translates it into the one possible amino acid sequence, and then searches protein databases. BLASTp compares amino acid sequences to protein databases, where the amino acid sequences are usually derived from mass spectrometry of a purified protein. BLASTn takes a nucleotide sequence, derived from DNA sequence analysis, and compares it to nucleotide databases. Finally, tBLASTn is the reverse of BLASTx: it takes an amino acid sequence, again usually derived from mass spectrometry of a purified protein, virtually derives all possible nucleotide sequences accounting for codon redundancy, and compares all of them against nucleotide databases.

BLASTx and BLASTp both search protein databases, but the former takes a DNA sequence as input and the latter an amino acid sequence. Even if DNA Master and the others, rather than the BLAST website, do the virtual translation, aren't those technically still BLASTx searches, since the input is a DNA sequence and the output is protein hits?

Posted 27 Jan, 2021 19:20

debbie

Chris,
BLASTx takes a nucleotide sequence, translates it in all six reading frames, and BLASTs each translation. That isn't needed if you just want to know whether the gene predicted by Glimmer and GeneMark has any homologues. tBLASTx takes too long and would just be confusing; the coding potential already points you to the ORF you want to zero in on. If at any point you think you need to look elsewhere (in another reading frame), go for it.
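The six-frame translation described above can be sketched in a few lines of Python: three frames on the given strand and three on its reverse complement, each translated straight through with stops shown as `*`. BLASTx performs this kind of translation internally before searching a protein database; the sketch below only translates (it does not search), assumes the standard genetic code, and uses hypothetical function names.

```python
# Standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, offset: int) -> str:
    """Translate from `offset` through the last complete codon; stops appear as '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons containing ambiguous bases
        for i in range(offset, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Return translations of frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate_frame(seq, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate_frame(rc, k) for k in range(3)})
    return frames


for label, protein in six_frames("ATGAAATAA").items():
    print(label, protein)
```

A gene finder's call picks just one of these six strings (and one start within it), which is the shortcut that makes routine BLASTx unnecessary.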

Using nucleotide sequences to look for functional homologues is not nearly as helpful as using the protein sequence.

The ORFs identified by Glimmer and GeneMark (and shown in your auto-annotated DNA Master file) already take you past the step of searching all six reading frames for protein homologues: they show you the most likely ORF with coding potential (those are powerful and proven algorithms). BLASTx of any kind is not routinely done. It can be used when you are just not convinced by what you are finding.

Hope that helps,
debbie

Posted 27 Jan, 2021 19:34

DanRussell

Hi Chris,

As Debbie said, the key difference with BLASTx is that it checks all possible reading frames, not just one. This definitely has uses that are different from those of BLASTp. For example, if you have a gap in a phage annotation where no genes have been called, but there appear to be two or three possible ORFs in that region, you could BLASTx that nucleotide sequence and check all those ORFs at once. Or, if you had a gene with a suspected intron and wanted to check all the frames at once for coding elements with matches in the database, you could use BLASTx.

DNA Master is actually sending amino acid sequence to the BLAST program, which checks that against an amino acid database, hence it's a BLASTp. If it were sending nucleotide sequence, BLAST itself would do the translation into all frames and then it would be BLASTx.
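The distinction above comes down to two questions: what type of sequence is sent, and what type of database is searched. A minimal Python sketch of that decision table, using the five classic NCBI BLAST program names, is below. The `looks_like_nucleotide` heuristic (at least 90% A/C/G/T/U/N letters) and the example gene product are illustrative assumptions, not how DNA Master, PECAAN, or NCBI actually decide.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: a sequence made mostly of A/C/G/T/U/N is nucleotide."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    return sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters) >= threshold


def blast_program(query_is_nt: bool, db_is_nt: bool, translate_both: bool = False) -> str:
    """Name the BLAST program implied by the query type and the database type."""
    if query_is_nt and db_is_nt:
        # tblastx translates both the query and the database in all six frames.
        return "tblastx" if translate_both else "blastn"
    if query_is_nt:
        return "blastx"   # six-frame translated nucleotide query vs protein database
    if db_is_nt:
        return "tblastn"  # protein query vs six-frame translated nucleotide database
    return "blastp"       # protein query vs protein database


# A made-up gene product, as DNA Master or PECAAN would send it to a protein database.
gene_product = "MKVLAAGIV"
print(blast_program(looks_like_nucleotide(gene_product), db_is_nt=False))  # blastp
```

Who did the translation does not change the answer: once the query that reaches BLAST is protein and the database is protein, the search is BLASTp.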

Another important distinction is that when you are using BLASTp, you've already made some choices about how your nucleotide sequence is translated. Not just the frame (which is very important of course, as mentioned above!) but also where it starts. A BLASTp search will just show you matches for the portion you've already decided is the gene, whereas a BLASTx will take all the ORFs (in their largest possible versions) and do lots of searches.
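The effect of the start choice can be sketched in Python by scanning one forward-strand frame for the largest version of each ORF (the most upstream start codon after the previous in-frame stop) and comparing its translation with the protein from a later, annotated start. The start codon set (ATG, GTG, TTG), the forward-strand-only scan, the example sequence, and the function names are simplifying assumptions for illustration; BLASTx itself simply translates entire frames, so this approximates the idea rather than BLASTx internals.

```python
# Standard genetic code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed start codon set
STOP_CODONS = {"TAA", "TAG", "TGA"}


def largest_orfs(seq: str, frame: int) -> list[tuple[int, int]]:
    """Return (start, end) for each ORF in one forward frame (0, 1, or 2).

    `start` is the most upstream start codon after the previous in-frame stop;
    `end` is exclusive and includes the stop codon. ORFs that run off the end
    of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    start = None
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            if start is not None:
                orfs.append((start, pos + 3))
            start = None
        elif start is None and codon in START_CODONS:
            start = pos
    return orfs


def protein_from(seq: str, start: int, end: int) -> str:
    """Translate seq[start:end] without its stop codon, rendering the start as M."""
    seq = seq.upper()
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(start, end - 3, 3))
    return "M" + protein[1:]


seq = "ATGGCTGCTATGAAATAA"  # made-up example with two in-frame ATGs
orf_start, orf_end = largest_orfs(seq, 0)[0]
print(protein_from(seq, orf_start, orf_end))  # largest version: MAAMK
print(protein_from(seq, 9, orf_end))          # hypothetical later annotated start: MK
```

A BLASTp query built from the later start would never include the first three residues, so any database matches to that region could not show up in the results.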

While checking mass spec results is definitely one use for BLASTp, I'm sure that the vast majority of both BLASTp queries and the database sequences they match are bioinformatically predicted rather than experimentally verified.

Hope that makes some sense!
–Dan

Posted 27 Jan, 2021 20:17

cdherren

Makes complete sense! Thanks