The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

PhagesDB vs NCBI BLASTp results - the value of both?

| posted 26 Jan, 2016 18:44
The DNA Master BLASTs of entire genomes have been taking forever and a day. They lock all other functions of the computer, and if the computer restarts or something else happens in the night, the BLAST query does not complete.

We have started BLASTp'ing individual gene products at NCBI and PhagesDB.

QUESTIONS:

1. Are both blasts really necessary?
2. What might the PhagesDB BLASTp turn up that the NCBI would not (archived phage genes, I suppose.)?
3. What might the NCBI BLASTp turn up that the PhagesDB BLASTp would not? So far, they are identical.
4. HHPred returns completely different regional homologies. Is there a threshold, i.e. number of AAs in a string that might be of interest?
5. In HHPred, if the query sequence is 150aa long and a homologous stretch of 18-20aa is returned in, say, an "E. coli DNA Gyrase Inhibitor", can/should we ignore such short homologies?
6. I'm certain there will be more Newbie Qs. Thanks for your patience. Is this the right place to ask these?

Thanks in advance to any brave (and patient) soul who tries to answer these.

Greg
| posted 27 Jan, 2016 04:13
QUESTIONS:

1. Are both blasts really necessary?
Both BLASTs are important because they give you different data. The NCBI BLAST will give you hits from across NCBI's databases, while the PhagesDB BLAST will only give you actinobacteriophage hits. You can, however, BLAST each gene product individually as you annotate in DNA Master. I find this helpful for students, since they can then make start site changes and re-BLAST when necessary. What I typically do is set my personal file to BLAST overnight. I have never had a problem doing it that way, and then I have all the data needed.

2. What might the PhagesDB BLASTp turn up that the NCBI would not (archived phage genes, I suppose.)?
PhagesDB is going to show all the BLAST hits in actinobacteriophages. This may be useful if you are looking for other phages that are not in GenBank yet.

3. What might the NCBI BLASTp turn up that the PhagesDB BLASTp would not? So far, they are identical.
NCBI will turn up many additional results that PhagesDB is not set up for. These could be hits in bacterial species or in other types of phages, such as coliphages.
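One way to see what each database adds for a given gene product is to compare the two hit lists directly. The sketch below is only an illustration: the function name `compare_hits` and the hit labels are made up, and it assumes the hit names have already been copied out of each results page by hand.

```python
def compare_hits(ncbi_hits, phagesdb_hits):
    """Split two lists of hit names into shared and database-specific hits."""
    ncbi = set(ncbi_hits)
    phagesdb = set(phagesdb_hits)
    return {
        "both": sorted(ncbi & phagesdb),           # found by both searches
        "ncbi_only": sorted(ncbi - phagesdb),      # e.g. bacterial or non-actinobacteriophage hits
        "phagesdb_only": sorted(phagesdb - ncbi),  # e.g. phages not yet in GenBank
    }


# Hypothetical hit labels, for illustration only.
ncbi_example = ["phage_A gp12", "bacterial protein X"]
phagesdb_example = ["phage_A gp12", "draft_phage_B gp7"]

for group, names in compare_hits(ncbi_example, phagesdb_example).items():
    print(group, names)
```

If the two lists come out identical, as in the original question, every name lands in the "both" group and the other two groups are empty.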

4. HHPred returns completely different regional homologies. Is there a threshold, i.e. number of AAs in a string that might be of interest?
We ask our students to use their best judgement about functional information here. As a starting point, I look for 95% homology and e-values of 10^-5 or lower (that is, more significant hits). This is not a hard rule, but it is a good guideline to start with.

5. In HHPred, if the query sequence is 150aa long and a homologous stretch of 18-20aa is returned in, say, an "E. coli DNA Gyrase Inhibitor", can/should we ignore such short homologies?
Take all information into consideration but use your best judgement.
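To put rough numbers on questions 4 and 5, here is a minimal sketch of how a hit could be screened. It rests on several assumptions that are not from this thread: that the 95% figure refers to the probability score HHPred reports, that the e-value cutoff means 10^-5 or smaller, and that a minimum query coverage of 50% is a sensible extra check (that value is purely a placeholder). The `Hit` class, the function names, and the example hits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    probability: float  # percent, as reported by the search tool (assumed)
    e_value: float
    query_start: int    # 1-based, inclusive
    query_end: int      # 1-based, inclusive


def query_coverage(hit, query_length):
    """Fraction of the query sequence covered by the aligned region."""
    return (hit.query_end - hit.query_start + 1) / query_length


def passes_guideline(hit, query_length,
                     min_probability=95.0,  # guideline from the answer above
                     max_e_value=1e-5,      # assumed to mean "this small or smaller"
                     min_coverage=0.5):     # placeholder, not from the thread
    """Return True only if the hit clears all three starting-point checks."""
    return (hit.probability >= min_probability
            and hit.e_value <= max_e_value
            and query_coverage(hit, query_length) >= min_coverage)


# Made-up hits against a 150 aa query, for illustration only.
short_hit = Hit("short gyrase-inhibitor-like match", 60.0, 0.5, 31, 50)
long_hit = Hit("long full-length match", 99.0, 1e-20, 5, 140)

for hit in (short_hit, long_hit):
    print(hit.name, round(query_coverage(hit, 150), 2), passes_guideline(hit, 150))
```

A 20 aa stretch of a 150 aa query covers only about 13% of the protein, which is why a short match like that would fail the coverage check here. It is a screen for deciding what deserves a closer look, not a substitute for judgement.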

Hope this helps. Anyone else who has information to answer these that I might have missed, feel free to jump in.
| posted 27 Jan, 2016 04:42
Another note about the difference between the data you get from PhagesDB BLAST hits and NCBI BLAST hits, even when the results are the same hits: on PhagesDB, you can sometimes get function information on hits where NCBI just says "gp12" and doesn't show you the function assignment.
| posted 27 Jan, 2016 14:57
In reply to Lee Hughes:

Thanks again. This is a great help.
| posted 27 Jan, 2016 15:02
One more question: (OK. More than one!)

We are having our students BLASTp the gene products that would be produced from all potential start codons, including the longest ORF, the GeneMark and Glimmer calls, and any other potential start codons in between.
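For anyone who wants to generate those candidate products outside DNA Master, the sketch below lists every in-frame start codon in an ORF and the protein each one would give. It is an illustration under stated assumptions, not how DNA Master, GeneMark, or Glimmer work: the function name `candidate_products` is made up, the input is assumed to be the forward-strand sequence of the longest ORF in frame, and ATG, GTG, and TTG are assumed to be the allowed starts (the usual bacterial set), each read as methionine when it initiates.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard codon assignments, built from the usual TCAG ordering.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of allowed starts


def candidate_products(orf_dna):
    """Return (offset, protein) for each in-frame start codon in orf_dna.

    offset is the 0-based position of the start codon within orf_dna.
    """
    orf_dna = orf_dna.upper()
    codons = [orf_dna[i:i + 3] for i in range(0, len(orf_dna) - 2, 3)]
    products = []
    for index, codon in enumerate(codons):
        if codon not in START_CODONS:
            continue
        residues = ["M"]  # the initiating codon is read as methionine
        for downstream in codons[index + 1:]:
            amino_acid = CODON_TABLE.get(downstream, "X")  # X for ambiguous bases
            if amino_acid == "*":
                break  # stop codon ends the product
            residues.append(amino_acid)
        products.append((index * 3, "".join(residues)))
    return products


# Toy sequence, not from any real phage.
for offset, protein in candidate_products("TTGAAAGTGCCCATGGGTTAA"):
    print(offset, protein)
```

Each protein string could then be pasted into a BLASTp search. Note that every shorter product is a suffix of the longest one apart from its first residue, so the hits will mostly differ in how well the start of the alignment lines up.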

I have had them using NCBI for these blasts. Would the PhagesDB BLASTp be a better option? It certainly is faster.

The reasoning is that this will help them be more confident of their start codon call. Is this necessary? A good practice? A waste of time? We are about to introduce them to Starterator on Thursday. Will using Starterator reduce or eliminate the need for, or the value of, doing this?

Thanks again. GF
| posted 27 Jan, 2016 17:12
In reply to GregFrederick@letu.edu:

We just use the BLAST data that comes up from the NCBI blast in DNA Master as our starting point. If we have good data on the start (including Starterator) and the Q:T lineup is good, we don't look any further. I will only start doing BLASTs on other potential products if some of the other data doesn't support the call and I want to try other potential starts that could be better choices.
| posted 27 Jan, 2016 17:21
In reply to Lee Hughes:

OK. Thanks again. That will save students some time.
| posted 27 Jan, 2016 20:45
In reply to cmageeney:

Nicely answered, Katie! Thanks.

–Dan
| posted 27 Jan, 2016 21:14
In reply to cmageeney:

Thanks. Super info! gf
 