SEA-PHAGES | PhagesDB vs NCBI BLASTp results

Link to this post \| posted 26 Jan, 2016 18:44
GregFrederick@letu.edu

The DNA Master BLASTs of entire genomes have been taking forever and a day. They lock all other functions of the computer, and if the computer restarts or something else happens in the night, the BLAST query does not complete.

We have started BLASTp'ing individual gene products at NCBI and PhagesDB.

QUESTIONS:

1. Are both blasts really necessary?
2. What might the PhagesDB BLASTp turn up that the NCBI would not (archived phage genes, I suppose)?
3. What might the NCBI BLASTp turn up that the PhagesDB BLASTp would not? So far, they are identical.
4. HHPred returns completely different regional homologies. Is there a threshold, i.e., a number of AAs in a string, that might be of interest?
5. In HHPred, if the query sequence is 150 aa long and a homologous stretch of 18-20 aa is returned in, say, an "E. coli DNA Gyrase Inhibitor", can/should we ignore such short homologies?
6. I'm certain there will be more Newbie Qs. Thanks for your patience. Is this the right place to ask these?

Thanks in advance to any brave (and patient) soul who tries to answer these.

Greg

Link to this post \| posted 27 Jan, 2016 04:13
cmageeney

QUESTIONS:

1. Are both blasts really necessary?
Both BLASTs are important because they give you different data. The NCBI BLAST will give you all information, while the PhagesDB BLAST will only give you actinobacteriophages. You can, however, individually BLAST each gene product as you annotate in DNA Master. I find this helpful for students since they can then make start site changes and re-BLAST when necessary. What I typically do is set my personal file to BLAST overnight. I have never had a problem doing it that way, and then I have all the data needed.

2. What might the PhagesDB BLASTp turn up that the NCBI would not (archived phage genes, I suppose)?
PhagesDB is going to show all the BLAST hits in actinobacteriophages. This may be useful if you are looking for other phages that are not in GenBank yet.

3. What might the NCBI BLASTp turn up that the PhagesDB BLASTp would not? So far, they are identical.
NCBI will turn up many additional results that PhagesDB is not set up for. These could be hits in bacterial species or in other types of phages, such as coliphages.

4. HHPred returns completely different regional homologies. Is there a threshold, i.e., a number of AAs in a string, that might be of interest?
We ask our students to use their best judgement about functional information here. I use 95% probability and e-values better (that is, smaller) than 10^-5. This is not a hard rule but a good guideline to start with.

5. In HHPred, if the query sequence is 150 aa long and a homologous stretch of 18-20 aa is returned in, say, an "E. coli DNA Gyrase Inhibitor", can/should we ignore such short homologies?
Take all information into consideration but use your best judgement.

Hope this helps. Anyone else who has information on points I might have missed, feel free to jump in.
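
As a rough illustration of the guideline in answer 4 and the short-hit concern in question 5, here is a minimal Python sketch. It reads the guideline as "probability of at least 95% and e-value of at most 10^-5" (smaller e-values being more significant) and also reports what fraction of the query a hit covers. The `Hit` class, the function names, and the optional `min_coverage` parameter are hypothetical and not part of HHPred or any SEA-PHAGES tool; the thread does not give a coverage cutoff, so none is applied by default.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One reported hit (hypothetical container, filled in by hand)."""
    name: str
    probability: float    # probability in percent, as reported by the tool
    evalue: float         # smaller means more significant
    aligned_length: int   # number of query residues in the alignment


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query covered by the aligned stretch."""
    return hit.aligned_length / query_length


def passes_guideline(hit: Hit, query_length: int,
                     min_probability: float = 95.0,
                     max_evalue: float = 1e-5,
                     min_coverage: float = 0.0) -> bool:
    """Apply the starting-point thresholds; min_coverage is an optional,
    illustrative extra filter (0.0 means no coverage filter)."""
    return (hit.probability >= min_probability
            and hit.evalue <= max_evalue
            and query_coverage(hit, query_length) >= min_coverage)


if __name__ == "__main__":
    # Made-up numbers for a 150 aa query with a 20 aa aligned stretch.
    short_hit = Hit("E. coli DNA gyrase inhibitor", 60.0, 0.5, 20)
    print(f"coverage: {query_coverage(short_hit, 150):.1%}")
    print("passes guideline:", passes_guideline(short_hit, 150))
```

A 20 aa stretch on a 150 aa query covers only about 13% of the protein, which is one way to put a number on "short" before applying judgement. The thresholds remain a guideline, not a rule.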

Link to this post \| posted 27 Jan, 2016 04:42
lhughes	Another note about the difference between the data you get from PhagesDB BLAST hits and NCBI BLAST hits, even when the results are the same hits – on PhagesDB, you can sometimes get function information on hits where NCBI just says "gp12" and doesn't show you the function assignment.
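
A minimal sketch of the point above, assuming the descriptions for the same hit have been copied by hand from the two result pages. The function names and the placeholder pattern are hypothetical, and "minor tail protein" is only an example string, not a result from this thread.

```python
import re

# Descriptions that carry no functional information (illustrative pattern).
PLACEHOLDER = re.compile(r"^(gp\s*\d+|hypothetical protein)$", re.IGNORECASE)


def is_informative(description: str) -> bool:
    """True if the description is more than a bare gene-product label."""
    text = description.strip()
    return bool(text) and not PLACEHOLDER.match(text)


def pick_description(ncbi_description: str, phagesdb_description: str) -> str:
    """Return the first informative description, checking NCBI first and
    then PhagesDB; fall back to the NCBI text if neither is informative."""
    for candidate in (ncbi_description, phagesdb_description):
        if is_informative(candidate):
            return candidate.strip()
    return ncbi_description.strip()


if __name__ == "__main__":
    print(pick_description("gp12", "minor tail protein"))
```

This only shows why looking at both result lists can pay off even when the hits are identical. It does not replace checking the evidence behind a function assignment.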

Link to this post \| posted 27 Jan, 2016 14:57
GregFrederick@letu.edu	Thanks again. This is a great help.

Link to this post \| posted 27 Jan, 2016 15:02
GregFrederick@letu.edu

One more question: (OK. More than one!)

We are having our students BLASTp the gene products that would be produced from all potential start codons, including the longest ORF, the GeneMark and Glimmer calls, and any other potential start codons in between.

I have had them using NCBI for these blasts. Would the PhagesDB BLASTp be a better option? It certainly is faster.

The reasoning is that this will help them be more confident of their start codon call. Is this necessary? A good practice? A waste of time? We are about to introduce them to Starterator on Thursday. Will using Starterator reduce/eliminate the need/value of doing this?

Thanks again. GF

Link to this post \| posted 27 Jan, 2016 17:12
lhughes

We just use the BLAST data that comes up from the NCBI blast in DNA Master as our starting point. If we have good data on the start (including Starterator) and the Q:T lineup is good, we don't look any further. I will only start doing BLASTs on other potential products if some of the other data doesn't support the call and I want to try other potential starts that could be better choices.
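
For the cases where other potential starts do need checking, the candidates can be listed before deciding which products to BLAST. The sketch below is a minimal Python example, assuming the longest ORF is available as a forward-strand DNA string that begins in frame. It uses the bacterial start codons ATG, GTG, and TTG. The function names are hypothetical and this is not how DNA Master, GeneMark, Glimmer, or Starterator work internally.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # bacterial/phage start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(orf: str) -> list[int]:
    """Return 0-based offsets of in-frame start codons, scanning codon by
    codon from the beginning of the ORF up to the first in-frame stop."""
    orf = orf.upper()
    offsets = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            offsets.append(i)
    return offsets


def candidate_genes(orf: str) -> list[str]:
    """DNA sequence of each candidate gene, longest first."""
    return [orf[i:] for i in candidate_starts(orf)]


if __name__ == "__main__":
    toy_orf = "TTGAAAATGCCCGTGTAA"   # made-up sequence for illustration
    print(candidate_starts(toy_orf))
```

Each candidate would still have to be translated before a BLASTp. As described above, that extra work is only worth doing when the existing evidence does not already support the called start.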

Link to this post \| posted 27 Jan, 2016 17:21
GregFrederick@letu.edu

OK. Thanks again. That will save students some time.

Link to this post \| posted 27 Jan, 2016 20:45
DanRussell

Nicely answered, Katie! Thanks.

–Dan

Link to this post \| posted 27 Jan, 2016 21:14
GregFrederick@letu.edu

Thanks. Super info! gf
