Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.
RefSeq and INSDC name disagreements in NCBI BLAST for Functional Assignment
Posted 17 Feb, 2023 19:50
I start out this post with the caveat that I may be doing something really dumb here, so I apologize if this is a known issue that I have somehow missed. Here's the situation: we're annotating MulchSalad (F) and had just started to teach function calling. We picked gene 1 to demonstrate with (big mistake!).

Basically, here's the issue: if you BLAST this gene on Phagesdb, it hits terminase small subunit, which is also what the Pham view shows for other annotated genomes. If we BLAST (blastp) on NCBI with default settings, we see "minor tail protein" as the annotation: https://capture.dropbox.com/ZrKNG9p6b1vT2pDa

I was confused by this and dug deeper. Apparently this is because NCBI shows the RefSeq entries for matches by default. The regular INSDC GenBank entries are still there with the names given by SEA-PHAGES, but they are hidden in the hits by default. You can compare them directly if you click around: https://capture.dropbox.com/u8NpcBuUFOAcg2UT The RefSeq entry says "minor tail protein", and the GenBank entry says "terminase small subunit".

I am not sure whether this is a common problem, but we definitely found it here. There isn't really a question here so much as an observation. If I *do* have a question, it's likely unanswerable: why did RefSeq call this an MTP? HHPred agrees this is a terminase large subunit or terminase.

Is this a common issue? This is the first time I'd ever seen anything like this. If it's isolated, I can deal with it ad hoc; if it's common, maybe I need to plan a workaround.

Thanks, all!
Kyle
–
Kyle MacLea
Associate Professor, University of New Hampshire at Manchester
kyle.maclea@unh.edu
+1 603-641-4129
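Kyle's click-through comparison can be made a little more systematic. The Python sketch below is illustrative only: `source_db` and `product_disagreements` are hypothetical helper names, the input is assumed to be (accession, product) pairs copied by hand from an NCBI "Identical Proteins" report, and RefSeq is recognized only by its two-letter-plus-underscore protein prefixes. The example data are the Cane17/Colt accessions quoted later in this thread.

```python
import re

# Assumption: RefSeq protein accessions use a two-letter prefix plus an
# underscore (AP_, NP_, XP_, YP_, WP_). Anything else is treated here as an
# INSDC (GenBank/ENA/DDBJ) protein ID, e.g. AXQ51660.1.
REFSEQ_PROTEIN = re.compile(r"^(AP|NP|XP|YP|WP)_\d+(\.\d+)?$")


def source_db(accession: str) -> str:
    """Label a protein accession as 'RefSeq' or 'INSDC' by its prefix."""
    return "RefSeq" if REFSEQ_PROTEIN.match(accession) else "INSDC"


def product_disagreements(hits):
    """Compare the names each database gives one identical protein.

    hits: (accession, product_name) pairs, e.g. copied by hand from an NCBI
    "Identical Proteins" report (a hypothetical input format).
    Returns the lower-cased names that occur in only one of the two databases.
    """
    names = {"RefSeq": set(), "INSDC": set()}
    for accession, product in hits:
        names[source_db(accession)].add(product.strip().lower())
    return {
        "only_refseq": sorted(names["RefSeq"] - names["INSDC"]),
        "only_insdc": sorted(names["INSDC"] - names["RefSeq"]),
    }


# Cane17/Colt accessions and product names as quoted in this thread.
hits = [
    ("YP_010057231.1", "HNH endonuclease"),
    ("AXQ51660.1", "hypothetical protein"),
    ("QAY13996.1", "hypothetical protein"),
]
print(product_disagreements(hits))
# {'only_refseq': ['hnh endonuclease'], 'only_insdc': ['hypothetical protein']}
```

Two empty lists would mean the databases agree; any non-empty list is only a cue to go back to the HHPred, CDD and synteny evidence before trusting the NCBI name.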
Posted 17 Feb, 2023 20:24
Hi Kyle,

Well, this is quite messed up, isn't it? I will investigate further. In the meantime, here is what I think of BLASTp function calls at NCBI: I don't value them very much, not without supporting evidence. If you continue to investigate, you'll find there is no supporting data for a minor tail protein other than NCBI saying so. There is no way a functional call can be made from BLAST data that the HHPred data sources do not support. (HHPred does a PSI-BLAST, so it finds more distant relationships to a protein than a single BLAST could.) This looks like a terminase small subunit to me.

debbie
Posted 19 Feb, 2023 13:21
That is very helpful, Debbie! Thank you! My inclination was to ignore this particular NCBI RefSeq call–so I am glad to hear you agree. And in the meantime I am definitely hoping this isn't more widespread! Kyle
Posted 22 Feb, 2023 20:00
debbie
Posted 23 Feb, 2023 01:14
Hi Allison,

Minor tail proteins are the most common functional assignment that is acceptable to make with no CDD or HHPred supporting data. The "eligible" genes are the 4-6 large genes downstream of the tape measure gene, and I would not make the assignment unless the gene is relatively large. In the case of Kyle's question, there are discrepant data that contradict a minor tail protein call: synteny and the HHPred hits.

Best,
debbie
Posted 23 Feb, 2023 13:40
Perfect, thank you.
Posted 23 Feb, 2023 17:33
Thank you! I like to use MTPs as an example, and it's nice to be reminded of the parameters around that function call. (This silly RefSeq issue notwithstanding!)
Kyle MacLea
Posted 24 Feb, 2023 20:52
So we've got another instance of the same issue, Debbie. MulchSalad_Draft_37 is an integrase/tyrosine integrase by Phagesdb BLAST/HHPred analysis, as shown in the pham report: https://capture.dropbox.com/TobVF9m4QnDFz6Hl

NCBI BLAST shows mostly "endonuclease" as the annotation: https://capture.dropbox.com/g0O9azXO34T0rhG9 And if you run the "Identical Proteins" analysis on any of the predicted "endonuclease" genes, you see the difference again: https://capture.dropbox.com/RTPOWS6fziZ4TMpK

What we're seeing, again, is RefSeq (the "curated" database) calling most of our Cluster F pham 68632 integrases endonucleases, whereas GenBank/INSDC and Phagesdb call them integrases. So somewhere along the way, RefSeq "curation" is overriding what we have been calling these genes. And since NCBI BLAST apparently shows the RefSeq results by default and only shows the GenBank results if you dig deeper, I'm guessing this will cause a certain amount of confusion going forward.

Meh.
Kyle
Posted 03 Mar, 2023 19:29
Just a note that we continue to find this while working on MulchSalad (F1) genes. Today we saw some genes that RefSeq calls one thing while INSDC calls them a hypothetical protein! So it gets even more complicated!
Kyle
Posted 05 Jul, 2024 21:52
I have noticed that NCBI is revising whole blocks of phages and submitting them as new submissions with different accession numbers from the original SEA-PHAGES submissions. This is easy to detect in the new NCBI record, because a new reference (1) has been added to the original reference (2):

LOCUS       YP_010057231              46 aa            linear   PHG 10-JAN-2023
DEFINITION  HNH endonuclease [Mycobacterium phage Cane17].
ACCESSION   YP_010057231
VERSION     YP_010057231.1
DBLINK      BioProject: PRJNA485481
DBSOURCE    REFSEQ: accession NC_054716.1
KEYWORDS    RefSeq.
SOURCE      Mycobacterium phage Cane17
  ORGANISM  Mycobacterium phage Cane17
            Viruses; Duplodnaviria; Heunggongvirae; Uroviricota;
            Caudoviricetes; Ceeclamvirinae; Bixzunavirus; Bixzunavirus cane17.
REFERENCE   1  (residues 1 to 46)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (07-MAY-2021) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   2  (residues 1 to 46)
  AUTHORS   Fast,K.M., Castleberry,S., Jones,I.K., Larrimore,J.D., Long,C.A.,
            Pritchett,N.C., Keener,T., Sandel,M.W., Bollivar,D.W.,
            Garlena,R.A., Russell,D.A., Pope,W.H., Jacobs-Sera,D. and
            Hatfull,G.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-JUL-201 Biology, Illinois Wesleyan University,
            1312 Park Street, Bloomington, IL 61701, USA

The major problem I have with this is that we cannot see the evidence or documentation that led to this huge change. This causes problems for our students, who see an overwhelming block of identical functions in NCBI, usually without noticing the original submissions with different functions.
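The two signals Claire points to in the Cane17 record (KEYWORDS "RefSeq." plus an extra REFERENCE from the NCBI Genome Project consortium sitting beside the original submitters' reference) can be checked in a few lines of Python. This is a heuristic sketch, not an NCBI tool: `reannotation_summary` is a hypothetical name, it reads only the first line of each field, and it assumes the standard GenBank flatfile layout in which continuation lines start with 12 spaces.

```python
def reannotation_summary(flatfile: str) -> dict:
    """Summarize a GenBank-format protein record (heuristic sketch).

    Assumed signal, taken from the Cane17 record: KEYWORDS is 'RefSeq.' and
    one REFERENCE has CONSRTM 'NCBI Genome Project' next to the original
    submitters' REFERENCE.
    """
    keywords, dbsource = "", ""
    references = []  # one dict per REFERENCE block, in record order
    for line in flatfile.splitlines():
        if not line.strip() or line.startswith(" " * 12):
            continue  # skip blank lines and continuation lines
        tag, _, value = line.strip().partition(" ")
        value = value.strip()
        if tag == "KEYWORDS":
            keywords = value
        elif tag == "DBSOURCE":
            dbsource = value
        elif tag == "REFERENCE":
            references.append({"number": value.split()[0]})
        elif tag in ("CONSRTM", "AUTHORS", "TITLE") and references:
            references[-1][tag] = value  # first line of the field only
    ncbi = [r["number"] for r in references
            if r.get("CONSRTM") == "NCBI Genome Project"]
    return {
        "is_refseq": keywords.rstrip(".") == "RefSeq",
        "dbsource": dbsource,
        "ncbi_reference_numbers": ncbi,
        "other_reference_numbers": [r["number"] for r in references
                                    if r["number"] not in ncbi],
    }


# Abbreviated copy of the Cane17 record (some header lines omitted).
record = """\
LOCUS       YP_010057231              46 aa            linear   PHG 10-JAN-2023
DEFINITION  HNH endonuclease [Mycobacterium phage Cane17].
DBSOURCE    REFSEQ: accession NC_054716.1
KEYWORDS    RefSeq.
REFERENCE   1  (residues 1 to 46)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
REFERENCE   2  (residues 1 to 46)
  AUTHORS   Fast,K.M., Castleberry,S., Jones,I.K., Larrimore,J.D., Long,C.A.,
            Pritchett,N.C., Keener,T., Sandel,M.W., Bollivar,D.W.,
            Garlena,R.A., Russell,D.A., Pope,W.H., Jacobs-Sera,D. and
            Hatfull,G.F.
  TITLE     Direct Submission
"""
print(reannotation_summary(record))
# {'is_refseq': True, 'dbsource': 'REFSEQ: accession NC_054716.1', 'ncbi_reference_numbers': ['1'], 'other_reference_numbers': ['2']}
```

The `dbsource` value points back to the RefSeq genome (here NC_054716.1); the "other" reference numbers are where the original SEA-PHAGES submission is cited.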
Sometimes the original submissions are visible in the NCBI BLAST results, as in this PECAAN output:

HNH endonuclease [Mycobacterium phage Cane17] >gb|AXQ51660.1| hypothetical protein SEA_CANE17_46 [Mycobacterium phage Cane17] >gb|QAY13996.1| hypothetical protein SEA_COLT_48 [Mycobacterium phage Colt]

Other times, the original SEA_ evidence is sorted far down the list of results. We instruct our students to go with the Phagesdb results that are supported by HHPred or CDD evidence. The NCBI results are great for confirming the 1:1 start correlations.

Enjoy!
Claire Rinehart
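A merged title like the PECAAN line above packs several identical proteins into one hit, so the original SEA_ names are easy to overlook. The Python sketch below splits such a line into (database, accession, product, organism) entries and lists the distinct product names. Assumptions, all labeled: the " >" separator and the "db|accession| product [organism]" layout are taken from the PECAAN output quoted in this thread, `split_merged_defline` and `distinct_products` are hypothetical names, and stripping a trailing SEA_ locus tag is just a convenience for comparing names.

```python
import re

# One segment of a merged defline: an optional "db|accession|" prefix, then
# "product [organism]". The " >" separator is assumed from the PECAAN output.
SEGMENT = re.compile(
    r"^(?:(?P<db>\w+)\|(?P<acc>[^|]+)\|\s*)?"
    r"(?P<product>.*?)\s*\[(?P<organism>[^\]]+)\]\s*$"
)


def split_merged_defline(defline: str):
    """Split a merged BLAST title into (db, accession, product, organism)."""
    entries = []
    for segment in defline.split(" >"):
        m = SEGMENT.match(segment.strip().lstrip(">"))
        if m:  # segments that do not fit the pattern are skipped
            entries.append((m["db"], m["acc"], m["product"], m["organism"]))
    return entries


def distinct_products(entries):
    """Product names with any trailing SEA_ locus tag removed, deduplicated."""
    return sorted({re.sub(r"\s+SEA_\w+$", "", p) for _, _, p, _ in entries})


defline = (
    "HNH endonuclease [Mycobacterium phage Cane17] "
    ">gb|AXQ51660.1| hypothetical protein SEA_CANE17_46 [Mycobacterium phage Cane17] "
    ">gb|QAY13996.1| hypothetical protein SEA_COLT_48 [Mycobacterium phage Colt]"
)
entries = split_merged_defline(defline)
for db, acc, product, organism in entries:
    print(db or "top hit", acc or "-", product, sep=" | ")
print(distinct_products(entries))
# ['HNH endonuclease', 'hypothetical protein']
```

More than one distinct name means the top title and the merged SEA-PHAGES entries disagree; as Claire suggests, the Phagesdb results backed by HHPred or CDD evidence should then decide the call.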