The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by fbaliraine

| posted 15 Feb, 2022 21:45
chg60
Hi Debbie and Fred,

I've attempted to begin a systematic analysis to determine how much we can trust the outputs from either of these programs.

I compiled a list of diverse types of DNA binding proteins: tyrosine or serine integrases, terminase large subunits, HTH DNA-binding proteins, RecE exonuclease, RecT ssDNA binding protein, etc. I pulled representative sequences from a subset of phams dominated by proteins with these functions.

With the caveat that I've only run 6 sequences so far, I'm not impressed by DNABIND. It's very fast (which is nice!), but only two of the sequences were predicted as DNA-binding proteins (a tyrosine integrase and an HTH DNA-binding protein). The others were all reported as having a probability less than 40% of being DNA-binding.

DNABINDER is MUCH slower - I'm still waiting on the first protein sequence, nearly an hour later. Ignoring the question of whether we can trust its output, I'm of the opinion that this program is too slow to warrant systematic use by SEA-PHAGES annotators.

-Christian

Hi Christian,
DNABINDER will eat your lunch if you leave the setting as PSSM! If you want to go home earlier, please change the selection to "amino acid composition." The PSSM model looks at evolutionary trends, which is why it takes so long; the amino acid composition model predicts from the amino acid composition of the sequence alone, which is what we want here (more details are explained on their website).
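For anyone unfamiliar with the term, amino acid composition is simply the fraction of each of the 20 standard residues in a sequence. Below is a minimal Python sketch of that calculation. It is not DNABINDER's own code: the function name `amino_acid_composition` and the short test peptide are made up for illustration, and how DNABINDER turns such a vector into a prediction is not shown here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Non-standard characters (for example a trailing stop symbol) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in AMINO_ACIDS}


# Hypothetical six-residue peptide, used only to show the output shape.
composition = amino_acid_composition("MKRKAL")
```

Because this only counts residues, it runs in a fraction of a second per protein, which is consistent with the composition setting being so much faster than the PSSM one.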
Cheers!
Fred
Posted in: Functional Annotation / Can we call DNA Binding proteins based on DNABIND and DNA Binder results?
| posted 15 Feb, 2022 21:38
debbie
Hi Fred,
I don't know enough about DNABIND or DNA Binder to know how good they are at predicting DNA binding proteins. An analysis of what we have called DNA binding proteins with these programs is in order to determine if we would want to adopt this, I think.
Make sense?
debbie

Hi Debbie,
I concur with you that "An analysis of what we have called DNA binding proteins with these programs is in order to determine if we would want to adopt this."
For now, I would settle for NKF until we get more support. Christian's current work is in order, and we too are starting on preliminary work to confirm.
Fred
Posted in: Functional Annotation / Can we call DNA Binding proteins based on DNABIND and DNA Binder results?
| posted 15 Feb, 2022 16:34
During the recent workshop, two programs, DNABIND and DNA Binder, were mentioned for predicting DNA binding proteins. We have found several genes that both programs predict to be DNA binding proteins (with varying strengths), but that do not necessarily have strong HHpred alignments to DNA binding proteins. Current BLASTp hits in NCBI and phagesDb are “Hypothetical Proteins.” However, these genes also appear to be either in an operon or in a syntenic region with other DNA binding proteins, such as DNA methylase, translocase, resolvase, and specific-DNA-binding proteins. Can we give these genes the general name “DNA binding protein” based on the two programs and the strong possibility that they lie in such an operon or syntenic region? In general, is it possible to call DNA binding proteins based on these two programs alone? Two examples from the P1 phage Dynamo are gp 44 (start/stop: 31954-32103) and gp 51 (36150-36428). See attached file.
Thanks!
Fred
Posted in: Functional Annotation / Can we call DNA Binding proteins based on DNABIND and DNA Binder results?
| posted 21 Jul, 2021 22:21
Hi Debbie,
Thank you for helping put this to rest. We have been using PDB, PfamA, CDD, and Scope! We have not been including the Uni-Prot-Swiss Protein-viral database. Now when I exclude Scope like you did and use Uni-Prot-Swiss Protein-viral, I can see hits in HHPred to minor tail proteins! Previously there were no significant hits to anything similar. Perhaps it might help for the HHPred resource guide to recommend Uni-Prot-Swiss Protein-viral instead of, or in addition to, Scope. Case closed! I will go ahead and name them minor tail proteins. Thanks again!
Fred
Posted in: Functional Annotation / Minor tail proteins far upstream of the tape measure protein?
| posted 19 Jul, 2021 23:21
Genes 5, 6, and 7 of phage Mach and gp 6 in phage Duplo (as per current naming in phamerator before annotation) are showing synteny with minor tail proteins in phamerator, and also hitting minor tail proteins in phagesDb and NCBI (q1:s1, 99-100%). However, none of them have any significant hits to minor tail proteins in HHPred. I have tried checking the forum to see if anyone previously asked a specific question about minor tail proteins occurring this far upstream of the tape measure protein, but I have not seen such a direct question or answer in this regard. What I can see from previous posts is that minor tail proteins should typically come after the tape measure (downstream) or just before it (upstream) in some clusters. https://seaphages.org/forums/topic/4872/ and https://seaphages.org/forums/topic/4836/

The official functions list states that “If you have significant hits to either collagen-like or glycine-rich proteins, and are in the syntenic region of minor tail proteins, you can call them minor tail proteins.” But no such hits are seen in HHPred for phages Mach and Duplo. Of note, phages Mach and Duplo are Siphoviridae, and Siphoviridae genes are expected to occur in the following order: Terminase small subunit – Terminase large subunit – Portal – Protease – Capsid – Head-Tail Connectors/adaptors – Major Tail Subunit/major tail protein – Tail assembly Chaperones – Tape measure Protein – Minor Tail Proteins (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0069273). Thus synteny, per Pope et al. 2013, does not seem to support a minor tail protein call in this region (see pg 4 under “Virion structure and assembly genes” in this paper).
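The synteny argument above can be written out as a small check. This is only a rough Python sketch under stated assumptions: the order list is the one quoted from the paper, while the function name `calls_upstream_of` and the toy gene list are hypothetical and are not the actual Mach or Duplo annotations.

```python
# Expected order of virion structure and assembly genes, as quoted above.
EXPECTED_ORDER = [
    "terminase small subunit",
    "terminase large subunit",
    "portal",
    "protease",
    "capsid",
    "head-tail connector",
    "major tail protein",
    "tail assembly chaperone",
    "tape measure protein",
    "minor tail protein",
]


def calls_upstream_of(annotations, function, anchor):
    """Return gene numbers called `function` that lie upstream of the first
    `anchor` gene, when the expected order places `function` after `anchor`.

    `annotations` is a list of (gene_number, function_name) in genome order.
    """
    if EXPECTED_ORDER.index(function) < EXPECTED_ORDER.index(anchor):
        return []  # function is expected upstream of anchor; nothing to flag
    anchor_positions = [
        i for i, (_, name) in enumerate(annotations) if name == anchor
    ]
    if not anchor_positions:
        return []  # no anchor gene annotated, so no comparison is possible
    first_anchor = anchor_positions[0]
    return [
        gene
        for i, (gene, name) in enumerate(annotations)
        if name == function and i < first_anchor
    ]


# Hypothetical toy genome, for illustration only.
toy_genome = [
    (5, "minor tail protein"),
    (6, "minor tail protein"),
    (10, "terminase large subunit"),
    (12, "portal"),
    (20, "tape measure protein"),
    (21, "minor tail protein"),
]
flagged = calls_upstream_of(toy_genome, "minor tail protein", "tape measure protein")
```

A flagged gene is not necessarily a wrong call; it only marks a position that departs from the quoted order and deserves a closer look.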

I need to be sure whether something has changed since this article was published about 8 years ago. Based on the above, I am inclined to call them “NKF” rather than “minor tail proteins,” but since there are several hits to “minor tail proteins” in phagesDB as well as having synteny in phamerator, I am seeking the forum’s informed clarification.

Below is a sample sequence, Mach gp 6, if anyone wants to cross-check by BLAST in phagesDB and HHPred:
MANIGIVSDADTLVLWKGRDFKWSFENLDENRQPVDFPDGSLFIELQTGGEHNARQRVTITGATGGTYAFDILGETTPPIDYNDVSENPQGLPGDITEALEAAAGVGNVEVYPTLLQPSWILNFNLNSGKPLTEQLVNTINKTANDFFDTFEQLMGVDVSMTVTDALNFQLKVTSRRSFDEVGVVTFAVDVTGTAVKNFFNAVSGLVGAVNTVNVDFYWNRVYEIEFVGELANQPIEAILPDASNLTGYNPWITVEVIDLGKERLTIWPFIIDGTEATIKVESEEADLIPERTVWQLVFLPDGEPAGGDPITYGRVTRLGDZ

I have also attached a searchable phamerator map for phage Mach that demonstrates what I am seeing.
Thanks!
Fred
Edited 19 Jul, 2021 23:34
Posted in: Functional Annotation / Minor tail proteins far upstream of the tape measure protein?
| posted 10 May, 2020 19:38
Great!
Thanks Debbie!
Fred
Posted in: Annotation / Having trouble getting NCBI Blast data for a gene via DNA Master, but can get the data directly via phagesDb and NCBI websites: Shida gene 7
| posted 10 May, 2020 18:13
Dear phage hunters, since the very first auto-annotation, we’ve had trouble getting NCBI BLAST data for Shida gene 7 when we run BLAST through DNA Master, but we can get the data when we run BLASTp directly on the NCBI website, as well as in PhagesDb. This gene is present in Vorrps, YungJamal & Krili in phamerator and had q1:s1, 100% with phage Krili in phagesDb; Original GeneMark call @bp 2236; SSC: 2144-2236 rev; CP: Yes; SCS: Both; ST:SS; BLAST-Start: Krili, gp 7, phagesDb, q1:s1, 100%, 8e-12; Gap:4 bp overlap. TMHMM and SOSUI analysis confirmed this to be a membrane protein, with one transmembrane domain. Since we have tried several times without success to get BLAST data for this gene via DNA Master, and it is needed for submission of the final files, what would you advise us to do? This is the only gene in Shida that has not budged!

Attached are snapshots of the Frames, Coding potential profile, phamerator snapshot, results (actually no result) when BLASTed via DNA Master, PhagesDb BLASTp results, NCBI BLASTp results (when done directly via the website), and TMHMM & SOSUI analysis results. Please advise, because we have tried to get BLAST data for this gene directly via DNA Master as required, but all in vain.
Fred
Edited 10 May, 2020 18:17
Posted in: Annotation / Having trouble getting NCBI Blast data for a gene via DNA Master, but can get the data directly via phagesDb and NCBI websites: Shida gene 7
| posted 09 May, 2020 02:46
Thank you Debbie!
Fred
Posted in: Functional Annotation / Query about Function assignment for MrMiyagi gene 97
| posted 07 May, 2020 00:14
Dear Phage hunters,
I need help deciding on the function of MrMiyagi gene 97 (gene 100 in phamerator). It is an Orpham.
BLASTp in NCBI & phagesDb hits Holliday junction resolvase with phage Fowlmouth, q59:s7, coverage 52%, percent identity 96.97%, e-value 1e-25 (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
However, HHPred hits DNA Helicase, PDBe, Thermus thermophilus, d1ixsa_, 28.92%, 92.41%, as well as RNA POLYMERASE SIGMA FACTOR CNRH; TRANSCRIPTION, ECF-TYPE SIGMA, and RuvA_C ; RuvA, C-terminal domain (https://toolkit.tuebingen.mpg.de/jobs/MrMiyagi_gp97).

The hit to Holliday junction DNA helicase ruvA/RuvB (E.C.3.6.1.3) has a probability of 78.61%. I am therefore conflicted about which function to settle on: Holliday junction resolvase, DNA Helicase, or RNA polymerase sigma factor? There was a recent post about sigma factor, but I don't see it.
Fred
Edited 07 May, 2020 00:18
Posted in: Functional Annotation / Query about Function assignment for MrMiyagi gene 97
| posted 06 May, 2020 21:59
Thanks a lot Welkin!
Fred
Posted in: Annotation / Tricky Start position decision: Need 2nd Opinion; Two overlapping Genes with Strong CP: MrMiyagi