
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by fbaliraine

| posted 18 Dec, 2023 06:32
I am asking this clarification question about subcluster F1 and P1 phages.
I notice that phage Sonah (subcluster P1) has a potential small gene at position 37305-37388 bp (sequence MTDFLGATIRIVAQIGFPTVNPIEVMRZ). When inserted and BLASTed, it gives significant hits (q1:s1, 100%) but with just a few phages {Zilizebeth gp 64 (P1), HUHilltop gp 56 (P1) and Royals2015 gp 70 (F1)}, along with insignificant hits to Malithi gp 54 (P1), Camster gp 55 (P1) & Techage gp 59 (P1). I also note that all of the above genes are significantly longer than this potential gene, which would potentially form only part of the first few aa of those longer genes. I note that the vast majority of previously annotated phages skipped calling this gene. There is no coding potential whatsoever in GeneMarkS, Smeg or TB among the above P1 phages that gave significant hits. The only exception is the subcluster F1 phage Royals2015, which also has no coding potential in Smeg or TB but has weak/insignificant CP (below 50%) in GeneMarkS (see attached). I note, though, that the RBS score for this start in phage Sonah is strong (Z = 2.098; spacer distance = 10; final score = -4.661, with a TTG start codon). Moreover, it has an 8 bp overlap with the upstream gene and would form part of an operon with a 1 bp overlap (TAATG) with the downstream gene. Because this potential gene has weak CP only in GeneMarkS of F1 phage Royals2015 among all the genomes with significant hits that I have checked, and has not been called in many previously published genomes, I would like a second opinion about it going forward.
Thanks!
Fred
Edited 20 Jan, 2024 05:37
Posted in: Gene or not a Gene > Gene or no gene at Subcluster P1 phage Sonah position 37305-37388 bp?
| posted 15 Dec, 2023 22:40
We are dealing with the same situation with Sonah gp 28 (25227-25475 bp; MWTLKFWKDASERAVKSAAQAAILALGGEAFNAWTVDWQTVGGIALGGAALSLLTSLGSDLLPFGTKGTASLAKLDGEGSARZ). This gene is identical to Langerak gp 28. It is downstream of lysin A & B. DeepTMHMM shows 2 transmembrane domains, and we see the same holin hit in HHPred in PFAM & UniProtKB. Using DeepTMHMM, we note that Langerak gp 29, the gene immediately downstream of Langerak gp 28, is a membrane protein with one transmembrane domain, just like the downstream gp 29 of Sonah. Even syntenically, I want to call it a holin, but there is again this caveat in the Official Functions list, stating that, "to call a holin…at least 2 transmembrane domains found and the gene be adjacent to the endolysin(s), conserved domain hits (4), and the absence of additional transmembrane domains in the area." In previous P1 annotations, we relied on the less sensitive SOSUI, which classified it as soluble, and TMHMM, which shows 0 predicted TMH; since the downstream gene was also not detected as a membrane protein by these less sensitive programs, we had simply called NKF. Now, using the more sensitive DeepTMHMM, and given that everything else has been met to call this gene a holin, what do we do about the fact that the immediate downstream gene is actually a membrane protein with one transmembrane domain, in view of the caveat "…the absence of additional transmembrane domains in the area"?
Edited 18 Dec, 2023 03:02
Posted in: Cluster P Annotation Tips > Assignment of gp28 as holin
| posted 10 Jun, 2023 05:47
I am inclined to call a phosphoesterase function for subcluster M1 phages Glaske16_gp129 (66640-67299) & Dulcita gp 126 (66622-66181 bp), but will seek clarification given that the top 15 hits in phagesDB are to metallophosphoesterases. The sequence is 100% identical between the above phages:

MSNVFFTSDLHIGHKKVVASRTTVDGEPAFPDLDNLPEWFGDFEIESYNRILADKWDTTVGKDDVVWVLGDISSGTKSGQEMALEWLSRRPGRKRLIKGNHDGVHPMYRDKAKWVKAYGEVFEDMDTAARIRVALSGGGHVDALLSHFPYMGDHTSVDRHTQWRLPNNGTILLHGHTHSSRRMSSCGGSLQVHVGVDAWNGYPVSMDEIRSYVEIWEDVZ

According to the Official SEA-PHAGES Function List, a metallophosphoesterase, "Must contain a HEXXH motif to coordinate the metal ion." I can see the HEXXH motif in Luchodor_gp60, the sample gene for metallophosphoesterase in the Functions list.

Whereas there are HHPred hits to metallophosphoesterase, my reasons for being wary about calling it a metallophosphoesterase rather than a phosphoesterase are two-fold:

1. I do not see the HEXXH motif (unless the rule is flexible).
2. The top two HHPred hits are to phages D29 gp 66 & L5 gp 66, both of which are phosphoesterases. D29 is a well-studied, prototypic phage.

According to Rudner, Fawcett & Losick (1999; https://www.pnas.org/doi/10.1073/pnas.96.26.14765), the conserved sequence HEXXH is a hallmark of metalloproteases. An HEXXH motif should have an "H" followed by an "E", then by any two amino acids followed by an "H" (see attached). I did a search for "HE" in each of the sequences.

>phageD29_gp66 (phosphoesterase) HEXXH motif not seen
MSNVWFTSDLHIGHAKVAEDRDWAGPDHDLHLAELWDEQVGKEDVVWILGDISSGGTRAQLDALGWLLNRPGRKRLILGNHDRPHPMYRDAPRLSRLYWNVLDYMSTAARLRVPLDGGGHTNVLLSHFPYVGDHTAEQRFTQWRLRDEGLILLHGHTHSRIIRSTMTNPRQIHVGLDAWHDLVPMDEVREMVNDIEEGL

>Glaske16_gp129 (66640-67299) & Dulcita gp 126 (66622-66181 bp); HEXXH motif not seen
MSNVFFTSDLHIGHKKVVASRTTVDGEPAFPDLDNLPEWFGDFEIESYNRILADKWDTTVGKDDVVWVLGDISSGTKSGQEMALEWLSRRPGRKRLIKGNHDGVHPMYRDKAKWVKAYGEVFEDMDTAARIRVALSGGGHVDALLSHFPYMGDHTSVDRHTQWRLPNNGTILLHGHTHSSRRMSSCGGSLQVHVGVDAWNGYPVSMDEIRSYVEIWEDVZ

Luchodor_gp60 (metallophosphoesterase ref sequence) has the HEXXH motif (HEGNH, residues 89-93; underlined in the original post)
MSKRIVVVSDTQIPFDDRKALKAVVGFIGDTQPDEVVHIGDLMDYPSPSRWTKGTAEEFAQRIKPDSEQAKRRFLEPLRARYDGPVKVHEGNHDSRPFEYLHKFAPALVEYADQFRFQNLLDFDGFGVEVAPEFYKLAPGWVSTHGHRGGVRLTQKAGDTAYNAMMRFNTSVIIGHTHRQGLKPHTLGYGGHQKVLWSMEVGNLMNMHLAQYLKGATANWQTGFGLLTVDGHHVKPELVPVVGGSFSVDGHVWKV
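The motif search described above can be reproduced with a few lines of Python. This is only a sketch: the dictionary keys and the function name `find_hexxh` are made up for illustration, the sequences are copied from this post, and the pattern is the plain reading of HEXXH (H, E, any two residues, H) with no allowance for variants.

```python
import re

# Sequences copied from the post (a trailing "Z" marks the stop codon).
SEQUENCES = {
    "D29_gp66": (
        "MSNVWFTSDLHIGHAKVAEDRDWAGPDHDLHLAELWDEQVGKEDVVWILGDISSGGTRAQLDALGWLL"
        "NRPGRKRLILGNHDRPHPMYRDAPRLSRLYWNVLDYMSTAARLRVPLDGGGHTNVLLSHFPYVGDHTA"
        "EQRFTQWRLRDEGLILLHGHTHSRIIRSTMTNPRQIHVGLDAWHDLVPMDEVREMVNDIEEGL"
    ),
    "Glaske16_gp129": (
        "MSNVFFTSDLHIGHKKVVASRTTVDGEPAFPDLDNLPEWFGDFEIESYNRILADKWDTTVGKDDVVWV"
        "LGDISSGTKSGQEMALEWLSRRPGRKRLIKGNHDGVHPMYRDKAKWVKAYGEVFEDMDTAARIRVALS"
        "GGGHVDALLSHFPYMGDHTSVDRHTQWRLPNNGTILLHGHTHSSRRMSSCGGSLQVHVGVDAWNGYPV"
        "SMDEIRSYVEIWEDVZ"
    ),
    "Luchodor_gp60": (
        "MSKRIVVVSDTQIPFDDRKALKAVVGFIGDTQPDEVVHIGDLMDYPSPSRWTKGTAEEFAQRIKPDSE"
        "QAKRRFLEPLRARYDGPVKVHEGNHDSRPFEYLHKFAPALVEYADQFRFQNLLDFDGFGVEVAPEFYK"
        "LAPGWVSTHGHRGGVRLTQKAGDTAYNAMMRFNTSVIIGHTHRQGLKPHTLGYGGHQKVLWSMEVGNL"
        "MNMHLAQYLKGATANWQTGFGLLTVDGHHVKPELVPVVGGSFSVDGHVWKV"
    ),
}

# A lookahead is used so that overlapping occurrences would also be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(seq):
    """Return (1-based position, matched residues) for every HEXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


for name, seq in SEQUENCES.items():
    hits = find_hexxh(seq)
    print(name, hits if hits else "no HEXXH motif")
```

With the sequences as posted, only Luchodor_gp60 reports a match (HEGNH starting at residue 89); the D29 and Glaske16/Dulcita sequences contain no "HE" pair at all.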

Based on the above, I would call the above Glaske16 & Dulcita genes phosphoesterases. Or could I be missing something?
Thanks!
Fred
Posted in: Functional Annotation > Phosphoesterase or metallophosphoesterase? A clarification question
| posted 01 Jun, 2023 16:14
My DNA Master was working well until this morning, when I tried updating it. It couldn't update, even after restarting the computer. I asked our IT to uninstall and reinstall it (I had issues with DNA Master last week and reinstalling worked!). But when I went in to set the preferences, I noticed that several of the suggested preference options were missing, including the options to insert a template into Notes, Direct connections to servers to obtain Glimmer and GeneMark data/Gene Prediction Server Location, Secure Connections, and the option to set the Shine Dalgarno Scoring. When I tried using it to check the RBS score, I could no longer see the z values, etc.; all I can see is the "SD Score" & "space" (see attached). Is this the new normal, or is something happening with DNA Master?

Also, it looks like there could be a problem with the server, as IT gets the following error: "Error connecting to cobamide2.bio.pitt.edu"

Fred
Edited 01 Jun, 2023 20:30
Posted in: DNA Master > DNA Master Preferences Issue
| posted 30 May, 2023 17:39
Thank you Debbie.
This clarifies things about this gene, since it has previously been called NKF, as is evident from the phagesDB hits. We are now calling this large gene (gp 35 in Dulcita and Diminimus, and other homologous subcluster M1 phage genes in the same region) a minor tail protein, and the small gene (gp 30) NKF.
Fred
Edited 30 May, 2023 17:40
Posted in: Functional Annotation > Is this really not a Minor Tail Protein?
| posted 30 May, 2023 05:45
We are annotating this large gene (999 bp) downstream of the tape measure; it is draft gp 35 in both phages Dulcita and Diminimus (28791-29789 bp) in subcluster M1. Its sequence is below:

MASFTPIVPVRSNRPLELQRGNTLRYRYTLSGNKTFPAGTSAVLTVSNTYGQVVGAFIGTVAGKTIEFVEGPEISDTLARTDTWTLSVTYPGETHPTMLEQGQIIRVEAPFPDQPAMSPEFEGVRYEYHFGTPGFVKDPSWRILNGHPRVYDNSFRSLPNAVSSGSLFGGDLTFFDDVCMLWFAPLATDIVRLTYNTIRPIDNSNGEVWTIICSNYDATNWAGFHHKQVFGIGSWDDDEISVVTGTGPTTFTKRETESYDTVNNQAYTAEYNPVSNTYSLYVGTSLEPLISWTDETNVVEHGEGERYVGFGFKSALLYAGVQVSDWYIANTPZ
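As a quick size check, the coordinates quoted for this gene can be converted to lengths. The sketch below assumes inclusive, 1-based coordinates and a gene that ends in a stop codon; the function names are hypothetical.

```python
def gene_length_bp(start, stop):
    """Length in bp of a gene given inclusive 1-based coordinates."""
    return abs(stop - start) + 1


def protein_length_aa(start, stop):
    """Encoded residues, excluding the stop codon."""
    bp = gene_length_bp(start, stop)
    if bp % 3 != 0:
        raise ValueError(f"{bp} bp is not a whole number of codons")
    return bp // 3 - 1


# Draft gp 35 of Dulcita/Diminimus, coordinates as given in the post.
print(gene_length_bp(28791, 29789))     # 999
print(protein_length_aa(28791, 29789))  # 332
```

The coordinates 28791-29789 therefore span 999 bp, which corresponds to 332 amino acids plus a stop codon, in agreement with the sequence shown above (332 residues followed by "Z").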

Notably, it mostly hits hypothetical proteins in phagesDB, although it hits minor tail proteins of the same phages such as PegLeg & Reindeer in NCBI (which happen to show hypothetical proteins in phagesDB).

In HHPred, it hits gene 31 of Mycobacterium phage D29 and phage L5, which, upon inspection of the phagesDB gene list, is a minor tail protein. The HHPred probabilities for the two phages are 99.85% & 99.86% respectively, and the UniProt note about protein existence for the hit is "Predicted" (in some cases, for other proteins, UniProt states "Evidence at protein level"). According to the Resource Guide, minor tail protein genes are big genes downstream of the tape measure, usually not more than 5. We think that this would make the 5th big gene after the tape measure. We also know that we can use synteny only to call minor tail proteins: “You can call minor tail proteins for the 'big' genes downstream of the tape measure protein. there is usually not more than 5.” To use synteny, ALL of the following three conditions must be met: the gene must be (1) of the correct size, (2) adjacent to other structural genes of known, verifiable function and (3) the only possible option for that function in the genome (https://seaphagesbioinformatics.helpdocsonline.com/article-90).

Given that D29 is a well-studied prototypic phage, we are inclined to call this large gene a minor tail protein, but since many hits in phagesDB are to Hypothetical proteins, we would like to cross-check to be sure that we are not missing something.

On the other hand, there is a smaller (459 bp) draft gene 30 in Diminimus and Dulcita downstream of the tape measure, which hits many minor tail proteins in phagesDB and is between two large minor tail proteins, but we would contend that this should be a Hypothetical Protein given its small size. Its sequence is:

MPPLNVHPPDPNHPKGMAWVLGVGMVDPRPGNNPNQPMAIVQSWEPTSELWWKLGLRWHPELAEVWAVGGGQFEIAQIVNEKPEAQEMSLEEGAAEVLEYIGKEHPEYAEMLQQIHNAGSDVERIKLVKQFDGEIKRLMTLMKYVSTKPAEEZ

Unless something has changed lately, the following forum posts (4464 & 4546) suggest not calling small proteins minor tail proteins (https://seaphages.org/forums/topic/4464/; https://seaphages.org/forums/topic/4546/).

In summary, we are thinking of calling the large (999 bp) gene (draft gp 35) a minor tail protein and the smaller (459 bp) gene (draft gp 30) a hypothetical protein. See attached phamerator map.

What’s your verdict?

Fred
Edited 30 May, 2023 17:41
Posted in: Functional Annotation > Is this really not a Minor Tail Protein?
| posted 13 May, 2023 00:14
Thank you Debbie & Chris!
I think a note such as “Has H-N-H within 30-40 aa span but minor variations such as HNK, HNN, HNNH allowed, see forum topic 5505” or something similar as Chris suggests would be very helpful.
Fred
Posted in: Functional Annotation > Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 09 May, 2023 19:10
Thank you Debbie & Christopher.
I think the simple test would be very helpful, as the PDB hits were matching the Reference sequences as illustrated in the attachment with my initial post. I have edited the initial post to clarify that I was not referring to NCBI Ref sequences, but rather to the reference sequences provided in the Official Function List. The edit is, "The reference sequences for HNH endonuclease provided in the Official SEA-PHAGES Function List (as of May 9, 2023)." If SMART could clarify the "HNNH" pointed out by Christopher, that would be great as well.

I have also taken another look at Glaske16 gp 98 at position 56773-57150 bp, which Christopher pointed out is not an HNH endonuclease (see attached), and note that it has HNNH in a 35 aa span (not exactly HNH), whereas the Official Function List states that HNH endonuclease “Must have H-N-H over a 30 aa span.” If we henceforward won’t be calling this an HNH endonuclease, since many previous annotators have called it HNH (Glaske16 gp98 has more than 60 hits to HNH endonuclease in phagesDB!), could it help to state in the Official Function List that for any gene to be called an HNH endonuclease, it “Must have H-N-H within a span of not more than 30 aa,” besides clarifying whether H-N-N-H could also be acceptable if it is within a 30 aa span?

Thanks!
Fred
Edited 10 May, 2023 05:01
Posted in: Functional Annotation > Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 04 May, 2023 23:30
This is a loaded multi-question but please bear with me!

HNH is expected to have a typical ββα-metal fold and Zn-finger motif (which would need protein modeling software to decipher; DOI: https://doi.org/10.1038/srep42542), and the Official Function List simply states that it “Must have H-N-H over a 30 aa span.” It would help students if there were an easy way to make a determination on this, since it may not always be obvious in HHPred. Besides just considering the percent probability, should we also consider the e-values (and probably have an e-value cut-off)? Additionally, must it always hit chain A as well as the Zn-finger motif, or could it hit other chains such as chain D, with non-zinc motifs such as for manganese or strontium ions?
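One literal reading of the "H-N-H over a 30 aa span" rule can be checked mechanically. The sketch below is an assumption about how to read that rule, not an official SEA-PHAGES test: it looks for an H, a later N, and a later H, and measures the span inclusively from the first H to the last H. The function names are hypothetical, and the check says nothing about the ββα-metal fold or metal binding.

```python
def shortest_hnh_span(seq):
    """Return (first_H, last_H, span) for the tightest H...N...H, 1-based, or None."""
    best = None
    for i, residue in enumerate(seq):
        if residue != "H":
            continue
        n = seq.find("N", i + 1)       # nearest N after this H
        if n == -1:
            break                      # no N remains for any later H either
        h2 = seq.find("H", n + 1)      # nearest H after that N
        if h2 == -1:
            break                      # no closing H remains for any later start
        span = h2 - i + 1              # inclusive residue count
        if best is None or span < best[2]:
            best = (i + 1, h2 + 1, span)
    return best


def has_hnh_within(seq, max_span=30):
    """True if some H...N...H fits inside max_span residues (literal reading)."""
    hit = shortest_hnh_span(seq)
    return hit is not None and hit[2] <= max_span


# Toy sequences, not real proteins.
toy_close = "AHAANAAHA"
toy_far = "H" + "A" * 40 + "NH"
print(shortest_hnh_span(toy_close), has_hnh_within(toy_close))  # (2, 8, 7) True
print(shortest_hnh_span(toy_far), has_hnh_within(toy_far))      # (1, 43, 43) False
```

Because H and N are common residues, a check this loose will pass on incidental matches and does not distinguish HNH from HNNH, so at best it is a necessary condition to be weighed alongside the HHPred evidence.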

In view of phagesDB & HHPred data, we are seeking clarification of the HNH function status of the following five draft Glaske16 genes at positions: 44853-45341 bp (gp 70), 51656-52198 bp (gp 83), 54100-54426 bp (gp 91), 56773-57150 bp (gp 98), and 60940-61320 bp (gp 117). Their respective sequences are provided, along with background information.

>Glaske16_gp70_(44853-45341 bp)
MPDGNQPACKYGACNDPVLARGFCKLHYYRNRDGKPMDGPRRSYSTGPRAWTYERLASVPITSTGAHQRVRRLWGSASLYPCATCGGPAKDWAYDGTDPTHYYEQGRKAWSHFSRWPEFYMPMCKPCHSNHDRRAAADELREYRQWKMRNPGKTLEDLEGVAZ

>Glaske16_gp83_(51656-52198 bp)
MDTIWKPIPQDPTGLYLASQDGRILRKEYVIEKLQSHGHLYRRVMPEKIVKQCIKDRAPSHGVHPIIQMRSSTQYASTVERRVSSLIAAAWHGLPYEAGDRTAQNDWRIGFIDGDPSNVHADNLEWVSNQGVNTHHSHDFYYENLKAYRAQAAVETAESFLARYYSPDEIDWSTAERIAAZ

>Glaske16_gp91_(54100-54426 bp)
MPTNSKNGPRSRGRTGGKFERAKWRVLKANQICAHPDCRQLIDLDLKWPDPMSPTVNHIIPVKDLAWDDPLTYSVENLEPMHLVCNQRLGAGPRKKKPKHPQSRNWREZ

>Glaske16_gp98_(56773-57150 bp)
MALAGEAKREYQRQWRANRRAAWFAGKACVRCGSDEDLELDHVDPTLKVTNAVWSWSQERRDVELAKCQVLCNACHKAKTISQTVITIGLKAYRHGTCSMYEHHRCRCGLCRLWARNKKRRQRAAZ

>Glaske16_gp117_(60940-61320 bp)
MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVEZ

The reference sequences for HNH endonuclease provided in the Official SEA-PHAGES Function List (as of May 9, 2023) are Sisi gp 99 and Arianna gp 54. Both match Geobacillus virus E2 hit 5H0M_A in PDB, with Sisi having a 93.5% alignment, 98.7% probability, and E-value: 1e-7, while Arianna has 67.3% alignment, 98.7% probability, and E-value: 2.2e-7.
>Sisi_gp99
MPRAPKVCRHAGCTTLTTTGTCPQHTTHRWGNHQGRKVPHRLQQATFRRDNWTCQSCGHTATPGSGQLHADHIQPRSRGGADTLDNMRTLCKACHAPKSRAEARGSNT
>Arianna_gp54
MAWSNGSSRTSSKHWQALRASAKKQLGYYCCAVCGITPAGGARLELDHIIPVAEGGSDEMANLQWLCARHHAIKTRAESRRGAQRRAARRRLPQRPHPGLR
HHPred for Arianna is: PDB, Geobacillus virus E2, hit # 5H0M_A, 67.3% alignment, Probability: 98.67%, E-value: 2.2e-7.
In view of the above, we can now specifically ask about the following five draft Glaske16 genes at positions: 44853-45341 bp (gp 70), 51656-52198 bp (gp 83), 54100-54426 bp (gp 91), 56773-57150 bp (gp 98), and 60940-61320 bp (gp 117).
Glaske16 gp 70 (44853-45341 bp) has as its top phagesDB hit Skinny gp 71, which is called Hypothetical Protein although it is 100% identical (q1:s1); it also has >10 hits to HNH endonuclease. I am inclined to call this an HNH endonuclease, unless the forum suggests otherwise. Again, below is its aa sequence:
MPDGNQPACKYGACNDPVLARGFCKLHYYRNRDGKPMDGPRRSYSTGPRAWTYERLASVPITSTGAHQRVRRLWGSASLYPCATCGGPAKDWAYDGTDPTHYYEQGRKAWSHFSRWPEFYMPMCKPCHSNHDRRAAADELREYRQWKMRNPGKTLEDLEGVAZ

However, this gene, like the two reference sequences, hits HNH chain A of the same Geobacillus virus E2 hit 5H0M_A in PDB, with 75.5% alignment, 99.19% probability, and E-value: 2.5e-11, with everything exactly as seen above for the two reference sequences, including the HNH endonuclease at position 76-124 (https://www.rcsb.org/structure/5H0M).
Notably, Skinny gp 93, which is called HNH, has poor e-values.

Next is Glaske16 gene at 51656-52198 bp (draft gp 83); its sequence is below:
MDTIWKPIPQDPTGLYLASQDGRILRKEYVIEKLQSHGHLYRRVMPEKIVKQCIKDRAPSHGVHPIIQMRSSTQYASTVERRVSSLIAAAWHGLPYEAGDRTAQNDWRIGFIDGDPSNVHADNLEWVSNQGVNTHHSHDFYYENLKAYRAQAAVETAESFLARYYSPDEIDWSTAERIAAZ
This one too hits HNH endonuclease in phagesDB. HHPred shows a PDB hit with 54.1% alignment, Probability: 99.76%, E-value: 6.2e-18, but notably, it does not hit the same chain as the reference sequences (it hits 1U3E_M; https://www.rcsb.org/structure/1U3E) and has no Zn+2 motif, but instead Mn+2 and Sr+2; it also has the βα.
What is your verdict on this gene in Glaske16 at 51656-52198 bp in view of the above? I am inclined to call it HNH endonuclease, unless the forum suggests otherwise.

Next is Glaske16 gp 91 at position 54100-54426 bp. Has several hits to HNH in phagesDB.
MPTNSKNGPRSRGRTGGKFERAKWRVLKANQICAHPDCRQLIDLDLKWPDPMSPTVNHIIPVKDLAWDDPLTYSVENLEPMHLVCNQRLGAGPRKKKPKHPQSRNWREZ
This has a low e-value but hits the same chain as the ref sequence, and the zinc motif (https://www.rcsb.org/structure/5H0M ), and is called HNH endonuclease, and another hit at 4H9D_A (https://www.rcsb.org/structure/4H9D).
What is your verdict on this gene in Glaske16 gp91 at 54100-54426 bp in view of the above? I am inclined to call it HNH endonuclease but wary because of the e-values; then again, its hits are the same as those of the reference sequences. Any suggestions?

The next question is about the Glaske16 gp98 at position 56773-57150 bp. It has more than 60 hits to HNH endonuclease in phagesDB. Its sequence is below:
MALAGEAKREYQRQWRANRRAAWFAGKACVRCGSDEDLELDHVDPTLKVTNAVWSWSQERRDVELAKCQVLCNACHKAKTISQTVITIGLKAYRHGTCSMYEHHRCRCGLCRLWARNKKRRQRAAZ

It also hits the same hit 5H0M_A in PDB with the same everything as the reference sequences, and high probability (98%), alignment 52.4%, but with not as great an e value (0.000029). What is your verdict on this one?

Finally, the Glaske16 gp117 at 60940-61320 bp. This gene has more than 70 hits to HNH endonuclease in phagesDB. What is your verdict on this one? Its sequence is:
MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVEZ
It also hits the same hit 5H0M_A in PDB (https://www.rcsb.org/structure/5H0M) with the same everything as the reference sequences Sisi gp 99 and Arianna gp 54, and high probability (98.05%), alignment 49.6%, but with not as great an e-value (0.000017). What is your verdict on this one?
See details in attached file.
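The HHPred figures quoted in this post can be laid side by side to see how each query compares with the two reference sequences. The sketch below only tabulates numbers stated above (gp 91 is left out because no figures are given for it); comparing each query against the weaker of the two reference E-values is an illustrative assumption, not a SEA-PHAGES cut-off, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    pdb: str            # top PDB hit reported in the post
    probability: float  # HHPred probability, percent
    evalue: float
    is_reference: bool = False


# Values as quoted in the post.
HITS = [
    Hit("Sisi gp 99", "5H0M_A", 98.7, 1e-7, True),
    Hit("Arianna gp 54", "5H0M_A", 98.67, 2.2e-7, True),
    Hit("Glaske16 gp 70", "5H0M_A", 99.19, 2.5e-11),
    Hit("Glaske16 gp 83", "1U3E_M", 99.76, 6.2e-18),
    Hit("Glaske16 gp 98", "5H0M_A", 98.0, 2.9e-5),
    Hit("Glaske16 gp 117", "5H0M_A", 98.05, 1.7e-5),
]

# Assumption for illustration: use the weaker reference E-value as the yardstick.
weakest_ref = max(h.evalue for h in HITS if h.is_reference)


def at_least_as_good_as_references(hit):
    """True if the hit's E-value is no worse than the weakest reference hit."""
    return hit.evalue <= weakest_ref


for hit in sorted(HITS, key=lambda h: h.evalue):
    tag = "reference" if hit.is_reference else (
        "<= weakest reference" if at_least_as_good_as_references(hit)
        else "> weakest reference"
    )
    print(f"{hit.gene:16s} {hit.pdb:7s} {hit.probability:6.2f}%  {hit.evalue:.1e}  {tag}")
```

On these numbers, gp 70 and gp 83 have E-values better than both references (though gp 83 hits a different structure, 1U3E_M), while gp 98 and gp 117 fall about two orders of magnitude short of the weaker reference despite similar probabilities.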
Edited 09 May, 2023 18:55
Posted in: Functional Annotation > Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 04 May, 2023 23:11
Thank you, Debbie!
I will keep the gene with its tRNA overlap and push it over to QCer as you have suggested. I have tried searching for an article that would directly document evidence of overlaps between tRNA and protein-coding genes but was unsuccessful, although the Wright et al (2022) article (https://doi.org/10.1038/s41576-021-00417-w) has some documentation of overlaps between non-coding RNA (ncRNA) and protein-coding genes.
Fred
Posted in: tRNAs > Follow-up Clarifying Question about tRNA and protein genes not overlapping