
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

All posts created by fbaliraine

| posted 04 May, 2023 23:30
This is a loaded multi-question but please bear with me!

HNH is expected to have a typical ββα-metal fold and a Zn-finger motif (which would need protein-modeling software to decipher; DOI: https://doi.org/10.1038/srep42542), whereas the Official Function List simply states that it “Must have H-N-H over a 30 aa span.” It would help students to have an easy way to make this determination, since it is not always obvious in HHPred. Besides the percent probability, should we also consider the e-values (and perhaps apply an e-value cut-off)? Additionally, must the query always hit chain A and the Zn-finger motif, or could it hit other chains, such as chain D, with binding sites for non-zinc ions such as manganese or strontium?
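One way to make the Official Function List rule testable at the sequence level is a literal scan for an H, then an N, then an H, all within a 30-residue window. The sketch below assumes that reading of “H-N-H over a 30 aa span”; the function names are hypothetical and this is not an official SEA-PHAGES tool. It says nothing about the ββα-metal fold or the Zn-finger motif, which still need HHPred or structural evidence.

```python
def find_hnh_spans(seq: str, max_span: int = 30) -> list[tuple[int, int, int]]:
    """Return 1-based (H, N, H) positions where H...N...H fits within max_span residues.

    Hypothetical, literal reading of "Must have H-N-H over a 30 aa span";
    the span is counted inclusively from the first H to the second H.
    """
    # Drop a trailing stop symbol ("Z" or "*") if one was pasted in.
    seq = seq.upper().rstrip("Z*")
    hits = []
    for i, res in enumerate(seq):
        if res != "H":
            continue
        # Exclusive upper bound so that the span i..k is at most max_span residues.
        last = min(len(seq), i + max_span)
        for j in range(i + 1, last):
            if seq[j] != "N":
                continue
            for k in range(j + 1, last):
                if seq[k] == "H":
                    hits.append((i + 1, j + 1, k + 1))
    return hits


def has_hnh(seq: str, max_span: int = 30) -> bool:
    """True if at least one H...N...H triplet fits in the window."""
    return bool(find_hnh_spans(seq, max_span))


# Reference sequence Sisi gp99, as posted in this thread.
SISI_GP99 = ("MPRAPKVCRHAGCTTLTTTGTCPQHTTHRWGNHQGRKVPHRLQQATFRRDNWTCQSCGHTATPGSGQLHADHIQ"
             "PRSRGGADTLDNMRTLCKACHAPKSRAEARGSNT")
print(has_hnh(SISI_GP99))  # True
```

Because H and N are common residues, a literal scan like this is likely to be permissive, which is presumably why the fold and metal-binding evidence from HHPred carries so much weight.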

In view of phagesDB & HHPred data, we are seeking clarification of the HNH function status of the following five draft Glaske16 genes at positions: 44853-45341 bp (gp 70), 51656-52198 bp (gp 83), 54100-54426 bp (gp 91), 56773-57150 bp (gp 98), and 60940-61320 bp (gp 117). Their respective sequences are provided below, along with background information.

>Glaske16_gp70_(44853-45341 bp)
MPDGNQPACKYGACNDPVLARGFCKLHYYRNRDGKPMDGPRRSYSTGPRAWTYERLASVPITSTGAHQRVRRLWGSASLYPCATCGGPAKDWAYDGTDPTHYYEQGRKAWSHFSRWPEFYMPMCKPCHSNHDRRAAADELREYRQWKMRNPGKTLEDLEGVAZ

>Glaske16_gp83_(51656-52198 bp)
MDTIWKPIPQDPTGLYLASQDGRILRKEYVIEKLQSHGHLYRRVMPEKIVKQCIKDRAPSHGVHPIIQMRSSTQYASTVERRVSSLIAAAWHGLPYEAGDRTAQNDWRIGFIDGDPSNVHADNLEWVSNQGVNTHHSHDFYYENLKAYRAQAAVETAESFLARYYSPDEIDWSTAERIAAZ

>Glaske16_gp91_(54100-54426 bp)
MPTNSKNGPRSRGRTGGKFERAKWRVLKANQICAHPDCRQLIDLDLKWPDPMSPTVNHIIPVKDLAWDDPLTYSVENLEPMHLVCNQRLGAGPRKKKPKHPQSRNWREZ

>Glaske16_gp98_(56773-57150 bp)
MALAGEAKREYQRQWRANRRAAWFAGKACVRCGSDEDLELDHVDPTLKVTNAVWSWSQERRDVELAKCQVLCNACHKAKTISQTVITIGLKAYRHGTCSMYEHHRCRCGLCRLWARNKKRRQRAAZ

>Glaske16_gp117_(60940-61320 bp)
MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVEZ
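Before comparing these genes, a quick consistency check can confirm that each posted sequence matches its coordinates: an ORF spanning start..end (1-based, inclusive) should be a whole number of codons, and its protein should be one residue shorter than the codon count because of the stop codon. This sketch assumes that the trailing Z in the posted sequences marks the stop codon, so it is left out of the strings below; `expected_aa_length` is a hypothetical helper, not part of any SEA-PHAGES tool.

```python
# Draft Glaske16 genes from this post: name -> (start, end, protein sequence).
# Assumption: the trailing "Z" in the posted sequences is the stop codon and is omitted here.
GENES = {
    "gp70": (44853, 45341, "MPDGNQPACKYGACNDPVLARGFCKLHYYRNRDGKPMDGPRRSYSTGPRAWTYERLASVPITSTGAHQRVRRLWGSASLYPCATCGGPAKDWAYDGTDPTHYYEQGRKAWSHFSRWPEFYMPMCKPCHSNHDRRAAADELREYRQWKMRNPGKTLEDLEGVA"),
    "gp83": (51656, 52198, "MDTIWKPIPQDPTGLYLASQDGRILRKEYVIEKLQSHGHLYRRVMPEKIVKQCIKDRAPSHGVHPIIQMRSSTQYASTVERRVSSLIAAAWHGLPYEAGDRTAQNDWRIGFIDGDPSNVHADNLEWVSNQGVNTHHSHDFYYENLKAYRAQAAVETAESFLARYYSPDEIDWSTAERIAA"),
    "gp91": (54100, 54426, "MPTNSKNGPRSRGRTGGKFERAKWRVLKANQICAHPDCRQLIDLDLKWPDPMSPTVNHIIPVKDLAWDDPLTYSVENLEPMHLVCNQRLGAGPRKKKPKHPQSRNWRE"),
    "gp98": (56773, 57150, "MALAGEAKREYQRQWRANRRAAWFAGKACVRCGSDEDLELDHVDPTLKVTNAVWSWSQERRDVELAKCQVLCNACHKAKTISQTVITIGLKAYRHGTCSMYEHHRCRCGLCRLWARNKKRRQRAA"),
    "gp117": (60940, 61320, "MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVE"),
}


def expected_aa_length(start: int, end: int) -> int:
    """Protein length implied by 1-based, inclusive ORF coordinates (stop codon excluded)."""
    span_bp = abs(end - start) + 1
    if span_bp % 3 != 0:
        raise ValueError(f"{span_bp} bp is not a whole number of codons")
    return span_bp // 3 - 1  # the last codon is the stop


for name, (start, end, seq) in GENES.items():
    expected = expected_aa_length(start, end)
    status = "OK" if expected == len(seq) else "MISMATCH"
    print(f"{name}: {end - start + 1} bp -> {expected} aa expected, {len(seq)} aa posted ({status})")
```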

The reference sequences for HNH endonuclease in the Official SEA-PHAGES Function List (as of May 9, 2023) are Sisi gp 99 and Arianna gp 54. Both hit 5H0M_A (Geobacillus virus E2) in PDB: Sisi with 93.5% alignment, 98.7% probability, and E-value 1e-7, and Arianna with 67.3% alignment, 98.7% probability, and E-value 2.2e-7.

>Sisi_gp99
MPRAPKVCRHAGCTTLTTTGTCPQHTTHRWGNHQGRKVPHRLQQATFRRDNWTCQSCGHTATPGSGQLHADHIQPRSRGGADTLDNMRTLCKACHAPKSRAEARGSNT

>Arianna_gp54
MAWSNGSSRTSSKHWQALRASAKKQLGYYCCAVCGITPAGGARLELDHIIPVAEGGSDEMANLQWLCARHHAIKTRAESRRGAQRRAARRRLPQRPHPGLR

HHPred for Arianna: PDB, Geobacillus virus E2, hit 5H0M_A, 67.3% alignment, probability 98.67%, E-value 2.2e-7.
In view of the above, we can now ask specifically about the following five draft Glaske16 genes at positions: 44853-45341 bp (gp 70), 51656-52198 bp (gp 83), 54100-54426 bp (gp 91), 56773-57150 bp (gp 98), and 60940-61320 bp (gp 117).
Glaske16 gp 70 (44853-45341 bp): its top phagesDB hit is Skinny gp 71, which is annotated as a hypothetical protein even though it is 100% identical (q1:s1); the gene also has >10 hits to HNH endonuclease. I am inclined to call this an HNH endonuclease unless the forum suggests otherwise. Again, its aa sequence is below:
MPDGNQPACKYGACNDPVLARGFCKLHYYRNRDGKPMDGPRRSYSTGPRAWTYERLASVPITSTGAHQRVRRLWGSASLYPCATCGGPAKDWAYDGTDPTHYYEQGRKAWSHFSRWPEFYMPMCKPCHSNHDRRAAADELREYRQWKMRNPGKTLEDLEGVAZ

Like the two reference sequences, this gene hits chain A of the same Geobacillus virus E2 structure (PDB 5H0M_A), with 75.5% alignment, 99.19% probability, and E-value 2.5e-11; everything else matches what is seen above for the two reference sequences, including the HNH endonuclease region at positions 76-124 (https://www.rcsb.org/structure/5H0M).
Notably, Skinny gp 93, which is annotated as an HNH endonuclease, has poor e-values.

Next is Glaske16 gene at 51656-52198 bp (draft gp 83); its sequence is below:
MDTIWKPIPQDPTGLYLASQDGRILRKEYVIEKLQSHGHLYRRVMPEKIVKQCIKDRAPSHGVHPIIQMRSSTQYASTVERRVSSLIAAAWHGLPYEAGDRTAQNDWRIGFIDGDPSNVHADNLEWVSNQGVNTHHSHDFYYENLKAYRAQAAVETAESFLARYYSPDEIDWSTAERIAAZ
This one also hits HNH endonuclease in phagesDB. In HHPred (PDB) it has 54.1% alignment, 99.76% probability, and E-value 6.2e-18. Notably, however, it does not hit the same chain as the reference sequences (it hits 1U3E_M; https://www.rcsb.org/structure/1U3E), and it has no Zn2+ motif but Mn2+ and Sr2+ sites instead, although it does have the βα motif.
What is your verdict on this Glaske16 gene at 51656-52198 bp in view of the above? I am inclined to call it an HNH endonuclease unless the forum suggests otherwise.

Next is Glaske16 gp 91 at position 54100-54426 bp, which has several hits to HNH endonuclease in phagesDB.
MPTNSKNGPRSRGRTGGKFERAKWRVLKANQICAHPDCRQLIDLDLKWPDPMSPTVNHIIPVKDLAWDDPLTYSVENLEPMHLVCNQRLGAGPRKKKPKHPQSRNWREZ
This hit has a poor e-value, but it is to the same chain as the reference sequences, including the zinc motif (https://www.rcsb.org/structure/5H0M), which is annotated as an HNH endonuclease; there is also another hit to 4H9D_A (https://www.rcsb.org/structure/4H9D).
What is your verdict on Glaske16 gp91 at 54100-54426 bp in view of the above? I am inclined, but wary, to call it an HNH endonuclease because of the e-values; then again, its hits are the same as those of the reference sequences. Any suggestions?

The next question is about the Glaske16 gp98 at position 56773-57150 bp. It has more than 60 hits to HNH endonuclease in phagesDB. Its sequence is below:
MALAGEAKREYQRQWRANRRAAWFAGKACVRCGSDEDLELDHVDPTLKVTNAVWSWSQERRDVELAKCQVLCNACHKAKTISQTVITIGLKAYRHGTCSMYEHHRCRCGLCRLWARNKKRRQRAAZ

It also hits 5H0M_A in PDB, with everything the same as for the reference sequences, a high probability (98%), and 52.4% alignment, but with a weaker e-value (0.000029). What is your verdict on this one?

Finally, Glaske16 gp117 at 60940-61320 bp has more than 70 hits to HNH endonuclease in phagesDB. Its sequence is:
MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVEZ
It also hits 5H0M_A in PDB (https://www.rcsb.org/structure/5H0M), with everything the same as for the reference sequences Sisi gp 99 and Arianna gp 54, a high probability (98.05%), and 49.6% alignment, but with a weaker e-value (0.000017). What is your verdict on this one?
See details in the attached file.
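To make the e-value question concrete, the sketch below tabulates the HHPred numbers reported in this post and compares each candidate with the two reference sequences. The cut-off it uses (the weaker of the two reference e-values, 2.2e-7) is a hypothetical choice for illustration, not an official SEA-PHAGES threshold, and gp91 is left out because its e-value is not stated in the post.

```python
# HHPred (PDB) statistics as reported in this post for the best HNH-related hit.
REFERENCES = {
    "Sisi gp99": {"hit": "5H0M_A", "aligned_pct": 93.5, "prob_pct": 98.7, "evalue": 1e-7},
    "Arianna gp54": {"hit": "5H0M_A", "aligned_pct": 67.3, "prob_pct": 98.67, "evalue": 2.2e-7},
}
CANDIDATES = {
    "Glaske16 gp70": {"hit": "5H0M_A", "aligned_pct": 75.5, "prob_pct": 99.19, "evalue": 2.5e-11},
    "Glaske16 gp83": {"hit": "1U3E_M", "aligned_pct": 54.1, "prob_pct": 99.76, "evalue": 6.2e-18},
    "Glaske16 gp98": {"hit": "5H0M_A", "aligned_pct": 52.4, "prob_pct": 98.0, "evalue": 2.9e-5},
    "Glaske16 gp117": {"hit": "5H0M_A", "aligned_pct": 49.6, "prob_pct": 98.05, "evalue": 1.7e-5},
}


def compare_to_references(candidates: dict, references: dict) -> dict:
    """Flag each candidate against the reference hits.

    Hypothetical rule: a candidate's e-value must be no larger (no worse)
    than the weakest reference e-value.
    """
    evalue_cutoff = max(r["evalue"] for r in references.values())
    ref_hits = {r["hit"] for r in references.values()}
    report = {}
    for name, c in candidates.items():
        report[name] = {
            "evalue_ok": c["evalue"] <= evalue_cutoff,
            "same_pdb_hit": c["hit"] in ref_hits,
        }
    return report


for name, flags in compare_to_references(CANDIDATES, REFERENCES).items():
    print(name, flags)
```

Under this particular cut-off, gp70 and gp83 pass the e-value check while gp98 and gp117 do not, and gp83 stands apart in hitting 1U3E_M rather than 5H0M_A. Whether e-values should be used this way is exactly what this post asks, so the cut-off is a parameter to revisit rather than a recommendation.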
Edited 09 May, 2023 18:55
Posted in: Functional Annotation / Clarification Question About HNH Endonuclease Function Determination in view of hits to the Ref Sequences
| posted 04 May, 2023 23:11
Thank you, Debbie!
I will keep the gene with its tRNA overlap and pass it on to the QCer as you suggested. I tried searching for an article that directly documents overlaps between tRNA and protein-coding genes but was unsuccessful, although the Wright et al. (2022) article (https://doi.org/10.1038/s41576-021-00417-w) documents some overlaps between non-coding RNA (ncRNA) and protein-coding genes.
Fred
Posted in: tRNAs / Follow-up Clarifying Question about tRNA and protein genes not overlapping
| posted 02 May, 2023 03:58
In my previous post, entitled “Is there any recent evidence of a tRNA overlapping a protein gene, even by a few bp?” (https://seaphages.org/forums/topic/5365/), I thought this question had been settled. I want to delete the gene in phage Glaske16 at position 60940-61320 bp, but because several recent annotations have kept it and it has more than 70 hits to HNH endonuclease in phagesDB, I am seeking a second opinion. Its sequence is MQREYMRRWVANRRSAFFASKQCAMCGAGEELELDHIDPTKKVDHRIWSWTDARRSEELAKCQVLCASCHKKKTGEQWYANRSVSENAHHGTSRRYRKMKCRCGLCRLGNTNRSRALRQRHRVPVE

Despite more than 70 hits to HNH endonuclease in phagesDB, this gene has low (<50%) coding potential in GeneMark-S and no coding potential at all in GeneMark Smeg or GeneMark TB. In addition, this gene overlaps by 15 bp with a tRNA called by both ARAGORN v1.2.41 and tRNAscan-SE v2.0, with an Infernal score of 55.5.
I am asking this question because of the statement from the Resource Guide entitled, “Predicting tRNA and tmRNA genes” (https://seaphagesbioinformatics.helpdocsonline.com/article-40):
“It is highly unusual that a phage tRNA will (i) Be encoded within an ORF called by Glimmer and GeneMark that has high coding potential, (ii) Be encoded on the opposite strand as a number of other phage tRNAs found in the same genome, (iii) Be encoded at a genomically distant location from the other tRNA genes in a genome. In general, violation of any of the three preceding conditions is sufficient for exclusion of a potential tRNA from an annotation (we have found a single high scoring tRNA that is not part of the rest of the large cluster, however this situation is very rare).”

Although I am inclined to delete the protein-coding gene and keep the tRNA, since the tRNA is called by both ARAGORN and tRNAscan-SE with a high Infernal score (55.5), I also note that this tRNA is distant from the other tRNAs (387 bp away), which seems to violate condition (iii) above.
According to the forum post, “How close can one pack protein and tRNA's genes” of Feb 24, 2016, Dr Pope stated that, “We tend to steer clear of a tRNA and a protein occupying the same space, but there are definitely genomes where they get pretty close.”

I realize that there may be exceptions, but I wanted to be sure. Some M1 phages, such as Reindeer and Iphrane7, do not have this gene but do have the “Glu” tRNA (61306-61380 bp in Glaske16; see the attached figures & DNA Master file), and we know that tRNAs tend to be conserved.
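The 15 bp overlap stated above follows directly from the coordinates given in this post: the ORF at 60940-61320 bp and the Glu tRNA at 61306-61380 bp. A minimal sketch, assuming 1-based, inclusive coordinates; `overlap_bp` is a hypothetical helper name.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two features with 1-based, inclusive coordinates."""
    # Sort each pair so features written end-first (e.g., reverse strand) still work.
    a_lo, a_hi = sorted((a_start, a_end))
    b_lo, b_hi = sorted((b_start, b_end))
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


orf = (60940, 61320)       # Glaske16 ORF in question (draft gp117)
trna_glu = (61306, 61380)  # tRNA-Glu called by ARAGORN and tRNAscan-SE
print(overlap_bp(*orf, *trna_glu))  # 15
```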
Edited 02 May, 2023 04:16
Posted in: tRNAs / Follow-up Clarifying Question about tRNA and protein genes not overlapping
| posted 21 Jul, 2022 18:29
Fabulous!
Thank you Karen!
Fred
Posted in: Functional Annotation / “Hydrolase” or “NKF” for hits to LAGLIDADG endonuclease, homing endonuclease, HNH endonuclease?
| posted 15 Jul, 2022 21:44
The phage Skinny gene at 63560-63949 bp hits several LAGLIDADG endonucleases, homing endonucleases, and HNH endonucleases, all with low similarity (less than 60%). Should we call it with the general “Hydrolase” or “NKF”? Its amino acid sequence is:
MDLAYLGGFFDGEGNVGLYKSGGESPRLRVQVFQNHGASQDRLMHEIHDTFGGTLHDRGTGYLYSASGSRAVDLLTQLRPHLRLKLEQADEALEWWRNRTAERFRSRTAEEVAYDESAMTRLKELKRAGZ
I have also attached the relevant BLASTp and HHPred data:
Thanks!
Posted in: Functional Annotation / “Hydrolase” or “NKF” for hits to LAGLIDADG endonuclease, homing endonuclease, HNH endonuclease?
| posted 09 May, 2022 19:05
Thanks. I’ve looked at the “Thioredoxin” link that you’ve kindly provided, but there is something noteworthy in HHPred.

In PDB, the reference sequence for “oxidoreductase”, phage Onyinye gene 78, is almost three times as long as Gilberta’s gene (792 bp vs 273 bp) and mostly hits chain A of “Polyketide oxygenase PgaE”, and in Pfam the reference sequence hits the "FAD folding domain"; I do not see any hits in Gilberta to an "FAD folding domain." Instead, in PDB, Gilberta hits chains A & B of the molecule THIOREDOXIN, and in Pfam it hits “Glutaredoxin or Glutaredoxin-like NRDH-redoxin, THIOREDOXIN; OXIDOREDUCTASE, GLUTAREDOXIN.” On the other hand, the reference sequence for Thioredoxin, phage Cjw1 gp 37 (246 bp), is comparable in length to Gilberta’s 273 bp, and both have several hits to chain A of the molecule Thioredoxin, but none to “Polyketide oxygenase PgaE” or the “FAD domain”.
Edited 09 May, 2022 19:08
Posted in: Functional Annotation / Function for subcluster A11 phage Gilberta (37505-37777 rev): Thioredoxin, NrdH-like glutaredoxin or glutaredoxin?
| posted 08 May, 2022 02:08
There is a "mixed bag" of hits in phagesDB for this Gilberta sequence: many to NrdH-like glutaredoxin and glutaredoxin, and a few to Thioredoxin. MRTMFAPITIYTQPRCAPCDALKKRLEKEGIAFDAVDITKNEEAYAYVTGVLKAAATPIIVTDTHDPIIGDRPAELEELIEYYTTSETGVZ
However, HHPred (PDB) shows more than 22 hits to Thioredoxin with alignments of 78-93% and probability of 99%, versus fewer than 5 hits to NrdH-like glutaredoxin with alignments of 81-85% and probability of 99%, besides hits to glutaredoxin; all three are options currently provided in the Official Function List. Should we use “Thioredoxin”, “NrdH-like glutaredoxin”, or the general term "glutaredoxin"?

In the “Cluster EB/ED glutaredoxin” forum post of 06 May, 2019, the use of “NrdH-like glutaredoxin” was considered appropriate for “HHPRED hit with >98% prob and >98% coverage to 4FIW_A which is the published crystal structure of NrdH from E coli.” See details of hits below:
HHPred PDB hits to Thioredoxin (6MOS_A, 93.4% alignment, 99.14% probability; 7ASW_A, 91.2% alignment, 98.91% probability; 3ZIT_B, 84.62% alignment, 99.47% probability; d1nhoa_, 85.71% alignment, 99.23% probability; d1f9ma_, d1ti3a_, d1r26a1, & d1ep7a_, 86.81% alignment, 99.2% probability; d4oo4a_, 85.71% alignment, 99.18% probability; d4j56e1 & d1thxa_, 85.71% alignment, 99.13% probability; 3KP8_A, 84.61% alignment, 99.12% probability; d1nw2a_, 84.61% alignment, 99.02% probability; d1iloa_, 78.02% alignment, 99.1% probability; d3diea1, 83.52% alignment, 99.1% probability; 7B02_A, 86.81% alignment, 99.1% probability; 3HZ4_A & 7RGV_A, 90.1% alignment, 99.1% probability; 7BZK_B, 6ZYW_P, 6I1C_B, & 6Q6T_A, 89.01% alignment, 99.0% probability; 1THX_A, 87.91% alignment, 99.02% probability).

HHPred hits to NrdH-like glutaredoxin (d1r7ha_, 81.32% alignment, 99.49% probability; d1h75a_, 83.52% alignment, 99.46% probability; 1R7H_A, 81.32% alignment, 99.11% probability; 4K8M_A, 84.61% alignment, 98.99% probability).
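The counts and ranges quoted in this post can be reproduced from the two hit lists above, and the same data can be checked against the criterion from the 2019 forum post (>98% probability and >98% coverage). This sketch treats the “% alignment” values here as the coverage figure, which is an assumption, and the helper names are hypothetical; note also that the 2019 criterion referred specifically to 4FIW_A, which does not appear among these hits.

```python
# HHPred PDB hits listed in this post: (hit, % alignment, % probability).
THIOREDOXIN_HITS = [
    ("6MOS_A", 93.4, 99.14), ("7ASW_A", 91.2, 98.91), ("3ZIT_B", 84.62, 99.47),
    ("d1nhoa_", 85.71, 99.23), ("d1f9ma_", 86.81, 99.2), ("d1ti3a_", 86.81, 99.2),
    ("d1r26a1", 86.81, 99.2), ("d1ep7a_", 86.81, 99.2), ("d4oo4a_", 85.71, 99.18),
    ("d4j56e1", 85.71, 99.13), ("d1thxa_", 85.71, 99.13), ("3KP8_A", 84.61, 99.12),
    ("d1nw2a_", 84.61, 99.02), ("d1iloa_", 78.02, 99.1), ("d3diea1", 83.52, 99.1),
    ("7B02_A", 86.81, 99.1), ("3HZ4_A", 90.1, 99.1), ("7RGV_A", 90.1, 99.1),
    ("7BZK_B", 89.01, 99.0), ("6ZYW_P", 89.01, 99.0), ("6I1C_B", 89.01, 99.0),
    ("6Q6T_A", 89.01, 99.0), ("1THX_A", 87.91, 99.02),
]
NRDH_HITS = [
    ("d1r7ha_", 81.32, 99.49), ("d1h75a_", 83.52, 99.46),
    ("1R7H_A", 81.32, 99.11), ("4K8M_A", 84.61, 98.99),
]


def summarize(hits):
    """Return (count, min % aligned, max % aligned, min % prob, max % prob)."""
    aligned = [a for _, a, _ in hits]
    probs = [p for _, _, p in hits]
    return len(hits), min(aligned), max(aligned), min(probs), max(probs)


def meets_criterion(hits, min_prob=98.0, min_aligned=98.0):
    """Hits exceeding both thresholds (the 2019 forum criterion, as assumed here)."""
    return [name for name, aligned, prob in hits if prob > min_prob and aligned > min_aligned]


for label, hits in (("Thioredoxin", THIOREDOXIN_HITS), ("NrdH-like", NRDH_HITS)):
    print(label, summarize(hits), meets_criterion(hits))
```

On these numbers no hit in either list exceeds 98% alignment, so neither group meets the 2019 criterion as stated; the probabilities alone do not separate the two groups.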
Thanks!
Edited 08 May, 2022 02:16
Posted in: Functional Annotation / Function for subcluster A11 phage Gilberta (37505-37777 rev): Thioredoxin, NrdH-like glutaredoxin or glutaredoxin?
| posted 06 May, 2022 18:27
Thank you Debbie!
Fred
Posted in: Functional Annotation / A small minor tail protein called based on solely on synteny?
| posted 06 May, 2022 17:34
In the subcluster A11 phage Gilberta, we see a small gene (189 bp; position 25381-25569) that hits more than 60 minor tail protein genes in both NCBI and phagesDB. This gene is immediately downstream of a large (1992 bp) minor tail protein gene, which is itself preceded by other minor tail protein genes. However, this small gene has no hits to collagen-like or glycine-rich proteins, no coiled-coils, and no other significant hits in HHPred (only one HHPred hit, to Bacteriophage FRD2 protein, with 41.2% alignment and 11.9% probability). Could we still call it a minor tail protein based solely on synteny? Its amino acid sequence is: MPWSPSPAFPQRQHRTAWFAELPAPTPAQHQTAWWAVYELDAPVEIACVTAAEGQEGPEEAVZ
Thanks!
Fred
Edited 06 May, 2022 19:16
Posted in: Functional Annotation / A small minor tail protein called based on solely on synteny?
| posted 21 Apr, 2022 23:19
Hi Debbie,
I concur. Case closed! Thank you for looking at this critically and clearing the air about calling genes that overlap with tRNAs. Perhaps a note in the Resource Guide could help prevent such protein genes from being called in the future simply on the basis of previous calls or significant BLASTp matches.
Fred
Posted in: tRNAs / Is there any recent evidence of a tRNA overlapping a protein gene, even by a few bp?