SEA-PHAGES | All posts created by ivanerill

Posted 29 Jul, 2019 16:44

ivanerill

Hi Welkin,

Just double-checking before submission: if we are naming the protein product, shouldn't it be "tellurium resistance D family protein" rather than "tellurium resistance protein D family"?

Ivan

Posted in: Request a new function on the SEA-PHAGES official list → TerD, tellurium resistance protein

Posted 07 Aug, 2018 16:42

ivanerill

OK. Microdon has not been submitted yet. I believe there are a couple more in the BH cluster that are still in draft. The rest were just released by GenBank this July…

Posted in: Functional Annotation → tail chaperones in cluster BH

Posted 06 Aug, 2018 18:38

ivanerill

Hi Veronique,

Thanks. Yes, our aim was not so much to make the functional annotation (we agree that the evidence is relatively weak, although the HHpred hit for the non-called region is above the 90% probability threshold and synteny points toward these being tail chaperones) as to make the longer gene call. In all BH phages, the longer call is available for the Gp23 homologs, and it results in a similar overlap with the preceding gene (Gp22).
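
To make the overlap comparison concrete, here is a minimal Python sketch (not part of any SEA-PHAGES tool) that measures how many bases two forward-strand gene calls share, and how many codons a longer, same-frame call adds. The gene names and coordinates are hypothetical, 1-based and inclusive, and do not come from any BH genome.

```python
from typing import NamedTuple


class GeneCall(NamedTuple):
    """A forward-strand gene call; coordinates are 1-based and inclusive."""
    name: str
    start: int  # first base of the start codon
    stop: int   # last base of the stop codon


def overlap(a: GeneCall, b: GeneCall) -> int:
    """Number of bases shared by two calls on the same strand (0 if disjoint)."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start) + 1)


def extension_codons(short_call: GeneCall, long_call: GeneCall) -> int:
    """Codons gained by moving the start upstream while keeping the same stop."""
    if short_call.stop != long_call.stop:
        raise ValueError("alternative starts must share the same stop codon")
    delta = short_call.start - long_call.start
    if delta % 3 != 0:
        raise ValueError("alternative starts must be in the same reading frame")
    return delta // 3


# Hypothetical coordinates, for illustration only.
gp22 = GeneCall("gp22", 1000, 1449)
gp23_short = GeneCall("gp23, short call", 1501, 1902)
gp23_long = GeneCall("gp23, long call", 1420, 1902)

print(overlap(gp22, gp23_short))                # 0 (no overlap)
print(overlap(gp22, gp23_long))                 # 30
print(extension_codons(gp23_short, gp23_long))  # 27
```

Comparing the same numbers across every BH phage is one simple way to check whether the longer calls consistently produce a similar overlap.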

Ivan

Posted in: Functional Annotation → tail chaperones in cluster BH

Posted 06 Aug, 2018 14:59

ivanerill

Hi Lee,

I agree that neither call would be completely correct. The main point of our argument is that it is hard to explain why a non-called region would have a good HHpred hit to a DnaJ chaperone domain (that is, why a non-coding region would match a protein domain). Given that synteny suggests these should be tail chaperones, the HHpred hit is hard to disregard. Hence, my take would be to make the longer call. As you point out, this is likely to be wrong too: if these two genes are indeed tail chaperones, one would strongly suspect that they are also expressed via translational frameshifting. However, I could not find conclusive evidence of this.
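
To illustrate what translational frameshifting would mean here, the toy sketch below reads codons in frame 0 up to an assumed slip point, backs up one base, and continues in the -1 frame, so that a single product spans two overlapping reading frames. The sequence and slip position are made up; this models the consequence of a -1 shift, not how to predict one.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_codons(seq: str, start: int = 0) -> list[str]:
    """Split seq into consecutive codons from `start`, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def read_with_minus1_shift(seq: str, slip_after: int) -> list[str]:
    """Toy -1 frameshift: read frame 0 up to index `slip_after` (a codon boundary),
    then back up one base and keep reading in the -1 frame."""
    if slip_after % 3 != 0:
        raise ValueError("slip_after must be a codon boundary in frame 0")
    return read_codons(seq[:slip_after]) + read_codons(seq, start=slip_after - 1)


def until_stop(codons: list[str]) -> list[str]:
    """Truncate a codon list after its first stop codon (stop included)."""
    for i, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return codons[: i + 1]
    return codons


seq = "ATGGGGAAATAGCTTAA"  # made-up sequence
print(until_stop(read_codons(seq)))                # ['ATG', 'GGG', 'AAA', 'TAG']
print(until_stop(read_with_minus1_shift(seq, 9)))  # ['ATG', 'GGG', 'AAA', 'ATA', 'GCT', 'TAA']
```

Without the shift, translation ends at the TAG right after the slip point; with it, reading continues to a downstream TAA, giving a longer fused product.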

Regarding DnaJ, I am not aware of any formal link between DnaJ and tail assembly chaperones (TACs). I was able to find a reference indicating that DnaJ is used by phages during tail assembly (http://www.jmb.or.kr/journal/download.php?Filedir=../submission/Journal/014/&num=764).

My overall take is that these are tail chaperones (given their syntenic arrangement, the coding-potential evidence for an overlap, and the presence of a conserved chaperone domain in the region that would be left uncalled using GeneMark), but that they are divergent enough not to yield proper hits in HHpred (beyond the DnaJ chaperone hit in the non-called region).

Ivan

Posted in: Functional Annotation → tail chaperones in cluster BH

Posted 04 Aug, 2018 04:42
ivanerill

We suspect that the canonical arrangement of tail chaperone genes preceding the tape measure gene may be conserved in cluster BH, possibly including a programmed translational frameshift. We would like feedback on whether to annotate overlapping genes, and/or help in detecting putative frameshift sites. Please see the attachment for details.
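
One way to start looking for putative frameshift sites is to scan the 3' end of the upstream chaperone gene for candidate -1 slippery heptamers. The sketch below assumes the simplified X XXY YYZ pattern (an identical triplet, then AAA or TTT, then any base); real slippery sites and their stimulatory signals are more varied, so hits are only candidates to compare across the cluster. The input sequence is invented.

```python
import re

# Simplified -1 slippery-site model X XXY YYZ: an identical triplet (XXX),
# then AAA or TTT (YYY), then any base (Z). The lookahead allows overlapping hits.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3[ACGT]))")


def slippery_sites(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, heptamer) for each candidate slippery site."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in SLIPPERY.finditer(seq)]


print(slippery_sites("CCGGGAAAGTTCAAATTTCG"))  # [(3, 'GGGAAAG'), (13, 'AAATTTC')]
```

A candidate that sits at the same relative position in every BH genome, just upstream of the first gene's stop codon, would be more convincing than an isolated hit.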

Posted in: Functional Annotation → tail chaperones in cluster BH

Posted 12 Feb, 2018 16:06

ivanerill

Hi Joseph,

The typical and atypical models in GeneMark are described in:
https://www.ncbi.nlm.nih.gov/pubmed/9847079

Essentially, GeneMark uses a heuristic to estimate the coding potential across a genome. It detects the longest possible ORFs and assumes they are genuine protein-coding genes. Based on these, it iteratively trains hidden Markov models (HMMs) to predict coding and non-coding regions. Once the HMM is set, GeneMark clusters the predicted protein-coding genes. The most common clustering uses two clusters, assuming that the majority of genes follow a "coherent" codon usage pattern and that a minority, likely the result of lateral gene transfer (LGT), do not follow those rules.
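
The ORF-seeding idea can be illustrated with a toy Python sketch: on the forward strand, for every in-frame stop codon, take the most upstream ATG since the previous stop, giving the longest possible ORF ending at that stop. This is a simplification for illustration only (no reverse strand, no GTG/TTG starts, an arbitrary minimum length) and is not GeneMark's code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Forward-strand ORFs as (1-based start, end including the stop codon),
    each starting at the most upstream ATG before its in-frame stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS:
                # Keep the ORF only if it has enough sense codons before the stop.
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


# Made-up sequence: the internal ATG at position 4 is ignored in favor of position 1.
print(longest_orfs("ATGATGTTTTAGCCATGCCCTGA"))  # [(1, 12), (15, 23)]
```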

Final models are then trained on these two clusters, yielding the "Typical" model for the "coherent" genes and the "Atypical" model for the divergent minority.
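
The two-cluster idea can be pictured with plain 2-means clustering on codon-usage vectors, calling the larger cluster "typical" and the smaller one "atypical". This is a stand-in chosen for illustration, not the procedure from the paper cited above, and the gene sequences are invented (GC-rich versus AT-rich codons) so that the split is obvious.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(seq: str) -> list[float]:
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CODONS]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two codon-usage vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(vectors: list[list[float]], iterations: int = 20) -> list[int]:
    """Plain 2-means; centroids seeded with the first vector and the one farthest from it."""
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: sq_dist(v, c0))
    centroids = [c0, c1]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
                  for v in vectors]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def typical_atypical(genes: dict[str, str]) -> dict[str, str]:
    """Label the larger codon-usage cluster 'typical' and the smaller one 'atypical'."""
    names = list(genes)
    labels = two_means([codon_usage(genes[n]) for n in names])
    majority = 0 if labels.count(0) >= labels.count(1) else 1
    return {n: "typical" if lab == majority else "atypical" for n, lab in zip(names, labels)}


genes = {  # invented sequences
    "g1": "ATG" + "GCG" * 10 + "TAA",
    "g2": "ATG" + "GCC" * 5 + "GCG" * 5 + "TAA",
    "g3": "ATG" + "GCG" * 8 + "GGC" * 2 + "TAA",
    "g4": "ATG" + "AAA" * 6 + "TTT" * 4 + "TAA",
}
print(typical_atypical(genes))
```

With these inputs, g1 to g3 share a GC-rich codon usage and fall in the majority cluster, while the AT-rich g4 is labeled atypical.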

Hope this helps!

Ivan

Posted in: DNA Master → Typical vs atypical GeneMarkS coding potential
