Typical vs atypical GeneMarkS coding potential
Link to this post | posted 06 Feb, 2016 20:18 | |
---|---|
|
Page 48 of the current Annotation guide refers to "typical" and "atypical" coding potential in the GeneMarkS output, which I believe is the heuristically generated version. What is the distinction here? In both LilDestine and Teardrop, the two types of coding potential appear to be largely, although not entirely, overlapping; in the stretches of sequence that show atypical but not typical coding potential, the atypical regions are generally small. |
Link to this post | posted 12 Feb, 2018 16:06 | |
---|---|
|
Hi Joseph, The typical and atypical models in GeneMark are described in: https://www.ncbi.nlm.nih.gov/pubmed/9847079 Essentially, GeneMark uses a heuristic to figure out the coding potential in a genome. It detects the longest possible ORFs and assumes they are real ORFs. Based on these, it starts the iterative training of Hidden Markov Models to predict coding and non-coding regions. Once the HMM is set, GeneMark then clusters the predicted protein-coding genes. The most common clustering uses two clusters, assuming that the majority of genes follow a "coherent" codon usage pattern and that a minority, likely the result of lateral gene transfer (LGT), do not stick to those rules. Final models are trained on these two clusters, resulting in the "Typical" model for the "coherent" genes and the "Atypical" model for the weirder ones. Hope this helps! Ivan |
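
To make the two-cluster idea a bit more concrete, here is a rough Python sketch. It is not GeneMark's actual procedure: GeneMark trains Markov models, whereas this toy stand-in simply computes a codon-frequency vector per gene and runs a plain 2-means clustering on those vectors. All names (`codon_usage`, `two_cluster`, the `genes` dictionary and its made-up sequences) are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a comparable vector.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(seq):
    """Return the 64-dimensional codon frequency vector of an in-frame sequence."""
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS) or 1
    return [counts[c] / total for c in CODONS]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_cluster(vectors, iterations=20):
    """Plain 2-means clustering; label 0 is the larger ("typical") cluster."""
    # Deterministic seeding: the first vector and the vector farthest from it.
    first = vectors[0]
    centroids = [first, max(vectors, key=lambda v: sq_dist(v, first))]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
            for v in vectors
        ]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    # Relabel so that the majority cluster is always 0.
    if labels.count(1) > labels.count(0):
        labels = [1 - lab for lab in labels]
    return labels


# Made-up toy genes: three GC-rich ones and one AT-rich outlier.
genes = {
    "g1": "ATGGCCGGCACCGTCGACGCCTGA",
    "g2": "ATGGGCGCCGACACCGTCGGCTGA",
    "g3": "ATGACCGTCGCCGGCGACGCCTGA",
    "g4": "ATGAAATTTATAAATTTAAAATGA",
}

labels = two_cluster([codon_usage(s) for s in genes.values()])
for name, lab in zip(genes, labels):
    print(name, "typical" if lab == 0 else "atypical")
```

In this toy input, g1 to g3 share a codon usage pattern and land in the majority ("typical") group, while the AT-rich g4 ends up alone in the minority ("atypical") group. In the scheme described above, a separate model would then be trained on each group, which is why the two coding potential tracks mostly agree and differ only in a few regions.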