
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

Typical vs atypical GeneMarkS coding potential

| posted 06 Feb, 2016 20:18
Page 48 of the current Annotation guide refers to "typical" and "atypical" coding potential in the GeneMarkS output, which I believe is the heuristically generated version. What is the distinction between them? In both LilDestine and Teardrop, the two types of coding potential largely, though not entirely, overlap. Where a region of sequence shows atypical but not typical coding potential, the atypical region is generally small.
| posted 12 Feb, 2018 16:06
Hi Joseph,

The typical and atypical models in GeneMark are described in:
https://www.ncbi.nlm.nih.gov/pubmed/9847079

Essentially, GeneMark uses a heuristic to estimate coding potential across a genome. It first finds the longest possible ORFs and assumes they are genuine protein-coding genes. Using these, it starts iterative training of Hidden Markov Models (HMMs) to predict coding and non-coding regions. Once the HMM is set, GeneMark clusters the predicted protein-coding genes. The most common clustering uses two clusters, on the assumption that the majority of genes follow a "coherent" codon usage pattern, while a minority, likely acquired by lateral gene transfer (LGT), do not follow those rules.
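A toy Python sketch of the "longest ORFs, then two clusters" idea, assuming an ACGT-only, forward-strand sequence: it takes the longest ATG-to-stop ORF in each stop-to-stop window, turns each ORF into a 64-codon usage vector, and splits the vectors with plain 2-means. It skips the HMM training step entirely and clusters the raw ORFs instead; the function names (`longest_orfs`, `codon_usage`, `two_clusters`) are hypothetical, and GeneMark's actual clustering procedure is the one described in the paper linked above, not this one.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # fixed 64-codon order


def longest_orfs(seq, min_len=90):
    """Longest ATG..stop ORF in each stop-to-stop window, forward strand only.
    Simplification: no reverse strand, no GTG/TTG starts."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after a stop gives the longest ORF
            elif codon in STOPS:
                if start is not None and i + 3 - start >= min_len:
                    orfs.append(seq[start:i + 3])
                start = None  # a stop closes the window either way
    return orfs


def codon_usage(orf):
    """64-element codon frequency vector, final stop codon excluded."""
    counts = Counter(orf[i:i + 3] for i in range(0, len(orf) - 3, 3))
    total = sum(counts.values())
    return [counts[c] / total for c in CODONS]


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_clusters(vectors, iters=20):
    """Plain 2-means on codon-usage vectors. Returns one label per vector:
    0 = larger cluster ("typical"), 1 = smaller cluster ("atypical")."""
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: _dist(v, c0))  # farthest vector seeds cluster 1
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [0 if _dist(v, c0) <= _dist(v, c1) else 1 for v in vectors]
        g0 = [v for v, lab in zip(vectors, labels) if lab == 0]
        g1 = [v for v, lab in zip(vectors, labels) if lab == 1]
        if not g0 or not g1:
            break  # all vectors fell into one cluster
        c0, c1 = _mean(g0), _mean(g1)
    if labels.count(1) > labels.count(0):
        labels = [1 - lab for lab in labels]  # make the majority cluster label 0
    return labels
```

On a real genome you would chain the three: `two_clusters([codon_usage(o) for o in longest_orfs(genome)])`, where label 0 marks the larger, "coherent" group. Choosing the farthest vector as the second seed just keeps the toy deterministic.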

Final models are then trained on these two clusters, producing the "Typical" model for the "coherent" genes and the "Atypical" model for the minority that deviate from that pattern.
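To make "one model per cluster" concrete, here is a hedged Python sketch assuming ACGT-only ORFs: each cluster's ORFs are pooled into a codon-frequency table (with pseudocounts), and a gene is labelled by whichever table gives it the higher log-likelihood. This is a deliberate simplification: GeneMark's real models are Markov chains over nucleotides inside an HMM, not independent-codon tables, and `train_codon_model`, `log_likelihood` and `classify` are hypothetical names.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def train_codon_model(orfs, pseudocount=1.0):
    """Codon frequencies pooled over one cluster of ORFs (final stop excluded).
    Pseudocounts keep codons unseen in the cluster at a small non-zero probability."""
    counts = Counter()
    for orf in orfs:
        counts.update(orf[i:i + 3] for i in range(0, len(orf) - 3, 3))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_likelihood(orf, model):
    """Log-probability of the ORF's codons under an independent-codon model."""
    return sum(math.log(model[orf[i:i + 3]]) for i in range(0, len(orf) - 3, 3))


def classify(orf, typical, atypical):
    """Label an ORF by whichever model gives it the higher log-likelihood."""
    if log_likelihood(orf, typical) >= log_likelihood(orf, atypical):
        return "typical"
    return "atypical"
```

For example, with `typical = train_codon_model(["ATG" + "AAA" * 20 + "TAA"] * 4)` and `atypical = train_codon_model(["ATG" + "GGC" * 20 + "TAA"])`, `classify("ATG" + "GGC" * 15 + "TAA", typical, atypical)` returns `"atypical"`, because GGC is rare in the majority cluster's table.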

Hope this helps!

Ivan
 