4 bp overlaps

| posted 14 Mar, 2018 14:40
The right arms of cluster F genomes are characterized by TONS of tiny genes. These genes are sometimes so small that the gene prediction programs have a really difficult time predicting them. You can usually identify them easily, though, because their start and stop codons overlap the flanking genes by 4 bp.
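To make the 4 bp overlap concrete: when a start codon (ATG, GTG or TTG) shares its last two bases with a TGA stop codon, the two genes overlap across the four bases NTGA (for example ATGA). The snippet below is only a minimal sketch for spotting that motif on one strand. The function name `find_4bp_overlaps` is made up for illustration and is not part of any annotation tool, and a hit is just a candidate: the sketch does not check that the TGA is in frame with the upstream gene or that an ORF follows the start.

```python
# Hypothetical helper (not from any annotation package): list positions where
# a start codon overlaps a TGA stop codon, giving the 4-base motif NTGA.
STARTS = ("ATG", "GTG", "TTG")  # start codons commonly used in phage genomes

def find_4bp_overlaps(seq):
    """Return 1-based coordinates of start codons whose last two bases
    are also the first two bases of a TGA stop (e.g. ATGA)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3):
        start_codon = seq[i:i + 3]
        stop_codon = seq[i + 1:i + 4]
        if start_codon in STARTS and stop_codon == "TGA":
            hits.append(i + 1)  # convert 0-based index to 1-based coordinate
    return hits

# Toy sequence, not real phage data.
print(find_4bp_overlaps("CCCATGACCC"))
```

Each reported coordinate still has to be checked against the gene calls on either side before it counts as evidence for a start.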
| posted 26 Jun, 2020 21:54
I am struggling with the evidence for OfUltron and Sebastian gene 103.
The -4 gap start is at 54324 and has a Z-score of 3 and a Final score of -2.6, which looks like very compelling evidence for this start site. When I look at the Starterator results, I find that all 109 Cluster F1 hits call the start at 54471, which has a Z-score of 2.255 and a Final score of -4.458. When I look at the secondary structure potential for the RBS at start 54471, I find that 5 of the 7 bases are in a very strong stem of a local stem-loop with a final Free Energy of -1200, which is very high (6 of the 7 bp in the stem are G-C bp). Wow, this makes the -4 gap look even better. However, if translation starts at the -4 start site, there is no coding capacity for about 70 bases and then atypical coding capacity for about 50 bases before the start at 54471. I searched this range lacking coding capacity and found 4 rare codons there. My initial instinct is to go with start 54324 with the -4 gap, in the hope that some of the ribosomes would be able to navigate the rare codon domain, even if at a slower rate. Is there Mass Spec data for Cluster F1 phages, or other evidence (besides herd instinct), that has pointed everyone else to call the start at 54471?
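For reference, the distance between the two candidate starts can be worked out directly. This is only a sketch of the arithmetic: it assumes both coordinates mark the first base of the start codon on the forward strand, and the function name `compare_starts` is made up for illustration.

```python
# Hypothetical helper: compare two candidate start coordinates for one gene.
# Assumes a forward-strand gene and that each coordinate is the first base
# of its start codon.
def compare_starts(upstream_start, downstream_start):
    extra_bases = downstream_start - upstream_start
    return {
        "extra_bases": extra_bases,           # bases added by the longer call
        "in_frame": extra_bases % 3 == 0,     # True if both starts share a frame
        "extra_codons": extra_bases // 3,     # whole codons added by the longer call
    }

print(compare_starts(54324, 54471))
# {'extra_bases': 147, 'in_frame': True, 'extra_codons': 49}
```

Under those assumptions, calling the -4 start adds 147 bases (49 codons) in the same reading frame as the start at 54471.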
| posted 27 Jun, 2020 13:32
Only 8 of the 109 have the -4 start site at Starterator location 12. As you can see from the Starterator map, all of these belong to the family of longest ORFs. All of the closest nucleotide BLAST relatives to OfUltron and Seabastion (Llama, Modragons and Ochi17) have this -4 start. I am going to call the -4 start because the data are consistent for these longer ORFs, and they may represent a new, evolving pham.
| posted 27 Jun, 2020 13:48
The Starterator data for the shorter start is quite compelling. I just looked at a different scenario where the -4bp overlap was on a portal gene. In that case, the thought was that the structural studies for the portal protein outweighed the convenient -4bp overlap. In this case, with such a tiny gene and in a place where the -4bp overlap is common, I can't blame you for picking it. We need experiments!
About your 'evolving pham' comment, maybe this is an older version……
