Tricky Start position decision

| posted 25 Apr, 2020 08:55
Dear Phage hunters,
Please help me decide on this start.
Dilemma: From BLAST searches, everyone else seems to have selected the start called by both GM and GL. I don’t want to go against the tide, but... The start at 63861 bp (ATG), which was called by both GL and GM, gives the best RBS score (Z = 3.35, FS = -1.993), but it leaves a huge (423 bp) gap and gives a short (162 bp) ORF. The alternative start at 64122 bp (GTG) has a weaker RBS score (Z = 2.151, FS = -5.229), but it has a 30 bp overlap, which is acceptable according to the guiding principles of annotation, and it gives a far longer ORF (423 bp vs. 162 bp).
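For anyone following the numbers, here is a minimal Python sketch that checks the two candidates against each other. It assumes the gene is on the reverse strand (inferred only from the higher coordinate giving the longer ORF) and that coordinates are inclusive. The class and function names are hypothetical and are not part of DNA Master, Glimmer, or GeneMark.

```python
from dataclasses import dataclass


@dataclass
class CandidateStart:
    position: int       # start coordinate in bp, as quoted in the post
    codon: str          # start codon
    orf_length: int     # resulting ORF length in bp, as quoted in the post
    z_score: float      # RBS Z score
    final_score: float  # RBS final score (FS)


def implied_stop(candidate: CandidateStart) -> int:
    """End coordinate implied by a start and an ORF length.

    Assumption: reverse-strand gene with inclusive coordinates, so the
    ORF runs from `position` down to `position - orf_length + 1`.
    """
    return candidate.position - candidate.orf_length + 1


atg = CandidateStart(63861, "ATG", 162, 3.35, -1.993)
gtg = CandidateStart(64122, "GTG", 423, 2.151, -5.229)

# How far apart the two starts are, and how much ORF the GTG start adds.
start_shift = gtg.position - atg.position
length_gain = gtg.orf_length - atg.orf_length

# Both candidates should end at the same stop codon if the numbers agree.
same_stop = implied_stop(atg) == implied_stop(gtg)

print(f"start shift: {start_shift} bp, length gain: {length_gain} bp")
print(f"implied stops: {implied_stop(atg)} and {implied_stop(gtg)}")
print(f"consistent: {same_stop and start_shift == length_gain}")
```

Under those assumptions, the start shift and the length gain should match and both starts should imply the same stop, which is a quick sanity check that the quoted coordinates and lengths describe the same reading frame. It says nothing about which start is correct; that still depends on coding potential and the other evidence.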
| posted 25 Apr, 2020 13:03
Hi Fred,
I would need additional info to help evaluate. What is the phage? Would you attach your DNA Master file? Have you looked at Starterator data?
| posted 26 Apr, 2020 02:34
Hi Debbie,
Thanks for your response. It is phage Heath. The file is attached. I inserted gp 60, so anything beyond gp 60 will be plus 1 bp from the draft sequence. The question is about feature 79 (78 in the draft sequence). I’ve checked Starterator and realize that it is an Orpham, so there is no data. Now that you have the file, could you kindly look at gp 74 (73 in the draft) as well? It is also an Orpham. The explanatory notes are available in the notes section. In both cases, it seems to me that GL/GM are preferring ATG over GTG starts.
Edited 26 Apr, 2020 02:44
| posted 26 Apr, 2020 14:47
Hi Fred,
Thanks. First of all, Heath is a cluster B4 phage. Have you noticed the orange lines that crisscross the B4 genomes in Phamerator? Do you know what they indicate?

They are likely some sort of repeats. There are 2 resources to point you to:
1. A paper: Cluster K mycobacteriophages: insights into the evolutionary origins of mycobacteriophage TM4

2. Check out Section this doc "Exploring bacteriophage biology".

It would be great to look for them! What you don't want to do is include the repeats in your protein calls.

In both cases, I would call the genes much shorter than you did, mostly to respect the coding potential. For gene 79, I think most of the data fits a start at 63861.
For gene 74, I think that most of the data fits a start at 62311.

Let me know what you think or if you have additional questions. Please add the forum post (just a link) to your cover letter when you submit this genome, so that the reviewer knows the energy that you invested in the calls.

| posted 26 Apr, 2020 20:40
Thanks a lot Debbie!
This helps a lot.
Case closed!
