The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

2022 Genomics Workshop Follow-up Annotation: Ciao

| posted 13 Dec, 2022 02:04
2022 Genomics Workshop Participants,
The genome we will annotate is Mycobacterium phage Ciao. Please add your annotation questions to this forum post. You will find Ciao in Phamerator and PECAAN. Be sure to identify the stop of the gene that you are interested in (it is the only unique identifier). Enjoy annotating!
| posted 28 Dec, 2022 21:58
Hi Debbie & all,

I'm looking at the start for the reverse gene w/ a stop at 42523 in Ciao and came across something interesting. Pippin_68 and Ciao_66 are virtually identical AA sequences, except for the first 4-8 AA's, and the frameshift mutation that occurred in Ciao to generate this is making me question which frame the "real" start codon is actually in.

It made more sense with pictures and formatting that are hard to achieve in a forum post, so a PDF is attached, including my question about which of two possible starts to call.

Thanks!
| posted 03 Jan, 2023 03:47
Hi Megan,
I agree that the Ciao sequence in the first 4-8 amino acids is unlike the others in the pham.
However, the gene does stop at 42523, so the only start to call is 42699.
I think that Starterator shows the data well.
In addition, check the GeneMark output using smeg and tb as the target. You will see that the best coding potential is in the frame identified.
Make sense?
debbie
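
The coordinate arithmetic behind that call can be checked with a minimal Python sketch. It assumes 1-based inclusive coordinates and that, for a reverse gene, the start is the higher coordinate and the stop coordinate is the last base of the stop codon. The function names are illustrative and not part of DNA Master, PECAAN, or Starterator.

```python
def reverse_gene_length(start: int, stop: int) -> int:
    """Length in bp of a reverse-strand gene (assumes 1-based inclusive coords, start > stop)."""
    if start <= stop:
        raise ValueError("for a reverse gene the start coordinate must exceed the stop")
    return start - stop + 1


def is_in_frame(start: int, stop: int) -> bool:
    """A candidate start is in frame with the stop if the span is a whole number of codons."""
    return reverse_gene_length(start, stop) % 3 == 0


# Coordinates quoted in the thread for the Ciao reverse gene.
length_bp = reverse_gene_length(42699, 42523)
print(length_bp, length_bp // 3, is_in_frame(42699, 42523))  # 177 59 True
```

A candidate start one base away (for example 42700) fails the same check, which is the sense in which the stop fixes the frame.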
| posted 10 Jan, 2023 21:43
Hello!

I'm working on gene 17, 12876-13325.
There isn't a disagreement in DNA Master.
Glimmer calls start at 12876.

BUT, when you look at the GeneMark graph (attached), Glimmer's start puts it wayyyy to the left of the coding potential. This start has a Z value = 1.323 and Raw Score = -5.344.

Another start that is just outside the coding potential (marked with an orange highlighter on my attached pic) has a better Z value = 1.724 and Raw Score = -4.502.

So I'm thinking that even though the programs are in agreement, I need a second opinion.
Does it matter that the start that is called is wayyyyyyy to the left of the coding potential?

WDYT?

Amy
Edited 10 Jan, 2023 21:44
| posted 10 Jan, 2023 21:51
Ohhhhhhhhhhhhhhhhhhhhhh I'm an idiot!
I was looking at GeneMark for the next gene…
Nevermind!!!
Means I need a break.
:)
| posted 11 Jan, 2023 00:12
Amy,
This is good stuff. It will make you more prepared for what your students will do. It is all good!
debbie
| posted 12 Jan, 2023 15:00
So for gene 32 (start: 25887, stop: 27365), NCBI BLAST is showing strong hits for "Minor Tail Protein", but HHpred is showing strong hits for "Beta-lactamase". Here is my HHpred link: https://toolkit.tuebingen.mpg.de/jobs/1188359. Maroc7 and PherrisBueller, also in the A1 cluster, are calling "minor tail protein", but wouldn't these conflicting calls between NCBI and HHpred lead to a "hypothetical" call? Surrounding genes have some "minor protein" hits, but are mostly "hypothetical".

Thanks,

Joe
| posted 12 Jan, 2023 16:12
Hi Joe,
It is common for minor tail genes to contain enzymes that help get the phage attached to the cell wall. So it is common to find hits to collagen, beta-lactamase, and peptidoglycan domains (to name a few) in them. The big genes (maybe 4-6 of them) following the tape measure protein are really the only genes we can call by synteny alone as "minor tail protein". If the genes following the tape measure are not big, we look for additional evidence and sometimes find it. This gene is 493 aa, so I think we are good!
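
The size figure can be reproduced from the coordinates Joe gave. This is a minimal sketch, assuming 1-based inclusive coordinates that include the stop codon; `codon_count` is a hypothetical helper, not a function from any annotation tool.

```python
def codon_count(start: int, stop: int) -> int:
    """Number of codons spanned by a gene (assumes 1-based inclusive coords, stop codon included)."""
    length_bp = abs(stop - start) + 1
    if length_bp % 3 != 0:
        raise ValueError(f"span of {length_bp} bp is not a whole number of codons")
    return length_bp // 3


# Gene 32 in Ciao, as quoted in the thread.
print(codon_count(25887, 27365))  # 493
```

Under these assumptions the span is 1479 bp, or 493 codons counting the stop codon, which lines up with the size quoted above.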
| posted 12 Jan, 2023 17:02
Thanks Debbie!
| posted 14 Jan, 2023 16:43
Hi Debbie,

Yes, this makes sense - thanks!

I was thinking about this in a chicken-or-egg way. Does a strong RBS, which we "know" was previously used as a start site, override coding potential even though the frameshift has separated it from most of the coding potential? Or is the coding potential enough for the ribosome to use a different, less favorable RBS? (And how does the ribosome "know" where the coding potential is? Philosophical questions of gene expression :) )

From your reply it sounds like we should favor the latter, which makes sense.
 