The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

Calling start in FC phage Phrampa

| posted 01 Oct, 2025 20:17
Hi all,

Our class is looking at Gene 1 (stop 685). It is called by both Glimmer and GeneMark, is long enough (120 nt = 40 aa), has good coding potential, has synteny with the FC phages MIMI, TALIA, and ATUIN, is the only gene in the region, and has a predicted function (hypothetical protein).

We need help choosing a start. Starts 1 and 10 are both called by Glimmer and GeneMark. In Starterator, start 1 has two MAs while start 10 has six MAs. Start 1 has the lower RBS scores (Z-score 1.399, final score -6.961), while start 10 has a Z-score of 2.024 and a final score of -4.608. Start 1 also gives the longest open reading frame. Start 1 gives 1:1 matches to the top GenBank hit, while start 10 gives 1:1 matches to the second and third GenBank hits. In short, the better RBS scores go with start 10, but the longest open reading frame and the top hit go with start 1. Which would be the better start?

Thanks!
| posted yesterday, 01:05
Hi Brian,
When I look at the Starterator report, I see one common start for every phage listed; it is technically threaded through the starts listed as 9, 10, and 11, which correspond to bp 110. Remember that Starterator is based on a Clustal alignment, and as the sequences diverge, small discrepancies at the nucleotide level can appear.
I would call bp 110 as the start.
debbie
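As a quick arithmetic check on the suggested call, the sketch below computes the gene length from start bp 110 to stop 685. It assumes forward-strand, 1-based inclusive coordinates with 685 being the last base of the stop codon (our assumption about the coordinate convention); `orf_length` is a hypothetical helper, not part of any SEA-PHAGES tool.

```python
def orf_length(start: int, stop: int) -> tuple[int, int]:
    """Return (length in nt, length in codons) for a forward-strand gene.

    Assumes 1-based, inclusive coordinates where `stop` is the last base
    of the stop codon. Raises ValueError if the span is not a whole
    number of codons (i.e. start and stop are out of frame).
    """
    if stop < start:
        raise ValueError("expected a forward-strand gene with start < stop")
    length_nt = stop - start + 1  # inclusive span
    if length_nt % 3 != 0:
        raise ValueError(f"{start}..{stop} is not a whole number of codons")
    return length_nt, length_nt // 3


nt, codons = orf_length(110, 685)
# nucleotides, codons including the stop, amino acids (codons minus stop)
print(nt, codons, codons - 1)
```

Under those assumptions the call spans 576 nt (192 codons including the stop, so 191 aa), and because the length divides evenly by 3, bp 110 is in frame with stop 685.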
| posted yesterday, 16:17
Thanks!
| posted today, 17:24
This raises an interesting issue about Starterator numbering. While it is not directly related to the question above (start 1 vs. start 10), it is an important point in interpreting Starterator results, and it explains why Deb talks about starts 9, 10, and 11.

Here is a super simple example of why it is recommended that, when two (or more) starts look very close in the graphical track, you consider them "the same". Below are two made-up sequences that demonstrate how small sequence changes can lead Clustal to artificially create different start numbers. I will use a tiny sequence around a start codon with the smallest possible change: the insertion of a single base. Here are the two sequences:
CCCATGCCC
&
CCCAATGCCC

When Clustal aligns these two, the result looks like this:

CCCA-TGCCC
|||| |||||
CCCAATGCCC
When Starterator looks at this alignment, the start codons in the two sequences are not at identical locations: the top sequence has an ATG starting at column 4 of the alignment, and the bottom sequence has an ATG starting at column 5. So these two sequences will get different start numbers. But they really do have the same start; as Deb says, this is an artifact of how Clustal does the alignment as sequences diverge.
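The column bookkeeping in this example can be sketched in a few lines of Python. `start_column` is a hypothetical helper written for illustration, not Starterator's code: it finds the first ATG in each gapped sequence (skipping gaps) and reports the alignment column of its first base.

```python
def start_column(aligned: str, codon: str = "ATG") -> int:
    """Return the 1-based alignment column of the first base of the first
    `codon` in a gapped sequence, or -1 if the codon is not present.

    Gaps ('-') are skipped when reading the codon, so a start codon that
    the aligner split with a gap is still found.
    """
    # Alignment column of every real (non-gap) base, in order.
    columns = [i for i, base in enumerate(aligned, start=1) if base != "-"]
    ungapped = aligned.replace("-", "")
    pos = ungapped.find(codon)
    return columns[pos] if pos != -1 else -1


top = "CCCA-TGCCC"
bottom = "CCCAATGCCC"
print(start_column(top), start_column(bottom))  # 4 5
```

The same ATG is assigned column 4 in one sequence and column 5 in the other only because of where Clustal placed the gap.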

Bottom line: don't overinterpret different start numbers to mean "these must be different starts because they have different numbers". Instead, look at the tracks at the top of the report. If the starts are tightly clustered, you can assume there are minor base differences, but the starts very likely all trace back to a single start in the common ancestor and can thus be considered "the same start".
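One way to picture "tightly clustered" is to group start positions whose alignment columns fall within a few columns of each other. The sketch below does simple single-linkage grouping on sorted columns; the function name and the 3-column `tolerance` are assumptions for illustration, since the advice above is a visual judgment on the report tracks rather than a fixed cutoff.

```python
def cluster_starts(columns: list[int], tolerance: int = 3) -> list[list[int]]:
    """Group alignment columns of candidate starts into clusters.

    A column joins the current cluster if it lies within `tolerance`
    columns of the previous (sorted) column; otherwise it starts a new
    cluster. `tolerance` is a hypothetical threshold, not a Starterator
    setting.
    """
    clusters: list[list[int]] = []
    for col in sorted(columns):
        if clusters and col - clusters[-1][-1] <= tolerance:
            clusters[-1].append(col)
        else:
            clusters.append([col])
    return clusters


# Columns 4 and 5 come from the toy alignment; 40 is a hypothetical distant start.
print(cluster_starts([5, 4, 40]))  # [[4, 5], [40]]
```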
| posted today, 17:42
Thanks, that is a good point to keep in mind. I appreciate your comment.