The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


GTGA overlaps

| posted 22 Jan, 2019 17:14
I see the ATGA 4 bp overlap between two genes mentioned as a preferred configuration that can assist in selecting the correct start site. What is the situation with a GTGA 4 bp overlap? We have several genes in our EE phage that show this GTGA configuration. Is this also a preferred orientation that should be noted in the annotation notes file?
RS Pollenz
| posted 24 Jan, 2019 16:36
It seems that ATG and GTG are both very good start codons for bacteriophages so the GTGA overlap should be considered equally good.
| posted 04 Feb, 2019 20:03
OK great. Here is another interesting situation. We have a double start codon, GTGATG, where the GTGA creates the 4 bp overlap with the previous gene. I know from Welkin that we typically choose the 2nd start codon if there are two consecutive... Thoughts, since choosing the ATG gives a 1 bp overlap with the TGA stop? The RBS data show identical values for the GTG and ATG.
RS Pollenz
| posted 04 Feb, 2019 20:07
Given what we know from experiments that the Hatfull lab has done, the second start codon is probably the correct one. You still see a 1 bp overlap, and that is also highly favored. It could be either one or both, but based on the mass spec experiments they have done, the second start seems to be the more likely one.
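To make the overlap arithmetic concrete, here is a minimal sketch of the two candidate calls for a GTGATG tandem start. The sequence and the helper name `overlap_bp` are made up for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
def overlap_bp(upstream_stop_end: int, start_pos: int) -> int:
    """Number of bases shared by the upstream gene and a downstream start.

    Both arguments are 1-based, inclusive coordinates on the same strand.
    Returns 0 when the downstream start lies past the upstream stop.
    """
    return max(0, upstream_stop_end - start_pos + 1)


# Hypothetical snippet: the upstream gene ends in the TGA stop, and the
# downstream gene could start at either the GTG or the ATG.
seq = "CCGTGATGAAA"

stop_end = seq.find("TGA") + 3   # 1-based coordinate of the last base of TGA
gtg_start = seq.find("GTG") + 1  # 1-based coordinate of the G in GTG
atg_start = seq.find("ATG") + 1  # 1-based coordinate of the A in ATG

print("GTG start overlap:", overlap_bp(stop_end, gtg_start))  # 4 bp (GTGA)
print("ATG start overlap:", overlap_bp(stop_end, atg_start))  # 1 bp (the A of TGA)
```

The A of the TGA stop is the first base of the ATG, which is why choosing the second start leaves a 1 bp overlap.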
| posted 20 Feb, 2019 14:33
So, it is better to go with the -1 gap rather than the -4? I thought I remembered this at the summer annotation workshop, but was told the -4 was better at another venue.
| posted 20 Feb, 2019 15:22
This is an issue where not everyone seems to agree, and so there are examples of both calls in the databases. From what I understand from conversations I have had with Welkin, if you have two potential starts right in a row like this, it is usually the second (-1), based on proteomics data. Unfortunately, I could not quickly find a reference to support my memory.
| posted 20 Feb, 2019 17:05
Hi all,
I can weigh in on this. In the papers that we wrote about Cluster O, J, and M, we did some mass spec. The data showed that where there were 2 starts in a row, no methionine was present in that mass spec data. The interpretation of that is likely that post-translational chopping of the initiation methionine occurred. (So there could only be one present to chop.)
I just scanned the papers to see if we wrote that in one of them but couldn't find it. So by all means take a look to see if it is there.
But that is why we suggest that the second start codon (1 base overlap) be called. It is hard to do when you are programmed (like me) to look for the 4 bp overlap and all other data - usually the SD score - point to the first start codon as the better call. But you can do it! It also was not done in lots of files, and those are difficult to pinpoint and fix.
| posted 01 Dec, 2019 20:21
If I can make sure I am perfectly clear? This is generalizable, correct? If we see GTGATG or ATGATG, go with the second as a rule, and not just in clusters O, J, and M. I presume it is also true with TTGATG?

| posted 03 Dec, 2019 18:25
Excellent question, because we simply do not know. But yes, at this time, you can apply it generally!
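For anyone who wants to flag these cases in a genome before annotating, here is a rough sketch that finds back-to-back start codons (ATGATG, GTGATG, TTGATG) and reports the second one as the suggested call. The function name `tandem_starts` and the test sequence are hypothetical, positions are 0-based, and only the forward strand is scanned. Treating TTGATG the same way is an assumption carried over from the answer above, not something the mass spec data have shown.

```python
import re

# A start codon (ATG, GTG, or TTG) immediately followed by ATG.
# The lookahead lets overlapping hits (e.g. ATGATGATG) all be reported.
TANDEM = re.compile(r"(?=([AGT]TG)(ATG))")


def tandem_starts(seq: str) -> list[dict]:
    """Return one record per tandem start found on the forward strand."""
    hits = []
    for m in TANDEM.finditer(seq.upper()):
        hits.append({
            "first_codon": m.group(1),
            "first_pos": m.start(1),       # 0-based position of the first start
            "second_pos": m.start(2),      # 0-based position of the ATG
            "suggested_call": m.start(2),  # rule of thumb: call the second start
        })
    return hits


for hit in tandem_starts("ccGTGATGaaa"):
    print(hit)
```

This only locates candidates. Whether the first start also overlaps the stop of the previous gene, and what the RBS scores look like, still has to be checked by hand.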