The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


VIP2-like toxin/ ADP-ribosyltransferase

| posted 19 Oct, 2020 22:23
A gene early in the genome, found in A2 phages, is often called a VIP2-like toxin or an ADP-ribosyltransferase. The start call for this gene is very polarizing.

For example, in phage Smeagan, the gene is Gene 3 (stop site 2251):

The two recommended start sites, 1322 and 1661, result in a large overlap (-94) and a large gap (245), respectively. The earlier start includes a region without coding potential; the later start cuts off a lot of coding potential. The start at 1322 (the -94 overlap) has a slightly better RBS score and Z-score.
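For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the overlap and gap follow from the two candidate starts. The upstream stop coordinate (1415) is not stated in this thread; it is an assumption inferred from the reported -94 and 245 values using the convention gap = start - upstream stop - 1. The function names are hypothetical, and the length calculation assumes the stop coordinate 2251 is the last base of the stop codon.

```python
# Sketch only: forward-strand gene, 1-based inclusive coordinates.
# UPSTREAM_STOP is an ASSUMPTION inferred from the -94 / 245 values
# quoted in the post; it is not given in the thread itself.
UPSTREAM_STOP = 1415
GENE_STOP = 2251  # stop site of gene 3 in Smeagan, from the post


def gap_to_upstream(start: int, upstream_stop: int) -> int:
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


def gene_length(start: int, stop: int) -> int:
    """Length in bases, counting both the start and stop coordinates."""
    return stop - start + 1


for start in (1322, 1661):
    gap = gap_to_upstream(start, UPSTREAM_STOP)
    length = gene_length(start, GENE_STOP)
    in_frame = length % 3 == 0
    print(f"start {start}: gap {gap}, length {length} bp, in frame: {in_frame}")
```

Both candidates stay in frame with the stop at 2251, so the choice really does come down to conservation, coding potential, and the RBS scores rather than anything in the coordinates.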

Starterator calls for this gene are split almost down the middle between the two recommended sites, with a slight bias toward choosing the site that results in the overlap.

Any guidance on this one? Thanks all!
Edited 19 Oct, 2020 22:24
| posted 22 Oct, 2020 00:15
To me, the results of the Starterator report are quite telling. The two choices you point out are labelled start 12 and start 15 in the current Starterator report. First, the level of conservation for start 12 is much higher than for start 15. In fact, only 2 of 56 phages don't have start 12, and both of those have a start very close in position to start 12. On the other hand, start 15 is seen in only two-thirds of these genes, and for 7 of the 30 tracks there are no starts anywhere near start 15.
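To put those counts side by side, here is a small Python sketch that turns them into percentages. It reads the 2-of-56 figure as referring to start 12, which is an interpretation of the post, and the helper name is hypothetical. The counts come straight from the paragraph above, not from a fresh Starterator run.

```python
def percent(count: int, total: int) -> float:
    """Return count/total as a percentage."""
    return 100.0 * count / total


# Counts quoted in the post (assumed to refer to the same Starterator report).
phages_total = 56
phages_missing_start_12 = 2
tracks_total = 30
tracks_without_start_near_15 = 7

start_12_present = percent(phages_total - phages_missing_start_12, phages_total)
no_start_near_15 = percent(tracks_without_start_near_15, tracks_total)

print(f"start 12 present in {start_12_present:.1f}% of phages")
print(f"no start near start 15 in {no_start_near_15:.1f}% of tracks")
```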

To me, it is hard to believe that evolution would keep the bases that code for start 12 in virtually all these genes if start 15 were really the start of translation, so I would have a strong preference for start 12.

As for coding potential (CP): if you look carefully, you can see examples of other regions in the genome where you know the sequence is coding but the CP signal drops to zero. These are regions that are downstream of a strong CP signal but before the stop codon. See the CP for gene 14, where there are easily at least 100 bases with no CP signal. This is why I have a "rule" that a positive CP signal is good evidence there IS a gene, but no CP signal is not quite as good at indicating there IS NOT a gene. Said more formally, the CP algorithm makes more false negative errors than false positive errors. So in this case, where one start (the one with the 245 gap) implies that the CP upstream of it is a false positive and the other implies that the region without CP is a false negative, I would say that CP is also slightly more supportive of the big-overlap start.

Taken together, then, I would annotate this gene to start at 1322. If I were helping a student with this, I would now ask them to go back and double-check that gene 2 is real, just because of that super large overlap. But even if gene 2 is real, I would probably still stick with that huge overlap given the strong level of conservation seen in the Starterator report.
| posted 22 Oct, 2020 00:19
Thank you so much! The Starterator split of manual annotations between sites 12 and 15 threw me off, but you make an excellent point about the better conservation of site 12.

Gene 2 does appear to be real and is very well conserved in the cluster. So it looks like this one is a weird overlap!
| posted 22 Oct, 2020 01:06
Thanks Chris!