The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

2 genes in same place of Cluster BE phage, Kentucky Racer

| posted 15 Feb, 2024 17:54
We are annotating the phage KentuckyRacer and have come across a section where two genes directly overlap in Phamerator. The first is gene 95, with a start site of 65377, a stop of 65658, and a 0 bp gap; it has the longest ORF. No coding potential was found for it in GeneMark, but coding potential was found in another, overlapping reading frame. The other gene is gene 96, with a start site of 65390, a stop of 65638, and a gap of 13 bp. Gene 96 was not included in DNA Master. On the Phamerator map, the phages IchabodCrane and MindFlayer share pham number 87029 (16) with KentuckyRacer_96. The pham number for gene 95 is 1536 (70). We are unsure how to determine which is the correct gene.
Thank you.

Riley Bryant Anna Elpers
| posted 15 Feb, 2024 22:30
I call this "gene content analysis", and rule 2 of the guiding principles applies: "Genes do not overlap by more than a few bp, although up to about 30 is legitimate". I would also add that, like all rules, exceptions exist. All that is to say that you are correct to be suspicious: given the very large overlap, one or the other is very likely a false positive from the gene predictors used to create the draft annotations.
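To put a number on "very large overlap", here is a minimal sketch of the arithmetic using the coordinates from the original post. The function name `overlap_bp` and the constant `MAX_TYPICAL_OVERLAP` are made up for illustration, and the 30 bp cutoff is only the "up to about 30" guideline quoted above, not a hard limit.

```python
def overlap_bp(a_start, a_stop, b_start, b_stop):
    """Bases shared by two genes, given 1-based inclusive coordinates.

    Coordinates are sorted first, so it does not matter whether
    start < stop or start > stop for a given gene.
    """
    a_lo, a_hi = sorted((a_start, a_stop))
    b_lo, b_hi = sorted((b_start, b_stop))
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# Illustrative threshold taken from guiding principle 2 ("up to about 30").
MAX_TYPICAL_OVERLAP = 30

# Coordinates as given in the original post.
gene_95 = (65377, 65658)
gene_96 = (65390, 65638)

shared = overlap_bp(*gene_95, *gene_96)
print(f"overlap: {shared} bp, suspicious: {shared > MAX_TYPICAL_OVERLAP}")
```

With these coordinates gene 96 lies entirely inside gene 95, so the overlap is the full length of gene 96 (249 bp), far beyond what rule 2 allows for.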

So, for evidence as to which genes are real and which are false positives, I would rank the evidence in the order below. Each item is evidence FOR a real gene and against the hypothesis that it is a false positive, listed from strongest to weakest (not in the order I look at them):
1. Strong HHpred alignments to well-characterized, crystallized proteins (this will almost never happen with a false positive)
2. Strong BLAST alignment to a well-characterized protein with an assigned function (again, this almost never happens with a false positive)
3a. Good coding potential in the BLACK signal, not the red signal
3b. Good BLAST hits to other well-annotated phages
[3a and 3b are tied for quality in my mind]

4. Then would come Rule 9 in the guiding principles: "Switches in gene orientation are relatively rare" [this does not apply in your case, but I add it here as another source of evidence, because the two overlapping genes are often on different strands]
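The ranking above can be sketched as a small comparison routine. This is only an illustration of the ordering, not a replacement for looking at the evidence: the `Evidence` class, its field names, and the example values are all hypothetical, and the inputs assume the original post means that gene 96's frame shows coding potential while gene 95's does not. Anything not yet checked is left as `None`, and the orientation rule (item 4) is left out because it does not apply here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    # None means "not checked yet"; True/False is the result of the check.
    hhpred_hit: Optional[bool] = None          # item 1
    blast_function_hit: Optional[bool] = None  # item 2
    coding_potential: Optional[bool] = None    # item 3a
    phage_blast_hits: Optional[bool] = None    # item 3b


# Strongest tier first; 3a and 3b share a tier because they are tied.
TIERS = [
    ("hhpred_hit",),
    ("blast_function_hit",),
    ("coding_potential", "phage_blast_hits"),
]


def strongest_tier(ev: Evidence) -> Optional[int]:
    """Rank (1 = strongest) of the best tier with positive evidence, else None."""
    for rank, fields in enumerate(TIERS, start=1):
        if any(getattr(ev, name) is True for name in fields):
            return rank
    return None


def more_likely_real(name_a, ev_a, name_b, ev_b):
    """Name of the gene with stronger supporting evidence, or None on a tie."""
    rank_a, rank_b = strongest_tier(ev_a), strongest_tier(ev_b)
    if rank_a == rank_b:
        return None
    if rank_a is None:
        return name_b
    if rank_b is None:
        return name_a
    return name_a if rank_a < rank_b else name_b


# Hypothetical inputs based on one reading of the original post.
gene_95 = Evidence(coding_potential=False)
gene_96 = Evidence(coding_potential=True)
print(more_likely_real("gene 95", gene_95, "gene 96", gene_96))
```

Filling in the HHpred and BLAST fields once those checks are done could change the answer, since a positive result in item 1 or 2 outranks anything in item 3.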

So you probably want to check item 1 above. As for item 2, you did not state whether the matches in the other phages have an assigned function, so you have some more investigation to do. But by item 3 you at least have a good hypothesis as to which gene is more likely the false positive.
Edited 15 Feb, 2024 22:34
 