The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Orpham genes in AZ phage

| posted 10 Mar, 2022 05:18
My class is annotating VResidence, an AZ phage closely related to DrSierra. Much to our surprise, given how many AZ phage are already sequenced, there are two orpham genes towards the end: gene 53 (stop at 37118) and gene 54 (stop at 37497). We are inclined to call them genes mostly because of gaps: without them, the gap is too large. They also have a 3 bp overlap with the previous gene and with each other … is that chance? We don't think so! But rather than putting this in the notes, we are communicating in real time as requested!

Here is the paragraph about it that my student wrote; we investigated the possibility of a different gene being there instead of those two, and the coding potential is a bit noisy:

For gene #53, there seemed to be some coding potential in a different reading frame. However, no start or stop could be identified in that frame. Looking at the frames in DNA Master, a gene called on the reverse strand would overlap too much with gene #52 to be considered. No homologs were found in Phamerator, and a local BLAST against PhagesDB returned no hits. There were no hits in the NCBI nr database either. For gene #54, there was coding potential with a start and a stop in the direct sequence. No homologs were found in Phamerator, and the NCBI database had no hits for this protein sequence. When gene #54 was BLASTed locally against PhagesDB, all the hits were proteins of unknown function. HHPred had no hits for either sequence. We think it is possible that gene 53 and gene 54 are the same gene.

Attached is an image of the coding potential: gene 53 is the first gene found in the 3rd frame.
Edited 10 Mar, 2022 22:24
| posted 11 Mar, 2022 17:55
First, I would say that having an orpham or two in a phage is not unusual enough to really worry me. Looking at the Phamerator map of all AZ phage, I can see that KeAlii has at least 3 orphams, and several phage have 1 or 2, so having two orphams is not so unusual as to raise real questions about these genes.

Gene 54, the longer one, has such good coding potential that I would always call it. Gene 53 is just long enough to call; see rule 8 of the Guiding Principles. I would say 35 amino acids is in the grey zone, meaning it requires some evidence beyond an open reading frame to call the gene. However, both have pretty good coding potential, so I would call both.
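For anyone who wants to check a length like the 35 amino acids above from coordinates, here is a minimal sketch. It assumes a forward-strand gene with 1-based, inclusive coordinates where the stop coordinate is the last base of the stop codon. The function name and the coordinates are hypothetical, picked only so the arithmetic comes out to 35; they are not the actual coordinates of gene 53.

```python
def protein_length_aa(start: int, stop: int) -> int:
    """Amino acids encoded by a forward-strand gene (1-based, inclusive coordinates).

    `stop` is taken to be the last base of the stop codon,
    which does not code for an amino acid.
    """
    length_bp = stop - start + 1  # inclusive interval, so add 1
    if length_bp < 6 or length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    return length_bp // 3 - 1  # drop the stop codon


# Hypothetical coordinates: 108 bp -> 36 codons -> 35 amino acids.
print(protein_length_aa(101, 208))
```

Note the same "add 1" adjustment that comes up with overlaps: the length of an inclusive interval is stop minus start plus one.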

No BLAST hits is also not so surprising, as we already know these are orphams (i.e. unique among all 400K proteins in PhagesDB). So while a good BLASTp match might make you feel more confident there really is a gene here, the lack of a BLAST hit is not good evidence that this region is not a gene. Said more formally: a positive BLASTp result is good evidence for a gene, but a negative result is not good evidence against one; BLAST simply has nothing to say one way or the other.

As for overlap, we would call this a 4 base overlap (gap score of -4). Since gene coordinates describe intervals, not counts, you cannot just subtract the coordinates; you have to adjust by 1. I run across this issue all the time with my students, and I have them draw out a tiny sequence with a few "genes" to see the difference between interval math and normal math.
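The "tiny sequence" exercise can be sketched in a few lines of Python. This is only an illustration: the `gap_score` helper and the 12-base toy sequence are made up for the example, and it assumes 1-based, inclusive coordinates on the forward strand.

```python
def gap_score(upstream_stop: int, downstream_start: int) -> int:
    """Number of bases strictly between two genes (1-based, inclusive coordinates).

    Zero means the genes abut; a negative value is an overlap of that many bases.
    """
    return downstream_start - upstream_stop - 1


# Toy 12-base sequence, positions 1..12:
#   position  1 2 3 4 5 6 7 8 9 10 11 12
#   base      C C G A T G A C C  T  A  A
# "Gene A" ends with the stop codon TGA at positions 5-7.
# "Gene B" begins with the start codon ATG at positions 4-6.
# The shared bases are 4, 5, 6 and 7 (ATGA): a 4-base overlap.
upstream_stop = 7
downstream_start = 4

print(upstream_stop - downstream_start)            # naive subtraction gives 3
print(gap_score(upstream_stop, downstream_start))  # interval math gives -4
```

Counting the shared positions by hand (4, 5, 6, 7) shows why plain subtraction comes up one short.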

Finally, for gene calls, it is better to have a false positive (i.e. call a gene that really isn't there) than a false negative (miss a gene). So even if I were not sure of gene 53, I would still call it given this rule.

So for all the above, I would keep both these genes in the annotation and just be amazed at how diverse the gene collection is among all these phage. I am sure Deb could quote you a few papers that discuss the idea of phage as "engines of gene creation", and I think that, at least for 54, we could have an example of that.
| posted 17 Mar, 2022 20:43
Thanks for this awesome feedback!! I shared it with the students who were working on those genes and it was very helpful. Our class is small, so they appreciated having a broader discussion with somebody outside of our small circle. Yay!
(Replying to cdshaffer's post of 11 Mar, 2022, above.)
 