The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


Cluster B gene with no coding potential

| posted 15 Mar, 2016 17:19
When reviewing a student's annotation, we came across a situation where the student had added a gene.
Below is the evidence used to call the gene; I would appreciate an expert opinion on the call.
1. There was a large gap (> 400 bp) between genes 9 and 10 in Iridoclysis.
2. In at least three other cluster B phages a similar gap has been filled by a gene (see Phamerator map below).
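The gap check in point 1 can be sketched in a few lines of Python. This is only an illustration: the function name `find_gaps`, the 400 bp threshold as a parameter, and the gene coordinates are all hypothetical and are not taken from the actual Iridoclysis annotation.

```python
def find_gaps(genes, min_gap=400):
    """Return (left_gene, right_gene, gap_bp) for adjacent genes separated by more than min_gap.

    genes: list of (name, start, stop) tuples using 1-based inclusive
    coordinates with start <= stop (strand is ignored in this sketch).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for (name_a, _, stop_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        gap = start_b - stop_a - 1  # bases strictly between the two genes
        if gap > min_gap:
            gaps.append((name_a, name_b, gap))
    return gaps


# Hypothetical coordinates, for illustration only.
genes = [("gp8", 5000, 5900), ("gp9", 5910, 6500), ("gp10", 6950, 7800)]
print(find_gaps(genes))
```

A gap flagged this way is only a prompt to look for a missed gene; the other lines of evidence still have to support the call.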
The gene was added and the following information was noted:
Blast: 1:1 alignment with gp10 of 3 other phages.
RBS Final SD – 6.354; Z = 1.399 (not wonderful).
FS: According to Phamerator, a similar gene in other phages is assigned the function HNH endonuclease. There is also a ~60 bp hit on HHpred for an HNH endonuclease (99% probability), and a hit for an HNH endonuclease domain when BLASTed on PhagesDB.
As an additional piece of evidence, we uploaded the FASTA sequence to Starterator (since it is an unphamerated gene) and got agreement with our start site call.
This all seems like good evidence except for one thing:
Coding potential in GeneMark (S, smeg and TB) is pitiful! (see attached coding potential file).

My question: does that mean this gene cannot or should not be added?
If I had been doing the annotation, I might never have added this gene, based on the GeneMark files.
Edited 15 Mar, 2016 17:38
| posted 15 Mar, 2016 17:19
Phamerator map of the area
| posted 16 Mar, 2016 20:31
My own feeling is that coding potential is a good positive signal but a lousy negative signal. That is, seeing coding potential is a good sign for the presence of a gene, but a lack of coding potential is not good evidence for the absence of a gene. Coding potential is mostly about matching the nucleotide biases of the genes in the training set (be it self-training on the phage itself or training on a nearby host). It's always possible that a gene has evolved for some reason to specifically NOT match a particular bias: expression in a different host, for example, or slow translation rates achieved by using rare codons.
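To make the "matching the biases of the training set" idea concrete, here is a toy sketch. It is not GeneMark's algorithm; it swaps in a much simpler codon-frequency log-odds score, and the function names, pseudocount, and sequences are all made up for illustration.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codons(seq):
    """Split a sequence into in-frame codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def train(genes, pseudocount=1.0):
    """Estimate codon frequencies from a training set of gene sequences."""
    counts = Counter(c for g in genes for c in codons(g))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def score(seq, freqs):
    """Mean log-odds per codon versus a uniform (1/64) background."""
    cs = [c for c in codons(seq) if c in freqs]
    if not cs:
        return 0.0
    return sum(math.log(freqs[c] * 64) for c in cs) / len(cs)


# Made-up GC-rich training genes and two made-up candidates.
training = ["ATGGCCGGCGCCGGCGCC", "ATGGGCGCCGCCGGCGGC"]
freqs = train(training)
typical = "GCCGGCGCCGGC"   # uses the codons the training set favors
atypical = "AAATTTAAATTT"  # uses codons the training set never saw
print(score(typical, freqs), score(atypical, freqs))
```

The second candidate scores poorly only because it does not resemble the training genes, which says nothing about whether it is actually translated. That is the sense in which a low score is a weak negative signal.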

In addition, I was trained that when in doubt it is always best to over-annotate, not under-annotate. The idea was that it is better to have a false positive protein in a database than a false negative protein missing from the database. It's typically easier to find and reject an error than it is to find something that has been erroneously left out.

Having said that, the first point is my own opinion, and I would not be surprised if other reasonable annotators disagree. It's really about data interpretation without guidance from any experiments, so it's all about opinion. As for the latter point, I have found that policy to be much more of a consensus in the community of human annotators that trained me and much less so among the people I have met who work with prokaryotes. So a reasonable counterargument is that it is better to match the standards of the community, and on that I would defer to others to voice an opinion.
| posted 28 Mar, 2016 16:49
Hi Marie,
HNH endonucleases are little parasitic selfish genes that move quickly through the genomic landscape, much more quickly than they would evolve to match a phage genome's coding potential (think transposons).
They routinely have poor coding potential in GeneMark outputs for this reason, but usually have pretty good HHpred alignments because the 3D structure is conserved. Some HNH endonucleases even give false positives in GeneMark; that is, it looks like there is coding potential in the opposite strand. So if you are seeing HNH endonuclease as your functional assignment in PhagesDB and Phamerator maps, don't worry about poor coding potential.
| posted 29 Mar, 2016 11:45
Thank you both - this is very helpful.