| posted 16 Jan, 2020 01:19
I'd like to present my new paper on annotation of phage genomes, entitled "A Method for Improving the Accuracy and Efficiency of Bacteriophage Genome Annotation". In this paper, I present the method we use in my lab to annotate phage genomes. It is based on the method we all use in SEA-PHAGES, but is a little more formal and detailed. I estimate the accuracy of the method and compare it to programs such as Glimmer and GeneMark, using the genomes of phages Lambda and Patience as test sets. It does seem that the manual annotation method produces more accurate annotations, both in terms of gene calling and start codon calling, at least as far as Lambda and Patience are concerned. I think those of you interested in genome annotation will find this paper interesting, and may find useful things to incorporate into your annotation protocols.

| posted 26 Mar, 2019 15:34
Hi Debbie, and thanks for looking into this. I looked into the small forward ORF you mentioned but I feel the coding potential for that one is very weak, and it is more likely that gene 105 should be called long instead (which would fill the gap somewhat).
| posted 26 Mar, 2019 14:59
Yes, I got the next forward gene and deleted those short reverse genes (121-125). Thanks for looking into this.
| posted 25 Mar, 2019 23:19
Gene 103 (forward frame 3, 61983-62291) in cluster J phage NihilNomen is called by Glimmer and GeneMark (the GeneMark in DNA Master) and has strong coding potential for part of its length. While ordinarily this would be a no-brainer, there is an ORF in reverse frame 1 (62040-62480) that has equally strong coding potential and is called by host-trained GeneMark.hmm and GeneMarkS. The two genes overlap, but not completely (see attached figure).

In terms of homology matches, gene 103 has 6 matches to cluster J phages, while the reverse ORF has none.

Call both genes? Or just gene 103?
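
For reference, the extent of the overlap follows directly from the coordinates quoted above. The snippet below is only a minimal sketch: the helper name `overlap_bp` is made up for illustration, and it assumes both coordinate pairs are 1-based and inclusive, as DNA Master reports them.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of shared base pairs between two inclusive, 1-based intervals."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max(0, end - start + 1)


# Coordinates taken from the post; strand is ignored because only the
# genomic footprint matters for measuring overlap.
gene_103 = (61983, 62291)      # forward frame 3
reverse_orf = (62040, 62480)   # reverse frame 1

shared = overlap_bp(*gene_103, *reverse_orf)
len_103 = gene_103[1] - gene_103[0] + 1
len_rev = reverse_orf[1] - reverse_orf[0] + 1

print(f"overlap: {shared} bp")
print(f"fraction of gene 103 covered: {shared / len_103:.2f}")
print(f"fraction of reverse ORF covered: {shared / len_rev:.2f}")
```

By this count the overlap covers most of gene 103 but only a little over half of the reverse ORF.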
| posted 25 Mar, 2019 22:56
I have a situation with two overlapping genes in Cluster J phage NihilNomen which looks like it might be a translational frameshift.

Gene 123 in forward frame 2 is called by Glimmer and GeneMark, albeit both call it short. However, there is an even longer ORF in frame 1 that is called by the host-trained GeneMark.hmm, has very strong coding potential, and fully overlaps with gene 123 (see attached figure).

Interestingly, the regions of coding potential for the ORF in frame 1 and for gene 123 do not overlap: the coding potential for gene 123 starts right where the coding potential for the ORF in frame 1 ends (see attached figure). This pattern in the coding potential map suggests a translational frameshift. The ORF in frame 1 has BLAST matches to phage Thibaut, while gene 123 has matches to phages Omega and Porcelain. In terms of function, both genes have matches to DNA cytosine methyltransferase.

I am unsure what to do here. Call both genes? If so, what to do about the potential translational frameshift? Or should I call only one gene, and if so, which one?
| posted 12 Feb, 2018 19:27
We have found 4 ORFs located in coding gaps that have significant BLAST matches (<E-10) to other Mycobacterium smegmatis phage genes, but without any coding potential. To call or not to call?
The BLAST matches are as follows:

ORF 1:
gp57 in Mycobacterium phage Send513 with E-value = E-38
gp57 in Mycobacterium phage Papyrus with E-value = E-38

ORF 2:
Many matches to HNH endonucleases in many Mycobacterium phages, all with E-value 0

ORF 3:
gp90 in Mycobacterium phage Send513 with E-value = E-13

ORF 4:
gp72 in Mycobacterium phage Squirty with E-value = E-27
gp49 in Mycobacterium phage Shuna with E-value = E-21

Thanks for any insight.
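
As a quick check of the hits listed above against the E-10 threshold, here is a minimal sketch. It assumes "E-38" is shorthand for 1e-38 and that an E-value reported as 0 can be treated as 0.0. The dictionary layout and the name `best_evalue` are purely illustrative and not part of any BLAST tool.

```python
CUTOFF = 1e-10  # threshold quoted in the post (<E-10)

# (description, E-value) pairs transcribed from the list above.
hits = {
    "ORF 1": [("Send513 gp57", 1e-38), ("Papyrus gp57", 1e-38)],
    "ORF 2": [("HNH endonucleases, many phages", 0.0)],
    "ORF 3": [("Send513 gp90", 1e-13)],
    "ORF 4": [("Squirty gp72", 1e-27), ("Shuna gp49", 1e-21)],
}


def best_evalue(orf_hits):
    """Smallest (most significant) E-value among an ORF's hits."""
    return min(evalue for _, evalue in orf_hits)


# Keep only ORFs whose best hit is below the cutoff.
passing = {orf: best_evalue(h) for orf, h in hits.items() if best_evalue(h) < CUTOFF}

for orf, evalue in passing.items():
    print(f"{orf}: best E-value {evalue:g} passes the cutoff")
```

All four ORFs pass on homology alone, with ORF 3 the closest to the threshold. The check says nothing about coding potential, which is the actual sticking point.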
| posted 27 Jan, 2018 01:31
Is there a cutoff value of the SD score for start codons, i.e. a value below which you would eliminate a start codon from consideration? I use Kibler6 and Karlin medium per the DNA Master guide.

| posted 26 Jan, 2018 21:27
One of our phages, Riparian, has circularly permuted genome ends, and there is a gene that straddles the genome ends (see attached dnam5 file). Can we proceed with annotation normally, or is this going to be a problem later on?

Reply from Deborah Jacobs-Sera:
When you have a circularly permuted genome (that is, a genome without defined ends), we cut it in a gap upstream of the terminase. I cut Riparian according to those guidelines. Sometimes small genes are called that we cannot support with sufficient data. That wrap-around gene is one such gene. Also note that I cut the genome so that gene 1 starts on bp 1. So the start of the first gene should be bp 1!

I double-checked my decision of where best to cut this genome and am sticking with it. I do not believe that wrap-around gene is real when you check out the overlap with gene 1.
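
To illustrate what re-cutting a circularly permuted genome does to coordinates, here is a small sketch. It is not how DNA Master or any official pipeline does it: the function names and the toy sequence are made up, and coordinates are assumed to be 1-based.

```python
def rotate_genome(seq, new_start):
    """Re-cut a circular sequence so that 1-based position new_start becomes bp 1."""
    i = new_start - 1
    return seq[i:] + seq[:i]


def remap_position(pos, new_start, genome_len):
    """Translate an old 1-based coordinate into the rotated coordinate system."""
    return (pos - new_start) % genome_len + 1


# Toy 12 bp "genome"; pretend gene 1 starts at old position 7.
genome = "AAACCCGGGTTT"
rotated = rotate_genome(genome, 7)

print(rotated)                                # sequence now begins at the old bp 7
print(remap_position(7, 7, len(genome)))      # old bp 7 becomes bp 1
print(remap_position(6, 7, len(genome)))      # old bp 6 becomes the last bp
```

Any feature that spanned the old ends becomes contiguous after rotation, while a feature spanning the new cut site would be split, which is why the cut is placed in a gap.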
| posted 24 Jan, 2018 19:59
Still not working for me, as of January 24 2018. I get "Glimmer failure" and "Annotation failure".

