
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


DNA Master vs. Phamerator

| posted 22 Feb, 2018 20:50
We're doing the Maroc7_Draft phage. Phamerator says that there are 95 genes, but autoannotation in DNA Master says there are only 91 genes. How come?
| posted 23 Feb, 2018 17:08
I think I found the problem. The map from Phamerator.org has 4 very small extra genes that may be errors. Each one sits inside another, larger gene. For example, in the attached picture, gene 40 would be a mistake.

When I add the sequence into PECAAN, the Phamerator map generated from PECAAN doesn't have those 4 extra genes, which agrees with DNA Master. So should I trust the map from PECAAN and not the one from Phamerator.org?
Edited 23 Feb, 2018 17:13
| posted 23 Feb, 2018 20:24
It is not unusual to get slightly different results from gene predictors when the same sequence is analyzed on different computers. Software version, default parameter settings, different training sets, and other factors can affect the outcome. This is really just par for the course in bioinformatics. Without good evidence to the contrary, I always treat each source (PECAAN, DNA Master, Phamerator) as evidence to be used in the final annotation. Genes that appear in one auto-annotation and not the other simply have less evidence supporting their inclusion in the final annotation.

From rule 2 of the Guiding Principles of Bacteriophage Genome Annotation, we know that genes rarely overlap by more than 30 bp, so clearly some of the genes in that cluster, as displayed in the Phamerator map, should not end up in the final annotation. The genes in that region that show up in both maps have slightly more evidence supporting their existence than the genes that show up in only one. However, you will want to use all sources of evidence before you decide which genes should end up in your final annotation. See Deciding whether an auto-annotated gene is a gene.
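The 30 bp overlap check above can be sketched in a few lines of Python. This is only an illustration: the `Gene` tuple, the function names, and the coordinates are made up for the example and are not taken from Maroc7_Draft or from any of the tools mentioned. It assumes each gene is given as 1-based inclusive coordinates with start <= stop.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive; assumed start <= stop
    stop: int


def overlap_bp(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start) + 1)


def flag_large_overlaps(genes: List[Gene], max_overlap: int = 30) -> List[Tuple[str, str, int]]:
    """Return (gene, gene, overlap) for every pair overlapping by more than max_overlap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    flagged = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start > a.stop:
                break  # genes are sorted by start, so no later gene can overlap a
            shared = overlap_bp(a, b)
            if shared > max_overlap:
                flagged.append((a.name, b.name, shared))
    return flagged


# Hypothetical coordinates: a small call nested inside a larger one.
example = [
    Gene("gene_39", 100, 1000),
    Gene("gene_40", 400, 520),
    Gene("gene_41", 990, 1500),
]
print(flag_large_overlaps(example))
```

A flagged pair is not proof that either call is wrong. It only marks a region to look at more closely with the other evidence.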
| posted 23 Feb, 2018 20:45
Thanks, that makes sense. I also got this response from the Phamerator chat:

The draft genomes that are in Phamerator (such as Maroc7_Draft) are directly imported from DNA Master's auto-annotation, which is itself based on Glimmer and GeneMark predictions. The same is true of PECAAN as far as I am aware. So DNA Master, Phamerator, and PECAAN are all showing the output of the same two programs.

Why then do the data in Phamerator, DNA Master, and PECAAN not all agree with each other? The Glimmer and GeneMark methods use random subsampling from within a genome to make predictions about where the genes are. Because the random sampling will be different each time the programs are run, the predictions themselves can vary (slightly) each time you run Glimmer or GeneMark.

Which output is correct, Phamerator, DNA Master, or PECAAN? While there are clearly issues on the Phamerator map (such as gene 40 that you pointed out), it's likely that all 3 programs are showing annotations that contain mistakes! That's of course why we need to perform manual revisions to the auto-annotation. Hope this helps!
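One way to see where the three sources disagree is to tally which sources support each gene call. The sketch below is a rough illustration, not a feature of Phamerator, DNA Master, or PECAAN. It assumes you have already copied each source's gene calls out by hand, and it treats two calls as the same gene when they share a stop coordinate and strand (an assumption made here so that calls differing only in start still match). All names and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

GeneCall = Tuple[int, str]  # (stop coordinate, strand), e.g. (1000, "F")


def tally_support(calls_by_source: Dict[str, Set[GeneCall]]) -> Dict[GeneCall, Set[str]]:
    """Map each gene call to the set of sources that contain it."""
    support = defaultdict(set)
    for source, calls in calls_by_source.items():
        for call in calls:
            support[call].add(source)
    return dict(support)


def weakly_supported(calls_by_source: Dict[str, Set[GeneCall]]) -> List[GeneCall]:
    """Gene calls missing from at least one source, sorted by stop coordinate."""
    n_sources = len(calls_by_source)
    support = tally_support(calls_by_source)
    return sorted(call for call, sources in support.items() if len(sources) < n_sources)


# Hypothetical gene calls, typed in by hand for illustration only.
calls = {
    "Phamerator": {(520, "F"), (1000, "F"), (1500, "F")},
    "DNA Master": {(1000, "F"), (1500, "F")},
    "PECAAN": {(1000, "F"), (1500, "F")},
}
print(weakly_supported(calls))
```

Calls that show up in this list are the ones to examine first during manual revision. Appearing in every source does not by itself make a call correct.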
 