SEA-PHAGES | DNA Master vs. Phamerator

Link to this post | posted 22 Feb, 2018 20:50
cqdiep	We're doing the Maroc7_Draft phage. Phamerator says that there are 95 genes, but autoannotation in DNA Master says there are only 91 genes. How come?

Link to this post | posted 23 Feb, 2018 17:08
cqdiep	I think I found the problem. The map from Phamerator.org has 4 very small extra genes that may be mistakes. They sit inside other, larger genes. For example, in the attached picture, gene 40 would be a mistake. When I add the sequence into PECAAN, the Phamerator map generated from PECAAN doesn't have those 4 extra genes, which agrees with DNA Master. So should I trust the map from PECAAN and not the one from Phamerator.org? Edited 23 Feb, 2018 17:13

Link to this post | posted 23 Feb, 2018 20:24
cdshaffer	It is not unusual to get slightly different results from gene predictors when the same sequence is analyzed on different computers. Software version, default parameter settings, training sets, and other factors can affect the outcome. This is really just par for the course in bioinformatics. Without good evidence to the contrary, I always treat each source (PECAAN, DNA Master, Phamerator) as evidence to be used in the final annotation. Genes that appear in one auto-annotation but not the other simply have less evidence supporting their inclusion in the final annotation. From rule 2 of the Guiding Principles of Bacteriophage Genome Annotation, we know that genes rarely overlap by more than 30 bp, so clearly some of the genes in that cluster as displayed in the Phamerator map should not end up in the final annotation. The genes in that region that show up in both have slightly more evidence supporting their existence than the genes that show up in only one. However, you will want to use all sources of evidence before you decide which genes should end up in your final annotation. See Deciding whether an auto-annotated gene is a gene.
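
To make the two checks in the post above concrete, here is a minimal Python sketch, not part of DNA Master, PECAAN, or Phamerator. It flags pairs of gene calls that overlap by more than 30 bp and splits calls into those found in both sources and those found in only one. The `Gene` type, the function names, and all coordinates are hypothetical and made up for illustration; they are not the real Maroc7_Draft calls. It also assumes two calls are "the same gene" only when start, stop, and strand match exactly, which is stricter than a manual comparison would be.

```python
from itertools import combinations
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive (hypothetical coordinates)
    stop: int   # 1-based, inclusive
    strand: str = "+"


def overlap_bp(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by two gene calls (0 if none)."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start) + 1)


def flag_overlaps(genes, max_overlap=30):
    """List (name, name, overlap) for pairs overlapping by more than max_overlap bp."""
    return [
        (a.name, b.name, overlap_bp(a, b))
        for a, b in combinations(genes, 2)
        if overlap_bp(a, b) > max_overlap
    ]


def split_by_support(calls_a, calls_b):
    """Return (calls in both sources, calls in only one), keyed by coordinates."""
    def key(g):
        return (g.start, g.stop, g.strand)

    keys_a = {key(g) for g in calls_a}
    keys_b = {key(g) for g in calls_b}
    return keys_a & keys_b, keys_a ^ keys_b


# Made-up example: one small call nested inside a larger gene in one source only.
phamerator_calls = [
    Gene("39", 1000, 2500),
    Gene("40", 1800, 1950),
    Gene("41", 2497, 3100),
]
dnamaster_calls = [
    Gene("39", 1000, 2500),
    Gene("40", 2497, 3100),
]

suspicious = flag_overlaps(phamerator_calls)
in_both, in_one_only = split_by_support(phamerator_calls, dnamaster_calls)
print(suspicious)
print(sorted(in_one_only))
```

A flagged pair or a single-source call is only a prompt for a closer look. As the post says, the final decision should still weigh all sources of evidence.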

Link to this post | posted 23 Feb, 2018 20:45
cqdiep	Thanks, that makes sense. I also got this response from the Phamerator chat: "The draft genomes that are in Phamerator (such as Maroc7_Draft) are directly imported from DNA Master's auto-annotation, which is itself based on Glimmer and GeneMark predictions. The same is true of PECAAN as far as I am aware. So DNA Master, Phamerator, and PECAAN are all showing the output of the same two programs. Why then do the data in Phamerator, DNA Master, and PECAAN not all agree with each other? The Glimmer and GeneMark methods use random subsampling from within a genome to make predictions about where the genes are. Because the random sampling will be different each time the programs are run, the predictions themselves can vary (slightly) each time you run Glimmer or GeneMark. Which output is correct, Phamerator, DNA Master, or PECAAN? While there are clearly issues on the Phamerator map (such as gene 40 that you pointed out), it's likely that all 3 programs are showing annotations that contain mistakes! That's of course why we need to perform manual revisions to the auto-annotation. Hope this helps!"
