This abstract was last modified on March 28, 2023 at 10:20 a.m.
To better understand bacteriophages, their biology, and potential applications such as phage therapy and industrial pathogen control, it is important to annotate phage genomes correctly. However, not all annotators are experts in bacteriophage biology, structural biology, chemistry, biochemistry, or proteomics, which creates opportunities for misidentifying genes. Moreover, called gene functions, including incorrect ones, tend to be propagated. To make matters more challenging, not all genes in the same pham necessarily have the same function, because some pham members lack particular functional domains even though parts of their sequences show homology and produce HHPred hits similar to those of other pham members. This study examined annotation data for subcluster M1 phages over a 14-year period (2009-2022) to identify inconsistencies in gene calls and to offer suggestions for accurate genome annotation. Several bioinformatics tools and databases were used, including DNA Master, PhagesDB BLAST, NCBI BLAST, HHPred, Phamerator, and DeepTMHMM. For example, while annotating the gene at 39,487-40,347 bp in the subcluster M1 Mycobacterium phage Glaske16, notable inconsistencies were observed in the function calls of genes in the same pham (72358). BLASTp results from NCBI and PhagesDB showed that some annotators had called the function as unknown, while others had called it Cas4 family exonuclease, or simply exonuclease, even among genes with ≥99% sequence identity. Such inconsistencies in the function calls of homologous genes could indicate a larger annotation problem, namely that individual annotators rely heavily on synteny or BLAST results when making the final decision on gene function. Data on the accuracy of gene calling in subcluster M1 over the analyzed 14-year span are discussed, and a simple outline for making better use of the available bioinformatics tools in future gene calls is provided.
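The kind of inconsistency described above, where members of one pham carry different function calls, could be screened for with a short script. The following is only a minimal sketch and not the procedure used in this study: the record layout, the gene identifiers, the second pham number, and the function name `find_inconsistent_phams` are all hypothetical. Only pham 72358 and its three function labels come from the abstract.

```python
from collections import defaultdict


def find_inconsistent_phams(records):
    """Return the phams whose members carry more than one distinct function call."""
    calls_by_pham = defaultdict(set)
    for rec in records:
        # Normalize case and whitespace so trivially different spellings compare equal.
        call = " ".join(rec["function"].lower().split())
        calls_by_pham[rec["pham"]].add(call)
    return {
        pham: sorted(calls)
        for pham, calls in calls_by_pham.items()
        if len(calls) > 1
    }


# Hypothetical annotation records; gene names and pham "10001" are made up.
records = [
    {"gene": "phageA_gp1", "pham": "72358", "function": "unknown"},
    {"gene": "phageB_gp1", "pham": "72358", "function": "Cas4 family exonuclease"},
    {"gene": "phageC_gp1", "pham": "72358", "function": "exonuclease"},
    {"gene": "phageA_gp2", "pham": "10001", "function": "Terminase"},
    {"gene": "phageB_gp2", "pham": "10001", "function": "terminase"},
]

for pham, calls in find_inconsistent_phams(records).items():
    print(pham, calls)
```

A flagged pham is only a prompt for manual review. As noted above, members of one pham can legitimately differ in function when some lack a functional domain, so a difference in labels does not by itself show that any call is wrong.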