The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

How to Pass Preliminary Annotation Review

| posted 14 Mar, 2018 17:43
This is in the Online Bioinformatics Guide, and restated here:

Here is a list to highlight what we look for during the initial annotation review. Initial review is an iterative, rather than exhaustive, process, and your files may be returned to you after we identify any one of the following issues. Please use a request for revision as an opportunity to double-check all the items below, so that it is less likely your files will be returned again.

Most common reasons files are returned for revision during annotation preliminary review:

1. Missing the official SEA-PHAGES cover sheet and checklist.
We designed this sheet to highlight the items that should be addressed for every annotation. If you don't use it, we don't know if you've addressed them.

2. No annotation of the programmed translational frameshift in the tail assembly chaperones.
We can't identify this region for all clusters, but in those that we can, it should be addressed. See the online guide and the Cluster-Specific annotation tips for help. Frameshifts should ONLY be added for the tail assembly chaperones, as those are the only genes that have wet bench evidence for a shift.

3. tRNAs are not correctly trimmed. Remember that the autoannotation tRNA predictions may not be correct and should be reviewed.

4. Functions do not match Official SEA-PHAGES function list.
Unfortunately, we have to be extra picky about this — spelling, capitalization, and extraneous punctuation all matter as we move towards automated curation of our data. The computer thinks everything written out a different way is a different thing. Even hidden carriage return marks can interfere with downstream formatting.
If you think your genome contains a gene with a function that is not on the list, create a new thread on the "request a new function for the official function list" forum.

5. File formatting for any of the required files is not up-to-date. The old DNA Master Annotation Guide is no longer current and is not being updated; please use the new Online Guide.

6. The phage page is missing the GPS coordinates and the complete name of the student(s) who found the phage. This information becomes part of the GenBank file, and we can't submit the files until we have that information. This is also a good time to check that the phage page record is complete and accurate.

7. Flagrant violations of the Guiding Principles of annotation: genes annotated on top of each other, huge gaps with no genes predicted and no explanation, etc.

8. No evidence that programs like HHpred and Starterator were used. These programs are essential for the identification of gene starts and accuracy of functional assignments.

9. Annotation is missing, or has too many of, the most common phage functions as laid out in the "Functions present in (almost) all phage genomes" page of the Online Bioinformatics Guide.
This page is a quick reference guide to help you out with things like "how many portal genes should I expect to find in my genome?".
Exceptions to this will be noted in the Cluster-specific annotation forums.

10. Finally, to generate the best annotations, please refer to the Cluster-Specific annotation forums. We've added lots of tips to help you out.
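
For item 4, here is a rough sketch of the kind of exact-match check that automated curation implies. It is only an illustration, not the actual SEA-PHAGES, DNA Master, or PECAAN code: the `APPROVED_FUNCTIONS` set is a hypothetical four-entry excerpt, and `check_function_name` is a made-up helper name. The real list is the Official SEA-PHAGES function list.

```python
import unicodedata

# Hypothetical excerpt for illustration only; the real source of truth is
# the Official SEA-PHAGES function list.
APPROVED_FUNCTIONS = {
    "Hypothetical Protein",
    "hypothetical protein",
    "portal protein",
    "tail assembly chaperone",
}


def check_function_name(name, approved=APPROVED_FUNCTIONS):
    """Return a list of problems with one function name (empty if it matches exactly)."""
    if name in approved:
        return []

    problems = []

    # Drop control/format characters, e.g. a hidden carriage return.
    visible = "".join(
        c for c in name if unicodedata.category(c) not in ("Cc", "Cf")
    )
    if visible != name:
        problems.append("hidden control character")

    trimmed = visible.strip()
    if trimmed != visible:
        problems.append("leading or trailing whitespace")

    if trimmed not in approved:
        # sorted() keeps the suggestion deterministic when two approved
        # entries differ only in capitalization.
        by_lowercase = {entry.lower(): entry for entry in sorted(approved)}
        suggestion = by_lowercase.get(trimmed.lower())
        if suggestion is not None:
            problems.append(f"capitalization differs from '{suggestion}'")
        else:
            problems.append("not on the approved list")

    return problems


if __name__ == "__main__":
    for candidate in ["portal protein", "Portal Protein", "portal protein\r", "portal protein."]:
        print(repr(candidate), check_function_name(candidate))
```

Running something like this over the function column of your annotation before you submit can catch a stray capital letter, a trailing period, or a pasted-in carriage return that is invisible on screen.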

If you have any questions, please ask. We are happy to help!
Edited 04 Apr, 2018 19:31
| posted 01 May, 2023 02:44
In the Official Function list, most functions are listed in lower case, but "Hypothetical Protein" is listed with both words capitalized. I see the note that capitalization matters (#4), but it is killing me to have "Hypothetical Protein" capitalized when only specific names (RecA, DNA, Holliday, etc.) are generally capitalized in the list. It just does not match! (I do also note the exceptions of Metalloprotease and Metallophosphatase, which I haven't yet used.) Will my annotation get rejected if I use "hypothetical protein" instead?
| posted 01 May, 2023 12:51
Hi Pam,
You can submit Hypothetical Protein as Hypothetical Protein or hypothetical protein.
The capitalization is related to how it is automated in DNA Master and PECAAN, respectively.

Sorry that is confounding, but it is the small stuff.

| posted 01 May, 2023 13:14
Thanks, Debbie! It's reassuring to know that both are okay. And it's helpful to know that it relates to the automation.