The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by welkin

| posted 05 Apr, 2018 12:19
Hi Arturo,
You are right that in a circularly permuted genome you should pay attention to the gap between gene 1 and its upstream gene (in this case, the last gene).
However, we ask you to write down the gap to highlight the space or overlap between genes and to get you to think about whether or not you've chosen the correct start. Since you know where gene 1 starts, noting the gap becomes somewhat irrelevant. So gene 1 in every genome has always received a pass on the "gap". You can just write "n/a" for gene 1.
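
As a rough illustration of the gap note described above, here is a minimal sketch. It assumes forward-strand genes given as 1-based, inclusive (start, stop) coordinates sorted by position; the function name `gaps_for_notes` and the example coordinates are made up for illustration, and the gap convention in your annotation software may differ.

```python
from typing import List, Tuple, Union


def gaps_for_notes(genes: List[Tuple[int, int]]) -> List[Union[int, str]]:
    """Return the gap (in bp) before each gene; negative values are overlaps.

    Assumes (start, stop) pairs that are 1-based, inclusive, on the forward
    strand, and sorted by position. These are assumptions of this sketch.
    """
    gaps: List[Union[int, str]] = []
    for i, (start, _stop) in enumerate(genes):
        if i == 0:
            # Gene 1 gets a pass on the gap, so record "n/a".
            gaps.append("n/a")
        else:
            prev_stop = genes[i - 1][1]
            # Count the bases strictly between the two genes.
            gaps.append(start - prev_stop - 1)
    return gaps


# Hypothetical coordinates, not taken from any real genome.
example = [(1, 300), (310, 900), (897, 1500)]
print(gaps_for_notes(example))  # ['n/a', 9, -4]
```

A positive number is a gap, a negative number is an overlap, and gene 1 is simply recorded as "n/a".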
Posted in: Choosing Start Sites > How to choose the start of the first gene for a circularly permuted genome
| posted 05 Apr, 2018 12:15
Hi Miriam,
There is a document on the Faculty Information page in the Bioinformatics section that describes how to fix a corrupted file:

Posted in: DNA Master > Corrupted files for merge
| posted 03 Apr, 2018 14:11
Unlike many other clusters, the Cluster A phages have *some* minor tail proteins at the left end of their genome, upstream of the lysins and terminase genes (around gene 4-6ish). You can recognize these proteins by their size. Some of them may have structural motifs that suggest long, extended proteins, such as collagen repeats or coiled coils.
Posted in: Cluster A Annotation Tips > minor tail proteins
| posted 03 Apr, 2018 14:07
Hi Joe,
I am not surprised that you didn't get HHPred hits to minor tail proteins. The best HHPred data come from crystal structures, and tail proteins are extremely hard to crystallize because they are long and fibrous. They are also extremely modular, which can make it difficult to generate a conserved domain or a multiple sequence alignment (the other sources of data for HHPred searches).
The minor tail proteins at the beginning of the genome in Cluster A phages were initially assigned this function by Graham when he worked on the annotation of L5 and D29, so now we are using synteny more than anything else to assign these functions, as in, "Those big, long, extended proteins at the left end of Cluster A genomes that are not lysins are tail proteins." A lot of minor tail proteins get assigned functions by synteny: they are the right size and in the right place in the genome, so that's what they have to be.

Make sense?
Posted in: Functional Annotation > Excellent BLAST but poor HHPred
| posted 22 Mar, 2018 15:08
Cluster H genomes have a non-canonical frameshift sequence in their tail assembly chaperones. While we are pretty sure we know where it is based on sequence alignments with Send513, we will not include it in our annotations until we have wet-bench data to back it up.
Posted in: Cluster H Annotation Tips > frameshifts
| posted 22 Mar, 2018 15:07
Cluster H genomes have a large gap in the center with no CDS or tRNA features. We don't know what goes in there and it is OK to leave it empty.
Posted in: Cluster H Annotation Tips > a big gap
| posted 14 Mar, 2018 17:43
This is in the Online Bioinformatics Guide, and restated here:

Here is a list to highlight what we look for during the initial annotation review. Initial review is an iterative, rather than exhaustive, process, and your files may be returned to you after we identify any one of the following issues. Please use a request for revision as an opportunity to double-check all the items below, and it will be less likely that your files will be returned again.

Most common reasons files are returned for revision during annotation preliminary review:

1. Missing the official SEA-PHAGES cover sheet and checklist.
We designed this sheet to highlight the items that should be addressed for every annotation. If you don't use it, we don't know if you've addressed them.

2. No annotation of the programmed translational frameshift in the tail assembly chaperones.
We can't identify this region for all clusters, but in those where we can, it should be addressed. See the online guide and the Cluster-Specific annotation tips for help. Frameshifts should ONLY be added for the tail assembly chaperones, as those are the only genes that have wet-bench evidence for a shift.

3. tRNAs are not correctly trimmed. Remember that the autoannotation tRNA predictions may not be correct and should be reviewed.

4. Functions do not match Official SEA-PHAGES function list.
Unfortunately, we have to be extra picky about this — spelling, capitalization, and extraneous punctuation all matter as we move towards automated curation of our data. The computer thinks everything written out a different way is a different thing. Even hidden carriage return marks can interfere with downstream formatting.
If you think your genome contains a gene with a function that is not on the list, create a new thread on the "request a new function for the official function list" forum.

5. File formatting for any of the required files is not up-to-date. The old DNA Master Annotation Guide is out of date and no longer being updated; please use the new Online Guide.

6. The phage page is missing the GPS coordinates and the complete name of the student(s) who found the phage. This information becomes part of the GenBank file, and we can't submit the files until we have that information. This is also a good time to check that the phage page record is complete and accurate.

7. Flagrant violations of the Guiding Principles of annotation: genes annotated on top of each other, huge gaps with no genes predicted and no explanation, etc.

8. No evidence that programs like HHpred and Starterator were used. These programs are essential for the identification of gene starts and accuracy of functional assignments.

9. Annotation is missing, or has too many of, the most common phage functions as laid out in "Functions present in (almost) all phage genomes" in the Online Bioinformatics Guide.
This page is a quick reference guide to help you out with things like "how many portal genes should I expect to find in my genome?".
Exceptions to this will be noted in the Cluster-specific annotation forums.

10. Finally, to generate the best annotations, please refer to the Cluster-Specific annotation forums. We've added lots of tips to help you out.
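
To illustrate why item 4 above is so strict, here is a minimal sketch of the kind of exact-match check an automated pipeline might run. Everything in it is an assumption for illustration: the function `check_function_name` is hypothetical, and the two-entry `approved` list is a stand-in, not the Official SEA-PHAGES function list.

```python
from typing import List


def check_function_name(name: str, approved: List[str]) -> str:
    """Compare one function call against an approved list, character for character."""
    if name in approved:
        return "ok"
    if "\r" in name or "\n" in name:
        # Hidden carriage returns or newlines often arrive via copy and paste.
        return "hidden line break"
    if name != name.strip():
        return "stray whitespace"
    by_lowercase = {entry.lower(): entry for entry in approved}
    if name.lower() in by_lowercase:
        return f"capitalization differs from '{by_lowercase[name.lower()]}'"
    without_punct = name.rstrip(".,;:")
    if without_punct in approved:
        return f"extra punctuation; expected '{without_punct}'"
    return "not on the list"


# Stand-in list for illustration only; use the official list in practice.
approved = ["minor tail protein", "tail assembly chaperone"]

for call in ["minor tail protein", "Minor Tail Protein",
             "minor tail protein.", "minor tail protein\r"]:
    print(repr(call), "->", check_function_name(call, approved))
```

To a program doing a plain string comparison, all four of those calls are different things, which is why only the first one passes.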

If you have any questions, please ask. We are happy to help!
Edited 04 Apr, 2018 19:31
Posted in: How to Pass Preliminary Annotation Review > How to Pass Preliminary Annotation Review
| posted 14 Mar, 2018 16:11
The MPMEs (mycobacteriophage mobile elements) are very small and closely related. So how do you tell them apart?

If your MPME is IDENTICAL to BPs 58, you have MPME1.

If your MPME is IDENTICAL to Fruitloop 71, you have MPME2.

If your MPME is not IDENTICAL to either, let us know! Maybe you found a new one!

See Sampson et al. for more information.
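
The identity test above can be sketched in a few lines. This is only an illustration: `classify_mpme` is a hypothetical helper, and the reference strings below are short placeholders, not the real sequences of BPs 58 or Fruitloop 71, which you would need to supply yourself.

```python
from typing import Dict


def normalize(seq: str) -> str:
    """Drop whitespace and line breaks and uppercase, so only residues are compared."""
    return "".join(seq.split()).upper()


def classify_mpme(candidate: str, references: Dict[str, str]) -> str:
    """Return the label of the reference that is IDENTICAL to the candidate."""
    query = normalize(candidate)
    for label, ref_seq in references.items():
        if query == normalize(ref_seq):
            return label
    return "no identical match"


# Placeholder sequences for illustration only.
references = {
    "MPME1": "ATGAAACCC",  # stands in for BPs 58
    "MPME2": "ATGAAACCG",  # stands in for Fruitloop 71
}

print(classify_mpme("atgaaa ccc", references))
print(classify_mpme("ATGAAACCT", references))
```

The comparison is deliberately all-or-nothing: a single difference means the sequence matches neither reference and is worth reporting.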
Posted in: Cluster G Annotation Tips > MPMEs--which one?
| posted 14 Mar, 2018 14:46
The MPMEs (mycobacteriophage mobile elements) are very small and closely related. So how do you tell them apart?

If your MPME is IDENTICAL to BPs 58, you have MPME1.

If your MPME is IDENTICAL to Fruitloop 71, you have MPME2.

If your MPME is not IDENTICAL to either, let us know! Maybe you found a new one!

See Sampson et al. for more information.
Posted in: Cluster F Annotation Tips > MPMEs--which one?
| posted 14 Mar, 2018 14:41
Cluster F phages can have either +1 or -1 frameshifts in the tail assembly chaperones. Make sure you know which one your phage has.
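
One way to check which shift you have annotated is to look at the coordinates of the two segments of the joined gene. The sketch below assumes a forward-strand gene with 1-based, inclusive coordinates, where a -1 shift re-reads one base (the second segment starts on the last base of the first) and a +1 shift skips one base. The function name `frameshift_direction` and the example coordinates are hypothetical.

```python
def frameshift_direction(first_segment_stop: int, second_segment_start: int) -> str:
    """Infer the shift from where the second segment starts relative to the first.

    Assumes a forward-strand gene and 1-based, inclusive coordinates
    (assumptions of this sketch, not a rule from any particular tool).
    """
    shift = second_segment_start - first_segment_stop - 1
    if shift == -1:
        return "-1"  # one base is read twice
    if shift == 1:
        return "+1"  # one base is skipped
    raise ValueError(f"segments imply a shift of {shift}; check the coordinates")


# Hypothetical coordinates for illustration.
print(frameshift_direction(1000, 1000))  # -1
print(frameshift_direction(1000, 1002))  # +1
```

If the function raises an error, the two segments are either contiguous (no shift at all) or too far apart to be a single-base slip.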
Posted in: Cluster F Annotation Tips > frameshifts