The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 16 Mar, 2023 21:08
I would discount the observation that "majority of the functions as hypothetical" because most of those "hypothetical proteins" were probably annotated before the above discussion, and thus before "helicase loader" was added to the approved terms. So in this case, "absence of evidence is NOT evidence of absence". New crystal structures are constantly being published, and the approved terms list is a living document that changes as we find new functions and fine-tune the nomenclature. This is one of those places where I would tell a student, "This is why we keep doing annotation by hand: we keep learning more and more, and we keep getting better and better at annotation".

Looking at the positive evidence, though, this call is indeed tricky. There are several HHPRED hits suggestive of helicase loader that all have really high probability but only about 40-50% coverage, so this is where reasonable annotators can disagree. Looking at the crystal data here, I can see that the part that does not align is "disordered", so one could argue that a strictly similar structure in this region is not required for function (as this region is not highly structured in the crystallized protein). The failure to match at the structural level is therefore not good evidence that this new protein is not another example of a helicase loader. Bottom line: because this region is disordered, I discount the fact that HHPRED is not matching it (i.e. it weakens the negative result). I don't think I ever like adding annotations on just a single piece of weak evidence, even if I can make a handwavy argument for why it is weak, so I would want more evidence. My own sense would be to look for synteny evidence to strengthen the call for a helicase loader. Since genes for proteins that interact are much more often found near each other in phage genomes, you might find some positive results that give you more confidence you have a helicase loader. Is this gene near other genes that look to be part of a replisome, or near some type of helicase? If you find a nearby helicase, then you have found additional evidence. Synteny is never strong evidence, but combining two pieces of weak evidence (synteny and partial HHPRED) can sometimes provide sufficient evidence and give you the confidence to "make the call".
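For anyone who likes to tabulate this kind of evidence, here is a minimal sketch of the two checks described above: how much of the query an HHPRED alignment covers, and whether a gene with a related function sits nearby. The `Hit` layout, the three-gene window, the example numbers, and the 90% / 0.4 cutoffs are all assumptions made for illustration, not SEA-PHAGES standards, and the final call still belongs to the annotator.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One HHPRED hit, reduced to the fields used here (hypothetical layout)."""
    description: str
    probability: float  # HHPRED probability, in percent
    query_start: int    # first aligned residue of the query (1-based)
    query_end: int      # last aligned residue of the query (inclusive)


def query_coverage(hit, query_length):
    """Fraction of the query protein covered by the alignment."""
    return (hit.query_end - hit.query_start + 1) / query_length


def has_nearby_function(functions, index, keyword, window=3):
    """True if a gene within `window` genes of `index` mentions `keyword`."""
    lo = max(0, index - window)
    hi = min(len(functions), index + window + 1)
    return any(
        keyword in functions[i].lower()
        for i in range(lo, hi)
        if i != index
    )


# Made-up numbers, for illustration only.
hit = Hit("helicase loader", probability=99.0, query_start=1, query_end=90)
coverage = query_coverage(hit, query_length=200)

# Made-up gene order around the gene of interest (index 2).
neighbors = ["hypothetical protein", "DNA helicase", "hypothetical protein",
             "DNA primase", "tail fiber"]
synteny = has_nearby_function(neighbors, index=2, keyword="helicase")

# Assumed cutoffs: high probability but only partial coverage.
partial_hit = hit.probability >= 90.0 and coverage >= 0.4
print(f"coverage={coverage:.2f} partial_hit={partial_hit} synteny={synteny}")
```

Neither flag is strong on its own; the point is only to record both pieces of weak evidence side by side before deciding.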
Posted in: Request a new function on the SEA-PHAGES official list > phage helicase loader protein
| posted 03 Mar, 2023 16:45
Unfortunately, being an "Orpham" means that the protein has no other similar proteins and is thus placed in its own pham group. Starterator gathers evidence for start codons from evolution, by comparing how the genes in a pham have been conserved and have changed over time. For orphams, however, with only one member in the pham, there is nothing to "compare", so there is nothing to report. Start codon choice will just have to proceed without evidence from comparative evolution and rely on whatever other evidence is available.
Posted in: Starterator > Pham not found in Starterator
| posted 02 Mar, 2023 17:11
DeepTMHMM is working for me again, but I had to create an account. It was still failing when I tried to use DeepTMHMM as a guest. I used my GitHub account to sign in through OAuth, but it looks like you might be able to just create an account de novo with an email address.
Posted in: Functional Annotation > Deep TMHMM?
| posted 01 Mar, 2023 21:55
Not working for me on the web page either. I did not test the command line, but the error log for my web run shows that the program was not able to initialize at the biolib cloud service. Unfortunately, the web page and the command line both submit to biolib, so if biolib is down not much can be done until they fix the service.

I cannot seem to find any way to get the status of the biolib cloud service, so if anyone knows how to do that, I would love to hear it.

There is a way to run the whole analysis locally, but it requires access to specialized GPU hardware, so we are really just stuck waiting for the biolib service to be restored.

Hopefully they are aware of this issue and are working on a solution. Again, I can't find a status page anywhere, so it is hard to know.
Posted in: Functional Annotation > Deep TMHMM?
| posted 29 Jan, 2023 22:44
Just an FYI, I have been trying to reach PECAAN all day today with no success. This website is also reporting that it is down. Emails and messages have been sent to Clair and Dex.
Posted in: PECAAN > PECAAN Down?
| posted 11 Jan, 2023 20:24
When presented with this issue of specificity (i.e. should one annotate using the more specific term that includes "MerR like"), I like to first check the Pfam hits. Others prefer the Conserved Domain Database (CDD) hits; both are fine. I am just used to the Pfam dataset, and I think the hidden Markov models that underlie the Pfam database (instead of the PSSM data in CDD) are more sensitive.

You can add the Pfam database to an HHPRED search. I usually don't add this database by default, but I would go back and add it in a situation like this. In this case, your protein hits PF13411.9 ; MerR_1 ; MerR HTH family regulatory protein with 98% probability and 75% alignment coverage. So in terms of assessing the quality of the hit, I would say it is borderline (for Pfam hits I do like nearly full length). But since this is just a specificity issue (i.e. there is plenty of evidence from your search that there is indeed an HTH in this protein), I would tell a student that, yes, this Pfam hit is sufficient evidence to support adding the "MerR like" term to the function.

Note that some of the other proteins in your search also mention the MerR like domain, but you need to be careful not to take that as evidence at face value. Since HHPRED will happily report partial alignments, to use that evidence you would need to confirm that the MerR like HTH domain in the description is also in the region of the alignment. This can easily be done; it just usually takes a few clicks to dig into other databases, and thus more time. When looking for a good Pfam hit, by contrast, all the evidence you need is right there on the HHPRED results page, so my preference for Pfam is more an expediency issue than a "quality of evidence" issue.
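The "is the named domain actually inside the aligned region" check is just an interval overlap. Here is a small sketch, assuming you have looked up the residue range of the MerR like HTH domain in the hit protein and read the aligned range of that hit off the results page. The function names, the example coordinates, and the 0.8 cutoff are hypothetical choices for illustration.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive, 1-based ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def domain_in_alignment(domain, aligned, min_fraction=0.8):
    """True if at least `min_fraction` of the domain lies in the aligned region.

    Both ranges are (start, end) tuples in the coordinates of the hit protein.
    The 0.8 default is an assumed cutoff, not an established standard.
    """
    d_start, d_end = domain
    shared = overlap_length(d_start, d_end, aligned[0], aligned[1])
    return shared / (d_end - d_start + 1) >= min_fraction


# Made-up coordinates: a domain at residues 5-70 of the hit protein.
domain = (5, 70)
print(domain_in_alignment(domain, aligned=(1, 90)))     # alignment spans the domain
print(domain_in_alignment(domain, aligned=(120, 200)))  # alignment misses the domain
```

In the second case the hit description may still mention the domain, but the alignment says nothing about whether your protein has it.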
Edited 11 Jan, 2023 20:24
Posted in: Annotation > helix-turn-helix binding domain or protein?
| posted 19 Dec, 2022 23:20
OK, I just tried this out and everything worked. I now have DNA Master running in Windows 11 for ARM using the free "for personal use" version of VMware Fusion 13, so this is a solution that does not require buying Parallels.

With all the registering, downloading and configuring, it took about 2.5 hours. The video is very detailed and goes through all the steps, so if you are at all hesitant I recommend watching the video linked above. If you want to just try it, I would add the following steps:

Step 4: after the download you need to install Homebrew on your Mac, then use Homebrew to install the QEMU suite, then use qemu-img to convert the Windows 11 client download from VHDX format to VMDK format for use with VMware.

Step 7: you will get stuck trying to get Windows to start up, because installation now requires an internet connection, but you cannot install the network drivers and get internet until you install VMware Tools, which you cannot do until you start up Windows. So check out the video or search for how to boot Windows 11 without internet.
Finally, to get everything working once Windows fully boots, use PowerShell as administrator to run the VMware Tools installer from the virtual CD.
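The conversion in step 4 comes down to a single `qemu-img convert` call once QEMU is installed (`brew install qemu`). Below is a rough sketch that wraps it in Python. The file names are hypothetical placeholders, so substitute the actual name of your download, and treat the wrapper itself as an illustration rather than the exact procedure from the video.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target):
    """Return the qemu-img argument list that converts a disk image to VMDK.

    -p shows progress and -O sets the output format; qemu-img detects the
    input format (VHDX here) on its own.
    """
    return ["qemu-img", "convert", "-p", "-O", "vmdk", str(source), str(target)]


def convert_vhdx_to_vmdk(source, target):
    """Run the conversion; requires qemu-img to be on the PATH."""
    if shutil.which("qemu-img") is None:
        raise RuntimeError("qemu-img not found; install it with: brew install qemu")
    if not Path(source).is_file():
        raise FileNotFoundError(source)
    subprocess.run(build_convert_command(source, target), check=True)


# Hypothetical file names, shown only to print the command that would run.
print(" ".join(build_convert_command("Windows11_ARM64.vhdx", "Windows11_ARM64.vmdk")))
```

Typing the printed command directly into Terminal does the same thing; the wrapper only adds the checks for a missing tool or a mistyped file name.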

I too have not been able to get file sharing or copy/paste set up. According to this page:
and this page:
these features are not yet supported for M1 Mac/Windows 11 for ARM. See the suggested workarounds.
Posted in: DNA Master > DNA Master on M1 Mac
| posted 29 Sep, 2022 15:37
if it is asking for the user name that is:

for the password use:
Posted in: Bioinformatic Tools and Analyses > PhamNexus on SEA-VM
| posted 18 Aug, 2022 22:16
That was very helpful. The HHPRED results will only be available for a short while and so I am posting the protein sequence of the example above so that anyone who wishes can redo the search.

>CAG9959048.1 RecB-like helicase [Campylobacter jejuni]
Posted in: Cluster P Annotation Tips > RecB-like exonuclease/helicase or Cas4 family exonuclease?
| posted 08 Aug, 2022 19:10
Q1: No assembler that I know of is aware of scientific standards about which strand should be the top strand and which should be the bottom strand. These standards are set community by community and so vary from one system to another. For example, in eukaryotes we typically use the standard set by the cytologists and how they present whole chromosomes. Thus, in a very high quality assembly (where we probably have evidence for the locations of centromeres and telomeres), we will publish the sequence to match the typical cytological display.

The phagesdb community has standards for determining base 1 and strand. For all our phages, base 1 depends on the type of phage end structure, while the strand is usually picked so that the structural genes are on the top strand and near the beginning of the sequence. Dan posted some videos here with lots of help on this, but you need to be able to look at the raw assembly to answer some of these questions.

Finally, a lot of published sequences are actually sequences of prophage, and base 1 and orientation are set by the location of the insertion site and the standard orientation for the host genome. [Based on your gene matching, I think this is the case for NC_015296.1.] There is a collection of phages like this in the phamerator database where the order and orientation of the sequence have been changed from the GenBank record so as to match (as best as possible) the typical order in the phamerator database. This makes drawing and interpreting the comparison maps much easier.

Q2: I would recommend you set your base 1 and strand using a similar strategy; that is to say, pick the base 1 and orientation that make the subsequent steps of comparison as easy as possible. That of course depends on what you're comparing your genome to. The good news here is that DNA Master has a nice feature if you want to "roll" the genome around to set a different base 1, as well as the ability to switch to the complementary strand.
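If you want to see what those two operations do to the sequence itself, here is a minimal sketch. It assumes the contig represents a circular or circularly permuted assembly, so that moving bases from the front to the back does not change the genome. The function names are made up for this example and are not DNA Master's.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def roll(seq, new_base1):
    """Rotate the sequence so that 1-based position `new_base1` becomes base 1."""
    if not 1 <= new_base1 <= len(seq):
        raise ValueError("new_base1 must be between 1 and the sequence length")
    i = new_base1 - 1
    return seq[i:] + seq[:i]


contig = "AACCGGTT"        # toy sequence, for illustration only
rolled = roll(contig, 3)   # position 3 of the contig becomes base 1
flipped = reverse_complement("AACCGT")
print(rolled, flipped)
```

If you need to do both, decide the order first: a base 1 position chosen on one strand lands at a different coordinate once the strand is flipped.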
Posted in: Newbler > Getting Started with Phage Assembly