The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


Function for Gene 1 of EK1 Phages

| posted 04 May, 2020 14:13

I'm finalizing a genome for an EK1 phage, and gene 1 has historically been annotated as NKF. For our phage (Pabst), we have significant HHPRED hits for the gene (see attached screenshot) that suggest this gene may be a sigma factor. There are many hits like this as you scroll through the HHPRED list.

The NCBI BLAST hits tell a similar tale, although the statistics are not as good: with BLAST we see E-values of 0.02-0.09 for hits to sigma factors. The Conserved Domain Database does not find anything. I was hoping for input on how to treat this gene: leave it as NKF or assign a function? I'd feel better about assigning a function if both the HHPRED and BLAST results gave a clear answer, and for this reason I am leaning toward leaving it as NKF. However, I don't want to ignore the HHPRED data, so I wanted to share it with the community to get feedback. Thank you!

Jamie Wallen
| posted 04 May, 2020 20:10
Hi. The difference between BLAST hits and HHpred hits comes from the fact that HHpred builds on BLAST and PSI-BLAST-style profile searches to reach more distant relationships. So an HHpred hit will always be more of a "stretch" toward the function you want to call than a BLAST hit. The problem with BLAST hits is that they show homology but do not validate the function the hit is labeled with. So if someone mislabeled a gene function, the error can easily be perpetuated.
For all of these reasons, I am not bothered that the BLAST confidence and the HHpred results do not match.

I would call this gene a Hypothetical Protein because I don't know enough about DNA polymerases to know how necessary a sigma factor would be. Do you?
I think the hits are to an RNA sigma factor? Does that fit? What does your expertise tell you about the collection of genes in the vicinity?

Your insight would be appreciated!

| posted 04 May, 2020 21:18
Probably an RNA polymerase sigma factor? I've just been looking at the HHpred data for the reference sequence Nerujay_52 provided in the Official Functions list, and this is what I see (also attached for comparison):
Edited 04 May, 2020 21:20
| posted 05 May, 2020 01:41
Hi Debbie and Fred,

Yes, the hits I am seeing are for an RNA sigma factor. This protein lies next to several DNA metabolism genes (primpol, helicases, etc.) and reminds me of the T7 system (also a podo). T7 brings in its own RNA polymerase (gene 1), which is used to transcribe its genes, and the major DNA replication genes lie near the RNA polymerase gene (gene 2.5 = ssDNA-binding protein, gene 4 = primase/helicase, and gene 5 = DNA polymerase). If this protein is a sigma factor, then perhaps these phages use a similar type of system: the podos we have discovered do not bring in their own RNA polymerase, but they may bring in a sigma factor that directs transcription (via the host RNA polymerase?) of their DNA metabolism genes. It seems that this protein, and the DNA metabolism genes mentioned above, are unique to the podos in EK, EK1, EK2, and EM.

Debbie, I agree with your decision to call this a hypothetical protein based on the current data (or lack thereof). If I had additional solid data to go along with the HHPRED results, I'd feel better about assigning a function.

| posted 14 May, 2020 18:12
Debbie, I just want to make certain I am making calls correctly. The responses above left me a bit uncertain, so I'm going to ask a couple of follow-up questions. Admittedly, I have not looked closely at Jamie's genome, but the responses above are making us question a call we made on phage Heath gp71.

If I may, let's first look at this from the perspective of gene calls in general.

1. If I were looking at HHPred results for any gene and saw 10 or more hits, all with similar protein functions, each covering nearly half the protein with a probability of over 98% and an E-value of 10^-6 or lower, I would lean toward calling the function HHPred identified. Wouldn't this be strong enough supporting evidence to at least consider that function?

2. If the top HHPred hit for that same gene were also the second HHPred hit for the Official Function List reference gene, I would again feel more confident in the function call, even if no one else in the SEA had noticed it previously. Is this consideration wrong?

3. If that top HHPred hit were a protein from Mycobacterium tuberculosis, I would feel even more confident that the call is correct, even though no other PhagesDB BLAST hits to the protein have identified this function. Is this consideration wrong?

At that point, I would likely assign function based upon HHPred analysis results alone.
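Criteria 1 and 2 above are concrete enough to write down as simple filters over an HHpred hit table. The Python below is a minimal sketch under stated assumptions: the HHpredHit record, its field names, and both helper functions are hypothetical and do not reflect HHpred's actual output format; the 0.45 coverage cutoff is an assumed reading of "nearly half the protein"; and hits are ranked by probability. Criterion 3 (the source organism of the hit) is left to judgment.

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    """One row of an HHpred hit table (hypothetical, simplified fields)."""
    target: str         # hit identifier, e.g. a PDB chain such as "6DVC_F"
    function: str       # function label the annotator assigns to the hit
    probability: float  # HHpred probability, in percent
    evalue: float       # HHpred E-value
    query_start: int    # first aligned query residue (1-based)
    query_end: int      # last aligned query residue (1-based, inclusive)


def supports_function(hits, function, query_length,
                      min_hits=10, min_prob=98.0,
                      max_evalue=1e-6, min_coverage=0.45):
    """Criterion 1: do enough strong hits agree on one function?

    min_coverage=0.45 is an assumed stand-in for "nearly half the protein".
    """
    strong = [
        h for h in hits
        if h.function == function
        and h.probability > min_prob
        and h.evalue <= max_evalue
        # fraction of the query protein covered by this alignment
        and (h.query_end - h.query_start + 1) / query_length >= min_coverage
    ]
    return len(strong) >= min_hits


def top_hit_in_reference(query_hits, reference_hits, top_n=2):
    """Criterion 2: is our best hit among the reference gene's top_n hits?

    Hits are ranked by probability here, which is an assumption.
    """
    if not query_hits or not reference_hits:
        return False
    best = max(query_hits, key=lambda h: h.probability)
    ref_top = sorted(reference_hits, key=lambda h: h.probability,
                     reverse=True)[:top_n]
    return best.target in {h.target for h in ref_top}
```

For example, ten hits labeled "RNA polymerase sigma factor", each with 99% probability, an E-value of 1e-8, and query residues 10-60 aligned on a 110-residue protein, pass the first check (51/110, about 0.46 coverage). Passing both checks only flags a call for closer review; the gene-neighborhood and BLAST considerations discussed in this thread still apply.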

Again, I don't know all the details of Jamie's phage or how strong his hits are. We were about to switch the call to "Hypothetical Protein" based on the response above, but my gut tells me I need to ask these questions to make certain. I may have been doing this wrong in the past. IDK.

Here is the link to the HHPred output for the gene we are now questioning (Heath_gp71):

That linked output will likely expire in 20-30 days, so I will attach the results as a PDF for future reference.

The HHPred hit 6DVC_F (our top hit) is also the second hit for the RNA Polymerase Sigma Factor reference phage gene (Nerujay_52). We thought this would be enough supporting evidence.

But we don't want to make the call if we shouldn't. We also don't want to jump on the bandwagon and call it the same way everyone else has if this is sufficient supporting evidence to make a functional call.

Thanks for any additional clarification you can provide on using HHPred hits alone for function assignment.

| posted 19 May, 2020 12:31
Based on Jamie's discussion (Thanks Jamie!), let's call this Hypothetical Protein.

Greg - BTW and FYI - I asked Jamie because he studies DNA polymerases.
| posted 21 May, 2020 20:36
I've got a similar result to Nerujay_gp52 (41973). In this case I think there is strong evidence that it is an RNA polymerase sigma factor, specifically an extracytoplasmic function (ECF) sigma factor. ECF sigma factors are a large family of diverse alternative sigma factors that regulate genes in response to different environmental changes. Attached is the HHPRED alignment between Jorgensen_51 and sigmaL from M. tuberculosis. ECF sigma factors are smaller than other sigma factors and are composed of two conserved domains (R2 and R4) connected by an unstructured linker. The crystal structure (6DVC) is of RNA polymerase, sigmaL, and DNA, and it demonstrates an interaction between R2 of sigmaL and the -10 region of the non-template strand.

In light of the other discussion on sigma factors...

Should this be called an RNA polymerase ECF sigma factor (a new function) or a transcription factor (not on the list)? Alternatively, RNA polymerase sigma factor (already on the list) seems reasonable.