The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


New function? Hfq RNA binding protein

| posted 04 Feb, 2022 20:28
We are annotating JohnDoe, a cluster AZ phage. Gene 53 in related phages has been annotated as NKF (by us and others), but we see high probability on HHpred for similarity to Hfq RNA binding proteins. The only two RNA binding protein options on the approved list are "RNA binding protein" and "Ro RNA binding protein". Can this new function be added?

See attached doc.

All other AZ phages with this gene will need to be corrected.

| posted 10 Feb, 2022 03:48
We'll need to do some further investigation, I think. Without experimental data, I am a bit uncomfortable calling this anything more specific than an "RNA binding protein". I have attached an article that describes 3 different RNA binding protein families. How would one differentiate between them? Knowing that would help in this decision. Still thinking.
| posted 10 Feb, 2022 04:08
Hi Debbie. We'll use RNA binding protein, but I'll also have the students gather more evidence to see if they can differentiate between the different RNA binding proteins. This is an annotation we will change in all the AZ phages that have this gene as we complete the AZ harmonization.
| posted 10 Feb, 2022 04:17
Full disclosure: I don't know anything specific about RNA binding proteins, but I have ideas about how to address questions like these.

It looks to me as though the similarity of this protein to Hfq is restricted to the predicted fold, as opposed to actual amino acid sequence similarity. Accordingly, CD-Search on the Conserved Domains Database doesn't recognize anything about it. And of course, finding that kind of thing is one of the reasons we use HHpred.

If we take the fold prediction to be reliable, then one would certainly be within reason to hypothesize that it's an RNA binding protein. But I don't think the HHpred result should stand as the only evidence. In the absence of doing actual biochemistry, I think that one could look for support for that hypothesis by asking which amino acids of Hfq have interactions with RNA (which I'm sure is well-established and described in the literature) and ask whether at least some of the amino acids at the corresponding positions in this protein (as predicted by HHpred) look like they could have similar interactions, accounting for side chain biochemical properties like charge or hydrophobicity.
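As a rough illustration of that check, here is a minimal Python sketch. It assumes you have copied the two aligned sequences (with `-` for gaps) out of an HHpred alignment and have a list of RNA-contacting positions in the reference protein taken from the literature. The sequences and positions in the usage example are toy placeholders, not real Hfq data, and the side-chain grouping and the function name `compare_aligned_positions` are simplifications chosen for this example.

```python
# Coarse side-chain classes. This grouping is an illustrative simplification,
# not a standard; adjust it to whatever classification you prefer.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("STNQ", "polar"),
    **dict.fromkeys("AVLIMC", "hydrophobic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("GP", "special"),
}


def compare_aligned_positions(reference, query, ref_positions):
    """Compare query residues to reference residues at chosen positions.

    reference, query: aligned sequences of equal length, '-' marks a gap.
    ref_positions: 1-based residue numbers counted along the ungapped
                   reference sequence (hypothetical input from the literature).
    Returns a list of (ref_position, ref_residue, query_residue, verdict).
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")

    wanted = set(ref_positions)
    results = []
    ref_index = 0  # residue number in the ungapped reference

    for ref_aa, query_aa in zip(reference.upper(), query.upper()):
        if ref_aa == "-":
            continue  # insertion in the query; no reference residue here
        ref_index += 1
        if ref_index not in wanted:
            continue

        ref_class = SIDE_CHAIN_CLASS.get(ref_aa)
        if query_aa == "-":
            verdict = "gap"
        elif query_aa == ref_aa:
            verdict = "identical"
        elif ref_class is not None and ref_class == SIDE_CHAIN_CLASS.get(query_aa):
            verdict = "similar"
        else:
            verdict = "different"
        results.append((ref_index, ref_aa, query_aa, verdict))

    return results


# Toy usage with made-up sequences and positions
toy_result = compare_aligned_positions("MKQ-GYF", "MRQAG-L", [2, 3, 5, 6])
for row in toy_result:
    print(row)
```

A tally of "identical" and "similar" verdicts at the RNA-contacting positions gives a quick, informal sense of whether the hypothesis deserves a closer look; it is not a substitute for biochemistry.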

Either way, I'd argue against calling it any specific variety of RNA binding protein, like Hfq. In my view, the absence of sequence similarity (even if the structure is conserved) means that the Occam's razor argument is that it's convergent with Hfq, rather than homologous to it, and I don't think we should give the same names to proteins that are convergent. So my vote is not to call it Hfq, but to consider calling it an RNA binding protein if it has a handful of key amino acids in the right places based on Hfq structure/function.

Edited 10 Feb, 2022 04:19
| posted 10 Feb, 2022 05:28
Thanks, Mitch. I also know little about RNA binding proteins, but given the HHpred results, we will investigate whether key residues are conserved. Hfq is a phage protein, so the idea that this protein and Hfq evolved from a common ancestor seems possible? I'm not sure whether convergence or common descent is the simpler explanation. However, I haven't looked carefully at the sequence similarity (or lack of it).
| posted 10 Feb, 2022 14:11
Adam, the way I see it, the approach to identifying homology between any two genes/proteins is to seek evidence of sequence similarity; similar 3-D structures point to related function, but not necessarily common descent. Convergence is the null hypothesis. It's certainly not the case that evidence of common descent between genes/proteins with very limited sequence similarity can never be found, but I just don't see evidence of sequence similarity with these particular proteins using a couple of different tools (BLAST, CD-Search). That doesn't mean it didn't happen; but if it can't be detected, convergence is the default. Maybe someone will find evidence of sequence similarity using more sophisticated tools.

But let's say it were the case that this protein and Hfq could be demonstrated by some means to have a common ancestor. Is that enough to call it Hfq? If the criteria of sequence similarity and having key side chains in similar places are met, then I think the best we could say is that it's in a family, or superfamily, with Hfq. I don't know what the cutoff is or should be, but I would argue that if no sequence similarity above background can be detected, I don't think it's useful to give proteins the same exact designation.

There are a lot of ways a protein could evolve to bind RNA (or DNA), and there's been an enormous amount of opportunity to explore possibilities over evolutionary time - especially for bacteriophages, which replicate so often. It wouldn't surprise me at all to learn that in terms of 3-dimensional structure, similar solutions have been arrived at multiple times, and that this protein and Hfq have everything in the right place to do similar things.

That's my two cents, anyway (maybe three cents; I'll shut up now). I look forward to seeing what we find out about this protein - it's a fascinating case!
| posted 15 Feb, 2022 18:22
Hi all,
Sally Molloy also weighed in:
I attached a paper describing prokaryotic Hfq proteins (see figure 2). I think the protein actually has the conserved residues required for the two Sm domains of an Hfq protein including:

1) it has the conserved G in Beta2 that is found in all Sm proteins
2) It has the highly conserved hydrophobic residues characteristic of the first Sm domain
3) It has the highly conserved G of Sm1 but is missing the second highly conserved D of Sm1.
4) It has the absolutely conserved Q of alpha helix 1 and it has the highly conserved Y/F in Sm1

It is missing the YKH motif of Sm2 but instead has an HRS motif (the eukaryotic motif here is simply RG).

So it's pretty similar in terms of secondary structure and conserved amino acids to Gram-positive Hfq proteins. I think we can at least call it an RNA binding protein and maybe an Hfq protein.

Sally Molloy

I personally am inclined to call these proteins "RNA binding proteins".