The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

RecB-like exonuclease/helicase or Cas4 family exonuclease?

| posted 03 Aug, 2022 18:32
Greetings all!

We generated our R2I for Langerak and were pleased to see that we had made all functional assignments correctly except for one: Gene 46 (31633-32562). This gene belongs to pham 37230. On Phamerator, the other most closely related P1 cluster phages (Donovan, FirstPlacePfu, HUHilltop, Jebeks and Techage) all have this same pham, and the assigned function is RecB-like exonuclease/helicase. On the SEA-PHAGES function list, we are told the following:

RecB-like exonuclease/helicase: If both a helicase and nuclease domain are present, the RecB label should be used.

Cas4-family exonuclease: This family of exonucleases is similar to the exonuclease domain of RecB. The Cas4 label should be used if the gene includes only the exonuclease region. If the gene also includes a helicase domain, the RecB label should be used. Cas4 family nucleases tend to have alignments to the crystal structures 4R5Q_A and 4IC1_A, and to the PD-(D/E)XK nuclease superfamily (PF12705.7, among others).

On HHpred for Langerak, we did get hits to the Cas4 family nucleases mentioned in the above paragraph. But we also got hits to helicases: 3U4Q_B, 4CEI_A, 6PPU_B, 1W36_E, and others. We decided to assign the function as RecB-like exonuclease/helicase because (1) we thought we had fulfilled the requirement stated on the official SEA-PHAGES function list that we should have hits to both helicases and exonucleases, and (2) all of the other P1 cluster phages with the same pham used this functional assignment on Phamerator.

But we were marked wrong.

I'd like to get this right going forward, so could someone please help me out?

As always, thanks for helping me "review to improve"!

Best wishes,
| posted 04 Aug, 2022 19:08
Hi Kathleen,

The exonucleases are confusing! You are correct that Langerak_46 has HHPred hits to nuclease/helicases as well as the Cas4-family exonucleases. However, the nuclease/helicase hits are only aligning to the C-terminal parts of those proteins (see attached screenshots). My interpretation is that Langerak_46 is aligning to the exonuclease domains of the RecB-like proteins, but it does not contain the helicase domain. So I think the best call given the information in the official function list is Cas4-family exonuclease.

The prototype gene product in the official function list for RecB-like exonuclease/helicase is RedRock_72, but the HHPred results for that protein show alignment only to the exonuclease, and in fact that gene is currently annotated as Cas4 exonuclease. I will try to find an example of a RecB-like protein so you can see how to identify the two domains.

I am also not entirely sure how to distinguish Cas4-family exonuclease from other members of the PD-(D/E)XK superfamily that show up as HHPred hits, but I am looking into that further.

Edited 04 Aug, 2022 19:13
| posted 05 Aug, 2022 14:56
Karen, this is very helpful, thank you. An example of a RecB-like protein would be very useful. I understand your thought process and your logic in identifying this protein as a Cas4 exonuclease. But why do so many other P1 cluster phages assign the function as RecB-like exonuclease/helicase? Shouldn't proteins in the same pham all have the same function?
Thanks again for your reply,
| posted 05 Aug, 2022 15:45
I am going to answer that question. It is because once a function is identified, it gets propagated. It doesn't matter who said it, or what the supporting evidence is.
For SMART, our focus this year has really been to work through these misidentifications. So questions like yours will help us all get there.
The best way to identify where the calls are difficult is to look at phams with multiple functional calls. We all need to work through the data and learn a little bit of structural biology/chemistry and biochemistry to make better decisions. It is not always clear, and we all bring a different understanding.
And the answer to your question about phams is no, not all members of a pham will have the same function. Simplistically, I can see why we think that they should. But if we understand how a pham is built, then we know that not everything in a pham is the same. A simple explanation is that one member may lack a 'domain' that another one has. Part of the gene is homologous, but a missing functional domain would imply a different function. Maybe!
Another important point is that context matters.
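The suggestion above, looking for phams with multiple functional calls, can be sketched in a few lines of Python. This is only an illustration: the records are made-up placeholders (hypothetical pham names and gene names), not an export from Phamerator or PhagesDB, and the function name is invented for the example.

```python
from collections import defaultdict


def phams_with_mixed_calls(records):
    """Return {pham: sorted list of functions} for phams with more than one distinct call."""
    calls = defaultdict(set)
    for pham, _gene, function in records:
        calls[pham].add(function)
    return {pham: sorted(funcs) for pham, funcs in calls.items() if len(funcs) > 1}


# Hypothetical toy records: (pham, gene, assigned function).
toy_records = [
    ("pham_A", "PhageOne_10", "RecB-like exonuclease/helicase"),
    ("pham_A", "PhageTwo_12", "Cas4-family exonuclease"),
    ("pham_B", "PhageOne_11", "RecT-like ssDNA binding protein"),
    ("pham_B", "PhageTwo_13", "RecT-like ssDNA binding protein"),
]

mixed = phams_with_mixed_calls(toy_records)
for pham, functions in mixed.items():
    print(pham, "->", functions)
```

A pham that shows up in this list is not necessarily wrong, since members can differ in domain content, but it is a reasonable place to start reviewing the evidence.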
| posted 08 Aug, 2022 22:54
Thanks, Debbie and Karen! I met with my students today, and I also consulted with a biochemistry colleague, and we investigated some of the other genes in pham 37230. There are 266 members of this pham, and according to phagesdb these genes are all about 800-1000 bp in length. I looked at some of the other pham members using the domain function in Phamerator (and I also did a BLASTp on NCBI to get the domain information for Langerak_46 and BiteSize_54). From what I can see, these proteins all have hits to exonuclease domains that cover the length of the protein, and no alignments with helicases. The encoded proteins also don't appear to be large enough to have both exonuclease and helicase functionalities. Is it possible that the earlier annotations of the P1 cluster phages of this pham as RecB-like exonuclease/helicase were incorrect?

I'm also curious about RedRock_72 as an example of a RecB-like exonuclease/helicase. I looked on phagesdb and found that gp72 in RedRock belongs to pham 24. Virtually all members of pham 24 are assigned a Cas4 exonuclease function (as Karen says in her post), and again, these genes are about the same size (800-900 base pairs), indicating that the proteins aren't large enough to have both helicase and nuclease functionalities. Maybe we don't have an example of a phage protein that has both functionalities because one doesn't exist, at least as far as we know?
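The size argument can be made concrete with a quick calculation. The sketch below assumes that the coordinates quoted earlier for Langerak gene 46 (31633-32562) are inclusive and include the stop codon; the helper names are invented for the example.

```python
def gene_length_bp(start: int, stop: int) -> int:
    """Gene length in bp from inclusive 1-based genome coordinates."""
    return abs(stop - start) + 1


def protein_length_aa(length_bp: int) -> int:
    """Encoded residues, assuming the coordinates include the stop codon."""
    if length_bp % 3 != 0:
        raise ValueError("gene length is not a multiple of 3")
    return length_bp // 3 - 1


langerak_46_bp = gene_length_bp(31633, 32562)
langerak_46_aa = protein_length_aa(langerak_46_bp)
print(langerak_46_bp, langerak_46_aa)  # 930 309
```

By the same arithmetic, genes of 800-1000 bp encode roughly 265 to 335 residues. For comparison, E. coli RecB, which carries both the helicase and nuclease domains, is roughly 1,180 residues long.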

Thanks to both of you for responding to my posts. My next task is to look over a QC'd genome that we recently received, and we got quite a few of the functional assignments wrong. As a biochemist, I ought to be good at this, and it's important for me to get this right so that I can properly mentor my students.

Thanks again,
| posted 17 Aug, 2022 17:23
Hi Kathleen,

After looking into this a bit more, I think I need to change my recommendation to RecE-like exonuclease for Langerak_46, based on the RecT-like ssDNA binding protein gene immediately downstream (Langerak_47). Please see this forum thread. I missed this as I was focused on the Cas4 vs RecB question but, as Debbie pointed out, context is important! The genes mentioned in that post, Che9c_60 and ShaboiShabazz_42, are in the same pham as Langerak_46.

This is a large pham including several clusters. Most (but not all) of the genes have a RecT-like ssDNA binding protein immediately downstream, and if they do they should be called RecE-like based on this analysis.

RecE-like exonucleases, Cas4 family exonucleases, and the exonuclease domain of RecB-like helicase/exonucleases all have very similar folds and motifs (all are part of the PD-(D/E)XK nuclease superfamily). All will likely have HHPred hits to crystal structures for RecE (e.g. 3H4R_A) and Cas4 (e.g. 4R5Q_A). Based on the information in the official function list and the forum thread linked above, the following guidelines might be useful for calling proteins with these HHPred hits:

a. If the protein also has a helicase domain, call RecB-like helicase/exonuclease
b. If the gene immediately downstream is a RecT-like ssDNA binding protein, call RecE-like exonuclease
c. If neither a nor b is true, call Cas4-family exonuclease
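Guidelines a to c can be written as a small decision function. This is a sketch only: it applies to proteins that already show the PD-(D/E)XK nuclease HHPred hits described above, the two boolean inputs are judgments the annotator still has to make by hand, and checking a before b when both hold is an assumption, since the guidelines do not state a precedence. The RecB label is spelled as in the official function list quoted earlier in the thread.

```python
def call_pdexk_nuclease(has_helicase_domain: bool, downstream_is_rect: bool) -> str:
    """Suggest a functional call for a protein with PD-(D/E)XK nuclease hits.

    has_helicase_domain: the protein itself also aligns to a helicase domain.
    downstream_is_rect: the gene immediately downstream is a RecT-like
        ssDNA binding protein.
    """
    if has_helicase_domain:  # guideline a (checked first by assumption)
        return "RecB-like exonuclease/helicase"
    if downstream_is_rect:  # guideline b
        return "RecE-like exonuclease"
    return "Cas4-family exonuclease"  # guideline c


# Langerak_46: no helicase domain, RecT-like gene (Langerak_47) immediately downstream.
print(call_pdexk_nuclease(has_helicase_domain=False, downstream_is_rect=True))
```

For Langerak_46 this gives RecE-like exonuclease, matching the revised recommendation in this post.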

I have not yet found an example of a phage RecB-like helicase/exonuclease. Here is the HHPred result for a bacterial RecB-like protein that has both domains, as an example.

| posted 18 Aug, 2022 22:16
That was very helpful. The HHPRED results will only be available for a short while and so I am posting the protein sequence of the example above so that anyone who wishes can redo the search.

>CAG9959048.1 RecB-like helicase [Campylobacter jejuni]