The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

No frameshift in cluster BK1?

| posted 17 Mar, 2021 19:32
I have a question about the tail assembly chaperone protein in cluster BK1. We are annotating the Emma1919 genome.

Looking at all non-draft BK1 genomes in Phamerator, none of them show a tail assembly chaperone with a programmed translational frameshift. Instead, they show two consecutive tail assembly chaperones: the first one belonging to pham 5495 and the second belonging to either pham 22821 or pham 37280.

I ran an NCBI blastx between the Emma1919 genome and the second tail assembly chaperone (from pham 37280) in Gilson (Gilson_39), which is the most similar genome to Emma1919, and found an almost perfect match between Gilson and Emma1919, except at the start codon (Emma1919 has a GTG so has a V instead of M). But the protein is only 29 amino acids long.

Is this very short second protein legitimate? Or is it more likely that there is a translational frameshift in here that was previously missed? If so, how can we find the slippery sequence? The attached picture shows the six-frame translation of the region of interest. The first tail assembly chaperone (pham 5495) is highlighted in yellow, and the pham 37280 protein is highlighted in green. The region in blue in between the two ORFs doesn't contain any stop codons, which suggests to me that there might be a frameshift somewhere in the nucleotide sequence shown in red.

A few more data points:
GeneMark shows atypical coding potential in that whole blue/green region.
The entire region (bp 22601-23300) is highly conserved with Gilson, with only 6 bases different between the two.
The DNA sequence in the region where the frameshift has to happen doesn't contain a sequence that matches the known slippery sequences shown in the Bioinformatics Guide.
Edited 17 Mar, 2021 20:04
| posted 17 Mar, 2021 21:12
Joyce,

We've annotated several BK1 genomes at UNT (and are annotating 2 right now). The last line of your post gets at the crux of the matter. Without a clear slippery sequence, we cannot bioinformatically identify a frameshift, even if it seems likely that there is one in this area. That is the reason none of the annotations I've been involved in for this subcluster have a frameshift in the final annotation.

Lee
| posted 17 Mar, 2021 21:19
So is the idea that if there isn't a slippery sequence that matches one of the ones that have been experimentally determined, then we assume that there isn't a programmed frameshift?

Should we annotate the small, 29 aa coding sequence as a protein then? That's shorter than the minimum length given in the guiding principles.
| posted 17 Mar, 2021 21:25
Hi all,
I just looked at a couple of these.
In the paper referenced in the manual, the sequence XXXYYYZ is described as the canonical sequence. Right?
In the 2-3 sequences that I looked at, there is such a sequence: CCCAAATctt (position 22920 in Emma1919).
Lee, do your genomes have that sequence?
Is CCCAAAT canonical enough (it is not really found on the list)?
Does HHpred return tail assembly chaperone hits for either G or T?
What other evidence is needed?

I think that together we can come up with a really good answer to this (and then make changes to the other files).
Let me know what you think!
Lee - thanks for checking in on this one.
debbie
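
One way to make the XXXYYYZ check repeatable is a small pattern search. The sketch below is only an illustration: it assumes the loosest reading of XXXYYYZ (three identical bases, three identical bases, then any base), which is broader than the experimentally determined lists, and the function name find_heptamers is made up for this example. The two test sequences are 50-base excerpts of Emma1919 (BK1) and SaltySpitoon (BE) taken from the pham 5495 alignment posted later in this thread.

```python
import re

# XXXYYYZ: three identical bases, three identical bases, then any base.
# Assumption: no further restriction on X, Y or Z; published lists are narrower.
HEPTAMER = re.compile(r"(?=(([ACGT])\2\2([ACGT])\3\3[ACGT]))")


def find_heptamers(seq):
    """Return (0-based offset, heptamer) for every XXXYYYZ match, overlaps included."""
    return [(m.start(), m.group(1)) for m in HEPTAMER.finditer(seq.upper())]


# Excerpts copied from the pham 5495 alignment in this thread.
emma1919 = "AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG"
saltyspitoon = "AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC"

print(find_heptamers(emma1919))      # the BK1 excerpt
print(find_heptamers(saltyspitoon))  # the BE excerpt
```

On these excerpts the search reports CCCAAAT at offset 11 of the Emma1919 excerpt and nothing in the SaltySpitoon excerpt. A match only says the pattern is present; it is not evidence that the ribosome actually slips there.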
| posted 17 Mar, 2021 21:41
I would guess there is a slippery sequence here, but there is no way to find it because it has yet to be discovered in the lab. As an annotator I would never intentionally "make up" a slippery sequence. So even though there is likely a slippery sequence somewhere in that genome, I have no way to find it. This means I know I cannot get the "right" answer, so the best I can do is find the "least worst" answer. For me, the "least worst" is to annotate as much of the T region as I can as a gene. I know this is very likely wrong, but it is "less wrong" than the alternatives of either picking a slippery sequence with no support or having no gene annotated for that region at all. And yes, for many of the BK1s the "longest" form is really, really short, so we just annotate that tiny gene, give it the tail assembly chaperone function, and hope that anyone who runs across the annotation will know enough (or go to the literature to find out) about what is really going on here. But there is really no way to annotate these regions that works well for a naive reader.

But I agree with Deb: if we can come up with a hypothesis that makes sense based on the published properties of slippery sequences, then that is better than the current solution. I will look for the XXXYYYZ in our BK1s.

P.S. for those unfamiliar with the G/T nomenclature see this page:
https://seaphagesbioinformatics.helpdocsonline.com/article-6
Edited 17 Mar, 2021 21:46
| posted 17 Mar, 2021 22:38
What an interesting and cool question!
Here is an update with some more evidence:

I checked 5 BK1 genomes by hand and all have that CCCAAAT sequence. I then realized we should just look at all the sequences in the pham, so I looked at the multiple sequence alignment for pham 5495, which includes the G genes for BE and BK phages. CCCAAAT is found in all the BK1 G genes (gene numbers 29-35 in the alignment below), but it is not found in any of the BE G genes (gene numbers 52-67), which have CCCGGAA at that position. So if this is the slippery sequence, you have to argue either that it changed to CCCGGAA and is still slippery, or that the location of the slip has moved since the BE and BK genes diverged. This is fruitful ground for reasonable, well-trained annotators to disagree, since it all rests on individual estimates of how likely certain events are over evolution.

Do we have any evidence about how quickly slippery sequences turn over in the mycobacteriophages? That much more comprehensive set might be informative.

Alternatively, if you back up a few bases, there is a sequence that is conserved at 7 of 8 positions across all the phage sequences, and the one degenerate position is always a pyrimidine, i.e., AA(C/T)GACCC. This may not fit any pattern seen among the bench-validated slippery sequences, but the sample size there is low enough that I am not sure how much confidence we should put in those observed patterns.
SaltySpitoon_CDS_62        AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
MindFlayer_CDS_56          AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Wipeout_CDS_56             AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Quaran19_CDS_62            AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
TomSawyer_CDS_56           AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
JimJam_CDS_62              AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
PumpkinSpice_CDS_62        AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Starbow_CDS_56             AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Battuta_CDS_56             AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Birchlyn_CDS_55            AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Bordeaux_CDS_56            AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Karimac_CDS_57             AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
IchabodCrane_CDS_55        AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
LukeCage_CDS_57            AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAGGGCGGCGATGATGAC
StarPlatinum_CDS_58        AAGCTGAACGACCCGGAACTGGAAGCCGCAGCGAGGGCGGCGATGATGAC
Enygma_CDS_63              AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAGGGCGGCGATGATGAC
Genie2_CDS_58              AAGCTTAACGACCCGGAACTGGAAGCCGCAGCGAGAGCGGCGATGATGAC
BoomerJR_CDS_58            AAGCTTAACGACCCGGAACTGGAAGCCGCAGCGAGAGCGGCGATGATGAC
Yaboi_CDS_58               AAGCTTAACGACCCGGAACTGGAAGCCGCAGCGAGAGCGGCGATGATGAC
Wofford_CDS_57             AAGCTGAATGACCCGGAACTGGAGGCCGCAGCGAAGGCGGCTCTGATGAG
Evy_CDS_56                 AAGCTCAATGACCCGGAACTGATGGCCGCAGCAGCGGCGATAATGGAGAA
Jay2Jay_CDS_61             AAGCTCAATGACCCGGAACTGATGGCCGCAGCAGCGGCGATAATGGAGAA
Warpy_CDS_60               AAGCTCAATGACCCGGAACTGATGGCCGCAGCAGCGGCGATAATGGAGAA
Targaryen_CDS_59           AAGCTCAATGACCCGGAACTGATGGCCGCAGCAGCGGCGATAATGGAGAA
Sushi23_CDS_56             AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Teutsch_CDS_56             AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Tribute_CDS_54             AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Peebs_CDS_55               AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Cross_CDS_56               AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Samisti12_CDS_56           AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
EGole_CDS_56               AAGTTGAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
NootNoot_CDS_52            AAGCTTAACGACCCGGAACTGATGGCCGCAGCGGCGGCAATGATGGAGAA
Paradiddles_CDS_52         AAGCTTAACGACCCGGAACTGATGGCCGCAGCGGCGGCAATGATGGAGAA
Bartholomune_CDS_54        AAGCTTAACGACCCGGAACTGATGGCCGCAGCGGCGGCAATGATGGAGAA
Braelyn_CDS_55             AAGCTTAACGACCCGGAACTGATGGCCGCAGCGGCGGCAATGATGGAGAA
LilMartin_CDS_53           AAGCTGAATGACCCGGAACTGATGGCCGCAGCAGCGGCAGTGATGGAGCA
MulchMansion_CDS_53        AAGCTGAATGACCCGGAACTGATGGCCGCAGCAGCGGCAGTGATGGAGCA
Mildred21_CDS_54           AAGCTAAATGACCCGGAACTGATGGCCGCAGCGGCGGCAGTGATGGAGCA
Bmoc_CDS_54                AAGCTGAATGACCCGGAACTGATGGCCGCAGCAGCGGCAGTGATGGAGCA
Daubenski_CDS_57           AAGCTCAACGACCCGGAACTGATGGCCGCAGCAGCGGCAGCGATGGAACA
Tomas_CDS_67               AAGCTGAATGACCCGGAACTGATGGCCGCAGCAGCGGCAGCAGTGGAGAG
Annadreamy_CDS_31          AAGTTGAATGACCCAAATCTTCTAGCGGCGGCTCAGGAGGCTCTTGGGAA
Limpid_CDS_31              AAGTTGAATGACCCAAATCTTCTAGCGGCGGCTCAGGAGGCTCTTGGGAA
Beuffert_CDS_32            AAGTTGAATGACCCAAATCTTCTAGCGGCGGCTCAGGAGGCTCTTGGGAA
Blueeyedbeauty_CDS_33      AAGTTGAATGACCCAAATCTTCTAGCGGCGGCTCAGGAGGCTCTTGGGAA
Sham_CDS_32                AAGCTCAACGACCCAAATCTTCTAGCGATGGCTCAGGAGGCACTTGGAAG
TunaTartare_CDS_32         AAGCTCAACGACCCAAATCTTCTAGCGATGGCTCAGGAGGCACTTGGAAG
Faust_CDS_34               AAGTTGAACGACCCAAATCTTCTAGCGATGGCTCAGGAGGCACTTGGCAG
Jada_CDS_32                AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Forrest_CDS_35             AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Gilson_CDS_34              AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
MeganTheeKilla_CDS_32      AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Emma1919_CDS_34            AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
SparkleGoddess_CDS_34      AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Stigma_CDS_35              AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Karp_CDS_34                AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Belfort_CDS_35             AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Comrade_CDS_34             AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Moab_CDS_34                AAGTTGAACGACCCAAATCTTCTAGCGGCGGCTCAGGAAGCACTTGGGAG
Circinus_CDS_31            AAGCTGAACGACCCAAATCTTCTAGCGATGGCAGCGGAAGCACTTGGGAA
BillNye_CDS_29             AAGCTGAACGACCCAAATCTTCTAGCGATGGCAGCGGAAGCACTTGGGAA
Muntaha_CDS_30             AAGCTGAACGACCCAAATCTTCTAGCGGCAGCGGCGGAGGCTCTTGGGAA
Wakanda_CDS_30             AAGCTGAACGACCCAAATCTTCTAGCGGCAGCGGCGGAGGCTCTTGGGAA
                           *** * ** *****  * **    **    **    *      *
                                      ^^^^^^^
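
The comparison in the alignment above can be tabulated with a few lines of Python. This is a sketch over six rows copied verbatim from the alignment, not the full set. It assumes the caret-marked columns are 0-based columns 11 to 17 of each row (where CCCAAAT falls in the Emma1919 row), and it also checks the upstream AA(C/T)GACCC pattern as a regular expression. The variable names are made up for this example.

```python
import re
from collections import defaultdict

# A few rows copied verbatim from the alignment above (not the full set).
ROWS = """\
SaltySpitoon_CDS_62        AAGCTGAATGACCCGGAACTGGAAGCCGCAGCGAAGGCGGCGATGATGAC
Genie2_CDS_58              AAGCTTAACGACCCGGAACTGGAAGCCGCAGCGAGAGCGGCGATGATGAC
Tomas_CDS_67               AAGCTGAATGACCCGGAACTGATGGCCGCAGCAGCGGCAGCAGTGGAGAG
Sham_CDS_32                AAGCTCAACGACCCAAATCTTCTAGCGATGGCTCAGGAGGCACTTGGAAG
Gilson_CDS_34              AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
Emma1919_CDS_34            AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG
"""

HEPTAMER_COLS = slice(11, 18)          # assumed position of the caret-marked columns
UPSTREAM = re.compile(r"AA[CT]GACCC")  # the AA(C/T)GACCC pattern from the post

by_motif = defaultdict(list)  # heptamer at the marked columns -> gene names
upstream_hits = {}            # gene name -> True if AA(C/T)GACCC is present
for line in ROWS.splitlines():
    name, seq = line.split()
    by_motif[seq[HEPTAMER_COLS]].append(name)
    upstream_hits[name] = bool(UPSTREAM.search(seq))

for motif, names in by_motif.items():
    print(motif, names)
print("AA(C/T)GACCC in every row:", all(upstream_hits.values()))
```

For these six rows the marked columns split cleanly into CCCGGAA (the three BE genes) and CCCAAAT (the three BK1 genes), while AA(C/T)GACCC is present in all of them. Pasting in the remaining rows would extend the same check to the whole pham.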
| posted 17 Mar, 2021 23:12
I was just QC'ing TunaTartare, a BK1, and came across this thread as it was growing. I was about to chime in that TT has the CCCAAAT, but Chris just beat me to it. And with a lot more as well!

Steve
| posted 18 Mar, 2021 00:17
Chris,
If we back up in the sequence, we run into stop codons, and I don't see how to slip over them.
I went back into the literature and found another paper with a list of published slippages.
There is a supplemental table with lots of examples, which gives a bit more confidence in the CCCAAAT sequence (attached). I think we should call it.
It is compelling that if the BK sequences slip, the BE sequences must also. Even the GeneMark output shows a pattern that suggests it also overlaps/slips. I don't think I can call it, but even without HHpred data for the "T" gene, I would still suggest it is the tail assembly chaperone.

Well - all of you that are looking at these phages, what do you think?
debbie
| posted 18 Mar, 2021 01:32
So if I had posted this a day or two ago, our poster abstract might have looked quite a bit different…
Would it be ok for us to try to include all this analysis in our poster? Maybe we'd get more insight, or at the very least, this looks like a good learning opportunity for my students.

As for what I think: I don't have anywhere close to everyone else's experience with phage genome annotation, but it seems to me that the high level of conservation in the region between the first chaperone and the "second" chaperone is another piece of evidence that the region codes for protein. I appreciate that there is an issue with precisely calling the slippage position, but the sequence conservation within the pham does seem to support Debbie's CCCAAAT suggestion. So I *want* to call the slippery sequence there, but I also agree that we'd be making an educated guess.

It would be nice to come to some kind of conclusion - we're slated to annotate a couple of the BE2 phages shown in Chris's figure for the second half of the semester!
| posted 18 Mar, 2021 02:55
Looking at the referenced paper, I do find that CCCAAAT looks very much like the GGGAAAT in the Lambda example. If we can follow that example, then the shift would be a -1, giving proline (from a proline in the original frame). The new frame has a stop codon a few codons upstream, so this is about the earliest the frameshift could happen. This looks very good to me for the BK1 phages (though it still raises the question of where a shift would be in the BE1 phage genes that are in the same pham).

The first nucleotide of the slippery sequence in Sham would be 23985 (this is the phage I had open to check).

Lee
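
The -1 shift described above can be worked through on the Emma1919 excerpt from the alignment. The sketch below rests on several assumptions: the excerpt starts on a codon boundary of the upstream (G) gene, the standard codon assignments apply, and the slip follows the simple model where the ribosome reads through the XXY codon (CCA) in the original frame, backs up one base, and continues from YYY (AAA). The function names translate and minus_one_product are made up for this example.

```python
from itertools import product

# Standard codon assignments, built from the usual TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def minus_one_product(seq, heptamer_start):
    """Peptide if the ribosome slips back one base on X XXY YYZ (assumed model)."""
    if (heptamer_start + 1) % 3 != 0:
        raise ValueError("heptamer is not in the X XXY YYZ register of frame 0")
    zero_frame = translate(seq[:heptamer_start + 4])  # original frame, through XXY
    shifted = translate(seq[heptamer_start + 3:])     # -1 frame, starting at YYY
    return zero_frame + shifted


# Emma1919 excerpt from the pham 5495 alignment; CCCAAAT starts at offset 11.
emma1919 = "AAGTTGAATGACCCAAATCTTCTAGCGGCGGCAGCGGAGGCTCTTGGGAG"

print(translate(emma1919))              # original frame, no shift
print(translate(emma1919[2:]))          # -1 frame read from the start of the excerpt
print(minus_one_product(emma1919, 11))  # shifted product under the assumed model
```

In this excerpt the original frame reads proline from CCA, and after the slip the same position pairs with CCC, which is also proline. The -1 frame read from the start of the excerpt contains a TGA stop just upstream of CCC, so a shift cannot happen earlier than the heptamer. Downstream of the slip the -1 frame stays open to the end of the excerpt.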
 