SEA-PHAGES | All posts created by balish

Link to this post | posted 20 Feb, 2024 22:45
balish	My two cents here. Yes, if you started at 8409, then translation would begin with fMet-Val-Glu. If you started at 8412, it would begin with fMet-Glu. Although the only way to know for sure is to analyze the protein in an infected bacterial cell itself, the rule of thumb is always to use the last among consecutive start codons. In this particular situation there are arguments that could be made for either case being correct, but by applying this rule, at least everyone is consistent. -Mitch
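
A minimal sketch of the "last among consecutive start codons" rule described above. The toy sequence is an assumption: the genome sequence isn't quoted in this thread, so `ATG GTG GAA` is just one codon string consistent with fMet-Val-Glu, placed so that its first base sits at coordinate 8409. The function names and the tiny codon table are hypothetical helpers, not part of any annotation tool.

```python
# Start codons commonly used in bacteria and their phages.
START_CODONS = {"ATG", "GTG", "TTG"}

# Tiny codon table covering only the codons in the toy sequence below.
CODONS = {"ATG": "Met", "GTG": "Val", "GAA": "Glu"}


def last_consecutive_start(seq, first_start, first_coord=1):
    """Return the coordinate of the last codon in an in-frame run of
    back-to-back start codons that begins at first_start.

    first_coord is the genome coordinate of seq[0].
    """
    i = first_start - first_coord  # convert to a 0-based index
    if seq[i:i + 3] not in START_CODONS:
        raise ValueError("first_start is not a start codon")
    while seq[i + 3:i + 6] in START_CODONS:
        i += 3
    return i + first_coord


def n_terminus(seq, start, first_coord=1, n=3):
    """Name the first n residues when translation initiates at start.

    The initiating codon is read as fMet whichever start codon it is.
    """
    i = start - first_coord
    codons = [seq[j:j + 3] for j in range(i, i + 3 * n, 3)]
    return "-".join(["fMet"] + [CODONS[c] for c in codons[1:]])


# Hypothetical stretch of sequence whose first base is coordinate 8409.
toy = "ATGGTGGAAGAA"

print(n_terminus(toy, 8409, first_coord=8409, n=3))      # fMet-Val-Glu
print(n_terminus(toy, 8412, first_coord=8409, n=2))      # fMet-Glu
print(last_consecutive_start(toy, 8409, first_coord=8409))  # 8412
```

The rule only picks a consistent coordinate; as noted above, it does not tell you which start the cell actually uses.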

Posted in: Choosing Start Sites → Two consecutive ATG start sites

Link to this post | posted 13 Apr, 2023 00:59
balish	Thanks, Debbie! -Mitch

Posted in: Frameshifts and Introns → Frameshift in a singleton, OnionKnight

Link to this post | posted 12 Apr, 2023 17:36
balish	My class and I are working on OnionKnight, a singleton that came from another institution. Genes 15/16 have all the appropriate features associated with being the tail assembly chaperone ORFs, but there's nothing to compare it with. Gene 15 runs from 9987 to 10592, and there's a conspicuous GGGAAA from 10484 to 10489, although it's followed not by a purine but by a C. I don't see anything else that screams "here's the frameshift." Is it safe to assign 10487 as the location of a -1 frameshift, or should I just skip it and leave the second ORF alone?
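
A rough sketch of how the coordinates in the question can be checked: scan a gene for `GGGAAA`, report each hit in genome coordinates, and note whether the next base is a purine. The sequence here is filler, not the real OnionKnight sequence; only the gene start (9987) and the motif position come from the post. `find_motif` is a hypothetical helper name.

```python
PURINES = {"A", "G"}


def find_motif(seq, motif="GGGAAA", first_coord=1):
    """Find every occurrence of motif in seq.

    Returns a list of (start, end, next_base, next_is_purine) tuples,
    with start and end as 1-based inclusive genome coordinates.
    first_coord is the genome coordinate of seq[0].
    """
    hits = []
    i = seq.find(motif)
    while i != -1:
        start = first_coord + i
        end = start + len(motif) - 1
        after = i + len(motif)
        next_base = seq[after] if after < len(seq) else ""
        hits.append((start, end, next_base, next_base in PURINES))
        i = seq.find(motif, i + 1)
    return hits


# Filler sequence (NOT real data): the motif is placed 497 bases into a
# gene that starts at 9987, followed by a C as described in the post.
gene_start = 9987
toy_gene = "T" * 497 + "GGGAAAC" + "T" * 10

for hit in find_motif(toy_gene, first_coord=gene_start):
    print(hit)  # (10484, 10489, 'C', False)
```

A hit from a scan like this is only a candidate. Whether 10487 is the right place to call the -1 shift is still the judgment call being asked about.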

Posted in: Frameshifts and Introns → Frameshift in a singleton, OnionKnight

Link to this post | posted 30 Mar, 2023 16:54
balish	I think we might have an answer! I believe this gene encodes a baseplate protein, and if so, I also believe it's the first one identified in one of the SEA's podoviruses. Its top hits on HHpred are very strong but partial hits, mostly to "DNA stabilization" proteins, etc. But the fourth hit (probability >99%) is across almost the entire protein for both the query and target proteins, albeit with a large gap in the target protein. When we follow the links, they lead us to gp10 from Salmonella's famous phage P22, which is also a podovirus. P22's gp10 is a baseplate protein, which links the capsid head to the tail needle. This is structurally different from the baseplates of myoviruses such as those in clusters C and AR, for which baseplate wedge proteins and baseplate J-proteins are annotated as two separate things. (And I think all those top hits on HHpred are also probably baseplate proteins, for what it's worth.) This is what we'll be presenting as our big idea in our SEA-PHAGES presentation. So I'd love for someone to evaluate this idea and see if I'm just a hopeful nut. -Mitch
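
One way to put numbers on "across almost the entire protein, albeit with a large gap" is to compare the span of the alignment with the residues that are actually aligned. The sketch below uses made-up coordinates and lengths; none of them are the real HHpred values for this hit, and both function names are hypothetical.

```python
def span_coverage(aln_start, aln_end, length):
    """Fraction of a protein spanned by an alignment.

    Coordinates are 1-based and inclusive; internal gaps are ignored.
    """
    return (aln_end - aln_start + 1) / length


def aligned_fraction(segments, length):
    """Fraction of a protein covered by aligned segments only.

    segments is a list of non-overlapping (start, end) pairs,
    1-based and inclusive.
    """
    aligned = sum(end - start + 1 for start, end in segments)
    return aligned / length


# Hypothetical protein of 600 residues aligned from 10 to 590,
# with nothing aligned between residues 201 and 399.
protein_len = 600
segments = [(10, 200), (400, 590)]

span = span_coverage(10, 590, protein_len)
aligned = aligned_fraction(segments, protein_len)
print(f"span coverage:    {span:.2f}")
print(f"aligned fraction: {aligned:.2f}")
```

A high span with a much lower aligned fraction is the pattern described in the post: the hit reaches both ends of the protein but skips a large internal region.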

Posted in: Functional Annotation → Apparent structural proteins from EK2 phage

Link to this post | posted 07 Mar, 2023 16:26
balish	Debbie, Thanks for the response. It's indicated on the cluster-specific page that the major capsid protein is #33. Looking forward to figuring out the rest! -Mitch

Posted in: Functional Annotation → Apparent structural proteins from EK2 phage

Link to this post | posted 07 Mar, 2023 15:14
balish	While annotating our new phage Moleficent, we see that the product of gene 45, which is highly conserved among related phages, has significant HHpred hits to a variety of types of capsid proteins: major capsid protein, needle protein, tailspike protein, tail tubular protein, etc. It's located amid other capsid protein genes. Relatedly, gene 44's best HHpred hit is a phage adaptor protein, though it's below the 90% probability mark. Both these genes are given as NKFs in all other phages. But it seems like their actual functions are potentially discernible. Can someone take a look and see if they think there are any functions we should be calling for these genes? We haven't looked at the neighboring genes yet, but it wouldn't surprise me if they also had some clues among hits. Best, Mitch Edited 07 Mar, 2023 16:21

Posted in: Functional Annotation → Apparent structural proteins from EK2 phage

Link to this post | posted 05 Aug, 2022 05:36
balish	I agree with Chris that it would be premature to call this gene product a ferritin-like DNA binding protein, a ferritin, or perhaps even a DNA binding protein. However, I would argue that because it is readily identifiable as a member of the well-established DPS protein family, it would be beneficial to the community if it were to be annotated as a "DPS family protein." The advantage of using this designation over something like "hypothetical protein" or even "DNA binding protein" is that it would draw attention to the fact that it's a member of an established protein family, and that there are therefore reasonable hypotheses for future researchers about its potential range of functions. We frequently identify proteins as members of families without specifying their functions, and this example strikes me as no different. -Mitch

Posted in: Functional Annotation → DNA-binding ferritin-like protein

Link to this post | posted 14 May, 2022 17:14
balish	I think the idea is that the capsid maturation protease is sufficiently similar to the ClpP family of proteases that it will sometimes get a hit with ClpP in its name. But if the other signs are there that it's a capsid maturation protease, it should be called that, and not ClpP-like. Given that ClpP is also a serine protease, I'm not sure that there would ever be a case when it doesn't hit something with that activity. So that's confusing and seems possibly extraneous - unless my interpretation is incorrect. I would expect that the presence of a ClpP-like or capsid maturation protease gene at the left end of the phage genome amid all the other capsid genes is likely to be sufficient to warrant calling it a capsid maturation protease in most cases. -Mitch

Posted in: Cluster P Annotation Tips → ClpP-like protease or CMP?

Link to this post | posted 28 Apr, 2022 17:06
balish	There's no reason a protein can't do more than one thing, and even among our many phage genes we have other examples of genes encoding fusion proteins. It sounds to me like you have a DNA-binding protease, and it will be fascinating to learn what it does! -Mitch Edited 28 Apr, 2022 17:07

Posted in: Functional Annotation → IrrE or metallopeptidase

Link to this post | posted 20 Apr, 2022 14:02
balish	Debbie, Thank you (and the big boss). Yes, Morkie is the one that brought this to my attention. I suppose as more DHs are found we'll be able to see whether this is universally applicable to cluster DH, as it clearly isn't for DN. -Mitch

Posted in: Cluster DH Annotation Tips → Tail assembly chaperones?
