Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.
Recent Activity
All posts created by cdshaffer
Link to this post | posted 08 Aug, 2022 19:10 | |
---|---|
|
q1: no assembler that I know of is aware of scientific standards about which strand should be the top strand and which should be the bottom strand. These standards are determined in a community-by-community way and so vary from one system to another. For example in eukaryotes we typically use the standard set by the cytologists and how they present whole chromosomes. Thus, in a very high quality assembly (where we probably have evidence for the locations of centromeres and telomeres) we will publish the sequence to match the typical cytological display. The phagesdb community has standards for determination of base 1 and strand. For all our phage deterination of Base 1 determination depends on the type of phage end structure, while strand is usually picked so the structural genes are top strand and near the beginning of the sequence. Dan posted some videos here with lots of help on this but you need to be able to look at the raw assembly to answer some of these questions. Finally, a lot of published sequences are actually sequences of prophage and base 1 and orientation are set by the location of the insertion site and the standard orientation for the host genome. [based on your gene matching I think this is the case for NC_015296.1] There are a collection of phage like this in the phamerator database where the order and orientation of the sequence has been changed from the genbank record to a different order and orientation so as to match (as best as possible) the typical order in the phamerator database, this makes drawing and interpreting the comparison maps at phamerator.org much easier. Q2: I would recommend you set your base 1 and strand using a similar stratagy, that is to say, pick the base 1 and orientation to make the subsequent steps of comparison as easy as possible. But that of course depends on what you're comparing your genome to. The good news here is that DNA master has a nice feature if you want to "roll" the genome around to set a different base 1 as well as the ability to switch to the complementary strand. |
Posted in: Newbler → Getting Started with Phage Assembly
Link to this post | posted 04 Aug, 2022 20:43 | |
---|---|
|
We worked on many of the proteins in this pham. I don't think any student has worked on this protein in an in-depth investigation so I spent about 30 minutes poking at the literature and looking at a member of this pham we have been working on this summer pepperwood_draft_32 (stop: 14017). Looking broadly at the hhpred and blast hits, there is clearly a very wide ranging family of proteins all like those in this pham. Also, pepperwood_draft_32 may not be the best member to represent the group as it is a bit shorter than the others ( but its annotated with the longest ORF). In looking over the paper associated with the high matching crystal 2C2U, I can see that many of the proteins in this family have dual functions, DNA binding and Iron sequestration. Most of these proteins appear to be expressed in bacteria in responce to starvation or other forms of stress and are probably play a role in protecting DNA from damage during high levels of oxidative stress. The general name for proteins like this are DPS for DNA Protection during Starvation. In the paper on the 2C2U crystal they stated that the mechanism for DNA binding was not well understood, so i looked for recent papers and found this one: https://doi.org/10.1128/jb.00036-22 it appears to do a good job of summarizing these DPS proteins if you really want to get into the literature. But the one take home message I got from this is that proteins in this wide family have a variety of activities some do and some don't have DNA binding, some do and some don't store Iron, and some do and some don't have ferroxidase activity. So bottom line is that it is really going to be hard to get a good convincing argument of what function any particular protein has. So Bottom line I would say is that annotation in this case is about which kind of error you think is worse, false positives or false negatives. For functional annotation I prefer false negatives since too many people will take any annotation as "the truth". Many naive users may not understand that an annotation might mean something that is "weak but interesting". This propagation of incorrect functional annotations is a well understood problem with large databases like genbank and I would rather say nothing for the function and not exacerbate the problem. However, as you cited above, there is clearly some evidence to suggest a DNA binding activity for this protein and so it is totally reasonable to come to the conclusion that there is enough evidence to support this annotation. What we really have here is a situation where reasonable and well trained annotators can disagree and that is OK, without clear wet bench experimentation annotations are always just well informed estimations and it is totally reasonable to have different estimations. |
Posted in: Functional Annotation → DNA-binding ferritin-like protein
Link to this post | posted 11 Jul, 2022 20:40 | |
---|---|
|
To me the most informative piece of evidence is the start pattern in phage genes Kradal_87, Satis_87, EhyElimayoE_87. In the current version of the starterator report those are track 3. In that track you can see that start 4 is the longest orf (so I know those annotations cannot be too long) and that start 4 in Frankenweenie is 65470. So if Kradal, Satis and EhyElimayoE can make a perfectly viable phage with only the amino acids between start 4 and the stop codon, there is no reason to believe that the other phage require more amino acids. Yes this is a negative argument but lacking evidence to the contrary I will stick with parsimony as an important consideration. That is to say, in this case where there is not good evidence to the contrary, the simple conclusion that all the phage are using the same start (4) is the best annotation. So based mostly on parsimony, this pattern suggests to me that start 4 at 65470 should be annotated and not the start with the -4 overlap. |
Link to this post | posted 17 Jun, 2022 19:07 | |
---|---|
|
The upstream "G" form of this gene has a weak hit to the Tail assembly chaperone Pfam. . PF16459.8. The numbers on this hit are pretty bad 68% probability. So by itself it certainly not sufficient evidence to call a TAC. However we thought by adding in synteny evidence and that we had found a "pretty good" possible slippery sequence that the sum of all those observations that the TAC function was justified. But I agree with the recent discussions elsewhere that there should be a policy which is to only call the slip if there is a sequence match to one of the published, well documented slippery sequences. So calling the slip here is an overcall. If you remove the slip then there is really no evidence for functionally calling the downstream "T" region as a TAC (except for synteny evidence), and while we do call some gene functions based only on synteny, TAC is not one of them. So if we want to update the annotation in Genbank for this phage, I would say keep the TAC function call for the G form (32294-33103), {based on weak hhpred evidence combined with good synteny evidence), remove the gene that codes for the GT form (32294..33055,33057..33407), and add in a gene that starts at 33072 or 33126 as a gene, and give it an NKF annotation. I would probably go with the longer start without evidence to the contrary but since there is a very high level of uncertainty here in any case I could go with either start. |
Link to this post | posted 16 Jun, 2022 17:14 | |
---|---|
|
We annotated Satis and using the rules that existed at the time we did find and annotate a slippery sequence, however looking at it now I am not sure that annotation is correct. Also when we did the second BM that slippery sequence we annotated was not conserved so at that point we were not convinced we have found a slippery sequence. So unless you have a really good match to an already documented slippery sequence (see the table here: bioinformatics guide "Annotating programmed translational frameshifts") I would not call the slip. It is still possible that you could assign the functional annotation of "tail assembly chaperone" to one or both of the two genes, irrespective of finding the slippery site. For that just use standard functional evidence of HHPRED, blast, synteny. |
Link to this post | posted 08 May, 2022 17:39 | |
---|---|
|
Trying to determine substrate specificity to that level is tricky. If you are lucky there might be some comments in the published papers on those crystals you listed and they could indicate which specific amino acid side chains are involved in binding the substrate. If you go to the PDB database and look up each crystal by name you will get a link to the primary publication. However, another suitable annotation would be to use a less specific term to imply that the exact substrate is undetermined. The generic approved term here is oxidoreductase. I don't have these memorized, I used QuickGo to look up terms and see what the scientists that think hard about how terms are related to one another and publish that in the Gene Ontology say about these terms. For example, here is the link to the page on the "Thioredoxin". If you look at the ancestor chart and following the black "is a" arrows you can see that Thioredoxin "is a" peroxidase activity which "is a" oxidoreductase activity, acting on… which "is a" oxidoreductase activity. And a quick search you can see that oxidoreductase is on the approved terms list. So I would say the best annotation given the results you have shown in "oxidoreductase" and any higher level of specificity would require both a deep dive and at least some good luck. |
Posted in: Functional Annotation → Function for subcluster A11 phage Gilberta (37505-37777 rev): Thioredoxin, NrdH-like glutaredoxin or glutaredoxin?
Link to this post | posted 20 Apr, 2022 17:17 | |
---|---|
|
I would be curious to know how strong the synteny evidence is here. Are there are any phage where we have good evidence from HHPRED/BLAST and a good quality slippery sequence for the G/T Tail assembly chaperones (TASs) but where synteny places them outside the region upstream and very close to the tape measure? Said another way, if we have never found TASs outside this region, then synteny is another pretty strong piece of evidence that support the conclusion that these are indeed TASs. I can't think of any TAS's outside this region, but my own experience is restricted to only two hosts. I don't have time to look into the question of TAS synteny right now, but can get to it later if no one knows off the top of their head that thare are a large number of counter examples. |
Posted in: Cluster DH Annotation Tips → Tail assembly chaperones?
Link to this post | posted 11 Apr, 2022 20:39 | |
---|---|
|
Ok, I just reran pham 103037 and it ran without a hitch, so I have no idea why it failed in the automatted pipeline. Could be a weird one-off quirk or something more serious. Please let me know if you have any more trouble. Here is the link for your convenience; http://phages.wustl.edu/starterator/Pham103037Report.pdf |
Posted in: Starterator → Pham not found in Starterator
Link to this post | posted 11 Apr, 2022 19:38 | |
---|---|
|
I am looking into this. Starterator is currently at the most recent database release which means this pham must have failed analysis. This has been happening more and more as the database gets more entries that are not analyzed using the SEA-PHAGES protocol. These other sources of annotation often create gene models that have some kind of weird property (like a start codon other than ATG, GTG or TTG). While I look into why the failure and try to correct for it, if you tell me which gene/phage you are looking for, I can probably find it in one of the older versions of the database for you. |
Posted in: Starterator → Pham not found in Starterator
Link to this post | posted 23 Mar, 2022 19:33 | |
---|---|
|
Yes indeed you should add the second primase. This is one of those situations where I say "there are exceptions to every rule". In this case the overlap rule. For more details see: https://seaphages.org/forums/topic/4545/ |