
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 08 May, 2022 17:39
Trying to determine substrate specificity to that level is tricky. If you are lucky there might be some comments in the published papers on those crystals you listed and they could indicate which specific amino acid side chains are involved in binding the substrate. If you go to the PDB database and look up each crystal by name you will get a link to the primary publication.

However, another suitable annotation would be to use a less specific term to indicate that the exact substrate is undetermined. The generic approved term here is oxidoreductase. I don't have these memorized; I used QuickGO to look up terms and see what the scientists who think hard about how terms are related to one another, and publish that in the Gene Ontology, say about these terms. For example, here is the link to the page on "Thioredoxin". If you look at the ancestor chart and follow the black "is a" arrows you can see that Thioredoxin "is a" peroxidase activity which "is a" oxidoreductase activity, acting on… which "is a" oxidoreductase activity. And with a quick search you can see that oxidoreductase is on the approved terms list.

So I would say the best annotation given the results you have shown is "oxidoreductase", and any higher level of specificity would require both a deep dive and at least some good luck.
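For anyone who wants to see the "walk up the ancestor chart until you hit an approved term" idea spelled out, here is a minimal Python sketch. The "is a" chain is typed in by hand from the description above (it is not fetched from QuickGO, and the real Gene Ontology is a graph in which a term can have several parents), and `APPROVED` is a hypothetical one-entry stand-in for the approved terms list.

```python
# Toy "is a" chain copied by hand from the post; NOT pulled from QuickGO.
# The real Gene Ontology allows several parents per term; this toy has one.
IS_A = {
    "Thioredoxin": "peroxidase activity",
    "peroxidase activity": "oxidoreductase activity, acting on…",
    "oxidoreductase activity, acting on…": "oxidoreductase activity",
}

# Hypothetical excerpt standing in for the approved terms list.
APPROVED = {"oxidoreductase"}


def ancestors(term):
    """Follow the 'is a' arrows upward and return the terms met, in order."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def first_approved(term):
    """Return the closest ancestor whose name is on the approved list, or None."""
    suffix = " activity"
    for ancestor in ancestors(term):
        name = ancestor[:-len(suffix)] if ancestor.endswith(suffix) else ancestor
        if name in APPROVED:
            return name
    return None


print(first_approved("Thioredoxin"))
```

The point is only that a less specific term higher in the chart is still a correct annotation when the specific substrate cannot be determined.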
Posted in: Functional Annotation > Function for subcluster A11 phage Gilberta (37505-37777 rev): Thioredoxin, NrdH-like glutaredoxin or glutaredoxin?
| posted 20 Apr, 2022 17:17
I would be curious to know how strong the synteny evidence is here. Are there any phage where we have good evidence from HHPRED/BLAST and a good quality slippery sequence for the G/T tail assembly chaperones (TASs) but where synteny places them outside the region upstream of and very close to the tape measure?

Said another way, if we have never found TASs outside this region, then synteny is another pretty strong piece of evidence that supports the conclusion that these are indeed TASs.

I can't think of any TASs outside this region, but my own experience is restricted to only two hosts.
I don't have time to look into the question of TAS synteny right now, but I can get to it later if no one knows off the top of their head that there are a large number of counterexamples.
Edited 20 Apr, 2022 17:27
Posted in: Cluster DH Annotation Tips > Tail assembly chaperones?
| posted 11 Apr, 2022 20:39
Ok,
I just reran pham 103037 and it ran without a hitch, so I have no idea why it failed in the automated pipeline. It could be a weird one-off quirk or something more serious. Please let me know if you have any more trouble.

Here is the link for your convenience:
http://phages.wustl.edu/starterator/Pham103037Report.pdf
Posted in: Starterator > Pham not found in Starterator
| posted 11 Apr, 2022 19:38
I am looking into this. Starterator is currently at the most recent database release, which means this pham must have failed analysis. This has been happening more and more as the database gets more entries that were not analyzed using the SEA-PHAGES protocol. These other sources of annotation often create gene models that have some kind of weird property (like a start codon other than ATG, GTG or TTG).

While I look into the cause of the failure and try to correct it, if you tell me which gene/phage you are looking for, I can probably find it in one of the older versions of the database for you.
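As an illustration of the kind of "weird property" I mean, here is a small Python sketch that checks whether a gene model begins with ATG, GTG or TTG. It is not code from Starterator; the function names, the 1-based inclusive coordinate convention, and the example sequences are all assumptions made up for this example.

```python
# Start codons named in the post as the usual ones.
ACCEPTED_STARTS = {"ATG", "GTG", "TTG"}

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def start_codon(genome, start, stop, strand):
    """Return the first codon of a gene.

    Assumes 1-based inclusive coordinates with start < stop on either strand,
    and strand given as "+" or "-" (hypothetical conventions for this sketch).
    """
    if strand == "+":
        return genome[start - 1:start + 2].upper()
    # Reverse-strand genes begin at the higher coordinate, so take the last
    # three bases of the interval and reverse-complement them.
    codon = genome[stop - 3:stop].upper()
    return codon.translate(COMPLEMENT)[::-1]


def has_unusual_start(genome, start, stop, strand):
    """True when the gene model starts with something other than ATG/GTG/TTG."""
    return start_codon(genome, start, stop, strand) not in ACCEPTED_STARTS


# Made-up sequences: a forward gene, a reverse gene, and a forward CTG start.
print(has_unusual_start("CCATGAAATAGCC", 3, 11, "+"))
print(has_unusual_start("GGCTATTTCATGG", 3, 11, "-"))
print(has_unusual_start("CCCTGAAATAGCC", 3, 11, "+"))
```

A check like this only flags one possible cause; a pham can fail analysis for other reasons as well.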
Posted in: Starterator > Pham not found in Starterator
| posted 23 Mar, 2022 19:33
Yes, indeed you should add the second primase. This is one of those situations where I say "there are exceptions to every rule", in this case the overlap rule. For more details see: https://seaphages.org/forums/topic/4545/
Posted in: Gene or not a Gene > Two overlapping DNA Primases in JPandJE
| posted 15 Mar, 2022 22:37
Cluster BE phages have two genes which have been annotated as "endolysin". See the phams for Cross 40 and Cross 44. I am linking to the proteins in case pham numbers change, currently 98222 and 98787.

The first pham has genes ~500 bp long with some endolysin annotations and some hydrolases, and all members are in the BE cluster. The latter pham has genes in the ~1000 base range, also with "endolysin" as well as "LysM-like peptidoglycan binding protein" and variants. It is a considerably larger group that spans multiple Streptomyces clusters, as well as cluster AS.

Most BE phage have both these proteins and currently annotate one as "endolysin" and the other as something else.

On close inspection by HHPRED the Cross 44 group has the more typical "Lysin A structure" with two domains. In this case there is an N-terminal domain of 150 amino acids with high quality HHPRED hits to "N-acetylmuramoyl-L-alanine amidase" (e.g. crystal 6SSC) and C-terminal domains with high quality HHPRED hits to transglycosylases and "lysozyme".

Cross 40 and its pham members also have high quality hits, which probably explain the "endolysin" annotations. In particular there is a very good hit in the C-terminal region to the peptidoglycan hydrolase domain of an M. tuberculosis resuscitation protein, RpfB, as well as another alignment to the 6TAB crystal termed a "lysozyme". Both of these proteins are glycosyltransferases of one type or another.

I would suggest that, given its structure and distribution across multiple clusters, the larger Cross 44-like proteins be called "endolysin". For the other, an annotation of "glycosyltransferase" might be best, or a second "endolysin" could also be considered.
Posted in: Cluster BE Annotation Tips > two endolysins
| posted 13 Mar, 2022 22:11
According to this page: https://biostar.usegalaxy.org/p/28273/
The Galaxy instance at Texas A&M has a circos wrapper as well as other graphics methods.
The good news is that Galaxy is a web-based graphical system for bioinformatic analysis and there is no charge. The bad news is there is still a non-trivial learning curve. Galaxy is a really nice middle ground for doing bioinformatics, and the Texas A&M Galaxy instance is specifically geared to phage analysis.

So, it might be worth considering, but if you have never used Galaxy you are going to need to commit a non-trivial amount of time just to train on Galaxy. I have used Galaxy and think it is one of the best web-based systems for complex computational workflows. However, some tools work better than others when implemented in Galaxy, and since I have never used that Circos wrapper, I have no idea how good it is. So if you don't know Galaxy there could be a considerable investment in time only to find out the wrapper really doesn't give you what you are looking for.

On the other hand Galaxy is a pretty good system to learn if you are looking to dive deeper into bioinformatics and still keep everything in a graphical, web-based format where you don't have to worry about command line, package management, and installation.
Posted in: Bioinformatic Tools and Analyses > Circular Genome Visualization
| posted 11 Mar, 2022 17:55
First, I would say that having an orpham or two in a phage is not so unusual as to really worry me. In looking at the Phamerator map of all AZ phage I can see that KeAlii has at least 3 orphams, and there are several phage with 1 or 2 orphams, so having two orphams is not so unusual as to raise real questions about these genes.

The longer gene 54 has such good coding potential I would always call that one. 53 is just long enough to call. See rule 8 of the Guiding Principles. I would say 35 amino acids is in the grey zone, meaning it requires some evidence other than just an open reading frame to call the gene. However, both have pretty good coding potential so I would call both.

No BLAST hits is also not so surprising as we already know it is an orpham (i.e. unique among all 400K proteins in phagesdb). So while a good BLASTp match might make you feel more confident there really is a gene here, the lack of a BLAST hit is not good evidence that this region is not a gene. Said more formally, a positive result in a BLASTp is good evidence; a negative result is not good evidence there is no gene, it is simply that BLAST has nothing to say one way or the other.

As for overlap, we would call this a 4 base overlap (gap score of -4). Since gene coordinates describe intervals not counts you cannot just subtract the coordinates, you have to adjust by 1. I run across this issue all the time with my students and I have them draw out a tiny sequence with a few "genes" to see the difference between interval math and normal math.
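Here is the same "interval math" written out as a tiny Python sketch, in case it helps to see the adjust-by-1 step. The coordinates are invented for illustration (they are not the real coordinates of genes 53 and 54), and the function names are just ones I made up.

```python
def gene_length(start, stop):
    """Number of bases in a gene whose coordinates are 1-based and inclusive."""
    return stop - start + 1


def gap(upstream_stop, downstream_start):
    """Bases between two genes; a negative value is the size of the overlap."""
    return downstream_start - upstream_stop - 1


# Hypothetical example: the upstream gene ends at 103 and the downstream gene
# starts at 100, so bases 100, 101, 102 and 103 are shared by both genes.
print(gap(103, 100))       # -4, i.e. a 4 base overlap
print(100 - 103)           # -3, what plain subtraction would wrongly suggest
print(gene_length(1, 3))   # 3 bases, not 3 - 1 = 2
```

Two genes that sit directly next to each other (one stops at 100, the next starts at 101) come out with a gap of 0, which is the sanity check I have students do by hand.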

Finally, for gene calls, it is better to have a false positive (i.e. call a gene which really isn't there) than it is to have a false negative (miss a gene). So even if I was not sure of gene 53 I would still call it given this rule.

So for all the above, I would keep both these genes in the annotation and just be amazed at how diverse the gene collection is among all these phage. I am sure Deb could quote you a few papers that discuss the ideas of phage as "engines of gene creation" and I think, at least for 54, that we could have an example of that.
Posted in: Gene or not a Gene > Orpham genes in AZ phage
| posted 01 Mar, 2022 17:20
The first thing to try is to confirm that the preferences are set correctly.

See this page on the bioinformatics guide:
https://seaphagesbioinformatics.helpdocsonline.com/article-66

There are settings which worked in the past but will now cause the connection to fail. I had issues with updates on my old install until I double checked and changed a few preferences.
Posted in: DNA Master > DNA Master Failing to Update - 01.23.2020
| posted 22 Feb, 2022 16:35
I heard from Steve; Phamerator is now up and working for me. If it is not working for you, be sure to post again.
Posted in: Phamerator > Phamerator not loading