
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 21 Mar, 2019 15:47
To me, annotations like helix-turn-helix DNA binding domain should be added only if there is sufficient evidence that the protein really does have a domain of that type AND, more importantly, there is not a more specific approved term that is also supported by the evidence. For example, many (if not all?) sigma factors contain HTH domains, so either "sigma factor" or "HTH domain" would apply. However, sigma factor is a much better annotation than HTH binding domain protein since it is a more specific term. So you have to look at the evidence with an eye toward the validity of "sigma factor" vs "HTH domain"; this is why each match should be evaluated for its size and location relative to the whole protein. Full-length alignments are much better than short domain matches, but if all you have is a short, high-quality match to an HTH domain, then I would add it. For short domain matches I would also focus on the HHpred results. A "helix-turn-helix domain" annotation really describes the presence of a structural domain, so I would focus on the programs that try to find similarity at the structural level (HHpred, Phyre2, etc.), not at the primary amino acid level (i.e. BLASTP).

In this particular case, since FIC has been rejected as a term, there is no better approved term than HTH domain. Thus, I would just evaluate the HHpred evidence for whether the protein really has an HTH domain, and add the term if I felt the evidence justified it.
Posted in: Functional Annotation > FIC family protein
| posted 20 Mar, 2019 16:23
Hmmm, that pham no longer exists; it was probably in the old database but not the newest one. If you are using the virtual machine, it is important to run Phamerator first to check for and install any database updates before running Starterator.

I have used the most recent "stable" version of Starterator, which has a few updates that you will not have seen if you are used to the whole phage report from the version in the virtual machine, so just be aware that the output will be a bit different. I have posted the results here:

https://wustl.box.com/s/irzr8fel3z6e1tmr9qirymj3fl91j0ng

You should also know that most users no longer bother with the whole phage reports but instead go to the pre-computed online versions of the per-pham Starterator reports. You can see in this article how to get access to those online reports. I always try to keep these reports up to date with new versions of the database.
Posted in: Starterator > phage that crash starterator
| posted 12 Mar, 2019 18:02
I think it is always better to be as specific as the evidence allows. The three terms you cite are not inconsistent, just at different levels of specificity. I think most, if not all, kinases use ATP as the source of the phosphate, so it is not surprising that a kinase has an identifiable "ATP binding cassette".

I would pay particular attention to the length of the alignments. Does a particular alignment include the majority of your protein? Does that alignment run along the majority of the subject? Those "full length by full length" alignments are the most informative, and in that case I would pick the most specific term (i.e. the thymidylate kinase). On the other hand, if the region of your protein that matches the "ATP binding cassette" is the same region that matches a subpart of some thymidylate kinase, then you likely just have an ATP binding domain that is particularly similar to the ATP binding domain found in a thymidylate kinase. In that case I would use the more general ATP binding domain.
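To make the "full length by full length" check concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption, not part of HHpred, BLAST, or any SEA-PHAGES tool: the `Hit` record, the helper names, the coordinates, and the 0.8 coverage cutoff. You would copy the alignment coordinates by hand from the actual hit report and still judge the result yourself.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between your protein (query) and a database entry (subject).

    Hypothetical record; coordinates are 1-based and inclusive.
    """
    name: str
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    subject_length: int


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence of the given length spanned by start..end."""
    return (end - start + 1) / length


def is_full_length(hit: Hit, query_length: int, threshold: float = 0.8) -> bool:
    """True if the hit spans most of BOTH the query and the subject.

    The 0.8 threshold is an arbitrary assumption, not an official cutoff.
    """
    query_cov = coverage(hit.query_start, hit.query_end, query_length)
    subject_cov = coverage(hit.subject_start, hit.subject_end, hit.subject_length)
    return query_cov >= threshold and subject_cov >= threshold


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Fraction of the shorter hit's query span that the two hits share."""
    shared = min(a.query_end, b.query_end) - max(a.query_start, b.query_start) + 1
    shared = max(shared, 0)  # no overlap at all
    shorter = min(a.query_end - a.query_start + 1, b.query_end - b.query_start + 1)
    return shared / shorter


# Made-up numbers for a 210 aa protein.
kinase = Hit("thymidylate kinase", 5, 205, 3, 208, 212)
cassette = Hit("ATP binding cassette", 10, 60, 1, 50, 50)

print(is_full_length(kinase, 210))        # True: most of query and subject aligned
print(is_full_length(cassette, 210))      # False: only a small part of the query
print(overlap_fraction(kinase, cassette)) # 1.0: the domain hit sits inside the kinase hit
```

With numbers like these, the kinase hit is full length on both sides, so the more specific term is supported. If instead the only kinase hit covered just the same short region as the cassette hit, the more general ATP binding domain would be the safer call.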
Posted in: Functional Annotation > Pham 5614 function
| posted 11 Mar, 2019 19:24
The alternative would be to create two approved annotation terms that indicate how the subdomains of the protein are divided between the two polypeptide chains. This is similar to how some AY phages have the large terminase split into an "ATP-ase domain" and a "nuclease domain".

I think the issue will be whether there is really value added by defining two new terms. If the split in your DNA pol correlates well with domain locations, like the above terminase example, it might make sense. If, on the other hand, the split does not divide the domains in a sensible way (for example, the polypeptides split a domain into two halves), then there is probably not a better solution than the one currently in use, which is to give each part the name of the whole. As a reminder, to propose the addition of the two terms you think would better represent the gene products, use the "request a new function" topic.
Posted in: Cluster EF Annotation Tips > Two piece DnaE-like DNA polymerse III (alpha)
| posted 11 Mar, 2019 19:07
Cristian,
I have my students use this online tool for ANI calculations. Sorry it doesn't answer your question, but a workaround is better than nothing:

http://enve-omics.ce.gatech.edu/ani/
Posted in: DNA Master > Genome Comparison
| posted 28 Feb, 2019 17:20
These are not mutually exclusive results. Looking at the internal structure of RecA (I used the RecA page at UniProt), I can see that RecA includes an AAA-ATPase-like domain. So both terms apply; it's just a question of specificity.

To me, RecA is a much better description of a function than a general ATPase fold seen in diverse cellular activities (see this). So if you have a good match to RecA across most of RecA, I would say use RecA. If, on the other hand, the match is simply to the AAA ATPase as found in RecA, I would go with the less specific AAA-ATPase. A detailed look at which parts of RecA are aligning to your protein, by looking at the actual HHpred alignment, should answer that question.
Posted in: Functional Annotation > R cluster Candle pham 4972 function
| posted 26 Feb, 2019 18:49
Just a heads up.
It looks like this most recent database update had a very large number of changes. Since Starterator tries to reuse previous results when possible, the large number of changes means this analysis requires a lot more processing and is taking much longer than is typical. I will post as soon as the results are available.

Data has been posted. If you are still missing proper pham links, please repost your message.
Edited 26 Feb, 2019 22:02
Posted in: Starterator > Starterator not matching up with listed phams
| posted 26 Feb, 2019 16:10
The most likely reason is a database version mismatch. There was a database update to version 256 late yesterday, and the wustl website is still showing the results from version 255. Starterator is working on the new database, but there are approximately 15 thousand reports, so it just takes time. I expect all the runs to be completed in a couple more hours.
If you still have trouble after the update is posted, then please post a specific example or two of exactly which genes and pham numbers are giving you issues and I will investigate further.
Posted in: Starterator > Starterator not matching up with listed phams
| posted 04 Feb, 2019 19:08
It appears from your picture that you are still running an older version of Starterator. The newer version does not crash. Updating Starterator is non-trivial, which is one of the reasons we went with online reports.
I have posted a whole phage report for phage Zolita which you can download.
If you want to try to update, I can help you with that; just post a follow-up or email me directly (address is my last name @ wustl.edu).
Posted in: Starterator > phage that crash starterator
| posted 28 Jan, 2019 16:11
To update the database, use Phamerator: just start up Phamerator, then wait for the download and install. Once Phamerator is done updating, Starterator should find phage Skippy.

As for "Unphamerator phage", Starterator works much better if you add the Profile in addition to the fasta file. This file gives the locations of all the genes so Starterator does not need to try and figure it out. This file is generated by opening the DNA Master file for the phage and using the default settings of the Genome -> Profile… menu to create and save a .csv file. Transfer that file to your machine running Starterator and use it for the Profile.

Finally, all Starterator reports are now available online if you don't want to hassle with the whole phage report. They can be accessed from the phagesdb web page for the gene. Here is an example for Skippy gene 1:

https://phagesdb.org/genes/SKIPPY_DRAFT_1/

On that page is a link to the most recent Starterator Report (currently pham 6729).
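
If you need to look up many genes, a small Python helper can build these gene page addresses. This is only a sketch: the `PHAGE_DRAFT_N` pattern is inferred from the single Skippy example above and assumes a draft annotation, so genes from finalized genomes may well use a different identifier. The function name is made up for illustration.

```python
def draft_gene_url(phage_name: str, gene_number: int) -> str:
    """Build a phagesdb gene page URL for a gene in a draft annotation.

    Assumption: the identifier is <PHAGE NAME IN CAPS>_DRAFT_<gene number>,
    as in the Skippy example. This has not been checked for other phages.
    """
    gene_id = f"{phage_name.upper()}_DRAFT_{gene_number}"
    return f"https://phagesdb.org/genes/{gene_id}/"


print(draft_gene_url("Skippy", 1))
# https://phagesdb.org/genes/SKIPPY_DRAFT_1/
```

The link to the Starterator report still has to be followed from that page, since the pham number changes between database versions.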
Posted in: Starterator > phage that crash starterator