
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 09 Jul, 2024 21:25
The RecA-like proteins in the BN cluster are not placed in the same pham as the highly likely RecA (exemplified by Spud_205). See forum post #5567 for details.
As such, care should be taken when annotating a protein as a RecA, and the less specific term "ASCE ATPase" should be used unless there is clear evidence for the presence of all the important features to support the RecA annotation (again, see topic 5567 linked above).
Posted in: Cluster BN Annotation Tips > RecA in cluster BN
| posted 29 Jun, 2024 18:59
Recent crystal structures have supported the annotation of both a large and a small terminase subunit. See the crystal structure 7JOQ. If you have sufficiently good matches to this crystal structure, it can support the identification of the small terminase in BE phages. If you have a small terminase, be sure to call the other terminase the large terminase. If you can only find support for one, annotate it simply as "terminase".
Edited 03 Jul, 2024 17:03
Posted in: Cluster BE Annotation Tips > terminase
| posted 08 Jun, 2024 13:50
As of 2020, we call all single endolysins simply "endolysin". See this discussion for details: https://seaphages.org/forums/topic/4656/
Posted in: Cluster BL Annotation Tips > lysin A
| posted 08 Jun, 2024 13:44
As of 2020, we call all single endolysins simply "endolysin". See this discussion for details: https://seaphages.org/forums/topic/4656/
Posted in: Cluster BG Annotation Tips > lysin A
| posted 28 May, 2024 17:57
As far as I know, there is no way to use just the graphical interface to get the list of all proteins with a certain functional call. You can do this easily with the command line. Just to get you started, I created a few files; this is really easy to do if you know how to search the Actino_draft database.

So first I checked the variation in the terms that you might be interested in. Here is a file with all the various functional terms that have been used that include "primase": http://phages.wustl.edu/primase_terms.txt

Next, here is a list of all phams where at least one member is annotated with a term that starts with "DNA primase". There are 70 of those phams that you could look at in more detail: http://phages.wustl.edu/phams_with_primase.txt

Finally, I created a long tab-delimited list that reports the phage, the genes, the phams and the function where the function starts with "DNA primase…". This list is just over 3600 entries, so you would probably want to download it, open it in Excel or similar, and filter. You can download it here: http://phages.wustl.edu/primase_phams.txt
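
If you would rather filter the downloaded list with a script than in Excel, here is a minimal Python sketch. It assumes the file has four tab-separated columns in the order phage, gene, pham, function, with no header row; check the actual file before relying on that. The sample rows and the function names are made up for illustration and are not taken from the real list.

```python
import csv
from collections import Counter

def read_rows(lines):
    """Parse tab-delimited lines into dicts.

    Assumes four columns in this order: phage, gene, pham, function.
    """
    rows = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        phage, gene, pham, function = (f.strip() for f in fields[:4])
        rows.append({"phage": phage, "gene": gene, "pham": pham, "function": function})
    return rows

def rows_matching(rows, keyword):
    """Keep rows whose function contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in rows if keyword in r["function"].lower()]

def genes_per_pham(rows):
    """Count how many listed genes fall in each pham."""
    return Counter(r["pham"] for r in rows)

# Made-up sample rows standing in for the downloaded file.
SAMPLE_LINES = [
    "PhageA\tPhageA_12\t1001\tDNA primase",
    "PhageB\tPhageB_7\t1001\tDNA primase/helicase",
    "PhageC\tPhageC_30\t2002\tDNA primase",
]

rows = read_rows(SAMPLE_LINES)
for row in rows_matching(rows, "helicase"):
    print(row["phage"], row["gene"], row["pham"], row["function"], sep="\t")
print(genes_per_pham(rows).most_common())

# For the real file, something like:
#   with open("primase_phams.txt", newline="") as fh:
#       rows = read_rows(fh)
```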
Posted in: Functional Annotation > GG cluster DNA primse/helicase
| posted 26 Apr, 2024 17:26
You can always run the search manually:

1. In PECAAN, go to the sequence tab and select the amino acid sequence.
2. Go to the HHPRED web server: https://toolkit.tuebingen.mpg.de/tools/hhp
3. Paste in your sequence.
4. Select databases. For the typical search, I have students add two databases (UniProt and Pfam) to search in addition to the default PDB database. You add these by selecting them in the "Select structural/domain databases" menu.
5. Click submit.
6. Wait. The time varies, but it usually takes 1 to 3 minutes.
Results are kept for a few days (links will be in the left column if you come back later from the same computer). Most of the scores you see in PECAAN are in the "Hitlist" section of the results.
As a bonus, at the top you will see a graphical representation of the locations of the hits which can help you see the overall domain structure of your protein. And at the bottom you get full alignments which can be helpful in a deep dive into exactly what does and does not match between your protein and the hit.
Posted in: PECAAN > HHPred not updating in PECAAN
| posted 28 Feb, 2024 18:10
Yes. If you are using a dot plot tool to compare genomes and it checks both strands, you are good. In your case, if you have large sections of one genome that are inverted in another genome (and thus on the other strand), this will be seen in the dot plot as long diagonal lines whose slope changes from positive to negative.

However, the protocols as posted on QUBES use Gepard (which is really fast), but it only compares the top strand of each sequence. So to look for similarity when you suspect one sequence is inverted, you would need to compare the reverse complement of one of the phages to the normal strand sequence of the other.
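
If you need to make the reverse complement yourself before loading it into a top-strand-only tool, a short Python sketch like the following would do it. This is only an illustration: it assumes a single-record FASTA with plain A/C/G/T/N bases (other IUPAC ambiguity codes are left unchanged), and the function names are my own.

```python
# Translation table for the complement of each base (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def reverse_complement_fasta(fasta_text, width=70):
    """Reverse-complement a single-record FASTA given as text."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a single FASTA record starting with '>'")
    header = lines[0] + " (reverse complement)"
    seq = reverse_complement("".join(lines[1:]))
    # Re-wrap the sequence so the output is still a tidy FASTA file.
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + wrapped) + "\n"

print(reverse_complement_fasta(">demo\nATGC\nCN\n"))
```

Save the output to a new FASTA file and compare that file against the normal strand of the other phage.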

Other programs like NCBI BLASTN compare both strands (use the "compare two sequences" check box). BLASTn can be quite a bit slower when dealing with multiple phage sequences, and may fail totally if your sequences are too long, but if you want to look for large-scale similarity and you are not sure which strand to look at, BLASTn will probably do better. I would do an initial assessment with BLAST on a single genome vs. a single genome, and once I knew which strands to compare I could do the final comparisons in Gepard.
Edited 28 Feb, 2024 22:48
Posted in: Bioinformatic Tools and Analyses > Phage Comparative Genomics Lab Manual - QUBES Resource
| posted 22 Feb, 2024 21:31
When I use that sequence in an HHPRED search, I get an alignment to roughly the first half of crystal structure 5LD9, the JAMM/MPN(+) protease (amino acids 10-90). On the PDB page for the crystal, it looks like the crystal has the same amino acid coordinates as the native protein, so I can use those ~10-90 coordinates when I look at the literature on this protein. According to this paper, the active site residues of the JAMM protease motif are ExnHxHx7Sx2D. This motif has a nice match in the phage protein (the HxH are at 73 and 75, and the S and D are also there at the correct distance), so I think this phage protein is also, like JAMM/MPN(+), a metalloprotease.
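
A quick way to check a protein for this kind of motif is a regular expression. The sketch below is only an illustration: the upper limit on the variable "xn" spacer (MAX_GAP) is my assumption, not something from the paper, and the toy sequence is made up. A regex hit is a starting point for looking at the alignment, not proof of function.

```python
import re

MAX_GAP = 60  # assumed upper bound for the variable "xn" spacer

# E-x(n)-H-x-H-x(7)-S-x(2)-D, with groups capturing the H, H, S and D residues.
JAMM_MOTIF = re.compile(r"E.{1,%d}?(H).(H).{7}(S).{2}(D)" % MAX_GAP)
# The more familiar HEXXH metalloprotease motif.
HEXXH_MOTIF = re.compile(r"HE..H")

def find_jamm_motif(protein):
    """Return 1-based positions of the H, H, S and D residues, or None if absent."""
    m = JAMM_MOTIF.search(protein.upper())
    if m is None:
        return None
    return {
        "H1": m.start(1) + 1,
        "H2": m.start(2) + 1,
        "S": m.start(3) + 1,
        "D": m.start(4) + 1,
    }

def has_hexxh(protein):
    """True if the protein contains an HEXXH motif anywhere."""
    return HEXXH_MOTIF.search(protein.upper()) is not None

# Made-up toy sequence that contains the JAMM-style motif but no HEXXH.
TOY_PROTEIN = "MKT" + "E" + "AAAA" + "HGH" + "AAAAAAA" + "S" + "LL" + "D" + "KK"

print(find_jamm_motif(TOY_PROTEIN))
print(has_hexxh(TOY_PROTEIN))
```

Note that the search reports the first match found from the left, so for a real protein you should still confirm that the reported residues line up with the active site residues in the HHPRED alignment.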

So now the question is more an issue of nomenclature/semantics. Should there be two terms in the approved list (something like "metalloprotease HEXXH type" and "metalloprotease EHHSD type"), or should we lump the HEXXH and EHHSD types together under the same "metalloprotease" term and update the approved terms list to say something like "Typically has HEXXH motif, but other metalloprotease motifs (e.g. ExnHxHx7Sx2D) have been described and can be used to support this function if present", or words to that effect?
Edited 22 Feb, 2024 21:40
Posted in: Functional Annotation > Metalloprotease without HEXXH motif?
| posted 16 Feb, 2024 18:33
Very short answer: use the official function list.

Long answer:
Many times when you do a deep dive into issues like this (where the evidence is strong enough to call two different terms), you find one of two things going on. Usually it turns out the two terms are mostly synonymous. For example, one term traces back to an E. coli protein and the other term comes from studies in B. subtilis. Both proteins probably fulfill the same biological role, so they are pretty much the "same protein"; they just have different names. The other likely result is that one term is more specific than the other, like asking whether something is a "Car", a "Ford" or a "Mustang". All these terms might apply.

In this case, and I am guessing here, I would not be surprised if RecA and UvsX are very similar to each other and we really just have two synonyms. This is easy to check: do an HHPred search with RecA or UvsX and see how well they align to each other. If they are basically the same protein, then you know you are in the first situation above.

I am going to assume that the two proteins are mostly the same and not different levels of specificity. In that case, the way to proceed is to use the official function list. If one term is on the list and the other is not, you have two choices: 1. use the term on the list, OR 2. decide that even though they are "mostly" the same, they are in fact different enough that both terms should be on the list. If you think that is the case, post your proposal to add the term to the approved list on the proper forum; once you get it approved, everyone can use it and everyone's annotations are all the better for your contribution.

This is why I always tell my students that while this second option can be a lot of work it is also a real accomplishment. Finding a new, novel, and fundamentally different function that is not on the list and convincing the list keepers of this, is very impressive indeed! But it takes time and effort, reading papers and developing the evidence to get to a convincing argument that the two terms are distinct enough to justify both on the list.
Edited 16 Feb, 2024 18:37
Posted in: Annotation > RecA-like recombinase or UvsX-like recombinase for KentuckyRacer 62351-62378
| posted 15 Feb, 2024 22:30
I call this "gene content analysis". According to rule 2 of the guiding principles, "Genes do not overlap by more than a few bp, although up to about 30 is legitimate". I would also add that, like all rules, exceptions exist. All that is to say that you are correct to be suspicious: given the very large overlap, one or the other is very likely a false positive from the gene predictors used to create the draft annotations.
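
If you want to check overlaps across a whole draft annotation rather than by eye, the arithmetic is simple. Here is a minimal Python sketch; the 30 bp cutoff comes from the rule quoted above, while the function names and the example coordinates are made up for illustration. Coordinates are assumed to be 1-based and inclusive, and may be given in either order (as happens for reverse-strand genes).

```python
MAX_LEGIT_OVERLAP = 30  # "up to about 30 is legitimate" (guiding principles, rule 2)

def overlap_bp(gene_a, gene_b):
    """Number of bp shared by two genes given as (start, stop) pairs."""
    a_lo, a_hi = sorted(gene_a)
    b_lo, b_hi = sorted(gene_b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)

def is_suspicious(gene_a, gene_b, limit=MAX_LEGIT_OVERLAP):
    """True if the overlap is larger than the rule of thumb allows."""
    return overlap_bp(gene_a, gene_b) > limit

# Made-up coordinates: a typical 4 bp overlap, then one gene inside another.
print(overlap_bp((100, 400), (397, 700)), is_suspicious((100, 400), (397, 700)))
print(overlap_bp((100, 400), (300, 150)), is_suspicious((100, 400), (300, 150)))
```

A flag from this check only tells you where to look; the evidence for deciding which call is the false positive still has to come from the alignments and coding potential.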

So for evidence as to what are real genes and what are false positives, I would rank the evidence in this order, listing the evidence FOR a real gene and against the hypothesis that it is a false positive (from strongest to weakest, not in the order I look at it):
1. Strong HHPRED alignments to well-characterized crystallized proteins (this will almost never happen to a false positive)
2. Strong BLAST alignment to a well-characterized protein with an assigned function (again, this almost never happens to a false positive)
3a. Good coding potential with the BLACK signal, not the red signal
3b. Good BLAST hits to other well-annotated phages
[3a and 3b are tied for quality in my mind]

4. Then would come Rule 9 in the guiding principles: "Switches in gene orientation are relatively rare" [does not apply in your case, but added here as another source of evidence as many times the two genes that overlap are on different strands]

So you probably want to check 1 above. As for 2, you did not state whether the matches in the other phage have an assigned function, so you have some more investigation to do, but by rule 3 you at least have a good hypothesis as to which is more likely the false positive.
Edited 15 Feb, 2024 22:34
Posted in: Annotation > 2 genes in same place of Cluster BE phage, Kentucky Racer