The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.

New Features in PECAAN

| posted 24 Jul, 2018 17:44
Thanks
| posted 13 Feb, 2019 23:25
Hi Claire and Welkin,
Can someone help me navigate the new "Phagesdb Function Frequency" box? It just appeared this week and caught me off guard during a PECAAN demonstration in class.
Thanks!
Sally
| posted 14 Feb, 2019 09:28
First, note that we have added two new fields to the Phagesdb BLAST table: Cluster and Pham. We added them because, when weighing the functions in Phagesdb BLAST, we often found ourselves asking whether a hit came from a phage in the same subcluster or whether the gene function had been acquired from another group of phages. With these fields in place, it is much easier to sort by cluster and pham. Clicking on the column headers to sort and using the search tool are great ways to explore the Phagesdb BLAST information, but we kept thinking it would be nice to have a summary count of the functions together with their cluster and pham information. That is why we created the summary table. The count shows the number of hits for each function/cluster/pham combination, and the frequency shows that count as a fraction of the total number of phages with an assigned function.
When checking genes with no assigned function (NKF), I find it very useful to check the Phagesdb Summary first: if it contains no function/cluster/pham lines, I know there is little point in sorting and searching the Phagesdb BLAST table.
Another interesting feature of this summary table is that it shows the variety of names used for the same function. This is worth pointing out to students as a reason to use the "approved function list" offered in the drop-down window that appears as a function is typed in.
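One way to turn that teaching point into a quick check is to list which function names in a summary are not on the approved list. The Python sketch below is only an illustration: the approved-list excerpt and the summary names are made up, and the comparison is exact and case-sensitive, so a name like "Portal protein" would be flagged even if "portal protein" is approved.

```python
from typing import Iterable, List, Set


def unapproved_names(functions: Iterable[str], approved: Set[str]) -> List[str]:
    """Return the distinct function names that are not on the approved list,
    in sorted order, so they can be discussed with students."""
    return sorted({name for name in functions if name not in approved})


if __name__ == "__main__":
    # Hypothetical approved-list excerpt and summary names, for illustration only
    approved = {"terminase large subunit", "portal protein"}
    summary = ["terminase large subunit", "large terminase", "TerL", "portal protein"]
    print(unapproved_names(summary, approved))
```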
Finally, the Phagesdb Summary is useful to see which clusters may have exchanged this gene and you can use the Pham to make sure that they are fairly close relatives.
Hope that this helps and that the new features are useful to you and your students.
Thanks,
Claire
| posted 14 Feb, 2019 12:54
In reply to Claire Rinehart:

Hi Claire,
I think these fields are tremendously helpful! Thank you so much!

One additional column that might help would be the date each phage was annotated. As we start revising functional assignments, the latest, most accurate assignment may appear on the table only once, and that's the one we would like everyone to switch to.

Great new feature!
| posted 15 Feb, 2019 03:11
Thanks Claire and Welkin! Looking forward to using this new function with my students in class!
| posted 09 Mar, 2019 22:07
Claire,

Can you tell me what the Phagesdb Function Frequency is based on? I see a list of various function assignments for a variety of phams, but what determines which ones appear? They don't always match the hits I see in BLASTp or HHPred, and the phams listed differ from that of the gp being examined, so I'm not clear.

By the way, I posted another question about HHPred results in PECAAN versus external HHPred searches. Any thoughts about that?

Thanks!

Steve
| posted 11 Mar, 2019 14:37
Steve,
The Phagesdb Function table that we added summarizes the top 100 hits that we draw from Phagesdb. Hits that show a function other than "Unknown Function" are grouped by function, pham, and cluster, giving you a summary of the evidence for each individual combination of those three elements. The Phagesdb Function Frequency is simply the number of hits on each line divided by the total number of functional hits in the top 100.
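To make the calculation above concrete, here is a minimal Python sketch of how such a summary could be computed. It is not PECAAN's code: the Hit record, its field names, and the assumption that hits arrive already sorted best-first are all hypothetical. An empty result corresponds to the NKF case described earlier in the thread, where the summary shows no function/cluster/pham lines.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One Phagesdb BLAST hit (hypothetical record layout, not PECAAN's own)."""
    phage: str
    function: str
    cluster: str
    pham: str


UNKNOWN = "Unknown Function"  # hits with this label are left out of the summary

Row = Tuple[str, str, str, int, float]


def function_frequency(hits: List[Hit], top_n: int = 100) -> List[Row]:
    """Summarize the top_n hits (assumed already sorted best-first).

    Returns (function, pham, cluster, count, frequency) rows, where
    frequency = count / number of functional hits among the top_n.
    """
    functional = [h for h in hits[:top_n] if h.function != UNKNOWN]
    if not functional:
        return []  # only "Unknown Function" hits: the summary table is empty
    counts = Counter((h.function, h.pham, h.cluster) for h in functional)
    total = len(functional)
    # most_common() orders rows by count, highest first
    return [
        (function, pham, cluster, n, n / total)
        for (function, pham, cluster), n in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up example data, for illustration only
    hits = [
        Hit("PhageA", "terminase large subunit", "A1", "1234"),
        Hit("PhageB", "terminase large subunit", "A1", "1234"),
        Hit("PhageC", "terminase", "A2", "1234"),
        Hit("PhageD", "Unknown Function", "B1", "5678"),
    ]
    for row in function_frequency(hits):
        print(row)
```

Because the frequencies are divided by the number of functional hits rather than by 100, the values on all lines always add up to 1, however many "Unknown Function" hits were dropped.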

As for differences between what you see in Phagesdb and what you see in NCBI BLASTp, we can often see phages in Phagesdb that we don't see in the NCBI output, because NCBI shows only one representative hit for a group of identical proteins. You can often see the others by clicking on the NCBI BLAST Accession link for a phage and then clicking on the "Identical Proteins" button on the upper left of the gene window. We try to show members of this list in our Description field, but we are limited to the number we have chosen to display. So if you see a very long list, click on the Accession link and check the "Identical Proteins" list, because we have probably not listed all of them.

Differences between PECAAN hits and those obtained from direct searches of HHPred, Phagesdb BLAST, and NCBI BLAST arise when the web services have new information that has not yet been incorporated into the static PECAAN database for your phage. That is why there is a Last Updated: field under the header for each of these databases. Each database can be rerun individually for each gene; alternatively, under the Admin->Phages menu, you can select your phage and press its Edit button, which lets you either Reblast all genes (BLAST only) or Rerun Evidence for All Genes (BLAST, HHPred, CDD).
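As a rough rule of thumb for when to use Rerun, one could compare a database's Last Updated date with how often the underlying search databases change. The Python sketch below is hypothetical: it assumes a fixed refresh interval of seven days (based on the move to weekly updates mentioned in this reply), and it cannot account for changes on the live web services, which follow their own schedules.

```python
from datetime import date, timedelta

# Assumption: the local search databases are refreshed about once a week.
# This is a hypothetical parameter, not a published PECAAN schedule.
REFRESH_INTERVAL = timedelta(days=7)


def evidence_may_be_stale(last_updated: date, today: date,
                          interval: timedelta = REFRESH_INTERVAL) -> bool:
    """True if a gene's stored evidence is older than one refresh interval,
    meaning a rerun could pick up database changes made since then."""
    return today - last_updated > interval


if __name__ == "__main__":
    # Made-up dates, for illustration only
    print(evidence_may_be_stale(date(2019, 3, 1), date(2019, 3, 11)))
    print(evidence_may_be_stale(date(2019, 3, 8), date(2019, 3, 11)))
```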
Another difference between live and PECAAN data can occur when PECAAN is using an older search database than the online services. You previously pointed out some problems with HHPred, which were quite significant, so we have tried to increase our database updates from monthly to weekly to match their update schedule. The NCBI BLAST and HHPred searches are computationally demanding, and NCBI and the HHPred providers asked us to run them locally on our supercomputer. That is what we now do, which is why our search databases can fall out of step with the online ones. For HHPred, another source of variability is the set of four databases that we search: CD, Scope70, Pfam-A, and PDB. The online version of HHPred also lets you select other databases. The final source of variability in HHPred is built into its architecture: the HMM is based on probabilities, so it can produce slightly different outputs. I have noticed this when two runs produce hits to the same multi-subunit crystal structure in PDB searches but return different homologous subunits from the crystal.

Hope that this helps clarify some of these questions.
Thanks,
Claire
| posted 11 Mar, 2019 14:46
I am hoping you have a resolution for this.
Currently there is a phage in PECAAN named Anthony. It is a Bacillus phage. However, we have a mycobacteriophage named Anthony.
We would love to annotate it using PECAAN.
Any suggestions for how to proceed?
Thanks,
debbie

Debbie,
We have added some modifications to PECAAN that will allow you to easily resolve this problem of having duplicate phage names. In fact, I recently made seven copies of a phage, Lilizi, that we had annotated previously but had not submitted, so that I could give students a phage to practice on that had the Pham and Starterator information attached, unlike Etude.
For your Anthony phage, I would create a phage entry named Anthony_Myco and attach the files as usual. Information after the _ is usually ignored, and the pham searches will be done with just the Anthony part of the name. If you are doing training and want to change the first part of the name to something that people won’t recognize as the same phage, you could enter something like TestSet_1. Then go to the Admin->Phages menu, search for your phage TestSet_1, and press the Edit button at the end of its line. The Edit window now has an option to assign the Phamerator Phage Match that should be mapped onto the phage; this will also populate the Starterator field in PECAAN. Please be careful with this option and make sure that you are mapping the Phamerated phage name to a PECAAN copy with the same sequence.
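The naming rule described in this post can be summarized in a short Python sketch. This illustrates the behavior as described here, not PECAAN's implementation; the function name pham_lookup_name is hypothetical, and since the post says the text after the underscore is only usually ignored, treat the split as an approximation.

```python
from typing import Optional


def pham_lookup_name(pecaan_name: str, phamerator_match: Optional[str] = None) -> str:
    """Return the phage name used for pham/Starterator lookups (hypothetical helper).

    An explicit Phamerator Phage Match (set via Admin->Phages->Edit) takes
    precedence; otherwise anything after the first underscore is dropped.
    """
    if phamerator_match:
        return phamerator_match
    return pecaan_name.split("_", 1)[0]


if __name__ == "__main__":
    print(pham_lookup_name("Anthony_Myco"))         # Anthony
    print(pham_lookup_name("TestSet_1"))            # TestSet (no match set yet)
    print(pham_lookup_name("TestSet_1", "Lilizi"))  # Lilizi
```

The TestSet_1 case shows why the Phamerator Phage Match option matters for training copies: without it, the lookup name would be TestSet, which does not correspond to the phamerated phage.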
Give it a try and give us feedback.

Thanks,
Claire
Edited 12 Mar, 2019 14:37
| posted 11 Mar, 2019 19:03
Claire,

Thank you very much for the response; that was very helpful. And I can delete Anthony now, as we are done with it. No doubt the answer will be useful for other duplicates in the future.

Steve
| posted 12 Mar, 2019 17:07
Claire,
It works!
Thanks,
debbie