
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by ClaireRinehart

posted 13 May, 2020 23:08
Because PECAAN's BLAST, HHPred and other queries are run locally in our High Performance Computer Center to avoid overloading NCBI and other services, our local databases sometimes get out of sync with theirs. We check these databases regularly to see when we need to update, but occasionally we fall behind and simply need to download the latest version. So, if you see this kind of problem again, just drop us a line and we will initiate an additional download from these databases.

Our HPCC administrator has informed me that we should be back in sync.

Just a note concerning the global re-run on your whole phage from the Admin - Phages window that Chris mentioned. If you re-BLAST the whole phage genome you will often lose the evidence checkmarks. Therefore, do so with the assumption that all of the checkmarks will be cleared.

Thanks,
Claire
Edited 13 May, 2020 23:10
Posted in: PECAAN > Rerun function not updating

posted 29 Jan, 2020 03:40
TOPCONS has very good prediction statistics compared to other prediction programs, including TMHMM. The five supporting methods, shown along with TOPCONS, feed into the reliability score graphic. To score high on reliability you need a TM domain predicted at the same location by TOPCONS and by all five supporting programs. The neat thing about TOPCONS is its ability to accurately predict when there is no TM domain: it is over 95% accurate in that case. So, if there is no TM domain found in the TOPCONS line, you can bet that it is not a membrane protein.

I looked at gene 34 (start 28413) in PECAAN and see TOPCONS evidence for two TM domains, which fits the two TM domain evidence for the phage r1t holin. At least one domain has decent support from three out of five of the other prediction programs, which is OK support. I think that there is enough evidence for the holin call.
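To make the "three out of five" idea concrete, here is a minimal Python sketch that counts how many supporting methods predict a TM segment overlapping a TOPCONS segment. This is only an illustration of the agreement check, not how TOPCONS computes its reliability score; the function names, method labels and coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, end) residue positions, inclusive


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True when two inclusive residue ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def supporting_count(consensus: Segment, others: Dict[str, List[Segment]]) -> int:
    """Count the supporting methods with at least one segment overlapping `consensus`."""
    return sum(
        any(overlaps(consensus, seg) for seg in segments)
        for segments in others.values()
    )


# Hypothetical predictions for one protein (illustrative values only).
topcons_segment = (12, 32)
supporting = {
    "method_a": [(10, 30)],
    "method_b": [(14, 34)],
    "method_c": [(11, 31)],
    "method_d": [],          # this method predicts no TM segment
    "method_e": [(60, 80)],  # this method predicts a segment elsewhere
}

print(supporting_count(topcons_segment, supporting), "of", len(supporting))
```

With these made-up values, three of the five supporting methods agree with the TOPCONS segment.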

Claire
Posted in: PECAAN > New Features in PECAAN

posted 16 Aug, 2019 18:49
Heather,
Yes, only having the tyrosine and serine integrase options does often require a little more work.
One place that I like to go for this information is HHPred. If you can find the hits that have four letter/number names before a _ in the left column, these links lead to the PDB database that usually has a rich set of information. I like to read the collapsed PubMed Abstract under the literature section. This often has reference to the type of integrase. If there is nothing there, search down to the Small Molecules section and you can sometimes find reference to a serine or tyrosine interaction. Another place in PECAAN to look is at the Pham link under the Starterator dropdown box. This takes you to the Phagesdb summary for the Pham that has the Phages, their functions and sizes. You should see a consistent set of either serine integrases or tyrosine integrases in this pham list. Another quick summary of the hits found in Phagesdb is in the Phages Function Frequency table above the Phagesdb BLAST. This shows all of the top 100 function hits and will give you a feel for the number of hits called as y-int or s-int as well as their associated phams. If there are Conserved Domain Database hits these will usually define the integrase type also. Finally, some of the top NCBI hits will often contain either the serine or tyrosine type.
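If you have copied a list of HHPred hit names and want to pick out the PDB-style ones quickly, a small Python sketch like the following may help. It assumes a PDB-style name is four characters (a digit followed by three letters or digits), then an underscore and a chain label; the hit names in the example are placeholders, not real results.

```python
import re

# Assumption: PDB-style hit names are four characters (a digit followed by
# three letters/digits), then "_" and a chain label, e.g. "1ABC_A".
PDB_HIT = re.compile(r"^([0-9][A-Za-z0-9]{3})_\w+$")


def pdb_id(hit_name: str):
    """Return the upper-cased four-character PDB ID, or None if the name does not match."""
    match = PDB_HIT.match(hit_name.strip())
    return match.group(1).upper() if match else None


# Placeholder hit names, as they might appear in the left column of a result.
hits = ["1ABC_A", "PF00000.0", "cd00000", "2xyz_B"]
print([pdb_id(h) for h in hits])
```

Only the first and last placeholder names match the pattern; the others return None.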
I hope this is helpful.
Thanks,
Claire
Edited 16 Aug, 2019 19:02
Posted in: PECAAN > New Features in PECAAN

posted 26 Jul, 2019 13:14
Heather,
In the NCBI outputs there are several tagged descriptor lines such as /note and /product. Occasionally, when the editors at NCBI find that a protein has a domain that they feel matches one of the functional domains, they will insert a /region note. Whenever you find a Yes under the Region header in the NCBI BLAST it will be a blue link. If you click on this link, a separate window will pop up that contains the /region note and additional annotation lines from the NCBI output. So, the Region column is just a flag that lets you see that there is additional information or confirmation that has been added to the original annotation by NCBI. You will also notice that the Yes / No designators are only present for matches that have greater than 70% identity; this was an arbitrary cutoff that we chose to save search time.

Enjoy!

Claire
Posted in: PECAAN > New Features in PECAAN

posted 15 May, 2019 17:05
Sally,
We have the TMHMM transmembrane prediction function built into PECAAN, but whenever I find such a call in TMHMM I verify it with a couple of programs, SOSUI and TOPCONS. I really like the TOPCONS output because if it does not call a membrane domain then it is almost assured that it is not a membrane protein. We hope to add these additional verification programs into PECAAN this summer.
Thanks,
Claire
Posted in: Request a new function on the SEA-PHAGES official list > membrane protein

posted 28 Apr, 2019 22:19
Jeff,
Sorry for the problems that you are experiencing.
I downloaded the Dieselweasel.fasta file from Phagesdb and opened it in DNA Master. I then pasted in the Full Annotation export from PECAAN and Parsed it. It all went in up through gene 85. I have attached the DNA Master file.

We have noticed that successfully pasting text into DNA Master and then Parsing is very dependent on the text editor that you use to copy and paste with. I use TextWrangler or BBEdit because they are text-only editors, so no control codes are embedded. Word, and even Apple's own TextEdit, will slip control codes into the text, almost unseen, and these will cause DNA Master to abort the Parsing. In the future, you might try another text editor.
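If you want to check a block of text before pasting it, here is a minimal Python sketch that lists every character that is not plain ASCII text. It assumes that anything outside printable ASCII (other than a tab) is suspect; that rule is my own simplification, not a description of what DNA Master actually accepts, and the function name is hypothetical.

```python
import unicodedata


def find_suspect_characters(text: str):
    """Return (line, column, description) for each character that is not plain ASCII text."""
    suspects = []
    # Split on "\n" only, so that other control characters stay visible to the check.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            is_plain = ch == "\t" or 32 <= ord(ch) < 127
            if not is_plain:
                # Control characters have no Unicode name, so fall back to the code point.
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                suspects.append((line_no, col_no, name))
    return suspects


# Illustrative text with "smart quotes" of the kind a word processor may insert.
sample = "gene 1\n\u201cHypothetical Protein\u201d\n"
for line_no, col_no, name in find_suspect_characters(sample):
    print(f"line {line_no}, column {col_no}: {name}")
```

An empty result means the text contains only plain ASCII characters, tabs and newlines.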
Thanks,
Claire
Posted in: DNA Master > DNA Master Note Parsing Bugs

posted 17 Apr, 2019 14:46
Deborah and Debbie,
Both Den3 and Velene were put into PECAAN at the end of February 2019.
For PECAAN, we have to do the HHpred searches locally since the high demand was putting too much load on the online site.
At the end of Feb. we updated the database for pdb70 and changed the hhblits database for multiple sequence alignment from the old uniprot20 to Uniclust30. This put the PECAAN HHpred more in line with what was being generated from the online site. This was probably after your entry of Den3 and Velene. For some of the other phages we had noticed some differences between the online and PECAAN runs. That is what prompted the changes mentioned above.
Since the databases that we pull from are dynamic, we have put the dates at the top of each of the PECAAN database results to inform users of when the material was last updated. If you ever have a question about the currency of the data, just press the re-run button adjacent to the database header to get the latest results. They should be the most relevant. To update all of the data for a phage that was entered long before annotation, you can also go to the top Admin menu and select the Phages option. You can then find your phage by typing its name into the search box. Press the Edit button at the end of the entry and then select Reblast… to update the Phagesdb and NCBI BLAST results or Rerun Evidence for all genes to update the evidence for all databases. Note that this will uncheck any evidence boxes that were previously marked.
I hope this helps explain a little about where we have been and where we are today with PECAAN.
Enjoy!
Claire
Posted in: Cluster EA Annotation Tips > DNA binding domain protein or amidotransferase

posted 10 Apr, 2019 16:55
JoAnn,
We just processed Chotabhai from PECAAN into DNA Master and then into the submission pipeline without any problems getting the Hypothetical Protein tag to populate.

Usually in these situations, we have found that the software that you are using to copy the file and paste it into the DNA Master documentation is inserting a character that is not compatible with the DNA Master parsing.

Would you please copy the PECAAN "Export CDS Function" file, paste it into a new file, save it, and then send it to us so that we can compare it to our processed file? Please email it to claire.rinehart@wku.edu.

Please also indicate what software package you are using to view the PECAAN "Export CDS Function" file and to copy from.

Thanks,
Claire
Posted in: PECAAN > New Features in PECAAN

posted 11 Mar, 2019 14:46
I am hoping you have a resolution for this.
Currently there is a phage in PECAAN named Anthony. It is a Bacillus phage. However, we have a mycobacteriophage named Anthony.
We would love to annotate it using PECAAN.
Any suggestions for how to proceed?
Thanks,
debbie

Debbie,
We have added some modifications to PECAAN that will allow you to easily resolve this problem of having duplicate phage names. In fact, I recently made seven copies of a phage, Lilizi, that we had annotated previously but had not submitted, so that I could give students a phage to practice on that had the Pham and Starterator information attached, unlike Etude.
For your Anthony phage, I would create a phage entry named Anthony_Myco and attach the files as usual. Information after the _ is usually ignored and the pham searches will be done with just the Anthony part of the name. If you are doing training and want to change the first part of the name to something that people won’t recognize as being the same phage you could enter something like TestSet_1 and then go into the Admin->Phages menu and then search for your phage TestSet_1 and press the Edit button at the end of the line. In the Edit window there is now an option to assign the Phamerator Phage Match that should be mapped onto the phage. This will also populate the Starterator field in PECAAN. Please be careful with this option and make sure that you are mapping the Phamerated phage name to a PECAAN copy with the same sequence.
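As a small illustration of the naming rule, the Python sketch below shows the part of a name that would be used for the pham search if everything from the first underscore onward is dropped. That reading of "information after the _ is usually ignored" is an assumption, and the function name is hypothetical; it is not PECAAN's actual code.

```python
def pham_lookup_name(pecaan_name: str) -> str:
    """Return the part of a PECAAN phage name that precedes the first underscore."""
    # Assumption: everything from the first "_" onward is treated as a suffix.
    return pecaan_name.split("_", 1)[0]


print(pham_lookup_name("Anthony_Myco"))  # Anthony
print(pham_lookup_name("TestSet_1"))     # TestSet
```

The second example shows why a training copy such as TestSet_1 needs the Phamerator Phage Match assigned by hand: the part before the underscore no longer matches any real phage name.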
Give it a try and give us feedback.

Thanks,
Claire
Edited 12 Mar, 2019 14:37
Posted in: PECAAN > New Features in PECAAN

posted 11 Mar, 2019 14:37
Steve,
The Phagesdb Function table that we have added provides a summary of the top 100 hits that we draw from Phagesdb. Those that show functions other than "Unknown Function" are then grouped by function, pham and cluster to give you a summary of the evidence for each individual combination of those three elements. The Phagesdb Function Frequency is simply the number of hits for each line divided by the total number of functional hits in the top 100.
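The arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not the PECAAN source; the function name is hypothetical, and the functions, pham numbers and cluster labels in the example are made up.

```python
from collections import Counter


def function_frequency(hits):
    """Compute the frequency of each (function, pham, cluster) combination.

    `hits` is a list of (function, pham, cluster) tuples for the top hits.
    Hits labelled "Unknown Function" are excluded from both the counts and the total.
    """
    functional = [hit for hit in hits if hit[0] != "Unknown Function"]
    total = len(functional)
    if total == 0:
        return {}
    counts = Counter(functional)
    return {combo: n / total for combo, n in counts.items()}


# Made-up hits: 3 + 1 functional hits and 2 unknown-function hits.
hits = (
    [("tyrosine integrase", "1234", "A1")] * 3
    + [("serine integrase", "5678", "B2")]
    + [("Unknown Function", "1234", "A1")] * 2
)

for combo, freq in function_frequency(hits).items():
    print(combo, f"{freq:.2f}")
```

In this example the two unknown-function hits are ignored, so the frequencies are 3/4 and 1/4 rather than 3/6 and 1/6.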

As for differences between what you see in Phagesdb and what you see in the NCBI BLASTp, we can often see phages in Phagesdb that we don't see in the NCBI output because NCBI only shows one representative hit of a group of identical proteins. You can often see these by clicking on the NCBI BLAST Accession link for a phage and then clicking on the "Identical Proteins" button on the upper left of the gene window. We try to show members from this list in our Description field but are limited to the number that we have chosen to display, so, if you see a very long list, click on the Accession link and check out the "Identical Proteins" because we have probably not listed all of them.

Differences between PECAAN hits and those obtained from direct searches of HHPred, Phagesdb BLAST and NCBI BLAST arise when new information is available through the web services that has not yet been incorporated into the static PECAAN database for your phage. That is why we have the Last Updated: field under the header for each of these databases. These databases can be Rerun individually for each gene, or under the Admin->Phages menu you can select your phage and press the Edit button for it, which will let you Reblast all genes (BLAST only) or Rerun Evidence for All Genes (BLAST, HHPred, CDD).
Another difference between live and PECAAN data can occur when PECAAN is using an older search database than the online services. You have previously pointed out some problems with HHPred, which were quite significant. We have therefore tried to increase the frequency of our database updates from monthly to weekly, to match their update schedule. The NCBI BLAST and HHPred searches take serious computation, and NCBI and the HHPred providers requested that we run these locally on our supercomputer, so that is what we now do; hence the potential for out-of-sync comparison databases. For HHPred, another source of variability is the set of four databases that we search: CD, SCOPe70, Pfam-A and PDB. The online version of the program offers the option to select other databases. The final source of variability in HHPred is built into its architecture. The HMM is built on a set of probabilities and therefore has the possibility of producing slightly different outputs. I have noticed this when two runs produce hits to the same multi-subunit crystal structure in PDB searches but return different homologous subunits from the crystal.

Hope that this helps clarify some of these questions.
Thanks,
Claire
Posted in: PECAAN > New Features in PECAAN