The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by ClaireRinehart

| posted 27 Jun, 2020 13:32
Only 8 of the 109 have the -4 start site at Starterator location 12. As you can see from the Starterator map, all of these belong to the family of longest ORFs. All of the closest nucleotide BLAST relatives to OfUltron and Seabastion (Llama, Modragons and Ochi17) have this -4 start. I am going to call the -4 start because the data are consistent for these longer ORFs, which may represent a newly evolving pham.
Posted in: Cluster F Annotation Tips > 4 bp overlaps
| posted 26 Jun, 2020 21:54
Welkin,
I am struggling with the evidence for OfUltron and Sebastian gene 103.
The -4 gap start is at 54324 and has a Z-score of 3 and a Final score of -2.6, which looks like very compelling evidence for this start site. When I look at the Starterator results, I find that all 109 Cluster F1 hits call the start at 54471, which has a Z-score of 2.255 and a Final score of -4.458. When I look at the secondary structure potential for the RBS at start 54471, I find that 5 of the 7 bases are in a very strong stem of a local stem-loop with a final Free Energy of -1200, which is very high (6 of the 7 bp in the stem are G-C bp). Wow, this makes the -4 gap look even better. However, if translation starts at the -4 start site, there is no coding capacity for about 70 bases and then atypical coding capacity for about 50 bases before the start at 54471. I searched this range of poor coding capacity and found 4 rare codons. My initial instinct is to go with start 54324 and its -4 gap, in the hope that some of the ribosomes would be able to navigate the rare codon domain, even if at a slower rate. Is there Mass Spec data for Cluster F1 phages, or other evidence (besides herd instinct), that has pointed everyone else to call the start at 54471?
Posted in: Cluster F Annotation Tips > 4 bp overlaps
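The distance between the two candidate starts and the rare-codon search described in this post can be sketched in a few lines of Python. This is an illustration only: the coordinates 54324 and 54471 come from the post, but the `RARE_CODONS` set, the function names, and the example sequence are hypothetical placeholders, and the gene is assumed to lie on the forward strand.

```python
# Hypothetical rare-codon set, for illustration only; a real analysis would
# use a codon-usage table for the actual host.
RARE_CODONS = {"AGA", "AGG", "CTA", "TTA"}


def extra_codons(upstream_start: int, downstream_start: int) -> int:
    """Whole codons added by choosing the upstream start (forward strand assumed)."""
    length = downstream_start - upstream_start
    if length < 0 or length % 3 != 0:
        raise ValueError("starts must be in the same frame, upstream first")
    return length // 3


def rare_codon_positions(seq: str, rare=RARE_CODONS) -> list:
    """Return 0-based codon indices in `seq` (read in frame 0) that are rare."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons) if codon in rare]


# Coordinates quoted in the post: the starts are 147 bp (49 codons) apart.
print(extra_codons(54324, 54471))

# Made-up sequence, not taken from any phage genome.
print(rare_codon_positions("ATGAGACTGTTAGGC"))
```

With a real codon-usage table in place of the placeholder set, the same scan could be run over the 147 bp between the two starts.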
| posted 13 May, 2020 23:08
Since PECAAN's BLAST, HHpred and other queries are run locally in our High Performance Computer Center to avoid overloading NCBI and other services, our local copies sometimes get out of sync with the source databases. We check these databases regularly to see when we need to update, but sometimes we fall behind and simply need to download the latest version. So, if you see this kind of problem again, just drop us a line and we will initiate an additional download from these databases.

The HPCC administrator has informed me that we should be back in sync.

Just a note concerning the global re-run on your whole phage from the Admin - Phages window that Chris mentioned. If you re-BLAST the whole phage genome, you will often lose the evidence checkmarks. Therefore, do so with the assumption that the checkmarks will be reset to zero.

Thanks,
Claire
Edited 13 May, 2020 23:10
Posted in: PECAAN > Rerun function not updating
| posted 29 Jan, 2020 03:40
TOPCONS has very good prediction statistics compared to other prediction programs, including TMHMM. The five supporting methods, shown along with TOPCONS, aid in developing the reliability score graphic. To score high on reliability, you need to have a TM domain predicted by TOPCONS and by all five of the supporting programs at the same location. The neat thing about TOPCONS is its ability to accurately predict when there is no TM domain: it is over 95% accurate in doing so. So, if there is no TM domain found in the TOPCONS line, then you can bet that it is not a membrane protein.

I looked at gene 34 (start 28413) in PECAAN and see TOPCONS evidence for two TM domains, which fits the two TM domain evidence for the phage r1t holin. At least one domain has decent support from three out of five of the other prediction programs, which is OK support. I think that there is enough evidence for the holin call.

Claire
Posted in: PECAAN > New Features in PECAAN
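The "three out of five" kind of support described in this post can be counted mechanically once the predicted segments are written down. The following is a minimal sketch, assuming each program's prediction is recorded as a list of inclusive (start, end) residue ranges and that "the same location" can be approximated by any overlap. The method labels and coordinates are invented for illustration and are not TOPCONS output for gene 34.

```python
def overlaps(a, b) -> bool:
    """True if two inclusive residue ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def support_count(consensus_segment, method_predictions) -> int:
    """Count supporting methods with a TM segment overlapping the consensus one."""
    return sum(
        any(overlaps(consensus_segment, seg) for seg in segments)
        for segments in method_predictions.values()
    )


# Invented example data: method labels and ranges are placeholders.
predictions = {
    "method_1": [(12, 32)],
    "method_2": [(10, 30), (60, 80)],
    "method_3": [],
    "method_4": [(15, 35)],
    "method_5": [(90, 110)],
}

# Three of the five placeholder methods overlap the (11, 31) segment.
print(support_count((11, 31), predictions))
```

Using simple overlap is a design shortcut; a stricter check could require a minimum number of shared residues before counting a method as supporting.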
| posted 16 Aug, 2019 18:49
Heather,
Yes, only having the tyrosine and serine integrase options does often require a little more work.
One place that I like to go for this information is HHpred. Look for the hits that have four-character letter/number names before a _ in the left column; these links lead to the PDB database, which usually has a rich set of information. I like to read the collapsed PubMed Abstract under the literature section. This often refers to the type of integrase. If there is nothing there, search down to the Small Molecules section and you can sometimes find reference to a serine or tyrosine interaction. Another place in PECAAN to look is the Pham link under the Starterator dropdown box. This takes you to the Phagesdb summary for the Pham, which lists the phages, their functions and sizes. You should see a consistent set of either serine integrases or tyrosine integrases in this pham list. Another quick summary of the hits found in Phagesdb is in the Phages Function Frequency table above the Phagesdb BLAST. This shows all of the top 100 function hits and will give you a feel for the number of hits called as y-int or s-int as well as their associated phams. If there are Conserved Domain Database hits, these will usually define the integrase type also. Finally, some of the top NCBI hits will often name either the serine or tyrosine type.
I hope this is helpful.
Thanks,
Claire
Edited 16 Aug, 2019 19:02
Posted in: PECAAN > New Features in PECAAN
| posted 26 Jul, 2019 13:14
Heather,
In the NCBI outputs there are several tagged descriptor lines, such as /note and /product. Occasionally, when the editors at NCBI find that a protein has a domain that they feel matches one of the functional domains, they will insert a /region note. Whenever you find a Yes under the Region header in the NCBI BLAST, it will be a blue link. If you click on this link, a separate window will pop up that contains the /region note and additional annotation lines from the NCBI output. So, the Region column is just a flag that lets you see that there is additional information or confirmation that has been added to the original annotation by NCBI. You will also notice that the Yes / No designators are only present for matches that have greater than 70% identity; this was an arbitrary cutoff that we chose to save search time.

Enjoy!

Claire
Posted in: PECAAN > New Features in PECAAN
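The Region column rule described in this post (a Yes / No designator only for matches above 70% identity) can be written as a small function. This is a sketch under stated assumptions, not PECAAN's actual code: the `identity` and `region_note` field names and the example hits are hypothetical.

```python
IDENTITY_CUTOFF = 70.0  # percent identity; the cutoff described in the post


def region_flag(hit: dict):
    """Return 'Yes' or 'No' for hits above the cutoff, or None when no flag is shown.

    `hit` is a hypothetical record with an 'identity' percentage and an
    optional 'region_note' string; these field names are assumptions.
    """
    if hit["identity"] <= IDENTITY_CUTOFF:
        return None  # below the cutoff nothing is searched, so no designator
    return "Yes" if hit.get("region_note") else "No"


# Invented example hits.
hits = [
    {"identity": 98.2, "region_note": "example region note"},
    {"identity": 85.0},
    {"identity": 64.5, "region_note": "ignored, below cutoff"},
]
print([region_flag(h) for h in hits])
```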
| posted 15 May, 2019 17:05
Sally,
We have the TMHMM transmembrane prediction function built into PECAAN, but whenever I find such a call in TMHMM I verify it with a couple of programs, SOSUI and TOPCONS. I really like the TOPCONS output because if it does not call a membrane domain then it is almost assured that it is not a membrane protein. We hope to add these additional verification programs into PECAAN this summer.
Thanks,
Claire
Posted in: Request a new function on the SEA-PHAGES official list > membrane protein
| posted 28 Apr, 2019 22:19
Jeff,
Sorry for the problems that you are experiencing.
I downloaded the Dieselweasel.fasta file from Phagesdb and opened it in DNA Master. I then pasted in the Full Annotation export from PECAAN and Parsed it. It all went in, up through gene 85. I have attached the DNA Master file.

We have noticed that successfully pasting text into DNA Master and then Parsing it is very dependent on the text editor that you use to copy and paste with. I use TextWrangler or BBEdit because they are text-only editors, so no control codes are embedded. Word, or even Apple's own TextEdit, will certainly slip control codes into the text, almost unseen, that will cause DNA Master to abort the Parsing. In the future, you might try another text editor.
Thanks,
Claire
Posted in: DNA Master > DNA Master Note Parsing Bugs
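One way to check exported text for the hidden characters described in this post is to scan it before pasting. The sketch below is a rough illustration: it assumes that printable ASCII plus tab and line endings are safe, which is a guess and not a documented DNA Master rule, and both function names and the sample string are hypothetical.

```python
import unicodedata


def suspicious_characters(text: str) -> list:
    """List (index, character repr, Unicode name) for anything outside plain text.

    Printable ASCII plus tab, LF and CR are treated as safe; this is an
    assumption, not a documented DNA Master rule.
    """
    allowed_controls = {"\t", "\n", "\r"}
    found = []
    for index, char in enumerate(text):
        if char in allowed_controls or " " <= char <= "~":
            continue
        found.append((index, repr(char), unicodedata.name(char, "UNKNOWN")))
    return found


def to_plain_ascii(text: str) -> str:
    """Replace curly quotes and non-breaking spaces, then drop what remains."""
    replacements = {
        "\u2018": "'", "\u2019": "'",
        "\u201c": '"', "\u201d": '"',
        "\u00a0": " ",
    }
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return "".join(c for c in text if c in "\t\n\r" or " " <= c <= "~")


# Made-up sample with curly quotes and a trailing non-breaking space.
sample = "Function: \u201chypothetical protein\u201d\u00a0"
for item in suspicious_characters(sample):
    print(item)
print(to_plain_ascii(sample))
```

Reporting the offending characters first, rather than silently stripping them, makes it easier to tell which editor introduced them.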
| posted 17 Apr, 2019 14:46
Deborah and Debbie,
Both Den3 and Velene were put into PECAAN at the end of February 2019.
For PECAAN, we have to do the HHpred searches locally, since the high demand was putting too much load on the online site.
At the end of Feb. we updated the database for pdb70 and changed the hhblits database for multi-sequence alignment to Uniclust30 from the old uniprot20. This put the PECAAN HHpred more in line with what was being generated from the online site. This was probably after your entry of Den3 and Velene. For some of the other phages we had noticed some differences between the online and PECAAN runs. That is what prompted the changes mentioned above.
Since the databases that we pull from are dynamic, we have put the dates at the top of each of the PECAAN database results to inform users of when the material was last updated. If you ever have a question about the currency of the data, just press the re-run button adjacent to the database header to get the latest updates. They should be the most relevant. To update all of the data for a phage that was entered long before annotation, you can also go to the top Admin menu and select the Phages option. You can then find your phage by typing its name into the search box. Press the Edit button at the end of the entry and then select Reblast… to update the Phagesdb and NCBI BLAST results, or Rerun Evidence for all genes to update the evidence for all databases. Note that this will uncheck the evidence boxes that may have been previously marked.
I hope this helps explain a little about where we have been and where we are today with PECAAN.
Enjoy!
Claire
Posted in: Cluster EA Annotation Tips > DNA binding domain protein or amidotransferase
| posted 10 Apr, 2019 16:55
JoAnn,
We just processed Chotabhai from PECAAN into DNA Master and then into the submission pipeline without any problems getting the Hypothetical Protein tag to populate.

Usually in these situations, we have found that the software that you are using to copy the file and paste it into the DNA Master documentation is inserting a character that is not compatible with the DNA Master parsing.

Would you please copy the PECAAN "Export CDS Function" file, paste it into a new file, save it, and then send it to us so that we can compare it to our processed file? Please email it to claire.rinehart@wku.edu.

Please also indicate what software package you are using to view the PECAAN "Export CDS Function" file and to copy from.

Thanks,
Claire
Posted in: PECAAN > New Features in PECAAN