SEA-PHAGES | All posts created by ClaireRinehart

Link to this post | posted 11 Jul, 2020 23:59

Well, here we are again. This time the Cluster A1 phage is STLscum and it has all of the features described above. I notice that there are now others fitting the criteria above that have been called superinfection immunity protein or superinfection exclusion protein, including Swag_38, LastResort_38, Jabith_72, and Niza_72. All of these have 2-3 transmembrane domains and are good matches to pfam 14373. Would you reconsider naming this group of proteins "superinfection exclusion protein" after JSwag_38? This seems more appropriate, since this protein contributes to the exclusion of super-infecting genomes to the periplasmic space.
Thanks,
Claire

Posted in: Request a new function on the SEA-PHAGES official list → Superinfection Immunity Protein

Link to this post | posted 27 Jun, 2020 13:32

ClaireRinehart

Only 8 of the 109 have the -4 start site at Starterator location 12. As you can see from the Starterator map, all of these belong to the family of longest ORFs. All of the closest nucleotide BLAST relatives of OfUltron and Seabastion (Llama, Modragons and Ochi17) have this -4 start. I am going to call the -4 start because the data are consistent for these longer ORFs, which may represent a new, evolving pham.

Posted in: Cluster F Annotation Tips → 4 bp overlaps

Link to this post | posted 26 Jun, 2020 21:54

ClaireRinehart

Welkin,
I am struggling with the evidence for OfUltron and Sebastian gene 103.
The -4 gap start is at 54324 and has a Z-score of 3 and a Final score of -2.6, which looks like very compelling evidence for this start site. When I look at the Starterator results, I find that all 109 Cluster F1 hits call the start at 54471, which has a Z-score of 2.255 and a Final score of -4.458. When I look at the secondary structure potential for the RBS at start 54471, I find that 5 of the 7 bases are in a very strong stem of a local stem-loop with a final Free Energy of -1200, which is very high (6 of the 7 bp in the stem are G-C bp). Wow, this makes the -4 gap look even better. However, if translation starts at the -4 start site, there is no coding capacity for about 70 bases and then there is atypical coding capacity for about 50 bases before the start at 54471. I searched this range of non-coding capacity and found 4 rare codons in it. My initial instinct is to go with start 54324 and the -4 gap, in the hope that some of the ribosomes would be able to navigate the rare codon domain, even if at a slower rate. Is there Mass Spec data for Cluster F1 phages or other evidence (besides herd instinct) that has pointed everyone else to call the start at 54471?
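
As a small worked check on the coordinates quoted above: the two candidate starts are 54471 - 54324 = 147 bp apart, which is 49 codons and divisible by 3, so both starts sit in the same reading frame. The Python sketch below does that arithmetic and shows one way a rare-codon count over an upstream stretch could be tallied. The rare-codon set and the example sequence are hypothetical placeholders, not data from OfUltron, and the sketch assumes a forward-strand gene.

```python
def frame_offset(start_a: int, start_b: int) -> tuple[int, int]:
    """Return (distance in bp, distance mod 3) between two candidate starts."""
    distance = abs(start_b - start_a)
    return distance, distance % 3


def count_rare_codons(seq: str, rare: set[str]) -> int:
    """Count in-frame codons of seq (read from position 0) found in `rare`."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return sum(1 for codon in codons if codon in rare)


# Coordinates taken from the post: -4 overlap start vs. the commonly called start.
distance, remainder = frame_offset(54324, 54471)
extra_codons = distance // 3

# Hypothetical rare-codon set and toy sequence, for illustration only.
HYPOTHETICAL_RARE = {"TTA", "AGA"}
example_region = "ATGTTAGGCAGACCG"
rare_hits = count_rare_codons(example_region, HYPOTHETICAL_RARE)

print(distance, remainder, extra_codons, rare_hits)
```

A remainder of 0 only confirms the two starts share a frame; it says nothing about which one is actually used.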

Posted in: Cluster F Annotation Tips → 4 bp overlaps

Link to this post | posted 13 May, 2020 23:08

ClaireRinehart

Since PECAAN's BLAST, HHPred and other queries are run locally in our High Performance Computer Center to avoid overloading NCBI and other services, we sometimes get out of sync with them. We check these databases regularly to see when we need to update, but sometimes we fall behind and simply need to download the latest database. So, if you see this kind of problem again, just drop us a line and we will initiate an additional download from these databases.

The HPCC administrator has informed me that we should be back in sync.

Just a note concerning a global re-run on your whole phage from the Admin - Phages window that Chris mentioned. If you re-BLAST the whole phage genome you will often lose the evidence checkmarks. Therefore, do so with the assumption that the checkmarks will be reset to zero.

Thanks,
Claire

Edited 13 May, 2020 23:10

Posted in: PECAAN → Rerun function not updating

Link to this post | posted 29 Jan, 2020 03:40

ClaireRinehart

TOPCONS has very good prediction statistics compared to other prediction programs, including TMHMM. The five supporting methods, shown along with TOPCONS, aid in developing the reliability score graphic. To score high on reliability you need to have a TM domain predicted by TOPCONS and all five of the supporting programs at the same location. The neat thing about TOPCONS is its ability to accurately predict when there is no TM domain: it is over 95% accurate in that case. So, if there is no TM domain found in the TOPCONS line, then you can bet that it is not a membrane protein.

I looked at gene 34 (start 28413) in PECAAN and see TOPCONS evidence for two TM domains, which fits the two TM domain evidence for the phage r1t holin. At least one domain has decent support from three out of five of the other prediction programs, which is OK support. I think that there is enough evidence for the holin call.
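
The "three out of five" reasoning above can be sketched as a simple overlap count in Python. This is only an illustration: the method names and residue ranges are made up, the real TOPCONS reliability score is computed differently, and nothing here reflects how PECAAN displays the results.

```python
Segment = tuple[int, int]  # (first residue, last residue) of a predicted TM helix


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two residue ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def support_count(consensus: Segment, predictions: dict[str, list[Segment]]) -> int:
    """Count supporting methods with a TM segment overlapping the consensus call."""
    return sum(
        1
        for segments in predictions.values()
        if any(overlaps(consensus, seg) for seg in segments)
    )


# Hypothetical predictions from five supporting methods (placeholder names).
predictions = {
    "method_1": [(12, 32), (45, 65)],
    "method_2": [(14, 34)],
    "method_3": [(10, 30)],
    "method_4": [],
    "method_5": [(70, 90)],
}

# Two hypothetical consensus TM domains, checked one at a time.
first_support = support_count((11, 31), predictions)
second_support = support_count((44, 64), predictions)
print(first_support, second_support)
```

With these toy numbers the first domain is backed by three of five methods and the second by only one, which mirrors the situation described for gene 34 where at least one domain has decent support.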

Claire

Posted in: PECAAN → New Features in PECAAN

Link to this post | posted 16 Aug, 2019 18:49

ClaireRinehart

Heather,
Yes, only having the tyrosine and serine integrase options does often require a little more work.
One place that I like to go for this information is HHPred. If you can find the hits that have four letter/number names before a _ in the left column, these links lead to the PDB database, which usually has a rich set of information. I like to read the collapsed PubMed Abstract under the literature section. This often has a reference to the type of integrase. If there is nothing there, scroll down to the Small Molecules section and you can sometimes find a reference to a serine or tyrosine interaction. Another place in PECAAN to look is the Pham link under the Starterator dropdown box. This takes you to the Phagesdb summary for the Pham, which lists the phages, their functions and sizes. You should see a consistent set of either serine integrases or tyrosine integrases in this pham list. Another quick summary of the hits found in Phagesdb is in the Phages Function Frequency table above the Phagesdb BLAST. This shows all of the top 100 function hits and will give you a feel for the number of hits called as y-int or s-int as well as their associated phams. If there are Conserved Domain Database hits, these will usually define the integrase type also. Finally, some of the top NCBI hits will often contain either the serine or tyrosine type.
I hope this is helpful.
Thanks,
Claire

Edited 16 Aug, 2019 19:02

Posted in: PECAAN → New Features in PECAAN

Link to this post | posted 26 Jul, 2019 13:14

ClaireRinehart

Heather,
In the NCBI outputs there are several tagged descriptor lines like /note and /product. Occasionally, when the editors at NCBI find that a protein has a domain that they feel matches one of the functional domains, they will insert a /region note. Whenever you find a Yes under the Region header in the NCBI BLAST, it will be a blue link. If you click on this link, a separate window will pop up that contains the /region note and additional annotation lines from the NCBI output. So, the Region column is just a flag that lets you see that there is additional information or confirmation that has been added to the original annotation by NCBI. You will also notice that the Yes / No designators are only present for matches that have greater than 70% identity; this was an arbitrary cutoff that we chose to save search time.

Enjoy!

Claire

Posted in: PECAAN → New Features in PECAAN

Link to this post | posted 15 May, 2019 17:05

ClaireRinehart

Sally,
We have the TMHMM transmembrane prediction function built into PECAAN, but whenever I find such a call in TMHMM I verify it with a couple of programs, SOSUI and TOPCONS. I really like the TOPCONS output because if it does not call a membrane domain then it is almost assured that it is not a membrane protein. We hope to add these additional verification programs into PECAAN this summer.
Thanks,
Claire

Posted in: Request a new function on the SEA-PHAGES official list → membrane protein

Link to this post | posted 28 Apr, 2019 22:19

ClaireRinehart

Jeff,
Sorry for the problems that you are experiencing.
I downloaded the Dieselweasel.fasta file from Phagesdb and opened it in DNA Master. I then pasted in the Full Annotation export from PECAAN and Parsed it. It all went in up through gene 85. I have attached the DNA Master file.

We have noticed that successfully pasting text into DNA Master and then Parsing is very dependent on the text editor that you use to copy and paste with. I use TextWrangler or BBEdit because they are text-only editors. No control codes are embedded. Certainly Word or even Apple's own TextEdit will slip control codes into the text, almost unseen, that will cause DNA Master to abort the Parsing. In the future, you might try another text editor.
Thanks,
Claire
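
For anyone who cannot easily switch editors, the same idea (remove hidden control codes before pasting) can be sketched as a small Python script. Exactly which characters make DNA Master abort is not stated above, so this assumes that replacing curly quotes and non-breaking spaces and dropping control or format characters other than newline and tab is enough. The line-ending normalization and the sample text are also assumptions for illustration.

```python
import unicodedata

# Typographic characters that word processors commonly substitute (assumed list).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u00a0": " ",                  # non-breaking space
}


def to_plain_text(text: str) -> str:
    """Return text with typographic substitutes replaced and control codes removed."""
    # Normalize line endings first so carriage returns are not simply dropped.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # Unicode categories starting with "C" cover control and format characters.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )


# Hypothetical sample containing a vertical tab and a zero-width space.
sample = "Gene 1\u00a0\u201cterminase\u201d\x0b\r\nGene 2\u200b"
cleaned = to_plain_text(sample)
print(repr(cleaned))
```

This is not a substitute for checking the parsed result in DNA Master; it only removes the most common invisible characters.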

Posted in: DNA Master → DNA Master Note Parsing Bugs

Link to this post | posted 17 Apr, 2019 14:46

ClaireRinehart

Deborah and Debbie,
Both Den3 and Velene were put into PECAAN at the end of February 2019.
For PECAAN, we have to do the HHpred searches locally since the high demand was putting too much load on the online site.
At the end of Feb. we updated the database for pdb70 and changed the hhblits database for multiple sequence alignment from the old uniprot20 to Uniclust30. This put the PECAAN HHpred more in line with what was being generated from the online site. This was probably after your entry of Den3 and Velene. For some of the other phages we had noticed some differences between the online and PECAAN runs. That is what prompted the changes that we made, as mentioned above.
Since the databases that we pull from are dynamic, we have put the dates at the top of each of the PECAAN database results to inform users of when the material was last updated. If you ever have a question about the currency of the data, just press the re-run button adjacent to the database header to get the latest updates. They should be the most relevant. To update all of the data for a phage that was entered long before annotation, you can also go to the top Admin menu and select the Phages option. You can then find your phage by typing its name into the search box. Press the Edit button at the end of the entry and then select Reblast… to update the Phagesdb and NCBI BLAST results, or Rerun Evidence for all genes to update the evidence for all databases. Note that this will uncheck any evidence boxes that were previously marked.
I hope this helps explain a little about where we have been and where we are today with PECAAN.
-enjoy!
Claire

Posted in: Cluster EA Annotation Tips → DNA binding domain protein or amidotransferase
