Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.
All posts created by ClaireRinehart
Posted 12 Apr, 2025 13:53
The following are my notes on how to export PECAAN annotations and import them into DNA Master. Enjoy!

Export PECAAN files

In PECAAN, go to the Export menu and click the “Export CDS Function” button to export the PECAAN annotations to a .txt file that can be opened in BBEdit or any other plain-text editor. When the PHAGENAME is displayed, put SEA_ before it if the phage is from the Science Education Alliance (SEA) program; otherwise just continue.

Create DNA Master file with PECAAN annotations

When starting DNA Master, choose the center option to import a .fasta file and auto-annotate. When the popup window asks for the .fasta file, navigate to the phagename.fasta file and click on it. When the Annotate window is displayed, select the options that you want and click the “Annotate” button. When auto-annotation is complete, close the log and message windows. You now have a DNA Master file.

To import the PECAAN annotations, click on the “Documentation” menu option. Open the PECAAN Phagename_cdsfunctions.txt file in a text editor, such as BBEdit. Click on the annotation text, then go to the “Edit” menu and click “Select All” followed by “Copy”. Go back to the DNA Master “Documentation” window and right-click on the text. Choose the “Select All” option from the popup window. Right-click on the text again and choose the “Paste” option. This should replace the auto-annotation documentation with the PECAAN documentation.

Next, click the “Parse” button in the upper right corner of the DNA Master Documentation window. Select the first five features on the left of the pop-up window and select “Bacterial and Plant Plastid Code” for the “Genetic Code” option. Press the “Parse” button. Go back to the Features window and continue with the preparation of the GenBank file. In the Features window, right-click on the Name header and select the Wide Feature List option.
You should see the Tag as SEA_PHAGENAME_#. Prepending the SEA_ tag to the PHAGENAME was an option in the PECAAN “Export CDS Function”. If you do not see this format, then in the Validation window, described next, you will need to check the “Override” button at the bottom of the window, put SEA_PHAGENAME into the “Locus Tag Prefix” field, and press the “Reassign Gene Data” button.

Click the Validate button at the bottom of the gene list window. You should get an all clear for the starts and stops. You may see a note about the two tail assembly chaperone genes sharing a 5’ start. That is OK if there is a frame-shifted gene in your genome and these stop sites correspond to those genes. If there are any other error notes, fix them and then re-validate until the errors are resolved. If the Tag column is in the SEA_PHAGENAME format, then you are done with the validation; if not, see the previous paragraph for instructions.

When finished, click on the Description button to see the Product, Function, and Notes fields. Save the DNA Master file as PhageName.dnam5 in the phage folder with the .fasta and author files. If you are coming from PECAAN’s “Export CDS Function”, the PECAAN export has already put the functions from the notes into the Product field and populated the NKFs as “hypothetical proteins”, and if you prepended the phage name with SEA_, the tags are properly set. Scroll through a few genes to verify that this has been done by looking for “hypothetical protein” in the Product field for those genes without a function, i.e., NKF in PECAAN.
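The final spot check described above (tags in SEA_PHAGENAME_# format, no empty Product fields) can also be sketched in a few lines of Python. This is only an illustration, not part of PECAAN or DNA Master: the `check_features` helper and the sample rows are hypothetical, and the upper-casing of the phage name is an assumption based on tags such as SEA_CANE17_46.

```python
import re


def check_features(features, phage_name):
    """Return a list of problems found in (tag, product) pairs.

    `features` is a hypothetical list of rows typed or pasted from the
    DNA Master Features list; this is not a DNA Master file parser.
    """
    # Assumption: the phage name appears in upper case inside the locus tag.
    prefix = "SEA_" + phage_name.upper()
    pattern = re.compile(r"^" + re.escape(prefix) + r"_\d+$")
    problems = []
    for tag, product in features:
        if not pattern.match(tag):
            problems.append(f"{tag}: tag is not in {prefix}_# format")
        if not product.strip():
            problems.append(
                f"{tag}: Product is empty; expected a function or 'hypothetical protein'"
            )
    return problems


# Hypothetical example rows (the product names are made up for illustration).
features = [
    ("SEA_LUXX_1", "terminase"),
    ("SEA_LUXX_2", "hypothetical protein"),
    ("LUXX_3", ""),
]
problems = check_features(features, "Luxx")
for line in problems:
    print(line)
```

Any tag reported here would be a candidate for the “Override” and “Reassign Gene Data” steps described above.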
Posted in: PECAAN → Exporting PECAAN annotations to DNA Master
Posted 05 Jul, 2024 21:52
I have noticed that NCBI is revising whole blocks of phages and submitting them as new submissions with different accession numbers than the original SEA_Phage submissions. This is easy to detect when looking at the new NCBI record because a new reference (1) has been added to the original reference (2):

LOCUS       YP_010057231 46 aa linear PHG 10-JAN-2023
DEFINITION  HNH endonuclease [Mycobacterium phage Cane17].
ACCESSION   YP_010057231
VERSION     YP_010057231.1
DBLINK      BioProject: PRJNA485481
DBSOURCE    REFSEQ: accession NC_054716.1
KEYWORDS    RefSeq.
SOURCE      Mycobacterium phage Cane17
  ORGANISM  Mycobacterium phage Cane17
            Viruses; Duplodnaviria; Heunggongvirae; Uroviricota; Caudoviricetes; Ceeclamvirinae; Bixzunavirus; Bixzunavirus cane17.
REFERENCE   1 (residues 1 to 46)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (07-MAY-2021) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
REFERENCE   2 (residues 1 to 46)
  AUTHORS   Fast,K.M., Castleberry,S., Jones,I.K., Larrimore,J.D., Long,C.A., Pritchett,N.C., Keener,T., Sandel,M.W., Bollivar,D.W., Garlena,R.A., Russell,D.A., Pope,W.H., Jacobs-Sera,D. and Hatfull,G.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-JUL-201[…] Park Street, Bloomington, IL 61701, USA

The major problem that I have with this is that we are not able to see the evidence or documentation that led to this huge change. This causes problems for our students, who see an overwhelming block of identical functions in NCBI, usually without noticing the original submissions with different functions. Sometimes the original submissions are visible in the NCBI BLAST results, as shown in the PECAAN output below:

HNH endonuclease [Mycobacterium phage Cane17]
>gb|AXQ51660.1| hypothetical protein SEA_CANE17_46 [Mycobacterium phage Cane17]
>gb|QAY13996.1| hypothetical protein SEA_COLT_48 [Mycobacterium phage Colt]

and other times the original SEA_ evidence is sorted way down the list of results.

We instruct our students to go with the PhagesDB results that are supported by HHpred or CDD evidence. The NCBI results are great for confirming the 1:1 start correlations.

Enjoy!
Claire Rinehart
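To make the disagreement easier to spot, a merged BLAST description like the one quoted above can be split into its individual entries and grouped by phage. The following is a minimal sketch that assumes the layout seen in that PECAAN output (entries separated by ">", an optional `gb|accession|` prefix, organism in square brackets); the function names are hypothetical and this is not how PECAAN itself processes hits.

```python
import re
from collections import defaultdict

# One entry: optional "db|accession|" prefix, description, "[organism]".
ENTRY = re.compile(
    r"^(?:(?P<db>\w+)\|(?P<acc>[^|]+)\|\s*)?(?P<desc>.*?)\s*\[(?P<organism>[^\]]+)\]$"
)
LOCUS_TAG = re.compile(r"\s*SEA_\w+_\d+")


def split_defline(defline):
    """Split a merged BLAST description into a list of entry dicts."""
    entries = []
    for part in defline.split(">"):
        match = ENTRY.match(part.strip())
        if match:
            entry = match.groupdict()
            # Drop the locus tag so only the function name remains.
            entry["function"] = LOCUS_TAG.sub("", entry["desc"]).strip()
            entries.append(entry)
    return entries


def conflicting_functions(entries):
    """Return {organism: sorted functions} where more than one function is listed."""
    by_organism = defaultdict(set)
    for entry in entries:
        by_organism[entry["organism"]].add(entry["function"])
    return {org: sorted(funcs) for org, funcs in by_organism.items() if len(funcs) > 1}


defline = (
    "HNH endonuclease [Mycobacterium phage Cane17] "
    ">gb|AXQ51660.1| hypothetical protein SEA_CANE17_46 [Mycobacterium phage Cane17] "
    ">gb|QAY13996.1| hypothetical protein SEA_COLT_48 [Mycobacterium phage Colt]"
)
conflicts = conflicting_functions(split_defline(defline))
print(conflicts)
```

A phage that shows up in `conflicts` has both a revised function and an original submission in the same hit, which is the situation worth pointing out to students.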
Posted in: Functional Annotation → RefSeq and INSDC name disagreements in NCBI Blast for Functonal Assignment
Posted 20 May, 2024 14:57
You can't enter un-called tRNAs into PECAAN. Sorry. You can get PECAAN to rerun the tRNA and tmRNA search by clicking on the Admin menu button and then selecting the Phage option. Next, enter the name of your phage into the search box and, once you see your phage, click on the Edit button at the right-hand edge of your phage line. You will then be given the option to 'Rerun the tRNA and tmRNA' search.
Posted in: PECAAN → New Features in PECAAN
Posted 27 Apr, 2024 01:25
I verified that the Starterator and Database are both version 561, but pham 161446 at URL: http://phages.wustl.edu/starterator/Pham161446Report.pdf gives me a 404 error, page not found. I checked many of the other Starterator phams and they seem to be working now that you have updated. Thanks for your help.
Posted in: Starterator → Pham not found in Starterator
Posted 02 Mar, 2024 22:24
Debbie,

I have additionally found that the Luxx gene 21 and 22 starts have been changed to bring this genome into "conformity" with the other EE genomes. Looking at the data, I again disagree with these changes. Please educate me on the rationale.

Thanks,
Claire Rinehart

Luxx Gene 21 (reverse gene)

Starterator calls start 15583, which has a very poor Z- and Final Score but captures all of the coding capacity. Start 15577 is just six bases shorter and has a viable Z- and Final Score, while capturing almost all of the coding capacity. The best-scoring start is at 15520, with excellent Z- and Final Scores, but it loses 63 bases of coding capacity that fall in the tail region of the typical plot but in the peak region of the atypical plot. My choice is start 15577.

Starterator Info for manual annotations of cluster EE:
• Start number 3 was manually annotated 1 time for cluster EE.
• Start number 5 was manually annotated 9 times for cluster EE.
• Start number 6 was manually annotated 78 times for cluster EE.
• Start number 7 was manually annotated 2 times for cluster EE.
• Start number 10 was manually annotated 2 times for cluster EE.
• Start number 12 was manually annotated 5 times for cluster EE.
• Start number 14 was manually annotated 1 time for cluster EE.
• Start number 15 was manually annotated 2 times for cluster EE.

Gene: Luxx_21 Start: 15583, Stop: 15083, Start Num: 6
Candidate Starts for Luxx_21: (2, 15784), (Start: 5 @15595 has 9 MA's), (Start: 6 @15583 has 78 MA's), (Start: 7 @15577 has 2 MA's), (9, 15559), (Start: 10 @15553 has 2 MA's), (Start: 12 @15520 has 5 MA's), (16, 15475), (17, 15439), (18, 15424), (19, 15397), (20, 15385), (22, 15343), (23, 15319), (25, 15289), (28, 15259), (29, 15250), (30, 15229), (31, 15226), (33, 15202), (38, 15118)

Ribosomal binding scores:

| Direction | Start | Stop | Length | Gap | Spacer | Z-score | Final Score | Codon |
|---|---|---|---|---|---|---|---|---|
| Reverse | 15784 | 15083 | 702 | -122 | 14 | 2.391 | -4.856 | ATG |
| Reverse | 15595 | 15083 | 513 | 67 | 7 | 0.6 | -8.437 | GTG |
| Reverse | 15583 | 15083 | 501 | 79 | 12 | 0.503 | -7.945 | ATG |
| Reverse | 15577 | 15083 | 495 | 85 | 10 | 1.201 | -6.398 | ATG |
| Reverse | 15559 | 15083 | 477 | 103 | 10 | 0.549 | -7.710 | ATG |
| Reverse | 15553 | 15083 | 471 | 109 | 10 | 1.201 | -6.398 | ATG |
| Reverse | 15520 | 15083 | 438 | 142 | 13 | 2.987 | -3.155 | ATG |

Luxx Gene 22 (reverse gene)

Starterator calls the start at 15893. As you can see below, start 15893 has one of the poorest Z- and Final Scores. A better choice would be 15923 or 15818. Looking at the coding capacity below, start 15818 would give up a large portion of coding capacity. However, start 15923 would capture even the atypical coding capacity and is my start of choice.

Starterator
Gene: Luxx_22 Start: 15893, Stop: 15663, Start Num: 5
Candidate Starts for Luxx_22: (3, 15935), (4, 15923), (Start: 5 @15893 has 100 MA's), (6, 15818), (7, 15809), (8, 15791), (10, 15740), (11, 15728)

Ribosomal binding scores:

| Direction | Start | Stop | Length | Gap | Spacer | Z-score | Final Score |
|---|---|---|---|---|---|---|---|
| Reverse | 15935 | 15663 | 273 | 468 | 15 | 1.495 | -6.714 |
| Reverse | 15923 | 15663 | 261 | 480 | 15 | 2.237 | -5.221 |
| Reverse | 15893 | 15663 | 231 | 510 | 15 | 0.936 | -7.839 |
| Reverse | 15818 | 15663 | 156 | 585 | 9 | 2.137 | -4.595 |
| Reverse | 15809 | 15663 | 147 | 598 | 18 | 2.137 | -6.122 |
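The trade-off argued above for gene 21 (ribosomal binding scores versus coding capacity given up) can be laid out with a little arithmetic. This is a minimal sketch using only the start, Z-score, and Final Score values from the gene 21 table; the `Candidate` type and helper names are hypothetical, and treating the least negative Final Score as "best" follows the post's description of start 15520 as the best-scoring start.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int
    z_score: float
    final_score: float


STOP = 15083          # stop coordinate of Luxx_21 (reverse gene)
CALLED_START = 15583  # start called by Starterator

# Values copied from the gene 21 ribosomal binding score table.
CANDIDATES = [
    Candidate(15784, 2.391, -4.856),
    Candidate(15595, 0.6, -8.437),
    Candidate(15583, 0.503, -7.945),
    Candidate(15577, 1.201, -6.398),
    Candidate(15559, 0.549, -7.710),
    Candidate(15553, 1.201, -6.398),
    Candidate(15520, 2.987, -3.155),
]


def gene_length(candidate):
    """Length in bases of a reverse gene: start is the higher coordinate."""
    return candidate.start - STOP + 1


def bases_lost(candidate, reference_start=CALLED_START):
    """Bases given up relative to the reference start (negative means gained)."""
    return reference_start - candidate.start


best = max(CANDIDATES, key=lambda c: c.final_score)

for c in sorted(CANDIDATES, key=lambda c: c.final_score, reverse=True):
    print(f"start {c.start}: length {gene_length(c)}, "
          f"Z {c.z_score}, final {c.final_score}, lost {bases_lost(c)}")
```

Sorting by Final Score puts 15520 first, while the `lost` column shows the 63 bases it gives up compared with the 6 bases given up by 15577.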
Posted in: Cluster EE Annotation Tips → Genome Curation - a must read!
Posted 02 Mar, 2024 21:55
Debbie,

I was using the GenBank submission of Luxx (cluster EE) to evaluate the annotations of our students' practice genomes that are based on Luxx. I found that gene 18 was called at the start with a -34 gap instead of the -4 gap start that we originally called.

| Direction | Start | Stop | Length | Gap | Spacer | Z-score | Final Score | Codon |
|---|---|---|---|---|---|---|---|---|
| Forward | 14090 | 14578 | 489 | -259 | 6 | 0.875 | -8.104 | GTG |
| Forward | 14315 | 14578 | 264 | -34 | 16 | 1.587 | -6.724 | GTG |
| Forward | 14327 | 14578 | 252 | -22 | 10 | 1.637 | -5.522 | GTG |
| Forward | 14345 | 14578 | 234 | -4 | 16 | 2.066 | -5.760 | GTG |
| Forward | 14426 | 14578 | 153 | 77 | 6 | 1.917 | -6.007 | ATG |
| Forward | | | | | | | | |

I read the Cluster EE forum notes and see that Luxx was modified to bring the group into "conformity". I would like to learn what would justify the gene 18 start being called at 14315, other than the fact that the other 84 genomes in Starterator use that site.

Thanks, I am still learning.
Claire
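The Gap and Length columns in the table above are simple coordinate arithmetic, which a short sketch can reproduce. Note the assumptions: the upstream stop coordinate of 14348 is not stated in the post and is inferred here from the -4 gap at start 14345, and the formula gap = start - upstream stop - 1 is the usual convention for forward genes rather than something the post spells out.

```python
STOP = 14578           # stop coordinate of Luxx gene 18 (forward gene)
UPSTREAM_STOP = 14348  # assumption: inferred from the -4 gap at start 14345

STARTS = [14090, 14315, 14327, 14345, 14426]


def gap(start, upstream_stop=UPSTREAM_STOP):
    """Bases between the upstream stop and this start; negative means overlap."""
    return start - upstream_stop - 1


def gene_length(start, stop=STOP):
    """Length in bases of a forward gene, counting both end coordinates."""
    return stop - start + 1


gaps = {start: gap(start) for start in STARTS}
lengths = {start: gene_length(start) for start in STARTS}

for start in STARTS:
    print(f"start {start}: gap {gaps[start]}, length {lengths[start]}")
```

Under these assumptions the computed values match every complete row of the table, so the -34 and -4 calls differ by 30 bases of additional overlap with the upstream gene.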
Posted in: Cluster EE Annotation Tips → Genome Curation - a must read!
Posted 22 Mar, 2022 14:31
The NCBI BLAST is now operating and the backlog has been significantly reduced. Thanks again. Claire
Posted in: PECAAN → PECAAN not BLASTing?
Posted 22 Mar, 2022 03:46
Thanks, Steve, for pointing this out. You are right. It has been a week for some submissions. We are working on getting the backlog resolved. Claire
Posted in: PECAAN → PECAAN not BLASTing?
Posted 02 Jul, 2021 16:52
PECAAN has been modified to output the tRNA report so that it now passes the QC workflow.
Posted in: PECAAN → PECAAN and tRNA notes problem?
Posted 01 Jun, 2021 21:35
PECAAN Annotation Tutorial Videos
* Finding closest relatives https://youtu.be/5jqoHZwacAM
* Compare genome to closest relatives https://youtu.be/6dh9yiWR2yw
* How the locations of genes are predicted https://youtu.be/51YurlcyJKk
* How to add and delete genes https://youtu.be/aNmH541DGMA
* SEA PHAGES annotation guide https://youtu.be/4MYjl0T5cKY
* Starterator, PhagesDB & NCBI BLAST https://youtu.be/85JwOLoBwFU
* Gaps & Ribosomal Binding Sites https://youtu.be/dj-YcygwP3s
* GM coding capacity, LORF & start sites https://youtu.be/L_AIQ1rAUxg
* tRNA and tmRNA https://youtu.be/n07izcvyUGE
* Assigning a function https://youtu.be/VV97ZP7ZpG0

PECAAN Admin Tutorial Videos
* Why PECAAN? https://youtu.be/KlVepaPHA3g
* How to put a Phage Genome into PECAAN https://youtu.be/Vxf9Bs1QysY
* How to put Users into PECAAN https://youtu.be/3l1pMuMEXgg
* How to update Official Function List https://youtu.be/wiKv9A0cX_c
Posted in: PECAAN → YouTube Videos for Students and Faculty