The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 26 Dec, 2021 19:42
All good questions. We have not finished the analysis on these proteins yet, so I am not sure what the final annotation on these proteins should be. I just noted the implied missing term from the note and so posted the above request. I will get back with more details once I work with the student.

This phage is not yet in phagesdb. In case you want to take a look, see PECAAN phage stanimal: the gene that ends at 22299 is the one discussed above and has the really good hit to the amidase domain, while the other gene, which might also be annotated "endolysin" (based on membership in pham 93752), ends at 16907.
Posted in: Request a new function on the SEA-PHAGES official list > minor fix for approved terms
| posted 23 Dec, 2021 20:41
We have a Streptomyces phage that does not have the typical lysin A/B pair. We have found one protein that hits quite well by HHpred to an N-acetylmuramoyl-L-alanine amidase (crystal 6SSC with 99.5% probability, 100% coverage of the crystal, and ~66% coverage of the phage protein). In the approved list, the notes column for "lysin A, N-acetylmuramoyl-L-alanine amidase domain" says:

if not a Mycobacteriophage, must have a lysin B, otherwise
it is endolysin, N-acetylmuramoyl-L-alanine amidase domain

However, the term "endolysin, N-acetylmuramoyl-L-alanine amidase domain" is not officially an approved term (i.e. it is not listed in column A). I mention this only to request that "endolysin, N-acetylmuramoyl-L-alanine amidase domain" be added to the list so it gets updated on PECAAN.
Edited 23 Dec, 2021 20:43
Posted in: Request a new function on the SEA-PHAGES official list > minor fix for approved terms
| posted 19 Nov, 2021 16:52
pdm_utils uses the Biopython numbering system, which follows Python conventions. This system uses zero-based counting (the first position is 0) with an "open right end" (the right coordinate is the first position after the region). So the base numbers for gene positions will not be the same in pdm_utils and DNA Master, even though they mark the same region. As a biologist, you can think of it this way: DNA Master numbers the bases, while pdm_utils numbers the phosphate backbone, always assuming there is a 5' phosphate.

So the "base" numbers (99206-99279) are actually marking out exactly the same region as the "phosphate" numbers (99205, 99279). Here is a link to a BioStars page with pictures and more details which show how these two numbering systems relate to each other: https://www.biostars.org/p/84686/
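For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversion between the two numbering systems. The function names are made up for illustration; this is not code from pdm_utils or Biopython, just the "subtract one from the left coordinate" rule written out.

```python
def to_zero_based_half_open(start, end):
    """Convert 1-based inclusive coordinates (DNA Master style)
    to 0-based, open-right-end coordinates (Python/Biopython style)."""
    return start - 1, end


def to_one_based_inclusive(start, end):
    """Convert 0-based, open-right-end coordinates back to 1-based inclusive."""
    return start + 1, end


# The tRNA discussed above, in DNA Master numbering.
base_coords = (99206, 99279)
phosphate_coords = to_zero_based_half_open(*base_coords)
print(phosphate_coords)  # (99205, 99279)

# Both systems describe a region of the same length.
length_from_bases = base_coords[1] - base_coords[0] + 1
length_from_phosphates = phosphate_coords[1] - phosphate_coords[0]
print(length_from_bases, length_from_phosphates)  # 74 74
```

Note that only the left coordinate changes; the right coordinate is the same number in both systems.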

However, while the above explains the difference in the coordinates, it does not explain the "move the end 3 bases" message. Here I believe pdm_utils is just going on the literal tRNAscan results and has not been programmed to correct the end, as should be done manually for all tRNAscan results (for those unaware, see this article in the Bioinformatics guide). So, if you did the manual trimming as described, your result is better than the pdm_utils result, and I would just ignore its warning and submit. I would also send an email to Deb, Christian and Lawrence noting that the manually trimmed tRNA at 99206-99279 is indeed correct even though it is failing pdm_utils.
Posted in: tRNAs > Tomas tRNA error
| posted 11 Nov, 2021 22:55
OK, a follow-up on the issue with pham 56633. As I said before, I had already found that the issue was with phage ISF9 gene 29. It turns out that phage is one of the "added phages" that do not come from any of the Pittsburgh programs; it was isolated from Microbacterium oxydans in Iran and published in GenBank. The version of this sequence in Actino_Draft has two N bases, and these N bases confused Starterator and caused it to crash when it was counting bases to find the start and stop codons. So this bug should not be a problem for phages that we publish, since Dan is always careful to check the sequences for N's, but it could be an ongoing issue for these added phages. I am not sure exactly how to deal with them in the long run, but for now please continue to post if you find a missing pham report.
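If you want to screen a sequence for N's yourself, something like the following Python sketch would do it. To be clear, this is not Starterator code and the example sequence is invented; the function name is just for illustration.

```python
def find_ambiguous_bases(sequence, allowed="ACGT"):
    """Return (1-based position, base) for every base not in `allowed`."""
    allowed_set = set(allowed.upper())
    return [
        (position, base)
        for position, base in enumerate(sequence.upper(), start=1)
        if base not in allowed_set
    ]


# Made-up sequence with two N bases, for illustration only.
example = "ATGCCNTTAGNA"
print(find_ambiguous_bases(example))  # [(6, 'N'), (11, 'N')]
```

An empty list means the sequence contains only A, C, G and T.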
Posted in: Starterator > Pham not found in Starterator
| posted 11 Nov, 2021 20:24
Very cool. I think it could be fun to set it up so students could do their own sequencing on a nanopore sometime late in the first semester. These genomes are so small that I think we could get enough data even on the smallest (i.e. cheapest) of the nanopore sequencers. Which did you use? Were you using the standard MinION or one of the smaller Flongles? Or was this outsourced on a GridION? Thanks for being the initial test subject.

The only comment I have is that I thought there was a certain rate at which the pore will pick up and start sequencing the second strand pretty quickly after the first. Does your single long 115k read look like an inverted repeat? This is what I would anticipate if you were reading the second strand after pulling the first strand through the pore.

Have you tried assembly yet? Do you get a single contig of the size expected for an AW phage? If so, I would think you could get an estimate of the quality of the genome assembly by just running the draft assembly through auto-annotation. We have such a strong expectation of "tightly packed" genes, and many assembly or sequencing errors would disrupt genes, so I would think an auto-annotation of that preliminary assembly could provide a decent estimate of the quality and of whether Illumina polishing is needed.
Posted in: Sequencing, Assembling, and Finishing Genomes > Nanopore
| posted 09 Nov, 2021 21:24
OK, preliminary analysis suggests this is some kind of error in the start codon annotation of phage ISF9 gene 29. This is a non-SEA phage from GenBank that was added to the Actino_Draft database. The annotated start for this gene in Actino_Draft is not a valid start codon once it is analyzed by Starterator. So it could be a bug in Starterator or a data entry error in the Actino_Draft database. Determining that will take time, but in the meantime I hand-edited my local copy of the database to remove the problematic gene from pham 56633. I then ran the Starterator analysis with all members of the pham except ISF9_29. The report should now be available, but you will want to download the file for later use: it is likely to disappear again with the next database update, and I am not sure how long it will take to track down the exact issue.

For documentation purposes this link should work for the next 3-4 months:
http://phages.wustl.edu/438/Pham56633Report.pdf
Posted in: Starterator > Pham not found in Starterator
| posted 09 Nov, 2021 17:53
Wow, thanks for this. You are correct that the pham is missing. You have actually located a bug in Starterator, which crashes when it tries to analyze that pham. This is excellent: as we find and fix these bugs, the program works better and better. I did a quick check and found the general location of the problematic pham in the analysis, but I will need time to dig deeper into why it is happening. I will post more info once I get this bug squashed. Thanks for posting.
Posted in: Starterator > Pham not found in Starterator
| posted 03 Nov, 2021 16:25
As usual, we are in agreement. I too think it is a really strong candidate, but I was not going to call it without something more in support.
Posted in: Frameshifts and Introns > TAC frameshift in singleton
| posted 01 Nov, 2021 19:58
OK,
I have a situation in phage onionknight (a singleton) where I am looking for a tail assembly chaperone and for the slip. The attached picture shows the only region that matches at all with any of the published slippery sequences. However, it is by no means an exact match. In the picture I have highlighted the putative slippery sequence in blue and the two translation frames in red. This would be a typical -1 frameshift. The slip would be about 20 amino acids upstream of the stop codon in the short form, so it is not an unreasonable location. It would be nice to get a consensus from other annotators on whether this match to the published slippery sequences is close enough to annotate, or whether we should just leave it as two separate ORFs.
Posted in: Frameshifts and Introns > TAC frameshift in singleton
| posted 16 Sep, 2021 20:51
I will add my experience as another datapoint to help with troubleshooting:
I too am getting Glimmer results but not GeneMark results, and I have confirmed the settings as displayed in Debbie's attached picture. It is very strange that one predictor would work and not the other, for some users and not others.
Posted in: DNA Master > Auto-annotation fix for fall 2017 and later