SEA-PHAGES | All posts created by cdshaffer

Link to this post | posted 11 Feb, 2022 20:08

There are two issues here. One is the code and what Starterator should put on the report for situations like this. The other is how best to interpret the data to come up with the start choice best supported by the evidence.

With respect to the former, Amanda is correct that the code that handles this is quite simple and is just not built to deal with ties or to decide in any formal way how to break them. Coding, testing, and publishing changes all take time, so for many issues like this the question is always "is the problem worth fixing?" or "is it good enough even though not perfect?" There are probably dozens of issues like this, so there are always more problems that need fixing than time to fix them. Thus, these kinds of issues can be quite common, especially in bioinformatic software where there is one or only a small number of maintainers. This is a good teaching moment to remind students that for all bioinformatic software like this, it is always wise to be wary of the results from any one program, especially when running across unusual or rare situations. As you use a program more and more you will learn what the program does well and where it "fails", but before that time (to misquote an old TV cop show) "Be careful out there".
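
For anyone who wants to see what a tie looks like in code, here is a minimal sketch. It is not Starterator's actual code; the function name `most_annotated_starts` and the example counts are made up for illustration. The point is only that a report can list every start tied for the most manual annotations instead of silently picking one.

```python
from collections import Counter


def most_annotated_starts(manual_starts):
    """Return every start number tied for the most manual annotations.

    manual_starts: one start number per manually annotated gene (hypothetical input).
    """
    counts = Counter(manual_starts)
    if not counts:
        return []
    top = max(counts.values())
    # Keep all starts that reach the top count, so ties are visible to the reader.
    return sorted(start for start, n in counts.items() if n == top)


# Hypothetical pham: starts 5 and 7 each have two manual annotations.
calls = [5, 5, 7, 7, 3]
tied = most_annotated_starts(calls)
if len(tied) > 1:
    print("Tie between starts:", tied)
else:
    print("Most annotated start:", tied[0])
```

With more than one start returned, the choice still has to be made from the rest of the evidence.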

In this particular case, I have time to work on Starterator (pretty much only in the Fall) and I use feedback from users on what issues to fix or new features to add when deciding what exactly to do. So it would be totally appropriate to submit this as an "issue" that needs fixing. This is done on the GitHub pages where the official version of the code exists. There on GitHub is a discussion board called "issues" where anyone can post bug reports and feature requests. I encourage anyone and everyone to provide feedback there by creating a new issue and posting your requests/comments or adding your own comments to other issues. Any software is only as good as its ability to serve the needs of its users, which is why user feedback is so important. When I get time to work on Starterator I go to the issue board to see what's up, and any issue with lots of comments is far more likely to be worked on than an issue that is never mentioned.

As for the interpretation of Starterator reports during gene start analysis I will leave that for your discussion with Deb.

Posted in: Starterator → How is the most annotated start determined when 2 starts have the same number of manual annotations in the same pham?

Link to this post | posted 02 Feb, 2022 05:49

cdshaffer

I would try the Mac version first, since both Mac and Linux have the same Unix core structure. You will certainly need to re-install the guest additions, though, as the Mac version will likely have the guest additions for Mac installed and you will need to download and install the guest additions for Linux.

Since you have Mint already installed and running you can do this. I would search for help pages on "how to install virtualbox guest additions on linux guest". Any of the top hits should walk you through the steps. I took a quick look: some suggest using the command line, others say the program that automatically runs when you "insert" the virtual CD with the guest additions software works as well. I always used the command line but the autorun could work; as with most things, you will just have to try and see. Good luck.
Post questions if you get stuck.

Posted in: SEA-PHAGES Virtual Machine → Installing SEA VM on Linux machine

Link to this post | posted 28 Jan, 2022 18:17

cdshaffer

Unfortunately there are no "good" (i.e. cheap and easy) solutions at this time for getting students access to DNA Master on the newer M1 hardware. I also suspect this is going to be more and more of a problem as Apple moves more and more of their computers to the M1 chips.

The best solution at this time is to recommend that the student buy Parallels, which is akin to VirtualBox in that it allows installation and running of Windows virtual machines. Of course, the instructions for doing this are all written for VirtualBox, so many exact details of installing and running the Windows machine will differ slightly with Parallels. The only good news is that there is a 50% discount for students on the basic version of Parallels. Point the student to this page for all the details:
https://www.parallels.com/landingpage/pd/education/

As a final note, I will say that the minimum recommended specifications for Parallels are quite low, and I would expect most users with a bottom-of-the-line computer to have a very unsatisfactory experience. I would tell my students not to bother spending $40 for Parallels unless they have AT LEAST 8 gig of memory; with less than that it just wouldn't be worth the price.

Posted in: SEA-PHAGES Virtual Machine → Error message when installing VirtualBox in MacBook Air

Link to this post | posted 18 Jan, 2022 23:56

cdshaffer

Has anyone tried buying cloud PC access?

You can use these search terms: "windows cloud PC" or "Desktop as a Service" to see what I am talking about. All these are paid services (although some do have a free period to start), and unfortunately most are designed for businesses, not individuals, but there are some out there that do cater to the individual. These systems are designed such that a student could connect remotely from any computer with a browser (like a Chromebook). The total cost would likely be less than the cost of a typical textbook, but it will vary quite a bit depending on how powerful a machine you need. Dan, can you tell us what is the bare minimum one would need for running DNA Master in Windows 10? I assume a student could get by with 1 vCPU, but how much memory? This would help estimate prices.

If anyone else has tried this, I would appreciate it if you could tell us about your experience.

Posted in: DNA Master → DNA Master and Chromebook

Link to this post | posted 26 Dec, 2021 19:42

cdshaffer

All good questions. We have not finished the analysis on these proteins yet, so I am not sure what the final annotation on these proteins should be. I just noted the implied missing term from the note and so posted the above request. I will get back with more details once I work with the student.

This phage is not yet in phagesdb. In case you want to take a look, see PECAAN phage stanimal: the gene that ends at 22299 is the one discussed above and has the really good hit to the amidase domain, while the other gene, which might also be annotated "endolysin" (based on membership in pham 93752), ends at 16907.

Posted in: Request a new function on the SEA-PHAGES official list → minor fix for approved terms

Link to this post | posted 23 Dec, 2021 20:41

cdshaffer

We have a Streptomyces phage which does not have the typical Lysin A/B pair. We have found one protein that hits quite well by HHPRED to an N-acetylmuramoyl-L-alanine amidase (crystal 6SSC with 99.5% probability, 100% coverage of the crystal and ~66% coverage of the phage protein). Looking at the approved list, the notes column for "lysin A, N-acetylmuramoyl-L-alanine amidase domain" says:

if not a Mycobacteriophage, must have a lysin B, otherwise
it is endolysin, N-acetylmuramoyl-L-alanine amidase domain

However, the term "endolysin, N-acetylmuramoyl-L-alanine amidase domain" is not officially an approved term (i.e. not listed in column A). I mention this only to request that "endolysin, N-acetylmuramoyl-L-alanine amidase domain" be added to the list so it gets updated on PECAAN.

Edited 23 Dec, 2021 20:43

Posted in: Request a new function on the SEA-PHAGES official list → minor fix for approved terms

Link to this post | posted 19 Nov, 2021 16:52

cdshaffer

pdm_utils uses the Biopython numbering system, which is based on Python. This system uses zero-based counting (the first position is 0) with an "open right end" (the right coordinate is the first position after the region). So base numbers for gene positions will not be the same in pdm_utils and DNA Master even though they mark the same region. As a biologist you can think of this as DNA Master numbering the bases and pdm_utils numbering the phosphate backbone, always assuming there is a 5' phosphate.

So the "base" numbers (99206-99279) are actually marking out exactly the same region as the "phosphate" numbers (99205, 99279). Here is a link to a BioStars page with pictures and more details which show how these two numbering systems relate to each other: https://www.biostars.org/p/84686/
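
For those who like to see it in code, here is a minimal sketch of the conversion between the two numbering systems. The function names are my own for illustration and are not part of pdm_utils or Biopython; only plain Python slicing is used.

```python
def one_based_to_zero_based(start, stop):
    """Convert DNA Master style (1-based, both ends included)
    to Python style (0-based, right end excluded)."""
    return start - 1, stop


def zero_based_to_one_based(left, right):
    """Convert Python style (0-based, right end excluded)
    back to 1-based coordinates with both ends included."""
    return left + 1, right


# The tRNA from this thread: the same region in both systems.
left, right = one_based_to_zero_based(99206, 99279)
print(left, right)                      # 99205 99279
print(right - left)                     # length in bases: 74
print(99279 - 99206 + 1)                # same length from the 1-based numbers: 74

# Toy sequence: bases 3 through 5 (1-based) are "GTA".
seq = "ACGTACGT"
l, r = one_based_to_zero_based(3, 5)
print(seq[l:r])                         # GTA
```

Note that only the left coordinate changes, which is why 99206 becomes 99205 while 99279 stays the same.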

However, while the above explains the difference in the coordinates, it does not explain the "move the end 3 bases". Here I believe pdm_utils is just going on the literal tRNA Scan results and has not done any programming to correct the end, as should be done manually for all tRNA Scan results (for those unaware, see this article in the Bioinformatics guide). So, if you did do the manual trimming as described, your result is better than pdm_utils and I would just ignore its warning and submit. I would also send an email to Deb, Christian and Lawrence saying that the manually trimmed tRNA at 99206-99279 is indeed correct even though it is failing pdm_utils.

Posted in: tRNAs → Tomas tRNA error

Link to this post | posted 11 Nov, 2021 22:55

cdshaffer

OK, follow up on the issue with pham 56633. As I said before, I had already found that the issue was with phage ISF9 gene 29. It turns out that phage is one of the "Added phage", which do not come from any of the Pittsburgh programs; it was isolated from Microbacterium oxydans in Iran and published in GenBank. It turns out this sequence has two N bases in the version of the sequence in Actino_Draft, and these N bases confused Starterator and caused it to crash when it was counting bases to find the start and stop codons. So this bug should not be a problem for phage that we publish, since Dan is always careful to check the sequences for N's, but it could be an ongoing issue for these phage that get added. I am not sure exactly how to deal with these phage in the long run, but for now please continue to post if you find a missing pham report.
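
As an illustration of the kind of check that would catch this, here is a minimal sketch that scans a sequence for anything other than A, C, G or T. This is not the Starterator code and not the check Dan runs; the function name and the toy sequence are made up for the example.

```python
def find_ambiguous_bases(sequence):
    """Return (1-based position, character) for every base that is not A, C, G or T."""
    return [
        (position, base)
        for position, base in enumerate(sequence.upper(), start=1)
        if base not in "ACGT"
    ]


# Toy sequence with two N bases, like the ISF9 situation described above.
toy = "ATGNCCGTNA"
problems = find_ambiguous_bases(toy)
print(problems)  # [(4, 'N'), (9, 'N')]
```

Running something like this on an added genome before analysis would flag it for a closer look instead of letting a downstream program crash.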

Posted in: Starterator → Pham not found in Starterator

Link to this post | posted 11 Nov, 2021 20:24

cdshaffer

Very cool. I think it could be fun to set it up so students could do their own sequencing on a nanopore sometime late in the first semester. These genomes are so small I think we could get enough data even on the smallest (i.e. cheapest) of the nanopore sequencers. Which did you use? Were you using the standard MinION or one of the smaller Flongles? Or was this outsourced on a GridION? Thanks for being the initial test subject.

The only comment I have is that I thought there was a certain rate in which the pore will pick up and start sequencing the second strand pretty quickly after the 1st. Does your single long 115k read look like an inverted repeat? This is what I would anticipate if you were reading the second strand after pulling the first strand through the pore.

Have you tried assembly yet? Do you get a single contig of the size expected for an AW phage? If so, I would think you could get an estimate of the quality of the genome assembly by just running the draft assembly through auto-annotation. We have such a strong expectation of "tight pack" genes, and since many assembly/sequencing errors would disrupt genes, I would think an auto-annotation of that preliminary assembly could provide a decent estimate of the quality and of whether there is a need for Illumina polishing.
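
One rough way to put a number on "tight pack" is the fraction of the genome covered by called genes. The sketch below is only an illustration under my own assumptions: gene coordinates are 1-based with both ends included, the function name is made up, and the example numbers are invented. What counts as "too low" is a judgment call that would need comparison with finished genomes from the same cluster.

```python
def coding_fraction(genome_length, genes):
    """Fraction of the genome covered by at least one gene.

    genes: list of (start, stop) pairs, 1-based, both ends included.
    Overlapping genes are only counted once.
    """
    covered = 0
    last_end = 0
    for start, stop in sorted(genes):
        # Skip any part already counted by an earlier gene.
        start = max(start, last_end + 1)
        if stop >= start:
            covered += stop - start + 1
            last_end = stop
    return covered / genome_length


# Invented example: three genes, two of them overlapping, on a 500 bp "genome".
example_genes = [(1, 100), (90, 200), (301, 400)]
print(coding_fraction(500, example_genes))  # 0.6
```

A draft assembly with many frameshifts from sequencing errors would be expected to show shorter, broken gene calls and a lower fraction than a polished genome.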

Posted in: Sequencing, Assembling, and Finishing Genomes → Nanopore

Link to this post | posted 09 Nov, 2021 21:24

cdshaffer

OK, preliminary analysis suggests this is some kind of error in the start codon annotation of phage ISF9 gene 29. This is a non-SEA phage from GenBank that was added to the Actino_Draft database. The annotated start for this gene in Actino_Draft is not a valid start codon once it is analyzed by Starterator. So it could be a bug in Starterator or a data entry error in the Actino_Draft database. Determining that will take time, but in the meantime I just hand-edited my local copy of the database to remove the problematic gene from pham 56633. I then ran the Starterator analysis with all members of the pham except ISF9_29. The report should now be available, but you will want to download the file for later use as it is likely to disappear again with the next database update, and I am not sure how long it will take to track down the exact issue.

For documentation purposes this link should work for the next 3-4 months:
http://phages.wustl.edu/438/Pham56633Report.pdf

Posted in: Starterator → Pham not found in Starterator
