
The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.


All posts created by cdshaffer

| posted 09 Feb, 2024 16:32
There is one more important difference in the installation for Arm-based Macs. Once you get MySQL installed and are setting up conda (see the installation instructions), you need to change the conda create command by adding a bit at the beginning. So you want to change the create command from
> conda create --name pdm_utils curl python pip biopython==1.77 networkx paramiko pymysql sqlalchemy tabulate urllib3
to
CONDA_SUBDIR=osx-64 conda create --name pdm_utils … etc.

Then, the first time you activate the conda environment, add this second command:

conda activate pdm_utils
conda env config vars set CONDA_SUBDIR=osx-64

You should only have to run the "conda env config…." line one time to set things up. From then on you can just use
conda activate pdm_utils
and
conda deactivate
as outlined in the instructions.
Posted in: Bioinformatic Tools and Analyses > PDM utils on a Mac M1
| posted 28 Jan, 2024 00:08
Looking at the database, I can see that gene 33 in Poultris is a tRNA gene, so it does not show up on the list of protein-coding genes (i.e. the list on PhagesDB). The prediction has the tRNA gene from 21834 to 21936, which is completely within protein-coding gene 32. The database does not give the provenance of the prediction, so there is no way to tell whether it was called by tRNAscan-SE or Aragorn, nor which version of those programs was used. But given the 100% overlap with gene 32, it is probably a false positive, so I am going to guess tRNAscan-SE (no shade on tRNAscan-SE). tRNAscan-SE gives a score with each of its calls, so its whole design philosophy is to call everything no matter how unlikely and just give the really unlikely ones a very bad score. Just another example of why human manual annotation is still a "Good Thing" ™
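
A quick way to flag this kind of case is to test whether one feature's coordinates fall entirely inside another's. Here is a minimal Python sketch. The tRNA coordinates are the ones quoted above, but the gene 32 coordinates are made-up placeholders (look up the real ones in the database), and `is_contained` is just an illustrative helper name.

```python
def is_contained(inner, outer):
    """Return True if the inner (start, stop) interval lies entirely within outer.

    Coordinates are treated as inclusive; start/stop order does not matter,
    so features on either strand can be compared.
    """
    inner_lo, inner_hi = sorted(inner)
    outer_lo, outer_hi = sorted(outer)
    return outer_lo <= inner_lo and inner_hi <= outer_hi


trna_33 = (21834, 21936)   # tRNA prediction quoted above
gene_32 = (21500, 22300)   # HYPOTHETICAL placeholder, not the real coordinates

print(is_contained(trna_33, gene_32))
```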
Posted in: Phamerator > Missing gene in PhagesDB draft genome
| posted 26 Sep, 2023 17:46
I cannot help you much with the add solexa reads perl script; I have never used it. I always just use the Newbler graphical interface. I create a new project from scratch and set everything up like this:

Start by creating a project folder, usually on the desktop. I copy the fastq file with just the reads I want to try to assemble into that folder. Open the Newbler graphical interface, select new project, navigate to the project folder I just created, give the project a name and click OK.

I then go to the project tab, select the "fastq reads" sub-tab, then hit the plus sign on the left side and select the fastq file that I prepared at the proper size. I then go to the parameters tab and, in the input sub-tab, make sure large/complex genome is unchecked and Heterozygotic mode is also unchecked. In the computation sub-tab, number of CPUs is set to 0 (so that all CPUs are used). In the output sub-tab, include consensus and quick output are checked; reads limited to one contig and output trimmed reads are unchecked. For the other settings I use:
Pairwise alignment: None
Ace format: consed16
Ace read mode: Default
Alignment info: output small
All contig threshold: 100
Large contig threshold: 500
Scaffold length threshold: 2000

I run the assembly (click Start), then I use consed to open the ace file, which I will find in a folder called edit_dir buried a few levels down within the project folder. Typically edit_dir is in a folder called consed, inside a folder called assembly, inside a folder with the name of the project, inside the project folder.

As for your other question about using the perl script, my guess is that you need the full location of the ace file. This means that on the command line you need to name every folder, in the exact order, to tell the perl script exactly where to find the ace file. In the above example, where I create a project folder on the desktop, this part of the command would be quite long, something like:
-ace /home/seafaculty/Desktop/projectfolder/projectname/assembly/consed/edit_dir/454Contigs.ace.1

where several of those entries between the / need to be the exact names of the folders on your system. Also make sure no folders have spaces in their names, or it gets really tricky. You can get the exact thing to type if you find the ace file in the graphical interface, right-click on it, select Properties, and copy and paste from the Location entry.
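
As a rough sketch of how that long path is put together, the Python below joins the folder names in order and flags any that contain spaces. The folder names are the ones from the example above and need to be replaced with your own. `build_ace_path` is a hypothetical helper, not part of Newbler or the perl script, and the layout it assumes is only the one described in this post.

```python
from pathlib import PurePosixPath


def build_ace_path(home, project_folder, project_name,
                   ace_name="454Contigs.ace.1"):
    """Assemble the full path to the ace file from the example layout."""
    path = (PurePosixPath(home) / "Desktop" / project_folder / project_name
            / "assembly" / "consed" / "edit_dir" / ace_name)
    # Folder names with spaces make the command line much harder to get right.
    spaced = [part for part in path.parts if " " in part]
    return str(path), spaced


ace_path, folders_with_spaces = build_ace_path(
    "/home/seafaculty", "projectfolder", "projectname")
print("-ace", ace_path)
if folders_with_spaces:
    print("Rename these to remove spaces:", folders_with_spaces)
```

If the second return value is not empty, renaming those folders before running the script is probably the simplest fix.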

If you want, you can copy and paste your exact command and the exact response from the computer. Also run the
pwd
command and copy/paste the output.
Posted in: Newbler > Getting Started with Phage Assembly
| posted 19 Sep, 2023 15:15
5 GB should work; I had less than that with my last laptop and had many successful assemblies. So set it to 5, boot up the machine and try the assembly again. If it does not finish in 5 or 10 minutes, I would try cutting the number of reads in half and try again. Hopefully you can find a working solution with enough data for a good assembly but not so much data that everything slows down drastically with memory overflow issues.
Posted in: Newbler > Getting Started with Phage Assembly
| posted 18 Sep, 2023 20:10
You can only change those settings while the machine is OFF. So go to the SEA VM and shut it down. Once the VM is off, you should see a green area showing the permissible settings. Once you have made and saved the changes, you can boot up the SEA VM again. I would also recommend you shut down most other programs running on the host while you work on assembly (email clients, web browsers, Word, etc.). This will give your machine as much free memory as possible to work with.
Edited 18 Sep, 2023 20:11
Posted in: Newbler > Getting Started with Phage Assembly
| posted 18 Sep, 2023 16:19
Yes, RAM. How are you running Newbler? I run it using the old SEA VM in VirtualBox with an older Intel Mac host. For that setup I go to Machine -> Settings -> System -> Motherboard.
On that page is a slider called "Base memory". My iMac has 16 GB, so VirtualBox allows me up to about 12 GB. I have it set to 10 GB, which is plenty. I would recommend setting it to 8 GB, or as high as allowed given your computer setup. Then try the assembly again. If it still takes too long, reduce the number of reads and try again. If you have a different setup, post the description.
Posted in: Newbler > Getting Started with Phage Assembly
| posted 11 Sep, 2023 22:48
Unfortunately Newbler is no longer being developed.

For me, slow assembly times mostly happen when I don't have enough memory. This can easily slow things down by many orders of magnitude. So start by increasing the memory available to the VM if you can. I give my machines 4 or 6 GB if I can. Ask Google, or post another query here, if you need help with that.

If you have given the VM the maximum memory you can and it is still really slow, then try fewer reads. Most of my 100x coverage genomes assemble just fine, so you could easily reduce your read count by half and still very likely get a good assembly. If that fails, try 50x (i.e. reduce the read count by a factor of 4). It is really just trial and error to find enough data for a good assembly but not so much that you overload the memory available on your machine. This is why, when I last assembled a whole genome of a Drosophila species (180 Mb haploid genome), I used a campus computer with 128 GB of available memory; that made the assembly take 4-6 hours instead of the 4-6 months it would have taken on my laptop.
Posted in: Newbler > Getting Started with Phage Assembly
| posted 07 Sep, 2023 17:12
So your example large contig has ~35x coverage (6441 reads times 150 bp/read divided by ~28,000 bp). 35x is a bit low for Illumina. The recommended minimum for Illumina is 50x, but for these tiny genomes, since sequencing is so cheap, I typically go for 200-300x.
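
The coverage arithmetic above can be written as a one-line function. This is only a sketch of the estimate (reads times read length divided by contig or genome length); `estimate_coverage` is an illustrative name, and it assumes every read is full length.

```python
def estimate_coverage(num_reads, read_length_bp, genome_length_bp):
    """Average coverage = total sequenced bases / genome (or contig) length."""
    return num_reads * read_length_bp / genome_length_bp


# Numbers from the contig discussed above.
cov = estimate_coverage(num_reads=6441, read_length_bp=150,
                        genome_length_bp=28_000)
print(f"{cov:.1f}x")  # 34.5x
```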

For a 70,000 bp genome and 150 bp reads I would probably use 100,000 to 150,000 reads. So adjust your "head" command to extract more reads and try another assembly. I would just work with the R1 reads; they tend to be better quality than the R2 reads. R2 reads are really good for mapping reads to large complex genomes, but for de novo assembly I stick with the R1 reads. Since each sequence takes up 4 lines, you want somewhere between 400,000 and 600,000 lines of your fastq file to get the 100 to 150 k reads. So instead of the 20000 in your example command, use 500000. That would give an estimated coverage of about 268x for a 70 kb genome.
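
To make the line-count bookkeeping concrete, here is a small Python sketch that works out how many reads a target coverage needs and copies the first N reads (4 lines each) to a new file, which is the same idea as the "head" command. The function names are made up for illustration, and the file names in the comment are placeholders for your own files.

```python
import itertools
import math


def reads_for_coverage(target_coverage, genome_length_bp, read_length_bp):
    """Number of reads needed to reach the target average coverage."""
    return math.ceil(target_coverage * genome_length_bp / read_length_bp)


def take_first_reads(fastq_in, fastq_out, num_reads):
    """Copy the first num_reads records (4 lines each) to a new FASTQ file.

    Same idea as `head` with a line count of 4 * num_reads.
    Returns the number of lines actually written.
    """
    n_lines = 4 * num_reads
    written = 0
    with open(fastq_in) as src, open(fastq_out, "w") as dst:
        for line in itertools.islice(src, n_lines):
            dst.write(line)
            written += 1
    return written


print(reads_for_coverage(200, 70_000, 150))   # reads needed for 200x
print(reads_for_coverage(300, 70_000, 150))   # reads needed for 300x
print(500_000 // 4 * 150 / 70_000)            # coverage from 500,000 lines

# Example call with placeholder file names:
# take_first_reads("reads_R1.fastq", "subset_R1.fastq", 125_000)
```

Taking reads from the top of the file is the simplest option and matches the "head" approach; whether a random subsample would do any better is not something this thread tests.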
Posted in: Newbler > Getting Started with Phage Assembly
| posted 05 Sep, 2023 18:50
How to handle R1 and R2 depends on your exact sequencer, the quality of the reads, the library prep method, and the read length. There are enough variables here that you will just need to do trial and error and see what works for you. Lately, I have been using 150 bp reads, and the R1 reads have been of such high quality that I get good assemblies by just using the correct number of R1 reads. I would suggest you try this simple solution first and see if you get a good assembly. If you do, great; if not, then more work prepping the reads prior to assembly is worth trying (see the next paragraph). The other problem is using the correct number of reads; see farther below.

On older machines, with reads with higher error rates, I would run the program "pear" to merge the R1 and R2 reads into a longer, higher-quality "read" that would improve assembly, but this requires a library prep protocol with shorter 300-500 bp DNA fragments and longer reads. I used to do this when running 250-300 bp reads. I used pear because it was easy to install on my old Intel Mac. Not sure what I would use now that I am on a newer Mac, or if I had a PC.

As for the issue of a small number of large contigs and hundreds of smaller ones, this is exactly the result you will get if you use too many reads. See my comments above on error and why too many reads can be a "Bad Thing". I would recommend you try the "head …" command to extract a smaller number of reads and try assembling those. That is not really a step you can skip if you want a nice clean assembly. If you did reduce the number of reads and you are still getting this result, you may have either contamination or too few reads. The next paragraph covers how to get evidence on this question.

Have you looked at your contigs? Do they look like phage genomes by BLAST, or like contaminants? For the large contigs, how many reads are in each contig and how long is it? More specific details here would help. Note that the Newbler assembler creates a file called "454LargeContigs.fna" that has the sequences of all the "large" contigs. You can open this file with a text editor and copy out sections of sequence to use in BLAST searches to see if the contig is likely phage sequence or some contaminant. If you get phage hits, you can likely estimate the size of your genome, which will help you pick the correct number of reads.
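
If opening the file in a text editor gets tedious, a short script can list each contig's length and pull out a chunk of sequence to paste into BLAST. The sketch below assumes only that "454LargeContigs.fna" is an ordinary FASTA file; it takes the first word of each header as the contig name and measures length from the sequence itself. The function names and the 1000 bp chunk size are arbitrary choices for illustration.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:].strip(), []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize_contigs(path, blast_chunk_bp=1000):
    """Return (contig id, length, first chunk for BLAST), longest contig first."""
    rows = []
    for header, seq in read_fasta(path):
        contig_id = header.split()[0] if header else ""
        rows.append((contig_id, len(seq), seq[:blast_chunk_bp]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example call (adjust the path to wherever your assembly folder is):
# for contig_id, length, chunk in summarize_contigs("454LargeContigs.fna"):
#     print(contig_id, length)
```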

See if all that advice solves your issue. If not, post specific answers to as many of the above questions as you can and we can go from there.
Posted in: Newbler > Getting Started with Phage Assembly
| posted 21 Jul, 2023 19:01
The password discussed above is the password for the mysql database; it is not the password for running sudo commands. For sudo you use the password you used to log in. But there is a more important issue.

Are you logged in as SEA student or SEA faculty? The SEA student account does not have permission to run sudo commands, so the seastudent password will fail. If this is the problem, you should get an error message about not being in "sudoers". You have to do all this while logged into the SEA faculty account, as that account has permission to run the sudo command. If you are on the seafaculty account and it is failing, there is some unusual issue with your setup. If that is the case, the next step is probably to post the exact steps you are on and the exact error message.

Note: The password discussed above is the password for the mysql database; it is not the password for running sudo. You can use mysql with the password phage from the seastudent account; you just cannot run the sudo command with the seastudent login password.
Edited 21 Jul, 2023 19:04
Posted in: Phamerator > Install Guest Additions to VM ---- Without "SEAFaculty Login ability"