SEA-PHAGES | Getting Started with Phage Assembly

Link to this post \| posted 18 Sep, 2023 20:45
jcaoyao@gmail.com	Thank you for your clear and most helpful indications. Now I can move my slider. The green area of the ruler below my slider only goes up to about 5 GB. I guess I'll have to live with that.

Link to this post \| posted 19 Sep, 2023 15:15
cdshaffer	5 GB should work. I had less than that on my last laptop and had many successful assemblies. So set it to 5, boot up the machine and try the assembly again. If it does not finish in 5 or 10 minutes, I would cut the number of reads in half and try the assembly again. Hopefully you can find a working solution with enough data for a good assembly, but not so much data that everything slows down drastically because of memory overflow issues.
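Cutting the number of reads in half can be sketched in Python as below. This is only an illustration, not a SEA-PHAGES tool: it assumes an uncompressed FASTQ file with exactly four lines per read, it simply keeps the first reads in the file rather than sampling randomly, and the function and file names are hypothetical.

```python
from itertools import islice


def count_reads(path):
    """Count FASTQ records, assuming four lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads FASTQ records from src_path to dst_path.

    Assumes a plain-text FASTQ with exactly four lines per record.
    Returns the number of complete records actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        while written < n_reads:
            record = list(islice(src, 4))
            if len(record) < 4:  # end of file, or a truncated final record
                break
            dst.writelines(record)
            written += 1
    return written
```

For example, `take_first_reads("reads.fastq", "half.fastq", count_reads("reads.fastq") // 2)` would write the first half of a hypothetical `reads.fastq` to `half.fastq`, which can then be loaded into the assembler in place of the full file.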

Link to this post \| posted 19 Sep, 2023 16:32
jcaoyao@gmail.com	Got it, thanks for your wishes. Now with 5 GB, it has been running for over 6 hours and it is getting stuck at the last few hundred reads. I had downsampled to 200,000 lines. I think I'm going to abort and try 100,000 to get 50x coverage, since that is the bare minimum, which shouldn't take me more than an hour. Edited 19 Sep, 2023 19:58
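The relationship between FASTQ lines, reads and coverage mentioned above can be checked with a little arithmetic. The sketch below relies on the usual estimate, coverage = reads × mean read length / genome length, and on the fact that a FASTQ record takes four lines, so 200,000 lines hold 50,000 reads. The read length (150 bases) and genome length (75,000 bp) in the usage note are hypothetical placeholders, not values from this thread.

```python
import math

FASTQ_LINES_PER_READ = 4  # header, sequence, '+' separator, qualities


def lines_to_reads(n_lines):
    """Convert a FASTQ line count into a number of complete reads."""
    return n_lines // FASTQ_LINES_PER_READ


def estimated_coverage(n_reads, mean_read_length, genome_length):
    """Approximate fold coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_length / genome_length


def reads_for_coverage(target_coverage, mean_read_length, genome_length):
    """Smallest number of reads expected to reach target_coverage."""
    return math.ceil(target_coverage * genome_length / mean_read_length)
```

With the hypothetical values above, `reads_for_coverage(50, 150, 75_000)` gives the number of reads to keep, and multiplying that by four gives the number of FASTQ lines. Substitute the real mean read length and expected genome size before relying on the result.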

Link to this post \| posted 21 Sep, 2023 09:21
jcaoyao@gmail.com	Hi, cdshaffer. I have tried to add Illumina reads to an assembly using "addSolexaReads.perl", following Dan's instructions. However, I keep getting the following error: "-ace must be specified". Everything looks correct to me in the command and the filenames. Any idea?

Link to this post \| posted 26 Sep, 2023 17:46
cdshaffer	I cannot help you much with the addSolexaReads.perl script; I have never used it. I always just use the Newbler graphical interface. I create a new project from scratch and set everything up like this:

I start by creating a project folder, usually on the desktop, and copy the fastq file with just the reads I want to try to assemble into that folder. I open the Newbler graphical interface, select new project, navigate to the project folder I just created, give the project a name and click OK.

I then go to the project tab, select the "fastq reads" sub-tab and hit the plus sign on the left side. I then select the fastq file that I prepared at the proper size. I then go to the parameters tab and, in the input sub-tab, make sure large/complex genome is unchecked and Heterozygotic mode is also unchecked. In the computation sub-tab, number of CPUs is set to 0 (so that all CPUs are used). In the output sub-tab, include consensus and quick output are checked; reads limited to one contig and output trimmed reads are unchecked. For the other settings I use:
Pairwise alignment: None
Ace format: consed16
Ace read mode: Default
Alignment info: Output small
All contig threshold: 100
Large contig threshold: 500
Scaffold length threshold: 2000

I run the assembly (click start), then I use consed to open the ace file, which I will find in a folder called edit_dir buried a few folder levels down within the project folder. Typically edit_dir is in a folder called consed, in a folder called assembly, in a folder with the name of the project, in the project folder.

As for your other question about the perl script, my guess is that you need the full description of the location of the ace file. This means that on the command line you need to name every folder, in the exact order, to tell the perl script exactly where to find the ace file. In the above example, where I create a project folder on the desktop, this part of the command would be quite long, something like:
-ace /home/seafaculty/Desktop/projectfolder/projectname/assembly/consed/edit_dir/454Contigs.ace.1

where several of those entries between the / need to be the exact names of the folders on your system. Also make sure no folders have spaces in their names, or it gets really tricky. You can get the exact thing to type if you find the ace file in the graphical interface, right click on it, select properties, and copy and paste from the Location entry.

If you want, can you copy and paste your exact command and the exact response from the computer? Also run the `pwd` command and copy/paste the output.
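The folder layout described above can also be assembled and checked programmatically before typing the command. The following is a minimal sketch: the helper names are hypothetical, the layout (project name, then assembly, consed, edit_dir) and the file name 454Contigs.ace.1 are taken from the example path in this post, and nothing here calls or depends on the perl script itself.

```python
from pathlib import PurePosixPath


def ace_path(project_folder, project_name, ace_name="454Contigs.ace.1"):
    """Build the full ace-file path from the folder layout described above."""
    return (
        PurePosixPath(project_folder)
        / project_name
        / "assembly"
        / "consed"
        / "edit_dir"
        / ace_name
    )


def folders_with_spaces(path):
    """Return every path component that contains a space."""
    return [part for part in PurePosixPath(path).parts if " " in part]
```

For instance, `str(ace_path("/home/seafaculty/Desktop/projectfolder", "projectname"))` reproduces the example path, and a non-empty result from `folders_with_spaces(...)` points to the folders worth renaming before running the command.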

Link to this post \| posted 27 Sep, 2023 12:04
jcaoyao@gmail.com	Thank you for this clear walk-through, which I understand is for assembling a genome from scratch. But supposing I have assembled already using 50k reads, and it has given me 100+ contigs of a variety of sizes, and I want to add the remainder of the reads that were not used initially, so as to improve the assembly and get the fewest contigs possible, you wouldn't know how I could do that, would you? Dan once said to use the command he posted at https://seaphages.org/forums/topic/42/ but all my attempts to execute it have failed. I do hope somebody eventually sees my query there. But since it is unlikely, would it be possible at all for me to send you my fastq files and run them on your computer, and possibly join the 100+ pieces of contigs? According to BLASTn, the contigs seem to belong to several different phages even within a single fastq file. I would be more than grateful.
