Downloading cluster specific fasta files

| posted 09 Mar, 2020 17:58
What is the best way to download fasta files for a selected group of phages to build phylogenetic trees? I downloaded the complete fasta file from phagesdb and then searched for the sequences by phage name, but that takes forever for large clusters like A1.
Thank you
| posted 10 Mar, 2020 16:41
There is a lot of help on the "data" page at phagesdb: Data page

If you or someone you know has basic scripting skills you can use the API at phagesdb. It gives you the most flexibility and reproducibility if you want to repeat the experiment later.

Failing that I would use the bulk system at NCBI. Here is a shorthand protocol I would use:

1. Get "Full tab delimited all phages spreadsheet" from the data page linked above.
2. Use your (or a friend's) Excel skills to filter on cluster/subcluster so the list contains just the phages you want, then select the accession numbers column and copy it to the clipboard.
3. Go to GenBank and search the "nucleotide" database with the complete list of accession numbers (just paste the whole list into the little search box).
4. On the results page, use the tiny menus near the top to change "Summary" to FASTA, then use the "Send to" menu and select "File". You should get a concatenated fasta file wherever your browser saves downloads. The file will be named "sequence.fasta", which you should rename to something more appropriate like "A1_phages.fasta".
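
If you would rather script steps 2 and 3 than do them by hand, here is a rough Python sketch. It is only an illustration under some assumptions: the column names "Subcluster" and "Accession" are guesses and need to be changed to match the header row of the spreadsheet you actually downloaded, and the phage names and accession numbers in the sample are made-up placeholders. The helper names are my own. Instead of the web search box it builds a URL for NCBI's E-utilities efetch service, which returns FASTA text for a list of accessions.

```python
import csv
import io
from urllib.parse import urlencode

# NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accessions_for_subcluster(tsv_text, subcluster,
                              subcluster_col="Subcluster",
                              accession_col="Accession"):
    """Return accession numbers for every row in the given subcluster.

    The two column names are assumptions; adjust them to the real header.
    Rows with no accession number are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    accessions = []
    for row in reader:
        row_subcluster = (row.get(subcluster_col) or "").strip()
        accession = (row.get(accession_col) or "").strip()
        if row_subcluster == subcluster and accession:
            accessions.append(accession)
    return accessions


def efetch_fasta_url(accessions):
    """Build an efetch URL asking the nucleotide database for FASTA text."""
    params = {
        "db": "nuccore",               # the "nucleotide" database
        "id": ",".join(accessions),    # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder data standing in for the downloaded spreadsheet.
SAMPLE = (
    "Phage Name\tCluster\tSubcluster\tAccession\n"
    "ExamplePhage1\tA\tA1\tXX000001\n"
    "ExamplePhage2\tA\tA2\tXX000002\n"
    "ExamplePhage3\tA\tA1\t\n"
    "ExamplePhage4\tA\tA1\tXX000004\n"
)

a1_accessions = accessions_for_subcluster(SAMPLE, "A1")
a1_url = efetch_fasta_url(a1_accessions)
print(a1_accessions)
print(a1_url)
```

With the real spreadsheet you would read the file contents into a string and pass that in place of SAMPLE, then open the resulting URL (in a browser or with urllib.request.urlopen) and save the response as something like "A1_phages.fasta". For a big cluster the URL can get long, so it may be worth fetching the accessions in smaller batches.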
