The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by chg60

| posted 02 Mar, 2023 14:50
@Chris

The command-line version of biolib will submit the jobs to the DTU servers using the same API as their site uses, so in theory there's no advantage to using the biolib command line unless you have an NVIDIA RTX 2080 GPU (or better) and a correctly set up environment.

However, you can run DeepTMHMM locally even without a graphics card if you write a little Python script that imports from biolib. It takes a couple of minutes per gene, so it's not a scalable solution, nor is it a viable long-term option, but for one genome at a time it should circumvent whatever (hopefully temporary) issues are being experienced on DTU's backend.

I have such a script written if you're interested in trying it out, but I don't have it on this computer so won't be able to share it until later today.

-Christian
Posted in: Functional Annotation > Deep TMHMM?
| posted 30 Oct, 2022 13:16
Hi Amanda,

This is another good use case for a quick MySQL export command. This one may be a bit more complicated than the one I gave you for exporting phams, but will allow you to count the annotated tRNAs.

mysql -u root -p Actino_Draft -e "SELECT PhageID, Start, Stop, Orientation, AminoAcid, Anticodon FROM trna WHERE PhageID IN (SELECT PhageID FROM phage WHERE HostGenus = 'Rhodococcus' ) ORDER BY PhageID, Start ASC;" -B -N > ~/Rhodo_trnas.tsv

If you want to filter on a different host, you can change from 'Rhodococcus' to 'Othergenus'.

If you want to filter on one or more clusters, replace

WHERE HostGenus = 'Rhodococcus'

with

WHERE Cluster IN ('Choose', 'your', 'clusters')

Note that the fact that one or more tRNAs have not been annotated in a particular phage does not mean that they aren't there, only that the most recent annotation did not include them. More comprehensive studies of tRNAs should always begin by re-annotating tRNAs in the strains of interest.
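
Once the export has finished, the per-phage tRNA counts can be tallied with a few lines of Python. This is only a minimal sketch: it assumes the column order from the query above (PhageID first) and the tab-delimited, headerless output that -B -N produces. The function name count_trnas and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_trnas(handle):
    """Count tRNA rows per PhageID in a headerless, tab-delimited export."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if row:  # skip blank lines
            counts[row[0]] += 1  # column 0 is PhageID
    return counts


# Made-up rows in the same column layout as the query output:
# PhageID, Start, Stop, Orientation, AminoAcid, Anticodon
example = io.StringIO(
    "PhageA\t100\t172\tF\tMet\tcat\n"
    "PhageA\t300\t375\tF\tGly\ttcc\n"
    "PhageB\t50\t122\tR\tTrp\tcca\n"
)
for phage_id, n in sorted(count_trnas(example).items()):
    print(phage_id, n)
```

To run it on the real export, open ~/Rhodo_trnas.tsv (for example with open(os.path.expanduser("~/Rhodo_trnas.tsv"), newline="")) and pass the file handle to count_trnas. Phages with no rows in the trna table will not appear in the output at all, which fits the caveat above about unannotated tRNAs.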

Best,

Christian
Edited 30 Oct, 2022 13:18
Posted in: tRNAs > Data on tRNAs for many phages?
| posted 03 Jun, 2022 20:28
If tRNAscan-SE predicts it as an Ile2, that is what should be annotated. The flat file that was uploaded to PhagesDB has the following for the gene in question:

gene 97158..97230
/gene="180"
/locus_tag="SEA_JEDEDIAH_180"
tRNA 97158..97230
/gene="180"
/locus_tag="SEA_JEDEDIAH_180"
/product="tRNA-Ile"
/note="tRNA-Ile2(cat)"


The inconsistency between the product/note fields is what causes the error during flat file QC. If you adjust the annotation such that these fields are in agreement, the error should go away.
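
For anyone who wants to spot this kind of mismatch before uploading, here is a rough Python sketch. It is not the actual flat file QC code. It simply assumes that "in agreement" means the note is the product text followed by the anticodon in parentheses, as in 'tRNA-Ile2(cat)'. The function name trna_fields_agree is hypothetical.

```python
import re


def trna_fields_agree(product, note):
    """Return True if the note is the product text plus '(anticodon)'.

    Assumption (not taken from the QC code): a note such as
    'tRNA-Ile2(cat)' agrees with the product 'tRNA-Ile2' and
    disagrees with the product 'tRNA-Ile'.
    """
    match = re.fullmatch(r"(tRNA-\w+)\((\w+)\)", note.strip())
    if match is None:
        return False  # note is not in the assumed format
    return match.group(1) == product.strip()


# The values from the flat file excerpt above, then an adjusted product
print(trna_fields_agree("tRNA-Ile", "tRNA-Ile2(cat)"))
print(trna_fields_agree("tRNA-Ile2", "tRNA-Ile2(cat)"))
```

The first call reports a mismatch and the second reports agreement, under the assumed rule.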

Best,

Christian
Posted in: tRNAs > Met vs Ile2
| posted 10 May, 2022 18:31
Hi Amanda,

It seems that the way text is rendered in this forum changes double hyphens to a single wide one.

Try 'pip install --upgrade phamclust', typing two hyphens before 'upgrade' rather than the single wide dash the forum displays, and see whether that fixes things.

Assuming it does, setting the clustering threshold to 0 should now yield the expected all-vs-all matrix.

-Christian
Posted in: Bioinformatic Tools and Analyses > Difficulty pulling large-scale data for batch GCS analysis using Pope/Mavrich 2017 scripts
| posted 11 Mar, 2022 22:24
Ok, I found the bug. The clustering algorithm I wrote doesn't require that you provide an explicit clustering threshold - if you don't provide one, it tries to infer one by studying the matrix. However, in the if-block that checks whether the function was given a threshold, I didn't anticipate the need to differentiate between 0 and None (`if not threshold:`).
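
The pitfall is easy to reproduce in isolation. The sketch below is not phamclust's code; the function names and the inferred value of 35.0 are stand-ins chosen only to show how `if not threshold:` treats an explicit 0 the same as a missing value, and how an `is None` check avoids that.

```python
def resolve_threshold_buggy(threshold=None, inferred=35.0):
    # Bug: 0 is falsy, so an explicit threshold of 0 is treated as "not given"
    if not threshold:
        return inferred
    return threshold


def resolve_threshold_fixed(threshold=None, inferred=35.0):
    # Fall back to the inferred value only when no threshold was passed at all
    if threshold is None:
        return inferred
    return threshold


print(resolve_threshold_buggy(0))  # 35.0 (the explicit 0 is silently ignored)
print(resolve_threshold_fixed(0))  # 0
```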

If you run:

pip install --upgrade phamclust

that should bring phamclust up to version 0.1.2, which fixes the issue.

Thanks for reporting that unexpected behavior, Nick!

-Christian
Posted in: Bioinformatic Tools and Analyses > Difficulty pulling large-scale data for batch GCS analysis using Pope/Mavrich 2017 scripts
| posted 11 Mar, 2022 21:43
Hi Nick,

Your output using '-t 0' probably indicates that there's a bug that I need to work out. Setting a clustering threshold of 0% similarity should result in one massive cluster, regardless of the clustering algorithm… I'll look into it and get back to you.
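
To see why a 0% threshold should collapse everything into one cluster, here is a toy single-linkage example in Python. Single linkage is used purely as a stand-in (the clustering algorithm inside phamclust may well differ), and the names and similarity scores are invented. Because a similarity is never below 0, every pair passes a threshold of 0.

```python
def single_linkage_clusters(names, similarity, threshold):
    """Group names linked by pairwise similarity >= threshold."""
    clusters = []
    unvisited = set(names)
    for name in names:
        if name not in unvisited:
            continue  # already placed in an earlier cluster
        unvisited.remove(name)
        cluster, stack = [name], [name]
        while stack:
            current = stack.pop()
            # Collect linked members first, then remove them from the pool
            linked = [n for n in unvisited if similarity(current, n) >= threshold]
            for other in linked:
                unvisited.remove(other)
                cluster.append(other)
                stack.append(other)
        clusters.append(sorted(cluster))
    return clusters


# Invented pairwise similarities (percent)
scores = {
    frozenset(("A", "B")): 90.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("B", "C")): 0.0,
}


def sim(x, y):
    return scores[frozenset((x, y))]


print(single_linkage_clusters(["A", "B", "C"], sim, 35))  # [['A', 'B'], ['C']]
print(single_linkage_clusters(["A", "B", "C"], sim, 0))   # [['A', 'B', 'C']]
```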

Your '-t 100' experiment is interesting! I knew there were some groups of VERY homogeneous phages, but didn't think there were 7 groups of 100 percenters just within that host group! I'd be curious to see the results if you don't use GCS, but rather the default 'pocp' method of calculating similarity. The 'pocp' math is very similar to GCS, but deals with the (rare) instances of paralogs more sensibly than GCS.

Best,

Christian
Edited 11 Mar, 2022 21:44
Posted in: Bioinformatic Tools and Analyses > Difficulty pulling large-scale data for batch GCS analysis using Pope/Mavrich 2017 scripts
| posted 09 Mar, 2022 17:49
Excellent - glad that worked for you!

You can do the same analysis with any subset of genomes from Actino_Draft. The only thing that needs to change is command #1 from above; then run command #3, since you already have phamclust installed.

To analyze the things currently labelled as AZ against everything currently in draft status, you'd run something like:

mysql -u root -p Actino_Draft -e "SELECT PhageID, PhamID FROM gene WHERE PhageID IN (SELECT PhageID FROM phage WHERE Cluster = 'AZ' OR Status = 'draft') ORDER BY PhageID, Stop ASC;" -B -N > ~/cluster_AZ_or_draft_phams.tsv

To do the AZ-EH comparison, you'd do something like:

mysql -u root -p Actino_Draft -e "SELECT PhageID, PhamID FROM gene WHERE PhageID IN (SELECT PhageID FROM phage WHERE Cluster IN ('AZ', 'EH')) ORDER BY PhageID, Stop ASC;" -B -N > ~/cluster_AZ_or_EH_phams.tsv

You're basically limited only by how creatively you can construct your initial MySQL query.

Phamclust's default behavior builds clusters with the 35% GCS threshold and then outputs a single pairwise matrix for each resultant cluster. You can force the creation of an all-versus-all matrix (for example if you are comparing things that are more-or-less unrelated) by adding '-t 0' to command #3.

-Christian
Posted in: Bioinformatic Tools and Analyses > Difficulty pulling large-scale data for batch GCS analysis using Pope/Mavrich 2017 scripts
| posted 09 Mar, 2022 16:01
Hi Amanda,

The database schema has changed substantially since that publication, so it's likely the script(s) will need to be altered to accomplish your goal.

Since you already have MySQL (and presumably have a local copy of Actino_Draft loaded into MySQL…), there are two paths forward: the easy way and the hard way. I'll lay out the easy way first, and if you like that we don't have to think about the hard way! I'm assuming that your computer runs macOS or Linux.

In a Terminal, run these commands:

1. mysql -u root -p Actino_Draft -e "SELECT PhageID, PhamID FROM gene WHERE PhageID IN (SELECT PhageID FROM phage WHERE Cluster = 'AZ') ORDER BY PhageID, Stop ASC;" -B -N > ~/cluster_AZ_phams.tsv
2. pip3 install phamclust
3. phamclust ~/cluster_AZ_phams.tsv ~/cluster_AZ_gcs -m gcs

Cluster AZ only has 37 members, so all of this should be done very quickly.

Now in your Finder, you can go to the home directory -> cluster_AZ_gcs, and open cluster_1.txt in Excel. Select the first column and split text to columns using delimiters "tab" and "space". You can then apply a color gradient (I like a three-step: red=0, white=35, green=100) to easily visualize the strength of pairwise relationships.

Explanation of Terminal steps:
1. export PhageIDs and their associated PhamIDs (in ascending order of gene stop coordinate) to a tab-delimited table with no header
2. install my phamclust package
3. run phamclust using the gcs method ('-m gcs') against the table we exported, with results placed in a directory called 'cluster_AZ_gcs'
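
For a sense of what the gcs numbers mean, the following Python sketch computes a pairwise gene content similarity directly from a two-column PhageID/PhamID table like the one exported in step 1. It is not phamclust's implementation. It assumes the commonly cited definition of GCS: the number of shared phams, taken as a fraction of each genome's pham count, averaged over the two genomes and expressed as a percentage. The function names and example rows are invented.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def load_phams(handle):
    """Map each PhageID to the set of PhamIDs found in its rows."""
    phams = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            phams[row[0]].add(row[1])
    return phams


def gcs(phams_a, phams_b):
    """Gene content similarity (percent) between two sets of phams."""
    if not phams_a or not phams_b:
        return 0.0
    shared = len(phams_a & phams_b)
    return 100.0 * (shared / len(phams_a) + shared / len(phams_b)) / 2


# Invented rows: P1 has phams 1-4, P2 has phams 3 and 4
example = io.StringIO("P1\t1\nP1\t2\nP1\t3\nP1\t4\nP2\t3\nP2\t4\n")
table = load_phams(example)
for a, b in combinations(sorted(table), 2):
    print(a, b, gcs(table[a], table[b]))  # P1 P2 75.0
```

Because each genome is stored as a set, a pham that occurs more than once in a genome is counted once here.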
Edited 09 Mar, 2022 16:03
Posted in: Bioinformatic Tools and Analyses > Difficulty pulling large-scale data for batch GCS analysis using Pope/Mavrich 2017 scripts
| posted 18 Feb, 2022 13:23
Good morning!

The SEA VM you downloaded is very old, so it's likely that the dependencies used by PhamNexus are not installed. Dan Russell and I are in the discussion phase of creating a new SEA VM, but it's been happening on an "as we have time" basis.

If you download this more recent SEA VM and email me directly (chg60@pitt.edu) I can help you make sure that the VM is set up and will run PhamNexus properly.

It will be much more difficult in the old VM because that operating system doesn't explicitly support a high enough Python3 version to actually run pdm_utils, or by extension PhamNexus.

-Christian
Posted in: Bioinformatic Tools and Analyses > PhamNexus on SEA-VM
| posted 15 Feb, 2022 18:26
Hi Debbie and Fred,

I've attempted to begin a systematic analysis to determine how much we can trust the outputs from either of these programs.

I accumulated a list of diverse types of DNA binding proteins: tyrosine or serine integrases, terminase large subunits, HTH DNA-binding proteins, RecE exonuclease, RecT ssDNA binding protein, etc. I pulled representative sequences from a subset of phams predominated by proteins with these functions.

With the caveat that I've only run 6 sequences so far, I'm not impressed by DNABIND. It's very fast (which is nice!), but only two of the sequences were predicted as DNA-binding proteins (a tyrosine integrase and an HTH DNA-binding protein). The others were all reported as having a probability less than 40% of being DNA-binding.

DNABINDER is MUCH slower - I'm still waiting on the first protein sequence, nearly an hour later. Ignoring the question of whether we can trust its output, I'm of the opinion that this program is too slow to warrant systematic use by SEA-PHAGES annotators.

-Christian
Posted in: Functional Annotation > Can we call DNA Binding proteins based on DNABIND and DNA Binder results?