The official website of the HHMI Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science program.

All posts created by cdshaffer

| posted 04 Jun, 2021 20:59
I am posting here the short 14-slide deck I use for the introduction to Starterator in my phage class for 2021. I usually introduce Starterator late in the training, after the students have already had a chance to work with Glimmer, GeneMark, SD, gap score, and BLAST results. Feel free to use or edit.

intro_starterator.pptx

Edit to update link 1/18/2022
Edited 18 Jan, 2022 04:34
Posted in: Starterator > Starterator intro lecture
| posted 03 Jun, 2021 18:07
Sometimes we just cannot see the simplest problems because we are concentrating so hard on the task at hand.
You should try installing DNA Master on a Windows VM, not on an Ubuntu VM. There is a way to get Windows for free. I think the most recent instructions are here:

https://phagesdb.org/media/docs/InstallingWindowsOnMac.pdf

But if anyone knows of a more recent set of instructions, please post.
Posted in: DNA Master > DNA master server down?
| posted 14 May, 2021 18:20
Just a follow-up. When I had two tandem start codons, I always picked the longer gene model (based on the "All other things being equal, a longer call is usually preferable" rule), but recent work with mass spec on phage proteins suggests otherwise. I am quoting now from the online guide (this page on revising your annotations), with the somewhat obscure rule that came out of that mass spec work [note: I have added the underline for emphasis]:

Can the start site of the downstream gene be extended so that the gene covers more of the gap? Carefully consider all possible start sites for the downstream gene. If a longer one is available, compare it to the current start site to see if it is a similar or better choice. All other things being equal, a longer call is usually preferable, but do not extend genes just to fill a gap. The exception to this are genes with two start codons in tandem, in these cases all of our wet bench experiments support the second of the two codons as the correct start.
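The tandem-start exception quoted above can be sketched in a few lines of Python. This is only an illustration under two assumptions that are not stated in the guide: that "tandem" means two start codons directly adjacent and in frame, and that the candidate start codons are ATG, GTG, and TTG. The function names are made up for this example.

```python
# Assumption: the start codons considered are ATG, GTG and TTG.
START_CODONS = {"ATG", "GTG", "TTG"}


def is_start(seq, index):
    """True if the codon beginning at 0-based `index` is a start codon."""
    return seq[index:index + 3].upper() in START_CODONS


def tandem_start_positions(seq):
    """Return 0-based indices where a start codon is immediately
    followed, in frame, by a second start codon."""
    return [i for i in range(len(seq) - 5)
            if is_start(seq, i) and is_start(seq, i + 3)]


def preferred_start(seq, first_start):
    """Apply the quoted exception: if the codon right after the start at
    `first_start` is also a start codon, return the second codon's
    index; otherwise return `first_start` unchanged."""
    if is_start(seq, first_start) and is_start(seq, first_start + 3):
        return first_start + 3
    return first_start


example = "CCATGGTGAAA"  # made-up sequence: ATG at index 2, GTG at index 5
print(tandem_start_positions(example))  # [2]
print(preferred_start(example, 2))      # 5
```

A check like this only flags the tandem case; all the other evidence for a start site still has to be weighed by hand.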
Posted in: Choosing Start Sites > F1 gene needs help on start site
| posted 04 May, 2021 20:09
Wow, so cool. I had never heard of an SGNH domain. It appears to be a specific subtype of acyltransferase. I have not had time to do a suitable in-depth read on this, but here is the paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7448272/

That paper on the crystal structure is only 10 months old, so it is not surprising that it has not come up before. My brief survey suggests a good general annotation here might be "carbohydrate acetyltransferase", but more work would be needed to gather enough evidence to have the term added to the approved list. Since both domains in this protein appear to be related to acetyltransferases of two different types, I would think "acetyltransferase" is a good match from the options on the current approved terms list.
Posted in: Functional Annotation > acyltransferase and SGNH domains
| posted 20 Apr, 2021 15:32
I think Deb is right that you should check for alignments to domains. I can see quite a few HHPRED matches that start in the middle of the subject but align to amino acid 1 or 2 when start 797 is selected.

When I get situations like this, I have my students take the amino acid sequence of the longer form and do an HHpred search. Then we look at the results and ask: do those "extra" amino acids at the beginning (42 amino acids in this case) also align to the subject? If those amino acids do align, we take it as pretty good evidence that those first amino acids are in the protein and we pick the longer form; if the amino acids do not align, we pick the shorter form.
Posted in: Choosing Start Sites > Second opinion Cluster F1 Gene Start Site
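The check described above can be written down as a small Python sketch. It assumes you have copied the query start and end positions of each alignment from the search results by hand; the `Hit` structure, the hit names, and the `min_covered` threshold are all hypothetical and not part of any HHpred output format. How many of the extra residues need to align before you trust the longer form remains a judgment call.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment, entered by hand (hypothetical structure)."""
    name: str
    query_start: int  # 1-based first query residue in the alignment
    query_end: int    # 1-based last query residue in the alignment


def hits_covering_extension(hits, extra_len, min_covered=1):
    """Return the hits whose aligned query range includes at least
    `min_covered` of the first `extra_len` residues of the longer form."""
    covering = []
    for hit in hits:
        # Number of "extra" N-terminal residues inside this alignment.
        covered = min(hit.query_end, extra_len) - hit.query_start + 1
        if covered >= min_covered:
            covering.append(hit)
    return covering


# Made-up hits for a longer form with 42 extra residues.
hits = [Hit("hit A", 3, 180), Hit("hit B", 45, 200)]
supporting = hits_covering_extension(hits, extra_len=42, min_covered=10)
print([h.name for h in supporting])  # ['hit A']
```

If the returned list is empty, none of the recorded alignments reach into the extension, which under the reasoning above would favor the shorter form.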
| posted 17 Apr, 2021 03:39
OK, just an update. In the most recent version of the database (403), vine_74 is back in pham 58426.
It still appears as pham 57943 on phagesdb, so the phagesdb links are out of date and will not work, but the first link above will. At some point phagesdb will update and should then report that vine_74 is back in pham 58426.
Posted in: Starterator > Pham not found in Starterator
| posted 16 Apr, 2021 16:13
Yes, this is a database sync issue. The new database should appear by the end of today. In the meantime, the results for vine_74 are still available using the older number you mentioned:

http://phages.wustl.edu/starterator/Pham58426Report.pdf

The URL has the exact same pattern for all phams, so if you get a link that does not work and you see the pham number has changed, you can always manually change the URL back to the old number and see if that works. In this case, that 58426 number does work. Sometime later this link will stop working and the newer link will work:

http://phages.wustl.edu/starterator/Pham57943Report.pdf
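
Since the report URLs follow one pattern, swapping the pham number can be scripted. The Python sketch below only builds the URL strings from the pattern visible in the two links above; the function name is made up for this example, and nothing here checks whether a given report actually exists on the server.

```python
# Base path taken from the two report links in this post.
BASE_URL = "http://phages.wustl.edu/starterator"


def starterator_report_url(pham_number):
    """Build a report URL from a pham number using the pattern
    Pham<number>Report.pdf seen in the links above."""
    return f"{BASE_URL}/Pham{pham_number}Report.pdf"


# Try the number from a broken link first, then the other number.
for pham in (57943, 58426):
    print(starterator_report_url(pham))
```

Pasting each printed URL into a browser is then the quickest way to see which pham number the server currently knows about.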
Posted in: Starterator > Pham not found in Starterator
| posted 28 Mar, 2021 20:06
Just a heads up.

Christian in the Hatfull lab has been working on optimizing the parameters for the clustering of phage proteins into phams. The most recent version of the database (ver 400) shows a much larger than average shift in both the number and make-up of phams. We don't know how these changes will affect Starterator analysis. It may help overall, in that more genes will be grouped, resulting in fewer genes ending up as orphams with no Starterator report. It may also not help, in that the added genes will be so divergent that they provide little evidence to interpret within the reports.

All users should be on the lookout for changes that affect the usefulness of the Starterator reports. If anything appears "off" or "confusing" in the Starterator results, let us know. If things seem to be working better for you, let us know that too. You can use this forum or send me an email.
Posted in: Starterator > phameration tweeks and effects on starterator
| posted 19 Mar, 2021 16:43
I am still not convinced it is not one amino acid back (i.e. the slip is D/P instead of K/P). Supporting the former is base conservation; supporting the latter is the "observed pattern" for many slippery sequences. I know of no evidence to tell me which is more informative in this situation. I will certainly say that either annotation has enough support to qualify as "less worse" than going with the up-to-now policy of "annotate T as a separate gene and pick the longest ORF". So we have several BK1 phages and will annotate them using the CCCAAAT pattern accordingly.
Posted in: Frameshifts and Introns > No frameshift in cluster BK1?
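For anyone who wants to locate the CCCAAAT pattern in a genome before annotating, a plain string search is enough. The Python sketch below is a minimal illustration; the function name and the test sequence are made up, and it searches only the strand you give it.

```python
def find_motif(seq, motif="CCCAAAT"):
    """Return the 0-based start index of every occurrence of `motif`
    in `seq`, including overlapping ones. Case is ignored."""
    seq = seq.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Made-up sequence with the pattern at indices 2 and 11.
print(find_motif("GGCCCAAATGGCCCAAATAA"))  # [2, 11]
```

Finding the pattern says nothing about reading frame or position relative to the two genes, so each hit still needs to be checked in its genomic context.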
| posted 18 Mar, 2021 16:49
1st: Yup, stop codons are a no-go as far as I am concerned. I was just looking at conservation in the MSA, which is why I mentioned backing up; but you are correct, I would not think it a good gene model to add in "stop codon read-through" (I know these exist in eukaryotes; do they even exist in prokaryotes?).

2nd: I am fine if Joyce or anyone else wants to include this data in a poster. It's kind of a pain to create all those DNA sequences if you don't have Starterator running in a VM, so I am happy to send anyone the sequences or the alignment for any pham; just send me an email.

As I said above, I think the only evidence that could be relatively easily collected that would help me make up my mind is a sense of how often the slippery sequence changes in other phams. If we NEVER see it change in other phams, that would make me pause here, err on the side of caution, and stick with the "least worst" model. On the other hand, if we do see it happening in other phams, then I could see calling it here too.

So, just like in all my wet bench experiments: if you are not sure of your conclusions, run another experiment.
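
That "other experiment" could start as a simple tally. The Python sketch below assumes someone has already collected, by hand, the slippery sequence found in each member of several phams; the pham labels and sequences shown are invented placeholders, not real data, and the function names are made up for this example.

```python
from collections import Counter


def slip_site_variation(sites_by_pham):
    """Map each pham to a Counter of the slip-site sequences seen
    among its members (case is ignored)."""
    return {pham: Counter(site.upper() for site in sites)
            for pham, sites in sites_by_pham.items()}


def phams_with_changes(sites_by_pham):
    """Return, sorted, the phams whose members show more than one
    distinct slip-site sequence."""
    variation = slip_site_variation(sites_by_pham)
    return sorted(pham for pham, counts in variation.items()
                  if len(counts) > 1)


# Invented placeholder data: pham label -> slip sites of its members.
collected = {
    "pham_A": ["CCCAAAT", "CCCAAAT", "cccaaat"],
    "pham_B": ["GGGAAAG", "GGGAAAA"],
}
print(phams_with_changes(collected))  # ['pham_B']
```

An empty result across many phams would point toward the cautious model; any pham in the list is a case worth looking at in its alignment.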
Edited 18 Mar, 2021 16:50
Posted in: Frameshifts and Introns > No frameshift in cluster BK1?