Welcome to the forums at seaphages.org. Please feel free to ask any questions related to the SEA-PHAGES program. Any logged-in user may post new topics and reply to existing topics. If you'd like to see a new forum created, please contact us using our form or email us at info@seaphages.org.
Blastp with ClusteredNR
posted today, 18:44

How do we feel about using blastp with the ClusteredNR (nr_cluster_seq) database for positional annotation? ClusteredNR now seems to be the default, though I'm not sure whether that is by design or just because it sits at the top of the list. For me the search returns much faster than against the non-redundant protein sequences (nr) database. I'm working on a mycobacteriophage and seeing a wider diversity of hits (non-mycobacteriophage) higher in the Descriptions table, which is interesting.

But the Alignments tab shows only one hit per cluster, which doesn't really illustrate how many other sequences have a 1:1 amino acid match between query and subject. It doesn't give you a message like "See 13 other titles", which is something I previously told my students to look for to understand the depth of matches. The number of sequences in the cluster on the Descriptions tab (20) matches neither the number of sequences that are identical according to the ClusteredNR result (15; I haven't really inspected these yet) nor the number with a 1:1 match using the nr database (13).

[Yes, I do tell them to also use phagesdb blastp, but I want them using a tool they will use more broadly after this class.]

Anyone [Chris S
posted 39 minutes ago

Hi Allison,

In general, I would discourage using blastp at NCBI for start information. My rationale is this. Do not use any Reference Sequence (RefSeq) data, because it is not provided by the owner of the sequence. Most annotations outside the SEA-PHAGES program are produced by automated software and do not receive the same scrutiny that we apply. Finally, having Starterator data (with 'raw' nucleotide alignments) surpasses alignments of called genes that have not gone through our scrutiny.

As for the clustering part, again, that is provided in Starterator. Also, matches to identical sequences don't provide much depth either. When the sequences are identical you are looking at the same instance over and over, so whether the calls agree tells you only whether 'we' agree with each other, not how a sequence is conserved over time.

The hits to non-actinobacteriophage data are of interest, but again, what criteria were used to make that call, especially as it pertains to starts?

There is some really nice data available in the NCBI hits that could also be investigated, such as multiple sequence alignments.

best,
debbie
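For anyone who wants to count matches directly instead of relying on the "See N other titles" message, here is a minimal Python sketch. It assumes a tab-separated hit table with the columns qseqid, sseqid, pident, length, qstart, sstart, qlen and slen, for example from a command-line BLAST+ run with a custom `-outfmt 6` column list; the web hit table download uses a different column set. It also assumes two readings of "1:1" that may not match the first post's intent: alignment starting at query position 1 and subject position 1, and 100% identity over the full length of both sequences. The function name and the sample rows are hypothetical and are not real BLAST results.

```python
import csv
import io

# Column order is an assumption: it matches a BLAST+ run with
# -outfmt "6 qseqid sseqid pident length qstart sstart qlen slen".
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "sstart", "qlen", "slen"]


def summarize_hits(tsv_text):
    """Count hits aligned at Q1:S1 and hits identical over the full length."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    total = start_aligned = identical = 0
    for row in reader:
        total += 1
        # Start alignment: query position 1 lines up with subject position 1.
        if int(row["qstart"]) == 1 and int(row["sstart"]) == 1:
            start_aligned += 1
        # Full-length identity: the alignment spans both sequences end to end
        # and every aligned position matches.
        full_length = int(row["length"]) == int(row["qlen"]) == int(row["slen"])
        if full_length and float(row["pident"]) == 100.0:
            identical += 1
    return {"total": total, "start_aligned": start_aligned, "identical": identical}


# Made-up rows for illustration only; not real BLAST results.
SAMPLE = "\n".join([
    "\t".join(["gp10", "hitA", "100.000", "120", "1", "1", "120", "120"]),
    "\t".join(["gp10", "hitB", "98.333", "120", "1", "1", "120", "120"]),
    "\t".join(["gp10", "hitC", "100.000", "110", "11", "1", "120", "110"]),
])

print(summarize_hits(SAMPLE))
```

Running the same function on tables from nr and from ClusteredNR would show whether the two databases report the same number of start-aligned and identical hits. It says nothing about whether the annotated starts in those hits were called with the scrutiny discussed above.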
