GenBank - Curto Results File
Started from the master file BLAST-combined.fasta and needed to pull out just the curto sequences
Built an algorithm to go through the file, pull out the curto sequences, and write them to a new file
Make a reference text file with curto accession numbers
curto-accession-numbers.txt
Run code to make new curto only file
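The filtering step above could look roughly like the sketch below. This is not the original script, just a minimal version under some assumptions: the accession is taken to be the first whitespace-delimited token of each FASTA header, the function names are made up for illustration, and the accessions in the demo are placeholders rather than real records.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header):
    # Assumption: the accession is the first token of the header line.
    parts = header.split()
    return parts[0] if parts else ""


def filter_by_accession(records, wanted):
    """Keep only the records whose accession is in the reference set."""
    for header, seq in records:
        if accession_of(header) in wanted:
            yield header, seq


# Tiny in-memory demo with placeholder accessions (not real data).
master = io.StringIO(
    ">AB000001.1 Curtobacterium sp.\nACGT\nACGT\n"
    ">XY000002.1 Other genus\nTTTT\n"
)
wanted = {"AB000001.1"}
kept = list(filter_by_accession(read_fasta(master), wanted))
```

For the real files, `master` would be an open handle on BLAST-combined.fasta and `wanted` would be the set of lines read from curto-accession-numbers.txt.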
Problem - the new file had >4000 sequences (should be 982)
Cause: many accession numbers appeared more than once - duplicates needed to be removed
Built an algorithm to keep one record per unique accession number
All curto and frigo bacteria -> curto-and-frigo-only.fasta
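One way to do the de-duplication is to keep only the first record seen for each accession. The sketch below assumes that duplicates carry identical sequences (so it does not matter which copy is kept) and that the accession is the first token of the header; the function name and the demo records are hypothetical.

```python
def unique_by_accession(records):
    """Yield the first (header, sequence) pair seen for each accession."""
    seen = set()
    for header, seq in records:
        parts = header.split()
        # Assumption: the accession is the first token of the header line.
        accession = parts[0] if parts else ""
        if accession not in seen:
            seen.add(accession)
            yield header, seq


# Placeholder records (not real data): the third repeats the first accession.
records = [
    ("AB000001.1 first copy", "ACGT"),
    ("AB000002.1 other", "GGCC"),
    ("AB000001.1 second copy", "ACGT"),
]
unique = list(unique_by_accession(records))
```

Using a set for `seen` keeps each lookup cheap, which matters little for ~4000 records but avoids rescanning the output for every new sequence.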
Added outlier sequence (AB695377.1 Sediminihabitans luteus) for phylo reference
curto-and-frigo-only-with-outlier.fasta
Create OTUs within curto and frigo genera
Used QIIME pick_otus.py
Used the default similarity threshold (97%) (n = 41)
Use curto-and-frigo-only-with-outlier.fasta as input file
curto-and-frigo-only-with-outlier_otus.txt
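The OTU map written by pick_otus.py is a tab-separated text file: each line starts with an OTU ID, followed by the IDs of the sequences assigned to that OTU. A small sketch for reading it and counting sequences per OTU is below; the function name and the demo IDs are placeholders for illustration.

```python
def parse_otu_map(lines):
    """Return {otu_id: [sequence_id, ...]} from tab-separated OTU map lines."""
    otus = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        otus[fields[0]] = [f for f in fields[1:] if f]
    return otus


# Placeholder lines (not real data) in the OTU map layout.
demo_lines = [
    "0\tseqA\tseqB\tseqC\n",
    "1\tseqD\n",
    "\n",
]
otu_map = parse_otu_map(demo_lines)
sizes = {otu: len(members) for otu, members in otu_map.items()}
```

For the real file, `demo_lines` would be replaced by an open handle on curto-and-frigo-only-with-outlier_otus.txt.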
Generated biom file - not important with such closely related taxa
Pick representative sequence for each OTU
Assign Taxonomy to each rep set to make sure everything has worked so far
Must align the rep sequences to a template - Greengenes core database (16S gene)
align_seqs.py
Use PyNAST with a min length of 75% of the median sequence length
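To make the length cutoff concrete, here is a rough sketch of how a "75% of the median length" threshold works out. It only illustrates the arithmetic on unaligned sequence lengths; it is an assumption that this matches exactly how align_seqs.py applies the cutoff internally, and the names and sequences are made up.

```python
from statistics import median


def min_length_cutoff(lengths, fraction=0.75):
    """Return fraction * median of the given sequence lengths."""
    return fraction * median(lengths)


def long_enough(seqs, fraction=0.75):
    """Keep sequences whose length meets the median-based cutoff."""
    cutoff = min_length_cutoff([len(s) for s in seqs.values()], fraction)
    return {name: s for name, s in seqs.items() if len(s) >= cutoff}


# Placeholder sequences (not real data) with lengths 400, 380, 420 and 100.
seqs = {
    "rep1": "A" * 400,
    "rep2": "C" * 380,
    "rep3": "G" * 420,
    "rep4": "T" * 100,
}
cutoff = min_length_cutoff([len(s) for s in seqs.values()])
kept = long_enough(seqs)
```

In this toy set the median length is 390, so the cutoff is 292.5 and only the 100-base sequence is dropped.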
Filter alignment (filter_alignment.py)
Remove positions which are gaps in every sequence (common for PyNAST, as typical sequences cover only 200-400 bases, and they are being aligned against the full 16S gene)
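The gap-removal idea can be sketched in a few lines. This is only the "drop columns that are gaps in every sequence" part, written from scratch for illustration; filter_alignment.py has other options that are not reproduced here, and the function name and demo alignment are hypothetical.

```python
def drop_all_gap_columns(alignment, gap_chars="-."):
    """Remove alignment columns that contain only gap characters.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [
        i for i in range(len(rows[0]))
        if any(row[i] not in gap_chars for row in rows)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Placeholder alignment (not real data): columns 3 and 6 are all gaps.
aligned = {"a": "A--CG-", "b": "AT-C--"}
filtered = drop_all_gap_columns(aligned)
```

Columns where only some sequences have gaps are kept, so the remaining sequences stay aligned to each other.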
Removed some OTUs due to failure to align (moved to new file):
OTU2
OTU3
OTU4
OTU8
OTU10
OTU11
OTU26
OTU30
OTU37
OTU38
OTU39
OTU40
Removing these OTUs reduced the overall number of sequences by 50
Should have 933 sequences left in 30 OTUs
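A quick way to sanity-check counts like these is to drop the failed OTUs from the OTU map and re-count what is left. The sketch below uses a toy map with placeholder IDs, not the real data, and the function names are made up for illustration.

```python
def drop_otus(otu_map, failed):
    """Return a copy of the OTU map without the OTUs listed in `failed`."""
    return {otu: members for otu, members in otu_map.items() if otu not in failed}


def count_sequences(otu_map):
    """Total number of sequences across all OTUs in the map."""
    return sum(len(members) for members in otu_map.values())


# Toy OTU map (not real data): three OTUs holding six sequences in total.
toy_map = {
    "OTU0": ["s1", "s2", "s3"],
    "OTU1": ["s4"],
    "OTU2": ["s5", "s6"],
}
remaining = drop_otus(toy_map, failed={"OTU2"})
removed_count = count_sequences(toy_map) - count_sequences(remaining)
```

Run against the real OTU map with the failed OTUs listed above, `removed_count` and `len(remaining)` are the numbers to compare with the expected 50 removed and 30 OTUs left.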
Make phylogeny - newick file
For Reference:
QIIME Review