Genbank - Curto Results File
- Started from master file BLAST-combined.fasta and needed to extract the curto sequences
- Build algorithm to go through the file, pull out curto sequences, and write them to a new file
- Make a reference text file with curto accession numbers
- curto-accession-numbers.txt
- Run code to make new curto only file
- Problem - new file had >4000 sequences (should be 982)
- Multiple duplicate accession numbers - need to remove
- Build algorithm to keep unique accession numbers
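The filter and de-duplication steps above can be sketched in a few lines of Python. This is only an illustration, not the script actually used: it assumes the accession number is the first whitespace-separated token of each FASTA header, that the first copy of a duplicated accession is the one to keep, and the function names and demo accessions are invented for the example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_unique(records, wanted):
    """Keep records whose accession is in `wanted`, first occurrence only.

    Assumption: the accession is the first token of the header line.
    """
    seen = set()
    for header, seq in records:
        accession = header.split()[0]
        if accession in wanted and accession not in seen:
            seen.add(accession)
            yield header, seq


# Made-up demo data standing in for BLAST-combined.fasta and
# curto-accession-numbers.txt.
demo = """>AB000001.1 Curtobacterium sp.
ACGT
ACGT
>AB000009.1 Escherichia coli
TTTT
>AB000001.1 Curtobacterium sp. (duplicate hit)
ACGTACGT
>AB000002.1 Curtobacterium flaccumfaciens
GGCC
""".splitlines()
wanted = {"AB000001.1", "AB000002.1"}

kept = list(filter_unique(read_fasta(demo), wanted))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

With real files, `demo` would be replaced by an open file handle and `wanted` by the set of lines read from the accession list.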
- All curto and frigo bacteria -> curto-and-frigo-only.fasta
- Added outlier sequence (AB695377.1 Sediminihabitans luteus) for phylo reference
- curto-and-frigo-only-with-outlier.fasta
- Create OTUs within curto and frigo genera
- Used QIIME pick_otus.py
- Use default similarity threshold (97%) (n = 41)
- Use curto-and-frigo-only-with-outlier.fasta as input file
- curto-and-frigo-only-with-outlier_otus.txt
- Generated biom file - not important with such closely related taxa
- Pick representative sequence for each OTU
- Assign Taxonomy to each rep set to make sure everything has worked so far
- Must align multiple rep sequences to template - Greengenes core database (16S gene)
- align_seqs.py
- Use PyNAST with min length of 75% of the median sequence length
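To make the length cutoff concrete, here is a rough sketch of how a "75% of the median length" threshold works. It is not QIIME's code: the function name and the example lengths are invented, and whether QIIME rounds the threshold or treats sequences exactly at the cutoff as passing is not checked here.

```python
from statistics import median


def min_length_threshold(lengths, fraction=0.75):
    """Return `fraction` of the median sequence length."""
    return fraction * median(lengths)


# Hypothetical rep-set sequence lengths, for illustration only.
lengths = {"seqA": 400, "seqB": 380, "seqC": 420, "seqD": 150}

threshold = min_length_threshold(lengths.values())
too_short = [name for name, n in lengths.items() if n < threshold]

print(threshold)
print(too_short)
```

In this toy set the median is 390, so the cutoff is 292.5 and only the 150-base sequence falls below it.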
- Filter alignment (filter_alignment.py)
- Remove positions which are gaps in every sequence (common for PyNAST, as typical sequences cover only 200-400 bases, and they are being aligned against the full 16S gene)
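The gap-column removal described above can be illustrated with a small standalone function. This is a sketch of the idea only, not what filter_alignment.py does internally (the script has further options that are ignored here); the function name and the toy alignment are made up, and both `-` and `.` are assumed to count as gaps.

```python
def drop_all_gap_columns(aligned, gap_chars="-."):
    """Remove alignment columns that are gaps in every sequence.

    `aligned` maps sequence names to equal-length aligned strings.
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


# Toy alignment: short reads padded with gaps against a longer template.
toy = {
    "rep1": "--AC-GT--",
    "rep2": "--A--GTA-",
}
filtered = drop_all_gap_columns(toy)
for name, seq in filtered.items():
    print(name, seq)
```

Columns that are a gap in only some sequences are kept, so the filtered sequences stay aligned to each other.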
- Removed some OTUs due to failure to align (moved to new file):
- OTU2
- OTU3
- OTU4
- OTU8
- OTU10
- OTU11
- OTU26
- OTU30
- OTU37
- OTU38
- OTU39
- OTU40
- Removal of these OTUs reduced the overall number of samples by 50
- Should have 933 samples left in 30 OTUs
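The counts after removing OTUs can be double-checked from the OTU map itself. The sketch below assumes the usual pick_otus.py map layout (one OTU per line, tab-separated: OTU ID first, then the IDs of its member sequences); the helper names and the toy map are invented for illustration, and the OTU IDs in a real map may be written differently from the "OTU2"-style labels in the list above.

```python
def parse_otu_map(lines):
    """Parse tab-separated OTU map lines into {otu_id: [sequence ids]}."""
    otus = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            otus[fields[0]] = fields[1:]
    return otus


def drop_otus(otus, failed):
    """Return a copy of the map without the OTUs listed in `failed`."""
    return {otu_id: members for otu_id, members in otus.items()
            if otu_id not in failed}


def count_sequences(otus):
    """Total number of member sequences across all OTUs."""
    return sum(len(members) for members in otus.values())


# Toy map standing in for curto-and-frigo-only-with-outlier_otus.txt.
toy_map = ["0\ts1\ts2\ts3\n", "1\ts4\n", "2\ts5\ts6\n"]
failed = {"1"}  # hypothetical OTU that failed to align

before = parse_otu_map(toy_map)
after = drop_otus(before, failed)

print(len(before), count_sequences(before))
print(len(after), count_sequences(after))
```

Running the same functions on the real map with the twelve removed OTU IDs gives the number of OTUs and sequences left, which can be compared against the totals noted above.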
- Make phylogeny - newick file
For Reference:
QIIME Review