1. Need:
gg_13_5.fasta
gg_13_5_taxonomy.txt
2. Search for taxonomy of interest - start with Microbacteriaceae
#creates a text file with IDs matching search
$ egrep "f__Microbacteriaceae" gg_13_5_taxonomy.txt | awk '{print $1}' > ./gg-microbacteriaceae.txt
3. micro-only.py
#searches fasta file and creates a new fasta file with only IDs from gg-microbacteriaceae.txt
#found 5707 sequences
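The original micro-only.py is not shown, so the following is only a minimal sketch of what such a filter might look like, assuming the ID file holds one Greengenes ID per line and that each FASTA header starts with that ID. The function names (read_ids, filter_fasta, main) are hypothetical and not taken from the original script.

```python
import sys


def read_ids(path):
    """Return the set of sequence IDs listed one per line in `path`."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_fasta(fasta_lines, wanted_ids):
    """Yield only the FASTA lines belonging to records whose ID is wanted.

    The record ID is assumed to be the first whitespace-delimited token
    after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted_ids
        if keep:
            yield line


def main(fasta_path, ids_path, out_path):
    """Write the records of fasta_path whose IDs appear in ids_path."""
    wanted = read_ids(ids_path)
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, wanted))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Example: python micro-only.py gg_13_5.fasta gg-microbacteriaceae.txt gg-micro.fasta
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

Streaming line by line keeps memory use low on the full Greengenes FASTA; only the ID set is held in memory.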
4. Combine my 16S reads
$ cat my-16S-reads.fasta gg-micro.fasta > output.fasta
5. QIIME - pick_otus.py - generates 327 OTUs
-m uclust
-s 0.97
-A #optimal search
***Swarm loses OTUs when running due to its algorithm
6. QIIME - pick_rep_set.py
-f gg-all-microbacteriaceae-with-16S.fasta
-r my-16S-reads.fasta
-m longest
6B. fasta-rename.py #renames all seqs with new names on fasta header
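The renaming scheme used by fasta-rename.py is not given here, so this is just a sketch of one way to do it, assuming a lookup table from old sequence ID to new name. The function rename_headers and the example names are hypothetical.

```python
def rename_headers(fasta_lines, new_names):
    """Yield FASTA lines with header IDs replaced according to new_names.

    new_names maps the old ID (first token after '>') to the new name.
    Headers without an entry keep their old ID. Any description text
    after the ID is dropped in this sketch.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            old_id = tokens[0] if tokens else ""
            yield ">" + new_names.get(old_id, old_id) + "\n"
        else:
            yield line


# Hypothetical example input and mapping
example = [">denovo0 read_12\n", "ACGT\n", ">denovo1\n", "GG\n"]
renamed = list(rename_headers(example, {"denovo0": "Curtobacterium_OTU0"}))
```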
7. Align rep_set sequences with SINA
8. Eliminate all OTUs with <20 seqs EXCEPT for Curtobacterium OTUs (also tried a <50 seqs cutoff)
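The size filter in this step could be sketched as below. It assumes the OTU map written by pick_otus.py is tab-delimited, with the OTU ID first and its member sequence IDs after it, and that the Curtobacterium OTU IDs have already been identified separately. The function names and the example OTU IDs are hypothetical.

```python
def parse_otu_map(lines):
    """Parse a tab-delimited OTU map into {otu_id: [member_seq_ids]}."""
    otus = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            otus[fields[0]] = fields[1:]
    return otus


def filter_otus(otus, min_size=20, always_keep=()):
    """Keep OTUs with at least min_size members, plus any in always_keep."""
    protected = set(always_keep)
    return {
        otu: members
        for otu, members in otus.items()
        if len(members) >= min_size or otu in protected
    }


# Hypothetical toy map; a cutoff of 3 is used only to keep the example short
toy_map = parse_otu_map([
    "0\ts1\ts2\ts3\n",
    "1\ts4\n",
    "2\ts5\ts6\n",
])
kept = filter_otus(toy_map, min_size=3, always_keep=["1"])
```

Passing min_size=50 gives the stricter cutoff also mentioned in this step.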
9. jModelTest - computes likelihood scores with PhyML
Base Frequencies +F
Rate Variation +I +G nCat=4
ML Optimized
Base Tree Search = NNI
Best Models:
Model       BIC
TIM1 + G    27589
TrN + G     27593
GTR + G     27608
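To compare the three BIC scores listed in this step, a common approach is to compute each model's difference from the best (lowest) score and convert it to a relative weight. The sketch below does that for these three values only, so the weights will differ from what jModelTest reports over its full candidate set. The first model is written as TIM1, the jModelTest name, and the function bic_weights is hypothetical.

```python
import math


def bic_weights(bic_scores):
    """Return {model: (delta_bic, weight)} for a dict of BIC scores.

    delta_bic is the difference from the lowest score; weights are
    exp(-delta/2) normalised to sum to 1 over the models supplied.
    """
    best = min(bic_scores.values())
    deltas = {model: score - best for model, score in bic_scores.items()}
    relative = {model: math.exp(-0.5 * d) for model, d in deltas.items()}
    total = sum(relative.values())
    return {model: (deltas[model], relative[model] / total) for model in bic_scores}


# BIC scores from the list in this step
scores = {"TIM1+G": 27589, "TrN+G": 27593, "GTR+G": 27608}
weights = bic_weights(scores)
for model, (delta, weight) in weights.items():
    print(f"{model}: delta BIC = {delta}, weight = {weight:.4f}")
```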
10. Run TrN+G model in MEGA
-Maximum Likelihood
-Nucleotide Substitution = TrN (Tamura-Nei)
-Bootstrap Method, replicates = 100
-Gamma Distributed, discrete categories = 5
-Complete Deletion (gaps/missing data)
-NNI (tree search)
11. Run GTR+G on RAxML - see RAxML manual for help
$ raxmlHPC -s input_file.phy -n output_name -m GTRGAMMA -# 100 -x 100 -p 2389 -f a -o outgroup_name
MEGA - TrN + G with OTUs > 50 seqs