Thursday, October 9, 2014

Genbank - Curto Results File


  • Started from master file BLAST-combined.fasta; needed to pull out the curto sequences
    • Built a script to go through the file, pull out the curto sequences, and write them to a new file
      • Make a reference text file with curto accession numbers
        • curto-accession-numbers.txt
      • Run code to make new curto only file
        • curto-only2.py
    • Problem - new file had >4000 sequences (should be 982)
      • Many duplicate accession numbers - need to remove
      • Built a script to keep one sequence per unique accession number
        • duplicate-removal.py
    • All curto and frigo bacteria -> curto-and-frigo-only.fasta
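
The filtering and duplicate-removal steps above can be sketched in plain Python. This is not the contents of curto-only2.py or duplicate-removal.py (those scripts are not shown here), only a minimal illustration. It assumes the accession number is either the fourth "|"-separated field of the FASTA header (the gi|...|gb|ACCESSION| style) or the first whitespace-delimited token; the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header):
    """Assumption: accession is the 4th '|' field, else the first token."""
    parts = header.split()
    token = parts[0] if parts else ""
    fields = token.split("|")
    return fields[3] if len(fields) >= 4 else token


def filter_unique(records, wanted):
    """Keep the first record for each accession that appears in `wanted`."""
    seen = set()
    kept = []
    for header, seq in records:
        acc = accession_from_header(header)
        if acc in wanted and acc not in seen:
            seen.add(acc)
            kept.append((header, seq))
    return kept


def write_fasta(records, handle):
    """Write (header, sequence) pairs to an open text file handle."""
    for header, seq in records:
        handle.write(">" + header + "\n" + seq + "\n")
```

To use it on the real files, the accession list would be read into a set (one accession per line of curto-accession-numbers.txt, assuming that layout) and passed as `wanted`, with BLAST-combined.fasta opened and handed to `read_fasta`.
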
  • Added outlier sequence (AB695377.1 Sediminihabitans luteus) for phylo reference
    • curto-and-frigo-only-with-outlier.fasta
  • Create OTUs within curto and frigo genera 
    • Used QIIME pick_otus.py
      • Used the default sequence similarity threshold (97%) (n = 41)
      • Use curto-and-frigo-only-with-outlier.fasta as input file
    • curto-and-frigo-only-with-outlier_otus.txt
    • Generated biom file - not important with such closely related taxa
      • otu-table.biom
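
To check how many sequences ended up in each OTU, the OTU map can be read directly. A minimal sketch, assuming the usual pick_otus.py map layout of one OTU per line, tab-separated, with the OTU ID first and the member sequence IDs after it; `parse_otu_map` and `total_sequences` are hypothetical helper names.

```python
def parse_otu_map(lines):
    """Return {otu_id: [sequence ids]} from tab-separated OTU map lines."""
    otus = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields or not fields[0]:
            continue  # skip blank lines
        otus[fields[0]] = [f for f in fields[1:] if f]
    return otus


def total_sequences(otus):
    """Total number of sequences across all OTUs."""
    return sum(len(members) for members in otus.values())


if __name__ == "__main__":
    # File name taken from the notes above; the path is an assumption.
    with open("curto-and-frigo-only-with-outlier_otus.txt") as handle:
        otu_map = parse_otu_map(handle)
    for otu_id, members in sorted(otu_map.items()):
        print(otu_id, len(members))
    print("total sequences:", total_sequences(otu_map))
```
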
  • Pick representative sequence for each OTU
  • Assign taxonomy to each representative sequence to make sure everything has worked so far
  • Must align the representative sequences to a template - Greengenes core database (16S gene)
    • align_seqs.py
    • Use PyNAST with min length of 75% of the median sequence length
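
The minimum-length cutoff is simple arithmetic: take the median of the input sequence lengths and multiply by 0.75. A small sketch of that calculation follows; the function names are hypothetical, and truncating to an integer is an assumption about rounding rather than a statement of what align_seqs.py does internally.

```python
from statistics import median


def min_aligned_length(lengths, fraction=0.75):
    """Cutoff = fraction of the median sequence length (truncated to int)."""
    return int(fraction * median(lengths))


def split_by_cutoff(lengths_by_id, fraction=0.75):
    """Return (passing ids, failing ids) for a {seq_id: length} mapping."""
    cutoff = min_aligned_length(list(lengths_by_id.values()), fraction)
    passing = [sid for sid, n in lengths_by_id.items() if n >= cutoff]
    failing = [sid for sid, n in lengths_by_id.items() if n < cutoff]
    return passing, failing
```

For example, with lengths of 1200, 1400, and 1500 bases the median is 1400, so the cutoff is 1050 and anything shorter would be set aside before alignment.
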
  • Filter alignment (filter_alignment.py)
    • Remove positions which are gaps in every sequence (common for PyNAST, as typical sequences cover only 200-400 bases, and they are being aligned against the full 16S gene)
    • Removed some OTUs due to failure to align (moved to new file):
      • OTU2
      • OTU3
      • OTU4
      • OTU8
      • OTU10
      • OTU11
      • OTU26
      • OTU30
      • OTU37
      • OTU38
      • OTU39
      • OTU40
    • Removing these OTUs reduced the total number of sequences by 50
    • Should have 933 sequences left in 30 OTUs
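
The gap-removal idea in the filtering step (drop every alignment column that is a gap in all sequences) can be illustrated as below. This is only a sketch of that one idea, not a reimplementation of filter_alignment.py; the function name and the choice of "-" and "." as gap characters are assumptions.

```python
def drop_all_gap_columns(aligned, gap_chars="-."):
    """Remove columns that are gaps in every sequence.

    aligned: {name: aligned sequence}, all sequences the same length.
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}
```

Because the sequences are aligned against the full 16S template, most columns outside the region they cover are gaps in every sequence, which is why this step shortens the alignment so much.
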
  • Make phylogeny - newick file 

For Reference: QIIME Review
