Wednesday, October 29, 2014

Genbank - Phylogeny (Updated)

I combined the metadata with the phylogeny to show that Curtobacterium displays a degree of niche specificity.

Saturday, October 25, 2014

Genbank - OTU specialization


  • Go through each OTU and see whether there is a habitat specialization
  • Compare to phylogeny 
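
One way to go through each OTU is to tally the habitat labels of its member sequences and look at the share held by the most common one. This is a minimal sketch, assuming a hypothetical mapping from OTU ID to a list of habitat labels; the function name and the example data are illustrative and do not come from the notebook files.

```python
from collections import Counter

def habitat_profile(otu_habitats):
    """Return {otu: (dominant_habitat, share)} for OTUs that have labels."""
    profile = {}
    for otu, habitats in otu_habitats.items():
        if not habitats:
            continue  # skip OTUs with no habitat information
        counts = Counter(habitats)
        top_habitat, top_count = counts.most_common(1)[0]
        profile[otu] = (top_habitat, top_count / len(habitats))
    return profile

# Hypothetical example data, for illustration only
example = {
    "OTU0": ["Terrestrial", "Terrestrial", "Aquatic", "Terrestrial"],
    "OTU1": ["Aquatic", "Aquatic"],
    "OTU2": [],
}
profile = habitat_profile(example)
for otu, (habitat, share) in sorted(profile.items()):
    print(otu, habitat, round(share, 2))
```

A share close to 1.0 would suggest an OTU found mostly in one habitat, though small OTUs can reach that by chance.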

Thursday, October 16, 2014

Genbank - Relate Metadata to Phylo

Use isolation source (extracted from GenBank files using extract-data-from-genbank.py) to see what habitat each sequence was obtained from.

  • Problem - most sequences lack enough information in iso_source to determine where the sequence originated
  • Use the title of the paper (listed in the GenBank file) to look up the paper
    • Find geographic location and origin of sequence in Methods Section
      • Time intensive - better way to do this?
  • Categorize the sequences into either Terrestrial, Aquatic, or Air-borne
    • Subcategorize into various fields listed in excel doc 
      • BLAST-gg-aligned-with-otus.xlsx
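
A first pass at the Terrestrial / Aquatic / Air-borne split could be automated with keyword matching on the isolation source, leaving only the unmatched records for manual lookup. This is a rough sketch: the keyword lists below are hypothetical placeholders, not the subcategories from the Excel sheet, and the function name is an assumption.

```python
import re

# Hypothetical keyword lists, for illustration only
HABITAT_KEYWORDS = {
    "Terrestrial": {"soil", "leaf", "plant", "rhizosphere", "litter"},
    "Aquatic": {"water", "marine", "sediment", "lake", "river"},
    "Air-borne": {"air", "dust", "aerosol", "cloud"},
}

def categorize(isolation_source):
    """Return the first habitat whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z]+", (isolation_source or "").lower()))
    for habitat, keywords in HABITAT_KEYWORDS.items():
        if words & keywords:
            return habitat
    return "Unknown"

for source in ["Leaf litter", "sea water", "indoor air sample", "prairie grass"]:
    print(source, "->", categorize(source))
```

Matching on whole words avoids false hits such as "air" inside "prairie". Plurals and synonyms would still need to be added to the lists by hand.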

Thursday, October 9, 2014

Genbank - Curto Results File


  • Took master file BLAST-combined.fasta and needed to pull out the curto sequences
    • Build a script to search the file, pull out curto sequences, and write them to a new file
      • Make a reference text file with curto accession numbers
        • curto-accession-numbers.txt
      • Run code to make new curto only file
        • curto-only2.py
    • Problem - new file had >4000 sequences (should be 982)
      • Multiple duplicate accession numbers - need to remove
      • Build algorithm to keep unique accession numbers
        • duplicate-removal.py
    • All curto and frigo bacteria -> curto-and-frigo-only.fasta
  • Added outlier sequence (AB695377.1 Sediminihabitans luteus) for phylo reference
    • curto-and-frigo-only-with-outlier.fasta
  • Create OTUs within curto and frigo genera 
    • Used QIIME pick_otus.py
      • Use default similarity threshold (97%; n = 41)
      • Use curto-and-frigo-only-with-outlier.fasta as input file
    • curto-and-frigo-only-with-outlier_otus.txt
    • Generated biom file - not important with such closely related taxa
      • otu-table.biom
  • Pick representative sequence for each OTU
  • Assign Taxonomy to each rep set to make sure everything has worked so far
  • Must align the rep sequences to a template - GreenGenes core database (16S gene)
    • align_seqs.py
    • Use PyNAST with min length of 75% of the median sequence length
  • Filter alignment (filter_alignment.py)
    • Remove positions which are gaps in every sequence (common for PyNAST, as typical sequences cover only 200-400 bases, and they are being aligned against the full 16S gene)
    • Removed some OTUs due to failure to align (moved to new file):
      • OTU2
      • OTU3
      • OTU4
      • OTU8
      • OTU10
      • OTU11
      • OTU26
      • OTU30
      • OTU37
      • OTU38
      • OTU39
      • OTU40
    • Removal of these OTUs reduced the overall number of sequences by 50
    • Should have 933 sequences left in 30 OTUs
  • Make phylogeny - newick file 
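
The sequence and OTU counts above can be double-checked from the OTU map. This is a minimal sketch, assuming the map written by pick_otus.py is tab-delimited with the OTU ID in the first column and the member sequence IDs after it. The function names and the toy data are illustrative, and the IDs in `failed` must be spelled the way they appear in the file.

```python
def read_otu_map(lines):
    """Parse tab-delimited lines of the form: otu_id, seq_id, seq_id, ..."""
    otu_map = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            otu_map[fields[0]] = fields[1:]
    return otu_map

def drop_otus(otu_map, failed):
    """Return a copy of the map without the OTUs listed in `failed`."""
    return {otu: seqs for otu, seqs in otu_map.items() if otu not in failed}

# Toy data standing in for the real OTU map
toy_lines = ["0\tseqA\tseqB\n", "1\tseqC\n", "2\tseqD\tseqE\tseqF\n"]
otu_map = read_otu_map(toy_lines)
kept = drop_otus(otu_map, failed={"2"})
n_otus = len(kept)
n_seqs = sum(len(seqs) for seqs in kept.values())
print(n_otus, "OTUs,", n_seqs, "sequences")
```

With the real file, passing an open handle for curto-and-frigo-only-with-outlier_otus.txt to `read_otu_map` and the twelve failed OTUs to `drop_otus` should give totals that can be compared against the counts noted above.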

For Reference: QIIME Review

Genbank - Curto Only Fasta File

1. Take master fasta file (BLAST-combined.fasta)
2. Need to extract only sequences with curto and frigo taxonomic assignments
3. Create a .txt file with Accession Numbers
     a. Sort through BLAST-combined-curto_tax_assignments.txt for curto and frigo
          i. Above file was generated from QIIME assign_taxonomy.py
     b. Add accession numbers of only curto and frigo and create curto-accession-numbers.txt
4. Run curto-only2.py to cross-reference .txt file to master .fasta file
     a. The code scans BLAST-combined.fasta and pulls out each record whose accession number is in the curto-accession-numbers.txt file
     b. PROBLEM: program adds ALL matching accession numbers (n = 4355, should be n = 982)
          i. curto-and-frigo-only-with-dups.fasta
     c. The extra sequence data is from duplicate accession numbers - need to filter out
5. Run duplicate-removal.py to filter out duplicate accession numbers, not duplicate sequences
6. FINALLY, have a fasta file with only curto and frigo sequences (n = 982 - verified)
     a. curto-and-frigo-only.fasta
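
Steps 4 and 5 can also be done in a single pass by remembering which accession numbers have already been written. The sketch below is not the code in curto-only2.py or duplicate-removal.py; it is an illustration that assumes the accession number is the first whitespace-delimited token of each FASTA header, and the toy records are made up.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_unique(records, wanted):
    """Keep only the first record for each accession number in `wanted`."""
    seen = set()
    for header, seq in records:
        accession = header.split()[0]  # assumption: accession comes first
        if accession in wanted and accession not in seen:
            seen.add(accession)
            yield header, seq

# Made-up records: one duplicate accession, one unwanted sequence
toy = [
    ">ACC0001.1 Curtobacterium sp.", "ACGT",
    ">ACC0001.1 Curtobacterium sp.", "ACGT",
    ">ACC0009.1 other genus", "GGGG",
    ">ACC0002.1 Frigoribacterium sp.", "TT", "AA",
]
kept = list(filter_unique(parse_fasta(toy), {"ACC0001.1", "ACC0002.1"}))
for header, seq in kept:
    print(">" + header)
    print(seq)
```

Because duplicates are judged by accession number rather than by sequence, two different accessions with identical sequences are both kept, which matches step 5.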

Genbank - Protocol for Metadata extraction from GenBank

Protocol for Curto Sequences

1. BLAST GreenGenes Rep Sequences and take top 5000 hits per sequence blasted
2. Query search for "microbacteriaceae curtobacterium 16S ribosomal RNA gene"
     a. Returned 1255 results
     b. Concatenate results onto GG rep sets
3. Created GenBank file with all results (n = 41246)
     a. combined-curto.gb
4. Run extract-data-from-genbank.py and export results to .csv file
     a. Took accession number, Genbank ID, title, isolation source, host, and rep sequence
     b. Tallied number of unique records (n = 11484)
5. Convert .gb to .fasta file using gb-to-fasta.py
6. Assign taxonomy with QIIME assign_taxonomy.py using PyNAST
7. Add taxonomic information to .csv file
8. Delete duplicate accession numbers and align taxonomic information with genbank info
     a. Created master sheet with (n = 11419) sequences that aligned with GreenGenes database
     b. Had 9237 isolation sources
     c. Excel - Duplicate Removal
9. Took accession numbers that aligned with curto (n = 959) and isolation sources (n = 736)
     ***NOTE*** A lot of sequences only aligned to the Family level
     Extracted information is in the master file: BLAST-gg-aligned.xlsx

10. Run sequence-cleaner.py and export list of accession numbers with get-accession.py for reference of which sequences were duplicates
     a. List of duplicate sequences: duplicate-BLAST-sequences.xlsx
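
To illustrate step 4, here is a small standard-library sketch of pulling a few fields out of GenBank flat-file text and writing them as CSV. It is not extract-data-from-genbank.py. The regular expressions assume the usual flat-file layout (records ending in `//`, an ACCESSION line, a TITLE followed by JOURNAL, and `/name="value"` qualifiers), only four of the fields listed above are covered, and the toy records are made up.

```python
import csv
import io
import re

def qualifier(record, name):
    """Return the value of a /name="..." feature qualifier, or ''."""
    match = re.search(rf'/{name}="([^"]*)"', record)
    # Values can wrap over several lines; collapse the whitespace.
    return " ".join(match.group(1).split()) if match else ""

def extract_metadata(genbank_text):
    """Return one dict per record with accession, title, source and host."""
    rows = []
    for record in genbank_text.split("\n//"):
        accession = re.search(r"^ACCESSION\s+(\S+)", record, re.MULTILINE)
        if not accession:
            continue  # trailing text after the last record
        title = re.search(r"^[ \t]+TITLE\s+(.*?)(?=^[ \t]+JOURNAL)",
                          record, re.MULTILINE | re.DOTALL)
        rows.append({
            "accession": accession.group(1),
            "title": " ".join(title.group(1).split()) if title else "",
            "isolation_source": qualifier(record, "isolation_source"),
            "host": qualifier(record, "host"),
        })
    return rows

# Made-up records, for illustration only
TOY = '''LOCUS       TOY000001   20 bp    DNA     linear   BCT 01-JAN-2000
ACCESSION   TOY000001
REFERENCE   1  (bases 1 to 20)
  TITLE     A made-up title that wraps
            over two lines
  JOURNAL   Unpublished
FEATURES             Location/Qualifiers
     source          1..20
                     /isolation_source="leaf litter"
                     /host="grass"
//
LOCUS       TOY000002   20 bp    DNA     linear   BCT 01-JAN-2000
ACCESSION   TOY000002
FEATURES             Location/Qualifiers
     source          1..20
                     /organism="Curtobacterium sp."
//
'''

rows = extract_metadata(TOY)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["accession", "title", "isolation_source", "host"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Records without an isolation source come through with an empty field, which makes it easy to count how many sequences still need a manual lookup.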

Genbank - GreenGenes Curto Rep Sequences

GreenGenes Rep Sequences

583016 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>583016
GAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGATGCCCAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGACGGCCGATCGCATGGTCTGGTCGTGGAAAGATTTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGACGACGGCCTTCGGGTTGTAAACCTCTTTTAGTAGGGAAGAAGGGAGCTTGCTCTTGACGGTACCTGCAGAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTATGCGGTTTGACGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATGTACTGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTATACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCGGTTCGGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCCTAACCCTTGTGGAAGGAGCCGTCGAAGGTGGGATCGGTAATTAGGACTAAGT


173906 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>173906
AGTCGAACGATGATGCCCAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTCACACGTGAGTGCACCTGCCCCTGTACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGATCACTGGCCGCATGGTCTGGTGGTGGAAAGATTTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTTAGTAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACCGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTGTACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCGGGTTATGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGGCTCAGTTCGGATTGAGGTATGCCACTCGACCTCATGAAGTCGGAGTCGCTAGTAAGAGCAGATCAGCAACGCTGCGGTGCAGACGTTACCCGGGCCTTGGAACAACACCGCCCGTACAAGTTCATGAAAGTCGTCACAACCCGAAGCCGGTGGCCTAACCCTTGTGGAAG


4327233 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>4327233
GACGAACGCTGGCGGCGTGCTTAACCGTTGCGAGTCGAACGATGAAGCCCAGCTTGCTGGGTGGTTAGTGGCGAACGGGTGTGTACACTTAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGACTACGGGTCGCATGGCCTGGTGGTGGAAAGATTTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTTAGTAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGATAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACCGGTAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTGTACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCGCGTTATGGCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCACCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCCTAACCCTTGTGAAGGAGCCGTCGAAGGTGGGATCGGTGATTAGGACTAAGTCGTAACAAG


106397 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>106397
AACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGATGCCCAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGACGGCCGATCGCATGGTCTGGTCGTGGAAAGATTTTTTGGTTGGGGATGGACTCCCGGCCTATCACCTTGTTGGTGAGGTAATGGCTCACCAAGGCAACAACGGGTACCCGGCCTAAAAGGGTGACCGGCCACACTGGGACTGAAACACGGCCCAAACTCCTACGGGAGGCACCATTGGGGAATATTGCACAATGGGCAAAACCCTGATGCACCAACCCCCCTTGAGGGACAACGGCCTTCGGGTTTTAAACCTCTTTTATTAGGGAAAAAGGGACCTTGCNCTTGACGGTACCTGCAAAAAAACCACCGGCTAACTACTTGCCACCACCCGCGGTAATACTTAGGGTGCAACCTTTTTCCGGAATTATTGGGCTTAAAAACCTCTTAGGCGGTTTGTCCCTTCTGCTGTGAAATCCCAAGGCTCAACCTCGGGCTTGCATTGGGTACGGGCAAACTAAATTGCGGTAGGGGAGATTGGAATTCCTGGTGTACCGGTGGAATGCGCAAATATCAGGAGGAACACCGATGGCGAAGGCARATCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATATACCGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTATACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTTTATGTTGCCAGCGGTTCGGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTTTCAGTTCGGATTGAGGTCTGCAACTTGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCCTAACCCTTGTGGAAGGAGCCGTCGAAGGTGGGATCGGTAATT


2532575 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>2532575
GCGGCGTGCTTAACACATGCAAGTCGAACGATGATCAGGAGCTTGCTCCTGTGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACNGGATATGACGGCCGATCGCATGGTCTGGTCGTGGAAAGATTTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTTAGTAGGGAAGAAGCGAAAGTGACGGTACNTGCAGAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTNGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATATACCGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTATACAGGTGGTGCATGGTNGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCNTTCTATGTTGCCAGCGGTTCGGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCCTAACCCTTGTGGAAGGAGCCGTCGAAGGTGGGATCGGTGATTAGGACTAAGTCGTAACAAGGTAGCCGTACCGGAAGGTGCGGCT


849178 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>849178
GAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGATCACGAGCTTGCTCCTGTGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACCACGTCTAATACTGGATATGATCGCTGGCCGCATGGTCTGGTGGTGAAAAGATTTTTTGGTTGGGAATGGACTCCCGGCCTATCACCTTGTTGGTGAGGTAATGGCTCACCAAGGCAACAACGGGTAGCCGGCCTGAAAGGGTGACCGGCCACACTGGAACTGAAACACGGCCCAAACTCCTACGGGAGGCAGCATTGGGAAATATTGCACAATGGGCGAAAGCCTGATGCACCACCCCGCCGTGAGGAATGACGGCCTTCGGGTTGTAAACCTCTTTTATTAGGGAAAAACCAAAAGTGACGGTCCCTGCAAAAAAAGCACCGGCTAACTACTTGCCACCAGCCGCGGTAATACTTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAAAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCAAGGCTCACCCTCGGGCTTGCATTGGGTACGGCCAAACTAAATTGCGGTAGGGAAGATTGAAATTCCTGGTGTACCGTGTGAAATGCGCAATATATCAGGAGGAACACCGATGGCAAAGGCAGATCTCTGGGCCTTAACTGACCCTAAGAAGCGAAACCTTGGGGGAGCGAACAGGATTAAAATACCCTGGTAGTCCACGCCTAAAAACGTTGGCCGCTAGATGTAGGGACCTTTCCACGTTTCTGTGTGGTAGCTAACCCATTAAGCGCCCCGCGTAGGGAGTACGGCCGCAAGGCTATAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGGGAAGAACCTTACCAAGGCTTGACATCCACCGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTGTACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAATCCTCGTTCTATGTTGCCAGCGCGTTATGGCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATCCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTTAGGTATGCAACTCGACCTCATTAAGTCGGAGTCGTTAGTAATCGCAGATCAGCAATCGGTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAAGTCAAGAAAGTCGGTAACACCCGAAGACCGGTGGCCTAACCCCTTGTGGAAGGAGCCGTCGAAGGTGGGATCCGGTGATTAGGACTAAGTCGTAACAAGGTAGCCGTA


4432662 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>4432662
AACGATGATGCCNAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGATCACTGGCCGCATGGTCTGGTGGTGGAAAGATTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGAAGGCCTGATGCAGCAACGCCGCCTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTAGTAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGAGCTCCTAGCCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGGGCACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAATGCATGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGCCGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACCGGAAACGGCCAGAGATGGTCGCCCCCGGGTGGTCGGTGTACTGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGTTCTATGTTGCCAGCGGGGTTATGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCGTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTNATGAAGTCGGAGNNNCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGNCCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTNNCCTAACCCTGCGGAAGNAGCCGTCGAAGGTG


404720 k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Curtobacterium; s__

>404720
CATGCAAGTCGAACGATGATGCCCAGCTTGCTGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCCTGACTCTGGGATAAGCGTTGGAAACGACGTCTAATACTGGATATGATCACTGGCCGCATGGTCTGGGGGTGGAAAGATTTTTTGGTTGGGGATGGACTCGCGGCCTATCAGCTTGTTGGTGAGGTAATGGCTCACCAAGGCGACAACGGGTAGCCGGCCTGAAAGGGTGACCGGCCACACTGGGACTGAAACACGGCCCAAACTCCTACGGGAGGCAGCAGGGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTTAGTAGGGAAAAAGCGAAAGTGACGGTACCTGCAAAAAAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCAAGCGTTGTCCGGGGAATTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTGCTGTGAAATCCCGAGGCTCAACCTCGGGCTTGCAGTGGGTACGGGCAGACTAGAGTGCGGTAGGGGAGATTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGATCTCTGGGCCGTAACTGACGCTGAGGAGCGAAAGCATGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCGCTAGATGTAGGGACCTTTCCACGGTTTCTGTGTCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGTCACAAGCGGCGGAGCATAGCGGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACATACACCGGAAACGGCCAGAGATGGTCGCCCCCTTGTGGTCGGTGTACAGGTGGTGCATGGTTGTCGTCCAGCTCGTGTCGTGAGATTGTTGGGTTAAGTCCCGCAACGAGCCGCAACCCTCGTTCTATGTTGCCAGCGGGTTATGCCGGGGACTCATAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCCGGTACAAAGGGCTGCGATACCGTAAGGTGGAGCGAATCCCAAAAAGCCGGTCTCAGTTCGGATTGAGGTCTGCAACTCGACCTCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCAAGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCC