Thursday, October 16, 2014

GenBank - Relate Metadata to Phylo

Use the isolation source (extracted from GenBank files with extract-data-from-genbank.py) to determine which habitat each sequence came from.
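For reference, a minimal sketch of how the isolation source could be pulled out of a GenBank flat file. This is not the contents of extract-data-from-genbank.py, just an illustration that assumes the value is stored in the standard `/isolation_source` qualifier and that records end with a `//` line. The function name `isolation_sources` is made up for this example.

```python
import re

# The /isolation_source qualifier value is quoted and may wrap across lines.
ISO_RE = re.compile(r'/isolation_source="([^"]*)"')
ACC_RE = re.compile(r'^ACCESSION\s+(\S+)', re.MULTILINE)


def isolation_sources(genbank_text):
    """Map accession -> isolation source (None when the qualifier is absent)."""
    result = {}
    # Each record in a GenBank flat file ends with a line containing only "//".
    for record in re.split(r'^//\s*$', genbank_text, flags=re.MULTILINE):
        acc = ACC_RE.search(record)
        if acc is None:
            continue  # trailing empty chunk or a malformed record
        iso = ISO_RE.search(record)
        # Collapse the line wrapping inside the quoted value into single spaces.
        result[acc.group(1)] = " ".join(iso.group(1).split()) if iso else None
    return result
```

Records that come back as None are the ones with no isolation source at all, so they need the paper lookup regardless.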

  • Problem - most sequences do not have enough information in iso_source to determine where the sequence originated
  • Use the title of the paper (listed in the GenBank file) to look up the paper
    • Find the geographic location and origin of the sequence in the Methods section
      • Time-intensive - is there a better way to do this?
  • Categorize the sequences as Terrestrial, Aquatic, or Air-borne
    • Subcategorize into the fields listed in the Excel doc
      • BLAST-gg-aligned-with-otus.xlsx
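
One possible way to cut down the manual work is a first-pass keyword match on iso_source, so that only the ambiguous or empty entries need the paper lookup. The sketch below is only an illustration: the keyword lists are placeholders I made up (the real subcategories are in the spreadsheet), and `categorize` is a hypothetical helper name.

```python
import re

# Placeholder keyword lists (assumptions); extend them from the spreadsheet.
HABITAT_KEYWORDS = {
    "Aquatic": {"water", "seawater", "marine", "lake", "river", "ocean"},
    "Terrestrial": {"soil", "rhizosphere", "forest", "desert"},
    "Air-borne": {"air", "aerosol", "atmosphere"},
}


def categorize(iso_source):
    """Return a habitat name, or "Unknown" when the text is empty or ambiguous."""
    if not iso_source:
        return "Unknown"
    # Whole-word matching, so "air" does not match inside "hair" or "prairie".
    words = set(re.findall(r"[a-z]+", iso_source.lower()))
    hits = [habitat for habitat, keywords in HABITAT_KEYWORDS.items()
            if words & keywords]
    # Exactly one matching habitat is accepted; anything else needs a manual check.
    return hits[0] if len(hits) == 1 else "Unknown"
```

Anything returned as "Unknown" would still go through the look-up-the-paper route, but entries like "forest soil" or "marine sediment" could be sorted automatically.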
