Friday, November 14, 2014

EMP Biom Files Pt. II

***ALL DONE ON THE MAC IN THE LAB***

Need to make biom file into classic format to pull out Curto files
Should look something like:
Sample
Taxonomy
OTU 1 
OTU 2
OTU n

biom convert
-i full_emp_table_w_tax_closedref.biom
-o full_emp_closedref_taxonomy.txt
--biom-to-classic-table
--header-key taxonomy 

Generated .txt file but too large to export into Excel - ERROR - not enough memory

Need to breakdown master .biom file:

split_otu_table_by_taxonomy.py
-i full_emp_table_w_tax_closedref.biom
-L 3 #level3 taxonomic split (class level)
-o ./L3/

Need to split further - maybe at L5:

split_otu_table_by_taxonomy.py
-i full_emp_table_w_tax_closedref.biom
-L 5 #level5 taxonomic split (family level)
-o ./L5/

Problem: only 2 Curto OTUs present in outputted .biom file. Asked Sean Gibbons what he thought:
"Yep, any of the OTUs with 'New' in the name did not hit the reference database, so you won't find them in the closed ref table. Number-only labels are Greengenes IDs, and those should all be in the closed ref table."

So - try again with open reference database? crashed last time when I ran previous code on the open reference biom file (see Pt I)
RESULT: SystemError: Negative size passed to PyString_FromStringAndSize

^Probably due to too large of input file
full_emp_table_w_tax.biom is 2.63 GB - probably crashes as a security measure

No comments:

Post a Comment