Tuesday, December 2, 2014

EMP Biom Files Pt. V

Still working on the open source files. Not really sure yet how to access array data from HDF5 files.
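One possible route, sketched here under assumptions rather than tested on the EMP table: BIOM 2.x files are HDF5, and the format stores the OTU-by-sample matrix in compressed sparse row form under observation/matrix (data, indices, indptr), with the row and column labels in observation/ids and sample/ids. The helper names csr_row and otu_counts are made up for this note, and h5py is a third-party package that would need to be installed.

```python
def csr_row(data, indices, indptr, i):
    """Return {column_index: value} for row i of a CSR-encoded matrix."""
    start, end = indptr[i], indptr[i + 1]
    return {int(indices[k]): float(data[k]) for k in range(start, end)}


def _as_text(x):
    # h5py may hand back bytes for string datasets; normalise to str
    return x.decode() if isinstance(x, bytes) else x


def otu_counts(biom_path, otu_id):
    """Return {sample_id: count} for one OTU in a BIOM 2.x (HDF5) table.

    Assumes the dataset layout described above; otu_id must be present
    in observation/ids, otherwise list.index raises ValueError.
    """
    import h5py  # third-party; imported lazily so csr_row works without it

    with h5py.File(biom_path, "r") as f:
        obs_ids = [_as_text(x) for x in f["observation/ids"][:]]
        sample_ids = [_as_text(x) for x in f["sample/ids"][:]]
        m = f["observation/matrix"]
        row = csr_row(m["data"][:], m["indices"][:], m["indptr"][:],
                      obs_ids.index(otu_id))
    return {sample_ids[j]: v for j, v in row.items()}


# Toy CSR encoding of [[0, 5, 0], [2, 0, 3]] to show the row lookup
toy_data, toy_indices, toy_indptr = [5, 2, 3], [1, 0, 2], [0, 1, 3]
second_row = csr_row(toy_data, toy_indices, toy_indptr, 1)
```

Only the nonzero entries come back, so the keys of the result for an OTU would be the samples where that OTU was actually observed.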

However, I wrote a short script to organize the 2 OTUs from the closed_ref_emp_table so they can be combined with the merged mapping file:

import os
import csv

mydir = os.path.expanduser("~/Desktop/alexs-stuff/EMP/")

in_file = mydir + "EMP_10k_merged_mapping_final.txt"  # master mapping file

# need txt file with sample ids that had curto hits (one id per line)
wanted_file = mydir + "EMPclosed/sample-ids-curto.txt"

out_file = mydir + "EMPclosed/curto-samples.csv"

# collect the wanted sample ids, skipping blank lines
wanted = set()

with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

count = 0

# Python 2: the csv module expects files opened in binary mode
with open(in_file, "rb") as tsvin, open(out_file, "wb") as csvout:
    tsvin = csv.reader(tsvin, delimiter='\t')
    csvout = csv.writer(csvout)

    # keep only mapping-file rows whose first column (sample id) is wanted
    for row in tsvin:
        if row and row[0] in wanted:
            count += 1
            csvout.writerow(row)

print "Converted %i records" % count

Output looks like this in Excel after some editing:
Total samples: 136 with 111 in the merged mapping file
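Presumably the other 25 sample ids have no row in the merged mapping file. A minimal sketch for listing which ones are missing, written for Python 3 and run on made-up toy data; the function name split_by_mapping and the sample ids below are hypothetical, and the real inputs would be the open sample-ids-curto.txt and mapping files:

```python
import csv


def split_by_mapping(wanted_ids, mapping_lines):
    """Split wanted sample ids into (found, missing) lists.

    wanted_ids: iterable of lines, one sample id per line.
    mapping_lines: iterable of tab-delimited lines, sample id in column 0.
    """
    wanted = {s.strip() for s in wanted_ids if s.strip()}
    reader = csv.reader(mapping_lines, delimiter="\t")
    in_mapping = {row[0] for row in reader if row}
    return sorted(wanted & in_mapping), sorted(wanted - in_mapping)


# Toy stand-ins for the two input files (not real EMP sample ids)
wanted_ids = ["s1\n", "s2\n", "\n", "s9\n"]
mapping_lines = ["#SampleID\tenv\n", "s1\tsoil\n", "s2\twater\n", "s3\tsoil\n"]

found, missing = split_by_mapping(wanted_ids, mapping_lines)
print("%i found, %i missing: %s" % (len(found), len(missing), missing))
# prints: 2 found, 1 missing: ['s9']
```

With the real files, open file handles can be passed in directly, since both arguments only need to be iterables of lines.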
