Still working on the open-reference files. I'm not really sure yet how to access array data from HDF5 files.
However, I wrote a short script to match the samples with hits for the 2 OTUs from the closed_ref_emp_table against the merged mapping file:
# Python 2 script
import csv
import os

mydir = os.path.expanduser("~/Desktop/alexs-stuff/EMP/")
in_file = mydir + "EMP_10k_merged_mapping_final.txt"  # master mapping file
# text file with the sample IDs that had curto hits, one per line
wanted_file = mydir + "EMPclosed/sample-ids-curto.txt"
out_file = mydir + "EMPclosed/curto-samples.csv"

# collect the wanted sample IDs, skipping blank lines
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

# copy every mapping-file row whose first column is a wanted sample ID
count = 0
with open(in_file, "rb") as tsvin, open(out_file, "wb") as csvout:
    reader = csv.reader(tsvin, delimiter='\t')
    writer = csv.writer(csvout)
    for row in reader:
        # guard against blank lines, which csv.reader returns as []
        if row and row[0] in wanted:
            count += 1
            writer.writerow(row)

print "Converted %i records" % count
Output looks like this in Excel after some editing:
Total samples: 136, with 111 found in the merged mapping file
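Since 136 - 111 leaves 25 sample IDs with no row in the mapping file, it could be useful to have the script report which ones they are. Below is a rough Python 3 sketch of the same filter wrapped in a function. The function name filter_mapping and the returned list of missing IDs are my own additions for illustration, not part of the script above, and it assumes the sample ID is the first column of the mapping file.

```python
import csv


def filter_mapping(mapping_path, wanted_path, out_path):
    """Copy mapping-file rows whose first column is a wanted sample ID.

    Hypothetical helper: returns (rows_written, missing_ids), where
    missing_ids are wanted IDs that never appeared in the mapping file.
    """
    # Read the wanted sample IDs, one per line, skipping blank lines.
    with open(wanted_path) as f:
        wanted = {line.strip() for line in f if line.strip()}

    rows_written = 0
    found = set()
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(mapping_path, newline="") as tsv_in, \
            open(out_path, "w", newline="") as csv_out:
        reader = csv.reader(tsv_in, delimiter="\t")
        writer = csv.writer(csv_out)
        for row in reader:
            # Skip blank lines; assume the sample ID is in the first column.
            if row and row[0] in wanted:
                writer.writerow(row)
                rows_written += 1
                found.add(row[0])

    return rows_written, sorted(wanted - found)
```

Calling it with the three paths from the script above (in_file, wanted_file, out_file) should give the same CSV, plus a sorted list of the IDs that were not matched. Like the original, it does not copy the header row of the mapping file.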