Friday, November 21, 2014

EMP Biom Files Pt. IV

Got in touch with Daniel MacDonald from the Knight Lab:

Sent him the full_emp biom file. He said the file is fine, but it takes about 30GB to parse (really prohibitive). Converted the open reference biom file into hdf5 format:
ftp://thebeast.colorado.edu/pub/full_emp_table_w_tax.hdf5

Wrote the following code. It only outputs one column (the OTU IDs), but it did confirm that the curto OTUs are present in the file:
import os
import h5py

mydir = os.path.expanduser("~/Desktop/alexs-stuff/")
in_file = mydir + "EMP/EMPopen/full_emp_table_hdf5.h5"
wanted_file = mydir + "EMP/greengenes-curto-only.txt"
out_file = mydir + "EMP/emp-curto-only.txt"

# Load the wanted OTU IDs, one per line, skipping blank lines.
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

# Write out every observation (OTU) ID in the table that is in the wanted set.
hdf5_file = h5py.File(in_file, "r")
count = 0
with open(out_file, "w") as h:
    for key in hdf5_file["observation"]["ids"]:
        # h5py may return IDs as bytes (e.g. h5py 3 on Python 3); decode so they match the set.
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        if key in wanted:
            count = count + 1
            h.write(key + "\n")
print("Converted %i records" % count)
hdf5_file.close()
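
A possible next step, since the script above only writes the OTU IDs: pull the per-sample counts for the matching OTUs as well. This is a rough sketch, not something run against the real table. It assumes the file follows the BIOM 2.1 HDF5 layout, where the observation-major matrix is stored in compressed sparse row form as "observation/matrix/data", "observation/matrix/indices" and "observation/matrix/indptr", and the sample IDs live in "sample/ids". The helper names csr_row and counts_for_wanted are made up for this example, and the tiny table at the bottom is toy data rather than anything from EMP.

```python
def csr_row(data, indices, indptr, row):
    """Return {column_index: value} for one row of a CSR matrix."""
    start, end = int(indptr[row]), int(indptr[row + 1])
    return {int(c): float(v)
            for c, v in zip(indices[start:end], data[start:end])}


def _as_text(value):
    """Decode bytes IDs (as h5py may return them) to text."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def counts_for_wanted(obs_ids, sample_ids, data, indices, indptr, wanted):
    """Yield (otu_id, sample_id, count) for each nonzero cell of a wanted OTU."""
    for row, otu_id in enumerate(obs_ids):
        otu_id = _as_text(otu_id)
        if otu_id not in wanted:
            continue
        for col, value in sorted(csr_row(data, indices, indptr, row).items()):
            yield otu_id, _as_text(sample_ids[col]), value


# Toy 3 x 2 table (hypothetical IDs):
#         sA  sB
# otu1     5   0
# otu2     0   0
# otu3     2   7
toy_obs_ids = ["otu1", "otu2", "otu3"]
toy_sample_ids = ["sA", "sB"]
toy_data = [5.0, 2.0, 7.0]
toy_indices = [0, 0, 1]
toy_indptr = [0, 1, 1, 3]

toy_rows = list(counts_for_wanted(toy_obs_ids, toy_sample_ids, toy_data,
                                  toy_indices, toy_indptr, {"otu3"}))
for otu_id, sample_id, value in toy_rows:
    print("%s\t%s\t%g" % (otu_id, sample_id, value))
```

With the real file, the same function should accept the h5py datasets directly (for example hdf5_file["observation/matrix/data"]), because only the slice for each wanted row is read. Loading indptr into memory first would probably help, since it is consulted once per matching row.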
