Thursday, December 11, 2014

EMP Biom Files Pt. VII

Went back to the HDF5 file and worked on the problem in MATLAB (Rich helped a lot!)

Solved the array issue:

%% Load the data
a=h5info('name of hdf file'); % returns a struct describing the HDF5 file
% hierarchy, for reference
observdata=h5read('name of hdf file','/observation/matrix/data');
observIndices=h5read('name of hdf file','/observation/matrix/indices');
observIndptr=h5read('name of hdf file','/observation/matrix/indptr');
ids=h5read('name of hdf file','/observation/ids');

%% Get the OTU indices
% You need a cell array of strings of your desired OTUs stored as variable
% qList

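qInd=zeros(length(qList),1);
for i=1:length(qList)
    % row of the i-th requested OTU in ids (assumes each ID occurs exactly once)
    qInd(i)=find(strcmp(qList{i},ids));
end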
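
If some of the requested OTUs might not be in the file, a lookup with ismember is one possible alternative to the loop. This is only a sketch: the IDs below are made-up stand-ins, not real EMP identifiers, and dropping the missing entries is just one way to handle them.

```matlab
% Hypothetical stand-ins for the ids read from the file and the query list
ids   = {'OTU_a'; 'OTU_b'; 'OTU_c'; 'OTU_d'};
qList = {'OTU_c'; 'OTU_a'; 'OTU_x'};   % 'OTU_x' is deliberately missing

% found(k) is true when qList{k} occurs in ids; qInd(k) is its row, or 0
[found, qInd] = ismember(qList, ids);

% Report anything that could not be matched
if any(~found)
    fprintf('Not found: %s\n', qList{~found});
end

qList = qList(found);   % keep only the OTUs that exist in the file
qInd  = qInd(found);    % 1-based rows into ids
```

Filtering qList and qInd together keeps them the same length, so row i of the output matrix still corresponds to qList{i}.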

%% find the data

sampleIds=h5read('name of hdf file','/sample/ids'); % one ID per sample (one per matrix column)
outmat=zeros(length(qList),length(sampleIds));

for i=1:length(qInd)
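    % indptr holds 0-based offsets, so in 1-based MATLAB the entries for this
    % row run from observIndptr(r)+1 through observIndptr(r+1)
    span=(observIndptr(qInd(i))+1):observIndptr(qInd(i)+1);
    p=observdata(span);     % nonzero counts for this OTU
    pI=observIndices(span); % 0-based sample (column) index of each count
    for j=1:length(pI)
        outmat(i,pI(j)+1)=p(j); % +1 converts 0-based (Python) column indices to 1-based MATLAB
    end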
    end
end
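
To check the offset arithmetic, here is a small toy example, assuming the data/indices/indptr triple follows the usual compressed sparse row (CSR) layout with 0-based values. The 3x4 matrix and the variable names are invented for illustration and are not taken from the EMP file.

```matlab
% Toy CSR matrix (3 observations x 4 samples):
%   [10  0  0 20]
%   [ 0  0  0  0]
%   [ 0 30 40  0]
data    = [10 20 30 40];   % nonzero values, stored row by row
indices = [0 3 1 2];       % 0-based column of each value
indptr  = [0 2 2 4];       % 0-based row boundaries, length = nRows+1

nRows = numel(indptr) - 1;
nCols = 4;                 % assumed known (number of samples)
dense = zeros(nRows, nCols);

for r = 1:nRows
    % Row r owns 0-based positions indptr(r) .. indptr(r+1)-1, i.e.
    % 1-based positions indptr(r)+1 .. indptr(r+1) in MATLAB
    k = (indptr(r)+1):indptr(r+1);
    if isempty(k)
        continue;          % row with no nonzero entries
    end
    dense(r, indices(k)+1) = data(k);   % +1 converts 0-based columns
end

disp(dense)
```

The second row has equal start and end offsets, so its range is empty and it stays all zeros.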

Merged files and got the following with all metadata!!!!!


2 comments:

  1. Hi Alex,
    This looks promising! Can you try to change the settings on the blog so it emails me when you post something? Thanks,
    Jen

  2. http://stackoverflow.com/a/27728085/901925 creates a scipy/sparse matrix from a biom-format sample file (similar in structure to your massive hdf5 one).
