Solved the array issue:
%% Load the data
a=h5info('name of hdf file'); % returns a struct describing the HDF5
% hierarchy, for reference
observdata=h5read('name of hdf file','/observation/matrix/data');
observIndices=h5read('name of hdf file','/observation/matrix/indices');
observIndptr=h5read('name of hdf file','/observation/matrix/indptr');
ids=h5read('name of hdf file','/observation/ids');
%% Get the OTU indices
% You need a cell array of strings of your desired OTUs stored as variable
% qList
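% Assumes ids is a cell array of strings and each entry of qList occurs once in it
qInd=zeros(length(qList),1);
for i=1:length(qList)
qInd(i)=find(strcmp(qList{i},ids),1); % row of the i-th requested OTU in ids
end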
%% find the data
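% sampleIndptr was not loaded above; assuming it is the sample-side pointer
% array stored at '/sample/matrix/indptr'
sampleIndptr=h5read('name of hdf file','/sample/matrix/indptr');
outmat=zeros(length(qList),length(sampleIndptr)-1); % indptr has one more entry than there are samples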
for i=1:length(qInd)
% indptr holds 0-based offsets, so row qInd(i) starts one past observIndptr(qInd(i))
span=double(observIndptr(qInd(i)))+1:double(observIndptr(qInd(i)+1));
p=observdata(span);
pI=observIndices(span);
for j=1:length(pI)
outmat(i,pI(j)+1)=p(j); % +1 converts the 0-based column index to MATLAB's 1-based indexing
end
end
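As a sanity check on how the data, indices and indptr arrays fit together, here is a minimal sketch on a made-up 3x4 matrix stored in the same compressed sparse row layout with 0-based offsets. The values and the variable names (rows, nCols, out) are invented for illustration and are not taken from the HDF5 file.

```matlab
% Toy matrix (3 observations x 4 samples), made up for illustration:
%   [5 0 0 2
%    0 0 0 0
%    0 7 1 0]
data    = [5 2 7 1];   % nonzero values, stored row by row
indices = [0 3 1 2];   % 0-based column of each value
indptr  = [0 2 2 4];   % row r (1-based) owns data(indptr(r)+1 : indptr(r+1))

rows  = [3 1];         % 1-based rows to extract (plays the role of qInd)
nCols = 4;             % number of samples
out   = zeros(numel(rows), nCols);
for i = 1:numel(rows)
    r = rows(i);
    span = indptr(r)+1 : indptr(r+1);      % empty for an all-zero row
    out(i, indices(span)+1) = data(span);  % +1: 0-based column to 1-based
end
disp(out)
```

The first row of out is observation 3 and the second is observation 1, in the order they were requested. An all-zero observation such as row 2 yields an empty span, so its output row simply stays zero.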
Merged files and got the following with all metadata!!!!!
Hi Alex,
This looks promising! Can you try to change the settings on the blog so it emails me when you post something? Thanks,
Jen
http://stackoverflow.com/a/27728085/901925 creates a scipy/sparse matrix from a biom-format sample file (similar in structure to your massive hdf5 one).