Seurat: problem reading h5 file

Created on 23 Aug 2018  ·  7 Comments  ·  Source: satijalab/seurat

Hi there,

I'm trying to read an h5 file from a published data set (available on GEO under accession GSM2561498) using the Read10X_h5 function, but I keep getting the following error.

> ley.ctrl.data <- Read10X_h5('GSM2561498.h5')
Error in x$exists(name) : HDF5-API Errors:
    error #000: H5L.c in H5Lexists(): line 879: unable to get link info
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #001: H5L.c in H5L__exists(): line 2962: path doesn't exist
        class: HDF5
        major: Symbol table
        minor: Object already exists

    error #002: H5Gtraverse.c in H5G_traverse(): line 867: internal path traversal failed
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #003: H5Gtraverse.c in H5G_traverse_real(): line 594: can't look up component
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #004: H5Gobj.c in H5G__obj_lookup(): line 1156: can't locate object
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #005: H5Gstab.c in H5G__stab_lookup(): line 886: can't read message
        class: HDF5
        major: Symbol table
        minor: Unrecognized message

Any insight into what might be going on would be hugely helpful!

Thanks so much,
Emily

Analysis Question

All 7 comments

Hi Emily,

It looks like that file isn't consistent with 10X's documentation on how the H5 output file should be structured, so the Read10X_h5 function isn't going to work here. However, you can still read in the file with:

library(hdf5r)
# Open read-only so the downloaded file cannot be modified by accident
infile <- H5File$new("GSM2561498.h5", mode = "r")

Alternatively, you could try cellrangerRkit from 10X as they recommend on that documentation page.
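
Before converting anything, it can help to list what the file actually contains, since Read10X_h5 fails when the paths it expects are missing. The following is a minimal sketch, assuming the hdf5r package is installed. `list_h5_contents` is a hypothetical helper name used only for illustration, not part of Seurat or hdf5r.

```r
library(hdf5r)

# Hypothetical helper: list every group and dataset stored in an HDF5 file.
list_h5_contents <- function(path) {
  h5 <- H5File$new(path, mode = "r")   # open read-only
  on.exit(h5$close_all(), add = TRUE)  # always release the file handle
  h5$ls(recursive = TRUE)              # data.frame with one row per object
}

# Usage with the file from this thread (not run here):
# contents <- list_h5_contents("GSM2561498.h5")
# print(contents$name)
```

The `name` column of the result gives the paths that can then be read with `infile[["<path>"]][]`.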

Hi Andrew,

Thank you so much for the reply. Clearly, I am not very familiar with the h5 format. I was able to read in the file with the command you suggested above, but it is unclear to me how to go from there to a Seurat object, or whether that is even possible. I would very much like to keep using Seurat, since I want to use RunMultiCCA as well as additional packages that require a Seurat object, but passing infile as the raw.data for CreateSeuratObject yields the following error:

> library(hdf5r)
> infile <- H5File$new("GSM2561498.h5")
> library(Seurat)
> ley.ctrl <- CreateSeuratObject(raw.data = infile, project = "Ley.Ctrl", min.cells = 3, min.genes = 200)
Error in object.raw.data > is.expr : 
  comparison (6) is possible only for atomic and list types

Thank you so much for your help!
Emily

Hi Emily,

You need to convert the data in the H5 file into a matrix before passing that to CreateSeuratObject.

You can read a little more about how to use hdf5 files in R here. For specific details on that particular dataset, I would recommend emailing the contact on the GEO page as that gets a bit beyond the scope of Seurat.
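
As a rough sketch of what "converting to a matrix" can look like: 10x-style files store the counts as three arrays in compressed sparse column form, which Matrix::sparseMatrix can assemble directly. The toy arrays and gene/cell names below are made up for illustration and stand in for datasets read from an H5 file. The layout of GSM2561498.h5 itself may well differ.

```r
library(Matrix)

# Toy CSC arrays standing in for datasets read from an H5 file (made-up values).
data    <- c(5, 1, 2, 7)   # the non-zero counts
indices <- c(0, 2, 1, 2)   # zero-based row (gene) index of each count
indptr  <- c(0, 2, 3, 4)   # offset in `data` where each column (cell) starts
shape   <- c(3, 3)         # genes x cells

counts <- sparseMatrix(
  i = indices, p = indptr, x = data,
  dims = shape,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("Cell1", "Cell2", "Cell3")),
  index1 = FALSE           # row indices are zero-based
)

print(counts)
```

A sparse matrix like `counts`, with genes as rows and cells as columns, is the kind of object that can be passed as raw.data to CreateSeuratObject.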

Thanks so much for all your help, Andrew. I will follow up with the authors if I can't get it to work on my own.

Thanks again,
Emily

Hi, andrewwbutler.
I got an error when using Read10X_h5 to read the h5 file output by cellranger-3.0.0 count. The error said that data["matrix/gene_names"] does not exist. I found that the gene names in the Cell Ranger h5 file are stored in data["matrix/features/name"]. I didn't test data["matrix/genes"], but I don't think it will work either.
Below is the Cell Ranger h5 data structure. According to it, neither "genes" nor "gene_names" is contained in the Cell Ranger h5 file. Am I right?

(root)
└── matrix [HDF5 group]
    ├── barcodes
    ├── data
    ├── indices
    ├── indptr
    ├── shape
    └── features [HDF5 group]
        ├─ _all_tag_keys
        ├─ feature_type
        ├─ genome
        ├─ id
        ├─ name
        ├─ pattern [Feature Barcoding only]
        ├─ read [Feature Barcoding only]
        └─ sequence [Feature Barcoding only]
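
One way to cope with more than one layout is to check which dataset exists before reading the gene names. The following is a minimal sketch in base R. `pick_gene_name_path` is a hypothetical helper, and the candidate paths are only the two mentioned in this comment; the older path is taken from the reported error message and has not been checked against older Cell Ranger output.

```r
# Hypothetical helper: given the object paths found in an H5 file, return the
# first candidate path that holds gene names, or NA if none is present.
pick_gene_name_path <- function(paths,
                                candidates = c("matrix/features/name",
                                               "matrix/gene_names")) {
  found <- candidates[candidates %in% paths]
  if (length(found) == 0) NA_character_ else found[1]
}

# Paths taken from the Cell Ranger 3 structure listed above
v3_paths <- c("matrix/barcodes", "matrix/data", "matrix/indices",
              "matrix/indptr", "matrix/shape", "matrix/features/name")

print(pick_gene_name_path(v3_paths))
```

In practice the `paths` argument could come from listing the contents of the opened file, for example with hdf5r's `ls(recursive = TRUE)`.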

Hi Emily, I got the same error when I tried to read molecule_info.h5 files instead of gene-barcode matrices. You can re-generate gene-barcode matrices with the cellranger aggr command.
JB

I just faced the same issue and came up with this solution. Maybe it will help someone, even though Read10X_h5 in Seurat v3 works fine for me.

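h5_data <- hdf5r::H5File$new('filtered_feature_bc_matrix.h5', mode = 'r')

# The counts are stored in compressed sparse column form under 'matrix'
feature_matrix <- Matrix::sparseMatrix(
  i = h5_data[['matrix/indices']][],      # row index of each non-zero count
  p = h5_data[['matrix/indptr']][],       # column pointers
  x = h5_data[['matrix/data']][],         # the non-zero counts
  dimnames = list(
    h5_data[['matrix/features/name']][],  # gene names as row names
    h5_data[['matrix/barcodes']][]        # cell barcodes as column names
  ),
  dims = h5_data[['matrix/shape']][],     # features x barcodes
  index1 = FALSE                          # row indices in the file are zero-based
)

# Release the file handle once the matrix is in memory
h5_data$close_all()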
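
One caveat with using matrix/features/name as row names is that gene symbols are not guaranteed to be unique, and duplicated row names tend to cause trouble downstream. A small sketch using base R's make.unique follows; the gene symbols in it are made up for illustration.

```r
# Made-up gene symbols containing one duplicate
gene_names <- c("GeneA", "GeneB", "GeneA")

# make.unique appends .1, .2, ... to repeated entries
unique_names <- make.unique(gene_names)
print(unique_names)
# [1] "GeneA"   "GeneB"   "GeneA.1"

# The same call could be applied to the matrix built above, e.g.
# rownames(feature_matrix) <- make.unique(rownames(feature_matrix))
```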