cellarr-se
cellarr-se is a read-only, out-of-core coordinator for TileDB-backed genomic datasets. It wraps the cellarr-array and cellarr-frame primitives into a lazy, SummarizedExperiment-compatible interface, so you can slice large on-disk datasets without loading them fully into memory.
Single-cell and bulk RNA-seq datasets frequently exceed available RAM. cellarr-se keeps assay matrices and metadata tables on disk as TileDB arrays and reads from them only when you request a slice, applying the same row and column selection to every component so they stay synchronized. The result is always a standard in-memory SummarizedExperiment object.
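The idea of a synchronized lazy slice can be illustrated with a toy model. This is only a sketch under stated assumptions: plain Python lists stand in for the on-disk TileDB arrays, and `LazyMatrix`, `ToyCoordinator`, and `take` are hypothetical names that are not part of cellarr-se.

```python
# Toy illustration only: plain Python lists stand in for on-disk TileDB arrays.
# None of these class or method names come from cellarr-se.

class LazyMatrix:
    """Pretends to be an on-disk matrix and counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def read(self, row_idx, col_idx):
        self.reads += 1
        return [[self._rows[r][c] for c in col_idx] for r in row_idx]


class ToyCoordinator:
    """Holds references to the stores and reads them only when sliced."""

    def __init__(self, assays, row_data, col_data):
        self.assays = assays        # name -> LazyMatrix
        self.row_data = row_data    # one dict per row (gene)
        self.col_data = col_data    # one dict per column (sample)

    def take(self, row_idx, col_idx):
        # The same index lists are applied to every component, so the
        # assays and both metadata tables stay aligned.
        return {
            "assays": {
                name: matrix.read(row_idx, col_idx)
                for name, matrix in self.assays.items()
            },
            "row_data": [self.row_data[r] for r in row_idx],
            "col_data": [self.col_data[c] for c in col_idx],
        }


counts = LazyMatrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
toy = ToyCoordinator(
    assays={"counts": counts},
    row_data=[{"gene": "g0"}, {"gene": "g1"}, {"gene": "g2"}],
    col_data=[{"sample": "s0"}, {"sample": "s1"}, {"sample": "s2"}],
)

reads_before = counts.reads        # construction alone reads nothing
result = toy.take([0, 2], [1])     # one read per assay, only for this slice
```

Constructing the coordinator touches no data; the single `take` call is the only point at which the matrix is read, and the returned pieces share the same row and column order.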
Install
pip install cellarr-se
Usage
Construction
CellArraySE wraps existing TileDB arrays and frames; it does not create them. Use cellarr-array and cellarr-frame to build the backing stores first.
from cellarr_se import CellArraySE
se = CellArraySE(
    assays={"counts": my_cell_array, "tpm": my_tpm_array},
    row_data=my_row_frame,  # gene annotations (CellArrayFrame)
    col_data=my_col_frame,  # sample annotations (CellArrayFrame)
)
Inspection
se.shape # (n_genes, n_samples)
se.assay_names # ["counts", "tpm"]
se.row_names # pd.Index of gene identifiers
se.col_names # pd.Index of sample identifiers
se.row_columns # list of gene metadata fields
se.col_columns # list of sample metadata fields
se.show() # print a summary with the first 5 rows of each metadata table
repr(se) # <CellArraySE: 20000x500 | counts, tpm>
Slicing
Bracket notation supports integer indices, slices, name strings, and lists:
# Positional slice
subset = se[0:100, 0:50]
# Single gene and single sample
single = se[5, 3]
# Lists of indices or names
subset = se[["BRCA1", "TP53"], ["sample_001", "sample_042"]]
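One way to picture how mixed index types could be handled is to normalize every key to a list of integer positions before any data is read. The helper below is a minimal sketch of that idea in plain Python; `resolve_index` is a hypothetical name, and nothing here is taken from the cellarr-se source.

```python
def resolve_index(key, names):
    """Turn an int, slice, name, or list of ints/names into integer positions.

    Hypothetical helper for illustration; not part of cellarr-se.
    """
    lookup = {name: pos for pos, name in enumerate(names)}

    def one(item):
        if isinstance(item, str):
            return lookup[item]            # KeyError for an unknown name
        return range(len(names))[item]     # supports negatives, IndexError if out of range

    if isinstance(key, slice):
        return list(range(*key.indices(len(names))))
    if isinstance(key, (int, str)):
        return [one(key)]
    return [one(item) for item in key]


genes = ["BRCA1", "TP53", "EGFR", "MYC"]

by_slice = resolve_index(slice(0, 2), genes)
by_int = resolve_index(3, genes)
by_name = resolve_index("EGFR", genes)
by_names = resolve_index(["TP53", "MYC"], genes)
by_negative = resolve_index(-1, genes)
```

Once rows and columns are both reduced to position lists, the same lists can be applied to every assay and metadata table.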
For attribute-filtered access, use slice() with TileDB query strings:
# Filter rows and columns by metadata attributes
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_query="tissue == 'liver'",
)
# Combine a row query with a positional column subset,
# and restrict the assays and row metadata fields returned
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_subset=slice(0, 50),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)
Both se[...] and se.slice(...) return a standard in-memory SummarizedExperiment.
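Conceptually, an attribute filter selects positions from a metadata table, and those positions then subset the assay. The sketch below mimics that flow with plain Python data. It is an illustration under assumptions: a Python predicate stands in for the TileDB query string, and `positions_where` and `project` are hypothetical helpers, not cellarr-se functions.

```python
def positions_where(table, predicate):
    """Return the positions of the records that satisfy the predicate."""
    return [i for i, record in enumerate(table) if predicate(record)]


def project(table, positions, columns):
    """Keep only the chosen records and the chosen metadata fields."""
    return [{c: table[i][c] for c in columns} for i in positions]


# Toy gene metadata and a 3x3 counts matrix (rows are genes, columns are samples).
row_table = [
    {"gene_id": "G1", "gene_name": "BRCA1", "gene_type": "protein_coding"},
    {"gene_id": "G2", "gene_name": "MALAT1", "gene_type": "lncRNA"},
    {"gene_id": "G3", "gene_name": "TP53", "gene_type": "protein_coding"},
]
counts = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]

# Stand-in for row_query="gene_type == 'protein_coding'"
rows = positions_where(row_table, lambda r: r["gene_type"] == "protein_coding")

# Stand-in for col_subset=slice(0, 2) on three samples
cols = list(range(*slice(0, 2).indices(3)))

# Apply the same row positions to the assay and to the metadata table.
sub_counts = [[counts[r][c] for c in cols] for r in rows]
sub_rows = project(row_table, rows, ["gene_id", "gene_name"])
```

The filter is evaluated against metadata only; the assay is indexed afterwards with the resulting positions, which is what keeps the matrix and its annotations aligned.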
Assay metadata
se.is_sparse("counts") # True if backed by SparseCellArray
se.get_assay_type("counts") # numpy dtype of the assay
Demo
A worked example covering construction, inspection, and slicing is available in the demo notebook.
Note
This project has been set up using BiocSetup and PyScaffold.