cellarr_se package

Submodules

cellarr_se.cellarr_se module

A logical, read-only coordinator for TileDB-backed multi-dimensional datasets.

This class synchronizes slicing and metadata retrieval across multiple out-of-core components:

Assays: A dictionary of cellarr-array objects (Dense or Sparse).
Row Data: An aligned cellarr-frame for row-wise annotations.
Column Data: An aligned cellarr-frame for column-wise annotations.

CellArraySE keeps the data on disk and slices all components together lazily: a standard in-memory summarizedexperiment.SummarizedExperiment object is materialized only when a subset is requested.
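The coordination idea can be illustrated with a toy, fully in-memory stand-in. This is only a sketch: ToyCoordinator and its take() method are hypothetical names, plain Python lists replace the TileDB-backed components, and a dict replaces the SummarizedExperiment result. It shows one set of row and column positions being applied to every assay and to both metadata tables.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in; not the CellArraySE implementation."""

    assays: Dict[str, Matrix]
    row_data: List[dict]
    col_data: List[dict]

    def take(self, rows: List[int], cols: List[int]) -> dict:
        # Apply the same row/column positions to every component so the
        # assays and both metadata tables stay aligned.
        return {
            "assays": {
                name: [[matrix[r][c] for c in cols] for r in rows]
                for name, matrix in self.assays.items()
            },
            "row_data": [self.row_data[r] for r in rows],
            "col_data": [self.col_data[c] for c in cols],
        }


toy = ToyCoordinator(
    assays={"counts": [[1, 2, 3], [4, 5, 6]]},
    row_data=[{"gene_id": "g1"}, {"gene_id": "g2"}],
    col_data=[{"sample": "s1"}, {"sample": "s2"}, {"sample": "s3"}],
)
result = toy.take(rows=[1], cols=[0, 2])
```

In the real class, the components stay on disk until a slice is requested; the toy version only mirrors the alignment between assays, row metadata, and column metadata.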

class cellarr_se.cellarr_se.CellArraySE(assays, row_data, col_data)

Bases: object

__getitem__(key)

Subset using bracket notation: se[rows, cols].

This method provides simple positional and name-based subsetting. For advanced filtering with TileDB query strings, use slice() instead.

Supported key types (see SubsetKey):

int: Single index (e.g., se[0, 5])
slice: Range (e.g., se[0:10, 0:5])
List[int]: Multiple indices (e.g., se[[0, 1, 2], [3, 4]])
str: Single name (e.g., se["gene1", "sample1"])
List[str]: Multiple names (e.g., se[["gene1", "gene2"], ["s1", "s2"]])

Note

Query-based filtering is not supported via bracket notation. Use se.slice(row_query="...", col_query="...") for TileDB queries.

Parameters:

key (Tuple[Union[int, slice, List[int], str, List[str], None], Union[int, slice, List[int], str, List[str], None]]) – A 2-tuple of (row_key, col_key).

Return type:

SummarizedExperiment

Returns:

SummarizedExperiment with the requested subset.

Raises:

ValueError – If key is not a 2-tuple.
TypeError – If key types are not supported.
IndexError – If indices are out of bounds.
KeyError – If names are not found.

Examples

# Slice by position
subset = se[0:100, 0:50]

# Single indices
single = se[5, 3]

# List of indices
subset = se[[0, 2, 4], [1, 3]]

# For query-based filtering, use slice():
subset = se.slice(row_query="gene_type == 'protein'")
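How a bracket key reaches __getitem__ and how the documented ValueError and TypeError cases could arise is sketched below. The helper split_key and the Probe class are hypothetical and exist only for illustration; the library's actual validation may differ.

```python
from typing import Any, Tuple


def split_key(key: Any) -> Tuple[Any, Any]:
    """Hypothetical helper: validate a bracket key as (row_key, col_key)."""
    if not isinstance(key, tuple) or len(key) != 2:
        raise ValueError("key must be a 2-tuple of (row_key, col_key)")
    for part in key:
        # Scalars, slices and None are accepted as-is.
        if part is None or isinstance(part, (int, slice, str)):
            continue
        # Lists must be homogeneous: all ints or all strings.
        if isinstance(part, list) and (
            all(isinstance(p, int) for p in part)
            or all(isinstance(p, str) for p in part)
        ):
            continue
        raise TypeError(f"unsupported key type: {type(part).__name__}")
    return key


class Probe:
    """Minimal object that shows what Python passes to __getitem__."""

    def __getitem__(self, key):
        return split_key(key)


probe = Probe()
rows, cols = probe[0:100, ["s1", "s2"]]
```

Python packs probe[a, b] into the tuple (a, b) and turns 0:100 into slice(0, 100), which is why a single-element key such as probe[5] fails the 2-tuple check.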

__init__(assays, row_data, col_data)

Initialize the SE coordinator with existing TileDB-backed handles.

Parameters:

assays (Dict[str, CellArray]) – Dictionary mapping assay names to CellArray objects. All assays must have the same shape.
row_data (CellArrayFrame) – CellArrayFrame containing row metadata. Number of rows must match assay row count.
col_data (CellArrayFrame) – CellArrayFrame containing column metadata. Number of rows must match assay column count.

Raises:

ValueError – If assays is empty, shapes don’t match, or inputs are invalid types.
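The shape constraints above can be written as a small stand-alone check. This is a sketch under assumptions: check_shapes is a hypothetical function that takes plain shape tuples and record counts instead of CellArray and CellArrayFrame handles, and it does not reproduce the library's own validation code or error messages.

```python
from typing import Dict, Tuple


def check_shapes(
    assay_shapes: Dict[str, Tuple[int, int]],
    n_row_records: int,
    n_col_records: int,
) -> Tuple[int, int]:
    """Hypothetical check mirroring the documented constructor constraints."""
    if not assay_shapes:
        raise ValueError("assays must not be empty")
    shapes = set(assay_shapes.values())
    if len(shapes) != 1:
        raise ValueError(f"assays have differing shapes: {assay_shapes}")
    n_rows, n_cols = shapes.pop()
    # row_data needs one record per assay row.
    if n_row_records != n_rows:
        raise ValueError(f"row_data has {n_row_records} rows, expected {n_rows}")
    # col_data needs one record per assay column.
    if n_col_records != n_cols:
        raise ValueError(f"col_data has {n_col_records} rows, expected {n_cols}")
    return n_rows, n_cols


shape = check_shapes({"counts": (1000, 50), "tpm": (1000, 50)}, 1000, 50)
```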

__repr__()

String representation showing shape and assay names.

Return type: str

property assay_names: List[str]

Names of available assays.

property col_columns: List[str]

Names of the metadata columns in col_data.

property col_names: Index

Index values of the column metadata table (col_data).

property dims: Tuple[int, int]

Alias for shape.

get_assay_type(assay_name)

Get the data type of an assay matrix.

Parameters: assay_name (str) – Name of the assay.
Return type: dtype
Returns: NumPy dtype of the assay’s data attribute.
Raises: KeyError – If the assay name does not exist.

is_sparse(assay_name)

Check if an assay is sparse.

Parameters: assay_name (str) – Name of the assay to check.
Return type: bool
Returns: True if the assay is a SparseCellArray, False otherwise.
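The two inspection methods combine naturally with assay_names to survey an experiment before slicing it. The sketch below uses only those three documented members. The function summarize_assays is a hypothetical name, and the stub object is a stand-in so the snippet runs without any TileDB data; a real CellArraySE instance would be passed instead.

```python
from types import SimpleNamespace


def summarize_assays(se):
    """Collect sparsity and dtype per assay using only documented members."""
    return {
        name: {
            "sparse": se.is_sparse(name),
            "dtype": str(se.get_assay_type(name)),
        }
        for name in se.assay_names
    }


# Stand-in for illustration only; the values here are made up.
stub = SimpleNamespace(
    assay_names=["counts", "tpm"],
    is_sparse=lambda name: name == "counts",
    get_assay_type=lambda name: {"counts": "uint32", "tpm": "float32"}[name],
)
summary = summarize_assays(stub)
```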

property row_columns: List[str]

Names of the metadata columns in row_data.

property row_names: Index

Index values of the row metadata table (row_data).

property shape: Tuple[int, int]

Number of rows and columns as (n_rows, n_cols).

show(n=5)

Display a summary of the experiment structure and metadata.

Parameters: n (int) – Number of rows to display from row_data and col_data. Defaults to 5.

slice(row_subset=None, col_subset=None, row_query=None, col_query=None, assays=None, row_columns=None, col_columns=None)

Slice the CellArraySE to produce an in-memory SummarizedExperiment.

This method provides full control over subsetting, including TileDB query support. For simple positional/name-based access, use bracket notation instead (e.g., se[0:10, 0:5]).

Parameters:

row_subset (Union[int, slice, List[int], str, List[str], None]) –
Row subset key. Accepted types:
- int: Single index (e.g., 5, -1 for last)
- slice: Range (e.g., slice(0, 10))
- List[int]: Multiple indices (e.g., [0, 2, 4])
- str: Single name matching row_names
- List[str]: Multiple names
- None: Select all rows (default)
col_subset (Union[int, slice, List[int], str, List[str], None]) – Column subset key. Same types as row_subset.
row_query (Optional[str]) – TileDB query string for row filtering (e.g., "gene_type == 'protein'"). Mutually exclusive with row_subset.
col_query (Optional[str]) – TileDB query string for column filtering. Mutually exclusive with col_subset.
assays (Optional[List[str]]) – List of assay names to include. If None, includes all assays.
row_columns (Optional[List[str]]) – List of row metadata columns to include. If None, includes all.
col_columns (Optional[List[str]]) – List of column metadata columns to include. If None, includes all.

Return type:

SummarizedExperiment

Returns:

SummarizedExperiment with the requested subset of data.

Raises:

ValueError – If both subset and query are specified for the same dimension.
KeyError – If assay name or row/column name is not found.
IndexError – If index is out of bounds.

Examples

Basic positional subsetting:

subset = se.slice(row_subset=slice(0, 100), col_subset=slice(0, 50))

Select specific assays and metadata columns:

subset = se.slice(
    row_subset=[0, 1, 2],
    col_subset=slice(0, 10),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)

Filter using TileDB query strings:

# Get protein-coding genes from liver samples
subset = se.slice(
    row_query="gene_type == 'protein'",
    col_query="tissue == 'liver'",
)

Combine a row query with a positional column subset and assay selection:

subset = se.slice(
    row_query="gene_type == 'protein'",
    col_subset=[0, 1, 2],
    assays=["counts", "tpm"],
)
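The rule that a subset and a query cannot be given for the same dimension, while mixing them across dimensions is allowed, can be sketched as follows. Both check_exclusive and plan_slice are hypothetical helpers written for this illustration. They only report which selection mode each dimension would use and do not touch any data.

```python
from typing import Any, Dict, Optional


def check_exclusive(subset: Any, query: Optional[str], dim: str) -> None:
    """Raise if both a subset and a query are given for one dimension."""
    if subset is not None and query is not None:
        raise ValueError(f"{dim}_subset and {dim}_query are mutually exclusive")


def plan_slice(
    row_subset: Any = None,
    col_subset: Any = None,
    row_query: Optional[str] = None,
    col_query: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical: describe how each dimension would be selected."""
    check_exclusive(row_subset, row_query, "row")
    check_exclusive(col_subset, col_query, "col")

    def mode(subset: Any, query: Optional[str]) -> str:
        if query is not None:
            return "query"
        # None for both subset and query means "select everything".
        return "all" if subset is None else "subset"

    return {
        "rows": mode(row_subset, row_query),
        "cols": mode(col_subset, col_query),
    }


plan = plan_slice(row_query="gene_type == 'protein'", col_subset=[0, 1, 2])
```

Passing both row_subset and row_query to plan_slice raises ValueError, matching the documented behaviour of slice() for that case.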

cellarr_se.cellarr_se.SubsetKey

Type alias for subset keys accepted by __getitem__ and slice().

Supported types:

int: Single index (e.g., 5 or -1 for last element)
slice: Range of indices (e.g., slice(0, 10) or 0:10 in brackets)
List[int]: Multiple indices (e.g., [0, 2, 4])
str: Single name matching row_names/col_names
List[str]: Multiple names (e.g., ["gene1", "gene2"])
None: Select all elements in that dimension
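One way to read the supported types is as different spellings of a list of integer positions. The sketch below makes that reading concrete. SubsetKeySketch and resolve are hypothetical names, and the bounds handling (negative indices wrap, slices clamp like Python lists) is an assumption rather than confirmed library behaviour.

```python
from typing import List, Sequence, Union

# Illustrative alias built from the types listed above.
SubsetKeySketch = Union[int, slice, List[int], str, List[str], None]


def resolve(key: SubsetKeySketch, names: Sequence[str]) -> List[int]:
    """Hypothetical: turn any supported key into integer positions."""
    n = len(names)

    def position(item: Union[int, str]) -> int:
        if isinstance(item, str):
            if item not in names:
                raise KeyError(item)
            return list(names).index(item)
        if not -n <= item < n:
            raise IndexError(f"index {item} out of bounds for size {n}")
        return item % n  # maps -1 to the last position

    if key is None:
        return list(range(n))
    if isinstance(key, slice):
        return list(range(*key.indices(n)))
    if isinstance(key, (int, str)):
        return [position(key)]
    return [position(item) for item in key]


genes = ["gene1", "gene2", "gene3", "gene4"]
last = resolve(-1, genes)
first_two = resolve(slice(0, 2), genes)
by_name = resolve(["gene3", "gene1"], genes)
everything = resolve(None, genes)
```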

Module contents

class cellarr_se.CellArraySE(assays, row_data, col_data)

Bases: object
