cellarr_se package

Submodules

cellarr_se.cellarr_se module

A logical, read-only coordinator for TileDB-backed multi-dimensional datasets.

This class synchronizes slicing and metadata retrieval across multiple out-of-core components:

  • Assays: A dictionary of cellarr-array objects (Dense or Sparse).

  • Row Data: An aligned cellarr-frame for row-wise annotations.

  • Column Data: An aligned cellarr-frame for column-wise annotations.

CellArraySE keeps all data on disk and performs synchronized “lazy” slices, returning standard in-memory summarizedexperiment.SummarizedExperiment objects only when a subset is requested.
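The coordination idea described above can be illustrated with a toy stand-in. This is a minimal sketch under stated assumptions, not the CellArraySE implementation: `ToyCoordinator` is a hypothetical class, and plain in-memory lists take the place of the TileDB-backed assays and annotation frames. It only shows that one pair of row and column selections is applied consistently to every component.

```python
from typing import Dict, List, Sequence


class ToyCoordinator:
    """Hypothetical stand-in for the coordinator pattern (not the library class)."""

    def __init__(
        self,
        assays: Dict[str, List[List[float]]],
        row_data: List[dict],
        col_data: List[dict],
    ) -> None:
        # Only references are stored; nothing is copied or subset here.
        self.assays = assays
        self.row_data = row_data
        self.col_data = col_data

    def slice(self, rows: Sequence[int], cols: Sequence[int]) -> dict:
        # Apply the same row/column positions to every assay and to both
        # annotation tables, so the pieces of the result stay aligned.
        return {
            "assays": {
                name: [[matrix[r][c] for c in cols] for r in rows]
                for name, matrix in self.assays.items()
            },
            "row_data": [self.row_data[r] for r in rows],
            "col_data": [self.col_data[c] for c in cols],
        }


toy = ToyCoordinator(
    assays={"counts": [[1, 2, 3], [4, 5, 6]]},
    row_data=[{"gene": "g1"}, {"gene": "g2"}],
    col_data=[{"sample": "s1"}, {"sample": "s2"}, {"sample": "s3"}],
)
result = toy.slice(rows=[1], cols=[0, 2])
```

Here `result` holds the second row and the first and third columns of the assay, together with the matching annotation records.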

class cellarr_se.cellarr_se.CellArraySE(assays, row_data, col_data)

Bases: object

__getitem__(key)

Subset using bracket notation: se[rows, cols].

This method provides simple positional and name-based subsetting. For advanced filtering with TileDB query strings, use slice() instead.

Supported key types (see SubsetKey):
  • int: Single index (e.g., se[0, 5])

  • slice: Range (e.g., se[0:10, 0:5])

  • List[int]: Multiple indices (e.g., se[[0, 1, 2], [3, 4]])

  • str: Single name (e.g., se["gene1", "sample1"])

  • List[str]: Multiple names (e.g., se[["gene1", "gene2"], ["s1", "s2"]])

Note

Query-based filtering is not supported via bracket notation. Use se.slice(row_query="...", col_query="...") for TileDB queries.

Parameters:

key (Tuple[SubsetKey, SubsetKey]) – A 2-tuple of (row_key, col_key).

Return type:

SummarizedExperiment

Returns:

SummarizedExperiment with the requested subset.

Examples

# Slice by position
subset = se[0:100, 0:50]

# Single indices
single = se[5, 3]

# List of indices
subset = se[[0, 2, 4], [1, 3]]

# For query-based filtering, use slice():
subset = se.slice(row_query="gene_type == 'protein'")

__init__(assays, row_data, col_data)

Initialize the SE coordinator with existing TileDB-backed handles.

Parameters:
  • assays (Dict[str, CellArray]) – Dictionary mapping assay names to CellArray objects. All assays must have the same shape.

  • row_data (CellArrayFrame) – CellArrayFrame containing row metadata. Number of rows must match assay row count.

  • col_data (CellArrayFrame) – CellArrayFrame containing column metadata. Number of rows must match assay column count.

Raises:

ValueError – If assays is empty, assay shapes don’t match, or any input is of an invalid type.
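The shape requirements listed above can be sketched as a standalone check. This is an illustration only: `validate_components` is a hypothetical helper, not the constructor's code, and it assumes each assay exposes a two-element `.shape` attribute. The type checks mentioned in the Raises entry are not covered.

```python
from types import SimpleNamespace
from typing import Dict, Tuple


def validate_components(
    assays: Dict[str, object],
    n_row_annotations: int,
    n_col_annotations: int,
) -> Tuple[int, int]:
    """Hypothetical helper mirroring the documented shape checks."""
    if not assays:
        raise ValueError("assays must contain at least one entry")

    # Assumption: every assay object exposes a (n_rows, n_cols) `.shape`.
    shapes = {name: tuple(arr.shape) for name, arr in assays.items()}
    expected = next(iter(shapes.values()))
    mismatched = [name for name, shape in shapes.items() if shape != expected]
    if mismatched:
        raise ValueError(f"assays {mismatched} do not have shape {expected}")

    n_rows, n_cols = expected
    if n_row_annotations != n_rows:
        raise ValueError(
            f"row_data has {n_row_annotations} rows, expected {n_rows}"
        )
    if n_col_annotations != n_cols:
        raise ValueError(
            f"col_data has {n_col_annotations} rows, expected {n_cols}"
        )
    return expected


# Stand-in objects with a `.shape`, used only to exercise the helper.
counts = SimpleNamespace(shape=(100, 20))
tpm = SimpleNamespace(shape=(100, 20))
shape = validate_components({"counts": counts, "tpm": tpm}, 100, 20)
```

Note that col_data is compared against the assay column count: each column of an assay corresponds to one row of the column metadata.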

__repr__()

String representation showing shape and assay names.

Return type:

str

property assay_names: List[str]

Names of available assays.

property col_columns: List[str]

Column names of the metadata fields for col_data.

property col_names: Index

Index values of the column metadata table.

property dims: Tuple[int, int]

Alias for shape.

get_assay_type(assay_name)

Get the data type of an assay matrix.

Parameters:

assay_name (str) – Name of the assay.

Return type:

dtype

Returns:

NumPy dtype of the assay’s data attribute.

Raises:

KeyError – If the assay name does not exist.

is_sparse(assay_name)

Check if an assay is sparse.

Parameters:

assay_name (str) – Name of the assay to check.

Return type:

bool

Returns:

True if the assay is a SparseCellArray, False otherwise.
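As a usage sketch, assay_names, get_assay_type() and is_sparse() can be combined to summarize every assay before reading any data. `describe_assays` is a hypothetical helper, and `_StubSE` is a stand-in that mimics only these three documented members so the example runs without TileDB; with a real CellArraySE the same function would be called on that object instead.

```python
from typing import Any, Dict


def describe_assays(se: Any) -> Dict[str, Dict[str, Any]]:
    """Collect dtype and sparsity per assay using only documented members."""
    return {
        name: {
            "dtype": str(se.get_assay_type(name)),
            "sparse": se.is_sparse(name),
        }
        for name in se.assay_names
    }


class _StubSE:
    """Stand-in exposing assay_names, get_assay_type and is_sparse."""

    assay_names = ["counts", "tpm"]

    def get_assay_type(self, assay_name: str) -> str:
        # A real object returns a NumPy dtype; strings keep the stub simple.
        return {"counts": "int32", "tpm": "float64"}[assay_name]

    def is_sparse(self, assay_name: str) -> bool:
        return assay_name == "counts"


summary = describe_assays(_StubSE())
```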

property row_columns: List[str]

Column names of the metadata fields for row_data.

property row_names: Index

Index values of the row metadata table.

property shape: Tuple[int, int]

Number of rows and columns as (n_rows, n_cols).

show(n=5)

Display a summary of the experiment structure and metadata.

Parameters:

n (int) – Number of rows to display from row_data and col_data. Defaults to 5.

slice(row_subset=None, col_subset=None, row_query=None, col_query=None, assays=None, row_columns=None, col_columns=None)

Slice the CellArraySE to produce an in-memory SummarizedExperiment.

This method provides full control over subsetting, including TileDB query support. For simple positional/name-based access, use bracket notation instead (e.g., se[0:10, 0:5]).

Parameters:
  • row_subset (Union[int, slice, List[int], str, List[str], None]) –

    Row subset key. Accepted types:

    • int: Single index (e.g., 5, or -1 for the last row)

    • slice: Range (e.g., slice(0, 10))

    • List[int]: Multiple indices (e.g., [0, 2, 4])

    • str: Single name matching row_names

    • List[str]: Multiple names

    • None: Select all rows (default)

  • col_subset (Union[int, slice, List[int], str, List[str], None]) – Column subset key. Same types as row_subset.

  • row_query (Optional[str]) – TileDB query string for row filtering (e.g., "gene_type == 'protein'"). Mutually exclusive with row_subset.

  • col_query (Optional[str]) – TileDB query string for column filtering. Mutually exclusive with col_subset.

  • assays (Optional[List[str]]) – List of assay names to include. If None, includes all assays.

  • row_columns (Optional[List[str]]) – List of row metadata columns to include. If None, includes all.

  • col_columns (Optional[List[str]]) – List of column metadata columns to include. If None, includes all.

Return type:

SummarizedExperiment

Returns:

SummarizedExperiment with the requested subset of data.

Raises:
  • ValueError – If both a subset and a query are specified for the same dimension.

  • KeyError – If an assay name or a row/column name is not found.

  • IndexError – If an index is out of bounds.
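The mutual-exclusivity rule behind the ValueError can be sketched as a small guard. `check_exclusive` is a hypothetical helper written for illustration, not the library's code. It shows that the restriction applies per dimension, so a row query may still be combined with a positional column subset.

```python
from typing import Optional


def check_exclusive(subset: object, query: Optional[str], dimension: str) -> None:
    """Hypothetical guard: reject a subset and a query on the same dimension."""
    if subset is not None and query is not None:
        raise ValueError(
            f"Specify either {dimension}_subset or {dimension}_query, not both."
        )


# A row query plus a column subset touch different dimensions: both pass.
check_exclusive(subset=None, query="gene_type == 'protein'", dimension="row")
check_exclusive(subset=[0, 1, 2], query=None, dimension="col")

# A subset and a query on the same dimension are rejected.
try:
    check_exclusive(subset=[0, 1], query="gene_type == 'protein'", dimension="row")
    rejected = False
except ValueError:
    rejected = True
```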

Examples

Basic positional subsetting:

subset = se.slice(row_subset=slice(0, 100), col_subset=slice(0, 50))

Select specific assays and metadata columns:

subset = se.slice(
    row_subset=[0, 1, 2],
    col_subset=slice(0, 10),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)

Filter using TileDB query strings:

# Get protein-coding genes from liver samples
subset = se.slice(
    row_query="gene_type == 'protein'",
    col_query="tissue == 'liver'",
)

Combine a row query with a positional column subset and assay selection:

subset = se.slice(
    row_query="gene_type == 'protein'",
    col_subset=[0, 1, 2],
    assays=["counts", "tpm"],
)
cellarr_se.cellarr_se.SubsetKey

Type alias for subset keys accepted by __getitem__ and slice().

Supported types:
  • int: Single index (e.g., 5 or -1 for last element)

  • slice: Range of indices (e.g., slice(0, 10) or 0:10 in brackets)

  • List[int]: Multiple indices (e.g., [0, 2, 4])

  • str: Single name matching row_names/col_names

  • List[str]: Multiple names (e.g., ["gene1", "gene2"])

  • None: Select all elements in that dimension

alias of int | slice | List[int] | str | List[str] | None
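
To make the accepted key types concrete, the sketch below resolves any SubsetKey into a list of non-negative positions. It is an illustration, not the library's resolution logic: `resolve_key` is a hypothetical helper, and it assumes names are looked up against an ordered sequence such as row_names or col_names.

```python
from typing import List, Sequence, Union

SubsetKey = Union[int, slice, List[int], str, List[str], None]


def resolve_key(key: SubsetKey, names: Sequence[str]) -> List[int]:
    """Hypothetical helper: turn a subset key into positional indices."""
    n = len(names)
    if key is None:
        return list(range(n))  # None selects the whole dimension
    if isinstance(key, slice):
        return list(range(*key.indices(n)))
    if isinstance(key, (int, str)):
        key = [key]  # treat a single index or name as a one-element list

    lookup = {name: position for position, name in enumerate(names)}
    positions: List[int] = []
    for item in key:
        if isinstance(item, str):
            if item not in lookup:
                raise KeyError(f"name not found: {item!r}")
            positions.append(lookup[item])
        else:
            if not -n <= item < n:
                raise IndexError(f"index {item} out of bounds for size {n}")
            positions.append(item % n)  # map negative indices from the end
    return positions


names = ["gene1", "gene2", "gene3"]
last = resolve_key(-1, names)
by_name = resolve_key(["gene3", "gene1"], names)
```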

Module contents

class cellarr_se.CellArraySE(assays, row_data, col_data)

Bases: object
