cellarr_se package
Submodules
cellarr_se.cellarr_se module
A logical, read-only coordinator for TileDB-backed multi-dimensional datasets.
This class synchronizes slicing and metadata retrieval across multiple out-of-core components:
- Assays: A dictionary of cellarr-array objects (Dense or Sparse).
- Row Data: An aligned cellarr-frame for row-wise annotations.
- Column Data: An aligned cellarr-frame for column-wise annotations.
CellArraySE keeps the data on disk and performs synchronized, lazy slices: a standard in-memory summarizedexperiment.SummarizedExperiment object is materialized only when a slice is requested.
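To make the coordination idea concrete, here is a minimal, stdlib-only sketch of synchronized lazy slicing. It does not use cellarr-array, cellarr-frame, or TileDB: ToyAssay and sync_slice are hypothetical stand-ins, with in-memory lists playing the role of on-disk arrays. What it illustrates is that one (rows, cols) selection is applied to every assay and to both metadata tables, and that data is read only when a slice is requested.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyAssay:
    """Hypothetical stand-in for an on-disk array; data is only touched in read()."""

    data: List[List[float]]
    reads: int = 0  # number of times the "disk" was accessed

    def read(self, rows: List[int], cols: List[int]) -> List[List[float]]:
        self.reads += 1
        return [[self.data[r][c] for c in cols] for r in rows]


def sync_slice(
    assays: Dict[str, ToyAssay],
    row_meta: List[dict],
    col_meta: List[dict],
    rows: List[int],
    cols: List[int],
):
    """Apply the same (rows, cols) selection to every assay and both metadata tables."""
    sub_assays = {name: assay.read(rows, cols) for name, assay in assays.items()}
    sub_rows = [row_meta[r] for r in rows]  # row annotations follow the row selection
    sub_cols = [col_meta[c] for c in cols]  # column annotations follow the column selection
    return sub_assays, sub_rows, sub_cols


counts = ToyAssay([[1, 2], [3, 4], [5, 6]])
tpm = ToyAssay([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
genes = [{"gene": "g1"}, {"gene": "g2"}, {"gene": "g3"}]
samples = [{"sample": "s1"}, {"sample": "s2"}]

sub, sub_rows, sub_cols = sync_slice(
    {"counts": counts, "tpm": tpm}, genes, samples, rows=[0, 2], cols=[1]
)
print(sub["counts"])  # [[2], [6]]
```

Presumably, in CellArraySE the reads go to the TileDB-backed handles and the pieces are assembled into a SummarizedExperiment; this sketch simply returns plain lists.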
- class cellarr_se.cellarr_se.CellArraySE(assays, row_data, col_data)
Bases: object
- __getitem__(key)
Subset using bracket notation: se[rows, cols].
This method provides simple positional and name-based subsetting. For advanced filtering with TileDB query strings, use slice() instead.
- Supported key types (see SubsetKey):
  - int: Single index (e.g., se[0, 5])
  - slice: Range (e.g., se[0:10, 0:5])
  - List[int]: Multiple indices (e.g., se[[0, 1, 2], [3, 4]])
  - str: Single name (e.g., se["gene1", "sample1"])
  - List[str]: Multiple names (e.g., se[["gene1", "gene2"], ["s1", "s2"]])
Note
Query-based filtering is not supported via bracket notation. Use se.slice(row_query="...", col_query="...") for TileDB queries.
- Parameters:
key (Tuple[Union[int, slice, List[int], str, List[str], None], Union[int, slice, List[int], str, List[str], None]]) – A 2-tuple of (row_key, col_key).
- Return type:
SummarizedExperiment
- Returns:
SummarizedExperiment with the requested subset.
- Raises:
ValueError – If key is not a 2-tuple.
TypeError – If key types are not supported.
IndexError – If indices are out of bounds.
KeyError – If names are not found.
Examples
# Slice by position
subset = se[0:100, 0:50]
# Single indices
single = se[5, 3]
# List of indices
subset = se[[0, 2, 4], [1, 3]]
# For query-based filtering, use slice():
subset = se.slice(row_query="gene_type == 'protein'")
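Python passes se[a, b] to __getitem__ as the tuple (a, b), while se[a] passes a on its own; that is why a non-tuple key is rejected. The sketch below mirrors the documented ValueError and TypeError cases. split_key, _is_supported, and Demo are hypothetical helpers, not part of cellarr_se, and the package's actual validation may differ in detail.

```python
from typing import Any, Tuple


def _is_supported(k: Any) -> bool:
    """True for int, slice, str, None, or a list made only of ints or only of strs."""
    if k is None or isinstance(k, (int, slice, str)):
        return True
    if isinstance(k, list):
        return all(isinstance(x, int) for x in k) or all(isinstance(x, str) for x in k)
    return False


def split_key(key: Any) -> Tuple[Any, Any]:
    """Validate a bracket key and return (row_key, col_key)."""
    if not isinstance(key, tuple) or len(key) != 2:
        raise ValueError("key must be a 2-tuple of (row_key, col_key)")
    row_key, col_key = key
    for part in (row_key, col_key):
        if not _is_supported(part):
            raise TypeError(f"unsupported key type: {type(part).__name__}")
    return row_key, col_key


class Demo:
    # __getitem__ receives exactly what is written inside the brackets
    def __getitem__(self, key):
        return split_key(key)


d = Demo()
print(d[0:10, [1, 3]])  # (slice(0, 10, None), [1, 3])
```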
- __init__(assays, row_data, col_data)
Initialize the SE coordinator with existing TileDB-backed handles.
- Parameters:
assays (Dict[str, CellArray]) – Dictionary mapping assay names to CellArray objects. All assays must have the same shape.
row_data (CellArrayFrame) – CellArrayFrame containing row metadata. Number of rows must match assay row count.
col_data (CellArrayFrame) – CellArrayFrame containing column metadata. Number of rows must match assay column count.
- Raises:
ValueError – If assays is empty, shapes don’t match, or inputs are invalid types.
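The shape constraints listed for __init__ can be expressed as a small check. This is a hypothetical sketch (validate_inputs is not part of the package) that works on plain (rows, cols) tuples and record counts instead of CellArray and CellArrayFrame objects, and it does not cover the invalid-type case.

```python
from typing import Dict, Tuple


def validate_inputs(
    assay_shapes: Dict[str, Tuple[int, int]],
    n_row_records: int,
    n_col_records: int,
) -> Tuple[int, int]:
    """Return the shared (n_rows, n_cols) or raise ValueError."""
    if not assay_shapes:
        raise ValueError("at least one assay is required")
    distinct = set(assay_shapes.values())
    if len(distinct) != 1:
        raise ValueError(f"all assays must have the same shape, got {assay_shapes}")
    n_rows, n_cols = distinct.pop()
    # row_data has one record per assay row; col_data has one record per assay column
    if n_row_records != n_rows:
        raise ValueError(f"row_data has {n_row_records} rows, assays have {n_rows}")
    if n_col_records != n_cols:
        raise ValueError(f"col_data has {n_col_records} rows, assays have {n_cols}")
    return n_rows, n_cols


print(validate_inputs({"counts": (100, 20), "tpm": (100, 20)}, 100, 20))  # (100, 20)
```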
- property col_names: Index
Index values of the column metadata table (col_data).
- property row_names: Index
Index values of the row metadata table (row_data).
- show(n=5)
Display a summary of the experiment structure and metadata.
- Parameters:
n (int) – Number of rows to display from row_data and col_data. Defaults to 5.
- slice(row_subset=None, col_subset=None, row_query=None, col_query=None, assays=None, row_columns=None, col_columns=None)
Slice the CellArraySE to produce an in-memory SummarizedExperiment.
This method provides full control over subsetting, including TileDB query support. For simple positional or name-based access, use bracket notation instead (e.g., se[0:10, 0:5]).
- Parameters:
row_subset (Union[int, slice, List[int], str, List[str], None]) – Row subset key. Accepted types:
  - int: Single index (e.g., 5, or -1 for the last row)
  - slice: Range (e.g., slice(0, 10))
  - List[int]: Multiple indices (e.g., [0, 2, 4])
  - str: Single name matching row_names
  - List[str]: Multiple names
  - None: Select all rows (default)
col_subset (Union[int, slice, List[int], str, List[str], None]) – Column subset key. Accepts the same types as row_subset.
row_query (Optional[str]) – TileDB query string for row filtering (e.g., "gene_type == 'protein'"). Mutually exclusive with row_subset.
col_query (Optional[str]) – TileDB query string for column filtering. Mutually exclusive with col_subset.
assays (Optional[List[str]]) – List of assay names to include. If None, all assays are included.
row_columns (Optional[List[str]]) – List of row metadata columns to include. If None, all columns are included.
col_columns (Optional[List[str]]) – List of column metadata columns to include. If None, all columns are included.
- Return type:
SummarizedExperiment
- Returns:
SummarizedExperiment with the requested subset of data.
- Raises:
ValueError – If both subset and query are specified for the same dimension.
KeyError – If assay name or row/column name is not found.
IndexError – If index is out of bounds.
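The mutual-exclusion rule for subsets and queries can be pictured as a per-dimension decision. The sketch below is hypothetical (selection_mode is not a cellarr_se function): it raises ValueError when both a subset key and a query are given for the same dimension and otherwise reports whether that dimension would be selected by query, by subset key, or in full.

```python
from typing import Any, Optional


def selection_mode(subset: Any, query: Optional[str], dim: str) -> str:
    """Classify how one dimension ("row" or "col") is selected."""
    if subset is not None and query is not None:
        raise ValueError(f"{dim}_subset and {dim}_query are mutually exclusive")
    if query is not None:
        return "query"  # filter with a TileDB query string
    if subset is not None:
        return "subset"  # positional or name-based key
    return "all"  # None for both: keep every element


print(selection_mode(None, "gene_type == 'protein'", "row"))  # query
print(selection_mode([0, 1, 2], None, "col"))  # subset
```

The checks use `is not None` rather than truthiness on purpose: 0 is a valid row index and must not be mistaken for "no subset".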
Examples
Basic positional subsetting:
subset = se.slice(
    row_subset=slice(0, 100),
    col_subset=slice(0, 50),
)
Select specific assays and metadata columns:
subset = se.slice(
    row_subset=[0, 1, 2],
    col_subset=slice(0, 10),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)
Filter using TileDB query strings:
# Get protein-coding genes from liver samples
subset = se.slice(
    row_query="gene_type == 'protein'",
    col_query="tissue == 'liver'",
)
Combine query with column selection:
subset = se.slice(
    row_query="gene_type == 'protein'",
    col_subset=[0, 1, 2],
    assays=["counts", "tpm"],
)
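The assays, row_columns, and col_columns arguments act as projections applied alongside the row and column selection. Below is a hypothetical, stdlib-only sketch of that projection step: pick_assays and pick_columns are illustrative names, assay handles are placeholder strings, and metadata is modeled as a list of dicts rather than a CellArrayFrame.

```python
from typing import Dict, List, Optional


def pick_assays(available: Dict[str, object], wanted: Optional[List[str]]) -> Dict[str, object]:
    """Keep only the requested assays; None keeps all of them."""
    if wanted is None:
        return dict(available)
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"assay(s) not found: {missing}")
    return {name: available[name] for name in wanted}


def pick_columns(records: List[dict], columns: Optional[List[str]]) -> List[dict]:
    """Keep only the requested metadata columns; None keeps all of them."""
    if columns is None:
        return [dict(r) for r in records]
    return [{c: r[c] for c in columns} for r in records]  # KeyError on an unknown column


assays = {"counts": "<counts handle>", "tpm": "<tpm handle>"}
rows = [{"gene_id": "G1", "gene_name": "A", "gene_type": "protein"}]
print(list(pick_assays(assays, ["counts"])))  # ['counts']
print(pick_columns(rows, ["gene_id", "gene_name"]))  # [{'gene_id': 'G1', 'gene_name': 'A'}]
```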
- cellarr_se.cellarr_se.SubsetKey
Type alias for subset keys accepted by __getitem__ and slice().
- Supported types:
  - int: Single index (e.g., 5, or -1 for the last element)
  - slice: Range of indices (e.g., slice(0, 10), or 0:10 in brackets)
  - List[int]: Multiple indices (e.g., [0, 2, 4])
  - str: Single name matching row_names/col_names
  - List[str]: Multiple names (e.g., ["gene1", "gene2"])
  - None: Select all elements in that dimension
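For illustration, the alias can be written out from the types in the documented signatures, together with a hypothetical resolve function showing how each form might map to explicit positions. This is a sketch under assumptions: it uses a plain list of names instead of the Index returned by row_names/col_names, assumes names are unique, and raises IndexError and KeyError as the documented Raises entries describe.

```python
from typing import Dict, List, Sequence, Union

# The union spelled out in the signatures of __getitem__ and slice()
SubsetKey = Union[int, slice, List[int], str, List[str], None]


def resolve(key: SubsetKey, names: Sequence[str]) -> List[int]:
    """Turn a SubsetKey into explicit, non-negative 0-based positions."""
    n = len(names)
    if key is None:
        return list(range(n))  # None selects everything
    if isinstance(key, slice):
        return list(range(n))[key]  # reuse Python's own slice semantics
    if isinstance(key, (int, str)):
        key = [key]  # treat a scalar as a one-element list
    lookup: Dict[str, int] = {name: i for i, name in enumerate(names)}  # assumes unique names
    positions = []
    for k in key:
        if isinstance(k, str):
            if k not in lookup:
                raise KeyError(k)
            positions.append(lookup[k])
        else:
            if not -n <= k < n:
                raise IndexError(f"index {k} out of bounds for size {n}")
            positions.append(k % n)  # map negative indices, e.g. -1 -> n - 1
    return positions


genes = ["gene1", "gene2", "gene3", "gene4"]
print(resolve(slice(1, 3), genes))  # [1, 2]
print(resolve(["gene3", "gene1"], genes))  # [2, 0]
```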
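Module contents
- class cellarr_se.CellArraySE(assays, row_data, col_data)
The same CellArraySE class, re-exported at the package level (from cellarr_se import CellArraySE). Its members are identical to those of cellarr_se.cellarr_se.CellArraySE documented above.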