cellarr package

Subpackages

Submodules

cellarr.CellArrDataset module

Query the CellArrDataset.

This class provides methods to access the directory containing the generated TileDB files, usually produced by build_cellarrdataset().

Example

from cellarr import CellArrDataset

cd = CellArrDataset(dataset_path="/path/to/cellar/dir")
gene_list = ["gene_1", "gene_95", "gene_50"]
result1 = cd[0, gene_list]

print(result1)
class cellarr.CellArrDataset.CellArrCellIterator(obj)[source]

Bases: object

Cell iterator to a CellArrDataset object.

__init__(obj)[source]

Initialize the iterator.

Parameters:

obj (CellArrDataset) – Source object to iterate.

__iter__()[source]
__next__()[source]
class cellarr.CellArrDataset.CellArrDataset(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)[source]

Bases: object

A class that represents a collection of cells and their associated metadata in a TileDB-backed store.

__del__()[source]
__enter__()[source]
__exit__(exc_type, exc_val, exc_tb)[source]
__getitem__(args)[source]

Subset a CellArrDataset.

Mostly an alias for get_slice().

Parameters:

args (Union[int, Sequence, tuple]) –

Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted.

Alternatively a tuple of length 1. The first entry specifies the rows (or cells) to retain based on their names or indices.

Alternatively a tuple of length 2. The first entry specifies the rows (or cells) to retain, while the second entry specifies the columns (or features/genes) to retain, based on their names or indices.

Note

Slices are inclusive of the upper bounds. This is the default TileDB behavior.

Raises:

ValueError – If too many or too few slices are provided.

Return type:

CellArrDatasetSlice

Returns:

A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.
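The argument forms described above can be illustrated with a small stand-alone helper. This is only a sketch of the documented behavior, not cellarr's implementation: normalize_subset_args is a hypothetical name, and treating a missing gene entry as "keep all genes" (slice(None)) is an assumption.

```python
from typing import Any, Tuple


def normalize_subset_args(args: Any) -> Tuple[Any, Any]:
    """Split a ``cd[...]`` argument into (cell_subset, gene_subset).

    Hypothetical helper that mirrors the documented argument forms.
    """
    everything = slice(None)  # assumption: a missing entry means "keep all"

    if not isinstance(args, tuple):
        # A bare index, slice, or list selects cells only.
        return args, everything
    if len(args) == 1:
        # Tuple of length 1: rows (cells) only.
        return args[0], everything
    if len(args) == 2:
        # Tuple of length 2: rows (cells), then columns (genes).
        return args[0], args[1]
    # Documented failure mode: too many or too few slices.
    raise ValueError("Expected a tuple of length 1 or 2.")


print(normalize_subset_args(0))
print(normalize_subset_args((slice(0, 10), ["gene_1", "gene_95"])))
```

Under these assumptions, cd[0, gene_list] from the module example would correspond to a cell subset of 0 and a gene subset of gene_list.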

__init__(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)[source]

Initialize a CellArrDataset.

Parameters:
  • dataset_path (str) –

    Path to the directory containing the TileDB stores. Usually the output_path passed to build_cellarrdataset().

    You may provide any TileDB-compatible base path (e.g. a local directory, S3, or MinIO).

  • assay_tiledb_group (str) –

    TileDB group containing the assay matrices.

    If the provided build process was used, the matrices are stored in the “assays” TileDB group.

    May be an empty string or None to specify no group. This is mostly for backwards compatibility with datasets built by cellarr versions before 0.3.

  • assay_uri (Union[str, List[str]]) – Relative path to matrix store. Must be in tiledb group specified by assay_tiledb_group.

  • gene_annotation_uri (str) – Relative path to gene annotation store.

  • cell_metadata_uri (str) – Relative path to cell metadata store.

  • sample_metadata_uri (str) – Relative path to sample metadata store.

  • config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, default TileDB Config will be used.
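To make the relationship between dataset_path, assay_tiledb_group and the relative URIs concrete, here is a rough sketch of how the full store locations might be composed. The layout (base path, then group, then store name) is an assumption inferred from the parameter descriptions, and resolve_store_uris is a hypothetical helper, not part of cellarr.

```python
from typing import Dict, List, Optional, Union


def resolve_store_uris(
    dataset_path: str,
    assay_tiledb_group: Optional[str] = "assays",
    assay_uri: Union[str, List[str]] = "counts",
    gene_annotation_uri: str = "gene_annotation",
    cell_metadata_uri: str = "cell_metadata",
    sample_metadata_uri: str = "sample_metadata",
) -> Dict[str, object]:
    """Compose full URIs from a base path and relative store names.

    Assumed layout only; the library may resolve paths differently.
    """
    base = dataset_path.rstrip("/")
    # An empty string or None means the matrices are not inside a group.
    assay_base = f"{base}/{assay_tiledb_group}" if assay_tiledb_group else base
    # assay_uri may be a single name or a list of names.
    names = [assay_uri] if isinstance(assay_uri, str) else list(assay_uri)
    return {
        "assays": [f"{assay_base}/{name}" for name in names],
        "gene_annotation": f"{base}/{gene_annotation_uri}",
        "cell_metadata": f"{base}/{cell_metadata_uri}",
        "sample_metadata": f"{base}/{sample_metadata_uri}",
    }


for key, value in resolve_store_uris("/path/to/dataset").items():
    print(key, value)
```

Passing assay_tiledb_group=None in this sketch places the matrices directly under the base path, which matches the "no group" case described for older builds.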

__len__()[source]
__repr__()[source]
Return type:

str

Returns:

A string representation.

get_cell_metadata_column(column_name)[source]

Access a column from the cell_metadata store.

Parameters:

column_name (str) – Name of the column or attribute. Usually one of the column names returned by get_cell_metadata_columns().

Return type:

DataFrame

Returns:

A list of values for this column.

get_cell_metadata_columns()[source]

Get column names from cell_metadata store.

Return type:

List[str]

Returns:

List of available metadata columns.

get_cell_subset(subset, columns=None)[source]

Slice the cell_metadata store.

Parameters:
  • subset (Union[slice, QueryCondition]) –

    A list of integer indices to subset the cell_metadata store.

    Alternatively, may also provide a tiledb.QueryCondition to query the store.

  • columns

    List of specific column names to access.

    Defaults to None, in which case all columns are extracted.

Return type:

DataFrame

Returns:

A pandas DataFrame of the subset.
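When filtering with a tiledb.QueryCondition, the condition is written as an expression string. The following sketch builds such a string from simple equality filters; build_condition is a hypothetical helper, the column names cell_type and n_genes are made up for illustration, and the sketch assumes column names are plain identifiers.

```python
from typing import Any, Dict


def build_condition(filters: Dict[str, Any]) -> str:
    """Join equality filters into one expression string.

    Hypothetical helper; values are rendered with repr() so that
    strings are quoted and numbers are left bare.
    """
    if not filters:
        raise ValueError("At least one filter is required.")
    return " and ".join(f"{column} == {value!r}" for column, value in filters.items())


expression = build_condition({"cell_type": "B cell", "n_genes": 500})
print(expression)  # cell_type == 'B cell' and n_genes == 500

# With TileDB-Py installed, the string could then be wrapped and passed
# as the subset argument (not executed here):
#   import tiledb
#   condition = tiledb.QueryCondition(expression)
#   df = cd.get_cell_subset(condition, columns=["cell_type"])
```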

get_cells_for_sample(sample)[source]

Slice and access all cells for a sample.

Parameters:

sample (Union[int, str]) –

A string specifying the sample index to access. This must be a value in the cellarr_sample column.

Alternatively, an integer index may be provided to access the sample at the given position.

Return type:

CellArrDatasetSlice

Returns:

A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.

get_gene_annotation_column(column_name)[source]

Access a column from the gene_annotation store.

Parameters:

column_name (str) – Name of the column or attribute. Usually one of the column names returned by get_gene_annotation_columns().

Return type:

DataFrame

Returns:

A list of values for this column.

get_gene_annotation_columns()[source]

Get annotation column names from gene_annotation store.

Return type:

List[str]

Returns:

List of available annotations.

get_gene_annotation_index()[source]

Get index of the gene_annotation store.

Return type:

List[str]

Returns:

List of unique symbols.

get_gene_subset(subset, columns=None)[source]

Slice the gene_annotation store.

Parameters:
  • subset (Union[slice, List[str], QueryCondition]) –

    A list of integer indices to subset the gene_annotation store.

    Alternatively, may provide a tiledb.QueryCondition to query the store.

    Alternatively, may provide a list of strings to match with the index of the gene_annotation store.

  • columns

    List of specific column names to access.

    Defaults to None, in which case all columns are extracted.

Return type:

DataFrame

Returns:

A pandas DataFrame of the subset.

get_matrix_subset(subset)[source]

Slice the assay matrices.

Parameters:

subset (Union[int, Sequence, tuple]) – Any slice supported by TileDB’s array slicing. For more info, refer to the TileDB docs (https://docs.tiledb.com/main/how-to/arrays/reading-arrays/basic-reading).

Return type:

DataFrame

Returns:

A dictionary containing the slice for each matrix in the path.

get_number_of_cells()[source]

Get number of cells.

Return type:

int

get_number_of_features()[source]

Get number of features.

Return type:

int

get_number_of_samples()[source]

Get number of samples.

Return type:

int
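The three count methods pair naturally with the context-manager support (__enter__ and __exit__) listed for this class. The sketch below shows the pattern against a stand-in object so that it runs without any TileDB files; FakeDataset and summarize are hypothetical names, and __enter__ returning the dataset itself is an assumption.

```python
from typing import Dict


class FakeDataset:
    """Stand-in exposing the documented count methods (not the real class)."""

    def __init__(self, n_cells: int, n_features: int, n_samples: int):
        self._counts = (n_cells, n_features, n_samples)

    def __enter__(self):
        return self  # assumption: the real class also returns itself

    def __exit__(self, exc_type, exc_val, exc_tb):
        return False  # do not suppress exceptions

    def get_number_of_cells(self) -> int:
        return self._counts[0]

    def get_number_of_features(self) -> int:
        return self._counts[1]

    def get_number_of_samples(self) -> int:
        return self._counts[2]


def summarize(dataset) -> Dict[str, int]:
    """Collect the three documented counts inside a ``with`` block."""
    with dataset as cd:
        return {
            "cells": cd.get_number_of_cells(),
            "features": cd.get_number_of_features(),
            "samples": cd.get_number_of_samples(),
        }


print(summarize(FakeDataset(n_cells=1000, n_features=200, n_samples=4)))
```

With a real dataset, summarize(CellArrDataset(dataset_path=...)) would follow the same shape, provided the assumption about __enter__ holds.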

get_sample_metadata_column(column_name)[source]

Access a column from the sample_metadata store.

Parameters:

column_name (str) – Name of the column or attribute. Usually one of the column names returned by get_sample_metadata_columns().

Return type:

DataFrame

Returns:

A list of values for this column.

get_sample_metadata_columns()[source]

Get column names from sample_metadata store.

Return type:

List[str]

Returns:

List of available metadata columns.

get_sample_metadata_index()[source]

Get index of the sample_metadata store.

Return type:

List[str]

Returns:

List of unique sample names.

get_sample_subset(subset, columns=None)[source]

Slice the sample_metadata store.

Parameters:
  • subset (Union[slice, QueryCondition]) –

    A list of integer indices to subset the sample_metadata store.

    Alternatively, may also provide a tiledb.QueryCondition to query the store.

  • columns

    List of specific column names to access.

    Defaults to None, in which case all columns are extracted.

Return type:

DataFrame

Returns:

A pandas DataFrame of the subset.

get_slice(cell_subset, gene_subset)[source]

Subset a CellArrDataset.

Parameters:
  • cell_subset (Union[slice, QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the rows (or cells) to retain.

  • gene_subset (Union[slice, List[str], QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the columns (or features/genes) to retain.

Return type:

CellArrDatasetSlice

Returns:

A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix for the given slice ranges.
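Because slices are documented as inclusive of their upper bound (see the note under __getitem__), a Python-style slice(0, 10) would cover eleven rows rather than ten. The helpers below are a minimal sketch for building slices with an exact row count; inclusive_slice and inclusive_length are hypothetical names, not part of cellarr.

```python
def inclusive_slice(start: int, count: int) -> slice:
    """Build a slice covering exactly ``count`` rows under inclusive upper bounds."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # With an inclusive upper bound, rows start..start+count-1 are covered.
    return slice(start, start + count - 1)


def inclusive_length(s: slice) -> int:
    """Number of rows covered by an inclusive slice with explicit bounds."""
    return s.stop - s.start + 1


first_hundred = inclusive_slice(0, 100)
print(first_hundred, inclusive_length(first_hundred))

# Possible use with a real dataset (not executed here):
#   result = cd.get_slice(first_hundred, ["gene_1", "gene_95"])
```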

classmethod initialize_from_paths(assay_uri, gene_annotation_uri, cell_metadata_uri, sample_metadata_uri, config_or_context=None)[source]

Initialize from absolute paths to all necessary tiledb files.

Parameters:
  • assay_uri (Union[str, List[str]]) – Absolute path to the matrix store, or a list of such paths.

  • gene_annotation_uri (str) – Absolute path to gene annotation store.

  • cell_metadata_uri (str) – Absolute path to cell metadata store.

  • sample_metadata_uri (str) – Absolute path to sample metadata store.

  • config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, default TileDB Config will be used.

Returns:

A CellArrDataset object.

itercells()[source]

Iterator over cells.

Return type:

CellArrCellIterator

itersamples()[source]

Iterator over samples.

Return type:

CellArrSampleIterator

property shape
class cellarr.CellArrDataset.CellArrSampleIterator(obj)[source]

Bases: object

Sample iterator to a CellArrDataset object.

__init__(obj)[source]

Initialize the iterator.

Parameters:

obj (CellArrDataset) – Source object to iterate.

__iter__()[source]
__next__()[source]
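The iterator classes expose only __iter__ and __next__, so what each step yields is not stated here. The following sketch shows one way such a sample iterator could be built on the documented accessor methods; SampleIteratorSketch and FakeDataset are hypothetical, and yielding a (sample name, cells) pair is an assumption.

```python
class SampleIteratorSketch:
    """Illustrative iterator over samples (not the library's class)."""

    def __init__(self, obj):
        self._obj = obj
        self._position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= self._obj.get_number_of_samples():
            raise StopIteration
        # Assumption: each step yields the sample name and its cells.
        name = self._obj.get_sample_metadata_index()[self._position]
        cells = self._obj.get_cells_for_sample(self._position)
        self._position += 1
        return name, cells


class FakeDataset:
    """Stand-in providing the three documented methods used above."""

    def __init__(self, cells_by_sample):
        self._cells_by_sample = dict(cells_by_sample)

    def get_number_of_samples(self):
        return len(self._cells_by_sample)

    def get_sample_metadata_index(self):
        return list(self._cells_by_sample)

    def get_cells_for_sample(self, sample):
        # Integer position lookup, as described for get_cells_for_sample().
        return list(self._cells_by_sample.values())[sample]


dataset = FakeDataset({"sample_1": [0, 1, 2], "sample_2": [3, 4]})
for name, cells in SampleIteratorSketch(dataset):
    print(name, cells)
```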

cellarr.CellArrDatasetSlice module

Class that represents a realized subset of the CellArrDataset.

This module provides a data class holding a slice, usually generated by the access methods of cellarr.CellArrDataset.CellArrDataset.

Example

from cellarr import CellArrDataset

cd = CellArrDataset(dataset_path="/path/to/cellar/dir")
gene_list = ["gene_1", "gene_95", "gene_50"]
result1 = cd[0, gene_list]

print(result1)
class cellarr.CellArrDatasetSlice.CellArrDatasetSlice(cell_metadata, gene_annotation, matrix)[source]

Bases: object

Class that represents a realized subset of the CellArrDataset.

__annotations__ = {'cell_metadata': <class 'pandas.core.frame.DataFrame'>, 'gene_annotation': <class 'pandas.core.frame.DataFrame'>, 'matrix': typing.Any}
__dataclass_fields__ = {'cell_metadata': Field(name='cell_metadata',type=<class 'pandas.core.frame.DataFrame'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'gene_annotation': Field(name='gene_annotation',type=<class 'pandas.core.frame.DataFrame'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'matrix': Field(name='matrix',type=typing.Any,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}
__dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)
__eq__(other)

Return self==value.

__hash__ = None
__init__(cell_metadata, gene_annotation, matrix)
__len__()[source]
__match_args__ = ('cell_metadata', 'gene_annotation', 'matrix')
__repr__()[source]
Return type:

str

Returns:

A string representation.

cell_metadata: DataFrame
gene_annotation: DataFrame
get_assays(transpose=False)[source]
matrix: Any
property shape
to_anndata()[source]

Convert the realized slice to AnnData.

to_summarizedexperiment()[source]

Convert the realized slice to SummarizedExperiment.
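The members listed above (three fields, a generated __eq__, __hash__ = None, plus shape and __len__) follow the usual pattern of a mutable Python dataclass. The sketch below reproduces that pattern with plain lists standing in for pandas DataFrames; SliceSketch is a hypothetical name, and deriving shape from the lengths of the two metadata tables is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class SliceSketch:
    """Illustrative container for a realized slice (not the library's class)."""

    cell_metadata: List[dict]    # stand-in for a pandas DataFrame
    gene_annotation: List[dict]  # stand-in for a pandas DataFrame
    matrix: Any

    @property
    def shape(self) -> Tuple[int, int]:
        # Assumption: (number of cells, number of genes).
        return (len(self.cell_metadata), len(self.gene_annotation))

    def __len__(self) -> int:
        return self.shape[0]


realized = SliceSketch(
    cell_metadata=[{"cell": "c1"}, {"cell": "c2"}],
    gene_annotation=[{"gene": "gene_1"}, {"gene": "gene_50"}, {"gene": "gene_95"}],
    matrix=[[0, 1, 0], [2, 0, 3]],
)
print(realized.shape, len(realized))
```

As with any dataclass that defines equality without being frozen, instances of this sketch are unhashable, which matches the __hash__ = None entry in the listing.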

Module contents