cellarr package
Subpackages
- cellarr.build package
- Submodules
- cellarr.build.build_cellarr_steps module
- SlurmBuilder: __init__(), create_array_script(), create_slurm_script(), submit_cell_metadata_job(), submit_final_assembly(), submit_gene_annotation_job(), submit_job(), submit_matrix_processing(), submit_sample_metadata_job()
- main()
- cellarr.build.build_cellarrdataset module
- cellarr.build.build_options module
- CellMetadataOptions: skip, dtype, tiledb_store_name, column_names, column_types, __annotations__, __dataclass_fields__, __dataclass_params__, __eq__(), __hash__, __init__(), __match_args__, __repr__()
- GeneAnnotationOptions: feature_column, skip, dtype, tiledb_store_name, column_types, __annotations__, __dataclass_fields__, __dataclass_params__, __eq__(), __hash__, __init__(), __match_args__, __repr__()
- MatrixOptions: matrix_name, matrix_attr_name, consolidate_duplicate_gene_func(), skip, dtype, tiledb_store_name, __annotations__, __dataclass_fields__, __dataclass_params__, __eq__(), __hash__, __init__(), __match_args__, __repr__()
- SampleMetadataOptions: skip, dtype, tiledb_store_name, column_types, __annotations__, __dataclass_fields__, __dataclass_params__, __eq__(), __hash__, __init__(), __match_args__, __repr__()
- cellarr.build.buildutils_tiledb_array module
- cellarr.build.buildutils_tiledb_frame module
- Module contents
- cellarr.ml package
- Submodules
- cellarr.ml.autoencoder module
- AutoEncoder: __annotations__, __init__(), configure_optimizers(), forward(), get_loss(), load_state(), on_validation_epoch_end(), on_validation_epoch_start(), save_all(), training_step(), validation_step()
- Decoder
- Encoder
- cellarr.ml.dataloader module
- Module contents
- cellarr.slurm package
- Submodules
- cellarr.slurm.final_assembly module
- cellarr.slurm.finalize_matrix module
- cellarr.slurm.process_cell_metadata module
- cellarr.slurm.process_gene_annotation module
- cellarr.slurm.process_matrix module
- cellarr.slurm.process_matrix_all module
- cellarr.slurm.process_sample_metadata module
- Module contents
- cellarr.utils package
Submodules
cellarr.CellArrDataset module
Query the CellArrDataset.
This class provides methods to access the directory containing the TileDB files, usually generated by build_cellarrdataset().
Example
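from cellarr import CellArrDataset

# Open an existing cellarr directory (placeholder path).
cd = CellArrDataset(dataset_path="/path/to/cellar/dir")

gene_list = ["gene_1", "gene_95", "gene_50"]

# Subset the first cell (index 0) and the listed genes;
# the result is a CellArrDatasetSlice.
result1 = cd[0, gene_list]
print(result1)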
- class cellarr.CellArrDataset.CellArrCellIterator(obj)
Bases: object
Cell iterator to a CellArrDataset object.
- __init__(obj)
Initialize the iterator.
- Parameters:
obj (CellArrDataset) – Source object to iterate.
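For readers unfamiliar with Python's iterator protocol, the following is a minimal sketch of how a cell iterator of this kind typically works. It assumes the source object supports integer indexing and reports its cell count as shape[0]; ToyDataset and CellIterator are hypothetical stand-ins written for this illustration, not the cellarr implementation.

```python
from typing import Any, Iterator, List, Tuple


class ToyDataset:
    """Hypothetical stand-in for a dataset: each row is one cell."""

    def __init__(self, rows: List[List[int]]):
        self._rows = rows

    @property
    def shape(self) -> Tuple[int, int]:
        # Assumed convention: (number of cells, number of genes).
        return (len(self._rows), len(self._rows[0]) if self._rows else 0)

    def __getitem__(self, idx: int) -> List[int]:
        return self._rows[idx]


class CellIterator:
    """Yield one cell at a time from a dataset-like object."""

    def __init__(self, obj: Any):
        self._obj = obj
        self._current = 0

    def __iter__(self) -> Iterator:
        return self

    def __next__(self):
        # Stop once every cell (row) has been visited.
        if self._current >= self._obj.shape[0]:
            raise StopIteration
        cell = self._obj[self._current]
        self._current += 1
        return cell


cells = list(CellIterator(ToyDataset([[1, 0], [0, 2], [3, 3]])))
print(cells)  # [[1, 0], [0, 2], [3, 3]]
```

Keeping the position counter on the iterator, rather than on the dataset, lets several independent iterations run over the same source object.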
- class cellarr.CellArrDataset.CellArrDataset(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)
Bases: object
A class that represents a collection of cells and their associated metadata in a TileDB-backed store.
- __getitem__(args)
Subset a CellArrDataset. Mostly an alias to get_slice().
- Parameters:
args (Union[int, Sequence, tuple]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted.
Alternatively, a tuple of length 1, whose first entry specifies the rows (or cells) to retain based on their names or indices.
Alternatively, a tuple of length 2, whose first entry specifies the rows (or cells) to retain and whose second entry specifies the columns (or features/genes) to retain, based on their names or indices.
Note
Slices are inclusive of the upper bounds. This is the default TileDB behavior.
- Raises:
ValueError – If too many or too few slices are provided.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.
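The argument rules and the inclusive-bound note for __getitem__ can be restated in plain Python. The helpers normalize_getitem_args, half_open_to_inclusive and inclusive_count are hypothetical, written only for this illustration; they mirror the documented dispatch (bare index, 1-tuple, 2-tuple, otherwise ValueError) and show that, under inclusive semantics, a slice written as 0:10 covers 11 rows.

```python
from typing import Any, Tuple


def normalize_getitem_args(args: Any) -> Tuple[Any, Any]:
    """Split __getitem__-style args into (cell_subset, gene_subset).

    A gene_subset of None stands for "keep all genes".
    """
    if isinstance(args, tuple):
        if len(args) == 1:
            return args[0], None
        if len(args) == 2:
            return args[0], args[1]
        # Zero or more than two entries is rejected, as documented.
        raise ValueError(f"Expected 1 or 2 slices, got {len(args)}.")
    # A bare int, slice or sequence selects rows (cells) only.
    return args, None


def half_open_to_inclusive(start: int, stop: int) -> slice:
    """Turn Python's half-open [start, stop) into the inclusive slice
    that selects the same rows under TileDB-style semantics."""
    return slice(start, stop - 1)


def inclusive_count(s: slice) -> int:
    """Rows selected by an inclusive slice (unit step assumed)."""
    return s.stop - s.start + 1


print(normalize_getitem_args((0, ["gene_1", "gene_95"])))  # (0, ['gene_1', 'gene_95'])
print(inclusive_count(slice(0, 10)))  # 11
print(half_open_to_inclusive(0, 10))  # slice(0, 9, None)
```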
- __init__(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)
Initialize a CellArrDataset.
- Parameters:
dataset_path (str) – Path to the directory containing the TileDB stores, usually the output_path of build_cellarrdataset(). Any TileDB-compatible base path (e.g. a local directory, S3, MinIO) may be provided.
assay_tiledb_group (str) – TileDB group containing the assay matrices. If the provided build process was used, the matrices are stored in the “assays” TileDB group. May be an empty string or None to specify no group; this is mostly for backwards compatibility with cellarr builds from versions before 0.3.
assay_uri (Union[str, List[str]]) – Relative path(s) to the matrix store(s). Must be in the TileDB group specified by assay_tiledb_group.
gene_annotation_uri (str) – Relative path to the gene annotation store.
cell_metadata_uri (str) – Relative path to the cell metadata store.
sample_metadata_uri (str) – Relative path to the sample metadata store.
config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, the default TileDB Config is used.
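To make the relationship between dataset_path, assay_tiledb_group and the relative URIs concrete, here is a sketch of how the full store locations could be composed. It assumes each store sits directly under dataset_path and the assay matrices under dataset_path/assay_tiledb_group; that layout is inferred from the parameter descriptions above, and resolve_store_uris is a hypothetical helper, not part of cellarr.

```python
from typing import Dict, List, Optional, Union


def resolve_store_uris(
    dataset_path: str,
    assay_tiledb_group: Optional[str] = "assays",
    assay_uri: Union[str, List[str]] = "counts",
    gene_annotation_uri: str = "gene_annotation",
    cell_metadata_uri: str = "cell_metadata",
    sample_metadata_uri: str = "sample_metadata",
) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: build full store URIs from a base path.

    Joins with "/" so object-store URIs such as "s3://bucket/prefix"
    are handled the same way as local paths.
    """
    base = dataset_path.rstrip("/")
    # An empty string or None means the matrices live directly under base
    # (the pre-0.3 layout described above).
    assay_base = f"{base}/{assay_tiledb_group}" if assay_tiledb_group else base
    assays = [assay_uri] if isinstance(assay_uri, str) else list(assay_uri)
    return {
        "assays": [f"{assay_base}/{a}" for a in assays],
        "gene_annotation": f"{base}/{gene_annotation_uri}",
        "cell_metadata": f"{base}/{cell_metadata_uri}",
        "sample_metadata": f"{base}/{sample_metadata_uri}",
    }


print(resolve_store_uris("s3://bucket/cellarr/")["assays"])
```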
- get_cell_metadata_column(column_name)
Access a column from the cell_metadata store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_cell_metadata_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_cell_subset(subset, columns=None)
Slice the cell_metadata store.
- Parameters:
subset (Union[slice, QueryCondition]) – A list of integer indices to subset the cell_metadata store. Alternatively, a tiledb.QueryCondition may be provided to query the store.
columns – List of specific column names to access. Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
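The two subsetting modes (integer positions or a query) plus optional column selection can be illustrated with an in-memory analogy. In this sketch a Python predicate stands in for tiledb.QueryCondition, which TileDB evaluates inside the storage engine, and a dict of lists stands in for the DataFrame; subset_table and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union

Table = Dict[str, List]  # column name -> column values


def subset_table(
    table: Table,
    subset: Union[Sequence[int], Callable[[Dict], bool]],
    columns: Optional[List[str]] = None,
) -> Table:
    """Select rows by integer positions or by a row predicate,
    then keep only the requested columns (all columns if None)."""
    keep = list(table) if columns is None else columns
    n_rows = len(next(iter(table.values()), []))
    if callable(subset):
        # Predicate mode: the in-memory analogue of a QueryCondition.
        rows = [
            i for i in range(n_rows)
            if subset({name: table[name][i] for name in table})
        ]
    else:
        # Positional mode: explicit integer row indices.
        rows = list(subset)
    return {name: [table[name][i] for i in rows] for name in keep}


cells = {"cellarr_sample": ["s1", "s1", "s2"], "n_genes": [120, 80, 300]}
print(subset_table(cells, lambda r: r["n_genes"] > 100, columns=["n_genes"]))
```

Restricting columns after row selection mirrors the idea of the columns argument: fetch only the attributes you need.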
- get_cells_for_sample(sample)
Slice and access all cells for a sample.
- Parameters:
sample (Union[int, str]) – A string specifying the sample index value to access. This must be a value in the cellarr_sample column. Alternatively, an integer index may be provided to access the sample at the given position.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.
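The following sketch shows one way the string-or-integer lookup described for get_cells_for_sample could resolve to cell positions. It assumes an ordered list of sample names (for the integer case) and a per-cell cellarr_sample column; cells_for_sample and its inputs are hypothetical, and the real method returns a CellArrDatasetSlice rather than positions.

```python
from typing import List, Union


def cells_for_sample(
    sample: Union[int, str],
    sample_names: List[str],
    cell_samples: List[str],
) -> List[int]:
    """Return positions of the cells belonging to one sample.

    sample_names: sample identifiers in sample-metadata order (assumed).
    cell_samples: the cellarr_sample value of each cell.
    """
    if isinstance(sample, int):
        # Positional lookup; raises IndexError if out of range.
        name = sample_names[sample]
    elif sample in sample_names:
        name = sample
    else:
        raise KeyError(f"Unknown sample: {sample!r}")
    return [i for i, s in enumerate(cell_samples) if s == name]


names = ["s1", "s2"]
per_cell = ["s1", "s2", "s1", "s2", "s2"]
print(cells_for_sample("s2", names, per_cell))  # [1, 3, 4]
```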
- get_gene_annotation_column(column_name)
Access a column from the gene_annotation store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_gene_annotation_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_gene_subset(subset, columns=None)
Slice the gene_annotation store.
- Parameters:
subset (Union[slice, List[str], QueryCondition]) – A list of integer indices to subset the gene_annotation store. Alternatively, a tiledb.QueryCondition may be provided to query the store. Alternatively, a list of strings may be provided to match against the index of the gene_annotation store.
columns – List of specific column names to access. Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
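When a list of gene names is passed, each name has to be matched against the index of the gene store. Below is a minimal sketch of that name-to-position mapping, using the gene names from the module example. gene_positions is hypothetical, and raising KeyError for unknown names is an assumption; the real method may handle missing genes differently.

```python
from typing import Dict, List


def gene_positions(gene_index: List[str], names: List[str]) -> List[int]:
    """Map gene names to integer row positions, preserving the order of
    `names`. Unknown names raise KeyError (an assumption for this sketch)."""
    # If gene_index contains duplicates, the last position wins.
    lookup: Dict[str, int] = {g: i for i, g in enumerate(gene_index)}
    missing = [n for n in names if n not in lookup]
    if missing:
        raise KeyError(f"Genes not found: {missing}")
    return [lookup[n] for n in names]


index = [f"gene_{i}" for i in range(100)]
print(gene_positions(index, ["gene_1", "gene_95", "gene_50"]))  # [1, 95, 50]
```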
- get_matrix_subset(subset)
Slice the assay matrix store(s).
- Parameters:
subset (Union[int, Sequence, tuple]) – Any slice supported by TileDB’s array slicing. For more information, refer to the TileDB docs (https://docs.tiledb.com/main/how-to/arrays/reading-arrays/basic-reading).
- Return type:
dict
- Returns:
A dictionary containing the slice for each matrix in the path.
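Because the result is a dictionary with one entry per matrix, the same selection is applied to every assay. This sketch models that with plain nested lists, using an inclusive row range to mimic TileDB's range semantics; matrix_subset and the assay names "counts" and "normalized" are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matrix_subset(
    assays: Dict[str, Matrix],
    rows: Tuple[int, int],
    cols: Sequence[int],
) -> Dict[str, Matrix]:
    """Apply one selection to every assay matrix.

    rows is an inclusive (start, stop) pair; cols lists column positions.
    """
    start, stop = rows
    return {
        name: [[m[r][c] for c in cols] for r in range(start, stop + 1)]
        for name, m in assays.items()
    }


assays = {
    "counts": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "normalized": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
}
print(matrix_subset(assays, (0, 1), [2, 0])["counts"])  # [[3, 1], [6, 4]]
```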
- get_sample_metadata_column(column_name)
Access a column from the sample_metadata store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_sample_metadata_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_sample_subset(subset, columns=None)
Slice the sample_metadata store.
- Parameters:
subset (Union[slice, QueryCondition]) – A list of integer indices to subset the sample_metadata store. Alternatively, a tiledb.QueryCondition may be provided to query the store.
columns – List of specific column names to access. Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
- get_slice(cell_subset, gene_subset)
Subset a CellArrDataset.
- Parameters:
cell_subset (Union[slice, QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the rows (or cells) to retain.
gene_subset (Union[slice, List[str], QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the columns (or features/genes) to retain.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix for the given slice ranges.
- classmethod initialize_from_paths(assay_uri, gene_annotation_uri, cell_metadata_uri, sample_metadata_uri, config_or_context=None)
Initialize from absolute paths to all necessary TileDB stores.
- Parameters:
assay_uri (Union[str, List[str]]) – Absolute path(s) to the matrix store(s).
gene_annotation_uri (str) – Absolute path to the gene annotation store.
cell_metadata_uri (str) – Absolute path to the cell metadata store.
sample_metadata_uri (str) – Absolute path to the sample metadata store.
config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, the default TileDB Config is used.
- Returns:
A CellArrDataset object.
- property shape
cellarr.CellArrDatasetSlice module
Class that represents a realized subset of the CellArrDataset.
This class provides a slice data class, usually generated by the access methods of cellarr.CellArrDataset.CellArrDataset.
- class cellarr.CellArrDatasetSlice.CellArrDatasetSlice(cell_metadata, gene_annotation, matrix)
Bases: object
Class that represents a realized subset of the CellArrDataset.
- __annotations__ = {'cell_metadata': pandas.DataFrame, 'gene_annotation': pandas.DataFrame, 'matrix': typing.Any}
- __dataclass_fields__: one dataclasses.Field per attribute (cell_metadata, gene_annotation, matrix). None has a default; all are positional (kw_only=False) and included in __init__, __repr__ and comparisons.
- __dataclass_params__ = _DataclassParams(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(cell_metadata, gene_annotation, matrix)
- __match_args__ = ('cell_metadata', 'gene_annotation', 'matrix')
- property shape
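The members listed above (__hash__ = None, __match_args__ and the _DataclassParams values) are what Python's @dataclass decorator generates with its default settings. The stdlib sketch below reproduces that behavior with plain lists instead of DataFrames. ToySlice is hypothetical, and its shape convention of (number of cells, number of genes) is an assumption, since the shape property is undocumented here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ToySlice:
    """Hypothetical stand-in for CellArrDatasetSlice using plain lists."""

    cell_metadata: List[Dict[str, Any]]
    gene_annotation: List[Dict[str, Any]]
    matrix: Any

    @property
    def shape(self) -> Tuple[int, int]:
        # Assumed convention: (number of cells, number of genes).
        return (len(self.cell_metadata), len(self.gene_annotation))


s = ToySlice(
    cell_metadata=[{"cellarr_sample": "s1"}, {"cellarr_sample": "s2"}],
    gene_annotation=[{"gene": "gene_1"}],
    matrix=[[5], [0]],
)
print(s.shape)  # (2, 1)

# eq=True with frozen=False and unsafe_hash=False makes @dataclass set
# __hash__ to None, so instances cannot be dict keys or set members.
print(ToySlice.__hash__ is None)  # True
```

On Python 3.10 and later, the generated __match_args__ also allows positional class patterns such as `case ToySlice(cells, genes, _):` in a match statement.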