cellarr package
Subpackages
- cellarr.build package
- Submodules
- cellarr.build.build_cellarr_steps module
SlurmBuilder
SlurmBuilder.__init__()
SlurmBuilder.create_array_script()
SlurmBuilder.create_slurm_script()
SlurmBuilder.submit_cell_metadata_job()
SlurmBuilder.submit_final_assembly()
SlurmBuilder.submit_gene_annotation_job()
SlurmBuilder.submit_job()
SlurmBuilder.submit_matrix_processing()
SlurmBuilder.submit_sample_metadata_job()
main()
- cellarr.build.build_cellarrdataset module
- cellarr.build.build_options module
CellMetadataOptions
CellMetadataOptions.skip
CellMetadataOptions.dtype
CellMetadataOptions.tiledb_store_name
CellMetadataOptions.column_names
CellMetadataOptions.column_types
CellMetadataOptions.__annotations__
CellMetadataOptions.__dataclass_fields__
CellMetadataOptions.__dataclass_params__
CellMetadataOptions.__eq__()
CellMetadataOptions.__hash__
CellMetadataOptions.__init__()
CellMetadataOptions.__match_args__
CellMetadataOptions.__repr__()
GeneAnnotationOptions
GeneAnnotationOptions.feature_column
GeneAnnotationOptions.skip
GeneAnnotationOptions.dtype
GeneAnnotationOptions.tiledb_store_name
GeneAnnotationOptions.column_types
GeneAnnotationOptions.__annotations__
GeneAnnotationOptions.__dataclass_fields__
GeneAnnotationOptions.__dataclass_params__
GeneAnnotationOptions.__eq__()
GeneAnnotationOptions.__hash__
GeneAnnotationOptions.__init__()
GeneAnnotationOptions.__match_args__
GeneAnnotationOptions.__repr__()
MatrixOptions
MatrixOptions.matrix_name
MatrixOptions.matrix_attr_name
MatrixOptions.consolidate_duplicate_gene_func
MatrixOptions.skip
MatrixOptions.dtype
MatrixOptions.tiledb_store_name
MatrixOptions.__annotations__
MatrixOptions.__dataclass_fields__
MatrixOptions.__dataclass_params__
MatrixOptions.__eq__()
MatrixOptions.__hash__
MatrixOptions.__init__()
MatrixOptions.__match_args__
MatrixOptions.__repr__()
MatrixOptions.consolidate_duplicate_gene_func()
SampleMetadataOptions
SampleMetadataOptions.skip
SampleMetadataOptions.dtype
SampleMetadataOptions.tiledb_store_name
SampleMetadataOptions.column_types
SampleMetadataOptions.__annotations__
SampleMetadataOptions.__dataclass_fields__
SampleMetadataOptions.__dataclass_params__
SampleMetadataOptions.__eq__()
SampleMetadataOptions.__hash__
SampleMetadataOptions.__init__()
SampleMetadataOptions.__match_args__
SampleMetadataOptions.__repr__()
- cellarr.build.buildutils_tiledb_array module
- cellarr.build.buildutils_tiledb_frame module
- Module contents
- cellarr.ml package
- Submodules
- cellarr.ml.autoencoder module
AutoEncoder
AutoEncoder.__annotations__
AutoEncoder.__init__()
AutoEncoder.configure_optimizers()
AutoEncoder.forward()
AutoEncoder.get_loss()
AutoEncoder.load_state()
AutoEncoder.on_validation_epoch_end()
AutoEncoder.on_validation_epoch_start()
AutoEncoder.save_all()
AutoEncoder.training_step()
AutoEncoder.validation_step()
Decoder
Encoder
- cellarr.ml.dataloader module
- Module contents
- cellarr.slurm package
- Submodules
- cellarr.slurm.final_assembly module
- cellarr.slurm.finalize_matrix module
- cellarr.slurm.process_cell_metadata module
- cellarr.slurm.process_gene_annotation module
- cellarr.slurm.process_matrix module
- cellarr.slurm.process_matrix_all module
- cellarr.slurm.process_sample_metadata module
- Module contents
- cellarr.utils package
Submodules
cellarr.CellArrDataset module
Query the CellArrDataset.
This class provides methods to access the directory containing the TileDB files, usually generated with build_cellarrdataset().
Example
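from cellarr import CellArrDataset

# Open an existing cellarr directory (placeholder path).
cd = CellArrDataset(dataset_path="/path/to/cellar/dir")

# Select the first cell (row 0), restricted to three genes by name.
gene_list = ["gene_1", "gene_95", "gene_50"]
result1 = cd[0, gene_list]
print(result1)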
- class cellarr.CellArrDataset.CellArrCellIterator(obj)
Bases: object
Cell iterator to a CellArrDataset object.
- __init__(obj)
Initialize the iterator.
- Parameters:
obj (CellArrDataset) – Source object to iterate.
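To illustrate the iterator protocol a cell iterator follows, here is a minimal sketch. It assumes, hypothetically, that the source object exposes a `shape` of `(n_cells, n_genes)` and returns one cell for `obj[i]`; the class `CellIteratorSketch` and the stand-in `FakeDataset` are made up for this example, and cellarr's own CellArrCellIterator may yield different objects.

```python
from typing import Any, Iterator


class CellIteratorSketch:
    """Hypothetical cell-by-cell iterator; not cellarr's implementation."""

    def __init__(self, obj: Any) -> None:
        # Assumption: obj.shape is (n_cells, n_genes) and obj[i] returns one cell.
        self._obj = obj
        self._current = 0

    def __iter__(self) -> Iterator[Any]:
        return self

    def __next__(self) -> Any:
        # Stop once every row (cell) has been visited.
        if self._current >= self._obj.shape[0]:
            raise StopIteration
        cell = self._obj[self._current]
        self._current += 1
        return cell


class FakeDataset:
    """Stand-in for a dataset with three cells, for demonstration only."""

    shape = (3, 2)

    def __getitem__(self, i: int) -> str:
        return f"cell_{i}"


print(list(CellIteratorSketch(FakeDataset())))
# prints ['cell_0', 'cell_1', 'cell_2']
```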
- class cellarr.CellArrDataset.CellArrDataset(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)
Bases: object
A class that represents a collection of cells and their associated metadata in a TileDB-backed store.
- __getitem__(args)
Subset a CellArrDataset. Mostly an alias for get_slice().
- Parameters:
args (Union[int, Sequence, tuple]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the ranges to be extracted.
Alternatively, a tuple of length 1. The first entry specifies the rows (or cells) to retain based on their names or indices.
Alternatively, a tuple of length 2. The first entry specifies the rows (or cells) to retain, while the second entry specifies the columns (or features/genes) to retain, based on their names or indices.
Note
Slices are inclusive of the upper bounds. This is the default TileDB behavior.
- Raises:
ValueError – If too many or too few slices are provided.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.
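The argument rules above can be sketched as a small dispatch function. This is a minimal illustration, not cellarr's actual code: the helper name `split_subset_args` is hypothetical, and returning `None` for the gene subset to mean "all genes" is an assumption of this sketch.

```python
from typing import Any, Optional, Sequence, Tuple, Union


def split_subset_args(
    args: Union[int, Sequence, tuple],
) -> Tuple[Any, Optional[Any]]:
    """Hypothetical helper: split __getitem__ args into (cell_subset, gene_subset).

    A gene_subset of None stands for "keep all genes" in this sketch.
    """
    if isinstance(args, tuple):
        if len(args) == 1:
            # Tuple of length 1: only the rows (cells) are specified.
            return args[0], None
        if len(args) == 2:
            # Tuple of length 2: rows (cells) first, then columns (genes).
            return args[0], args[1]
        # Empty tuples and tuples longer than 2 are rejected.
        raise ValueError(f"Expected 1 or 2 slices, got {len(args)}.")
    # A bare int, slice, or list selects rows (cells) only.
    return args, None
```

For example, `split_subset_args((0, ["gene_1", "gene_95"]))` returns `(0, ["gene_1", "gene_95"])`, while a three-element tuple raises `ValueError`, mirroring the documented "too many or too few slices" case.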
- __init__(dataset_path, assay_tiledb_group='assays', assay_uri='counts', gene_annotation_uri='gene_annotation', cell_metadata_uri='cell_metadata', sample_metadata_uri='sample_metadata', config_or_context=None)
Initialize a CellArrDataset.
- Parameters:
dataset_path (str) – Path to the directory containing the TileDB stores. Usually the output_path from build_cellarrdataset().
You may provide any TileDB-compatible base path (e.g. local directory, S3, MinIO, etc.).
assay_tiledb_group (str) – TileDB group containing the assay matrices.
If the provided build process was used, the matrices are stored in the “assays” TileDB group.
May be an empty string or None to specify no group. This is mostly for backwards compatibility with cellarr builds from versions before 0.3.
assay_uri (Union[str, List[str]]) – Relative path to the matrix store, or a list of such paths. Must be in the TileDB group specified by assay_tiledb_group.
gene_annotation_uri (str) – Relative path to the gene annotation store.
cell_metadata_uri (str) – Relative path to the cell metadata store.
sample_metadata_uri (str) – Relative path to the sample metadata store.
config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, the default TileDB Config will be used.
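Read together, these parameters describe a directory layout. The sketch below shows one plausible way the relative names could combine with dataset_path. It assumes "/"-joined URIs (which also covers s3:// style paths) and that the gene, cell and sample stores sit directly under dataset_path; the helper `resolve_store_uris` is hypothetical and not part of cellarr.

```python
import posixpath
from typing import Dict, List, Optional, Union


def resolve_store_uris(
    dataset_path: str,
    assay_tiledb_group: Optional[str] = "assays",
    assay_uri: Union[str, List[str]] = "counts",
    gene_annotation_uri: str = "gene_annotation",
    cell_metadata_uri: str = "cell_metadata",
    sample_metadata_uri: str = "sample_metadata",
) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: turn relative store names into full URIs."""
    # Assumption: an empty string or None means "no group" (pre-0.3 builds).
    assay_base = (
        posixpath.join(dataset_path, assay_tiledb_group)
        if assay_tiledb_group
        else dataset_path
    )
    # Accept a single matrix name or a list of them.
    assays = [assay_uri] if isinstance(assay_uri, str) else list(assay_uri)
    return {
        "assays": [posixpath.join(assay_base, name) for name in assays],
        "gene_annotation": posixpath.join(dataset_path, gene_annotation_uri),
        "cell_metadata": posixpath.join(dataset_path, cell_metadata_uri),
        "sample_metadata": posixpath.join(dataset_path, sample_metadata_uri),
    }
```

With the defaults, `resolve_store_uris("/data/out")["assays"]` gives `["/data/out/assays/counts"]`; passing `assay_tiledb_group=None` drops the group level, as an older build without the assays group would require.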
- get_cell_metadata_column(column_name)
Access a column from the cell_metadata store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_cell_metadata_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_cell_subset(subset, columns=None)
Slice the cell_metadata store.
- Parameters:
subset (Union[slice, QueryCondition]) – A slice or a list of integer indices to subset the cell_metadata store.
Alternatively, a tiledb.QueryCondition may be provided to query the store.
columns – List of specific column names to access.
Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
- get_cells_for_sample(sample)
Slice and access all cells for a sample.
- Parameters:
sample (Union[int, str]) – A string specifying the sample to access. This must be a value in the cellarr_sample column.
Alternatively, an integer index may be provided to access the sample at the given position.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix.
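The name-or-position lookup described above can be sketched in plain Python. For illustration only, this assumes the sample identifiers are available in sample-metadata order and that each cell records its sample as a cellarr_sample value; the helper `cells_for_sample` and its list inputs are hypothetical stand-ins for the TileDB stores. The real method returns a CellArrDatasetSlice rather than row indices.

```python
from typing import List, Union


def cells_for_sample(
    sample: Union[int, str],
    sample_names: List[str],
    cell_samples: List[str],
) -> List[int]:
    """Hypothetical helper: return row indices of the cells in one sample.

    sample_names: sample identifiers, in sample_metadata order.
    cell_samples: the cellarr_sample value of each cell, in cell order.
    """
    # An integer selects the sample at that position; a string is used as-is.
    name = sample_names[sample] if isinstance(sample, int) else sample
    if name not in sample_names:
        raise KeyError(f"Unknown sample: {name!r}")
    # Keep every cell whose cellarr_sample value matches the resolved name.
    return [i for i, s in enumerate(cell_samples) if s == name]
```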
- get_gene_annotation_column(column_name)
Access a column from the gene_annotation store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_gene_annotation_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_gene_subset(subset, columns=None)
Slice the gene_annotation store.
- Parameters:
subset (Union[slice, List[str], QueryCondition]) – A slice or a list of integer indices to subset the gene_annotation store.
Alternatively, a tiledb.QueryCondition may be provided to query the store.
Alternatively, a list of strings may be provided to match against the index of the gene_annotation store.
columns – List of specific column names to access.
Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
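Matching a list of gene names against the index amounts to a name-to-position lookup. A minimal sketch, assuming the gene_annotation index is available as a list of names; `gene_positions` is a hypothetical helper. Whether cellarr preserves the requested order or raises on unknown names is not stated here, so both behaviors below are choices of this sketch.

```python
from typing import Dict, List


def gene_positions(requested: List[str], gene_index: List[str]) -> List[int]:
    """Hypothetical helper: map gene names to row positions in gene_annotation.

    Positions are returned in the order the genes were requested.
    """
    # Build the name -> position lookup once (last occurrence wins on duplicates).
    lookup: Dict[str, int] = {name: pos for pos, name in enumerate(gene_index)}
    missing = [g for g in requested if g not in lookup]
    if missing:
        # Assumption: this sketch fails loudly; cellarr may handle misses differently.
        raise KeyError(f"Genes not found: {missing}")
    return [lookup[g] for g in requested]


index = [f"gene_{i}" for i in range(100)]
print(gene_positions(["gene_1", "gene_95", "gene_50"], index))
# prints [1, 95, 50]
```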
- get_matrix_subset(subset)
Slice the assay matrices.
- Parameters:
subset (Union[int, Sequence, tuple]) – Any slice supported by TileDB’s array slicing. For more info, refer to the TileDB docs (https://docs.tiledb.com/main/how-to/arrays/reading-arrays/basic-reading).
- Return type:
dict
- Returns:
A dictionary containing the slice for each matrix in the path.
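As noted for __getitem__, slices here follow TileDB's inclusive upper bound rather than Python's half-open convention, so translating a Python-style range needs an off-by-one adjustment. A minimal sketch, assuming only contiguous (step 1) slices; `to_inclusive_bounds` is a hypothetical helper, not part of cellarr or TileDB.

```python
from typing import Tuple


def to_inclusive_bounds(s: slice, length: int) -> Tuple[int, int]:
    """Hypothetical helper: turn a Python (half-open) slice into inclusive bounds.

    slice(0, 10) covers rows 0..9 in Python, but an inclusive range [0, 10]
    would also return row 10, so the stop is reduced by one.
    """
    # Normalize None and negative values against the axis length.
    start, stop, step = s.indices(length)
    if step != 1:
        raise ValueError("Only contiguous slices are handled in this sketch.")
    if stop <= start:
        raise ValueError("Empty slice.")
    return start, stop - 1


print(to_inclusive_bounds(slice(0, 10), 100))
# prints (0, 9)
```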
- get_sample_metadata_column(column_name)
Access a column from the sample_metadata store.
- Parameters:
column_name (str) – Name of the column or attribute. Usually one of the column names from get_sample_metadata_columns().
- Return type:
list
- Returns:
A list of values for this column.
- get_sample_subset(subset, columns=None)
Slice the sample_metadata store.
- Parameters:
subset (Union[slice, QueryCondition]) – A slice or a list of integer indices to subset the sample_metadata store.
Alternatively, a tiledb.QueryCondition may be provided to query the store.
columns – List of specific column names to access.
Defaults to None, in which case all columns are extracted.
- Return type:
pandas.DataFrame
- Returns:
A pandas DataFrame of the subset.
- get_slice(cell_subset, gene_subset)
Subset a CellArrDataset.
- Parameters:
cell_subset (Union[slice, QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the rows (or cells) to retain.
gene_subset (Union[slice, List[str], QueryCondition]) – Integer indices, a boolean filter, or (if the current object is named) names specifying the columns (or features/genes) to retain.
- Return type:
CellArrDatasetSlice
- Returns:
A CellArrDatasetSlice object containing the cell_metadata, gene_annotation and the matrix for the given slice ranges.
- classmethod initialize_from_paths(assay_uri, gene_annotation_uri, cell_metadata_uri, sample_metadata_uri, config_or_context=None)
Initialize from absolute paths to all necessary TileDB stores.
- Parameters:
assay_uri (Union[str, List[str]]) – Absolute path to the matrix store, or a list of such paths.
gene_annotation_uri (str) – Absolute path to the gene annotation store.
cell_metadata_uri (str) – Absolute path to the cell metadata store.
sample_metadata_uri (str) – Absolute path to the sample metadata store.
config_or_context (Union[Config, Ctx, None]) – Custom TileDB configuration or context. If None, the default TileDB Config will be used.
- Returns:
A CellArrDataset object.
- property shape
cellarr.CellArrDatasetSlice module
Class that represents a realized subset of the CellArrDataset.
This module provides a slice dataclass, usually produced by the access methods of cellarr.CellArrDataset.CellArrDataset().
Example
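from cellarr import CellArrDataset

# Open an existing cellarr directory (placeholder path).
cd = CellArrDataset(dataset_path="/path/to/cellar/dir")

# Select the first cell (row 0), restricted to three genes by name.
gene_list = ["gene_1", "gene_95", "gene_50"]
result1 = cd[0, gene_list]
print(result1)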
- class cellarr.CellArrDatasetSlice.CellArrDatasetSlice(cell_metadata, gene_annotation, matrix)
Bases: object
Class that represents a realized subset of the CellArrDataset.
- __annotations__ = {'cell_metadata': <class 'pandas.core.frame.DataFrame'>, 'gene_annotation': <class 'pandas.core.frame.DataFrame'>, 'matrix': typing.Any}
- __dataclass_fields__ = {'cell_metadata': Field(name='cell_metadata',type=<class 'pandas.core.frame.DataFrame'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'gene_annotation': Field(name='gene_annotation',type=<class 'pandas.core.frame.DataFrame'>,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'matrix': Field(name='matrix',type=typing.Any,default=<dataclasses._MISSING_TYPE object>,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(cell_metadata, gene_annotation, matrix)
- __match_args__ = ('cell_metadata', 'gene_annotation', 'matrix')
- property shape
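The generated members listed above (__init__, __eq__, __match_args__ and __hash__ = None) are what Python's dataclasses module produces for a dataclass with eq=True and frozen=False, as recorded in __dataclass_params__. The sketch below reproduces that pattern with plain lists in place of pandas DataFrames. The class name `SliceSketch` is hypothetical, and deriving shape as (cells, genes) from the metadata lengths is an assumption, since the shape property is not described here.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class SliceSketch:
    """Hypothetical stand-in for CellArrDatasetSlice using plain lists."""

    cell_metadata: List[dict]
    gene_annotation: List[dict]
    matrix: Any

    @property
    def shape(self) -> Tuple[int, int]:
        # Assumption: shape is (number of cells, number of genes).
        return len(self.cell_metadata), len(self.gene_annotation)


s = SliceSketch([{"cell": "c1"}, {"cell": "c2"}], [{"gene": "g1"}] * 3, None)
print(s.shape)
# prints (2, 3)
```

Because eq=True and frozen=False, the dataclass machinery sets __hash__ to None, so instances compare by field values but cannot be used as dictionary keys or set members.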