cellarr.utils package
Submodules
cellarr.utils.queryutils_tiledb_frame module
- cellarr.utils.queryutils_tiledb_frame.get_a_column(tiledb_obj, column_name)
 Access column(s) from the TileDB object.
- cellarr.utils.queryutils_tiledb_frame.get_index(tiledb_obj)
 Get the index of the TileDB object.
- cellarr.utils.queryutils_tiledb_frame.get_schema_names_frame(tiledb_obj)
 Get attributes from a TileDB object.
- cellarr.utils.queryutils_tiledb_frame.subset_array(tiledb_obj, row_subset, column_subset, shape)
 Subset a TileDB object storing array data.
 Uses multi_index to slice.
- Parameters:
 - Return type:
 - Returns:
 If the TileDB object is sparse, returns a sparse array in COO format; otherwise, returns a NumPy array.
- cellarr.utils.queryutils_tiledb_frame.subset_frame(tiledb_obj, subset, columns, primary_key_column_name=None)
 Subset a TileDB object.
- Parameters:
 tiledb_obj (Array) – TileDB object to subset.
 subset – A slice defining the subset. Alternatively, may also provide a TileDB query expression.
 columns (list) – List specifying the attributes from the schema to extract.
 primary_key_column_name (str) – The primary key to filter for matches when a QueryCondition is used.
- Return type:
 - Returns:
 A sliced DataFrame with the subset.
cellarr.utils.utils_anndata module
- cellarr.utils.utils_anndata.consolidate_duplicate_symbols(matrix, feature_ids, consolidate_duplicate_gene_func)
 Consolidate duplicate gene symbols.
- Parameters:
 matrix (Any) – Data matrix with rows for cells and columns for genes.
 feature_ids (List[str]) – List of feature ids along the column axis of the matrix.
 consolidate_duplicate_gene_func (callable) – Function to consolidate when the AnnData object contains multiple rows with the same feature id or gene symbol. Defaults to sum().
- Return type:
 - Returns:
 AnnData object with duplicate gene symbols consolidated.
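
To make the consolidation step concrete, here is a minimal pure-Python sketch of merging columns that share a feature id. It is an illustration of the idea only, not cellarr's implementation: the helper name `consolidate_columns` and the list-of-lists matrix are assumptions chosen to keep the example dependency-free.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def consolidate_columns(
    matrix: Sequence[Sequence[float]],
    feature_ids: List[str],
    combine: Callable = sum,
) -> Tuple[List[List[float]], List[str]]:
    """Merge columns sharing a feature id (hypothetical helper, not cellarr code)."""
    # Group column positions by feature id, keeping first-seen order.
    positions: Dict[str, List[int]] = defaultdict(list)
    for col, fid in enumerate(feature_ids):
        positions[fid].append(col)

    unique_ids = list(positions)  # dicts preserve insertion order
    # For every row (cell), combine the values of all columns with the same id.
    merged = [
        [combine(row[c] for c in positions[fid]) for fid in unique_ids]
        for row in matrix
    ]
    return merged, unique_ids


# Two cells (rows) and three columns; "TP53" appears twice.
cells = [[1, 2, 3], [0, 5, 4]]
merged, ids = consolidate_columns(cells, ["TP53", "ACTB", "TP53"])
```

Passing a different callable, such as `max`, changes how duplicates are combined, which mirrors the role of `consolidate_duplicate_gene_func` described above.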
- cellarr.utils.utils_anndata.extract_anndata_info(h5ad_or_adata, var_feature_column='index', var_subset_columns=None, obs_subset_columns=None, num_threads=1)
 Extract and generate the list of unique feature identifiers and cell counts across files.
- Parameters:
 h5ad_or_adata (List[Union[str, AnnData]]) – List of AnnData objects or paths to h5ad files.
 var_feature_column (str) – Column containing the feature ids (e.g. gene symbols). Defaults to “index”.
 var_subset_columns (List[str]) – List of var columns to concatenate across all files. Defaults to None and no metadata columns will be extracted.
 obs_subset_columns (dict) – List of obs columns to concatenate across all files. Defaults to None and no metadata columns will be extracted.
 num_threads (int) – Number of threads to use. Defaults to 1.
- cellarr.utils.utils_anndata.remap_anndata(h5ad_or_adata, feature_set_order, var_feature_column='index', layer_matrix_name='counts', consolidate_duplicate_gene_func=<built-in function sum>)
 Extract and remap the count matrix to the provided feature (gene) set order from the AnnData object.
- Parameters:
 h5ad_or_adata – Input AnnData object. Alternatively, may also provide a path to the h5ad file. The index of the var slot must contain the feature ids for the columns in the matrix.
 feature_set_order (dict) – A dictionary with the feature ids (e.g. gene symbols) as keys and their index as value. The feature ids from the AnnData object are remapped to the feature order from this dictionary.
 var_feature_column (str) – Column in var containing the feature ids (e.g. gene symbols). Defaults to the index of the var slot.
 layer_matrix_name (Union[str, List[str]]) – Layer containing the matrix to add to TileDB. Defaults to “counts”. Alternatively, may provide a list of layers to extract and add to TileDB.
 consolidate_duplicate_gene_func (Union[callable, List[callable]]) – Function to consolidate when the AnnData object contains multiple rows with the same feature id or gene symbol. Defaults to sum().
- Return type:
 - Returns:
 A dictionary mapping each layer name to a csr_matrix representation of the assay matrix.
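
The remapping described above can be sketched without any dependencies by re-indexing the column of each non-zero entry. This is a hedged illustration rather than the library's code: the helper `remap_columns`, the `(row, column, value)` triplet layout, and the choice to drop features missing from `feature_set_order` are all assumptions.

```python
from typing import Dict, List, Tuple


def remap_columns(
    entries: List[Tuple[int, int, float]],
    feature_ids: List[str],
    feature_set_order: Dict[str, int],
) -> List[Tuple[int, int, float]]:
    """Re-index (row, col, value) triplets to a shared feature order (hypothetical helper)."""
    remapped = []
    for row, col, value in entries:
        # Look up where this file's feature lives in the shared order.
        target = feature_set_order.get(feature_ids[col])
        if target is None:
            continue  # assumption: features absent from the order are dropped
        remapped.append((row, target, value))
    return remapped


# Shared order across all files: feature id -> column index.
feature_set_order = {"ACTB": 0, "GAPDH": 1, "TP53": 2}
# This file only has two features, in a different order.
local_ids = ["TP53", "ACTB"]
local_entries = [(0, 0, 7.0), (0, 1, 2.0), (1, 1, 5.0)]
remapped = remap_columns(local_entries, local_ids, feature_set_order)
```

After remapping, every file's entries refer to the same column positions, so matrices from different files can be stacked row-wise.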
- cellarr.utils.utils_anndata.scan_for_cellcounts(cache)
 Extract cell counts across files.
 Requires calling extract_anndata_info() first.
- Parameters:
 cache – Info typically extracted by running extract_anndata_info().
- Return type:
 - Returns:
 List of cell counts across files.
- cellarr.utils.utils_anndata.scan_for_cellmetadata(cache)
 Extract and merge all cell metadata data frames across files.
 Requires calling extract_anndata_info() first.
- Parameters:
 cache – Info typically extracted by running extract_anndata_info().
- Return type:
 - Returns:
 A pandas.DataFrame containing all cell metadata.
- cellarr.utils.utils_anndata.scan_for_features(cache, unique=True)
 Extract and generate the list of unique feature identifiers across files.
 Requires calling extract_anndata_info() first.
- Parameters:
 cache – Info typically extracted by running extract_anndata_info().
 unique (bool) – Whether to reduce the gene list to unique entries.
- Return type:
 - Returns:
 List of all unique feature ids across all files.
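
As a rough illustration of what collecting features across files involves, the sketch below merges per-file feature lists and optionally de-duplicates them. The structure of `cache` is not documented here, so the input is simplified to a list of per-file id lists; the helper name `collect_features` and the first-seen ordering are assumptions, not cellarr behavior.

```python
from typing import Dict, List


def collect_features(per_file_features: List[List[str]], unique: bool = True) -> List[str]:
    """Flatten per-file feature ids, optionally de-duplicating (hypothetical helper)."""
    all_ids = [fid for ids in per_file_features for fid in ids]
    if not unique:
        return all_ids
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(all_ids))


# Feature ids found in two files, with one overlap.
per_file = [["TP53", "ACTB"], ["ACTB", "GAPDH"]]
features = collect_features(per_file)

# A feature id -> index mapping of the kind remap_anndata() takes as feature_set_order.
order: Dict[str, int] = {fid: i for i, fid in enumerate(features)}
```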
- cellarr.utils.utils_anndata.scan_for_features_annotations(cache, unique=True)
 Extract and generate feature annotation metadata across all files in cache.
 Requires calling extract_anndata_info() first.
- Parameters:
 cache – Info typically extracted by running extract_anndata_info().
 unique (bool) – Whether to reduce the gene list to unique entries.
- Return type:
 - Returns:
 List of all unique feature ids across all files.