cellarr.utils package
Submodules
cellarr.utils.queryutils_tiledb_frame module
- cellarr.utils.queryutils_tiledb_frame.get_a_column(tiledb_obj, column_name)
Access column(s) from the TileDB object.
- cellarr.utils.queryutils_tiledb_frame.get_index(tiledb_obj)
Get the index of the TileDB object.
- cellarr.utils.queryutils_tiledb_frame.get_schema_names_frame(tiledb_obj)
Get the attribute names from a TileDB object's schema.
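A minimal sketch of how attribute names can be read from an opened TileDB array, assuming TileDB-Py's schema interface (`schema.nattr` for the attribute count and `schema.attr(i)` for the i-th attribute). The helper name `attribute_names` is hypothetical; this is not necessarily how cellarr implements `get_schema_names_frame`.

```python
from typing import Any, List


def attribute_names(tiledb_obj: Any) -> List[str]:
    """Return the attribute names of an opened TileDB array (hypothetical helper).

    Assumes the TileDB-Py schema interface: ``schema.nattr`` is the number of
    attributes and ``schema.attr(i)`` returns the i-th attribute object.
    """
    schema = tiledb_obj.schema
    # Walk attributes by position and keep only their names.
    return [schema.attr(i).name for i in range(schema.nattr)]
```

With TileDB-Py installed, this would be called on an array opened with `tiledb.open(uri)`; any object exposing the same `schema` interface works.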
- cellarr.utils.queryutils_tiledb_frame.subset_array(tiledb_obj, row_subset, column_subset, shape)
Subset a TileDB object storing array data.
Uses multi_index to slice.
- Parameters:
- Return type:
- Returns:
If the TileDB object is sparse, a sparse array in COO format; otherwise, a NumPy array.
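For the sparse case, the coordinates returned by a point query are global ids, so they must be remapped to positions within the requested subsets before a COO matrix of the subset's shape can be built. The following is a minimal pure-Python sketch of that remapping, assuming the query result is a dict of per-dimension coordinates plus attribute values; the dimension names `cell_index`/`gene_index`, the attribute name `data`, and the helper `to_local_coo` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_local_coo(
    result: Dict[str, Sequence],
    row_subset: Sequence[int],
    column_subset: Sequence[int],
    row_dim: str = "cell_index",  # hypothetical dimension name
    col_dim: str = "gene_index",  # hypothetical dimension name
    attr: str = "data",  # hypothetical attribute name
) -> Tuple[List[int], List[int], List, Tuple[int, int]]:
    """Convert a sparse query result into COO triplets local to the subset.

    Global ids are replaced by their positions in ``row_subset`` and
    ``column_subset``, so the shape is (len(row_subset), len(column_subset)).
    """
    row_pos = {gid: i for i, gid in enumerate(row_subset)}
    col_pos = {gid: j for j, gid in enumerate(column_subset)}
    rows, cols, vals = [], [], []
    for r, c, v in zip(result[row_dim], result[col_dim], result[attr]):
        rows.append(row_pos[r])
        cols.append(col_pos[c])
        vals.append(v)
    return rows, cols, vals, (len(row_subset), len(column_subset))
```

With SciPy available, the triplets map directly onto `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape)`.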
- cellarr.utils.queryutils_tiledb_frame.subset_frame(tiledb_obj, subset, columns, primary_key_column_name=None)
Subset a TileDB object.
- Parameters:
tiledb_obj (Array) – TileDB object to subset.
subset – A slice to subset. Alternatively, a TileDB query expression may be provided.
columns (list) – List specifying the attributes from the schema to extract.
primary_key_column_name (str) – The primary key to filter for matches when a QueryCondition is used.
- Return type:
- Returns:
A sliced DataFrame with the subset.
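The two accepted forms of `subset` (a positional slice or a filter expression) followed by attribute selection can be illustrated in memory. This is a minimal sketch over a list of row dicts, not a TileDB query: a Python predicate stands in for the TileDB query expression, the primary-key matching step is not modeled, and the helper `subset_rows` is hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Union

Row = Dict[str, Any]


def subset_rows(
    rows: Sequence[Row],
    subset: Union[slice, Callable[[Row], bool]],
    columns: List[str],
) -> List[Row]:
    """Select rows by a slice or a predicate, then keep only ``columns``.

    The predicate stands in for a TileDB query expression; this is an
    in-memory illustration only.
    """
    if isinstance(subset, slice):
        selected = list(rows[subset])  # positional selection
    else:
        selected = [row for row in rows if subset(row)]  # condition-based selection
    # Project each selected row onto the requested attributes.
    return [{col: row[col] for col in columns} for row in selected]
```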
cellarr.utils.utils_anndata module
- cellarr.utils.utils_anndata.consolidate_duplicate_symbols(matrix, feature_ids, consolidate_duplicate_gene_func)
Consolidate duplicate gene symbols.
- Parameters:
matrix (Any) – Data matrix with rows for cells and columns for genes.
feature_ids (List[str]) – List of feature ids along the column axis of the matrix.
consolidate_duplicate_gene_func (callable) – Function used to consolidate values when the AnnData object contains multiple rows with the same feature id or gene symbol. Defaults to sum().
- Return type:
- Returns:
AnnData object with duplicate gene symbols consolidated.
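The consolidation step amounts to grouping the gene columns that share a feature id and applying the consolidation function to each group, cell by cell. A minimal dense, pure-Python sketch follows; the real function may operate on sparse matrices and return a different container, and the helper `consolidate_columns` is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def consolidate_columns(
    matrix: Sequence[Sequence[float]],
    feature_ids: Sequence[str],
    func: Callable[[List[float]], float] = sum,
) -> Tuple[List[List[float]], List[str]]:
    """Merge columns sharing a feature id by applying ``func`` per cell.

    Rows are cells and columns are genes; unique ids keep first-seen order.
    """
    groups: Dict[str, List[int]] = {}
    for j, fid in enumerate(feature_ids):
        groups.setdefault(fid, []).append(j)  # column positions per feature id
    unique_ids = list(groups)
    merged = [
        [func([row[j] for j in groups[fid]]) for fid in unique_ids]
        for row in matrix
    ]
    return merged, unique_ids
```

Passing `max` instead of the default `sum` keeps the largest value among duplicate columns rather than adding them.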
- cellarr.utils.utils_anndata.extract_anndata_info(h5ad_or_adata, var_feature_column='index', var_subset_columns=None, obs_subset_columns=None, num_threads=1)
Extract and generate the list of unique feature identifiers and cell counts across files.
- Parameters:
h5ad_or_adata (List[Union[str, AnnData]]) – List of AnnData objects or paths to h5ad files.
var_feature_column (str) – Column containing the feature ids (e.g. gene symbols). Defaults to “index”.
var_subset_columns (List[str]) – List of var columns to concatenate across all files. Defaults to None, in which case no metadata columns are extracted.
obs_subset_columns (dict) – List of obs columns to concatenate across all files. Defaults to None, in which case no metadata columns are extracted.
num_threads (int) – Number of threads to use. Defaults to 1.
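Summarizing each input independently and using `num_threads` workers can be sketched with a thread pool. In this minimal sketch, each dataset is a plain dict standing in for an AnnData object (a `var` feature list and an `obs` row list), and `FileInfo`, `summarize`, and `extract_info` are hypothetical names; the cache that cellarr actually builds may hold different fields.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FileInfo:
    """Hypothetical per-file summary standing in for one cache entry."""
    features: List[str]
    n_cells: int


def summarize(dataset: Dict[str, Any]) -> FileInfo:
    # ``dataset`` stands in for an AnnData object: "var" holds feature ids,
    # "obs" holds one entry per cell.
    return FileInfo(features=list(dataset["var"]), n_cells=len(dataset["obs"]))


def extract_info(datasets: List[Dict[str, Any]], num_threads: int = 1) -> List[FileInfo]:
    """Summarize every dataset, in parallel when num_threads > 1."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in input order, regardless of finish order.
        return list(pool.map(summarize, datasets))
```

For paths to h5ad files, each worker would first load the file, for example with `anndata.read_h5ad(path, backed="r")`.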
- cellarr.utils.utils_anndata.remap_anndata(h5ad_or_adata, feature_set_order, var_feature_column='index', layer_matrix_name='counts', consolidate_duplicate_gene_func=<built-in function sum>)
Extract the count matrix from the AnnData object and remap it to the provided feature (gene) set order.
- Parameters:
h5ad_or_adata – Input AnnData object. Alternatively, a path to an H5AD file may be provided. The index of the var slot must contain the feature ids for the columns in the matrix.
feature_set_order (dict) – A dictionary with the feature ids (e.g. gene symbols) as keys and their index as values. The feature ids from the AnnData object are remapped to the feature order from this dictionary.
var_feature_column (str) – Column in var containing the feature ids (e.g. gene symbols). Defaults to the index of the var slot.
layer_matrix_name (Union[str, List[str]]) – Layer containing the matrix to add to TileDB. Defaults to “counts”. Alternatively, a list of layers to extract and add to TileDB may be provided.
consolidate_duplicate_gene_func (Union[callable, List[callable]]) – Function used to consolidate values when the AnnData object contains multiple rows with the same feature id or gene symbol. Defaults to sum().
- Return type:
- Returns:
A dictionary mapping each layer name to a csr_matrix representation of the assay matrix.
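Remapping to `feature_set_order` means moving column j of the input to column `feature_set_order[feature_ids[j]]` of the output. A minimal pure-Python sketch that emits CSR arrays directly follows. It assumes feature ids are already unique (duplicates consolidated), that the dictionary's values run from 0 to its length minus one, and that features absent from the dictionary are dropped; `remap_to_csr` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple


def remap_to_csr(
    matrix: Sequence[Sequence[float]],
    feature_ids: Sequence[str],
    feature_set_order: Dict[str, int],
) -> Tuple[List[int], List[int], List[float], Tuple[int, int]]:
    """Build CSR arrays (indptr, indices, data) in the target feature order.

    Features missing from ``feature_set_order`` and zero entries are
    dropped (assumptions of this sketch).
    """
    target = [feature_set_order.get(fid) for fid in feature_ids]
    indptr, indices, data = [0], [], []
    for row in matrix:
        # Collect (new column, value) pairs and sort them by new column.
        entries = sorted(
            (target[j], value)
            for j, value in enumerate(row)
            if target[j] is not None and value != 0
        )
        indices.extend(col for col, _ in entries)
        data.extend(value for _, value in entries)
        indptr.append(len(indices))  # end offset of this row
    return indptr, indices, data, (len(matrix), len(feature_set_order))
```

The arrays follow the standard CSR layout, so with SciPy they can be passed as `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)`.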
- cellarr.utils.utils_anndata.scan_for_cellcounts(cache)
Extract cell counts across files.
Requires calling extract_anndata_info() first.
- Parameters:
cache – Info extracted, typically by running extract_anndata_info().
- Return type:
- Returns:
List of cell counts across files.
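Per-file cell counts are typically consumed as offsets, for example to find the row at which each file's cells begin once all files are stacked. A minimal sketch, assuming each cache entry exposes a cell count; `CachedInfo`, `cell_counts`, and `start_offsets` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass
class CachedInfo:
    """Hypothetical stand-in for one entry of the extract_anndata_info() cache."""
    n_cells: int


def cell_counts(cache: List[CachedInfo]) -> List[int]:
    """Return the number of cells recorded for each file, in cache order."""
    return [entry.n_cells for entry in cache]


def start_offsets(counts: List[int]) -> List[int]:
    """Row at which each file's cells would begin in a concatenated array."""
    # Running totals starting at 0; the final total is not a start offset.
    return list(accumulate(counts, initial=0))[:-1]
```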
- cellarr.utils.utils_anndata.scan_for_cellmetadata(cache)
Extract and merge all cell metadata data frames across files.
Requires calling extract_anndata_info() first.
- Parameters:
cache – Info extracted, typically by running extract_anndata_info().
- Return type:
- Returns:
A
pandas.DataFrame
containing all cell metadata.
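Merging per-file cell metadata is an outer concatenation: every column seen in any file appears in the result, and files lacking a column get empty values there. The sketch below illustrates this with lists of row dicts and `None` as the empty value; with pandas, `pandas.concat(frames, ignore_index=True)` behaves similarly, filling gaps with NaN. The helper `merge_tables` is hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def merge_tables(tables: List[List[Row]]) -> List[Dict[str, Optional[Any]]]:
    """Stack per-file rows; columns missing from a file are filled with None."""
    columns: List[str] = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)  # union of columns, first-seen order
    return [{col: row.get(col) for col in columns} for table in tables for row in table]
```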
- cellarr.utils.utils_anndata.scan_for_features(cache, unique=True)
Extract and generate the list of unique feature identifiers across files.
Requires calling extract_anndata_info() first.
- Parameters:
cache – Info extracted, typically by running extract_anndata_info().
unique (bool) – Whether to reduce the gene list to unique feature ids.
- Return type:
- Returns:
List of all unique feature ids across all files.
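Collecting features across files is a concatenation followed by optional deduplication. In this minimal sketch, `dict.fromkeys` keeps the first occurrence of each id in first-seen order. Whether the library preserves that order or sorts the ids is an assumption not stated here, and `collect_features` is a hypothetical helper.

```python
from typing import Iterable, List


def collect_features(per_file: Iterable[List[str]], unique: bool = True) -> List[str]:
    """Concatenate feature ids across files; optionally keep first occurrences only."""
    combined = [fid for features in per_file for fid in features]
    if not unique:
        return combined
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(combined))
```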
- cellarr.utils.utils_anndata.scan_for_features_annotations(cache, unique=True)
Extract and generate feature annotation metadata across all files in cache.
Requires calling extract_anndata_info() first.
- Parameters:
cache – Info extracted, typically by running extract_anndata_info().
unique (bool) – Whether to reduce the gene list to unique feature ids.
- Return type:
- Returns:
List of all unique feature ids across all files.
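Feature annotations can be merged by stacking each file's var rows and, when `unique` is set, keeping one row per feature id. A minimal sketch over row dicts follows. The id column name `feature_id` and the first-occurrence-wins rule are assumptions, and `merge_annotations` is a hypothetical helper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_annotations(
    per_file: List[List[Row]], id_column: str = "feature_id", unique: bool = True
) -> List[Row]:
    """Stack per-file var annotations, optionally keeping one row per feature id."""
    rows = [row for table in per_file for row in table]
    if not unique:
        return rows
    seen = set()
    kept = []
    for row in rows:
        if row[id_column] not in seen:  # first occurrence of this id wins
            seen.add(row[id_column])
            kept.append(row)
    return kept
```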