cellarr.utils package

Submodules

cellarr.utils.queryutils_tiledb_frame module

cellarr.utils.queryutils_tiledb_frame.get_a_column(tiledb_obj, column_name)

Access column(s) from the TileDB object.

Parameters:
  • tiledb_obj (Array) – A TileDB object.

  • column_name (Union[str, List[str]]) – Name(s) of the column(s) to access.

Return type:

list

Returns:

List containing the column values.

cellarr.utils.queryutils_tiledb_frame.get_index(tiledb_obj)

Get the index of the TileDB object.

Parameters:

tiledb_obj (Array) – A TileDB object.

Return type:

list

Returns:

A list containing the index values.

cellarr.utils.queryutils_tiledb_frame.get_schema_names_frame(tiledb_obj)

Get the attribute names from a TileDB object.

Parameters:

tiledb_obj (Array) – A TileDB object.

Return type:

List[str]

Returns:

List of schema attributes.

cellarr.utils.queryutils_tiledb_frame.subset_array(tiledb_obj, row_subset, column_subset, shape)

Subset a TileDB object storing array data.

Uses multi_index to slice.

Parameters:
Return type:

Union[ndarray, csr_matrix]

Returns:

If the TileDB object is sparse, returns a sparse array in COO format; otherwise, returns a NumPy array.

cellarr.utils.queryutils_tiledb_frame.subset_frame(tiledb_obj, subset, columns, primary_key_column_name=None)

Subset a TileDB object.

Parameters:
  • tiledb_obj (Array) – TileDB object to subset.

  • subset (Union[slice, str]) –

    A slice to subset.

    Alternatively, may also provide a TileDB query expression.

  • columns (list) – List specifying the attributes from the schema to extract.

  • primary_key_column_name (str) – The primary key to filter for matches when a QueryCondition is used.

Return type:

DataFrame

Returns:

A sliced DataFrame with the subset.

cellarr.utils.utils_anndata module

cellarr.utils.utils_anndata.consolidate_duplicate_symbols(matrix, feature_ids, consolidate_duplicate_gene_func)

Consolidate duplicate gene symbols.

Parameters:
  • matrix (Any) – Data matrix with rows for cells and columns for genes.

  • feature_ids (List[str]) – List of feature ids along the column axis of the matrix.

  • consolidate_duplicate_gene_func (callable) –

    Function to consolidate when the AnnData object contains multiple rows with the same feature id or gene symbol.

    Defaults to sum().

Return type:

AnnData

Returns:

AnnData object with duplicate gene symbols consolidated.
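
The consolidation step can be pictured with plain Python lists. The sketch below is not cellarr's implementation: it assumes a small dense matrix held as a list of rows, and the helper name consolidate_columns is hypothetical. Columns that share a feature id are merged per cell by applying the consolidation function (sum() here, matching the documented default).

```python
from typing import Callable, List, Sequence, Tuple


def consolidate_columns(
    matrix: Sequence[Sequence[float]],
    feature_ids: Sequence[str],
    func: Callable = sum,
) -> Tuple[List[List[float]], List[str]]:
    """Merge columns sharing a feature id (hypothetical helper, illustration only)."""
    # Group column positions by feature id, keeping first-seen order.
    positions = {}
    for col, fid in enumerate(feature_ids):
        positions.setdefault(fid, []).append(col)

    unique_ids = list(positions)
    # For every row (cell), apply func to the values of each duplicated column group.
    merged = [
        [func([row[c] for c in positions[fid]]) for fid in unique_ids]
        for row in matrix
    ]
    return merged, unique_ids


# Two cells, three columns; "GeneA" appears twice.
counts = [
    [1, 2, 3],
    [4, 0, 5],
]
merged, ids = consolidate_columns(counts, ["GeneA", "GeneB", "GeneA"])
```

Passing a different callable, such as max, changes how duplicated columns are combined without touching the grouping logic.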

cellarr.utils.utils_anndata.extract_anndata_info(h5ad_or_adata, var_feature_column='index', var_subset_columns=None, obs_subset_columns=None, num_threads=1)

Extract and generate the list of unique feature identifiers and cell counts across files.

Parameters:
  • h5ad_or_adata (List[Union[str, AnnData]]) – List of anndata objects or path to h5ad files.

  • var_feature_column (str) – Column containing the feature ids (e.g. gene symbols). Defaults to “index”.

  • var_subset_columns (List[str]) – List of var columns to concatenate across all files. Defaults to None, in which case no metadata columns are extracted.

  • obs_subset_columns (dict) – List of obs columns to concatenate across all files. Defaults to None, in which case no metadata columns are extracted.

  • num_threads (int) – Number of threads to use. Defaults to 1.

cellarr.utils.utils_anndata.remap_anndata(h5ad_or_adata, feature_set_order, var_feature_column='index', layer_matrix_name='counts', consolidate_duplicate_gene_func=<built-in function sum>)

Extract and remap the count matrix to the provided feature (gene) set order from the AnnData object.

Parameters:
  • h5ad_or_adata –

    Input AnnData object.

    Alternatively, may also provide a path to the H5ad file.

    The index of the var slot must contain the feature ids for the columns in the matrix.

  • feature_set_order (dict) – A dictionary with the feature ids (e.g. gene symbols) as keys and their index as value. The feature ids from the AnnData object are remapped to the feature order from this dictionary.

  • var_feature_column (str) – Column in var containing the feature ids (e.g. gene symbols). Defaults to the index of the var slot.

  • layer_matrix_name (Union[str, List[str]]) –

    Layer containing the matrix to add to TileDB. Defaults to “counts”.

    Alternatively, may provide a list of layers to extract and add to TileDB.

  • consolidate_duplicate_gene_func (Union[callable, List[callable]]) –

    Function to consolidate when the AnnData object contains multiple rows with the same feature id or gene symbol.

    Defaults to sum().

Return type:

Dict[str, csr_matrix]

Returns:

A dictionary mapping each layer name to a csr_matrix representation of the corresponding assay matrix.
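
Remapping amounts to moving each source column to the position its feature id holds in feature_set_order. The following is a minimal pure-Python sketch of that idea, not the library's code: remap_triplets is a hypothetical name, the data are held as (row, column, value) triplets rather than an AnnData layer, and dropping features absent from feature_set_order is an assumption made for this illustration.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[int, int, float]


def remap_triplets(
    triplets: Sequence[Triplet],
    feature_ids: Sequence[str],
    feature_set_order: Dict[str, int],
) -> List[Triplet]:
    """Rewrite column indices to follow feature_set_order (hypothetical helper)."""
    remapped = []
    for row, col, value in triplets:
        # Look up where this column's feature id lives in the target order.
        target = feature_set_order.get(feature_ids[col])
        if target is None:
            # Assumption for this sketch: features missing from the target order are skipped.
            continue
        remapped.append((row, target, value))
    return remapped


# Target order shared by all files.
feature_set_order = {"GeneA": 0, "GeneB": 1, "GeneC": 2}

# One file whose columns are ordered differently and lack GeneB.
local_features = ["GeneC", "GeneA"]
entries = [(0, 0, 5.0), (0, 1, 2.0), (1, 1, 7.0)]

remapped = remap_triplets(entries, local_features, feature_set_order)
```

After remapping, every file's entries refer to the same column positions, so matrices from different files can be stacked row-wise.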

cellarr.utils.utils_anndata.scan_for_cellcounts(cache)

Extract cell counts across files.

Requires calling extract_anndata_info() first.

Parameters:

cache – Info extracted, typically by running extract_anndata_info().

Return type:

List[int]

Returns:

List of cell counts across files.

cellarr.utils.utils_anndata.scan_for_cellmetadata(cache)

Extract and merge all cell metadata data frames across files.

Requires calling extract_anndata_info() first.

Parameters:

cache – Info extracted, typically by running extract_anndata_info().

Return type:

DataFrame

Returns:

A pandas.DataFrame containing all cell metadata.

cellarr.utils.utils_anndata.scan_for_features(cache, unique=True)

Extract and generate the list of unique feature identifiers across files.

Requires calling extract_anndata_info() first.

Parameters:
Return type:

List[str]

Returns:

List of all unique feature ids across all files.
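
As an illustration of what a unique feature list across files looks like, and how it could feed the feature_set_order argument of remap_anndata(), here is a minimal sketch. It does not use cellarr's cache: per_file_features is a hypothetical stand-in holding one list of feature ids per file, unique_features is a hypothetical helper, and the sorted ordering is an arbitrary choice for this example.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in: one list of feature ids per input file.
per_file_features = [
    ["GeneB", "GeneA"],
    ["GeneA", "GeneC"],
]


def unique_features(feature_lists: Iterable[Iterable[str]]) -> List[str]:
    """Union of feature ids across files, sorted for a stable order (illustration only)."""
    seen = set()
    for features in feature_lists:
        seen.update(features)
    return sorted(seen)


all_features = unique_features(per_file_features)

# Map each feature id to its position, the shape expected of feature_set_order.
feature_set_order: Dict[str, int] = {fid: idx for idx, fid in enumerate(all_features)}
```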

cellarr.utils.utils_anndata.scan_for_features_annotations(cache, unique=True)

Extract and generate feature annotation metadata across all files in cache.

Requires calling extract_anndata_info() first.

Parameters:
Return type:

List[str]

Returns:

List of all unique feature ids across all files.

Module contents