cellarr_array package

Submodules

cellarr_array.CellArray module

class cellarr_array.CellArray.CellArray(uri, attr='data', mode=None, config_or_context=None, validate=True)

Bases: ABC

Abstract base class for TileDB array operations.

__abstractmethods__ = frozenset({'_direct_slice', '_multi_index', 'write_batch'})
__getitem__(key)

Get item implementation that routes to either direct slicing or multi_index based on the type of indices provided.

Parameters:

key (Union[slice, Tuple[Union[slice, List[int]], ...]]) – Slice or list of indices for each dimension in the array.
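The routing described above can be pictured with a minimal sketch. It is not the library's implementation: choose_read_path is a hypothetical name, and the rule that any explicit index list selects the multi-index path is an assumption. The real method may, for example, first convert contiguous index lists into slices.

```python
from typing import List, Tuple, Union

Index = Union[slice, List[int]]


def choose_read_path(key: Union[Index, Tuple[Index, ...]]) -> str:
    """Return the read path a key would take in this simplified model."""
    # Promote a bare index to a one-element tuple so both forms are handled alike.
    parts = key if isinstance(key, tuple) else (key,)

    uses_list = False
    for part in parts:
        if isinstance(part, slice):
            continue
        if isinstance(part, list) and all(isinstance(i, int) for i in part):
            uses_list = True
            continue
        raise TypeError(f"unsupported index type: {type(part).__name__}")

    # Assumption: slices alone can be served by direct slicing, while any
    # explicit index list requires the multi-index path.
    return "multi_index" if uses_list else "direct_slice"


print(choose_read_path((slice(0, 10), slice(None))))
print(choose_read_path((slice(0, 10), [1, 5, 9])))
```

Under these assumptions the first call reports the direct-slice path and the second reports the multi-index path.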

__init__(uri, attr='data', mode=None, config_or_context=None, validate=True)

Initialize the object.

Parameters:
  • uri (str) – URI to the array.

  • attr (str) – Attribute to access. Defaults to “data”.

  • mode (Optional[Literal['r', 'w', 'm', 'd']]) –

    Mode in which to open the array: read (‘r’), write (‘w’), modify-exclusive (‘m’), or delete (‘d’).

    Defaults to None for automatic mode switching.

  • config_or_context (Union[Config, Ctx, None]) –

    Optional config or context object.

    Defaults to None.

  • validate (bool) – Whether to validate the attributes. Defaults to True.

property attr_names: List[str]

Get attribute names of the array.

consolidate(config=None)

Consolidate array fragments.

Parameters:

config (Optional[ConsolidationConfig]) – Optional consolidation configuration.

Return type:

None

property dim_names: List[str]

Get dimension names of the array.

property mode: str | None

Get current array mode.

property ndim: int

Get number of dimensions.

property nonempty_domain: Tuple[int, ...]

Get array non-empty domain.

open_array(mode=None)

Context manager for array operations.

Parameters:

mode (Optional[str]) – Override mode for this operation.

property shape: Tuple[int, ...]

Get array shape from schema domain.

vacuum()

Remove deleted fragments from the array.

Return type:

None

abstractmethod write_batch(data, start_row, **kwargs)

Write a batch of data to the array starting at the specified row.

Parameters:
  • data (Union[ndarray, spmatrix]) – Data to write (numpy array for dense, scipy sparse matrix for sparse).

  • start_row (int) – Starting row index for writing.

  • **kwargs – Additional arguments for write operation.

Return type:

None

cellarr_array.DenseCellArray module

class cellarr_array.DenseCellArray.DenseCellArray(uri, attr='data', mode=None, config_or_context=None, validate=True)

Bases: CellArray

Implementation for dense TileDB arrays.

write_batch(data, start_row, **kwargs)

Write a batch of data to the dense array.

Parameters:
  • data (ndarray) – Numpy array to write.

  • start_row (int) – Starting row index for writing.

  • **kwargs – Additional arguments passed to TileDB write operation.

Raises:
  • TypeError – If input is not a numpy array.

  • ValueError – If dimensions don’t match or bounds are exceeded.

Return type:

None
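The ValueError conditions listed above (mismatched dimensions, exceeded bounds) can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the library's code: check_batch_bounds is a hypothetical helper, shapes are passed as plain tuples rather than numpy arrays, and the batch is assumed to be written row-wise starting at start_row.

```python
from typing import Tuple


def check_batch_bounds(
    batch_shape: Tuple[int, ...],
    start_row: int,
    array_shape: Tuple[int, ...],
) -> Tuple[int, int]:
    """Validate a row-wise batch write and return its (start, end) row range."""
    if len(batch_shape) != len(array_shape):
        raise ValueError("batch and array must have the same number of dimensions")
    # Every dimension except the first (rows) has to match the target array.
    if batch_shape[1:] != array_shape[1:]:
        raise ValueError("trailing dimensions of the batch must match the array")
    end_row = start_row + batch_shape[0]
    if start_row < 0 or end_row > array_shape[0]:
        raise ValueError("batch rows fall outside the array bounds")
    return start_row, end_row


# A 100 x 50 batch written at row 200 of a 1000 x 50 array covers rows 200..299.
print(check_batch_bounds((100, 50), 200, (1000, 50)))
```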

cellarr_array.SparseCellArray module

class cellarr_array.SparseCellArray.SparseCellArray(uri, attr='data', mode=None, config_or_context=None, return_sparse=True, sparse_coerce=<class 'scipy.sparse._csr.csr_matrix'>)

Bases: CellArray

Implementation for sparse TileDB arrays.

__init__(uri, attr='data', mode=None, config_or_context=None, return_sparse=True, sparse_coerce=<class 'scipy.sparse._csr.csr_matrix'>)

Initialize SparseCellArray.

write_batch(data, start_row, **kwargs)

Write a batch of sparse data to the array.

Parameters:
  • data (Union[spmatrix, csc_matrix, coo_matrix]) – Scipy sparse matrix (CSR, CSC, or COO format).

  • start_row (int) – Starting row index for writing.

  • **kwargs – Additional arguments passed to TileDB write operation.

Raises:
  • TypeError – If input is not a sparse matrix.

  • ValueError – If dimensions don’t match or bounds are exceeded.

Return type:

None

cellarr_array.config module

class cellarr_array.config.CellArrConfig(tile_capacity=100000, cell_order='row-major', tile_order='row-major', coords_filters=<factory>, offsets_filters=<factory>, attrs_filters=<factory>, ctx_config=<factory>)

Bases: object

Configuration class for TileDB array creation and access.

__eq__(other)

Return self==value.

__hash__ = None
__init__(tile_capacity=100000, cell_order='row-major', tile_order='row-major', coords_filters=<factory>, offsets_filters=<factory>, attrs_filters=<factory>, ctx_config=<factory>)
__match_args__ = ('tile_capacity', 'cell_order', 'tile_order', 'coords_filters', 'offsets_filters', 'attrs_filters', 'ctx_config')
__post_init__()

Convert filter configurations to TileDB Filter objects.

__repr__()

Return repr(self).

attrs_filters: Dict[str, List[Filter]]
cell_order: str = 'row-major'
coords_filters: List[Filter]
static create_filter(filter_config)

Create a TileDB Filter object from configuration.

Return type:

Filter

ctx_config: Dict[str, Any]
offsets_filters: List[Filter]
tile_capacity: int = 100000
tile_order: str = 'row-major'
class cellarr_array.config.ConsolidationConfig(steps=100000, step_min_frags=2, step_max_frags=10, buffer_size=15000000000, total_budget=40000000000, num_threads=4, vacuum_after=True)

Bases: object

Configuration for array consolidation.

__eq__(other)

Return self==value.

__hash__ = None
__init__(steps=100000, step_min_frags=2, step_max_frags=10, buffer_size=15000000000, total_budget=40000000000, num_threads=4, vacuum_after=True)
__match_args__ = ('steps', 'step_min_frags', 'step_max_frags', 'buffer_size', 'total_budget', 'num_threads', 'vacuum_after')
__repr__()

Return repr(self).

buffer_size: int = 15000000000
num_threads: int = 4
step_max_frags: int = 10
step_min_frags: int = 2
steps: int = 100000
total_budget: int = 40000000000
vacuum_after: bool = True

cellarr_array.helpers module

class cellarr_array.helpers.SliceHelper

Bases: object

Helper class for handling array slicing operations.

static is_contiguous_indices(indices)

Check if indices can be represented as a contiguous slice.

Return type:

Optional[slice]
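As a rough illustration of the idea, the sketch below turns a list of consecutive, ascending indices into an equivalent slice and returns None otherwise. The function name contiguous_slice is hypothetical, and details such as how the library treats empty, unsorted, or duplicate indices are assumptions here.

```python
from typing import List, Optional


def contiguous_slice(indices: List[int]) -> Optional[slice]:
    """Return slice(start, stop) if indices are consecutive and ascending, else None."""
    if not indices:
        return None
    start = indices[0]
    # Each element must equal the first value plus its position in the list.
    for offset, value in enumerate(indices):
        if value != start + offset:
            return None
    return slice(start, indices[-1] + 1)


print(contiguous_slice([3, 4, 5]))
print(contiguous_slice([3, 5, 6]))
```

Replacing an index list with a slice lets a reader use the cheaper direct-slice path when the selection happens to be a contiguous block.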

static normalize_index(idx, dim_size)

Normalize index to handle negative indices and ensure consistency.

Return type:

Union[slice, List[int]]
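One way to picture this normalization is the sketch below, which resolves negative values against the dimension size for both slices and index lists. It is an assumption-laden illustration: the name normalize is hypothetical, and the library's handling of out-of-range values, steps, or other index types is not stated in this reference.

```python
from typing import List, Union


def normalize(idx: Union[slice, List[int]], dim_size: int) -> Union[slice, List[int]]:
    """Resolve negative indices against dim_size for a slice or a list of ints."""
    if isinstance(idx, slice):
        # slice.indices clamps the bounds and resolves negatives and None.
        start, stop, step = idx.indices(dim_size)
        return slice(start, stop, step)

    normalized = []
    for i in idx:
        j = i + dim_size if i < 0 else i
        if not 0 <= j < dim_size:
            raise IndexError(f"index {i} is out of bounds for size {dim_size}")
        normalized.append(j)
    return normalized


print(normalize(slice(-3, None), 10))
print(normalize([-1, 0, 2], 10))
```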

cellarr_array.helpers.create_cellarray(uri, shape=None, attr_dtype=None, sparse=False, mode=None, config=None, dim_names=None, dim_dtypes=None, attr_name='data', **kwargs)

Factory function to create a new TileDB cell array.

Parameters:
  • uri (str) – Array URI.

  • shape (Optional[Tuple[Optional[int], ...]]) – Optional array shape. If shape is None, or if an entry is None, the corresponding dimension extent falls back to the maximum value of the dimension dtype.

  • attr_dtype (Union[str, dtype, None]) – Data type for the attribute. Defaults to float32.

  • sparse (bool) – Whether to create a sparse array.

  • mode (Optional[str]) – Array open mode. Defaults to None for automatic switching.

  • config (Optional[CellArrConfig]) – Optional configuration.

  • dim_names (Optional[List[str]]) – Optional list of dimension names.

  • dim_dtypes (Optional[List[Union[str, dtype]]]) – Optional list of dimension dtypes.

  • attr_name (str) – Name of the data attribute.

  • **kwargs – Additional arguments for array creation.

Returns:

CellArray instance.

Raises:

ValueError – If dimensions are invalid or inputs are inconsistent.
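The shape fallback described for the shape parameter can be sketched as follows. This is only an illustration: resolve_shape and the INT_DTYPE_MAX table are hypothetical stand-ins (the library works with real dtypes), and whether the actual extent is exactly the dtype maximum or slightly smaller is not stated in this reference.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical lookup of maximum values for a few integer dtypes.
INT_DTYPE_MAX = {
    "uint8": 2**8 - 1,
    "uint16": 2**16 - 1,
    "uint32": 2**32 - 1,
    "int32": 2**31 - 1,
    "int64": 2**63 - 1,
}


def resolve_shape(
    shape: Optional[Sequence[Optional[int]]],
    dim_dtypes: Sequence[str],
) -> Tuple[int, ...]:
    """Fill unspecified extents with the maximum of the matching dimension dtype."""
    if shape is None:
        shape = (None,) * len(dim_dtypes)
    if len(shape) != len(dim_dtypes):
        raise ValueError("shape and dim_dtypes must have the same length")
    return tuple(
        INT_DTYPE_MAX[dtype] if extent is None else extent
        for extent, dtype in zip(shape, dim_dtypes)
    )


# An unbounded first dimension with a fixed second dimension of 500.
print(resolve_shape((None, 500), ["uint32", "uint32"]))
```

Leaving an extent as None is convenient when the number of rows is unknown at creation time and data is appended in batches.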

cellarr_array.helpers.create_group(output_path, group_name)

Module contents