cellarr_array package
Submodules
cellarr_array.CellArray module
- class cellarr_array.CellArray.CellArray(uri, attr='data', mode=None, config_or_context=None, validate=True)
Bases: ABC
Abstract base class for TileDB array operations.
- __abstractmethods__ = frozenset({'_direct_slice', '_multi_index', 'write_batch'})
- __getitem__(key)
Get item implementation that routes to either direct slicing or multi_index based on the type of indices provided.
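The routing described above can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `RoutedArray` and `ListBackedArray` are hypothetical names, and the rule "all slices go to the direct path, anything else goes to the multi-index path" is an assumption about how the dispatch might work. Only the abstract method names `_direct_slice` and `_multi_index` come from the listing.

```python
from abc import ABC, abstractmethod


class RoutedArray(ABC):
    """Hypothetical stand-in for the routing idea (not cellarr_array code)."""

    def __getitem__(self, key):
        # Normalize a single index into a 1-tuple so both paths see a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        # Assumption: keys made only of slices take the direct path;
        # any key containing a list of indices takes the multi-index path.
        if all(isinstance(k, slice) for k in key):
            return self._direct_slice(key)
        return self._multi_index(key)

    @abstractmethod
    def _direct_slice(self, key): ...

    @abstractmethod
    def _multi_index(self, key): ...


class ListBackedArray(RoutedArray):
    """Toy 2-D implementation backed by nested lists."""

    def __init__(self, rows):
        self.rows = rows

    def _direct_slice(self, key):
        row_sel, col_sel = key
        return [row[col_sel] for row in self.rows[row_sel]]

    def _multi_index(self, key):
        shape = (len(self.rows), len(self.rows[0]))
        row_idx, col_idx = (self._as_indices(k, n) for k, n in zip(key, shape))
        return [[self.rows[r][c] for c in col_idx] for r in row_idx]

    @staticmethod
    def _as_indices(sel, size):
        # Expand a slice to explicit indices; pass lists through unchanged.
        if isinstance(sel, slice):
            return list(range(*sel.indices(size)))
        return list(sel)


arr = ListBackedArray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
by_slice = arr[0:2, 1:3]    # every index is a slice -> _direct_slice
by_list = arr[[0, 2], 0:2]  # one index is a list   -> _multi_index
```

Because the two private methods are abstract, `RoutedArray` itself cannot be instantiated, which mirrors how the `__abstractmethods__` entry above forces concrete subclasses to supply them.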
- __init__(uri, attr='data', mode=None, config_or_context=None, validate=True)
Initialize the object.
- Parameters:
uri (str) – URI to the array.
attr (str) – Attribute to access. Defaults to “data”.
mode (Optional[Literal['r', 'w', 'n', 'd']]) – Open the array object in read ‘r’, write ‘w’, modify exclusive ‘m’ mode, or delete ‘d’ mode. Defaults to None for automatic mode switching.
config_or_context (Union[Config, Ctx, None]) – Optional config or context object. Defaults to None.
validate (bool) – Whether to validate the attributes. Defaults to True.
- consolidate(config=None)
Consolidate array fragments.
- Parameters:
config (Optional[ConsolidationConfig]) – Optional consolidation configuration.
- Return type:
cellarr_array.DenseCellArray module
- class cellarr_array.DenseCellArray.DenseCellArray(uri, attr='data', mode=None, config_or_context=None, validate=True)
Bases: CellArray
Implementation for dense TileDB arrays.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- write_batch(data, start_row, **kwargs)
Write a batch of data to the dense array.
- Parameters:
- Raises:
TypeError – If input is not a numpy array.
ValueError – If dimensions don’t match or bounds are exceeded.
- Return type:
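The `ValueError` conditions listed for the dense `write_batch` can be pictured as a pre-write check. The following is a hedged sketch only: `check_batch` is a hypothetical helper, it assumes a 2-D array in which every batch spans all columns, and it works on shapes so that it needs no NumPy. The `TypeError` case (input is not a NumPy array) is not covered here.

```python
def check_batch(batch_shape, start_row, array_shape):
    """Hypothetical pre-write check; names and messages are illustrative."""
    n_rows, n_cols = batch_shape
    total_rows, total_cols = array_shape

    # "Dimensions don't match": assumed to mean the column counts differ.
    if n_cols != total_cols:
        raise ValueError(f"batch has {n_cols} columns, array has {total_cols}")

    # "Bounds are exceeded": the batch must fit between start_row and the last row.
    end_row = start_row + n_rows
    if start_row < 0 or end_row > total_rows:
        raise ValueError(
            f"rows [{start_row}, {end_row}) fall outside [0, {total_rows})"
        )

    # Half-open row range the batch would occupy.
    return (start_row, end_row)


row_range = check_batch(batch_shape=(10, 5), start_row=90, array_shape=(100, 5))
```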
cellarr_array.SparseCellArray module
- class cellarr_array.SparseCellArray.SparseCellArray(uri, attr='data', mode=None, config_or_context=None, return_sparse=True, sparse_coerce=<class 'scipy.sparse._csr.csr_matrix'>)
Bases: CellArray
Implementation for sparse TileDB arrays.
- __abstractmethods__ = frozenset({})
- __annotations__ = {}
- __init__(uri, attr='data', mode=None, config_or_context=None, return_sparse=True, sparse_coerce=<class 'scipy.sparse._csr.csr_matrix'>)
Initialize SparseCellArray.
- write_batch(data, start_row, **kwargs)
Write a batch of sparse data to the array.
- Parameters:
data (Union[spmatrix, csc_matrix, coo_matrix]) – Scipy sparse matrix (CSR, CSC, or COO format).
start_row (int) – Starting row index for writing.
**kwargs – Additional arguments passed to TileDB write operation.
- Raises:
TypeError – If input is not a sparse matrix.
ValueError – If dimensions don’t match or bounds are exceeded.
- Return type:
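One way to read `start_row` for sparse batches is as an offset applied to the batch's own row coordinates before they are written. The sketch below illustrates that reading with plain lists standing in for COO triplets. `offset_coo_rows` is a hypothetical helper, and the assumption that batch row 0 lands at `start_row` is an interpretation of the parameter description, not a confirmed detail of the implementation.

```python
def offset_coo_rows(rows, cols, values, start_row):
    """Shift batch-local COO row indices to array-global rows (illustrative)."""
    if not (len(rows) == len(cols) == len(values)):
        raise ValueError("rows, cols and values must have equal length")
    if start_row < 0:
        raise ValueError("start_row must be non-negative")
    # Assumption: only row coordinates move; columns and values are untouched.
    global_rows = [start_row + r for r in rows]
    return global_rows, list(cols), list(values)


# Three non-zero entries of a small batch, written starting at row 100.
g_rows, g_cols, g_vals = offset_coo_rows(
    rows=[0, 0, 2], cols=[1, 4, 3], values=[0.5, 1.5, 2.0], start_row=100
)
```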
cellarr_array.config module
- class cellarr_array.config.CellArrConfig(tile_capacity=100000, cell_order='row-major', tile_order='row-major', coords_filters=<factory>, offsets_filters=<factory>, attrs_filters=<factory>, ctx_config=<factory>)
Bases: object
Configuration class for TileDB array creation and access.
- __annotations__ = {'attrs_filters': typing.Dict[str, typing.List[tiledb.filter.Filter]], 'cell_order': <class 'str'>, 'coords_filters': typing.List[tiledb.filter.Filter], 'ctx_config': typing.Dict[str, typing.Any], 'offsets_filters': typing.List[tiledb.filter.Filter], 'tile_capacity': <class 'int'>, 'tile_order': <class 'str'>}¶
- __dataclass_fields__ = {'attrs_filters': Field(name='attrs_filters',type=typing.Dict[str, typing.List[tiledb.filter.Filter]],default=<dataclasses._MISSING_TYPE object>,default_factory=<function CellArrConfig.<lambda>>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'cell_order': Field(name='cell_order',type=<class 'str'>,default='row-major',default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'coords_filters': Field(name='coords_filters',type=typing.List[tiledb.filter.Filter],default=<dataclasses._MISSING_TYPE object>,default_factory=<function CellArrConfig.<lambda>>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'ctx_config': Field(name='ctx_config',type=typing.Dict[str, typing.Any],default=<dataclasses._MISSING_TYPE object>,default_factory=<class 'dict'>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'offsets_filters': Field(name='offsets_filters',type=typing.List[tiledb.filter.Filter],default=<dataclasses._MISSING_TYPE object>,default_factory=<function CellArrConfig.<lambda>>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'tile_capacity': Field(name='tile_capacity',type=<class 'int'>,default=100000,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'tile_order': Field(name='tile_order',type=<class 'str'>,default='row-major',default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(tile_capacity=100000, cell_order='row-major', tile_order='row-major', coords_filters=<factory>, offsets_filters=<factory>, attrs_filters=<factory>, ctx_config=<factory>)
- __match_args__ = ('tile_capacity', 'cell_order', 'tile_order', 'coords_filters', 'offsets_filters', 'attrs_filters', 'ctx_config')
- __repr__()
Return repr(self).
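The `<factory>` defaults, `__eq__`, and `__hash__ = None` entries above are standard dataclass behavior. The sketch below reproduces them with a stand-in class so it runs without TileDB: `ConfigSketch` is hypothetical, it copies only the scalar fields and `ctx_config` (whose `default_factory` is `dict` in the listing), and the filter fields are omitted because their default contents are not shown. The config key used is only an example entry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConfigSketch:
    """Stand-in mirroring a subset of the CellArrConfig fields (not the real class)."""

    tile_capacity: int = 100000
    cell_order: str = "row-major"
    tile_order: str = "row-major"
    # default_factory gives every instance its own dict instead of a shared one.
    ctx_config: Dict[str, Any] = field(default_factory=dict)


a = ConfigSketch()
b = ConfigSketch()

# Mutating one instance's dict leaves the other untouched.
a.ctx_config["sm.compute_concurrency_level"] = 4  # illustrative key only

# eq=True compares field by field; defining __eq__ without frozen=True
# sets __hash__ to None, so instances cannot be used as dict keys.
same_before_edit = ConfigSketch() == ConfigSketch()
same_after_edit = a == b
```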
- class cellarr_array.config.ConsolidationConfig(steps=100000, step_min_frags=2, step_max_frags=10, buffer_size=15000000000, total_budget=40000000000, num_threads=4, vacuum_after=True)
Bases: object
Configuration for array consolidation.
- __annotations__ = {'buffer_size': <class 'int'>, 'num_threads': <class 'int'>, 'step_max_frags': <class 'int'>, 'step_min_frags': <class 'int'>, 'steps': <class 'int'>, 'total_budget': <class 'int'>, 'vacuum_after': <class 'bool'>}¶
- __dataclass_fields__ = {'buffer_size': Field(name='buffer_size',type=<class 'int'>,default=15000000000,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'num_threads': Field(name='num_threads',type=<class 'int'>,default=4,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'step_max_frags': Field(name='step_max_frags',type=<class 'int'>,default=10,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'step_min_frags': Field(name='step_min_frags',type=<class 'int'>,default=2,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'steps': Field(name='steps',type=<class 'int'>,default=100000,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'total_budget': Field(name='total_budget',type=<class 'int'>,default=40000000000,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD), 'vacuum_after': Field(name='vacuum_after',type=<class 'bool'>,default=True,default_factory=<dataclasses._MISSING_TYPE object>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD)}¶
- __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,order=False,unsafe_hash=False,frozen=False)¶
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(steps=100000, step_min_frags=2, step_max_frags=10, buffer_size=15000000000, total_budget=40000000000, num_threads=4, vacuum_after=True)
- __match_args__ = ('steps', 'step_min_frags', 'step_max_frags', 'buffer_size', 'total_budget', 'num_threads', 'vacuum_after')
- __repr__()
Return repr(self).
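The defaults above are large, so a smaller machine may want a reduced copy. A minimal sketch of deriving one with `dataclasses.replace` follows. `ConsolidationSketch` is a stand-in that copies the documented field names and defaults so the example runs without the package, and treating `buffer_size` and `total_budget` as byte counts is an assumption; the listing does not state units.

```python
from dataclasses import dataclass, replace


@dataclass
class ConsolidationSketch:
    """Stand-in with the documented ConsolidationConfig defaults (not the real class)."""

    steps: int = 100000
    step_min_frags: int = 2
    step_max_frags: int = 10
    buffer_size: int = 15000000000
    total_budget: int = 40000000000
    num_threads: int = 4
    vacuum_after: bool = True


defaults = ConsolidationSketch()

# replace() builds a new instance and leaves the original unchanged.
small = replace(
    defaults,
    buffer_size=1_000_000_000,
    total_budget=4_000_000_000,
    num_threads=2,
)

# Assumption: sizes are in bytes, so the default buffer would be about 15 GB.
default_buffer_gb = defaults.buffer_size / 1e9
```

Since the listing shows `ConsolidationConfig` as a non-frozen dataclass with `init=True`, the same `replace` call should apply to a real instance, which could then be passed to `consolidate(config=...)`.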
cellarr_array.helpers module
- class cellarr_array.helpers.SliceHelper
Bases: object
Helper class for handling array slicing operations.
- cellarr_array.helpers.create_cellarray(uri, shape=None, attr_dtype=None, sparse=False, mode=None, config=None, dim_names=None, dim_dtypes=None, attr_name='data', **kwargs)
Factory function to create a new TileDB cell array.
- Parameters:
uri (str) – Array URI.
shape (Optional[Tuple[Optional[int], ...]]) – Optional array shape. If None or contains None, uses dtype max.
attr_dtype (Union[str, dtype, None]) – Data type for the attribute. Defaults to float32.
sparse (bool) – Whether to create a sparse array.
mode (str) – Array open mode. Defaults to None for automatic switching.
config (Optional[CellArrConfig]) – Optional configuration.
dim_names (Optional[List[str]]) – Optional list of dimension names.
dim_dtypes (Optional[List[Union[str, dtype]]]) – Optional list of dimension dtypes.
attr_name (str) – Name of the data attribute.
**kwargs – Additional arguments for array creation.
- Returns:
CellArray instance.
- Raises:
ValueError – If dimensions are invalid or inputs are inconsistent.
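The `shape` rule ("If None or contains None, uses dtype max") can be illustrated as follows. This is a sketch under stated assumptions, not the factory's code: `resolve_shape` and `DTYPE_MAX` are hypothetical, dtypes are given as strings for unsigned integer types only, and whether the library uses the exact dtype maximum or a slightly smaller bound is not stated in the listing.

```python
# Largest value representable by common unsigned integer dimension dtypes.
DTYPE_MAX = {
    "uint8": 2**8 - 1,
    "uint16": 2**16 - 1,
    "uint32": 2**32 - 1,
    "uint64": 2**64 - 1,
}


def resolve_shape(shape, dim_dtypes):
    """Fill unspecified extents from the dimension dtype's maximum (illustrative)."""
    # A missing shape is treated as "every extent unspecified".
    if shape is None:
        shape = (None,) * len(dim_dtypes)
    if len(shape) != len(dim_dtypes):
        raise ValueError("shape and dim_dtypes must have the same length")

    resolved = []
    for extent, dtype in zip(shape, dim_dtypes):
        if extent is None:
            extent = DTYPE_MAX[dtype]
        elif extent <= 0:
            raise ValueError("extents must be positive")
        resolved.append(extent)
    return tuple(resolved)


partly_open = resolve_shape((1000, None), ["uint32", "uint32"])
fully_open = resolve_shape(None, ["uint8", "uint16"])
```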