cellarr_frame package

Submodules

cellarr_frame.base module

class cellarr_frame.base.CellArrayFrame(uri=None, tiledb_array_obj=None, mode=None, config_or_context=None)

Bases: ABC

Abstract base class for TileDB dataframe operations.

Subclasses must implement the following abstract members: __getitem__, append_dataframe, columns, get_shape, index, read_dataframe, shape, write_dataframe.

abstractmethod __getitem__(key)

Read a slice of the dataframe.

__init__(uri=None, tiledb_array_obj=None, mode=None, config_or_context=None)

Initialize the object.

Parameters:
  • uri (Optional[str]) – URI to the array. Required if ‘tiledb_array_obj’ is not provided.

  • tiledb_array_obj (Optional[Array]) – Optional, an already opened tiledb.Array instance. If provided, ‘uri’ can be None, and ‘config_or_context’ is ignored.

  • mode (Optional[Literal['r', 'w', 'd', 'm']]) –

    Mode in which to open the array: read ‘r’, write ‘w’, modify ‘m’, or delete ‘d’.

    Defaults to None for automatic mode switching.

    If ‘tiledb_array_obj’ is provided, this mode should ideally match the mode of the provided array or be None.

  • config_or_context (Union[Config, Ctx, None]) –

    Optional config or context object. Ignored if ‘tiledb_array_obj’ is provided, as context will be derived from the object.

    Defaults to None.

abstractmethod append_dataframe(df, row_offset=None)

Append a pandas DataFrame to the TileDB array.

Parameters:
  • df (DataFrame) – The pandas DataFrame to write.

  • row_offset (Optional[int]) – Row offset at which to start writing the rows.

Return type:

None

property attr_names: List[str]

Get attribute names of the array.

abstract property columns: Index

Get the column names of the dataframe.

property dim_dtypes: List[dtype]

Get dimension dtypes of the array.

property dim_names: List[str]

Get dimension names of the array.

abstractmethod get_shape()

Get the shape of the array (number of rows for dataframes).

Return type:

tuple

abstract property index: Index

Get the row index of the dataframe.

property mode: str | None

Get current array mode. If an external array is used, this is its open mode.

property ndim: int

Get number of dimensions.

property nonempty_domain: Tuple[Any, ...] | None

Get the non-empty domain of the array.

open_array(mode=None)

Context manager for array operations.

Uses the externally provided array if available, otherwise opens from URI.
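
The behaviour described above can be sketched with contextlib. This is a minimal illustration, not the library's implementation: FakeArray and Handle are hypothetical stand-ins, and leaving an externally provided array open on exit is an assumption about ownership rather than something the documentation states.

```python
from contextlib import contextmanager


class FakeArray:
    """Hypothetical stand-in for an opened array handle (not tiledb.Array)."""

    def __init__(self, uri, mode):
        self.uri = uri
        self.mode = mode
        self.closed = False

    def close(self):
        self.closed = True


class Handle:
    """Hypothetical wrapper that mirrors the documented constructor arguments."""

    def __init__(self, uri=None, array_obj=None, mode=None):
        if uri is None and array_obj is None:
            raise ValueError("either 'uri' or 'array_obj' must be provided")
        self.uri = uri
        self._external = array_obj
        self._mode = mode

    @contextmanager
    def open_array(self, mode=None):
        if self._external is not None:
            # Assumption: the caller owns the external array, so it is
            # handed back as-is and not closed on exit.
            yield self._external
            return
        # No external array: open one from the URI and close it on exit.
        array = FakeArray(self.uri, mode or self._mode or "r")
        try:
            yield array
        finally:
            array.close()
```

With this pattern, code inside the with block does not need to know whether the array was opened from a URI or supplied by the caller.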

abstractmethod read_dataframe(columns=None, query=None, subset=None, **kwargs)

Read a pandas DataFrame from the TileDB array.

Parameters:
  • columns (Optional[List[str]]) – A list of column names to read.

  • query (Optional[str]) – A TileDB query condition string.

  • subset (Union[slice, int, str, None]) – A slice or index to select rows.

  • **kwargs – Additional arguments for the read operation.

Return type:

DataFrame

Returns:

The pandas DataFrame.

abstract property shape: tuple

Get the shape of the dataframe.

abstractmethod write_dataframe(df, **kwargs)

Write a pandas DataFrame to the TileDB array.

Parameters:
  • df (DataFrame) – The pandas DataFrame to write.

  • **kwargs – Additional arguments for the write operation.

Return type:

None
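
As a rough illustration of how an abstract interface like this one constrains subclasses, the sketch below uses a stand-in base class with a few of the member names from the class above. FrameInterface, Incomplete, and InMemoryFrame are hypothetical and do not use TileDB; rows are plain dicts rather than a pandas DataFrame, and the toy reader only accepts a slice (or None) for subset.

```python
from abc import ABC, abstractmethod


class FrameInterface(ABC):
    """Hypothetical stand-in declaring a subset of the abstract members."""

    @abstractmethod
    def read_dataframe(self, columns=None, query=None, subset=None, **kwargs):
        ...

    @abstractmethod
    def write_dataframe(self, df, **kwargs):
        ...

    @abstractmethod
    def get_shape(self):
        ...

    @property
    @abstractmethod
    def shape(self):
        ...


class Incomplete(FrameInterface):
    """Implements only one abstract member, so it cannot be instantiated."""

    def get_shape(self):
        return (0,)


class InMemoryFrame(FrameInterface):
    """Toy backend that stores rows as a list of dicts."""

    def __init__(self):
        self._rows = []

    def write_dataframe(self, df, **kwargs):
        # Replace the stored rows with a copy of the given rows.
        self._rows = [dict(row) for row in df]

    def read_dataframe(self, columns=None, query=None, subset=None, **kwargs):
        # 'query' is accepted for signature parity but ignored in this toy.
        rows = self._rows if subset is None else self._rows[subset]
        if columns is None:
            return [dict(row) for row in rows]
        return [{name: row[name] for name in columns} for row in rows]

    def get_shape(self):
        return (len(self._rows),)

    @property
    def shape(self):
        n_cols = len(self._rows[0]) if self._rows else 0
        return (len(self._rows), n_cols)
```

Calling Incomplete() raises TypeError because abstract members remain unimplemented, whereas InMemoryFrame() can be constructed and used.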

cellarr_frame.dense module

class cellarr_frame.dense.DenseCellArrayFrame(uri=None, tiledb_array_obj=None, mode=None, config_or_context=None)

Bases: CellArrayFrame

Handler for dense dataframes using TileDB’s native dataframe support.

__getitem__(key)

Read a slice of the dataframe.

append_dataframe(df, row_offset=None)

Append a pandas DataFrame to the dense TileDB array.

Parameters:
  • df (DataFrame) – The pandas DataFrame to write.

  • row_offset (Optional[int]) – Row offset at which to start writing the rows.

Return type:

None

property columns: Index

Get the column names (attributes) of the dataframe.

property dtypes: Series

Return the dtypes of the columns/attributes in the array.

classmethod from_dataframe(uri, df, **kwargs)

Create a DenseCellArrayFrame from a pandas DataFrame.

This uses tiledb.from_pandas to create the array, ensuring compatibility with TileDB’s native pandas integration.

Parameters:
  • uri (str) – URI to create the array at.

  • df (DataFrame) – Pandas DataFrame to write.

  • **kwargs – Additional arguments.

Return type:

DenseCellArrayFrame
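
A minimal usage sketch, relying only on the signatures documented in this module. The helper name dense_roundtrip, the column names, and the sample values are illustrative assumptions; the imports are placed inside the function so the sketch can be loaded without pandas, tiledb, or cellarr_frame installed, and the target URI is assumed not to exist yet.

```python
def dense_roundtrip(uri):
    """Create a dense frame at `uri`, then read one column back.

    Hypothetical helper for illustration; requires pandas, tiledb and
    cellarr_frame at call time.
    """
    import pandas as pd
    from cellarr_frame.dense import DenseCellArrayFrame

    # Illustrative data; any DataFrame accepted by tiledb.from_pandas should do.
    df = pd.DataFrame({"gene": ["A", "B", "C"], "score": [0.1, 0.2, 0.3]})

    # Create the array and get a handler for it (documented classmethod).
    frame = DenseCellArrayFrame.from_dataframe(uri, df)

    # Read back a single column using the documented 'columns' parameter.
    scores = frame.read_dataframe(columns=["score"])
    return frame.shape, scores
```

One way to try it is to call dense_roundtrip with a path inside a fresh temporary directory, for example one created with tempfile.TemporaryDirectory.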

get_shape()

Get the shape (number of rows) of the dense dataframe array.

Return type:

tuple

property index: Index

Get the row index of the dataframe.

read_dataframe(columns=None, query=None, subset=None, primary_key_column_name=None, **kwargs)

Read a pandas DataFrame from the TileDB array.

Parameters:
  • columns (Optional[List[str]]) – A list of column names to read.

  • query (Optional[str]) – A TileDB query condition string.

  • subset (Union[slice, int, str, None]) – A slice or index to select rows.

  • primary_key_column_name (Optional[str]) – Name of the primary key column.

  • **kwargs – Additional arguments for the read operation.

Return type:

DataFrame

Returns:

The pandas DataFrame.

property rows: Index

Alias for index, provided to match the Metadata interface.

property shape: tuple

Get the shape (rows, columns) of the dataframe.

write_dataframe(df, **kwargs)

Write a dense pandas DataFrame to a 1D TileDB array.

This assumes the array was created using tiledb.from_pandas or the helper function. It writes the dataframe starting at row 0.

Parameters:
  • df (DataFrame) – The pandas DataFrame to write.

  • **kwargs – Additional arguments.

Return type:

None

cellarr_frame.helpers module

cellarr_frame.helpers.create_cellarr_frame(uri, sparse=False, df=None, **kwargs)

Factory function to create a TileDB array for a CellArrayFrame.

Parameters:
  • uri (str) – The URI for the new TileDB array.

  • sparse (bool) – If True, create a sparse array; otherwise create a dense array. Defaults to False.

  • df (Optional[DataFrame]) – An optional pandas DataFrame from which to infer the schema. Defaults to None.

  • **kwargs – Additional arguments for array creation.
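
The following sketch shows one plausible way to combine the factory with the dense handler, using only the documented signatures. The helper name create_and_write and the sample data are assumptions. The documentation does not say whether create_cellarr_frame also writes the rows of df or only infers the schema, so the explicit write_dataframe call here is an assumption as well.

```python
def create_and_write(uri):
    """Create a dense array at `uri` from a DataFrame schema, then fill it.

    Hypothetical helper for illustration; requires pandas, tiledb and
    cellarr_frame at call time.
    """
    import pandas as pd
    from cellarr_frame.dense import DenseCellArrayFrame
    from cellarr_frame.helpers import create_cellarr_frame

    df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

    # Create the array; the schema is inferred from df (documented behaviour).
    create_cellarr_frame(uri, sparse=False, df=df)

    # Assumption: the data still has to be written explicitly afterwards.
    frame = DenseCellArrayFrame(uri=uri)
    frame.write_dataframe(df)
    return frame.read_dataframe()
```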

cellarr_frame.sparse module

class cellarr_frame.sparse.SparseCellArrayFrame(uri=None, tiledb_array_obj=None, mode=None, config_or_context=None)

Bases: CellArrayFrame

Handler for sparse dataframes using a 2D sparse TileDB array.

This class wraps a cellarr_array.SparseCellArray instance, assuming it’s a 2D sparse array with string/object data.

__getitem__(key)

Optimized slicing for the DataFrame.

__init__(uri=None, tiledb_array_obj=None, mode=None, config_or_context=None)

Initialize the object.

Parameters:
  • uri (Optional[str]) – URI to the array.

  • tiledb_array_obj (Optional[Array]) – Optional, an already opened tiledb.Array instance.

  • mode (Optional[Literal['r', 'w', 'd', 'm']]) – Default open mode.

  • config_or_context (Union[Config, Ctx, None]) – Optional config or context object.

append_dataframe(df, row_offset=None)

Append data points from a pandas DataFrame to the sparse TileDB array.

If row_offset is provided, adjusts the row indices of the appended data. Assumes integer row dimensions for offset calculation.

Parameters:
  • df (DataFrame) – The pandas DataFrame to write.

  • row_offset (Optional[int]) – Row offset at which to start writing the rows.

Return type:

None
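
The row-offset adjustment described above can be pictured as a shift applied to integer row indices before writing. The function shift_rows below is a hypothetical, library-independent sketch operating on (row_idx, col_idx, value) tuples; raising TypeError for non-integer rows is an assumption about how the integer-row requirement might be enforced.

```python
def shift_rows(triples, row_offset=None):
    """Return (row, col, value) triples with integer rows shifted by row_offset."""
    if row_offset is None:
        # No offset requested: keep the row indices unchanged.
        return list(triples)
    shifted = []
    for row, col, value in triples:
        if not isinstance(row, int):
            # Offsets only make sense for integer row dimensions.
            raise TypeError("row_offset requires integer row indices")
        shifted.append((row + row_offset, col, value))
    return shifted
```

For example, shifting rows 0 and 1 by an offset of 10 places them at rows 10 and 11, leaving columns and values untouched.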

property columns: Index

Get the column names (unique values from 2nd dim) of the dataframe.

get_shape()

Get the shape based on the non-empty domain for sparse arrays.

Return type:

tuple

property index: Index

Get the row index (unique values from 1st dim) of the dataframe.

read_dataframe(subset=None, columns=None, query=None, **kwargs)

Read a pandas DataFrame from the TileDB array.

Parameters:
  • subset (Union[slice, int, str, None]) – A slice or index to select rows.

  • columns (Optional[List[str]]) – A list of column names to read.

  • query (Optional[str]) – A TileDB query condition string.

  • **kwargs – Additional arguments for the read operation.

Return type:

DataFrame

Returns:

The pandas DataFrame.

property shape: tuple

Get the shape (unique rows, unique columns) of the dataframe.

write_dataframe(df, **kwargs)

Write a sparse pandas DataFrame to a 2D sparse TileDB array.

The DataFrame is converted to a coordinate format (row_idx, col_idx, value).

Parameters:
  • df (DataFrame) – The sparse pandas DataFrame to write.

  • **kwargs – Additional arguments for the write operation.

Return type:

None
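
To make the coordinate format concrete, the sketch below converts a small table into (row_idx, col_idx, value) triples in plain Python. It is an illustration only, not the library's conversion: to_coordinates is a hypothetical name, the table is a list of lists rather than a pandas DataFrame, and treating None as a missing cell that is skipped is an assumption.

```python
def to_coordinates(rows, columns, table):
    """Return (row_idx, col_idx, value) triples for the non-missing cells."""
    triples = []
    for row_label, values in zip(rows, table):
        for col_label, value in zip(columns, values):
            if value is not None:  # assumption: None marks a missing cell
                triples.append((row_label, col_label, value))
    return triples


rows = ["cell_1", "cell_2"]
columns = ["gene_a", "gene_b", "gene_c"]
table = [
    [1.0, None, 3.0],
    [None, 2.0, None],
]

coords = to_coordinates(rows, columns, table)
# coords == [("cell_1", "gene_a", 1.0), ("cell_1", "gene_c", 3.0),
#            ("cell_2", "gene_b", 2.0)]

# Shape in the sense of (unique rows, unique columns) among stored cells.
shape = (len({r for r, _, _ in coords}), len({c for _, c, _ in coords}))
# shape == (2, 3)
```

Only three of the six cells are stored, which is the point of a sparse layout: empty cells take no space, and the row and column labels are recovered from the coordinates themselves.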
