daf.storage.interface

The types here define the abstract interface implemented by all daf storage format adapters. This interface favors simplicity, to make it easier to implement new adapters for specific formats, at the cost of being inconvenient to use directly. For a more usable interface, see the DafReader and DafWriter classes.

For example, we only require storage objects to accept is_optimal is_frozen MatrixInRows 2D data, but we allow storage objects to return almost anything they happen to contain (as long as it is Known2D data). This simplifies writing storage format adapters that access arbitrary data.

In general, daf users would not be interested in the abstract storage interface defined here, other than to construct storage objects (using the concrete implementation constructors) and possibly to access .name, .description, and .as_reader.

It is the higher level DafReader and DafWriter classes which will actually use the API exposed here. It is still documented as it is exported from the package, and it gives deeper insight into how daf works. In particular, storage objects do not compute queries; all access is done by exact names which either exist or do not exist in the storage. When used by the higher level classes, these names might be complex (represent the result of a query), but this is not known by the storage objects.

Implementing a new storage format adapter requires implementing the abstract methods of StorageReader and StorageWriter. These are simplified versions of the above and are omitted from the documentation to reduce clutter. If you wish to implement an adapter to a new storage format, you are advised to look at the sources and consider the existing implementations (in particular, MemoryStorage) as a starting point.

Classes:

StorageReader(*, name)

Low-level read-only storage of data in axes in some format.

StorageWriter(*, name)

Low-level read-write storage of data in axes in some format.

Functions:

prefix(text, char)

Return the characters in the text before the separator char (which need not exist).

suffix(text, char)

Return the characters in the text after the separator char (which must exist).

class daf.storage.interface.StorageReader(*, name: str)

Bases: ABC

Low-level read-only storage of data in axes in some format.

This is an abstract base class defining an interface which is implemented by the concrete storage format adapters.

Note

The abstract methods are not public; if you want to implement a storage adapter yourself, look at the source code. You can use the simple MemoryStorage class as a starting point.

Attributes:

name

The name of the storage for messages.

Methods:

as_reader()

Return the storage as a StorageReader.

description(*[, detail, deep, description])

Return a dictionary describing the daf data set, useful for debugging.

item_names()

Return a collection of the names of the 0D data items that exist in the storage.

has_item(name)

Check whether the name 0D data item exists in the storage.

get_item(name)

Access a 0D data item from the storage (which must exist) by its name.

axis_names()

Return a collection of the names of the axes that exist in the storage.

has_axis(axis)

Check whether the axis exists in the storage.

axis_size(axis)

Get the number of entries along some axis (which must exist).

axis_entries(axis)

Get the unique name of each entry in the storage along some axis (which must exist).

data1d_names(axis)

Return the names of the 1D data that exists in the storage for a specific axis (which must exist).

has_data1d(name)

Check whether the name 1D data exists.

get_data1d(name)

Get the name 1D data (which must exist) as a Known1D.

data2d_names(axes)

Return the names of the 2D data that exists in the storage for a specific pair of axes (which must exist).

has_data2d(name)

Check whether the name 2D data exists.

get_data2d(name)

Get the name 2D data (which must exist).

name

The name of the storage for messages.

as_reader() -> StorageReader

Return the storage as a StorageReader.

This is a no-op (returns self) for “real” read-only storage, but for writable storage, it returns a “real” read-only wrapper object (that does not implement the writing methods). This ensures that the result can’t be used to modify the data if passed by mistake to a function that takes a StorageWriter.
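The wrapper behavior described above can be sketched with two toy classes. This is only an illustration of the pattern; ToyReader and ToyWriter are hypothetical stand-ins, not the daf classes, and the real implementation may differ.

```python
from typing import Any, Dict


class ToyReader:
    """Hypothetical stand-in for a read-only storage (not the daf class)."""

    def __init__(self, *, name: str, items: Dict[str, Any]) -> None:
        self.name = name
        self._items = items

    def has_item(self, name: str) -> bool:
        return name in self._items

    def get_item(self, name: str) -> Any:
        return self._items[name]

    def as_reader(self) -> "ToyReader":
        # A "real" read-only storage simply returns itself.
        return self


class ToyWriter(ToyReader):
    """Hypothetical stand-in for a writable storage."""

    def set_item(self, name: str, item: Any) -> None:
        self._items[name] = item

    def as_reader(self) -> ToyReader:
        # Return a wrapper that shares the data but has no writing methods.
        return ToyReader(name=self.name, items=self._items)


writer = ToyWriter(name="toy", items={})
writer.set_item("version", 1)
reader = writer.as_reader()
```

Here reader can see the data written through writer, but it has no set_item method, so passing it by mistake to code that expects a writer fails immediately instead of modifying the data.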

description(*, detail: bool = False, deep: bool = False, description: Optional[Dict] = None) -> Dict

Return a dictionary describing the daf data set, useful for debugging.

The result uses the name field as a key, with a nested dictionary value with the keys class, axes, and data.

If not detail, the axes will contain a dictionary mapping each axis to a description of its size, and the data will contain just a list of the data names, except for StorageView where it will be a dictionary mapping each exposed name to the base name.

If detail, both the axes and the data will contain a mapping providing additional data_description of the relevant data.

If deep, there may be additional keys describing the internal storage.

If description is provided, collect the result into it. This allows collecting multiple data set descriptions into a single overall system state description.

Note

Calling repr(storage) returns yaml.dump(storage.description(detail=True)).
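As a rough illustration of the non-detail result shape and of collecting several descriptions into one dictionary, consider the sketch below. The toy_description helper is hypothetical; the exact format of the class value and of the axis size description is an assumption, not taken from daf.

```python
from typing import Dict, List, Optional


def toy_description(
    name: str,
    axis_sizes: Dict[str, int],
    data_names: List[str],
    *,
    description: Optional[Dict] = None,
) -> Dict:
    """Hypothetical helper mimicking the shape of a non-detail description."""
    # Collect into the caller's dictionary if one was given.
    description = {} if description is None else description
    description[name] = {
        "class": "ToyStorage",  # placeholder value; the real format is an assumption
        "axes": {axis: f"{size} entries" for axis, size in axis_sizes.items()},
        "data": sorted(data_names),
    }
    return description


# Collect the descriptions of two data sets into one overall state description.
state: Dict = {}
toy_description("first", {"cell": 3}, ["cell#age"], description=state)
toy_description("second", {"gene": 2}, ["gene#length"], description=state)
```

After both calls, state holds one entry per data set name, each with the class, axes, and data keys.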

item_names() -> Collection[str]

Return a collection of the names of the 0D data items that exist in the storage.

has_item(name: str) -> bool

Check whether the name 0D data item exists in the storage.

get_item(name: str) -> Any

Access a 0D data item from the storage (which must exist) by its name.

axis_names() -> Collection[str]

Return a collection of the names of the axes that exist in the storage.

has_axis(axis: str) -> bool

Check whether the axis exists in the storage.

axis_size(axis: str) -> int

Get the number of entries along some axis (which must exist).

axis_entries(axis: str) -> Union[ndarray, Series]

Get the unique name of each entry in the storage along some axis (which must exist).

data1d_names(axis: str) -> Collection[str]

Return the names of the 1D data that exists in the storage for a specific axis (which must exist).

The returned names are in the format axis#name which uniquely identifies the 1D data.

has_data1d(name: str) -> bool

Check whether the name 1D data exists.

The name must be in the format axis#name which uniquely identifies the 1D data.

get_data1d(name: str) -> Union[ndarray, Series]

Get the name 1D data (which must exist) as a Known1D.

The name must be in the format axis#name which uniquely identifies the 1D data.

data2d_names(axes: Union[str, Tuple[str, str]]) -> Collection[str]

Return the names of the 2D data that exists in the storage for a specific pair of axes (which must exist).

The returned names are in the format rows_axis,columns_axis#name which uniquely identifies the 2D data.

If two copies of the data exist in transposed axes order, then two different names will be returned.

has_data2d(name: str) -> bool

Check whether the name 2D data exists.

The name must be in the format rows_axis,columns_axis#name which uniquely identifies the 2D data.

get_data2d(name: str) -> daf.typing.unions.Known2D

Get the name 2D data (which must exist).

The name must be in the format rows_axis,columns_axis#name which uniquely identifies the 2D data.
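To make the rows_axis,columns_axis#name format concrete, here is a minimal sketch that splits such a name and builds the name the same data would have in transposed axes order. Both helpers are hypothetical and the cell, gene, and UMIs names are only illustrative; none of this is part of the daf API.

```python
from typing import Tuple


def split_data2d_name(name: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split rows_axis,columns_axis#name into its parts."""
    axes, _, data_name = name.partition("#")
    rows_axis, _, columns_axis = axes.partition(",")
    return rows_axis, columns_axis, data_name


def transposed_data2d_name(name: str) -> str:
    """Hypothetical helper: the name of the same data in transposed axes order."""
    rows_axis, columns_axis, data_name = split_data2d_name(name)
    return f"{columns_axis},{rows_axis}#{data_name}"


parts = split_data2d_name("cell,gene#UMIs")
other = transposed_data2d_name("cell,gene#UMIs")
```

Since the storage treats names as opaque exact strings, the two layouts of the same data are simply two different names as far as it is concerned.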

class daf.storage.interface.StorageWriter(*, name: str)

Bases: StorageReader

Low-level read-write storage of data in axes in some format.

This is an abstract base class defining an interface which is implemented by the concrete storage formats.

Note

The abstract methods are not public; if you want to implement a storage adapter yourself, look at the source code. You can use the simple MemoryStorage class as a starting point.

Todo

The StorageWriter interface needs to be extended to allow for deleting data (dangerous as this may be). Currently the only supported way is to create a StorageView that hides some data, saving that into a new storage, and removing the old storage, which is “unreasonable” even though this is a very rare operation.

Methods:

as_reader()

Return the storage as a StorageReader.

update(storage, *[, overwrite])

Update the storage with a copy of all the data from another storage.

set_item(name, item, *[, overwrite])

Set a name 0D data item.

create_axis(axis, entries)

Create a new axis and the unique entries identifying each entry along the axis.

set_vector(name, vector, *[, overwrite])

Set a name Vector data.

set_matrix(name, matrix, *[, overwrite])

Set a name matrix.

create_dense_in_rows(name, *, dtype[, overwrite])

Create an uninitialized ROW_MAJOR DenseInRows of some dtype to be set by the name in the storage, expecting the code to initialize it.

as_reader() -> StorageReader

Return the storage as a StorageReader.

This is a no-op (returns self) for “real” read-only storage, but for writable storage, it returns a “real” read-only wrapper object (that does not implement the writing methods). This ensures that the result can’t be used to modify the data if passed by mistake to a function that takes a StorageWriter.

update(storage: StorageReader, *, overwrite: bool = False) -> None

Update the storage with a copy of all the data from another storage.

If overwrite, this will silently overwrite any existing data.

Any axes that already exist must have exactly the same entries as in the copied storage.

This can be used to copy data between different storage objects. A common idiom is creating a new empty storage and then calling update to fill it with the data from some other storage (often a StorageView and/or a StorageChain to control exactly what is being copied). A notable exception is AnnDataWriter which, due to AnnData limitations, must be given the copied storage in its constructor.
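The "create an empty storage, then call update" idiom and the rules above can be sketched with a toy, dictionary-backed stand-in. ToyStorage is hypothetical and much simpler than a real adapter; in particular, raising ValueError when overwrite is not set is an assumption about how the failure is reported.

```python
from typing import Dict, List


class ToyStorage:
    """Hypothetical dict-backed stand-in; it only mimics the rules described above."""

    def __init__(self, *, name: str) -> None:
        self.name = name
        self.axes: Dict[str, List[str]] = {}
        self.data: Dict[str, List[float]] = {}

    def update(self, storage: "ToyStorage", *, overwrite: bool = False) -> None:
        # Axes that already exist must have exactly the same entries.
        for axis, entries in storage.axes.items():
            if axis in self.axes and self.axes[axis] != entries:
                raise ValueError(f"axis {axis} differs from the one in {storage.name}")
        # Without overwrite, refuse to replace existing data (assumed behavior).
        if not overwrite:
            clashes = sorted(set(self.data) & set(storage.data))
            if clashes:
                raise ValueError(f"refusing to overwrite: {clashes}")
        for axis, entries in storage.axes.items():
            self.axes.setdefault(axis, list(entries))
        for name, values in storage.data.items():
            self.data[name] = list(values)  # store a copy, not a shared reference


source = ToyStorage(name="source")
source.axes["cell"] = ["c1", "c2"]
source.data["cell#age"] = [3, 5]

target = ToyStorage(name="target")  # a new, empty storage
target.update(source)  # fill it with a copy of the other storage's data

try:
    target.update(source)  # the second copy clashes with the existing data
    refused = False
except ValueError:
    refused = True
```

Passing overwrite=True to the second call would instead replace the existing data silently.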

Note

This will convert any non-Matrix 2D data in the storage into a Matrix to satisfy our promise that we only put Matrix data into a storage (even though we allow it to return any Known2D data).

set_item(name: str, item: Any, *, overwrite: bool = False) -> None

Set a name 0D data item.

If overwrite, will silently overwrite an existing item of the same name, otherwise overwriting will fail.

create_axis(axis: str, entries: Vector) -> None

Create a new axis and the unique entries identifying each entry along the axis.

The entries must be an is_optimal is_frozen Vector and contain string data.

It is always an error to overwrite an existing axis.

set_vector(name: str, vector: Vector, *, overwrite: bool = False) -> None

Set a name Vector data.

The name must be in the format axis#name which uniquely identifies the 1D data. The data must be is_frozen and is_optimal.

If overwrite, will silently overwrite an existing 1D data of the same name, otherwise overwriting will fail.

set_matrix(name: str, matrix: Union[DenseInRows, SparseInRows], *, overwrite: bool = False) -> None

Set a name matrix.

The name must be in the format rows_axis,columns_axis#name which uniquely identifies the 2D data. The data must be an is_frozen is_optimal MatrixInRows.

If overwrite, will silently overwrite an existing 2D data of the same name, otherwise overwriting will fail.

create_dense_in_rows(name: str, *, dtype: Union[str, dtype], overwrite: bool = False) -> Generator[DenseInRows, None, None]

Create an uninitialized ROW_MAJOR DenseInRows of some dtype to be set by the name in the storage, expecting the code to initialize it.
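The Generator return type suggests this method is meant to be used as a context manager in a with statement; that reading is an assumption. The sketch below shows the general pattern using a hypothetical ToyStorage that holds nested lists instead of a real DenseInRows, ignores dtype, and stores the matrix only after the with block completes (also an assumption).

```python
from contextlib import contextmanager
from typing import Dict, Generator, List


class ToyStorage:
    """Hypothetical stand-in using nested lists instead of a real DenseInRows."""

    def __init__(self, *, axis_sizes: Dict[str, int]) -> None:
        self.axis_sizes = axis_sizes
        self.data: Dict[str, List[List[float]]] = {}

    @contextmanager
    def create_dense_in_rows(
        self, name: str, *, overwrite: bool = False
    ) -> Generator[List[List[float]], None, None]:
        if name in self.data and not overwrite:
            raise ValueError(f"refusing to overwrite: {name}")
        # The shape comes from the axes in the rows_axis,columns_axis#name format.
        axes = name.partition("#")[0]
        rows_axis, _, columns_axis = axes.partition(",")
        # Zeros stand in for the "uninitialized" memory of a real matrix.
        dense = [
            [0.0] * self.axis_sizes[columns_axis]
            for _ in range(self.axis_sizes[rows_axis])
        ]
        yield dense  # the caller initializes the matrix inside the with block
        self.data[name] = dense  # stored only if the block did not raise


storage = ToyStorage(axis_sizes={"cell": 2, "gene": 3})
with storage.create_dense_in_rows("cell,gene#UMIs") as dense:
    for row in dense:
        for column in range(len(row)):
            row[column] = 1.0
```

The point of the pattern is that the caller fills the matrix in place, so a large matrix need not be built elsewhere and then copied into the storage.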

daf.storage.interface.prefix(text: str, char: str) -> str

Return the characters in the text before the separator char (which need not exist).

It would have been much nicer if this was a method of str.

daf.storage.interface.suffix(text: str, char: str) -> str

Return the characters in the text after the separator char (which must exist).

It would have been much nicer if this was a method of str.
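A minimal sketch of the documented behavior of the two helpers, built on str.partition. The toy_prefix and toy_suffix names are hypothetical; splitting at the first occurrence of the separator and raising ValueError when it is missing are assumptions, and the actual daf implementation may differ.

```python
def toy_prefix(text: str, char: str) -> str:
    """Text before the first char; the whole text if char does not occur."""
    return text.partition(char)[0]


def toy_suffix(text: str, char: str) -> str:
    """Text after the first char, which must exist."""
    _, separator, after = text.partition(char)
    if not separator:
        raise ValueError(f"missing {char!r} in {text!r}")
    return after


# Split a 1D data name in the axis#name format.
axis = toy_prefix("cell#age", "#")
name = toy_suffix("cell#age", "#")
# The separator need not exist for the prefix.
whole = toy_prefix("cell", "#")
```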