daf.storage.interface
The types here define the abstract interface implemented by all daf storage format adapters. This interface favors simplicity, which makes it easier to implement new adapters for specific formats but inconvenient to use directly. For a more usable interface, see the DafReader and DafWriter classes.
For example, we only require storage objects to accept is_optimal, is_frozen MatrixInRows 2D data, but we allow storage objects to return almost anything they happen to contain (as long as it is Known2D data). This simplifies writing storage format adapters that access arbitrary data.
In general, daf users would not be interested in the abstract storage interface defined here, other than to construct storage objects (using the concrete implementation constructors) and possibly to access .name, .description, and .as_reader.
It is the higher level DafReader and DafWriter classes which actually use the API exposed here. It is still documented because it is exported from the package and because it gives deeper insight into how daf works. In particular, storage objects do not compute queries; all access is done by exact names which either exist or do not exist in the storage. When used by the higher level classes, these names might be complex (representing the result of a query), but this is not known to the storage objects.
Implementing a new storage format adapter requires implementing the abstract methods of StorageReader and StorageWriter. These are simplified versions of the public methods documented here and are omitted from the documentation to reduce clutter. If you wish to implement an adapter to a new storage format, you are advised to look at the sources and consider the existing implementations (in particular, MemoryStorage) as a starting point.
Classes:
| StorageReader | Low-level read-only storage of data in axes in some format. |
| StorageWriter | Low-level read-write storage of data in axes in some format. |
Functions:
|
Return the characters in the |
|
Return the characters in the |
- class daf.storage.interface.StorageReader(*, name: str)
Bases: ABC
Low-level read-only storage of data in axes in some format.
This is an abstract base class defining an interface which is implemented by the concrete storage format adapters.
Note
The abstract methods are not public; if you want to implement a storage adapter yourself, look at the source code. You can use the simple MemoryStorage class as a starting point.
Attributes:
| name | The name of the storage for messages. |
Methods:
| as_reader() | Return the storage as a StorageReader. |
| description(*[, detail, deep, description]) | Return a dictionary describing the daf data set, useful for debugging. |
| item_names() | Return a collection of the names of the 0D data items that exist in the storage. |
| has_item(name) | Check whether the name 0D data item exists in the storage. |
| get_item(name) | Access a 0D data item from the storage (which must exist) by its name. |
| axis_names() | Return a collection of the names of the axes that exist in the storage. |
| has_axis(axis) | Check whether the axis exists in the storage. |
| axis_size(axis) | Get the number of entries along some axis (which must exist). |
| axis_entries(axis) | Get the unique name of each entry in the storage along some axis (which must exist). |
| data1d_names(axis) | Return the names of the 1D data that exist in the storage for a specific axis (which must exist). |
| has_data1d(name) | Check whether the name 1D data exists. |
| get_data1d(name) | Get the name 1D data (which must exist) as a Known1D. |
| data2d_names(axes) | Return the names of the 2D data that exist in the storage for a specific pair of axes (which must exist). |
| has_data2d(name) | Check whether the name 2D data exists. |
| get_data2d(name) | Get the name 2D data (which must exist). |
- name
The name of the storage for messages.
- as_reader() -> StorageReader
Return the storage as a StorageReader.
This is a no-op (returns self) for “real” read-only storage, but for writable storage, it returns a “real” read-only wrapper object (that does not implement the writing methods). This ensures that the result can’t be used to modify the data if passed by mistake to a function that takes a StorageWriter.
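The read-only wrapper idea can be sketched in isolation. The following is a minimal illustration and not daf's implementation: `ToyReader`, `ToyWriter`, and `_ReadOnlyView` are hypothetical names, and only 0D items are modelled.

```python
class ToyReader:
    """Hypothetical stand-in for a read-only storage (0D items only)."""

    def __init__(self, *, name):
        self.name = name
        self._items = {}

    def as_reader(self):
        # Already read-only, so returning self is enough.
        return self

    def item_names(self):
        return set(self._items)

    def get_item(self, name):
        return self._items[name]


class _ReadOnlyView(ToyReader):
    """Read-only wrapper sharing the writer's data; it has no set_item."""

    def __init__(self, base):
        super().__init__(name=base.name)
        self._items = base._items  # Shared with the writer, not copied.


class ToyWriter(ToyReader):
    """Hypothetical stand-in for a writable storage."""

    def as_reader(self):
        # Hide the writing methods behind a wrapper object.
        return _ReadOnlyView(self)

    def set_item(self, name, item, *, overwrite=False):
        if name in self._items and not overwrite:
            raise KeyError(f"item {name!r} already exists in {self.name}")
        self._items[name] = item


writer = ToyWriter(name="example")
writer.set_item("version", 1)
reader = writer.as_reader()
```

In this sketch the wrapper shares the underlying dictionary, so later writes through `writer` remain visible through `reader`, while `reader` itself exposes no way to modify the data.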
- description(*, detail: bool = False, deep: bool = False, description: Optional[Dict] = None) -> Dict
Return a dictionary describing the daf data set, useful for debugging.
The result uses the name field as a key, with a nested dictionary value with the keys class, axes, and data.
If not detail, the axes will contain a dictionary mapping each axis to a description of its size, and the data will contain just a list of the data names, except for StorageView where it will be a dictionary mapping each exposed name to the base name.
If detail, both the axes and the data will contain a mapping providing an additional data_description of the relevant data.
If deep, there may be additional keys describing the internal storage.
If description is provided, collect the result into it. This allows collecting multiple data set descriptions into a single overall system state description.
Note
Calling repr(storage) returns yaml.dump(storage.description(detail=True)).
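To make the shape of the non-detail result concrete, here is a rough sketch. It does not call daf; `describe` is a hypothetical helper, and the exact wording of each axis size description ("3 entries") is an assumption made for illustration only.

```python
def describe(name, class_name, axes, data, *, description=None):
    """Hypothetical helper mimicking the non-detail result shape."""
    if description is None:
        description = {}
    description[name] = {
        "class": class_name,
        # Assumed rendering of "a description of its size".
        "axes": {axis: f"{size} entries" for axis, size in axes.items()},
        # Without detail, the data is just a list of names.
        "data": sorted(data),
    }
    return description


# Collect two data set descriptions into one overall dictionary.
state = describe(
    "cells", "MemoryStorage", {"cell": 3, "gene": 2}, ["cell#age", "cell,gene#UMIs"]
)
describe("metacells", "MemoryStorage", {"metacell": 1}, [], description=state)
```

Passing the same dictionary as `description` to successive calls is what lets several data sets end up in a single system state description, keyed by their names.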
- item_names() -> Collection[str]
Return a collection of the names of the 0D data items that exist in the storage.
- get_item(name: str) -> Any
Access a 0D data item from the storage (which must exist) by its name.
- axis_names() -> Collection[str]
Return a collection of the names of the axes that exist in the storage.
- axis_entries(axis: str) -> Union[ndarray, Series]
Get the unique name of each entry in the storage along some axis (which must exist).
- data1d_names(axis: str) -> Collection[str]
Return the names of the 1D data that exist in the storage for a specific axis (which must exist).
The returned names are in the format axis#name which uniquely identifies the 1D data.
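As an illustration of the axis#name convention, the sketch below splits such names and filters a flat collection by axis. Both helpers are hypothetical (they are not part of daf), and the sketch assumes the axis part never contains a `#` character.

```python
def split_data1d_name(full_name):
    """Split 'axis#name' into (axis, name). Hypothetical helper."""
    axis, separator, name = full_name.partition("#")
    if not separator or not axis or not name:
        raise ValueError(f"expected axis#name, got {full_name!r}")
    return axis, name


def data1d_names_of(all_names, axis):
    """Pick the 1D names of one axis out of a flat collection of names."""
    # A 2D name has 'rows,columns' before the '#', so it never equals axis.
    return {name for name in all_names if name.partition("#")[0] == axis}


names = {"cell#age", "gene#length", "cell,gene#UMIs"}
cell_names = data1d_names_of(names, "cell")
```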
- has_data1d(name: str) -> bool
Check whether the name 1D data exists.
The name must be in the format axis#name which uniquely identifies the 1D data.
- get_data1d(name: str) -> Union[ndarray, Series]
Get the name 1D data (which must exist) as a Known1D.
The name must be in the format axis#name which uniquely identifies the 1D data.
- data2d_names(axes: Union[str, Tuple[str, str]]) -> Collection[str]
Return the names of the 2D data that exist in the storage for a specific pair of axes (which must exist).
The returned names are in the format rows_axis,columns_axis#name which uniquely identifies the 2D data.
If two copies of the data exist in transposed axes order, then two different names will be returned.
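The rows_axis,columns_axis#name convention, and why transposed copies get two different names, can be sketched as follows. The helpers are hypothetical and assume that axis names contain neither `,` nor `#`.

```python
def split_data2d_name(full_name):
    """Split 'rows_axis,columns_axis#name' into its three parts."""
    axes, separator, name = full_name.partition("#")
    rows_axis, comma, columns_axis = axes.partition(",")
    if not (separator and comma and rows_axis and columns_axis and name):
        raise ValueError(
            f"expected rows_axis,columns_axis#name, got {full_name!r}"
        )
    return rows_axis, columns_axis, name


def transposed_name(full_name):
    """Return the name the same data has in transposed axes order."""
    rows_axis, columns_axis, name = split_data2d_name(full_name)
    return f"{columns_axis},{rows_axis}#{name}"


original = "cell,gene#UMIs"
flipped = transposed_name(original)
```

Because the axes order is part of the name, the two layouts are distinct entries as far as the storage is concerned.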
- class daf.storage.interface.StorageWriter(*, name: str)
Bases: StorageReader
Low-level read-write storage of data in axes in some format.
This is an abstract base class defining an interface which is implemented by the concrete storage format adapters.
Note
The abstract methods are not public; if you want to implement a storage adapter yourself, look at the source code. You can use the simple MemoryStorage class as a starting point.
Todo
The StorageWriter interface needs to be extended to allow for deleting data (dangerous as this may be). Currently the only supported way is to create a StorageView that hides some data, save that into a new storage, and remove the old storage, which is “unreasonable” even though this is a very rare operation.
Methods:
| as_reader() | Return the storage as a StorageReader. |
| update(storage, *[, overwrite]) | Update the storage with a copy of all the data from another storage. |
| set_item(name, item, *[, overwrite]) | Set a name 0D data item. |
| create_axis(axis, entries) | Create a new axis and the unique entries identifying each entry along the axis. |
| set_vector(name, vector, *[, overwrite]) | Set a name Vector data. |
| set_matrix(name, matrix, *[, overwrite]) | Set a name matrix. |
| create_dense_in_rows(name, *, dtype[, overwrite]) | Create an uninitialized ROW_MAJOR DenseInRows of some dtype to be set by the name in the storage, expecting the code to initialize it. |
- as_reader() -> StorageReader
Return the storage as a StorageReader.
This is a no-op (returns self) for “real” read-only storage, but for writable storage, it returns a “real” read-only wrapper object (that does not implement the writing methods). This ensures that the result can’t be used to modify the data if passed by mistake to a function that takes a StorageWriter.
- update(storage: StorageReader, *, overwrite: bool = False) -> None
Update the storage with a copy of all the data from another storage.
If overwrite, this will silently overwrite any existing data.
Any axes that already exist must have exactly the same entries as in the copied storage.
This can be used to copy data between different storage objects. A common idiom is creating a new empty storage and then calling update to fill it with the data from some other storage (often a StorageView and/or a StorageChain to control exactly what is being copied). A notable exception is AnnDataWriter which, due to AnnData limitations, must be given the copied storage in its constructor.
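The update rules can be sketched with a toy dict-backed storage. `ToyStorage` is hypothetical and is not daf's implementation. In particular, refusing to overwrite when `overwrite` is false is an assumption made by analogy with set_item; the text above only states what happens when `overwrite` is true.

```python
class ToyStorage:
    """Hypothetical dict-backed storage; not part of daf."""

    def __init__(self, *, name):
        self.name = name
        self.axes = {}  # axis name -> tuple of unique entry names
        self.data = {}  # exact data name -> immutable value

    def update(self, storage, *, overwrite=False):
        # Validate first, so a failed update leaves this storage unchanged.
        for axis, entries in storage.axes.items():
            if axis in self.axes and self.axes[axis] != entries:
                raise ValueError(f"axis {axis!r} has different entries")
        clashes = sorted(set(self.data) & set(storage.data))
        if clashes and not overwrite:
            # Assumption: refusing to overwrite mirrors set_item.
            raise KeyError(f"data already exists: {clashes}")
        self.axes.update(storage.axes)
        self.data.update(storage.data)


source = ToyStorage(name="source")
source.axes["cell"] = ("c1", "c2")
source.data["cell#age"] = (3, 5)

# The idiom: create a new empty storage, then fill it from another one.
target = ToyStorage(name="target")
target.update(source)
```

The values here are tuples, so sharing them between the two storages is safe; a real adapter would need to copy mutable data.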
- set_item(name: str, item: Any, *, overwrite: bool = False) -> None
Set a name 0D data item.
If overwrite, will silently overwrite an existing item of the same name, otherwise overwriting will fail.
- create_axis(axis: str, entries: Vector) -> None
Create a new axis and the unique entries identifying each entry along the axis.
The entries must be an is_optimal, is_frozen Vector and contain string data.
It is always an error to overwrite an existing axis.
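The constraints on a new axis (unique string entries, never overwritten) can be sketched with plain Python containers. This `create_axis` function is a hypothetical standalone helper and not the daf method; a tuple stands in for a frozen Vector, and the is_optimal requirement is not modelled.

```python
def create_axis(axes, axis, entries):
    """Add an axis to the axes dict. Hypothetical helper, not daf's code."""
    if axis in axes:
        # There is no overwrite flag: an existing axis is always an error.
        raise KeyError(f"axis {axis!r} already exists")
    entries = tuple(entries)  # Tuple as a stand-in for frozen data.
    if not all(isinstance(entry, str) for entry in entries):
        raise TypeError(f"axis {axis!r} entries must be strings")
    if len(set(entries)) != len(entries):
        raise ValueError(f"axis {axis!r} entries must be unique")
    axes[axis] = entries


axes = {}
create_axis(axes, "cell", ["c1", "c2", "c3"])
```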
- set_vector(name: str, vector: Vector, *, overwrite: bool = False) -> None
Set a name Vector data.
The name must be in the format axis#name which uniquely identifies the 1D data. The data must be is_frozen and is_optimal.
If overwrite, will silently overwrite an existing 1D data of the same name, otherwise overwriting will fail.
- set_matrix(name: str, matrix: Union[DenseInRows, SparseInRows], *, overwrite: bool = False) -> None
Set a name matrix.
The name must be in the format rows_axis,columns_axis#name which uniquely identifies the 2D data. The data must be an is_frozen, is_optimal MatrixInRows.
If overwrite, will silently overwrite an existing 2D data of the same name, otherwise overwriting will fail.