TODO

Todo

Is it possible to implement Significant more efficiently for sparse matrices (in pure Python)?

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/operations.py:docstring of daf.access.operations.Significant, line 22.)

Todo

Track which views/caches refer to each base storage and automatically invalidate any cached data on change, and provide delete operations? This would massively complicate the implementation…

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers, line 18.)
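As a rough illustration of the bookkeeping this entry describes, the sketch below tracks which caches depend on which base storage and clears them when that storage changes. The `CacheRegistry` class and its methods are hypothetical names invented for this example; they are not part of daf, and a real implementation would also have to handle views of views and cache lifetimes.

```python
from collections import defaultdict
from typing import Dict, List


class CacheRegistry:
    """Hypothetical registry of which caches derive data from which base storage."""

    def __init__(self) -> None:
        # Maps a base storage name to the caches holding data derived from it.
        self._dependents: Dict[str, List[dict]] = defaultdict(list)

    def register(self, base_name: str, cache: dict) -> None:
        """Record that ``cache`` holds data derived from the named base storage."""
        self._dependents[base_name].append(cache)

    def invalidate(self, base_name: str) -> int:
        """Clear every cache registered for the base storage; return how many."""
        caches = self._dependents.get(base_name, [])
        for cache in caches:
            cache.clear()
        return len(caches)


registry = CacheRegistry()
view_cache = {"cell#age": [1, 2, 3]}
other_cache = {"gene#name": ["a", "b"]}
registry.register("base", view_cache)
registry.register("other", other_cache)

# A change to "base" clears only the caches that depend on it.
cleared = registry.invalidate("base")
```

Even this toy version shows why the entry warns about complexity: every writer would need to call something like `invalidate` on every change.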

Todo

Currently the adapter data mapping is restricted to simple properties such as cell#age. Lift this restriction to allow for derived properties such as cell#batch#age.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.adapter, line 22.)

Todo

Provide a more efficient implementation of DafWriter._copy_back_data2d (used by DafWriter.adapter). The current implementation uses a few temporary buffers the size of the partial data. Implementing this in a C/C++ extension would avoid the temporary buffers, giving a significant performance boost for large data sizes. So far we have chosen to keep daf a pure Python package, so we suffer this inefficiency. Perhaps using numba would provide the efficiency while avoiding C/C++ extension code? Of course, this really should be part of numpy and/or scipy.sparse in the first place.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.adapter, line 74.)

Todo

If both the final and temporary storage are FilesWriter, avoid copying large 2D data files and instead directly move them from one directory to another.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.computation, line 64.)
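The move-instead-of-copy idea can be sketched with the standard library alone. The helper name `move_or_copy` and the directory layout below are assumptions made for illustration; they do not reflect how FilesWriter actually lays out its files.

```python
import os
import shutil
import tempfile


def move_or_copy(source_path: str, target_path: str) -> str:
    """Move a file, preferring a cheap rename over copying its bytes.

    Returns "renamed" or "copied" (hypothetical labels, for illustration only).
    """
    os.makedirs(os.path.dirname(target_path), exist_ok=True)
    try:
        # A rename does not touch the file contents when both paths are on
        # the same filesystem.
        os.replace(source_path, target_path)
        return "renamed"
    except OSError:
        # For example, the two directories are on different filesystems.
        shutil.move(source_path, target_path)
        return "copied"


with tempfile.TemporaryDirectory() as root:
    temporary_path = os.path.join(root, "temporary", "matrix.data")
    final_path = os.path.join(root, "final", "matrix.data")
    os.makedirs(os.path.dirname(temporary_path))
    with open(temporary_path, "wb") as file:
        file.write(b"\0" * 1024)

    how = move_or_copy(temporary_path, final_path)
    moved = os.path.exists(final_path) and not os.path.exists(temporary_path)
```

The fallback matters because a temporary directory is often on a different filesystem than the final one, in which case a rename is not possible.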

Todo

Optimize aggregate_group_data2d to avoid creating a temporary dense matrix per group for sparse data, and/or to parallelize the operation in general.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/groups.py:docstring of daf.groups.aggregate_group_data2d, line 17.)
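For the special case of summation, one way to avoid a dense temporary per group is to multiply by a sparse group indicator matrix. This is only a sketch: the function name `sum_rows_per_group` is hypothetical, it assumes negative group indices mark ungrouped rows, and it does not cover arbitrary aggregation functions.

```python
import numpy as np
import scipy.sparse as sp


def sum_rows_per_group(
    data: sp.csr_matrix, group_of_row: np.ndarray, groups_count: int
) -> sp.csr_matrix:
    """Sum the rows of each group without densifying any of the data.

    Assumes ``group_of_row[i]`` is the group index of row ``i``, with a
    negative value meaning the row belongs to no group.
    """
    rows_count = data.shape[0]
    grouped = group_of_row >= 0
    # indicator[g, i] == 1 if row i belongs to group g, so that
    # (indicator @ data)[g, :] is the sum of the rows of group g.
    indicator = sp.csr_matrix(
        (
            np.ones(grouped.sum(), dtype=data.dtype),
            (group_of_row[grouped], np.flatnonzero(grouped)),
        ),
        shape=(groups_count, rows_count),
    )
    return indicator @ data


data = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [4, 0, 0], [0, 0, 5]]))
group_of_row = np.array([0, 1, 0, -1])  # The last row is ungrouped.
sums = sum_rows_per_group(data, group_of_row, groups_count=2)
```

The result stays sparse, and the product is computed in a single call rather than one group at a time.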

Todo

Provide a ConcatStorage that allows concatenating two data sets along a single axis, reusing all the other axes (e.g., concatenating two data sets for distinct cells using identical genes into a single data set containing both sets of cells).

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/storage/__init__.py:docstring of daf.storage, line 15.)

Todo

The StorageWriter interface needs to be extended to allow for deleting data (dangerous as this may be). Currently the only supported way is to create a StorageView that hides some data, saving that into a new storage, and removing the old storage, which is “unreasonable” even though this is a very rare operation.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/storage/interface.py:docstring of daf.storage.interface.StorageWriter, line 10.)

Todo

Extend daf to support arrow as well.

Pandas 2.0 was extended to support it as a backend in addition to numpy. This will make our life even more “interesting” as we’ll learn by trial and error which seemingly “uniform” APIs behave subtly differently for this new 1D/2D data implementation.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/__init__.py:docstring of daf.typing, line 23.)

Todo

Extend daf to support “masked arrays” for storing nullable integers and nullable Booleans?

This will require accompanying each nullable array with a Boolean mask of valid elements. It will also very likely require the users to support additional code paths.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/__init__.py:docstring of daf.typing, line 31.)
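A small sketch of what pairing a nullable array with a Boolean mask of valid elements looks like in plain numpy, and why it forces extra code paths. The variable names are illustrative only. Note that `numpy.ma` uses the opposite convention, where `True` in the mask marks an invalid element.

```python
import numpy as np

# A nullable integer array stored as values plus a mask of valid elements.
values = np.array([3, 0, 7, 0], dtype="int32")  # 0 is a placeholder where invalid.
valid = np.array([True, False, True, False])

# Code that ignores the mask silently includes the placeholder values.
naive_mean = values.mean()

# Code that is aware of the mask must select the valid elements first.
valid_mean = values[valid].mean()

# numpy.ma expresses the same thing, with the mask marking the invalid elements.
masked = np.ma.masked_array(values, mask=~valid)
masked_mean = masked.mean()
```

The difference between `naive_mean` and `valid_mean` is the kind of discrepancy users would need to guard against in every computation over nullable data.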

Todo

If/when pandas provides some form of type annotations, get rid of the fake_pandas module.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/fake_pandas.py:docstring of daf.typing.fake_pandas, line 7.)

Todo

If/when scipy.sparse provides some form of type annotations, get rid of the fake_sparse module.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/fake_sparse.py:docstring of daf.typing.fake_sparse, line 7.)

Todo

If/when https://github.com/numpy/numpy/issues/21655 is implemented, change the as_layout implementation to use it.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/layouts.py:docstring of daf.typing.layouts.as_layout, line 10.)