TODO
Todo
Is it possible to implement Significant more efficiently for sparse matrices (in pure Python)?
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/operations.py:docstring of daf.access.operations.Significant, line 22.)
Todo
Track which views/caches refer to each base storage and automatically invalidate any cached data on change, and provide delete operations? This would massively complicate the implementation…
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers, line 18.)
Todo
Currently the adapter data mapping is restricted to simple properties such as cell#age. Lift this restriction to allow for derived properties such as cell#batch#age.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.adapter, line 22.)
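To illustrate what resolving a derived property involves, here is a minimal sketch over plain dictionaries. The AXES and DATA1D layout and the resolve_property helper are hypothetical and are not part of daf. The sketch assumes that each intermediate property (here batch) holds entry names of the axis with the same name.

```python
from typing import Dict, List

# Hypothetical in-memory layout: axis name -> ordered entry names.
AXES: Dict[str, List[str]] = {
    "cell": ["c0", "c1", "c2"],
    "batch": ["b0", "b1"],
}

# Hypothetical 1D data: "axis#property" -> one value per axis entry.
DATA1D: Dict[str, list] = {
    "cell#batch": ["b1", "b0", "b1"],
    "batch#age": [30, 45],
}


def resolve_property(name: str) -> list:
    """Resolve ``axis#prop`` or a chained ``axis#prop#prop...`` name."""
    axis, *properties = name.split("#")
    if not properties:
        raise ValueError(f"no property in: {name}")
    values = list(DATA1D[f"{axis}#{properties[0]}"])
    current = properties[0]
    for next_property in properties[1:]:
        # Assumption: the current values are entry names of the axis
        # that has the same name as the current property (e.g. "batch").
        index_of = {entry: i for i, entry in enumerate(AXES[current])}
        looked_up = DATA1D[f"{current}#{next_property}"]
        values = [looked_up[index_of[value]] for value in values]
        current = next_property
    return values


print(resolve_property("cell#batch#age"))  # [45, 30, 45]
```

A simple name such as batch#age takes the first branch only, so the same helper covers both the currently supported case and the derived one.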
Todo
Provide a more efficient implementation of DafWriter._copy_back_data2d (used by DafWriter.adapter).
The current implementation uses a few temporary buffers the size of the partial data. If this were implemented in a C/C++ extension, it would avoid the temporary buffers, giving a significant performance boost for large data sizes. So far we have chosen to keep daf as a pure Python package, so we suffer this inefficiency. Perhaps using numba would provide the efficiency while avoiding C/C++ extension code? Of course, this really should be a part of numpy and/or scipy.sparse in the first place.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.adapter, line 74.)
Todo
If both the final and temporary storage are FilesWriter, avoid copying large 2D data files and instead directly move them from one directory to another.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/access/writers.py:docstring of daf.access.writers.DafWriter.computation, line 64.)
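A minimal sketch of the move-instead-of-copy idea, using only the standard library. The transfer_file helper is hypothetical and says nothing about how FilesWriter lays out its files on disk.

```python
import os
import shutil
from pathlib import Path


def transfer_file(source: Path, target: Path) -> None:
    """Move ``source`` to ``target``, avoiding a copy when possible."""
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        # A rename is cheap when both paths are on the same filesystem.
        os.replace(source, target)
    except OSError:
        # Fall back to copy-and-delete, e.g. across filesystems.
        shutil.move(str(source), str(target))
```

The rename only saves work when the temporary and final directories share a filesystem; otherwise the data is still copied once.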
Todo
Optimize aggregate_group_data2d to avoid creating a temporary dense matrix per group for sparse data, and/or to parallelize the operation in general.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/groups.py:docstring of daf.groups.aggregate_group_data2d, line 17.)
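One possible direction, sketched here for the special case of summing rows per group: multiply by a sparse group indicator matrix so that no group is ever densified. The sum_groups_sparse helper and its parameters are hypothetical, it is not the aggregate_group_data2d API, and it does not cover arbitrary aggregation functions. Treating a negative group index as "no group" is also an assumption.

```python
import numpy as np
import scipy.sparse as sp


def sum_groups_sparse(
    data: sp.csr_matrix, group_of_row: np.ndarray, groups_count: int
) -> sp.csr_matrix:
    """Sum the rows of ``data`` per group without densifying any group."""
    rows_count = data.shape[0]
    # Assumption: a negative group index means the row is in no group.
    grouped_rows = np.flatnonzero(group_of_row >= 0)
    indicator = sp.csr_matrix(
        (
            np.ones(len(grouped_rows), dtype=data.dtype),
            (group_of_row[grouped_rows], grouped_rows),
        ),
        shape=(groups_count, rows_count),
    )
    # (groups x rows) @ (rows x columns) -> (groups x columns), all sparse.
    return sp.csr_matrix(indicator @ data)
```

This works because a sum is linear; aggregations such as a median would still need a per-group pass over the data.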
Todo
Provide a ConcatStorage that allows concatenating two data sets along a single axis, reusing all the other axes (e.g., concatenating two data sets for distinct cells using identical genes into a single data set containing both sets of cells).
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/storage/__init__.py:docstring of daf.storage, line 15.)
Todo
The StorageWriter interface needs to be extended to allow for deleting data (dangerous as this may be).
Currently the only supported way is to create a StorageView that hides some data, save that into a new storage, and remove the old storage, which is “unreasonable” even though this is a very rare operation.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/storage/interface.py:docstring of daf.storage.interface.StorageWriter, line 10.)
Todo
Extend daf to support arrow as well.
Pandas 2.0 was extended to support it as a backend in addition to numpy. This will make our life even more “interesting” as we’ll learn by trial and error which seemingly “uniform” APIs behave subtly differently for this new 1D/2D data implementation.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/__init__.py:docstring of daf.typing, line 23.)
Todo
Extend daf to support “masked arrays” for storing nullable integers and nullable Booleans?
This will require accompanying each nullable array with a Boolean mask of valid elements. It will also very likely require the users to support additional code paths.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/__init__.py:docstring of daf.typing, line 31.)
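A small sketch of what pairing a nullable array with a Boolean mask of valid elements could look like, and why consumers would need an extra code path. The array names and values are made up for illustration; this is not how daf stores data.

```python
import numpy as np

# Values with a parallel Boolean mask; True marks a valid element.
ages = np.array([30, 0, 45, 0], dtype="int32")
is_valid = np.array([True, False, True, False])

# Consumers need a mask-aware code path: a plain mean would count the
# placeholder zeros as real data.
naive_mean = ages.mean()            # 18.75
valid_mean = ages[is_valid].mean()  # 37.5

# numpy.ma bundles the same pair; note its mask marks *invalid* elements.
masked_ages = np.ma.masked_array(ages, mask=~is_valid)
print(naive_mean, valid_mean, masked_ages.mean())  # 18.75 37.5 37.5
```

The inverted mask convention of numpy.ma is one example of the kind of detail each additional code path would have to get right.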
Todo
If/when pandas provides some form of type annotations, get rid of the fake_pandas module.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/fake_pandas.py:docstring of daf.typing.fake_pandas, line 7.)
Todo
If/when scipy.sparse provides some form of type annotations, get rid of the fake_sparse module.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/fake_sparse.py:docstring of daf.typing.fake_sparse, line 7.)
Todo
If/when https://github.com/numpy/numpy/issues/21655 is implemented, change the as_layout implementation to use it.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/daf/checkouts/latest/daf/typing/layouts.py:docstring of daf.typing.layouts.as_layout, line 10.)