daf.access.readers
Read-only interface for daf data sets.
In daf, data access uses a string name in the format described below. Even though each name uniquely identifies whether the data is 0D, 1D or 2D, there are separate functions for accessing the data based on its dimension. This both makes the code more readable and allows mypy to provide some semblance of effective type checking (if you choose to use it).
Note
To avoid ambiguities and to ensure that storing daf data in files works as expected, do not use ``,``, ``#``, ``=`` or ``|`` characters in axis, property or entry names. In addition, since axis and property names are used as part of file names in certain storage formats, also avoid characters that are invalid in file names, most importantly ``/``, but also ``"``, ``:`` and ``\``. If you want to be friendly to interactive shell usage, try to avoid characters used by the shell such as ``'``, ``"``, ``*``, ``?``, ``&``, ``$`` and ``;``, even though these can be quoted.
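The restrictions in the note above can be checked mechanically. The following is a minimal sketch; the helper discouraged_characters and its parameters are hypothetical and are not part of daf.

```python
# Hypothetical helper (not part of daf): report the characters that the note
# above recommends avoiding in a name.
SYNTAX_CHARACTERS = set(",#=|")      # used by the data name syntax itself
FILE_CHARACTERS = set('/":\\')       # invalid in file names on some systems
SHELL_CHARACTERS = set("'\"*?&$;")   # legal, but awkward in interactive shells


def discouraged_characters(name, *, kind="property", shell_friendly=False):
    """Return the sorted list of discouraged characters found in ``name``.

    ``kind`` is one of "axis", "property" or "entry". Only axis and property
    names are used as parts of file names, so the file name restrictions are
    not applied to entry names.
    """
    forbidden = set(SYNTAX_CHARACTERS)
    if kind != "entry":
        forbidden |= FILE_CHARACTERS
    if shell_friendly:
        forbidden |= SHELL_CHARACTERS
    return sorted(set(name) & forbidden)


print(discouraged_characters("cell_type", kind="axis"))
print(discouraged_characters("Forebrain/Midbrain/Hindbrain", kind="entry"))
print(discouraged_characters("umap/x"))
```

An empty list means the name follows the recommendations. Entry names such as Forebrain/Midbrain/Hindbrain pass because the file name restriction applies only to axis and property names.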
The following data is used in all the examples below:
>>> import daf
>>> data = daf.DafReader(daf.FilesReader(daf.DAF_EXAMPLE_PATH), name="example")
2D Names
All 2D names start with rows_axis,columns_axis#.
The name of a property with a value for each combination of entries of two axes, optionally processed by a series of ElementWise operations. For example:
>>> data.get_matrix("metacell,gene#UMIs")
array([[ 5., 6., 66., 1., 1., 1., 0., 110., 13., 1.],
[ 13., 2., 1., 2., 1., 3., 2., 3., 7., 1.],
[211., 1., 2., 0., 91., 0., 0., 1., 2., 4.],
[ 1., 0., 179., 1., 0., 2., 0., 9., 1., 2.],
[ 3., 0., 2., 18., 1., 1., 1., 1., 126., 1.],
[ 14., 0., 1., 1., 2., 10., 3., 3., 6., 6.],
[ 3., 2., 0., 0., 1., 2., 0., 2., 2., 3.],
[ 0., 1., 0., 0., 0., 1., 1., 0., 5., 1.],
[ 62., 0., 0., 0., 2., 0., 2., 0., 1., 0.],
[326., 0., 0., 0., 151., 0., 0., 1., 0., 2.]],
dtype=float32)
>>> data.get_matrix("metacell,gene#UMIs|Fraction|Log,base=2,factor=1e-1|Abs")
array([[3.0056686 , 2.9499593 , 1.239466 , 3.2528863 , 3.2528863 ,
3.2528863 , 3.321928 , 0.64562523, 2.610649 , 3.2528863 ],
[1.0848889 , 2.6698513 , 2.959358 , 2.6698513 , 2.959358 ,
2.4288433 , 2.6698513 , 2.4288433 , 1.7369655 , 2.959358 ],
[0.36534712, 3.2764134 , 3.2322907 , 3.321928 , 1.3523018 ,
3.321928 , 3.321928 , 3.2764134 , 3.2322907 , 3.1478987 ],
[3.2497783 , 3.321928 , 0.02566493, 3.2497783 , 3.321928 ,
3.1810656 , 3.321928 , 2.7744403 , 3.2497783 , 3.1810656 ],
[3.0651526 , 3.321928 , 3.1457713 , 2.2050104 , 3.2311625 ,
3.2311625 , 3.2311625 , 3.2311625 , 0.1231482 , 3.2311625 ],
[1.3063312 , 3.321928 , 3.038135 , 3.038135 , 2.801096 ,
1.6556655 , 2.5975626 , 2.5975626 , 2.1175697 , 2.1175697 ],
[1.7369655 , 2.0995355 , 3.321928 , 3.321928 , 2.5849624 ,
2.0995355 , 3.321928 , 2.0995355 , 2.0995355 , 1.7369655 ],
[3.321928 , 2.2439256 , 3.321928 , 3.321928 , 3.321928 ,
2.2439256 , 2.2439256 , 3.321928 , 0.6092099 , 2.2439256 ],
[0.03614896, 3.321928 , 3.321928 , 3.321928 , 2.9450738 ,
3.321928 , 2.9450738 , 3.321928 , 3.1212306 , 3.321928 ],
[0.35999608, 3.321928 , 3.321928 , 3.321928 , 1.2702659 ,
3.321928 , 3.321928 , 3.2921808 , 3.321928 , 3.2630343 ]],
dtype=float32)
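To make the pipeline in the second example concrete, the sketch below recomputes the first output row in plain Python. The meaning given to each operation here (Fraction divides by the row total, Log adds the factor before taking the logarithm) is an assumption inferred from the printed values, not taken from the definition of the operations.

```python
import math

# First row of the metacell,gene#UMIs matrix shown above.
umis = [5.0, 6.0, 66.0, 1.0, 1.0, 1.0, 0.0, 110.0, 13.0, 1.0]

# Assumed meaning of Fraction: divide each value by the total of its row.
total = sum(umis)
fractions = [value / total for value in umis]

# Assumed meaning of Log,base=2,factor=1e-1: add the factor, then take log2.
logged = [math.log2(fraction + 1e-1) for fraction in fractions]

# Abs: the absolute value of each element.
result = [abs(value) for value in logged]

print([round(value, 4) for value in result])
```

The values agree with the first row printed above up to float32 rounding, which supports (but does not prove) the assumed semantics.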
1D Names
All 1D names start with axis#.
- axis#
The name of the entries of the axis. That is, get_vector("axis#") is the same as axis_entries("axis"). For example:
>>> data.get_vector("cell_type#")
array(['Amnion', 'Forebrain/Midbrain/Hindbrain', 'Neural tube Posterior',
'Presomitic mesoderm', 'Surface ectoderm', 'caudal mesoderm',
'epiblast'], dtype=object)
>>> data.axis_entries("cell_type")
array(['Amnion', 'Forebrain/Midbrain/Hindbrain', 'Neural tube Posterior',
'Presomitic mesoderm', 'Surface ectoderm', 'caudal mesoderm',
'epiblast'], dtype=object)
The name of a property with a value per entry along some axis, optionally processed by a series of ElementWise operations. For example:
>>> data.get_vector("batch#age")
array([51, 38, 21, 31, 26, 43, 36, 27, 33, 45, 49, 41, 45])
>>> data.get_vector("batch#age|Clip,min=30,max=45")
array([45, 38, 30, 31, 30, 43, 36, 30, 33, 45, 45, 41, 45])
The name of properties which are indices or entry names of some axes, followed by the name of a property of the final axis, optionally processed by a series of ElementWise operations. For example:
>>> data.get_vector("metacell#cell_type#color")
array(['#f7f79e', '#CDE089', '#1a3f52', '#f7f79e', '#cc7818', '#647A4F',
'#635547', '#635547', '#A8DBF7', '#1a3f52'], dtype=object)
A property can refer to an axis either by using its exact name as above or adding some qualifier using ``.``. For
example, if we had a ``metacell#cell_type.projected`` property containing the cell type obtained by projecting the
data on an atlas, we could write ``metacell#cell_type.projected#color`` to access the color of the projected cell type
of each metacell, using the ``cell_type#color`` property.
The slice for a specific entry of the data of a 2D property, optionally processed by a series of ElementWise operations. For example:
>>> data.get_vector("metacell#gene=FOXA1,UMIs")
array([6., 2., 1., 0., 0., 0., 2., 1., 0., 0.], dtype=float32)
>>> data.get_vector("metacell#gene=FOXA1,UMIs|Clip,min=1,max=4")
array([4., 2., 1., 1., 1., 1., 2., 1., 1., 1.], dtype=float32)
A reduction of 2D data into a single value per row, optionally processed by a series of ElementWise operations. For example:
>>> data.get_vector("metacell#gene,UMIs|Sum")
array([204., 35., 312., 195., 154., 46., 15., 9., 67., 480.],
dtype=float32)
>>> data.get_vector("metacell#gene,UMIs|Fraction|Log,base=2,factor=1e-5|Max|Clip,min=-1.5,max=-0.5")
array([-0.89103884, -1.4288044 , -0.5642817 , -0.5 , -0.5 ,
-1.5 , -1.5 , -0.84797084, -0.5 , -0.5581411 ],
dtype=float32)
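The difference between an element-wise operation and a reduction in the 1D examples above can be illustrated in plain Python. This is only a sketch of the arithmetic; the functions clip and sum_per_row are hypothetical stand-ins, not the daf implementation.

```python
# Values copied from the examples above.
ages = [51, 38, 21, 31, 26, 43, 36, 27, 33, 45, 49, 41, 45]
umis_rows = [
    [5.0, 6.0, 66.0, 1.0, 1.0, 1.0, 0.0, 110.0, 13.0, 1.0],  # first metacell
    [13.0, 2.0, 1.0, 2.0, 1.0, 3.0, 2.0, 3.0, 7.0, 1.0],     # second metacell
]


def clip(values, minimum, maximum):
    """Element-wise: limit every value to the [minimum, maximum] range."""
    return [min(max(value, minimum), maximum) for value in values]


def sum_per_row(rows):
    """Reduction: collapse each row of 2D data into a single value."""
    return [sum(row) for row in rows]


# Mirrors batch#age|Clip,min=30,max=45: same length as the input.
clipped_ages = clip(ages, 30, 45)

# Mirrors metacell#gene,UMIs|Sum for the first two rows: one value per row.
row_totals = sum_per_row(umis_rows)

print(clipped_ages)
print(row_totals)
```

An element-wise operation keeps the shape of its input, while a reduction removes one dimension, which is why the reduced name starts with metacell# rather than metacell,gene#.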
0D Names
No 0D names contain # (at least not before the first |).
- property
The name of a 0D data item property. For example:
>>> data.get_item("created")
datetime.datetime(2022, 7, 6, 16, 49, 44)
- axis=entry,property
The value for a specific entry of the data of a 1D property. For example:
>>> data.get_item("batch=Batch_1,age")
38
- axis=entry,second_axis=second_entry,property
The value for a specific entry of the data of a 2D property. For example:
>>> data.get_item("metacell=Metacell_1,gene=FOXA1,UMIs")
2.0
- axis,property
A reduction into a single value of a 1D property with a value per entry along some axis, optionally processed by a series of ElementWise operations. For example:
>>> data.get_item("batch,age|Mean")
37.38461538461539
>>> data.get_item("batch,age|Clip,min=30,max=40|Mean")
36.0
- axis,second_axis=entry,property
A reduction into a single value of a slice for a specific entry of the data of a 2D property, optionally processed by a series of ElementWise operations. For example:
>>> data.get_item("metacell,gene=FOXA1,UMIs|Max")
6.0
>>> data.get_item("metacell,gene=FOXA1,UMIs|Clip,min=1,max=3|Mean")
1.4
- axis,second_axis,property
A reduction of 2D data into a single value per row and then to a single value, optionally processed by a series of ElementWise operations. For example:
>>> data.get_item("metacell,gene,UMIs|Sum|Max")
480.0
>>> data.get_item("metacell,gene,UMIs|Fraction|Log,base=2,factor=1e-5|Max|Clip,min=-1.5,max=-0.5|Mean")
-0.8790237
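The prefix rules of the three sections above are enough to tell the dimension of a name from the string alone. A minimal sketch follows; name_dimensions is a hypothetical helper, not a daf function, and it assumes the name is well formed.

```python
def name_dimensions(name: str) -> int:
    """Return 0, 1 or 2 according to the prefix rules described above."""
    # Operations follow the first "|", so they never affect the shape prefix.
    head = name.split("|", 1)[0]
    if "#" not in head:
        return 0  # No "#" before the first "|": a 0D name.
    axes = head.split("#", 1)[0]
    # "rows_axis,columns_axis#" has a comma before the first "#"; "axis#" has none.
    return 2 if "," in axes else 1


for example in ["created", "batch,age|Mean", "batch#age", "metacell#gene,UMIs|Sum", "metacell,gene#UMIs"]:
    print(example, name_dimensions(example))
```

Such a check could be used to pick between get_item, get_vector and get_matrix when the name comes from user input.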
Note
See operations for the list of built-in ElementWise and Reduction operations. Additional operations can be offered by other Python packages. In all the above, prefixing an operation name with ! prevents its result from being cached. For example, cell#gene,UMIs|!Sum will not cache the total number of UMIs per cell.
The current implementation doesn’t cache any 0D data regardless of whether a ! was specified.
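The textual structure of a pipeline, including the ! prefix, can be sketched as follows. The Operation type and split_pipeline function are hypothetical and only illustrate the syntax; daf's own parser may differ, and parameter values are kept as strings here.

```python
from typing import Dict, List, NamedTuple, Tuple


class Operation(NamedTuple):
    name: str                   # operation name without the "!" prefix
    parameters: Dict[str, str]  # the key=value parameters, as strings
    cache: bool                 # False if the operation was prefixed with "!"


def split_pipeline(full_name: str) -> Tuple[str, List[Operation]]:
    """Split a data name into its base name and its list of operations."""
    base, *steps = full_name.split("|")
    operations = []
    for step in steps:
        op_name, *assignments = step.split(",")
        cache = not op_name.startswith("!")  # a leading "!" disables caching
        parameters = dict(assignment.split("=", 1) for assignment in assignments)
        operations.append(Operation(op_name.lstrip("!"), parameters, cache))
    return base, operations


print(split_pipeline("cell#gene,UMIs|!Sum"))
print(split_pipeline("metacell,gene#UMIs|Fraction|Log,base=2,factor=1e-1|Abs"))
```

Only the text after the first | is treated as operations, so the commas inside the base name (as in cell#gene,UMIs) are left untouched.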
Motivation
The above scheme makes sense if you consider that each name starts with a description of the axes/shape of the result, followed by how to extract the result from the data set. This means that to get the sum of the UMIs of all the genes for each cell, we first consider that this is per-cell 1D data and therefore must start with cell#. We therefore write cell#gene,UMIs|Sum instead of cell,gene#UMIs|Sum.
This may seem unintuitive at first, but it has some advantages, such as clearly identifying the axes/shape of the result of a pipeline. An important feature of the scheme is that the name of any 1D data along some axis has the common prefix axis#. This makes it easy to express data for get_columns, or describe the X and Y coordinates of a scatter plot, or anything along these lines, by providing the common axis and a list of suffixes to append to it.
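The common-prefix property described above means that full 1D names can be assembled from an axis and a list of suffixes. A minimal sketch, using a hypothetical helper names_along_axis that is not part of daf:

```python
from typing import List, Sequence


def names_along_axis(axis: str, suffixes: Sequence[str]) -> List[str]:
    """Prefix every suffix with the common ``axis#`` to get full 1D names."""
    return [f"{axis}#{suffix}" for suffix in suffixes]


# X and Y coordinates of a scatter plot of the metacells.
scatter_names = names_along_axis("metacell", ["umap_x", "umap_y"])

# Suffixes may themselves be pipelines or chained properties.
other_names = names_along_axis("metacell", ["gene,UMIs|Sum", "cell_type#color"])

print(scatter_names)
print(other_names)
```

Each resulting name describes one value per metacell, so all of them can be combined into columns of a single table.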
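Classes:
- DafReader(base, *[, derived, name]): Read-only access to a daf data set.
Functions:
- transpose_name(name): Given a 2D data name rows_axis,columns_axis#name return the transposed data name columns_axis,rows_axis#name.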
- class daf.access.readers.DafReader(base: StorageReader, *, derived: Optional[StorageWriter] = None, name: str = '.daf#')
Bases: object
Read-only access to a daf data set.
The following data is used in all the examples below:
>>> import daf
>>> import yaml
>>> data = daf.DafReader(daf.FilesReader(daf.DAF_EXAMPLE_PATH), name="example")
If the name starts with ., it is appended to the base name. If the name ends with #, we append the object id to it to make it unique.
Attributes:
- name: The name of the data set for messages.
- base: The storage the daf data set is based on.
- derived: How to store derived data computed from the storage data, for example, an alternate layout of 2D data, or the result of a pipeline (e.g. cell,gene#UMIs|Sum).
- chain: A StorageChain to use to actually access the data.
Methods:
- as_reader(): Return the data set as a DafReader.
- description(*[, detail, deep, description]): Return a dictionary describing the daf data set, useful for debugging.
- verify_has(names, *[, reason]): Assert that all the listed data names exist in the data set, regardless of whether each is a 0D, 1D or 2D data name.
- has_data(name): Return whether the data set contains the name data, regardless of whether it is 0D, 1D or 2D data.
- item_names(): Return the list of names of the 0D data items that exist in the data set, in alphabetical order.
- has_item(name): Check whether the name 0D data item exists in the data set.
- get_item(name, *[, default]): Access a 0D data item from the data set by its name.
- axis_names(): Return the list of names of the axes that exist in the data set, in alphabetical order.
- has_axis(axis): Check whether the axis exists in the data set.
- axis_size(axis): Get the number of entries along some axis (which must exist).
- axis_entries(axis): Get the unique name of each entry in the data set along some axis (which must exist).
- axis_indices(axis): Return a mapping from the axis string entries to the integer indices.
- axis_index(axis, entry): Return the index of the entry (which must exist) in the entries of the axis (which must exist).
- data1d_names(axis, *[, full]): Return the names of the 1D data that exists in the data set for a specific axis (which must exist), in alphabetical order.
- has_data1d(name): Check whether the name 1D data exists.
- get_vector(name, *[, default]): Get the name 1D data as a Vector.
- get_series(name, *[, default]): Get the name 1D data as a pandas.Series.
- data2d_names(axes, *[, full]): Return the names of the 2D data that exists in the data set for a specific pair of axes (which must exist).
- has_data2d(name): Check whether the name 2D data exists.
- get_matrix(name, *[, default]): Get the name 2D data (which must exist) as a MatrixInRows.
- get_frame(name, *[, default]): Get the name 2D data (which must exist) as a pandas.DataFrame.
- get_columns(axis[, columns, defaults]): Get an arbitrary collection of 1D data for the same axis as columns of a pandas.DataFrame.
- view(*[, axes, data, name, cache, hide_implicit]): Create a read-only view of the data set.
- name
The name of the data set for messages.
- base
The storage the daf data set is based on.
- derived
How to store derived data computed from the storage data, for example, an alternate layout of 2D data, or the result of a pipeline (e.g. cell,gene#UMIs|Sum). By default this is stored in a MemoryStorage so expensive operations (such as as_layout) will only be computed once in the application’s lifetime. You can explicitly set this to NO_STORAGE to disable the caching, or specify some persistent storage such as FilesWriter to allow the caching to be reused across multiple application invocations. You can even set this to be the same as the base storage to have everything (base and derived data) be stored in the same place.
- chain
A StorageChain to use to actually access the data. This looks first in derived and then in the base.
- as_reader() -> DafReader
Return the data set as a DafReader.
This is a no-op (returns self) for “real” read-only data sets, but for writable data sets, it returns a “real” read-only wrapper object (that does not implement the writing methods). This ensures that the result can’t be used to modify the data if passed by mistake to a function that takes a DafWriter.
- description(*, detail: bool = False, deep: bool = False, description: Optional[Dict] = None) -> Dict
Return a dictionary describing the daf data set, useful for debugging.
The result uses the name field as a key, with a nested dictionary value with the keys class, axes, and data.
If not detail, the axes will contain a dictionary mapping each axis to a description of its size, and the data will contain just a list of the data names, except for StorageView where it will be a dictionary mapping each exposed name to the base name.
If detail, both the axes and the data will contain a mapping providing additional data_description of the relevant data.
If deep, there may be additional keys describing the internal storage.
If description is provided, collect the result into it. This allows collecting multiple data set descriptions into a single overall system state description.
For example:
>>> print(yaml.dump(data.description()).strip())
example:
  axes:
    batch: 13 entries
    cell: 524 entries
    cell_type: 7 entries
    gene: 10 entries
    metacell: 10 entries
    sex: 2 entries
  class: daf.access.readers.DafReader
  data:
  - created
  - batch#age
  - batch#sex
  - cell#batch
  - cell#metacell
  - cell_type#color
  - gene#forbidden
  - gene#marker
  - metacell#cell_type
  - metacell#umap_x
  - metacell#umap_y
  - cell,gene#UMIs
  - metacell,gene#UMIs
- verify_has(names: Union[str, Collection[str]], *, reason: str = 'required') -> None
Assert that all the listed data names exist in the data set, regardless of whether each is a 0D, 1D or 2D data name.
To verify an axis exists, list it as axis#.
For example:
>>> data.verify_has("cell#")
>>> data.verify_has(["metacell,gene#UMIs", "batch#age"])
>>> data.verify_has(["cell#color"])
Traceback (most recent call last):
...
AssertionError: missing the data: cell#color which is required in the data set: example
- has_data(name: str) -> bool
Return whether the data set contains the name data, regardless of whether it is 0D, 1D or 2D data.
To test whether an axis exists, you can use the axis# name.
For example:
>>> data.has_data("cell#")
True
>>> data.has_data("cell,gene#fraction")
False
- item_names() -> List[str]
Return the list of names of the 0D data items that exist in the data set, in alphabetical order.
For example:
>>> data.item_names()
['created']
- has_item(name: str) -> bool
Check whether the name 0D data item exists in the data set.
For example:
>>> data.has_item("created")
True
>>> data.has_item("modified")
False
- get_item(name: str, *, default: Any = <object object>) -> Any
Access a 0D data item from the data set by its name.
Normally, requesting missing data results in an error. If default is specified, it is returned instead.
The name is the name of some 0D data as described above.
For example:
>>> data.get_item("created")
datetime.datetime(2022, 7, 6, 16, 49, 44)
- axis_names() -> List[str]
Return the list of names of the axes that exist in the data set, in alphabetical order.
For example:
>>> data.axis_names()
['batch', 'cell', 'cell_type', 'gene', 'metacell', 'sex']
- has_axis(axis: str) -> bool
Check whether the axis exists in the data set.
For example:
>>> data.has_axis("cell")
True
>>> data.has_axis("height")
False
- axis_size(axis: str) -> int
Get the number of entries along some axis (which must exist).
For example:
>>> data.axis_size("metacell")
10
- axis_entries(axis: str) -> Vector
Get the unique name of each entry in the data set along some axis (which must exist).
Note
You can also get the axis entries using get_vector by passing it the 1D data name axis#.
For example:
>>> data.axis_entries("gene")
array(['RSPO3', 'FOXA1', 'WNT6', 'TNNI1', 'MSGN1', 'LMO2', 'SFRP5', 'DLX5',
'ITGA4', 'FOXA2'], dtype=object)
>>> data.get_vector("gene#")
array(['RSPO3', 'FOXA1', 'WNT6', 'TNNI1', 'MSGN1', 'LMO2', 'SFRP5', 'DLX5',
'ITGA4', 'FOXA2'], dtype=object)
- axis_indices(axis: str) -> Mapping[str, int]
Return a mapping from the axis string entries to the integer indices.
For example:
>>> print(yaml.dump(data.axis_indices("gene")).strip())
DLX5: 7
FOXA1: 1
FOXA2: 9
ITGA4: 8
LMO2: 5
MSGN1: 4
RSPO3: 0
SFRP5: 6
TNNI1: 3
WNT6: 2
- axis_index(axis: str, entry: str) -> int
Return the index of the entry (which must exist) in the entries of the axis (which must exist).
For example:
>>> data.axis_index("gene", "FOXA2")
9
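The relation between axis_entries, axis_indices and axis_index can be sketched with a plain dictionary. This only illustrates the mapping; it makes no claim about how daf builds or caches it.

```python
# Entries of the gene axis, in the order shown by axis_entries("gene") above.
gene_entries = ["RSPO3", "FOXA1", "WNT6", "TNNI1", "MSGN1", "LMO2", "SFRP5", "DLX5", "ITGA4", "FOXA2"]

# The equivalent of axis_indices: map each entry name to its position.
gene_indices = {entry: index for index, entry in enumerate(gene_entries)}

# The equivalent of axis_index: a single lookup in that mapping.
foxa2_index = gene_indices["FOXA2"]

print(foxa2_index)
print(gene_entries[foxa2_index])
```

Because entry names are unique along an axis, the mapping is invertible: indexing the entries with the result returns the original name.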
- data1d_names(axis: str, *, full: bool = True) -> List[str]
Return the names of the 1D data that exists in the data set for a specific axis (which must exist), in alphabetical order.
The returned names are in the format axis#name which uniquely identifies the 1D data. If not full, the returned names include only the simple name without the axis# prefix.
For example:
>>> data.data1d_names("batch")
['batch#age', 'batch#sex']
>>> data.data1d_names("batch", full=False)
['age', 'sex']
- has_data1d(name: str) -> bool
Check whether the name 1D data exists.
The name must be in the format axis#name which uniquely identifies the 1D data.
For example:
>>> data.has_data1d("batch#age")
True
>>> data.has_data1d("batch#height")
False
- get_vector(name: str, *, default: Optional[Tuple[Any, Union[str, dtype]]] = None) -> Vector
Get the name 1D data as a Vector.
Normally, requesting missing data results in an error. If default is specified, a vector containing the specified value and data type is returned.
The name is the name of some 1D data as described above.
For example:
>>> data.get_vector("batch#age")
array([51, 38, 21, 31, 26, 43, 36, 27, 33, 45, 49, 41, 45])
- get_series(name: str, *, default: Optional[Tuple[Any, Union[str, dtype]]] = None) -> Series
Get the name 1D data as a pandas.Series.
Normally, requesting missing data results in an error. If default is specified, a series containing the specified value and data type is returned.
The name is the name of some 1D data as described above.
For example:
>>> data.get_series("batch#age")
Batch_0     51
Batch_1     38
Batch_2     21
Batch_3     31
Batch_4     26
Batch_5     43
Batch_6     36
Batch_7     27
Batch_8     33
Batch_9     45
Batch_10    49
Batch_11    41
Batch_12    45
dtype: int64
- data2d_names(axes: Union[str, Tuple[str, str]], *, full: bool = True) -> List[str]
Return the names of the 2D data that exists in the data set for a specific pair of axes (which must exist).
The returned names are in the format rows_axis,columns_axis#name which uniquely identifies the 2D data. If not full, the returned names include only the simple name without the rows_axis,columns_axis# prefix.
Note
Data will be listed in the results even if it is only stored in the other layout (that is, as columns_axis,rows_axis#name). Such data can still be fetched (e.g. using get_matrix), in which case it will be converted to the requested layout internally (and the result will be cached in derived).
For example:
>>> data.data2d_names("metacell,gene")
['metacell,gene#UMIs']
>>> data.data2d_names("metacell,gene", full=False)
['UMIs']
- has_data2d(name: str) -> bool
Check whether the name 2D data exists.
The name must be in the format rows_axis,columns_axis#name which uniquely identifies the 2D data.
This will also succeed if only the transposed columns_axis,rows_axis#name data exists in the data set. However, fetching the data in the specified order is likely to be less efficient.
For example:
>>> data.has_data2d("cell,gene#UMIs")
True
>>> data.has_data2d("cell,gene#fraction")
False
- get_matrix(name: str, *, default: Optional[Tuple[Any, Union[str, dtype]]] = None) -> Union[DenseInRows, SparseInRows]
Get the name 2D data (which must exist) as a MatrixInRows.
Normally, requesting missing data results in an error. If default is specified, a matrix containing the specified value and data type is returned.
The name is the name of some 2D data as described above.
For example:
>>> data.get_matrix("metacell,gene#UMIs")
array([[ 5., 6., 66., 1., 1., 1., 0., 110., 13., 1.],
[ 13., 2., 1., 2., 1., 3., 2., 3., 7., 1.],
[211., 1., 2., 0., 91., 0., 0., 1., 2., 4.],
[ 1., 0., 179., 1., 0., 2., 0., 9., 1., 2.],
[ 3., 0., 2., 18., 1., 1., 1., 1., 126., 1.],
[ 14., 0., 1., 1., 2., 10., 3., 3., 6., 6.],
[ 3., 2., 0., 0., 1., 2., 0., 2., 2., 3.],
[ 0., 1., 0., 0., 0., 1., 1., 0., 5., 1.],
[ 62., 0., 0., 0., 2., 0., 2., 0., 1., 0.],
[326., 0., 0., 0., 151., 0., 0., 1., 0., 2.]],
dtype=float32)
- get_frame(name: str, *, default: Optional[Tuple[Any, Union[str, dtype]]] = None) -> FrameInRows
Get the name 2D data (which must exist) as a pandas.DataFrame.
The name is the name of some 2D data as described above.
Note
Storing Sparse data in a pandas.DataFrame fails in various unpleasant ways. Therefore, data for get_frame is always returned in a Dense format. Do not call get_frame unless you are certain that the data size is “within reason”, or that the data is memory-mapped from a Dense format on disk. In one of our data sets, calling get_frame("cell,gene#UMIs") would result in creating a numpy.ndarray of ~240GB(!), compared to the “mere” ~6GB needed to hold the data in a scipy.sparse.csr_matrix.
For example:
>>> data.get_frame("metacell,gene#UMIs")
gene        RSPO3  FOXA1   WNT6  TNNI1  MSGN1  LMO2  SFRP5   DLX5  ITGA4  FOXA2
metacell...
Metacell_0    5.0    6.0   66.0    1.0    1.0   1.0    0.0  110.0   13.0    1.0
Metacell_1   13.0    2.0    1.0    2.0    1.0   3.0    2.0    3.0    7.0    1.0
Metacell_2  211.0    1.0    2.0    0.0   91.0   0.0    0.0    1.0    2.0    4.0
Metacell_3    1.0    0.0  179.0    1.0    0.0   2.0    0.0    9.0    1.0    2.0
Metacell_4    3.0    0.0    2.0   18.0    1.0   1.0    1.0    1.0  126.0    1.0
Metacell_5   14.0    0.0    1.0    1.0    2.0  10.0    3.0    3.0    6.0    6.0
Metacell_6    3.0    2.0    0.0    0.0    1.0   2.0    0.0    2.0    2.0    3.0
Metacell_7    0.0    1.0    0.0    0.0    0.0   1.0    1.0    0.0    5.0    1.0
Metacell_8   62.0    0.0    0.0    0.0    2.0   0.0    2.0    0.0    1.0    0.0
Metacell_9  326.0    0.0    0.0    0.0  151.0   0.0    0.0    1.0    0.0    2.0
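The gap between dense and sparse storage mentioned in the note above follows from simple arithmetic. The sketch below estimates both sizes; the dimensions and the density are made up for illustration (they are not those of the data set mentioned above), and the compressed-sparse-row estimate assumes 32-bit indices.

```python
def dense_bytes(rows: int, columns: int, itemsize: int) -> int:
    """A dense array stores every element, zero or not."""
    return rows * columns * itemsize


def csr_bytes(nonzeros: int, rows: int, itemsize: int, index_itemsize: int = 4) -> int:
    """A CSR matrix stores the non-zero values, their column indices and one row pointer per row (plus one)."""
    values = nonzeros * itemsize
    column_indices = nonzeros * index_itemsize
    row_pointers = (rows + 1) * index_itemsize
    return values + column_indices + row_pointers


# Hypothetical data set: one million cells, twenty thousand genes, float32
# values, with 2% of the entries being non-zero.
rows, columns, itemsize = 1_000_000, 20_000, 4
nonzeros = rows * columns // 50

dense_size = dense_bytes(rows, columns, itemsize)
sparse_size = csr_bytes(nonzeros, rows, itemsize)

print(f"dense:  {dense_size / 1e9:.1f} GB")
print(f"sparse: {sparse_size / 1e9:.1f} GB")
```

Under these assumed numbers the dense form is roughly 25 times larger than the sparse one, which is the kind of ratio that makes an unguarded get_frame call on large sparse data risky.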
- get_columns(axis: str, columns: Optional[Sequence[str]] = None, *, defaults: Optional[Sequence[Optional[Tuple[Any, Union[str, dtype]]]]] = None) -> FrameInColumns
Get an arbitrary collection of 1D data for the same axis as columns of a pandas.DataFrame.
Normally, requesting missing columns results in an error. If a defaults entry is specified for some columns, a vector with the specified value and data type is returned.
The returned data will always be in COLUMN_MAJOR order.
If no columns are specified, returns all the 1D properties for the axis, in alphabetical order (that is, as if columns was set to data1d_names with full=False for the axis).
The specified columns names should only be the suffix following the axis# prefix in the 1D name of the data, as described above.
For example:
>>> data.get_columns("batch")
          age     sex
batch...
Batch_0    51  female
Batch_1    38  female
Batch_2    21    male
Batch_3    31  female
Batch_4    26    male
Batch_5    43  female
Batch_6    36  female
Batch_7    27    male
Batch_8    33    male
Batch_9    45  female
Batch_10   49    male
Batch_11   41    male
Batch_12   45    male
- view(*, axes: Optional[Mapping[str, Union[None, str, Sequence[Any], ndarray, _fake_sparse.spmatrix, Series, DataFrame, AxisView]]] = None, data: Optional[Mapping[str, Optional[str]]] = None, name: str = '.view#', cache: Optional[StorageWriter] = None, hide_implicit: bool = False) -> DafReader
Create a read-only view of the data set.
This can be used to create slices of some axes, rename axes and/or data, and/or hide some data. It is a wrapper around the constructor of StorageView; see there for the semantics of the parameters, with the exception that here keys of the data dictionary may be any data name, including derived data.
If the name starts with ., it is appended to both the StorageView and the DafReader names. If the name ends with #, we append the object id to it to make it unique.
Note
If any of the axes is sliced, the view will ignore any derived data based on the sliced axes. While some derived data is safe to slice, some isn’t, and it isn’t easy to tell the difference; for example, when slicing the gene axis, then cell,gene#Log,... is safe to slice, but cell,gene#Folds|Significant,... is not. The code therefore plays it safe by ignoring any derived data using any of the sliced axes.
For example:
>>> view = data.view(axes=dict(gene=['FOXA1', 'FOXA2']))
>>> view.axis_entries("gene")
array(['FOXA1', 'FOXA2'], dtype=object)
>>> view = data.view(data={"metacell,gene#UMIs|Fraction": "fraction"})
>>> view.get_series("gene#metacell=Metacell_0,fraction")
RSPO3    0.024510
FOXA1    0.029412
WNT6     0.323529
TNNI1    0.004902
MSGN1    0.004902
LMO2     0.004902
SFRP5    0.000000
DLX5     0.539216
ITGA4    0.063725
FOXA2    0.004902
dtype: float32
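The naming rule for the name parameter (a leading . appends to the base name, a trailing # appends the object id) can be sketched as follows. The function resolve_name is hypothetical, and the use of Python's id() and the exact form of the appended identifier are assumptions; the text above only says that "the object id" is appended.

```python
def resolve_name(name: str, base_name: str, owner: object) -> str:
    """Apply the two naming rules described above to ``name``."""
    if name.startswith("."):
        # A leading "." means the name is a suffix of the base name.
        name = base_name + name
    if name.endswith("#"):
        # A trailing "#" asks for a unique name (assumed here to use id()).
        name = name + str(id(owner))
    return name


marker = object()
print(resolve_name("example", "files", marker))
print(resolve_name(".view#", "example", marker))
```

With the default name='.view#', a view of the data set named example would therefore get a name of the form example.view# followed by a unique identifier.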
- daf.access.readers.transpose_name(name: str) -> str
Given a 2D data name rows_axis,columns_axis#name return the transposed data name columns_axis,rows_axis#name.
Note
This will refuse to transpose pipelined names rows_axis,columns_axis#name|operation|... as doing so would change the meaning of the name. For example, cell,gene#UMIs|Sum gives the sum of the UMIs of all the genes in each cell, while gene,cell#UMIs|Sum gives the sum of the UMIs of all the cells for each gene.
For example:
>>> daf.transpose_name("metacell,gene#UMIs")
'gene,metacell#UMIs'
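The behavior described above can be approximated in a few lines. This is a hypothetical reimplementation for illustration only; in particular, raising ValueError for rejected names is an assumption, since the text does not say how the real function refuses a pipelined name.

```python
def transpose_name_sketch(name: str) -> str:
    """Swap the two axes of a plain (not pipelined) 2D data name."""
    if "|" in name:
        # Transposing would change what the operations are applied to.
        raise ValueError(f"refusing to transpose the pipelined name: {name}")
    axes, separator, property_name = name.partition("#")
    rows_axis, comma, columns_axis = axes.partition(",")
    if not separator or not comma:
        raise ValueError(f"not a 2D data name: {name}")
    return f"{columns_axis},{rows_axis}#{property_name}"


print(transpose_name_sketch("metacell,gene#UMIs"))
```

Applying the function twice returns the original name, which is a convenient sanity check for any implementation of the transposition.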