daf.groups
Functions for projecting data between members and groups.
A common idiom is to have two axes such that one is a grouping of the other. For example, in scRNA-seq data, it is common to group cells into clusters, so we have a cell axis and a cluster axis. Often this is a multi-level grouping (cell, sub-cluster, cluster).
In this idiom, by convention there is a 1D data property for the “member” axis specifying, for each member, the entry of the “group” axis it belongs to. That is, we may see something like cell#cluster which gives for each cell the (integer, 0-based) index of the cluster it belongs to. Since integers don’t support NaN, by convention any negative value (typically -1) is used to say “this cell belongs to no cluster”.
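The convention above can be illustrated with plain NumPy. This is only a sketch: the array name cluster_of_cells and its values are made up for illustration and do not come from the daf example data.

```python
import numpy as np

# Hypothetical per-cell group indices (what cell#cluster would hold):
# a 0-based cluster index, or a negative value for "belongs to no cluster".
cluster_of_cells = np.array([0, 2, -1, 1, 0, -1, 2])

# Any negative value means "ungrouped", so test with ">= 0" rather than "!= -1".
is_grouped = cluster_of_cells >= 0

grouped_cells = np.flatnonzero(is_grouped)
ungrouped_cells = np.flatnonzero(~is_grouped)

# Indices are 0-based, so the number of groups is the maximal index plus one.
groups_count = int(cluster_of_cells.max()) + 1

print(grouped_cells)    # positions of the cells that belong to some cluster
print(ungrouped_cells)  # positions of the cells that belong to no cluster
print(groups_count)
```

Testing for any negative value, instead of comparing against -1, keeps the code correct for data that uses a different negative marker.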
Computing such groups is the goal of complex analysis pipelines and is very much out of scope for a low-level package such as daf. However, once such group(s) are computed, there are universal operations which it does make sense to provide here:
- Aggregation
  Compute 1D/2D data for the group axis based on 1D/2D data of the members of the group. See aggregate_group_data1d for an example computing the mean age of the cells in each metacell and aggregate_group_data2d for an example computing the total UMIs of each gene for each metacell (which is the same as the metacell,gene#UMIs).
- Counting
  Compute 2D data for the group axis based on discrete 1D data of the members axis. See count_group_values for an example counting how many cells in each metacell came from donors of either sex.
- Assignment
  Compute 1D data for the member axis based on 1D data of the group axis. See assign_group_values for an example assigning for each cell the type of the metacell it belongs to.
Functions:
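| aggregate_group_data1d | Compute a per-group property value which is the result of applying the aggregation function to the vector of values of the members of the group. |
| aggregate_group_data2d | Compute per-group-per-axis property values which are the result of applying the aggregation function to the vector of values of the members of the group. |
| most_frequent | Return the most frequent value in a vector. |
| create_group_axis | Create a new group axis to hold per-group data. |
| count_group_members | Count how many members are included in each group. |
| count_group_values | Count how many members of each group have each possible property value. |
| assign_group_values | Assign the per-group property value to each member of the group. |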
- daf.groups.aggregate_group_data1d(data: DafWriter, aggregation: Callable[[Vector], Any], *, default: Optional[Any] = None, dtype: Optional[Union[dtype, str]] = None, overwrite: bool = False) -> None
Compute a per-group property value which is the result of applying the aggregation function to the vector of values of the members of the group.
The aggregation function can be any function that converts a vector of all member values into a single group value. For example, for discrete data, most_frequent will pick the value that appears in the highest number of members. An optimized version is used if the aggregation is one of numpy.sum, numpy.mean, numpy.var, numpy.std, numpy.median, numpy.min or numpy.max.
The resulting per-group 1D data will have the specified dtype. By default, this is the same as the data type of the member values. This is acceptable for an aggregation like np.sum, but would fail for an aggregation like np.mean for integer data.
If no members are assigned to some existing group, then it is given the default value. By default this is None, which is acceptable for floating point values (becomes a NaN), but would fail for integer data.
Required Inputs
member#
An axis with one entry per individual group member.
group#
An axis with an entry per group of zero or more members.
member#group
The index of the group each member belongs to, or negative if not a part of any group.
member#property
Some property value associated with each individual member.
Assured Outputs
group#property
The aggregated property value associated with each group.
If overwrite, will overwrite existing data.
For example:

    import daf
    import numpy as np

    data = daf.DafWriter(
        storage=daf.MemoryStorage(name="example.storage"),
        base=daf.FilesReader(daf.DAF_EXAMPLE_PATH, name="example.base"),
        name="example"
    )

    with data.adapter(
        axes=dict(cell="member", metacell="group"),
        data={"cell#metacell": "group", "cell#batch#age": "property"},
        hide_implicit=True,
        back_data={"group#property": "age.mean"}
    ) as adapter:
        daf.aggregate_group_data1d(adapter, aggregation=np.mean)

    print(data.get_vector("metacell#age.mean"))

    [39 41 33 45 43 41 33 38 42 32]
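To make the roles of aggregation, default and dtype concrete, here is a minimal plain-NumPy sketch of the documented behavior. The helper aggregate_by_group and its sample data are hypothetical and written only for illustration; this is not how daf implements the function, and it ignores the optimized paths mentioned above.

```python
from typing import Any, Callable, Optional

import numpy as np


def aggregate_by_group(
    group_of_members: np.ndarray,
    member_values: np.ndarray,
    groups_count: int,
    aggregation: Callable[[np.ndarray], Any],
    *,
    default: Optional[Any] = None,
    dtype: Optional[Any] = None,
) -> np.ndarray:
    """Hypothetical helper (not part of daf): aggregate member values per group."""
    # Mirror the documented default: reuse the data type of the member values.
    if dtype is None:
        dtype = member_values.dtype
    result = np.empty(groups_count, dtype=dtype)
    for group_index in range(groups_count):
        # Members with a negative group index never match, so they are ignored.
        values = member_values[group_of_members == group_index]
        if values.size == 0:
            # None becomes NaN in a float array, but raises for an integer array.
            result[group_index] = default
        else:
            result[group_index] = aggregation(values)
    return result


group_of_members = np.array([0, 0, 2, -1, 2, 2])
ages = np.array([30, 40, 50, 99, 60, 70])

# Ask for a float dtype, since the mean of integers is in general not an integer.
mean_age = aggregate_by_group(group_of_members, ages, 3, np.mean, dtype="float32")
print(mean_age)  # group 1 has no members, so it receives the default (NaN)
```

With an integer dtype and the default of None, the assignment for the empty group would raise, which is the failure mode the text warns about.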
- daf.groups.aggregate_group_data2d(data: DafWriter, aggregation: Callable[[Vector], Any], *, default: Optional[Any] = None, dtype: Optional[Union[dtype, str]] = None, overwrite: bool = False) -> None
Compute per-group-per-axis property values which are the result of applying the aggregation function to the vector of values of the members of the group.
The aggregation function can be any function that converts a vector of all member values into a single group value. For example, for discrete data, most_frequent will pick the value that appears in the highest number of members. An optimized version is used if the aggregation is one of numpy.sum, numpy.mean, numpy.var, numpy.std, numpy.median, numpy.min or numpy.max.
The resulting per-group-per-axis 2D data will have the specified dtype. By default, this is the same as the data type of the member values. This is acceptable for an aggregation like np.sum, but would fail for an aggregation like np.mean for integer data.
If no members are assigned to some existing group, then it is given the default value for all entries. By default this is None, which is acceptable for floating point values (becomes a NaN), but would fail for integer data.
Todo
Optimize aggregate_group_data2d to avoid creating a temporary dense matrix per group for sparse data, and/or to parallelize the operation in general.
Required Inputs
member#
An axis with one entry per individual group member.
group#
An axis with an entry per group of zero or more members.
axis#
An axis for some 2D property.
member#group
The index of the group each member belongs to, or negative if not a part of any group.
member,axis#property
The property value associated with each individual member and data axis entry.
Assured Outputs
group,axis#property
The aggregated value associated with each group and data axis entry.
If overwrite, will overwrite existing data.
For example:

    import daf
    import numpy as np

    data = daf.DafWriter(
        storage=daf.MemoryStorage(name="example.storage"),
        base=daf.FilesReader(daf.DAF_EXAMPLE_PATH, name="example.base"),
        name="example"
    )

    with data.adapter(
        axes=dict(cell="member", metacell="group", gene="axis"),
        data={"cell#metacell": "group", "cell,gene#UMIs": "property"},
        hide_implicit=True,
        back_data={"group,axis#property": "UMIs_sum"},
    ) as adapter:
        daf.aggregate_group_data2d(adapter, aggregation=np.sum, overwrite=True)

    print(data.get_series("gene#metacell=Metacell_0,UMIs_sum"))

    RSPO3      5
    FOXA1      6
    WNT6      67
    TNNI1      1
    MSGN1      1
    LMO2       1
    SFRP5      0
    DLX5     111
    ITGA4     13
    FOXA2      1
    dtype: int32
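For the special case of summation, the 2D aggregation can be sketched in plain NumPy as below. The matrix and the group indices are made up for illustration, and this is not a description of how daf computes the result; it only shows what "per-group-per-axis" data looks like.

```python
import numpy as np

# Hypothetical data: 5 members (rows) by 3 entries of the data axis (columns).
member_values = np.array(
    [
        [1, 0, 2],
        [3, 1, 0],
        [0, 0, 5],
        [2, 2, 2],
        [4, 0, 1],
    ]
)
group_of_members = np.array([1, 0, 1, -1, 0])  # the fourth member is ungrouped
groups_count = 2

# One row per group, one column per entry of the data axis.
group_sums = np.zeros((groups_count, member_values.shape[1]), dtype=member_values.dtype)

# Skip ungrouped members; a negative index would otherwise wrap around to the last group.
is_grouped = group_of_members >= 0

# np.add.at accumulates unbuffered, so members sharing a group are all summed.
np.add.at(group_sums, group_of_members[is_grouped], member_values[is_grouped])

print(group_sums)
```

Filtering out the negative indices first matters here, because NumPy would otherwise interpret -1 as "the last row".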
- daf.groups.most_frequent(vector: Vector) -> Any
Return the most frequent value in a vector.
There is no guarantee that this value appears in the majority of the entries, or in general that it is “very common”. The only guarantee is that there is no other value that is more common.
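The guarantee above can be illustrated with a small stand-in built on numpy.unique. The function most_frequent_value is hypothetical and is not the implementation used by daf; in particular, how daf breaks ties is not stated here, while this sketch simply returns the first of the tied values in sorted order.

```python
import numpy as np


def most_frequent_value(vector: np.ndarray):
    """Hypothetical stand-in for illustration; not daf's implementation."""
    values, counts = np.unique(vector, return_counts=True)
    # argmax returns the first maximal count, so ties go to the smallest sorted value.
    return values[np.argmax(counts)]


cell_types = np.array(["B", "T", "B", "NK", "T", "B", "mono"])
winner = most_frequent_value(cell_types)

# "B" appears in 3 of 7 entries: the most frequent value, yet not a majority.
print(winner, int(np.sum(cell_types == winner)), len(cell_types))
```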
- daf.groups.create_group_axis(data: DafWriter, *, format: str, overwrite: bool = False) -> None
Create a new group axis to hold per-group data.
Since in daf axis entry names are always strings, we use the format to convert the group index to a string. This format should include %s somewhere in it.
Note
The created axis will be continuous, that is, group axis entries will be created for all the group indices from zero to the maximal used group index, even for indices that no member uses.
Required Inputs
member#
An axis with one entry per individual group member.
member#group
The index of the group each member belongs to. If negative, it is not a part of any group.
Assured Outputs
group#
A new axis with one entry per group.
If overwrite, will overwrite existing data.
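The naming rule and the continuity note can be sketched as follows. The use of Python's % operator to apply the format, the format string "Metacell_%s" and the sample indices are assumptions made for illustration; the text only states that the format should contain %s.

```python
import numpy as np

# Hypothetical member#group data; note that no member uses group index 2.
group_of_members = np.array([0, 3, -1, 3, 1])

# Assumed format string; any string containing "%s" would do.
name_format = "Metacell_%s"

# Indices are 0-based, so entries are needed for 0 up to the maximal used index.
groups_count = int(group_of_members.max()) + 1

# Every index in the range gets an entry name, whether or not it has members.
group_names = [name_format % group_index for group_index in range(groups_count)]
print(group_names)
```

Creating an entry for the unused index keeps each group index equal to the position of its entry in the axis.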
- daf.groups.count_group_members(data: DafWriter, *, dtype: Union[str, dtype] = 'int32', overwrite: bool = False) -> None
Count how many members are included in each group.
The resulting per-group 1D data will have the specified dtype. By default, this is int32, which is a reasonable type for storing counts.
Required Inputs
member#group
The index of the group each member belongs to. If negative, it is not a part of any group.
Assured Outputs
group#members
How many members exist in the group.
If overwrite, will overwrite existing data.
For example:

    import daf

    data = daf.DafWriter(
        storage=daf.MemoryStorage(name="example.storage"),
        base=daf.FilesReader(daf.DAF_EXAMPLE_PATH, name="example.base"),
        name="example"
    )

    with data.adapter(
        axes=dict(cell="member", metacell="group"),
        data={"cell#metacell": "group"},
        hide_implicit=True,
        back_data={"group#members": "cells"}
    ) as adapter:
        daf.count_group_members(adapter)

    print(data.get_vector("metacell#cells"))

    [ 53 114 36 47 52 97 26 31 34 34]
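Counting members per group amounts to a histogram of the non-negative group indices. A minimal sketch with numpy.bincount, using made-up indices and an assumed number of groups (this is not how daf is known to implement it):

```python
import numpy as np

# Hypothetical member#group data for 6 members and 4 groups.
group_of_members = np.array([0, 2, -1, 2, 0, 2])
groups_count = 4

# np.bincount rejects negative values, so drop the ungrouped members first.
grouped = group_of_members[group_of_members >= 0]

# minlength ensures that groups without any members still get a zero count.
members_count = np.bincount(grouped, minlength=groups_count).astype("int32")
print(members_count)
```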
- daf.groups.count_group_values(data: DafWriter, *, dtype: Union[str, dtype] = 'int32', dense: bool = False, overwrite: bool = False) -> None
Count how many members of each group have each possible property value.
In daf, axis entries always have string values. However, the per-member 1D values need not be strings; the only requirement is that converting them to strings will match the entry names of the values axis. This allows us to deal with data such as “age” which may take a few float values (e.g. would only be one of 6, 6.5, 7 days).
The resulting per-group 2D data will have the specified dtype. By default, this is int32, which is a reasonable type for storing counts.
By default, store the data in Sparse format. If dense, store it in Dense format.
Required Inputs
member#
An axis with one entry per individual group member.
group#
An axis with an entry per group of zero or more members.
property#
An axis with an entry per value of some property.
member#group
The index of the group each member belongs to. If negative, it is not a part of any group.
member#property
The property value associated with each individual member.
Assured Outputs
group,property#members
How many members have each property value in each group.
If overwrite, will overwrite existing data.
For example:

    import daf

    data = daf.DafWriter(
        storage=daf.MemoryStorage(name="example.storage"),
        base=daf.FilesReader(daf.DAF_EXAMPLE_PATH, name="example.base"),
        name="example"
    )

    with data.adapter(
        axes=dict(cell="member", metacell="group", sex="property"),
        data={"cell#metacell": "group", "cell#batch#sex": "property"},
        hide_implicit=True,
        back_data={"group,property#members": "cells"}
    ) as adapter:
        daf.count_group_values(adapter)

    print(data.get_frame("metacell,sex#cells"))

    sex         male  female
    metacell...
    Metacell_0    14      39
    Metacell_1   114       0
    Metacell_2    22      14
    Metacell_3    47       0
    Metacell_4    45       7
    Metacell_5    97       0
    Metacell_6    17       9
    Metacell_7     9      22
    Metacell_8    34       0
    Metacell_9    18      16
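The counting itself can be sketched as a small dense table with one row per group and one column per entry of the property axis. The data and names below are hypothetical, and the loop is written for clarity rather than speed; it is not a description of daf's implementation.

```python
import numpy as np

# Hypothetical per-member data: group index and a discrete property value.
group_of_members = np.array([0, 1, 0, -1, 1, 0])
sex_of_members = np.array(["male", "female", "female", "male", "female", "male"])
groups_count = 2

# Entry names of the (hypothetical) property axis, in axis order.
property_names = ["male", "female"]

# Member values are matched to axis entries through their string form.
column_of_name = {name: column for column, name in enumerate(property_names)}

counts = np.zeros((groups_count, len(property_names)), dtype="int32")
for group_index, value in zip(group_of_members, sex_of_members):
    if group_index >= 0:  # ungrouped members are not counted anywhere
        counts[group_index, column_of_name[str(value)]] += 1

print(counts)
```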
- daf.groups.assign_group_values(data: DafWriter, *, dtype: Optional[Union[dtype, str]] = None, default: Optional[Any] = None, overwrite: bool = False) -> None
Assign the per-group property value to each member of the group.
The resulting per-member 1D data will have the specified dtype. By default, this is the same as the data type of the group values.
Members that are not a part of any group are given the default value. This is None by default, which is acceptable for floating point values (becomes a NaN), but would fail for integer data.
Required Inputs
member#
An axis with one entry per individual group member.
member#group
The index of the group each member belongs to. If negative, it is not a part of any group.
group#property
The property value associated with each group.
Assured Outputs
member#property
The property value associated with the group of each member.
If overwrite, will overwrite existing data.
For example:

    import daf

    data = daf.DafWriter(
        storage=daf.MemoryStorage(name="example.storage"),
        base=daf.FilesReader(daf.DAF_EXAMPLE_PATH, name="example.base"),
        name="example"
    )

    with data.adapter(
        axes=dict(cell="member", metacell="group", sex="property"),
        data={"cell#metacell": "group", "metacell#cell_type": "property"},
        hide_implicit=True,
        back_data={"member#property": "type"}
    ) as adapter:
        daf.assign_group_values(adapter)

    print(data.get_item("cell=Cell_0,type"))

    epiblast
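Assignment is the reverse lookup: each member reads the value of its group, and ungrouped members receive the default. A minimal NumPy sketch, with made-up group values (the second type name is hypothetical) and an object array so that None is a valid default:

```python
import numpy as np

# Hypothetical data: 4 members, 2 groups, one property value per group.
group_of_members = np.array([1, 0, -1, 1])
type_of_groups = np.array(["epiblast", "mesoderm"], dtype=object)
default = None

# Start from the default, so that ungrouped members keep it.
type_of_members = np.full(len(group_of_members), default, dtype=object)

# Index the group values only with non-negative indices; -1 would wrap to the last group.
is_grouped = group_of_members >= 0
type_of_members[is_grouped] = type_of_groups[group_of_members[is_grouped]]

print(type_of_members)
```

For numeric group values, the same pattern works with a float array and NaN as the default, which matches the note above about None failing for integer data.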