Hi,
If there are different numbers of cells in different conditions (or technologies), does this bias clustering in the integration workflow? I would imagine that if condition A has many more cells than condition B, the clustering would be biased towards the clusters/cell types present in condition A. If so, are there strategies to deal with it, such as downsampling? Is this covered in any of the workflow examples?
Thanks.
Larger datasets also tend to contain more information, so we are not inherently concerned about imbalance. However, you can certainly downsample objects if you wish (e.g., to randomly sample 1,000 cells):
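set.seed(42)  # make the random draw reproducible; min() avoids an error if the object has fewer than 1,000 cells
object.downsample <- subset(object, cells = sample(Cells(object), min(1000, ncol(object))))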
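Note that the one-liner above draws cells uniformly from the whole object, so on a merged object it roughly preserves the original ratio between conditions. If the goal is to equalize conditions, one option is to cap the number of cells taken from each condition separately. The base-R sketch below does that; `balanced_downsample` is a hypothetical helper (not a Seurat function), and it assumes a vector of condition labels whose names are cell barcodes.

```r
# Hypothetical helper (not part of Seurat): keep at most `n_per_group` cells
# from each condition. `groups` is a vector of condition labels whose names
# are cell barcodes. Note: calls set.seed(), which resets the global RNG state.
balanced_downsample <- function(groups, n_per_group, seed = 42) {
  stopifnot(!is.null(names(groups)), n_per_group >= 0)
  set.seed(seed)
  # Split the cell barcodes by their condition label
  by_group <- split(names(groups), groups)
  picked <- lapply(by_group, function(cells) {
    # Conditions smaller than the cap are kept whole
    if (length(cells) <= n_per_group) cells else sample(cells, n_per_group)
  })
  unname(unlist(picked))
}

# Toy example: 5,000 cells from condition A and 800 from condition B
toy <- setNames(rep(c("A", "B"), times = c(5000, 800)),
                paste0("cell", seq_len(5800)))
keep <- balanced_downsample(toy, n_per_group = 800)
table(toy[keep])  # 800 cells from each condition
```

With a Seurat object that has a (hypothetical) `condition` metadata column, this could be used as `keep <- balanced_downsample(setNames(as.character(object$condition), Cells(object)), 1000)` followed by `object.balanced <- subset(object, cells = keep)`. Alternatively, the one-liner above can be applied separately to each per-condition object before integration.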
@satijalab Would you recommend running the integration on all the data and then subsampling the unbalanced dataset(s) for clustering?