Hi,
I am writing to seek advice on how best to integrate datasets from different conditions and experiments. For example, COND1 had Exp1.1 and Exp1.2, and COND2 had Exp2.1 and Exp2.2. The process I followed is:
Could you please suggest the best way to analyze this kind of dataset?
Is using CCA an option here? I thought not, because it would automatically correct for batch effects across all conditions, which would remove the biological differences between conditions.
Thanks,
Hi Pankaj, my advice would be to first integrate all the datasets together using the integration methods in Seurat v3 (see https://satijalab.org/seurat/pancreas_integration_label_transfer.html). You can then cluster the integrated data to identify cell states shared across conditions and replicates. To find differentially expressed genes between clusters, or between control and treatment within a cluster, run the test on the uncorrected data using logistic regression with replicate as a latent variable (FindMarkers with test.use="LR" and latent.vars="replicate").
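For anyone following along, the workflow described above could look roughly like the sketch below. It is only an illustration under several assumptions: the wrapper function `integrate_and_test`, the input `obj.list` (one Seurat object per experiment), the metadata columns `condition` and `replicate`, the condition labels, the cluster ID, and the parameter values (30 dimensions, 2000 variable features, resolution 0.5) are all placeholders rather than anything stated in this thread.

```r
library(Seurat)  # assumes Seurat v3

# Hypothetical helper. `obj.list` is assumed to be a list of Seurat objects,
# one per experiment (e.g. Exp1.1, Exp1.2, Exp2.1, Exp2.2), each carrying
# assumed metadata columns `condition` (COND1/COND2) and `replicate`.
integrate_and_test <- function(obj.list, cluster = "0", dims = 1:30) {
  # Normalize each dataset and pick variable features independently
  obj.list <- lapply(obj.list, function(x) {
    x <- NormalizeData(x, verbose = FALSE)
    FindVariableFeatures(x, selection.method = "vst",
                         nfeatures = 2000, verbose = FALSE)
  })

  # Integrate all datasets together
  anchors <- FindIntegrationAnchors(object.list = obj.list, dims = dims)
  integrated <- IntegrateData(anchorset = anchors, dims = dims)

  # Cluster on the integrated (corrected) assay
  DefaultAssay(integrated) <- "integrated"
  integrated <- ScaleData(integrated, verbose = FALSE)
  integrated <- RunPCA(integrated, npcs = max(dims), verbose = FALSE)
  integrated <- FindNeighbors(integrated, dims = dims)
  integrated <- FindClusters(integrated, resolution = 0.5)

  # Differential expression on the uncorrected RNA assay, within one cluster,
  # using logistic regression with replicate as a latent variable
  DefaultAssay(integrated) <- "RNA"
  markers <- FindMarkers(integrated,
                         ident.1 = "COND2", ident.2 = "COND1",
                         group.by = "condition", subset.ident = cluster,
                         test.use = "LR", latent.vars = "replicate")

  list(object = integrated, markers = markers)
}
```

Something like `res <- integrate_and_test(obj.list, cluster = "0")` would then return the integrated object and the within-cluster COND2 vs COND1 results; the call would need to be repeated for each cluster of interest.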
@timoast, would there be any reason not to use sctransform:::compare_expression? It would require some extra work compared to FindMarkers, but it seems (?) to integrate more directly with the SCTransform results.