Hi,
I have a few questions regarding the vignette "Integrating stimulated vs. control PBMC datasets to learn cell-type specific responses".
1) I've noticed that ScaleData is only run once, on the "integrated" assay, right after FindIntegrationAnchors and IntegrateData. In the vignette, differential expression is performed on the "RNA" assay, since this is set as the default after integration (in the "Identify conserved cell type markers" section), but the data in this assay is not scaled (for example, the @scale.data slot is not present in the RNA assay). Thus, does that mean that scaling the data in the RNA assay is not needed due to the integration steps? Or should scaling be performed prior to differential expression on this assay as well? (If so, this step is missing from the tutorial.)
2) Since a new assay is created by the integration, as I described in my first question, shouldn't the differential expression step be performed on the "integrated" assay? What is the reasoning behind switching to the "RNA" assay prior to differential expression? One reason I can think of is that the integrated assay only contains the 2000 anchor features if this is left at the default setting. But if we change the assay to the RNA assay, does that mean that the results will not come from the integrated dataset? I am just a bit confused about which assay we should use for differential expression across conditions.
3) I am interested in using the sctransform method for the vignette above. Would running it right after importing the data into the ctrl and stim objects be the correct place for this transformation? My concern is that the method already scales the data, so if I follow this, scaling will be performed prior to integration. __EDIT__: I found the following issue in the sctransform repo, which I believe is the current answer to this question, but please indicate otherwise: https://github.com/ChristophH/sctransform/issues/4
Thanks a lot for the clarification. Just trying to make sure I am grasping all the changes between v2 and v3.
I have a similar question to 1 & 2 from @lshepard2154. Right now I can't plot (in a heatmap) the genes I'm interested in from the batch-corrected (integrated) samples, since only 2000 genes are used in the integrated assay.
For question 2, I think differential expression should be performed on the integrated assay. To do this, you should set `features.to.integrate` to all genes in `IntegrateData`, so that the data slot of the integrated assay contains all genes.
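A minimal sketch of what that could look like, assuming a list of per-condition Seurat v3 objects that have already been normalized and had variable features selected as in the vignette. The helper name `integrate_all_genes` and the `dims = 1:20` default are only illustrative, and integrating every gene can use considerably more memory than the default:

```r
library(Seurat)

# Hypothetical helper: integrate every gene shared by all input objects,
# instead of only the anchor features (2000 by default).
integrate_all_genes <- function(object.list, dims = 1:20) {
  anchors <- FindIntegrationAnchors(object.list = object.list, dims = dims)
  # Restrict to genes present in every object in the list.
  all.genes <- Reduce(intersect, lapply(object.list, rownames))
  IntegrateData(
    anchorset = anchors,
    dims = dims,
    features.to.integrate = all.genes
  )
}
```

With the vignette's objects this would be called as `immune.combined <- integrate_all_genes(list(ctrl, stim))`.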
That's what I thought, but in the vignette (https://satijalab.org/seurat/v3.0/immune_alignment.html), they switch the default assay to "RNA" prior to differential expression....
Also, if we use SCTransform, the normalized values are stored in the "SCT" assay. So in that case, I believe "SCT" is the assay that should be used to detect DEGs between conditions. I would really like clarification on these points.
Hi,
Please see FAQ 4 for a brief discussion on this. We realize that these types of questions are becoming more common and are working on a more detailed DE vignette to help explain in greater detail.
@andrewwbutler: Thanks, but to clarify on point #2 from the FAQ: it says not to use the SCT assay for DE, but that somewhat contradicts what is discussed here: https://github.com/ChristophH/sctransform/issues/4
I have actually run differential expression across conditions with SCTransform on the data from the vignette, following the code in the issue above, and got similar results (__EDIT__: but I used the "SCT" assay for DE across conditions, since that was the data which was normalized. The code above stops at integration). But now I am confused: should we refrain from using SCTransform until DE is fully supported for it?
The reason I mention this is that the sctransform vignette also states the following:
> You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the scale.data slot) themselves. This is not currently supported in Seurat v3, but will be soon.
So, I am clear on the fact that currently the results we get are "sub-optimal", but the FAQ response makes it seem like we should not do this at all.
__A big thank you__ for your support in all this. I hope these questions help your team in the updates for the documentation and future support. I really appreciate all you do.
Hi,
Thanks for the question, and I apologize for the confusion. We're working on allowing for DE to be performed on Pearson residuals from SCTransform in an optimal way. Until then, it's easiest for us to advise users just to use the RNA assay. But if you're really excited to give it a try, it is not invalid to do so. Still, in the interest of simplicity, we'll keep the FAQ as-is.
best,
Rahul
@satijalab I wanted to follow up on this thread for clarification on one of @lshepard2154's questions.
> Thus, does that mean that scaling the data in the RNA assay is not needed due to the integration steps? Or should scaling be performed prior to differential expression on this assay as well? (If so, this step is missing from the tutorial.)

When working with an integrated object, should the "RNA" assay be scaled prior to differential expression? Based on the response from @snsansom, it looks like RNA does not undergo scaling prior to DE, but I wanted to confirm that this is standard protocol, because the question was not formally addressed in this thread.
Thanks for your interest and comments. I recommend following the Satija lab's advice (see e.g. satijalab/seurat#1421). We still use the RNA counts for now.
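For reference, a rough sketch of DE across conditions on the RNA assay without any ScaleData call. It assumes an integrated, clustered object whose active identities are cell-type labels, whose RNA assay was log-normalized before integration (as in the vignette), and whose metadata has a condition column such as "stim". The helper name `de_across_conditions` and its arguments are hypothetical:

```r
library(Seurat)

# Hypothetical helper: DE between two conditions within one cell type,
# run on the RNA assay of an already integrated and clustered object.
de_across_conditions <- function(object, cell.type, cond.1, cond.2,
                                 condition.col = "stim") {
  DefaultAssay(object) <- "RNA"
  # Build combined labels such as "B_STIM" and "B_CTRL".
  object$celltype.cond <- paste(
    Idents(object), object@meta.data[[condition.col]], sep = "_"
  )
  Idents(object) <- "celltype.cond"
  # By default FindMarkers tests the log-normalized values in the data slot,
  # not scale.data, so the RNA assay does not need to be scaled first.
  FindMarkers(
    object,
    ident.1 = paste(cell.type, cond.1, sep = "_"),
    ident.2 = paste(cell.type, cond.2, sep = "_"),
    verbose = FALSE
  )
}
```

With the vignette's object this would be `de_across_conditions(immune.combined, "B", "STIM", "CTRL")`. Scaling the RNA assay should only matter for functions that read scale.data by default, such as DoHeatmap.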
Thanks again for your support in this!
Best,
Joey