Seurat: SCTransform: why Normalize RNA data for visualization purposes?

Created on 26 Aug 2019 · 10 Comments · Source: satijalab/seurat

Hi, I am using SCTransform to normalize several 10x scRNA-Seq datasets separately,
and then I am following the tutorial on PrepSCTIntegration, with great results: PCA, UMAP / tSNE and clustering were calculated with the new "integrated" assay set as the default assay.
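
For readers following along, the workflow described above can be sketched roughly as below. This is a minimal sketch that assumes the Seurat v3 API; `sample.list` (a list of per-sample Seurat objects), the function name `integrate_sct`, and the choices of `nfeatures = 3000` and `dims = 1:30` are illustrative assumptions, not values taken from this thread.

```r
library(Seurat)

# Hypothetical helper: SCT-normalize each sample, integrate, then cluster.
# `sample.list` is assumed to be a list of Seurat objects, one per 10x sample.
integrate_sct <- function(sample.list, nfeatures = 3000, dims = 1:30) {
  # Normalize every sample separately with SCTransform
  sample.list <- lapply(sample.list, SCTransform, verbose = FALSE)

  # Pick shared features and prepare the SCT residuals for integration
  features <- SelectIntegrationFeatures(object.list = sample.list,
                                        nfeatures = nfeatures)
  sample.list <- PrepSCTIntegration(object.list = sample.list,
                                    anchor.features = features)

  # Find anchors and integrate; this creates the "integrated" assay
  anchors <- FindIntegrationAnchors(object.list = sample.list,
                                    normalization.method = "SCT",
                                    anchor.features = features)
  integrated <- IntegrateData(anchorset = anchors,
                              normalization.method = "SCT")

  # Dimensionality reduction and clustering on the "integrated" assay
  integrated <- RunPCA(integrated, verbose = FALSE)
  integrated <- RunUMAP(integrated, dims = dims)
  integrated <- FindNeighbors(integrated, dims = dims)
  integrated <- FindClusters(integrated)
  integrated
}
```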

Now, which is the best way to go with data visualization (VlnPlot, FeaturePlot etc)?
Reading the online documentation, the slot [["SCT"]]@data should contain log-normalized values of corrected counts. However, since SCTransform was performed on each sample separately, if I want to use [["SCT"]]@data for visualization, do I need to run SCTransform again so that it runs on the whole dataset?

At the end of this tutorial: https://satijalab.org/seurat/v3.0/sctransform_vignette.html
I see that, after integration, visualization was preceded by LogNormalization with NormalizeData on the RNA assay: "Normalize RNA data for visualization purposes", but I can't find other details about visualization using SCTransform-ed data.

Thanks very much!
Best,
Alberto

All 10 comments

I have exactly the same question, so I am commenting here in the hope that it will be answered and explained more clearly in the vignette. If SCTransform's normalized and scaled data are better, why do you go back to the regular normalization in the integration vignette? And would we also use ScaleData to get values for heatmaps?

Same question in my mind too! Would love to hear back on this from @satijalab ! Thank you!!

I believe they address this question in their response here: #1836. Using the normalized RNA or SCT assay is acceptable for further downstream analysis.
I am wondering: if we revert to the SCT assay, do we use the data or scale.data slot for visualization, differential expression, or other downstream analyses, and why would we prefer one slot over the other?

One problem I can see with using the SCT assay after integration is that the SCT normalization was done separately for each sample, which is likely to introduce batch effects down the line. We are using the RNA assay, normalized after integration. The data slot is the default used for FeaturePlot, VlnPlot, and FindConservedMarkers, and the scale.data slot is the default for DoHeatmap, and we use the defaults. Typically, scaled data (mean-centered, sd-adjusted) is only used for heatmaps; everything else, especially differential expression, you want to do on normalized count values.
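
To make the slot distinction concrete, here is a small sketch of how the two slots could be inspected. It assumes the Seurat v3 `GetAssayData()` interface with a `slot` argument (newer Seurat releases may use a different argument name); the helper name `slot_dims` is hypothetical.

```r
library(Seurat)

# Hypothetical helper: report the dimensions of the two slots discussed above.
# "data" holds normalized values (default for VlnPlot / FeaturePlot);
# "scale.data" holds centered and scaled values (default for DoHeatmap).
slot_dims <- function(object, assay = "RNA") {
  list(
    data       = dim(GetAssayData(object, assay = assay, slot = "data")),
    scale.data = dim(GetAssayData(object, assay = assay, slot = "scale.data"))
  )
}

# Usage, assuming `integrated` is an integrated Seurat object:
# slot_dims(integrated, assay = "RNA")
```

If `scale.data` comes back empty for the chosen assay, that usually means ScaleData() has not been run on it yet.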

That makes a lot of sense, thank you! In that case, would it be reasonable to re-run the SCTransform on the integrated object's RNA Assay, saving it under a new assay name, as another way to normalize the data, correct batch effects, and find variable features for downstream analysis?

In #1836 (linked above) they say explicitly not to re-run SCTransform on the integrated data, but running it on the RNA assay should be fine. I have not tried that yet...
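
An untested sketch of what that could look like, assuming the `assay` and `new.assay.name` arguments of `SCTransform()` in Seurat v3. The helper name `sct_on_rna` and the assay name "SCTall" are made up for illustration, and whether this is advisable is exactly what is being asked in this thread.

```r
library(Seurat)

# Hypothetical helper: run SCTransform on the RNA assay of an already
# integrated object and store the result under a separate assay name,
# so the per-sample "SCT" and the "integrated" assays are left untouched.
sct_on_rna <- function(integrated, new.assay.name = "SCTall") {
  SCTransform(integrated,
              assay = "RNA",
              new.assay.name = new.assay.name,
              verbose = FALSE)
}

# Usage, assuming `integrated` came out of IntegrateData():
# integrated <- sct_on_rna(integrated)
# DefaultAssay(integrated)   # check which assay is now the default
```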

Oh yes, we are at version 10

On the Seurat GitHub forum, a couple of my questions have started a debate, hahaha

One problem I can see with using the SCT assay after integration is that the SCT normalization was done separately for each sample, which is likely to introduce batch effects down the line. We are using the RNA assay, normalized after integration. The data slot is the default used for FeaturePlot, VlnPlot, and FindConservedMarkers, and the scale.data slot is the default for DoHeatmap, and we use the defaults. Typically, scaled data (mean-centered, sd-adjusted) is only used for heatmaps; everything else, especially differential expression, you want to do on normalized count values.

Hi @jdrnevich , Just to get clarification about what you said (pasted above), you use the SCT assay, normalized separately on each sample before integration, followed by PrepSCTIntegration and such as per tutorial, for PCA, etc., up until clustering and tSNE. But then, for visualization and markers, you switch back to the RNA assay. Is my understanding correct? Many thanks!

Yes, @CodeInTheSkies, we switch back to the RNA assay, run NormalizeData() and ScaleData(), then proceed with visualizations and marker detection.
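
In code, that switch might look something like the sketch below. It assumes the Seurat v3 API; the helper name `prepare_rna_for_plots`, the gene "CD3E", and the metadata column "sample" are placeholders for illustration, not names from this thread.

```r
library(Seurat)

# Hypothetical helper: after clustering on the "integrated" assay,
# switch to the RNA assay and normalize/scale it for plots and markers.
prepare_rna_for_plots <- function(integrated) {
  DefaultAssay(integrated) <- "RNA"
  # Log-normalize the raw counts of the whole (merged) dataset
  integrated <- NormalizeData(integrated, verbose = FALSE)
  # Center and scale for heatmaps; pass `features =` explicitly if the
  # genes you want in a heatmap are not covered by the default selection
  integrated <- ScaleData(integrated, verbose = FALSE)
  integrated
}

# Usage, assuming `integrated` is already clustered:
# integrated <- prepare_rna_for_plots(integrated)
# VlnPlot(integrated, features = "CD3E")       # uses the data slot
# FeaturePlot(integrated, features = "CD3E")   # uses the data slot
# markers <- FindConservedMarkers(integrated, ident.1 = 0,
#                                 grouping.var = "sample")
```

The cluster identities and the UMAP / tSNE embeddings computed on the "integrated" assay stay attached to the object, so only the expression values being plotted change.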

Apologies that we never responded, but the last comment in this thread is reasonable.
