Gatsby: Gatsby Plugin: Asset Manifest (for Server-Side Authentication in Front of Built Assets without client-side routes)

Created on 21 Jan 2020 · 14 Comments · Source: gatsbyjs/gatsby

Summary

Gatsby static sites are very fast and optimized. But some content may not be suitable to deliver to all audiences. There are often use cases for including an authentication layer.

Currently, Gatsby promotes using client-only routes for authentication, which negates many of the benefits of static site generation.

It is possible to set up a Node.js server to achieve server-side authentication (example with Auth0 here: https://github.com/karlhorky/auth0-node-heroku). This example can be extended to serve the Gatsby static assets via express.static or similar - if the user is authenticated, they get the static content back; if not, they receive a 403 Forbidden.
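The all-or-nothing setup described above could be sketched roughly as follows. This is a minimal sketch, not the code from the linked repository: `requireAuthentication` and the `isAuthenticated` callback are hypothetical names, and the middleware only relies on the Express-style `(req, res, next)` signature.

```javascript
// Sketch of an all-or-nothing gate in front of Gatsby's built assets.
// `isAuthenticated` is a hypothetical callback; a real app would derive
// its answer from the session or from the Auth0 integration.
function requireAuthentication(isAuthenticated) {
  // Returns an Express-style middleware: (req, res, next)
  return function authGate(req, res, next) {
    if (isAuthenticated(req)) {
      // Authenticated: continue to the next middleware,
      // for example express.static() pointing at Gatsby's public folder
      return next();
    }
    // Not authenticated: refuse to serve any static content
    res.statusCode = 403;
    res.end('Forbidden');
  };
}
```

In an Express app this would probably be mounted ahead of the static handler, along the lines of `app.use(requireAuthentication(req => Boolean(req.user)), express.static('public'))`, where `req.user` is again an assumption about how the session is exposed.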

This almost achieves what we want! But it is an all-or-nothing solution - there is no way to restrict access to specific Gatsby assets (for example, based on pages), without multiple crazy, error-prone regexes like this:

const allowedUrlsUserBob = /^(\/|\/(webpack-runtime|app|styles|commons|component---src-pages(-|-courses-1(-|-modules-001-|-modules-002-))index-mdx)-[a-z0-9]+\.js|\/page-data(\/|\/index\/|\/courses\/1\/|\/courses\/1\/modules\/(001|002)\/)(page|app)-data.json|\/courses\/1\/|\/courses\/1\/modules\/(001|002)\/|\/(static|icons)\/.+\.png(\?v=[a-z0-9]+)?)(\?[^/]+)?$/;

Spoiler: I'm using this crazy regular expression option right now 😅

Proposals

I propose offering and documenting one or more tools to support a server-side authentication flow for completely static sites (without using client-only routes), similar to @pieh's comment on https://github.com/gatsbyjs/gatsby/issues/1100#issuecomment-477978790:

If gatsby would generate asset manifest on builds detailing what assets are used for given urls/pathnames - would that help?

1. An Asset Manifest would be a great start!

This would allow for simpler configuration of user-level and page-level access-control:

import assets from 'manifest.json'

// assets['/courses/1/modules/001'] === [
//   '/webpack-runtime-ef3a9f9842cf40a03163.js',
//   '/commons-cdc988b7a0635a52ddb8.js',
//   '/app-cafc7b4b1730f061489d.js',
//   '/styles-6c9411e5cef0c7a2398a.js',
//   '/component---src-pages-courses-1-modules-001-index-mdx-19fb7194fc6837729ecd.js',
//   '/page-data/courses/prep-l-webfs-gen-0/page-data.json',
//   '/page-data/app-data.json',
// ]

const accessControl = {
  // Key: User ID
  // Value: Array of unique asset paths for the pages they are allowed to view
  2: [...new Set([
    ...assets['/courses/1/modules/001'],
    ...assets['/courses/1/modules/002'],
  ])],
}

Of course, I'm not fixed on the API for the manifest. I'd be open to having helpers to extend this too!
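To make the idea concrete, a per-request check on top of such a manifest might look like the sketch below. The manifest shape is the one proposed above, which does not exist in Gatsby today; the shortened hashes, the `allowedAssetsForPages` and `canAccess` helpers, and the sample data are all made up for illustration.

```javascript
// Hypothetical manifest in the proposed shape: page path -> asset paths.
// File names and hashes are invented sample data.
const assets = {
  '/courses/1/modules/001': [
    '/app-cafc7b4b.js',
    '/page-data/courses/1/modules/001/page-data.json',
  ],
  '/courses/1/modules/002': [
    '/app-cafc7b4b.js',
    '/page-data/courses/1/modules/002/page-data.json',
  ],
  '/courses/2/modules/001': [
    '/app-cafc7b4b.js',
    '/page-data/courses/2/modules/001/page-data.json',
  ],
};

// Collect the unique asset paths needed by a list of pages
function allowedAssetsForPages(manifest, pages) {
  return new Set(pages.flatMap(page => manifest[page] || []));
}

// Key: user ID, value: Set of asset paths that user may request
const accessControl = new Map([
  [
    2,
    allowedAssetsForPages(assets, [
      '/courses/1/modules/001',
      '/courses/1/modules/002',
    ]),
  ],
]);

// True only if the user is known and the path is on their allowlist
function canAccess(userId, requestPath) {
  const allowed = accessControl.get(userId);
  return allowed !== undefined && allowed.has(requestPath);
}
```

A `Set` keeps the lookup cheap and removes the duplicates that shared bundles such as the app chunk would otherwise introduce.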

2. Configurable Pre-fetching

One side effect of these solutions is a lot of failed pre-fetching requests for users without full access:

[Screenshot, 2020-01-21: failed pre-fetching requests]

Maybe there could be a way to configure pre-fetching client-side? So that different resources could be pre-fetched per user?

Basic example

Examples in Proposals section above.

Motivation

Gatsby users will commonly want authentication flows in their apps, and they should also want performant applications, which can be achieved with static site generation.

Alternatives Considered

Existing boilerplates, articles and blog posts, such as those below:

  1. https://github.com/auth0-blog/gatsby-auth0

  2. From @rwieruch in https://github.com/gatsbyjs/gatsby/issues/1100#issuecomment-351585069:

I implemented a quick MVP this morning to check out a whole Firebase authentication flow in Gatsby. Turns out it works.

This doesn't really protect the static content (@sarneeh in https://github.com/gatsbyjs/gatsby/issues/1100#issuecomment-368295841):

@rwieruch Alright, but I think this is not how you should block website content. In your case you just have client-side authentication logic on your page - anyone with a link to your authorised content will still be able to get it (because it's just a static file somewhere on your host). To make it reliable you still need to authorise the user on the server which serves the content.

Ref ("Authentication support"): https://github.com/gatsbyjs/gatsby/issues/1100

cc @simoneb @pieh @samjulien

All 14 comments

Thanks for taking the time to track this request @karlhorky 👍

I have created a repo with my setup for the secure Express server-side authentication of Gatsby static files (no client-only routes or open static content!) here:

https://github.com/karlhorky/gatsby-serverside-auth0

This also includes my above-mentioned regular expressions for rudimentary access control on a per-user and per-page basis, which is what this issue hopes to get a better solution for!

Interesting issue, @karlhorky, and thank you for taking the time to write this up.

An Asset Manifest would be a great start

webpack.stats.json which is written to public should contain _most_ of the stuff you'd like from a manifest. It typically looks like:

```json
{
  "errors": [],
  "warnings": [],
  "namedChunkGroups": {
    "app": {},
    "component---src-pages-404-js": {},
    "component---src-pages-index-js": {},
    "component---src-pages-page-2-js": {}
  },
  "assetsByChunkName": {
    "app": [],
    "component---src-pages-404-js": [],
    "component---src-pages-index-js": [],
    "component---src-pages-page-2-js": []
  }
}
```

We chunk per page at the moment so you should be fine mapping pages to these and including /public/app-data.json in the app chunk and /public/<page>/page-data.json for every page.
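As a rough illustration of that mapping, the sketch below merges the shared `app` chunk group into every page chunk group. The `stats` object is a trimmed, made-up sample rather than real build output, `assetsByChunkGroup` is a hypothetical helper, and it assumes each group's `assets` entry is an array of file name strings, which may differ between webpack versions.

```javascript
// Trimmed, invented sample in the shape of public/webpack.stats.json.
// Assumption: each group's `assets` is an array of file name strings.
const stats = {
  namedChunkGroups: {
    app: { assets: ['webpack-runtime-aaa.js', 'app-bbb.js'] },
    'component---src-pages-index-js': {
      assets: ['component---src-pages-index-js-ccc.js'],
    },
    'component---src-pages-page-2-js': {
      assets: ['component---src-pages-page-2-js-ddd.js'],
    },
  },
};

// Map every page chunk group to the assets it needs,
// including the assets of the shared "app" chunk group
function assetsByChunkGroup(webpackStats) {
  const shared = webpackStats.namedChunkGroups.app.assets;
  const result = {};
  for (const [name, group] of Object.entries(webpackStats.namedChunkGroups)) {
    if (name === 'app') continue; // shared assets are merged into each page
    result[name] = [...new Set([...shared, ...group.assets])];
  }
  return result;
}
```

The page-data JSON files are not part of the webpack stats, so they would still have to be added per page separately.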

Configurable Pre-fetching

We've considered adding an opt out mechanism for prefetching and adding an imperative API for prefetching. We'd love contributions for this in case you're interested. Let's track that in https://github.com/gatsbyjs/gatsby/issues/20568

Ok great! So I just need to import this file on the start of the Express server and use the information within it, I guess.

I'll see if I can make something work in the repo: https://github.com/karlhorky/gatsby-serverside-auth0


Edit: Done:

Updated the proof of concept repo:

Here's the difference:

Old Solution

// Regular expression to match allowed assets related
// to src/pages/index.mdx in the Gatsby website.
//
// Trying to navigate to assets related to src/pages/page-2.mdx
// will return an "Access denied."
const allowedGatsbyWebsiteUrls = /^(\/|\/(webpack-runtime|app|styles|commons|component---src-pages-index-mdx)-[a-z0-9]+\.js|\/page-data\/(index\/)?(page|app)-data.json|\/(static|icons)\/.+\.png(\?v=[a-z0-9]+)?)(\?[^/]+)?$/;

New Solution

// Require the Gatsby asset manifest from the build
// to get paths to all assets that are required by
// each "named chunk group" (each named chunk group
// corresponds to a page).
//
// Ref: https://github.com/gatsbyjs/gatsby/issues/20745#issuecomment-577685950
const {
  namedChunkGroups,
} = require('../gatsby-website/public/webpack.stats.json');

function pageToWebpackFormat(page) {
  // Replace slashes and periods with hyphens
  return page.replace(/(\/|\.)/g, '-');
}

function pageToGatsbyPageDataPath(page) {
  // Strip the /index.mdx at the end of the page
  // If it's the index, just strip the .mdx
  return page.replace(/(\/index)?\.mdx$/, '');
}

function pageToWebPaths(page) {
  // Strip the index.mdx at the end of the page
  let pageWithoutIndex = page.replace(/((\/)?index)?\.mdx$/, '');
  // Add a slash, but only for non-root paths
  if (pageWithoutIndex !== '') pageWithoutIndex += '/';
  return [pageWithoutIndex, pageWithoutIndex + 'index.html'];
}

function getPathsForPages(pages) {
  return (
    pages
      .map(page => {
        return [
          // All asset paths from the webpack manifest
          ...namedChunkGroups[
            `component---src-pages-${pageToWebpackFormat(page)}`
          ].assets,
          // All of the Gatsby page-data.json files
          `page-data/${pageToGatsbyPageDataPath(page)}/page-data.json`,
          ...pageToWebPaths(page),
        ];
      })
      // Flatten out the extra level of array nesting
      .flat()
      .concat(
        // Everything general for the app
        ...namedChunkGroups.app.assets,
        'page-data/app-data.json',
      )

      .filter(
        assetPath =>
          // Root
          assetPath === '' ||
          // Only paths ending with js, json, html and slashes
          assetPath.match(/(\.(html|js|json)|\/)$/),
      )
      // Add a leading slash to make a root-relative path
      // (to match Express' req.url)
      .map(assetPath => '/' + assetPath)
  );
}

const allowedWebpackAssetPaths = getPathsForPages([
  'index.mdx',
]);

function isAllowedPath(path) {
  const pathWithoutQuery = path.replace(/^([^?]+).*$/, '$1');

  // Allow access to the manifest
  if (pathWithoutQuery === '/manifest.webmanifest') return true;

  // Allow access to images within static and icons
  if (pathWithoutQuery.endsWith('.png')) {
    if (
      pathWithoutQuery.startsWith('/static/') ||
      pathWithoutQuery.startsWith('/icons/')
    ) {
      return true;
    }
  }

  return allowedWebpackAssetPaths.includes(pathWithoutQuery);
}

So the webpack.stats.json file does not include the following (will update as I find more):

Files in public/static (eg. Images)

Candidates for extraction:

  1. List out paths to all files in public/static using a library
  2. The public/index.html file contains these paths
  3. The public/app-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

Files in various public/xxxxxxxxxxxxxxxxxxxxxxx directories (eg. Videos, SVG files)

Candidates for extraction:

  1. List out paths to all video, etc. files in each public/xxxxxxxxxxxxxxxxxxxxxxx directory using a library
  2. The public/index.html file contains these paths
  3. The public/component---src-pages-pagepath-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

Files in public/icons

Candidates for extraction:

  1. List out paths to all files in public/icons using a library
  2. The public/index.html file contains these paths

Files in public/page-data (eg. app-data.json and page-data.json Files)

Candidates for extraction:

  1. List out paths to all files in public/page-data using a library
  2. The public/app-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

public/manifest.webmanifest

Candidates for extraction:

  1. Hardcode it
  2. The public/index.html file contains this path
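The "list out paths" candidates above do not strictly need a library. A minimal sketch using only Node's built-in `fs` and `path` modules is shown below; `listFiles` and `listUntrackedAssets` are hypothetical helper names, and the hashed `public/xxxxxxxxxxxxxxxxxxxxxxx` directories are deliberately left out because their names are not known in advance.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list the files below `dir` as root-relative URL paths
// (relative to `rootDir`), e.g. "/static/abc/photo.png"
function listFiles(rootDir, dir = rootDir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap(entry => {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) return listFiles(rootDir, fullPath);
    const relativePath = path.relative(rootDir, fullPath);
    // Normalise Windows separators to forward slashes
    return ['/' + relativePath.split(path.sep).join('/')];
  });
}

// Collect files that webpack.stats.json does not cover:
// public/static, public/icons, public/page-data and the web manifest
function listUntrackedAssets(publicDir) {
  const paths = ['static', 'icons', 'page-data']
    .map(name => path.join(publicDir, name))
    .filter(dir => fs.existsSync(dir))
    .flatMap(dir => listFiles(publicDir, dir));
  if (fs.existsSync(path.join(publicDir, 'manifest.webmanifest'))) {
    paths.push('/manifest.webmanifest');
  }
  return paths.sort();
}
```

Note that this only enumerates files. It does not say which page needs which file, so it cannot replace a real per-page manifest.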

@sidharthachatterjee would the Gatsby team be open to creating a separate Asset Manifest for these files? Maybe in the same format as the webpack stats?

It would allow for my new solution above to be further simplified.

@karlhorky Yup, absolutely. I think this could be a pretty cool gatsby plugin which could use onCreateWebpackConfig (off the top of my head) to hook into webpack using a custom plugin to get all assets for a page (including more than just js).

Files in public/static (eg. Images)

Hmm, this is interesting. @pieh Do we keep a dependency graph of these per page entry point?

Files in public/icons

Could list these like you said in onPostBuild in a plugin

Files in public/page-data (eg. app-data.json and page-data.json Files)

These names _can_ be hard coded because they will always be called these (by design) but I'd consider them internal implementation details which we _might_ break in a minor version

public/manifest.webmanifest

This should be okay to hardcode

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It's been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

Update: I've added some features and fixed some things in the proof of concept:

  • Gatsby 404 page displayed when user requests non-existent resource
  • refactored folder structure (most Gatsby-specific code in isAllowedGatsbyPath.js)
  • fixed some weirdness with Auth0 redirects firing multiple times
  • turned on TypeScript checking on the JavaScript files and fixed type errors

This is really awesome work. I am trying to deal with something similar (CloudFront + Lambda@Edge + S3, blocking unauthenticated requests to /blog/private/* with the Lambda).

One concern I have about the approach you are taking: on my website, generated with gatsby-transformer-remark, the entire site's content is contained in my app-{hash}.js. So even if I filter the appropriate page-data.json files, I still can't secure my site without blocking all JS 😬

Have you run into/investigated this problem @karlhorky? (it could be specific to the plugin I'm using)

(I suspect it might be due to my plugin, the allPages GQL query is what is contained in the main js bundle, which obviously contains all the page data)

No, my app-{hash}.js file does not contain page content (try the demo repo: https://github.com/upleveled/gatsby-serverside-auth0).

It only contains a mapping to each of the pages (so if the page titles are secret, that could be an issue).

Saw that @sidharthachatterjee added some new paths on /static/d/<hash>.json ([email protected]):

https://github.com/gatsbyjs/gatsby/pull/25723

These cause all JavaScript on the page to break if these pre-fetch requests do not succeed, because of how the requests are handled (no catch of errors):

[Screenshot, 2020-07-25: errors from the failed pre-fetch requests]

So this caused the solution above to break (understandable, when using undocumented internals).

I've published a fix here:

https://github.com/upleveled/gatsby-serverside-auth0/commit/278020c6e2f3c50e606dbe66295c4ba1e5d1f1ef

This behavior of causing all JavaScript on the page to break if pre-fetching fails seems like it could be improved though.

Maybe it could be addressed as part of https://github.com/gatsbyjs/gatsby/issues/25330
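The kind of improvement meant here can be illustrated in isolation: a failed pre-fetch is recorded instead of rejecting the whole batch. This is a generic sketch and not Gatsby's loader code; `prefetchAll` and the `fetchResource` parameter are hypothetical names, and `Promise.allSettled` needs a reasonably recent Node.js or browser.

```javascript
// Pre-fetch a list of URLs without letting one failure break the rest.
// `fetchResource` is a hypothetical stand-in for whatever loads a resource
// and returns a Promise; it is not Gatsby's internal loader.
function prefetchAll(urls, fetchResource) {
  const attempts = urls.map(url =>
    // Wrapping in a promise chain also captures synchronous throws
    Promise.resolve().then(() => fetchResource(url)),
  );
  return Promise.allSettled(attempts).then(results =>
    results.map((result, index) => ({
      url: urls[index],
      ok: result.status === 'fulfilled',
    })),
  );
}
```

With this shape a 403 on one resource only marks that entry as `ok: false`, so the rest of the page's JavaScript can keep running.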
