Gatsby: 'source and transform nodes' step seemingly can't handle large JSON files

Created on 12 Sep 2017 · 8 Comments · Source: gatsbyjs/gatsby

Issue

Is there a recommended maximum size for files in the _src/data_ directory, so that node sourcing and transformation stay fast when running either gatsby develop or gatsby build?

We have a single JSON file of 35 to 45 megabytes that we're hoping to query through the GraphQL data layer. At the time of reporting, node /usr/bin/gatsby develop has been stuck on the 'source and transform nodes' step for at least 20 minutes.

Details

Gatsby version: 1.1.6
node.js version: v8.4.0
OS version: Solus 3 64-bit
package.json:

{
  "name": "gatsby-starter-default",
  "description": "Gatsby default starter",
  "version": "1.0.0",
  "author": "Kyle Mathews <[email protected]>",
  "dependencies": {
    "babel-plugin-import": "^1.4.0",
    "gatsby": "^1.9.17",
    "gatsby-link": "^1.6.15",
    "gatsby-plugin-antd": "^1.0.6",
    "gatsby-plugin-react-helmet": "^1.0.5",
    "gatsby-source-filesystem": "^1.4.12",
    "gatsby-transformer-json": "^1.0.6",
    "react-style-proptype": "^3.0.0"
  },
  "keywords": [
    "gatsby"
  ],
  "license": "MIT",
  "main": "n/a",
  "scripts": {
    "build": "gatsby build",
    "develop": "gatsby develop",
    "format": "prettier --trailing-comma es5 --no-semi --single-quote --write 'src/**/*.js'",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "devDependencies": {
    "css-loader": "^0.28.7",
    "prettier": "^1.6.1"
  }
}

gatsby-config.js:

module.exports = {
    siteMetadata: {
        title: `Gatsby Default Starter`
    },
    plugins: [
        `gatsby-plugin-react-helmet`,
        "gatsby-plugin-antd",
        `gatsby-transformer-json`,
        {
            resolve: `gatsby-source-filesystem`,
            options: {
                name: `data`,
                path: `${__dirname}/src/data/`
            }
        }
    ]
};

Labels: stale?, question or discussion

All 8 comments

Yeah, that's definitely causing trouble. If you're interested, I'd love it if you could identify where things are breaking. In theory Gatsby should be fine with this.

Sure thing, I'll definitely try to look into exactly where it could be breaking; after 40 min of waiting, I gave up and killed the process. I'll try to split up the JSON file into many smaller ones to see if that alleviates the issue.
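For anyone who wants to repeat the splitting experiment, here is a minimal sketch of one way to do it, assuming the large file holds a single top-level JSON array. The function name, the part-N.json naming, and the chunk size are made up for illustration and are not from the thread. It uses only Node's built-in fs and path modules.

```javascript
const fs = require('fs');
const path = require('path');

// Split one JSON file holding a top-level array into files of at most
// `chunkSize` records each. Returns the list of files written.
// Note: this still loads the whole input with JSON.parse, so it is a
// one-off preprocessing step, not a streaming solution.
function splitJsonArray(inputFile, outputDir, chunkSize) {
  const records = JSON.parse(fs.readFileSync(inputFile, 'utf8'));
  if (!Array.isArray(records)) {
    throw new TypeError('expected a top-level JSON array');
  }
  if (!fs.existsSync(outputDir)) {
    fs.mkdirSync(outputDir);
  }
  const written = [];
  for (let start = 0; start < records.length; start += chunkSize) {
    // Hypothetical naming scheme: part-0.json, part-1.json, ...
    const file = path.join(outputDir, `part-${written.length}.json`);
    fs.writeFileSync(file, JSON.stringify(records.slice(start, start + chunkSize)));
    written.push(file);
  }
  return written;
}

// Example call (paths are placeholders):
// splitJsonArray('src/data/big.json', 'src/data/chunks', 10000);
```

Splitting changes the number of files but not the total number of records Gatsby has to turn into nodes, so it only helps if the per-file size is the bottleneck.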

Unfortunately, splitting up the file into many smaller ones does not seem to make a difference at all (posting this in case anyone else was wondering).

After leaving it to run for a while:

success delete html files from previous builds — 0.014 s
success open and validate gatsby-config.js — 0.005 s
success copy gatsby files — 0.022 s
success source and transform nodes — 3359.028 s

<--- Last few GCs --->

[1710:0x25cec20]  3497807 ms: Mark-sweep 1415.4 (1819.9) -> 1415.4 (1819.9) MB, 697.9 / 0.1 ms  allocation failure scavenge might not succeed
[1710:0x25cec20]  3498502 ms: Mark-sweep 1415.4 (1819.9) -> 1415.4 (1819.9) MB, 694.6 / 0.1 ms  allocation failure scavenge might not succeed
[1710:0x25cec20]  3499558 ms: Mark-sweep 1415.4 (1819.9) -> 1415.4 (1786.9) MB, 1056.0 / 0.1 ms  last resort
[1710:0x25cec20]  3500338 ms: Mark-sweep 1415.4 (1786.9) -> 1415.4 (1785.9) MB, 779.1 / 0.1 ms  last resort

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x33d2b831cef1 <JSObject>
    1: keys(aka keys)(this=0x33d2b8302241 <undefined>,0x264b4a87e419 <Object map = 0x2dd7e2497441>)
    3: /* anonymous */ [/REDACTED/node_modules/lodash/lodash.js:1219] [bytecode=0x215742c86479 offset=15](this=0x3c4fe5890d9 <JSGlobal Object>,arg=0x264b4a87e419 <Object map = 0x2dd7e2497441>)
    5: baseKeys [/REDACTED/no...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x1161121 [node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node]
 6: 0xcd7d0b [node]
 7: v8::internal::FastKeyAccumulator::GetKeysFast(v8::internal::GetKeysConversion) [node]
 8: v8::internal::FastKeyAccumulator::GetKeys(v8::internal::GetKeysConversion) [node]
 9: v8::internal::KeyAccumulator::GetKeys(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::KeyCollectionMode, v8::internal::PropertyFilter, v8::internal::GetKeysConversion, bool) [node]
10: v8::internal::Runtime_ObjectKeys(int, v8::internal::Object**, v8::internal::Isolate*) [node]
11: 0xe2012f840dd
Aborted

@evgeny-kuznetsov
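The GC lines in this log show the heap staying at about 1415 MB after each mark-sweep, which is in the range of the default old-space limit that 64-bit Node releases of that era are commonly reported to use (roughly 1.4 GB). A small sketch for checking the limit on a given machine, using Node's built-in v8 module; the helper name is made up for illustration:

```javascript
const v8 = require('v8');

// Return V8's configured heap size limit in megabytes.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);

// The limit can be raised with V8's --max-old-space-size flag (value in MB),
// for example:
//   node --max-old-space-size=4096 /usr/bin/gatsby develop
// Whether that is enough for this data set is untested here.
```

Raising the limit only postpones the failure if memory use grows with the number of records, and it does nothing about the 3359 seconds the 'source and transform nodes' step had already taken.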

I'm running into the same issue with a 14 MB JSON file.

When I comment out gatsby-transformer-json, it doesn't hang and the file's metadata shows up in GraphQL (which makes sense, as gatsby-source-filesystem does not yet try to read or transform the data itself). I'll try to find the issue here; if all else fails, I'm thinking about pushing the JSON data into a DB before consuming it with Gatsby.

I assume it's the JSON.parse statement, and that using something like stream-json might be necessary to process large JSON files.
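One way to test the JSON.parse assumption in isolation is to time a bare JSON.parse call on a synthetic payload of similar size, outside Gatsby. The sketch below does that with only Node built-ins. The record shape and function names are invented for illustration, and it does not use stream-json.

```javascript
// Build a JSON string of roughly `targetBytes` characters: an array of small
// made-up records. The record shape is hypothetical.
function makeJsonPayload(targetBytes) {
  const parts = [];
  let size = 2; // the surrounding brackets
  for (let id = 0; size < targetBytes; id++) {
    const record = JSON.stringify({ id, name: `item-${id}`, tags: ['a', 'b'] });
    parts.push(record);
    size += record.length + 1; // +1 for the separating comma
  }
  return `[${parts.join(',')}]`;
}

// Time a single JSON.parse call; report record count and elapsed milliseconds.
function timeJsonParse(json) {
  const start = process.hrtime();
  const records = JSON.parse(json);
  const [seconds, nanos] = process.hrtime(start);
  return { count: records.length, ms: seconds * 1e3 + nanos / 1e6 };
}

// Payload size in MB; defaults to 40 to match the file sizes in this thread.
const sizeMb = Number(process.argv[2]) || 40;
const result = timeJsonParse(makeJsonPayload(sizeMb * 1024 * 1024));
console.log(`${result.count} records parsed in ${result.ms.toFixed(1)} ms`);
```

If the bare parse finishes quickly, the time is more likely going into per-record node creation than into JSON.parse itself. The stack trace earlier in the thread points at lodash's keys and baseKeys rather than at the parser, which would be consistent with that, though it is not proof.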

v2 is much better in this regard, I guess. I have a large 40 MiB JSON file (no indentation or newlines) and it takes about 60 seconds to parse. My simple site builds in less than 2 minutes in all.

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub issues, we have to clean some of the old issues as many of them have already been resolved with the latest updates.

Please make sure to update to the latest Gatsby version and check if that solves the issue. Let us know if that works for you by adding a comment 👍

Probably stale. I'll test V2 and reopen if it's an issue.
