Describe the bug
If a template that uses pagination also uses data from a `.11tydata.js` file, the function that file exports is called once, then again for each page. So if there are three pages, the function is called four times. This is especially slow when the function obtains data by invoking a REST service. Why is that function called more than once?
To Reproduce
Steps to reproduce the behavior:
1. Create `words.md`, shown below.
2. Create `words.11tydata.js`, shown below.
3. Run `npx eleventy --serve`.

`words.md`:

```markdown
---
pagination:
  data: words
  size: 2
  alias: words
templateEngineOverride: njk,md
---
# Words

{% for word in words %}
{{ loop.index }}) {{ word }}
{% endfor %}
```

`words.11tydata.js`:

```js
module.exports = async () => {
  console.log('words.11tydata.js: entered');
  return {words: ['about', 'better', 'cuddle', 'dog']};
};
```
Expected behavior
I expected "words.11tydata.js: entered" to appear only once.
I found that I can avoid the duplicate fetches caused by pagination if I do something like this:

```js
const fetch = require('node-fetch');

module.exports = async () => {
  if (!global.employees) {
    console.log('employees.11tydata.js: getting employees');
    const url = 'https://dummy.restapiexample.com/api/v1/employees';
    const res = await fetch(url);
    const response = await res.json();
    // This REST service returns an object with the properties
    // success and data, where data is an array of employee objects.
    global.employees = response.data;
  }
  return {employees: global.employees};
};
```
But should I have to do this?
Interesting. I was looking into caching the server responses locally using lru-cache or something similar, but your use of global scope might be better.
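As a sketch of that caching idea, a plain `Map` keyed by URL can stand in for lru-cache. The helper name `cachedFetchJson` and the injectable `fetchImpl` parameter are my additions, not anything Eleventy provides:

```js
// A minimal response cache keyed by URL -- a plain Map standing in
// for lru-cache, just to illustrate caching server responses locally.
const responseCache = new Map();

async function cachedFetchJson(url, fetchImpl = fetch) {
  if (!responseCache.has(url)) {
    // Cache the promise itself, so concurrent calls during one build
    // share a single in-flight request.
    responseCache.set(url, fetchImpl(url).then((res) => res.json()));
  }
  return responseCache.get(url);
}
```

With something like this in place, calling `cachedFetchJson` once per paginated page would still result in a single request per URL.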
Would using a global data file in /_data/ folder work at all?
From my very brief test, it looks like global data might be only fetched once and cached whereas local data files are called multiple times:
```shell
npm run build

> [email protected] build /Volumes/Dev/github/pdehaan/11ty-pagination-async-data-test
> eleventy

Fetching global remote data (via src/_data/names.js)
Fetching local remote data (via src/via-local-data.11tydata.js)
Fetching local remote data (via src/via-local-data.11tydata.js)
Fetching local remote data (via src/via-local-data.11tydata.js)
Fetching local remote data (via src/via-local-data.11tydata.js)
Writing www/global/one/index.html from ./src/via-global-data.njk.
Writing www/global/two/index.html from ./src/via-global-data.njk.
Writing www/global/three/index.html from ./src/via-global-data.njk.
Writing www/global/four/index.html from ./src/via-global-data.njk.
Writing www/global/five/index.html from ./src/via-global-data.njk.
Writing www/local/seven/index.html from ./src/via-local-data.njk.
Writing www/local/eight/index.html from ./src/via-local-data.njk.
Writing www/local/nine/index.html from ./src/via-local-data.njk.
Wrote 8 files in 0.15 seconds (v0.10.0)
```
A global data file wouldn’t work in my case because I want to get the latest data from a server every time I regenerate the site.
A global data file should be able to export an async function that uses node-fetch, or axios, or got, or request, or whatever lib to fetch data dynamically during build time.
I can’t find an exact example in my repos, but I can try doing a proof of concept later tonight.
But I think it should be the same logic as your local data file, except in your /_data/ folder.
> A global data file wouldn’t work in my case because I want to get the latest data from a server every time I regenerate the site.
OK, I created a quick example here which scrapes the Reddit /r/hot subreddit feed using node-fetch:
https://github.com/pdehaan/11ty-data-async-test/blob/master/src/_data/reddit_hot.js
Thanks @pdehaan! I see that putting that code in the _data directory results in it only being invoked once. But I wonder if it is correct or good for the code in the same directory as the template to be invoked once for every page when pagination is used. Can you think of a benefit of it being invoked once for each page?
Hmm, I think this is a fair question.
Does it happen with a Directory Data file too?
I believe it does. If I recall correctly, only global data is retrieved once. All other sources retrieve the data again for each page when pagination is used.
Personally, I am doing network requests in a separate script, which is invoked as an npm run-script every time I build.
This way I can work locally with downloaded responses (and not run the risk of hitting an API limit or similar).
You could switch the network requests to file I/O (or a SQLite read), but those would still happen multiple times.
In my opinion, Eleventy should wrap any data function calls in a memoization function. This would allow Eleventy to call the functions multiple times, as with pagination, but only pay the execution cost once. Memoization in Underscore is well explained and is very similar to how @mvolkmann did it, just not with the global object but with a cache property on the function itself.
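A sketch of that idea, with the cache stored as a property on the wrapper function itself (the helper name `memoizeOnce` is hypothetical; Eleventy does not ship this):

```js
// Memoize an async, zero-argument data function: the first call runs
// the function and every later call reuses the same promise, which is
// stored as a property on the wrapper rather than on `global`.
function memoizeOnce(fn) {
  return function memoized() {
    if (!memoized.cache) {
      memoized.cache = fn();
    }
    return memoized.cache;
  };
}

// Example: a data function that counts how often its body runs.
let calls = 0;
const getWords = memoizeOnce(async () => {
  calls += 1;
  return { words: ['about', 'better', 'cuddle', 'dog'] };
});
```

Calling `getWords()` once per paginated page would then execute the body only once; since the cached value is a promise, even concurrent calls share one execution.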
I don't think memoizing everything is the thing to do. It adds more computation and memory.
But I agree it can help in some parts, like I did twice:
- slugify filter: https://github.com/nhoizey/nicolas-hoizey.com/blob/master/src/_utils/slugify.js
- include_raw Nunjucks shortcode (to inline JS and CSS without Nunjucks parsing): https://github.com/nhoizey/nicolas-hoizey.com/blob/master/src/_11ty/shortcodes/include_raw.js

include_raw is used in a Nunjucks partial used on every page. It reduced the number of calls to fs.readFileSync from more than 2,000 to just 2.
I agree that it will add to RAM usage, but it shouldn't increase computation, because you are only adding a simple object/array lookup that short-circuits any actual calculation. It's really a trade-off: memoize and increase RAM usage, or re-calculate and increase CPU usage. Memoizing also helps with edge cases where impure functions are called, which can change in the middle of a run or, worse, cause side effects multiple times.
I had memory issues lately, that's why I'm concerned about generalized memoization… 😅
Hm, I wonder whether Eleventy could check the available RAM and then apply different strategies, e.g. using process.memoryUsage.