Node: Why use new URL() may lead to memory leak ?

Created on 4 Dec 2017  路  10Comments  路  Source: nodejs/node

  • Version: 8.4.0
  • Platform: Linux version 3.10.0-514.10.2.el7.x86_64
  • Subsystem: CentOS Linux release 7.3.1611 (Core)

My project is using express.js, and having a code fragment like this:

conse {URL} = require('url');
const express = require('express');

const router = express.Router();

router.put('', (req, res) => {
  let data = req.body.flow ? JSON.parse(req.body.flow) : {};
  if (data.url && !data.domain) {
    data.domain = new URL(data.url).hostname;
  }

  res.send();
})

When many request come in, the project memory will rise slowly ,and sometimes it will drop a little. but in general it's still up. When i change to url.parse(), the memory looks normal. I use PM2 to look memory.

What's the difference between the two(new URL and url.parse) in memory or gc?

memory whatwg-url

Most helpful comment

This is happening because union url_host_value has std::string members but an empty destructor.

I鈥檓 working on this but it鈥檚 probably going to be a bit of a refactor to fix this in a safe way

All 10 comments

I would figure that there's a higher memory requirement given internals of the WHATWG-URL interface but it sounds like you're saying there's an actual memory leak? Could you provide some more precise numbers/stats? Are we talking a steady increase (say, 10% higher, but not constantly rising) or a never-ending increase over the run of the process?

/cc @jasnell & @TimothyGu

The WHATWG URL implementation uses C++, so there's a possibility that the C++ code has some memleaks in them. In fact Valgrind seems to agree: there's an awful lot of things like the following coming from memcheck:

==28049== 649,419 bytes in 20,949 blocks are definitely lost in loss record 701 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 984,467 bytes in 31,757 blocks are definitely lost in loss record 702 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 3,721,922 bytes in 120,062 blocks are definitely lost in loss record 703 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)

all pointing towards ParseHost() as the culprit.

Test script:

'use strict';

const { URL, parse } = require('url');
// Adjust to taste.
for (let i = 0; i < 100001; i++) {
  new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');
  if (i % 1000 === 0) {
    gc();
  }
  if (i % 100000 === 0) {
    console.log(process.memoryUsage());
  }
}

Tracked it down to

==29610== 1,546,342 bytes in 49,882 blocks are definitely lost in loss record 85 of 85
==29610==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==29610==    by 0x5665FD4: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==29610==    by 0x10EA1B0: node::url::PercentDecode(char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10EB0AF: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E9839: node::url::ParseHost(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E7D0D: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F55D2: node::url::Parse(node::Environment*, v8::Local<v8::Value>, char const*, unsigned long, node::url::url_parse_state, v8::Local<v8::Value>, v8::Local<v8::Value>, v8::Local<v8::Function>, v8::Local<v8::Value>) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F29C0: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x9F4A8E: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (in /home/timothy-gu/dev/node/node/out/Release/node)

But I don't understand why this is the case, as everything should be allocated on the stack (and therefore freed automatically)...

@TimothyGu How did you run Node + valgrind to get that output?

@apapirovski
The phenomenon I observed is that memory continues to rise, and memory is released(a little), but the overall trend is rising. My online project (one process qps is 50 - 100) runs 7 days, the memory is up to 1GB. When using url.parse , the memory is stable in 100MB.

@addaleax

valgrind --leak-check=full --undef-value-errors=no ./node --expose-gc a.js where a.js contains the script in https://github.com/nodejs/node/issues/17448#issuecomment-349108826. Valgrind is the default installation in Debian stable (1:3.12.0~svn20160714-1). node is built normally with Clang 3.9 (release config) except node_url.o, which is manually built with -O0.

@TimothyGu That might be a false positive from memory pooling because it is not run with a debug build of STL, see http://valgrind.org/docs/manual/faq.html#faq.reports

@joyeecheung That FAQ talks about "still reachable" leaks. The leak here is "definitely lost".

We have been struggelig for a while with a very small memory leak without being able to find it. We have almost simmilar code as posted in the initial message here and I'm able to get that part of our code to leak when stresstesting it.

In my testing I noticed that the leak are only happening if the host have more then 3 parts.

In other words, this does leak:

new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');

This does not leak:

new URL('https://domain.test.com/789012/3456/789012345678-2?section=10');

This is happening because union url_host_value has std::string members but an empty destructor.

I鈥檓 working on this but it鈥檚 probably going to be a bit of a refactor to fix this in a safe way

Was this page helpful?
0 / 5 - 0 ratings

Related issues

addaleax picture addaleax  路  3Comments

loretoparisi picture loretoparisi  路  3Comments

vsemozhetbyt picture vsemozhetbyt  路  3Comments

danialkhansari picture danialkhansari  路  3Comments

mcollina picture mcollina  路  3Comments