Nim: Very high memory usage due to numerous inefficiencies in the STD lib (async)

Created on 8 Apr 2020  路  20Comments  路  Source: nim-lang/Nim

The program below would use 830 MB of memory (tested on both macOS and Linux). It consists of a simple async upload of a 100MB buffer to any server using TLS/SSL.

Example

import asyncdispatch
import httpclient
import random

proc inefficientCopiesAllOver() {.async.} =
  # Generate a garbage of 100MB in size
  echo "Generating random garbage ..."
  randomize()
  let rpool = int('A') .. int('z')
  var randomBuff = newString(100*1024*1024)
  for i in .. (randomBuff.len - 1):
    randomBuff[i] = char rand(rpool)
  echo "Garbage ready ... sending to Google"

  # Send it to Google
  let client = newAsyncHttpClient("", proxy=nil)
  client.headers = newHttpHeaders({
    "Accept": "application/json",
    "Content-Type": "multipart/related; boundary=zgitdjizofpqamg",
    "Authorization": "Bearer Zfjfjii.fizjfomdmgfjefjzeifmf.fjzef",
    "Content-Length": $randomBuff.len,
  })
  let url = "https://storage.googleapis.com/upload/storage/v1/b/myBucket/o?uploadType=multipart"
  let response = await client.request(url, "POST", randomBuff)
  echo await response.body

# Run async
waitFor inefficientCopiesAllOver()

Possible Solution

  • As suggested by @Araq in the chat, use of a shallow can help. And it does, but it has to be done manually. And it's not far reaching and the effect is not enough
  • The standard library probably contains many inefficient deep copy strings, a problem that needs to be addressed as that's completely against the value of the language itself
  • The compiler should be able to infer these types of inefficiencies, and for the httpclient module, it was not hard to spot
  • I suspect another excuse to try to dismiss this issue as late as possible is that "Nim has been used by many people for many years, your use case is not common"; please spare me that because excuses are not solutions
  • Serious consideration of the problem, it is a fact that this is a problem

Additional Information

  • At this point, my project is rendered impossible because of this memory issue. I decided to change direction and keep using Go for now, and waiting for this bug to be fixed.
  • Note that this is not uncommon as @dom96 will likely say, just as much as the HTTP streaming requests is not uncommon. And as a reminder, even the fact that a problem might be uncommon is not a reason to not fix it. On the contrary, a tool's value lies also in its ability to cope well with uncommon situations and exceptions.
  • As mentioned by @Araq, this issue is related to the fact that it's in an async context; but then again, async is not uncommon.

Any version of Nim is affected by this.

Performance

Most helpful comment

Oh, and I also talked to @mratsim and their team even decided to completely skip Nim's async and built their own library instead AFAIK; what a shame, and that is a good indication that it's not a problem for beginners to be able to solve.

To be fair we also have/had memory issues in Chronos due to closure iterators (the backbone of both async implementations) introducing cycles and timers holding resources.
However as we are gearing up for audit and for a very long running and high profile application that will very likely be DOS-ed we are pouring significant effort to optimize memory usage and fix them as recurrent crashes would be a red flag:

All 20 comments

COW strings would help. But the stdlib could also use streams intead of strings...

Serious consideration of the problem, it is a fact that this is a problem

Yes, indeed. It is a problem. No excuses from my side.

I'm not excusing this either, FutureStreams are the solution here but they remain sub-par. IIRC there was a PR somewhere which allowed FutureStreams to be used in the context of a HTTP POST.

COW strings would help

What is that? and how can I use it?

Guys, I really do want to help in this situation, but I have no clue where to go to. I've asked on the chat, where in the source code that is responsible of detecting and inferring the move. But, nobody seemed to care, and I still don't know the answer to that question to this day.

You know, I consider myself as a new-comer, even though I've already been involved in Nim and using it for production in OpenWRT routers in 2014 and 2015. And as new comer, I'm interested in a set of advertised feature of the language. But then the langage falls short of my expectations because of some issues, and recently, these are really something deep in the language.

Please give me direction on how I can help, URL, documentations, ...

I've asked on the chat, where in the source code that is responsible of detecting and inferring the move. But, nobody seemed to care, and I still don't know the answer to that question to this day.

Move inference is done inside compiler / injectdestructors.

Please give me direction on how I can help, URL, documentations, ...

Well you're already on IRC, please be patient and forgive me when I'm not online 24h a day.

It is not new that httpclient uses a lot of memory. I opened an issue 10825 myself when it was version 0.19.4. Until today it has not been fixed, as I took the test now and consumed 350MB of memory the example.

And I remember that I even searched all issues, open and closed, on httpclient. It seems that it is a known problem, but that it is not given due attention. I believe that problems with the stdlib httpclient should be a high priority, given that it should be used a lot.

I recently had a case of high memory usage with httpclient, but a synchronous client. I haven't had time to report yet.

I believe that problems with the stdlib httpclient should be a high priority, given that it should be used a lot.

@rockcavera You're right, and it looks as if it is a deeper rooted problem in how strings are handled in async and closure contexts. So, the root of this problem (and yours) is string.

If understand well, when you start an async or a closure, it will need to capture surrounding variables, and for that, Nim seems to make deep copies. This is where the problem starts to grow because when you use async, typically every chain of call that you make (or the stdlib is making) will also be async, and so you end-up with many deep copies of the same data filling-up the memory quickly.

While this is safe, I think it should be the responsibility of the programmer, not the compiler. With move, sink and shallow, the programmer can be explicit about it. Nim has taken good things from the likes of Golang, and in this particular problem, Nim could also treat strings like Go in any async context, and it is the programmer's responsibility to ensure he doesn't change the original buffer while it is being used in deeper async procs, or if he wishes to, he should be allowed to. This is the efficient route.

Another thing, it seems that shallow doesn't fully have the desired effect. So, I suppose string slicing is another problem. I don't know if the stdlib is making use of string.substr() when slicing, but I think it should make a view without any copy of a shallow string instead of making a copy. But I'll be surprised if that's not already the case.

Until this string handling problem is solved, Nim is not going to be a serious option for doing production async programs.

Personally I disagree that this is up to the programmer to solve. Even if all these strings are deep copied, they should eventually be released. The GC should do work to do that, if it's not then there is an issue with it.

They are released but by then the memory consumption grew considerably. A GC is not a magic wand, GC activity costs CPU cycles and invoking the GC after every big allocation is too expensive, not too mention that usually there are stale pointers on the stack meaning that the GC sees an imprecise picture and it cannot free the objects quickly enough. Maybe so much effort was put into --gc:arc development for a reason, you know.

The GC should do work to do that, if it's not then there is an issue with it.

@dom96 Leaving the compiler to do a bunch of deep copies and letting the GC collect them after is exactly what we're trying to solve here; it is better to not make the deep copies in the first place; it's not as if deep copies is the only way to go.

Let me give you some figure and use case why it needs solving ... let's say that you get yourself a VPS with 1GB RAM, in theory, it should be able to serve 9 persons asynchronously where each person will allocate 100MB; at this time, that same server can only server 1 person at any given time, because currently, it will allocate 830MB not even leaving enough for a second person.

Cost-wise, using Nim to serve 9 persons would require a server with 8GB RAM in the simple bug report scenario. This needs addressing because people will be very disappointed in Nim, it promises efficiency and then people want to adopt it, then it doesn't live up to the expectation. For a language to be used and to become widespread, it also needs to stop disappointing.

I'm trying to impress "Efficiency" to Nim here; I hope you still believe in that. You already know that Go has GC, and I was referring to how things are treated in Go ... let me try to explain with this simple pseudo code:

connectedSocket := GetConnectedAndWritableSocketSomehow()
connectedSocket.Write = func(buff_passedByRef []byte) {
  // this is a rough idea of what's going to happen in a Write()
  waitForSocketToBeWritable() // this could take a while depending on context
  time.Sleep(10 * time.Millisecond) // simulates delay in readiness
  writeToSocket(buff_passedByRef)
}

myBuff := []byte{"643210"}
go connectedSocket.Write(myBuff)
myBuff[0] = '5'
time.Sleep(10 * time.Millisecond)

// "543210" will be written to the socket instead of the original "643210"

As you can see here, the []byte is passed by ref to the Write() function, and as long as the write hasn't occurred, the changes that I make in myBuff will be written to the socket. I would say that is a sensible and good assumption.

This is not the case in Nim because it will make a copy of myBuff as I call Write(), and changes that I make in myBuff will be invisible to Write().

And what I was saying is: it should be up to the programmer to make this copy if he wants to, not the compiler. This has nothing to do with the GC, because in any case, myBuff will be GCed. The difference is that in Go, I had only 1 buffer with 2 references, where as in Nim I had 2 references pointing to 2 completely different things (and there will be much more deep copies as there are more deeper calls).

Interesting. How does Go's writeToSocket function work? Does it also take strings by reference? Is this just a convention across all of Go's stdlib?

How do other languages handle this? It seems to me that the problem here is that Nim's default behaviour with regards to strings and closures is inefficient. You're implying that we need these new shallow, sink, move operators to solve it, but I am skeptical. After all, Python and many other GC'd and async languages somehow manage to do this right. Why is it up to the programmer to solve when this seems like a language problem?

Well we can easily use ref array in Nim too, or a stream. You don't have to use strings everywhere which then don't support laziness anyway. Why can't the stdlib use streams?

Interesting. How does Go's writeToSocket function work? Does it also take strings by reference? Is this just a convention across all of Go's stdlib?

I believe so ... []byte slices are passed by reference every-time. If you need to make a copy, you have to call copy(...). Moreover, slicing the original slice will also give you a reference into the same slice, and any modification you make in the sub-slice will be reflected in the original.

Well we can easily use ref array in Nim too, or a stream. You don't have to use strings everywhere which then don't support laziness anyway

Yes, but I guess that all the stdlib are using string, passing it around, which doesn't seem to be a problem in synchronous programs. I guess the amount of works to do that will be huge, but probably necessary to make Nim fully efficient.

Why is it up to the programmer to solve when this seems like a language problem?

I agree that it is a problem in the language. The way I see this is that it needs to be solved 2 ways:

  • Programmer: give the programmer some control over it, like Go does; this does not suppress the role of the GC
  • Compiler: and this will have the biggest impact, it will even considerably boost performance of Nim programs by avoiding thousands of little copies ("performance hit" by a thousand cut).

I already tried to see how this could be fixed, and I know it can be fixed at the compiler level, at least, partially. Unfortunately, I didn't have enough context information. This is what I tried:

  • In compiler/lambdalifting.nim::rawClosureCreation() around line 619, I tried to check to check (local.kind == skParam or local.kind == skLet) and fieldAccess.typ.kind == tyString and do a nkFastAsgn

    • It did seem to lower memory usage, but what I wanted to do isn't really happening because I could not know from the variables around it and the context. I could not assert if the variable was immutable or not, I could never see skLet matched, always skParam.

    • The problem with it also is the crash, and I had to --gc:markAndSweep

  • I tried to debug also in the system area, but I couldn't progress further because I could not use echo there, and debugging natively doesn't help either because it's not possible to see values of variables or their types. And printf didn't help much.

Any progress on this?

Not that I'm aware of. Someone that's passionate about this needs to push for better async streams and make them first class citizens across the stdlib. Maybe you'd be interested in doing this?

I did try, and I faced blocking issues:

  • the code base is already too big for me to grasp in a reasonable amount of time; I wish I had time to get deep into it as despite the fact there is this bug, I truely love Nim and I want it to grow and be widely adopted as much as you want, unfortunately, I've got mouths to feed
  • @Araq gave a few directions, and I did find some potential places that proved to be effective, but it's not enough and it feels very hacky and uncertain
  • I think the code will be easier to read with more comments, at this point, unless one has been working for a rather long time in the Nim code-base (probably funded for doing so), many things will be overwhelming

But that said, I am interested.

Yeah, I can sympathise with that. I'm happy to help explain the code as much as I can, but I also have mouths to feed :)

I believe you can get away with by simply understanding how Futures work (not just in Nim, although their specific implementation in Nim will help you). Once you have a grasp of them you should be able to design a better FutureStream.

What I thought of doing is disregard async/await completely, and just use an existing pure C lib for async stuffs, and do things manually. But that's kind of dumb. So, I guess until this is solved, I'll keep using Golang.

Now that you're working fulltime for Facebook (I kinda heard that in your Nim talk few days ago), I'm guessing the chance of this getting solved is even slimmer as @Araq will be extremely overloaded. :sigh:

Now that you're working fulltime for Facebook (I kinda heard that in your Nim talk few days ago)

For what it's worth, this isn't a recent thing, I've been at FB for the past (almost) two years.

If you don't have time to fix these kinds of issues then yes, use Golang.

Well, I don't think beginners like me would have any chance of fixing this. Given that the problem requires much deeper knowledge of the language, only core team members would have a chance of being able to move this forward.

And I personally think it should be a priority issue, given the fact that it just makes async basically broken. As before, the fact that async is currently being used in jester and some other libs isn't an indication that it's fine.

Few months back, even some core team members where not feeling confident to tackle this problem, so count me out, I'm far from being a capable core contributor, as I said before, just a beginner. Oh, and I also talked to @mratsim and their team even decided to completely skip Nim's async and built their own library instead AFAIK; what a shame, and that is a good indication that it's not a problem for beginners to be able to solve.

Oh, and I also talked to @mratsim and their team even decided to completely skip Nim's async and built their own library instead AFAIK; what a shame, and that is a good indication that it's not a problem for beginners to be able to solve.

To be fair we also have/had memory issues in Chronos due to closure iterators (the backbone of both async implementations) introducing cycles and timers holding resources.
However as we are gearing up for audit and for a very long running and high profile application that will very likely be DOS-ed we are pouring significant effort to optimize memory usage and fix them as recurrent crashes would be a red flag:

Was this page helpful?
0 / 5 - 0 ratings

Related issues

juancarlospaco picture juancarlospaco  路  3Comments

SolitudeSF picture SolitudeSF  路  3Comments

teroz picture teroz  路  3Comments

zzz125 picture zzz125  路  4Comments

Vindaar picture Vindaar  路  3Comments