Couchdb: Slow operation in couch_proc_manager

Created on 28 Nov 2018 · 6 Comments · Source: apache/couchdb

We've noticed a couple of couch_proc_manager backups when deploying the latest version. While inspecting the code I stumbled over the new config code. It's theoretically quite a bit slower than the old code since it's checking the OS environment, but it's also doing things quite inefficiently beyond that.

https://github.com/apache/couchdb/blob/master/src/couch/src/couch_proc_manager.erl#L375-L383

There are three big things to notice here. First, we're joining the variable names inside the loop, which is a bit wasteful. Given the size of the strings I'm not sure how bad that is, though; I'd wager it depends on how many environment variables there are. Second, the tokens call with a pattern match also seems like an expensive thing to be doing. However, the killer bit is probably that we're looping over os:getenv() every time we create a new process, which is likely what's caused the backups.

I'd propose two changes. The first, and simpler of the two, is to just build the expected environment variable name and look it up with os:getenv/1.

Something like:

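get_env_for_spec(Spec, Target) ->
    % Build the full variable name once and ask the OS for just that variable.
    SpecStr = Spec ++ Target,
    % os:getenv/2 returns the default (undefined) when the variable is not set.
    os:getenv(SpecStr, undefined).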

We're still going to be hitting os:getenv/1 every time we create a new process, though, so I'd also recommend caching successful finds in an ets table or similar. That would give us similar speed to before the patch while still keeping the OS env config approach.
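
A minimal sketch of that caching idea, not a patch against couch_proc_manager. The module name, table name, and the decision to leave misses uncached are all assumptions made for illustration; only ets:new/2, ets:lookup/2, ets:insert/2 and os:getenv/1 are real calls.

```erlang
%% Hypothetical module: caches successful os:getenv/1 lookups in ets.
-module(env_cache_sketch).
-export([init/0, get_env_for_spec/2]).

-define(TABLE, env_cache_sketch).

%% Create the cache table once, e.g. from the owning process's init/1.
%% The table dies with the process that created it.
init() ->
    ets:new(?TABLE, [named_table, set, public, {read_concurrency, true}]),
    ok.

%% Look up Spec ++ Target, touching the OS environment only on a cache miss.
get_env_for_spec(Spec, Target) ->
    Key = Spec ++ Target,
    case ets:lookup(?TABLE, Key) of
        [{Key, Cmd}] ->
            Cmd;
        [] ->
            case os:getenv(Key) of
                false ->
                    % Misses are not cached (an assumption of this sketch).
                    undefined;
                Cmd ->
                    ets:insert(?TABLE, {Key, Cmd}),
                    Cmd
            end
    end.
```

Because only hits are cached, a language with no matching variable still costs one os:getenv/1 call per request, but that is a single lookup rather than a scan of the whole environment.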

performance

All 6 comments

Also, a bit more background that I realize I didn't share. We had a couple of nodes (not even entire clusters) that experienced a backup on the couch_proc_manager process to the tune of 6M+ messages in the mailbox. What likely happened was that when the node was booted, a sudden massive surge of view indexing was triggered. The slightly slower couch_proc_manager was then unable to keep up with the new process requests, which leads to a positive feedback cycle: lots of processes trying to use get_proc pile into the message queue, and the larger that message queue gets, the slower couch_proc_manager gets, because it's garbage collecting a lot due to that new loop over the os:getenv() output with the string creations and so on.

It's fairly similar to how couch_server can get stuck with the same issues. The big mailbox causes garbage collection to dominate the CPU time of that process, preventing it from catching up and clearing out its mailbox.
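
For anyone wanting to check whether a node is in this state, a small sketch of reading a registered process's mailbox length. The module and function names are hypothetical; whereis/1 and erlang:process_info/2 are standard calls.

```erlang
%% Hypothetical helper: report the mailbox length of a registered process.
-module(mailbox_probe_sketch).
-export([queue_len/1]).

%% Returns the number of queued messages, or undefined if the name is
%% not registered or the process has already exited.
queue_len(Name) ->
    case whereis(Name) of
        undefined ->
            undefined;
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} -> Len;
                undefined -> undefined
            end
    end.
```

From a remote shell this would be called as mailbox_probe_sketch:queue_len(couch_proc_manager), assuming the process is registered under that name.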

I wrote that super quickly. Don't @ me.

@davisp since the nature of that specific code is to move from runtime config to startup config, it is entirely prudent to load all matching env vars into an ets table on startup and read it from there on new_proc.

@janl I considered something like that but for some reason discounted the idea, thinking we didn't know all of the possible languages at boot time. Though you're quite right that we can just look for the prefix on the environment variables and then take everything after it as the language name, after lower-casing the suffix. I like that approach quite a bit better.
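
A rough sketch of that startup-time approach, under stated assumptions: the module name, table name, and the prefix passed to load/1 are placeholders, not what CouchDB actually uses. It scans os:getenv/0 once, keeps the variables whose names start with the prefix, and keys them by the lower-cased suffix.

```erlang
%% Hypothetical module: preload prefixed env vars into ets at startup.
-module(env_preload_sketch).
-export([load/1, lookup/1]).

-define(TABLE, env_preload_sketch).

%% Scan os:getenv/0 once and store every variable whose name starts with
%% Prefix, keyed by the lower-cased remainder of the name.
load(Prefix) ->
    ets:new(?TABLE, [named_table, set, protected, {read_concurrency, true}]),
    lists:foreach(fun(Entry) ->
        case split_entry(Entry) of
            {Name, Value} ->
                case lists:prefix(Prefix, Name) of
                    true ->
                        Suffix = lists:nthtail(length(Prefix), Name),
                        ets:insert(?TABLE, {string:lowercase(Suffix), Value});
                    false ->
                        ok
                end;
            error ->
                ok
        end
    end, os:getenv()),
    ok.

%% Read-only lookup for use when a new process is requested.
lookup(Lang) ->
    case ets:lookup(?TABLE, Lang) of
        [{_, Cmd}] -> Cmd;
        [] -> undefined
    end.

%% Split "NAME=VALUE" at the first "=" only, so values containing "=" survive.
split_entry(Entry) ->
    case lists:splitwith(fun(C) -> C =/= $= end, Entry) of
        {Name, [$= | Value]} -> {Name, Value};
        _ -> error
    end.
```

Splitting at the first "=" is a deliberate choice in this sketch: tokenizing on every "=" would break a command whose value itself contains one.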

@davisp 👌

@iilyak self-assigned on IRC :)
