When POSTing new documents to couchdb using _bulk_docs a strange error is returned.
The documents should be added to the database.
I am receiving the response:
{
"error": "unknown_error",
"reason": "badarg",
"ref": 818340776
}
For my POST request:
{
"new_edits": false,
"docs": [<valid documents>]
}
I have validated the json too.
1.
2.
3.
4.
Thanks for the report.
Would it be possible to have an example document that generates this error.
Also if you search the logs for 818340776 it might show a stacktrace. Sanitize that log snippet and post it as it might also help solve the issue.
@ngosafety we can't proceed on this ticket unless you provide an example.
Also ran into this issue:
{ error: 'unknown_error',
reason: 'badarg',
ref: 4258677272,
name: 'unknown_error',
status: 500,
message: 'badarg' }
Looking at the DB logs around that time:
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.848863Z nonode@nohost <0.2276.41> 59b7147c82 rexi_server: from: nonode@nohost(<0.2106.41>) mfa: fabric_rpc:update_docs/3 exit:{timeout,{gen_server,call,[couch_proc_manager,{get_proc,{doc,<<"_design/myUpdateValidation">>,{1,[<<100,39,93,199,51,195,88,67,22,50,78,244,160,45,176,213>>]},{[{<<"validate_doc_update">>,<<"function (newDoc, oldDoc, userCtx) { /* redacted */ }">>}]},[],false,[]},{<<"_design/myUpdateValidation">>,<<"1-64275dc733c3584316324ef4a02db0d5">>}},5000]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_query_servers,get_ddoc_process,2,[{file,"src/couch_query_servers.erl"},{line,498}]},{couch_query_servers,with_ddoc_proc,2,[{file,"src/couch_query_servers.erl"},{line,463}]},{couch_query_servers,validate_doc_update,5,[{file,"src/couch_query_servers.erl"},{line,318}]},{couch_db,'-validate_doc_update_int/3-lc$^0/1-0-',5,[{file,"src/couch_db.erl"},{line,784}]},{couch_db,'-validate_doc_update_int/3-fun-0-',3,[{file,"src/couch_db.erl"},{line,784}]},{couch_stats,update_histogram,2,[{file,"src/couch_stats.erl"},{line,102}]},{couch_db,'-prep_and_validate_updates/6-fun-1-',3,[{file,"src/couch_db.erl"},{line,878}]}]
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.862568Z nonode@nohost <0.2106.41> 59b7147c82 req_err(4258677272) unknown_error : badarg
Apr 11 13:58:11 [<<"dict:fetch/2 L130">>,<<"couch_util:-reorder_results/2-lc$^1/1-1-/2 L456">>,<<"fabric_doc_update:go/3 L41">>,<<"fabric:update_docs/3 L259">>,<<"chttpd_db:db_req/2 L428">>,<<"chttpd:process_request/1 L295">>,<<"chttpd:handle_request_int/1 L231">>,<<"mochiweb_http:headers/6 L91">>]
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.863338Z nonode@nohost <0.2305.41> -------- CRASH REPORT Process (<0.2305.41>) with 1 neighbors exited with reason: bad return value {os_process_error,"OS process timed out."} at gen_server:init_it/6(line:334) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.2286.41>], messages: [], links: [<0.2286.41>], dictionary: [], trap_exit: false, status: running, heap_size: 987, stack_size: 27, reductions: 264
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.863546Z nonode@nohost <0.2307.41> -------- CRASH REPORT Process (<0.2307.41>) with 1 neighbors exited with reason: bad return value {os_process_error,"OS process timed out."} at gen_server:init_it/6(line:334) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.2267.41>], messages: [], links: [<0.2267.41>], dictionary: [], trap_exit: false, status: running, heap_size: 987, stack_size: 27, reductions: 264
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.863766Z nonode@nohost <0.2280.41> -------- CRASH REPORT Process (<0.2280.41>) with 1 neighbors exited with reason: bad return value {os_process_error,"OS process timed out."} at gen_server:init_it/6(line:334) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.2266.41>], messages: [], links: [<0.2266.41>], dictionary: [], trap_exit: false, status: running, heap_size: 987, stack_size: 27, reductions: 264
Apr 11 13:58:11 [error] 2018-04-11T10:58:10.874587Z nonode@nohost <0.2306.41> -------- CRASH REPORT Process (<0.2306.41>) with 1 neighbors exited with reason: bad return value {os_process_error,"OS process timed out."} at gen_server:init_it/6(line:334) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.2268.41>], messages: [], links: [<0.2268.41>], dictionary: [], trap_exit: false, status: running, heap_size: 987, stack_size: 27, reductions: 264
Apr 11 13:58:11 sh: sh: echo: I/O error
Apr 11 13:58:11 echo: I/O error
Apr 11 13:58:11 sh: sh: echo: I/O error
Apr 11 13:58:11 sh: echo: I/O error
Apr 11 13:58:11 echo: I/O error
The strange thing is that this seems sporadic: trying to reproduce it with a smaller bulk POST always succeeded, and now even the larger bulk POST (with prod data) seems to work.
Will update this issue if I learn more.
Running the official Docker image of 2.1.1 on AWS EC2, if it helps.
Curiously, https://github.com/apache/couchdb/issues/1106#issuecomment-378389508 reports having changed _config/couch_httpd_auth/timeout to a higher value, and then getting this error. I have also changed the default to 31556952 (i.e. one year).
If/when this issue reappears, I'll see if changing it matters.
@jareware Thank you for the logs snippet, that really helped to reproduce this.
So what's happening is that when the bulk size exceeds 100 docs we are switching to a different data structure when reordering the results, to speed things up. If an update validator function takes too long, os process will throw timeout, but the whole batch will be processed nevertheless. It seems that in 2.1.1 that timeout from the update validator was swallowed up the pipe and an attempt to fetch the missing key on reorder was throwing an exception.
Evidently it was fixed since then. I can't pinpoint where that happened, but when I run the same case on current master it handles the error fine:
...
{
"id": "1719307e8c6a14bc1f89f6e9570612bd",
"ok": true,
"rev": "1-1aefbf37f7524ef1d7708843e035d128"
},
{
"error": "os_process_error",
"id": "1719307e8c6a14bc1f89f6e957061b2d",
"reason": "OS process timed out."
},
{
"id": "1719307e8c6a14bc1f89f6e957062769",
"ok": true,
"rev": "1-ca8a74b9258d00dc126bbae88ab1844f"
},
...
This can be reproduces with validation fun looking something like this.
function(newDoc, oldDoc, userCtx) {
if (newDoc.timeout) {
var now = new Date().getTime();
while(new Date().getTime() < now + 60000) { }
}
}
The bulk's payload needs to be larger than 100 docs and have at least one doc with field "timeout": true.
Note, that the config parameter [couch_httpd_auth] timeout not really related to this, os process timeout configured by [couchdb] os_process_timeout and is 5 seconds by default.
Thanks a lot for the thoughtful reply! And I'm glad this has been fixed on master already.
Suppose this still leaves me with the mystery of how my humble little validation function could take over 5 seconds to complete (it really doesn't do anything besides comparing a few values on the doc and user context), but maybe I'll start by upping that timeout.
Most helpful comment
@jareware Thank you for the logs snippet, that really helped to reproduce this.
So what's happening is that when the bulk size exceeds 100 docs we are switching to a different data structure when reordering the results, to speed things up. If an update validator function takes too long, os process will throw timeout, but the whole batch will be processed nevertheless. It seems that in 2.1.1 that timeout from the update validator was swallowed up the pipe and an attempt to fetch the missing key on reorder was throwing an exception.
Evidently it was fixed since then. I can't pinpoint where that happened, but when I run the same case on current master it handles the error fine:
This can be reproduces with validation fun looking something like this.
The bulk's payload needs to be larger than 100 docs and have at least one doc with field
"timeout": true.Note, that the config parameter
[couch_httpd_auth]timeoutnot really related to this, os process timeout configured by[couchdb]os_process_timeoutand is 5 seconds by default.