Running some load against yugabyte primarily consisting of hmset/hmget operations.
Occasionally observing higher latencies. Found some occurrences of the following stack in the yb-tserver.INFO log file.
W0208 21:55:19.366822 46558 kernel_stack_watchdog.cc:139] Thread 46772 stuck at ../../../../../src/yb/rpc/outbound_call.cc:326 for 104ms:
Kernel stack:
[<ffffffff810f5464>] futex_wait_queue_me+0xc4/0x120
[<ffffffff810f5fd9>] futex_wait+0x179/0x280
[<ffffffff810f80de>] do_futex+0xfe/0x5b0
[<ffffffff810f8610>] SyS_futex+0x80/0x180
[<ffffffff81697749>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x7f57ab5bbe43 sys_futex (./src/base/linux_syscall_support.h:2097)
@ 0x7f57ab5bbe43 base::internal::SpinLockDelay(int volatile*, int, int) (./src/base/spinlock_linux-inl.h:88)
@ 0x7f57ab5bbd13 SpinLock::SlowLock() (src/base/spinlock.cc:133)
@ 0x7f57a5cc1156 SpinLock::Lock() (src/base/spinlock.h:71)
@ 0x7f57a5cc1156 SpinLockHolder::SpinLockHolder(SpinLock*) (src/base/spinlock.h:136)
@ 0x7f57a5cc1156 tcmalloc::ThreadCache::IncreaseCacheLimit() (src/thread_cache.cc:276)
@ 0x7f57a5cd2e11 tcmalloc::ThreadCache::Deallocate(void*, unsigned long) (src/thread_cache.h:392)
@ 0x7f57a5cd2e11 do_free_helper (src/tcmalloc.cc:1195)
@ 0x7f57a5cd2e11 do_free_with_callback (src/tcmalloc.cc:1228)
@ 0x7f57a5cd2e11 do_free (src/tcmalloc.cc:1234)
@ 0x7f57a5cd2e11 tc_deletearray (src/tcmalloc.cc:1665)
@ 0x7f57a6bd9f77 google::protobuf::internal::ArenaStringPtr::DestroyNoArena(std::string const*) (thirdparty/installed/uninstrumented/include/google/protobuf/arenastring.h:264)
@ 0x7f57a6bd9f77 yb::RedisKeyValueSubKeyPB::clear_subkey() (yb/common/redis_protocol.pb.cc:4357)
@ 0x7f57a6bda00e yb::RedisKeyValueSubKeyPB::~RedisKeyValueSubKeyPB() (yb/common/redis_protocol.pb.cc:4321)
@ 0x7f57a6bda444 yb::RedisKeyValueSubKeyPB::~RedisKeyValueSubKeyPB() (yb/common/redis_protocol.pb.cc:4322)
@ 0x7f57a6bda444 google::protobuf::internal::GenericTypeHandler<yb::RedisKeyValueSubKeyPB>::Delete(yb::RedisKeyValueSubKeyPB*, google::protobuf::Arena*) (thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:623)
@ 0x7f57a6bda444 void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<yb::RedisKeyValueSubKeyPB>::TypeHandler>() (thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:1473)
@ 0x7f57a6bda444 google::protobuf::RepeatedPtrField<yb::RedisKeyValueSubKeyPB>::~RepeatedPtrField() (thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:1934)
@ 0x7f57a6bda444 yb::RedisKeyValuePB::~RedisKeyValuePB() (yb/common/redis_protocol.pb.cc:4680)
@ 0x7f57a6bdb35f yb::RedisKeyValuePB::~RedisKeyValuePB() (yb/common/redis_protocol.pb.cc:4683)
@ 0x7f57a6bdb35f yb::RedisReadRequestPB::SharedDtor() (yb/common/redis_protocol.pb.cc:2474)
@ 0x7f57a6bdb42e yb::RedisReadRequestPB::~RedisReadRequestPB() (yb/common/redis_protocol.pb.cc:2470)
@ 0x7f57a6bdb480 yb::RedisReadRequestPB::~RedisReadRequestPB() (yb/common/redis_protocol.pb.cc:2471)
@ 0x7f57abb3ad08 std::default_delete<yb::RedisReadRequestPB>::operator()(yb::RedisReadRequestPB*) const (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/unique_ptr.h:76)
@ 0x7f57abb3ad08 std::unique_ptr<yb::RedisReadRequestPB, std::default_delete<yb::RedisReadRequestPB> >::~unique_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/unique_ptr.h:236)
@ 0x7f57abb3ad08 yb::client::YBRedisReadOp::~YBRedisReadOp() (yb/client/yb_op.cc:109)
@ 0x7f57abac2f58 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:150)
@ 0x7f57abac2f58 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:659)
@ 0x7f57abac2f58 std::__shared_ptr<yb::client::YBOperation, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:925)
@ 0x7f57abac2f58 std::shared_ptr<yb::client::YBOperation>::~shared_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr.h:93)
@ 0x7f57abac2f58 yb::client::internal::InFlightOp::~InFlightOp() (yb/client/in_flight_op.h:91)
@ 0x7f57abac2f58 void __gnu_cxx::new_allocator<yb::client::internal::InFlightOp>::destroy<yb::client::internal::InFlightOp>(yb::client::internal::InFlightOp*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/ext/new_allocator.h:124)
@ 0x7f57abac2f58 std::enable_if<std::__and_<std::allocator_traits<std::allocator<yb::client::internal::InFlightOp> >::__destroy_helper<yb::client::internal::InFlightOp>::type>::value, void>::type std::allocator_traits<std::allocator<yb::client::internal::InFlightOp> >::_S_destroy<yb::client::internal::InFlightOp>(std::allocator<yb::client::internal::InFlightOp>&, yb::client::internal::InFlightOp*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/alloc_traits.h:285)
@ 0x7f57abac2f58 void std::allocator_traits<std::allocator<yb::client::internal::InFlightOp> >::destroy<yb::client::internal::InFlightOp>(std::allocator<yb::client::internal::InFlightOp>&, yb::client::internal::InFlightOp*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/alloc_traits.h:414)
@ 0x7f57abac2f58 std::_Sp_counted_ptr_inplace<yb::client::internal::InFlightOp, std::allocator<yb::client::internal::InFlightOp>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:531)
@ 0x7f57abab5a84 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:150)
@ 0x7f57abab5a84 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:659)
@ 0x7f57abab5a84 std::__shared_ptr<yb::client::internal::InFlightOp, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:925)
@ 0x7f57abab5a84 std::shared_ptr<yb::client::internal::InFlightOp>::~shared_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr.h:93)
@ 0x7f57abab5a84 void std::_Destroy<std::shared_ptr<yb::client::internal::InFlightOp> >(std::shared_ptr<yb::client::internal::InFlightOp>*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:93)
@ 0x7f57abab5a84 void std::_Destroy_aux<false>::__destroy<std::shared_ptr<yb::client::internal::InFlightOp>*>(std::shared_ptr<yb::client::internal::InFlightOp>*, std::shared_ptr<yb::client::internal::InFlightOp>*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:103)
@ 0x7f57abab5a84 void std::_Destroy<std::shared_ptr<yb::client::internal::InFlightOp>*>(std::shared_ptr<yb::client::internal::InFlightOp>*, std::shared_ptr<yb::client::internal::InFlightOp>*) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:126)
@ 0x7f57abab5a84 void std::_Destroy<std::shared_ptr<yb::client::internal::InFlightOp>*, std::shared_ptr<yb::client::internal::InFlightOp> >(std::shared_ptr<yb::client::internal::InFlightOp>*, std::shared_ptr<yb::client::internal::InFlightOp>*, std::allocator<std::shared_ptr<yb::client::internal::InFlightOp> >&) (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/stl_construct.h:151)
@ 0x7f57abab5a84 std::vector<std::shared_ptr<yb::client::internal::InFlightOp>, std::allocator<std::shared_ptr<yb::client::internal::InFlightOp> > >::~vector() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/stl_vector.h:424)
@ 0x7f57abab5a84 yb::client::internal::AsyncRpc::~AsyncRpc() (yb/client/async_rpc.cc:103)
@ 0x7f57abab8c80 yb::client::internal::AsyncRpcBase<yb::tserver::ReadRequestPB, yb::tserver::ReadResponsePB>::~AsyncRpcBase() (yb/client/async_rpc.h:111)
@ 0x7f57abab8c80 yb::client::internal::ReadRpc::~ReadRpc() (yb/client/async_rpc.cc:464)
@ 0x7f57abab5914 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:150)
@ 0x7f57abab5914 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:659)
@ 0x7f57abab5914 std::__shared_ptr<yb::rpc::RpcCommand, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:925)
@ 0x7f57abab5914 std::__shared_ptr<yb::rpc::RpcCommand, (__gnu_cxx::_Lock_policy)2>::reset() (/home/centos/.linuxbrew-yb-build/Cellar/gcc/5.3.0/include/c++/5.3.0/bits/shared_ptr_base.h:1022)
@ 0x7f57abab5914 yb::client::internal::AsyncRpc::Finished(yb::Status const&) (yb/client/async_rpc.cc:137)
Thanks @mchang31415 . Could you share the RAM on the yugabyte nodes?
@pritamdamania87 : assigning to you for an initial look.
Thanks @mchang31415. Could you provide some additional details regarding the workload so that we can try to reproduce this on our end? In particular:
Also opened an issue on gperftools: gperftools/gperftools#959, to see if they have some recommendations for tcmalloc.
How many threads are there ? One tweak you can try is bumping thread cache size via TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable.
If the issue is easily reproducible, or if you want to help improve tcmalloc for your workloads, please consider contributing malloc trace as described at https://github.com/alk/malloc-trace-replay
@alk Thanks a lot for the recommendation! We'll try setting a higher value for TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES and see if that helps. We have about 300 threads and looking at our metrics I see that tcmalloc_max_total_thread_cache_bytes is 33554432 and tcmalloc_current_total_thread_cache_bytes is 67018544. If I understand correctly, is this an indication that the thread cache is too small since the total size is almost double the max allowed size?
@alk I increased TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 268435456 and currently TCMALLOC_CURRENT_TOTAL_THREAD_CACHE_BYTES is about 172705296. Although, during startup of the process there were still a few spinlock issues, but after TCMALLOC_CURRENT_TOTAL_THREAD_CACHE_BYTES stabilized, it seems like there aren't any further issues. These are a few stack traces that I saw during startup:
W0211 09:36:53.940006 6785 kernel_stack_watchdog.cc:139] Thread 6774 stuck at ../../../../../src/yb/rpc/outbound_call.cc:326 for 236ms:
Kernel stack:
[<ffffffff810f5136>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5dfb>] futex_wait+0x17b/0x280
[<ffffffff810f7b36>] do_futex+0x106/0x5a0
[<ffffffff810f8050>] SyS_futex+0x80/0x180
[<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x7ff3e41d30df (unknown) (/tmp/glibc-20180109-1462-njgus6/glibc-2.23/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0)
@ 0x7ff3ed612ec3 sys_futex (./src/base/linux_syscall_support.h:2097)
@ 0x7ff3ed612ec3 base::internal::SpinLockDelay(int volatile*, int, int) (./src/base/spinlock_linux-inl.h:88)
@ 0x7ff3ed612d93 SpinLock::SlowLock() (src/base/spinlock.cc:133)
@ 0x7ff3e7d229a7 SpinLock::Lock() (src/base/spinlock.h:71)
@ 0x7ff3e7d229a7 SpinLockHolder::SpinLockHolder(SpinLock*) (src/base/spinlock.h:136)
@ 0x7ff3e7d229a7 tcmalloc::CentralFreeList::Populate() (src/central_freelist.cc:328)
@ 0x7ff3e7d22bb7 tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (src/central_freelist.cc:284)
@ 0x7ff3e7d22c5a tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) (src/central_freelist.cc:264)
@ 0x7ff3e7d25dd0 tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) (src/thread_cache.cc:166)
@ 0x7ff3e7d37ba7 tcmalloc::ThreadCache::Allocate(unsigned long, unsigned long) (src/thread_cache.h:365)
@ 0x7ff3e7d37ba7 do_malloc_small (src/tcmalloc.cc:1103)
@ 0x7ff3e7d37ba7 do_malloc_no_errno (src/tcmalloc.cc:1112)
@ 0x7ff3e7d37ba7 cpp_alloc (src/tcmalloc.cc:1438)
@ 0x7ff3e7d37ba7 tc_newarray (src/tcmalloc.cc:1646)
@ 0x7ff3e724b236 google::LogMessage::Init(char const*, int, int, void (google::LogMessage::*)()) (src/logging.cc:1191)
@ 0x7ff3e724b9fb google::LogMessage::LogMessage(char const*, int) (src/logging.cc:1155)
@ 0x7ff3ec8f1336 yb::consensus::LeaderElection::HandleVoteDeniedUnlocked(std::string const&, yb::consensus::LeaderElection::VoterState const&) (yb/consensus/leader_election.cc:410)
@ 0x7ff3ec8f1a1c yb::consensus::LeaderElection::VoteResponseRpcCallback(std::string const&, scoped_refptr<yb::consensus::LeaderElection> const&) (yb/consensus/leader_election.cc:339)
@ 0x7ff3e9c278ae std::function<void ()>::operator()() const (/n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267)
@ 0x7ff3e9c278ae yb::rpc::OutboundCall::CallCallback() (yb/rpc/outbound_call.cc:327)
@ 0x7ff3e9c288a4 yb::rpc::OutboundCall::SetResponse(yb::rpc::CallResponse&&) (yb/rpc/outbound_call.cc:366)
@ 0x7ff3e9c17a7b yb::rpc::Connection::HandleCallResponse(yb::Slice) (yb/rpc/connection.cc:410)
@ 0x7ff3e9c5b8cd yb::rpc::YBConnectionContext::HandleCall(std::shared_ptr<yb::rpc::Connection> const&, yb::Slice) (yb/rpc/yb_rpc.cc:166)
W0211 09:37:11.758664 6785 kernel_stack_watchdog.cc:139] Thread 7063 stuck at ../../../../../src/yb/consensus/log.cc:245 for 177ms:
Kernel stack:
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x7ff3e41d30df (unknown) (/tmp/glibc-20180109-1462-njgus6/glibc-2.23/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0)
@ 0x7ff3e4282346 (unknown) (../sysdeps/unix/syscall-template.S:84)
@ 0x7ff3e7d21969 TCMalloc_SystemRelease(void*, unsigned long) (src/system-alloc.cc:539)
@ 0x7ff3e7d23060 tcmalloc::PageHeap::DecommitSpan(tcmalloc::Span*) (src/page_heap.cc:248)
@ 0x7ff3e7d234fb tcmalloc::PageHeap::ReleaseLastNormalSpan(tcmalloc::PageHeap::SpanList*) (src/page_heap.cc:460)
@ 0x7ff3e7d235be tcmalloc::PageHeap::ReleaseAtLeastNPages(unsigned long) (src/page_heap.cc:484)
@ 0x7ff3e7d1793e TCMallocImplementation::ReleaseToSystem(unsigned long) (src/tcmalloc.cc:757)
@ 0x7ff3e82933ad yb::MemTracker::GcTcmalloc() (yb/util/mem_tracker.cc:528)
@ 0x7ff3e8294a84 yb::MemTracker::Release(long) (yb/util/mem_tracker.cc:384)
@ 0x7ff3ec8f4d54 yb::consensus::LogCache::AccountForMessageRemovalUnlocked(std::shared_ptr<yb::consensus::ReplicateMsg> const&) (yb/consensus/log_cache.cc:409)
@ 0x7ff3ec8f6a04 yb::consensus::LogCache::EvictSomeUnlocked(long, long) (yb/consensus/log_cache.cc:397)
@ 0x7ff3ec8f70ef yb::consensus::LogCache::LogCallback(long, bool, yb::Callback<void (yb::Status const&)> const&, yb::Status const&) (yb/consensus/log_cache.cc:241)
@ 0x7ff3ec67c9ae yb::Callback<void (yb::Status const&)>::Run(yb::Status const&) const (yb/gutil/callback.h:451)
@ 0x7ff3ec67c9ae yb::log::Log::AppendThread::RunThread() (yb/consensus/log.cc:248)
@ 0x7ff3e82ebc25 std::function<void ()>::operator()() const (/n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267)
@ 0x7ff3e82ebc25 yb::Thread::SuperviseThread(void*) (yb/util/thread.cc:602)
@ 0x7ff3e4544693 start_thread (/tmp/glibc-20180109-1462-njgus6/glibc-2.23/nptl/pthread_create.c:333)
@ 0x7ff3e42863cc (unknown) (../sysdeps/unix/sysv/linux/x86_64/clone.S:109)
W0211 09:37:13.770215 6785 kernel_stack_watchdog.cc:139] Thread 7063 stuck at ../../../../../src/yb/consensus/log.cc:245 for 258ms:
Kernel stack:
[<ffffffff810b4cd8>] hrtimer_try_to_cancel.part.25+0x58/0x100
[<ffffffff810b4dc8>] hrtimer_cancel+0x28/0x40
[<ffffffff810f5e25>] futex_wait+0x1a5/0x280
[<ffffffff810f7b5a>] do_futex+0x12a/0x5a0
[<ffffffff810f8050>] SyS_futex+0x80/0x180
[<ffffffff816b51ab>] sysret_audit+0x17/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x7ff3e41d30df (unknown) (/tmp/glibc-20180109-1462-njgus6/glibc-2.23/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0)
@ 0x7ff3ed612ec3 sys_futex (./src/base/linux_syscall_support.h:2097)
@ 0x7ff3ed612ec3 base::internal::SpinLockDelay(int volatile*, int, int) (./src/base/spinlock_linux-inl.h:88)
@ 0x7ff3ed612d93 SpinLock::SlowLock() (src/base/spinlock.cc:133)
@ 0x7ff3e7d227cf SpinLock::Lock() (src/base/spinlock.h:71)
@ 0x7ff3e7d227cf SpinLockHolder::SpinLockHolder(SpinLock*) (src/base/spinlock.h:136)
@ 0x7ff3e7d227cf tcmalloc::CentralFreeList::InsertRange(void*, void*, int) (src/central_freelist.cc:232)
@ 0x7ff3e7d25f86 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) (src/thread_cache.cc:236)
@ 0x7ff3e7d2602b tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) (src/thread_cache.cc:197)
@ 0x7ff3e7d380a7 tcmalloc::ThreadCache::Deallocate(void*, unsigned long) (src/thread_cache.h:390)
@ 0x7ff3e7d380a7 do_free_helper (src/tcmalloc.cc:1195)
@ 0x7ff3e7d380a7 do_free_with_callback (src/tcmalloc.cc:1225)
@ 0x7ff3e7d380a7 do_free (src/tcmalloc.cc:1234)
@ 0x7ff3e7d380a7 tc_deletearray (src/tcmalloc.cc:1665)
@ 0x7ff3e9359a6a std::string::_Rep::_M_dispose(std::allocator<char> const&) (/n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/bits/basic_string.h:2740)
@ 0x7ff3e9359a6a std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (/n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/bits/basic_string.h:3037)
@ 0x7ff3e9359a6a google::protobuf::internal::ArenaStringPtr::DestroyNoArena(std::string const*) (thirdparty/yugabyte-thirdparty-2018-02-06T17_39_35/thirdparty/installed/uninstrumented/include/google/protobuf/arenastring.h:264)
@ 0x7ff3e9359a6a yb::docdb::KeyValuePairPB::SharedDtor() (yb/docdb/docdb.pb.cc:253)
@ 0x7ff3e9359aae yb::docdb::KeyValuePairPB::~KeyValuePairPB() (yb/docdb/docdb.pb.cc:248)
@ 0x7ff3e9359b00 yb::docdb::KeyValuePairPB::~KeyValuePairPB() (yb/docdb/docdb.pb.cc:249)
@ 0x7ff3e935b92d google::protobuf::internal::GenericTypeHandler<yb::docdb::KeyValuePairPB>::Delete(yb::docdb::KeyValuePairPB*, google::protobuf::Arena*) (thirdparty/yugabyte-thirdparty-2018-02-06T17_39_35/thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:623)
@ 0x7ff3e935b92d void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<yb::docdb::KeyValuePairPB>::TypeHandler>() (thirdparty/yugabyte-thirdparty-2018-02-06T17_39_35/thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:1473)
@ 0x7ff3e935af77 google::protobuf::RepeatedPtrField<yb::docdb::KeyValuePairPB>::~RepeatedPtrField() (thirdparty/yugabyte-thirdparty-2018-02-06T17_39_35/thirdparty/installed/uninstrumented/include/google/protobuf/repeated_field.h:1934)
@ 0x7ff3e935af77 yb::docdb::KeyValueWriteBatchPB::~KeyValueWriteBatchPB() (yb/docdb/docdb.pb.cc:557)
@ 0x7ff3e935afd0 yb::docdb::KeyValueWriteBatchPB::~KeyValueWriteBatchPB() (yb/docdb/docdb.pb.cc:560)
@ 0x7ff3e957e415 yb::tserver::WriteRequestPB::SharedDtor() (yb/tserver/tserver.pb.cc:1332)
@ 0x7ff3e958630e yb::tserver::WriteRequestPB::~WriteRequestPB() (yb/tserver/tserver.pb.cc:1327)
@ 0x7ff3e9586380 yb::tserver::WriteRequestPB::~WriteRequestPB() (yb/tserver/tserver.pb.cc:1328)
Also, I was reading the tcmalloc documentation and came across this:
If the sum of the thread cache sizes is less than --tcmalloc_max_total_thread_cache_bytes,
max_size grows easily. If not, thread cache 1 will try to steal from thread cache 2 (picked round-
robin) by decreasing thread cache 2's max_size. In this way, threads that are more active will steal
memory from other threads more often than they are have memory stolen from themselves.
Mostly idle threads end up with small caches and active threads end up with big caches. Note
that this stealing can cause the sum of the thread cache sizes to be greater than --
tcmalloc_max_total_thread_cache_bytes until thread cache 2 deallocates some memory to trigger
a garbage collection.
This sounds similar to what might be happening in our case and hence increasing tcmalloc_max_total_thread_cache_bytes might help. Just wanted to confirm this is your reasoning as well to bump up tcmalloc_max_total_thread_cache_bytes.
@pritamdamania87 - can we close out the issue with a link to the two commits you did in this regard?
Hey, sorry Pritam just saw this now.
Thanks,
Michael Chang
On Mon, Feb 12, 2018 at 8:28 PM, Kannan Muthukkaruppan <
[email protected]> wrote:
@pritamdamania87 https://github.com/pritamdamania87 - can we close out
the issue with a link to the two commits you did in this regard?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/YugaByte/yugabyte-db/issues/47#issuecomment-365147522,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABn6RrfA6-SN82c9uIRWtubVdc1aX-ZOks5tUQ9fgaJpZM4R_QdQ
.
Most helpful comment
@alk I increased TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES to 268435456 and currently TCMALLOC_CURRENT_TOTAL_THREAD_CACHE_BYTES is about 172705296. Although, during startup of the process there were still a few spinlock issues, but after TCMALLOC_CURRENT_TOTAL_THREAD_CACHE_BYTES stabilized, it seems like there aren't any further issues. These are a few stack traces that I saw during startup:
Also, I was reading the tcmalloc documentation and came across this:
This sounds similar to what might be happening in our case and hence increasing tcmalloc_max_total_thread_cache_bytes might help. Just wanted to confirm this is your reasoning as well to bump up tcmalloc_max_total_thread_cache_bytes.