Thanos: sidecar: not upload data to object storage

Created on 13 Mar 2019  Â·  12Comments  Â·  Source: thanos-io/thanos

Thanos, Prometheus and Golang version used

/prometheus/wal $ /bin/thanos --version
thanos, version 0.2.1 (branch: HEAD, revision: 30e7cbdafd3ef01189f202945c2728fcf37e1cf1)
build user: circleci@43764c068b6b
build date: 20181227-15:49:46
go version: go1.10.7

What happened
sidecar not uplaod history data to s3

What you expected to happen
history metric data will upload to s3

Anything else we need to know
in my k8s cluster i use prometheus-operator 0.28.0 
my prometheus start more than 2 hours

wal
/prometheus/wal $ ls -l
total 786080
-rw-rw-r-- 1 1000 2000 134152192 Mar 13 03:10 00000000
-rw-rw-r-- 1 1000 2000 134217728 Mar 13 03:44 00000001
-rw-r--r-- 1 1000 2000 134184960 Mar 13 04:11 00000002
-rw-r--r-- 1 1000 2000 134217728 Mar 13 04:39 00000003
-rw-r--r-- 1 1000 2000 134217728 Mar 13 05:06 00000004
-rw-r--r-- 1 1000 2000 113672064 Mar 13 05:30 00000005

logs

level=info ts=2019-03-13T03:31:13.745428957Z caller=flags.go:90 msg="StoreAPI address that will be propagated through gossip" address=10.224.7.77:10901
level=info ts=2019-03-13T03:31:13.748064794Z caller=main.go:256 component=sidecar msg="disabled TLS, key and cert must be set to enable"
level=info ts=2019-03-13T03:31:13.748110541Z caller=factory.go:39 msg="loading bucket configuration"
level=info ts=2019-03-13T03:31:13.748504426Z caller=sidecar.go:280 msg="starting sidecar" peer=
level=info ts=2019-03-13T03:31:13.748677004Z caller=sidecar.go:220 component=sidecar msg="Listening for StoreAPI gRPC" address=[10.224.7.77]:10901
level=info ts=2019-03-13T03:31:13.74882731Z caller=main.go:308 msg="Listening for metrics" address=0.0.0.0:10902
level=warn ts=2019-03-13T03:31:13.75065166Z caller=sidecar.go:132 msg="failed to fetch initial external labels. Is Prometheus running? Retrying" err="got non-200 response code: 503, response: Service Unavailable"
level=warn ts=2019-03-13T03:31:15.754416139Z caller=sidecar.go:132 msg="failed to fetch initial external labels. Is Prometheus running? Retrying" err="got non-200 response code: 503, response: Service Unavailable"

All 12 comments

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000128906
go_gc_duration_seconds{quantile="0.25"} 0.000150611
go_gc_duration_seconds{quantile="0.5"} 0.000175406
go_gc_duration_seconds{quantile="0.75"} 0.000224085
go_gc_duration_seconds{quantile="1"} 0.000448111
go_gc_duration_seconds_sum 0.233028867
go_gc_duration_seconds_count 1195
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 32
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.10.7"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 4.572608e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.43786264e+09
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.49944e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 3.765945e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 4.3589749801036844e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 569344
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 4.572608e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.703936e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 6.193152e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 20844
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 7.897088e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5524553477762089e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 6103
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 3.786789e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 55552
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 65536
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 80104
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 98304
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 5.355824e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 3.512264e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 3.637248e+06
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 3.637248e+06
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.7279224e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 72
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 1
# HELP grpc_server_handling_seconds Histogram of response latency (seconds) of gRPC that had been application-level handled by the server.
# TYPE grpc_server_handling_seconds histogram
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.001"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.01"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.05"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.1"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.2"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.4"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="0.8"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="1.6"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="3.2"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="6.4"} 1
grpc_server_handling_seconds_bucket{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary",le="+Inf"} 1
grpc_server_handling_seconds_sum{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 0.000111975
grpc_server_handling_seconds_count{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 1
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 1
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 1
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Info",grpc_service="thanos.Store",grpc_type="unary"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 16.6
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 16
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.6011264e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55244787286e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.8289408e+07
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP thanos_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which thanos was built.
# TYPE thanos_build_info gauge
thanos_build_info{branch="HEAD",goversion="go1.10.7",revision="30e7cbdafd3ef01189f202945c2728fcf37e1cf1",version="0.2.1"} 1
# HELP thanos_cluster_members Number indicating current number of members in cluster.
# TYPE thanos_cluster_members gauge
thanos_cluster_members 4
# HELP thanos_gossip_messages_received_total Total gossip NotifyMsg calls.
# TYPE thanos_gossip_messages_received_total counter
thanos_gossip_messages_received_total 0
# HELP thanos_grpc_req_panics_recovered_total Total number of gRPC requests recovered from internal panic.
# TYPE thanos_grpc_req_panics_recovered_total counter
thanos_grpc_req_panics_recovered_total 0
# HELP thanos_objstore_bucket_last_successful_upload_time Second timestamp of the last successful upload to the bucket.
# TYPE thanos_objstore_bucket_last_successful_upload_time gauge
thanos_objstore_bucket_last_successful_upload_time{bucket="thanos"} 0
# HELP thanos_sidecar_last_heartbeat_success_time_seconds Second timestamp of the last successful heartbeat.
# TYPE thanos_sidecar_last_heartbeat_success_time_seconds gauge
thanos_sidecar_last_heartbeat_success_time_seconds 1.552455347778438e+09
# HELP thanos_sidecar_memberlist_gossip thanos_sidecar_memberlist_gossip
# TYPE thanos_sidecar_memberlist_gossip summary
thanos_sidecar_memberlist_gossip{quantile="0.5"} 0.016161000356078148
thanos_sidecar_memberlist_gossip{quantile="0.9"} 0.01900400035083294
thanos_sidecar_memberlist_gossip{quantile="0.99"} 0.07451699674129486
thanos_sidecar_memberlist_gossip_sum 535.1268449591007
thanos_sidecar_memberlist_gossip_count 37378
# HELP thanos_sidecar_memberlist_probeNode thanos_sidecar_memberlist_probeNode
# TYPE thanos_sidecar_memberlist_probeNode summary
thanos_sidecar_memberlist_probeNode{quantile="0.5"} 0.44258400797843933
thanos_sidecar_memberlist_probeNode{quantile="0.9"} 0.5442280173301697
thanos_sidecar_memberlist_probeNode{quantile="0.99"} 0.5442280173301697
thanos_sidecar_memberlist_probeNode_sum 3563.915416419506
thanos_sidecar_memberlist_probeNode_count 7475
# HELP thanos_sidecar_memberlist_pushPullNode thanos_sidecar_memberlist_pushPullNode
# TYPE thanos_sidecar_memberlist_pushPullNode summary
thanos_sidecar_memberlist_pushPullNode{quantile="0.5"} NaN
thanos_sidecar_memberlist_pushPullNode{quantile="0.9"} NaN
thanos_sidecar_memberlist_pushPullNode{quantile="0.99"} NaN
thanos_sidecar_memberlist_pushPullNode_sum 322.35836923122406
thanos_sidecar_memberlist_pushPullNode_count 250
# HELP thanos_sidecar_memberlist_tcp_accept thanos_sidecar_memberlist_tcp_accept
# TYPE thanos_sidecar_memberlist_tcp_accept counter
thanos_sidecar_memberlist_tcp_accept 239
# HELP thanos_sidecar_memberlist_tcp_connect thanos_sidecar_memberlist_tcp_connect
# TYPE thanos_sidecar_memberlist_tcp_connect counter
thanos_sidecar_memberlist_tcp_connect 250
# HELP thanos_sidecar_memberlist_tcp_sent thanos_sidecar_memberlist_tcp_sent
# TYPE thanos_sidecar_memberlist_tcp_sent counter
thanos_sidecar_memberlist_tcp_sent 424450
# HELP thanos_sidecar_memberlist_udp_received thanos_sidecar_memberlist_udp_received
# TYPE thanos_sidecar_memberlist_udp_received counter
thanos_sidecar_memberlist_udp_received 470754
# HELP thanos_sidecar_memberlist_udp_sent thanos_sidecar_memberlist_udp_sent
# TYPE thanos_sidecar_memberlist_udp_sent counter
thanos_sidecar_memberlist_udp_sent 471725
# HELP thanos_sidecar_prometheus_up Boolean indicator whether the sidecar can reach its Prometheus peer.
# TYPE thanos_sidecar_prometheus_up gauge
thanos_sidecar_prometheus_up 1

now sidecar uploaded the history metrics data to s3。
the first upload time after started about 3 hours.
how can i config, how long sidecar upload history data?

Hi, could you provide the command line arguments for Prometheus? The tsdb block file will be persisted dependent on the --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration.

hi @jojohappy this is args

    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention=15d
      --web.enable-lifecycle
      --storage.tsdb.no-lockfile
      --web.route-prefix=/
      --storage.tsdb.min-block-duration=2h
      --storage.tsdb.max-block-duration=2h

Hi, this should not be changed. Thanos ships data to the object storage only after prometheus dumps them to the file system which is by default after 2h and that shouldn't be changed.
So this is expected behavior, it will ship the data every cca 2h.

thanks @FUSAKLA

Literally been searching for this comment for days and just found it. @FUSAKLA is there anyway to include this in the documentation?

"So this is expected behavior, it will ship the data every cca 2h."

This isn't explained anywhere that I could find and it makes it seem like either the connection to the bucket has failed silently or things just aren't working.

Maybe adding "connection to bucket successful" somewhere so it's obvious that, if there is an issue, it's not the connection to the bucket?

Agreed @halfmatthalfcat
I cannot find this documented anywhere either. It is quite hard to determine from the sidecar container logs when metrics are being uploaded to object storage.

It was explained many times, as this is how Prometheus works (it flushes data in proper format after some time.), but we might want to make it more verbose if you have ideas, where (:

You can read that from second bullet in the docs: https://thanos.io/components/sidecar.md/#sidecar (:

As per log line: We already have log line on info level that block was uploaded: https://github.com/thanos-io/thanos/blob/a993678b7cb2519488baf5691184286eb9e9b7c7/pkg/shipper/shipper.go#L333

Anything else we can improve? (:

My apologies, clearly it is mentioned as you say @bwplotka 😄

It is quite an important piece of information and clearly myself and others are still having trouble spotting it in the docs.

Is there any way in the docs to draw more attention to it? I'm not sure how the docs are generated but in docfx you can have NOTES which stand out to draw attention specific pieces of information.

https://dotnet.github.io/docfx/spec/docfx_flavored_markdown.html#note-warningtipimportant

Is there anything similar to this which could be done here? Or whether you think it's even worth doing? 😅 Or maybe something like that isn't necessary and something as simple as bolding that piece of text may be enough?

@bwplotka Could a "connection established to bucket" log line be added so we know that, in diagnosing potential issues, there isn't anything wrong with the connection itself? That's where I lost my way.

Was this page helpful?
0 / 5 - 0 ratings