jaeger-all-in-one pod invalid memory address panic during test suite

Created on 3 Jul 2018 · 5 comments · Source: jaegertracing/jaeger

We're attempting to integrate the Jaeger all-in-one pod into our development environments, and it keeps shutting down while our test suite runs. During the test, the jaeger-all-in-one pod reports the following:

all-in-one logs:

{"level":"info","ts":1530582648.1951566,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1530582648.1964855,"caller":"tchannel/builder.go:89","msg":"Enabling service discovery","service":"jaeger-collector"}
{"level":"info","ts":1530582648.196542,"caller":"peerlistmgr/peer_list_mgr.go:111","msg":"Registering active peer","peer":"127.0.0.1:14267"}
{"level":"info","ts":1530582648.1970081,"caller":"standalone/main.go:179","msg":"Starting agent"}
{"level":"info","ts":1530582648.1976883,"caller":"standalone/main.go:219","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1530582648.1977384,"caller":"standalone/main.go:229","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1530582648.214731,"caller":"standalone/main.go:290","msg":"Registering metrics handler with jaeger-query HTTP server","route":"/metrics"}
{"level":"info","ts":1530582648.2148683,"caller":"standalone/main.go:296","msg":"Starting jaeger-query HTTP server","port":16686}
{"level":"info","ts":1530582648.2148962,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1530582649.1985323,"caller":"peerlistmgr/peer_list_mgr.go:157","msg":"Not enough connected peers","connected":0,"required":1}
{"level":"info","ts":1530582649.198617,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"127.0.0.1:14267"}
{"level":"info","ts":1530582649.199323,"caller":"peerlistmgr/peer_list_mgr.go:176","msg":"Connected to peer","host:port":"[::]:14267"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x8bd634]

goroutine 122 [running]:
github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel.(*Reporter).EmitBatch(0xc420200180, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/reporter/tchannel/reporter.go:119 +0x84
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process(0xc420326170, 0xc400000000, 0xe863c0, 0xc4201b17c0, 0xe863c0, 0xc4201b17c0, 0xc42065ff28, 0xc420432f80, 0xc4203261f0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:137 +0xc9
github.com/jaegertracing/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process(0xc4202a2040, 0xe863c0, 0xc4201b17c0, 0xe863c0, 0xc4201b17c0, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/thrift-gen/jaeger/agent.go:111 +0x2fd
github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer(0xc4200812c0)
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:110 +0x179
created by github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve
    /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:82 +0x62
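
The all-in-one logs above are structured JSON with one object per line. The `ts` field holds a Unix timestamp in seconds. The Go panic and goroutine trace that follow are plain text. To line the crash up with the test run, a minimal sketch like the one below can convert the structured entries to UTC and skip everything else. The `parse_log_line` helper is hypothetical, and it assumes every JSON line carries a numeric `ts`.

```python
import json
from datetime import datetime, timezone


def parse_log_line(line):
    """Parse one JSON log line into (utc_time, caller, msg).

    Returns None for anything that is not a JSON object, such as the
    'panic:' line and the goroutine trace printed after it.
    Assumption: every JSON object has a numeric 'ts' in Unix seconds.
    """
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    when = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
    return when, entry.get("caller"), entry.get("msg")


# Two structured lines and the panic line, copied from the log above.
LOG = """\
{"level":"info","ts":1530582648.1951566,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1530582649.199323,"caller":"peerlistmgr/peer_list_mgr.go:176","msg":"Connected to peer","host:port":"[::]:14267"}
panic: runtime error: invalid memory address or nil pointer dereference
"""

if __name__ == "__main__":
    entries = [e for e in map(parse_log_line, LOG.splitlines()) if e]
    for when, caller, msg in entries:
        print(when.isoformat(), caller, msg)
    elapsed = (entries[-1][0] - entries[0][0]).total_seconds()
    print(f"last structured entry: {elapsed:.3f}s after the first")
```

On this excerpt, the first entry falls at 2018-07-03 01:50:48 UTC. The last structured entry, "Connected to peer", comes about one second later, and the panic is printed after it.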

The pod is implemented via helm using a slightly modified version of the all-in-one chart template found at https://github.com/jaegertracing/jaeger-kubernetes/blob/master/all-in-one/jaeger-all-in-one-template.yml

Following are our deployment and service configurations.

all-in-one deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2018-07-02T21:00:27Z
  generation: 1
  labels:
    app: jaeger-all-in-one
    chart: jaeger-all-in-one-0.1.00
    component: all-in-one-deployment
    heritage: Tiller
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one
  namespace: default
  resourceVersion: "73847"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one
  uid: f2662b0a-7e3a-11e8-ac4d-42010a8e0fe3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger-all-in-one
      chart: jaeger-all-in-one-0.1.00
      component: all-in-one-deployment
      jaeger-infra: jaeger-all-in-one
      release: tjb-jgr-allinone-stability-v2-1
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: jaeger-all-in-one
        chart: jaeger-all-in-one-0.1.00
        component: all-in-one-deployment
        jaeger-infra: jaeger-all-in-one
        release: tjb-jgr-allinone-stability-v2-1
    spec:
      containers:
      - env:
        - name: COLLECTOR_ZIPKIN_HTTP_PORT
        image: jaegertracing/all-in-one:1.5.0
        imagePullPolicy: IfNotPresent
        name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one
        ports:
        - containerPort: 5775
          protocol: UDP
        - containerPort: 6831
          protocol: UDP
        - containerPort: 6832
          protocol: UDP
        - containerPort: 5778
          protocol: TCP
        - containerPort: 16686
          protocol: TCP
        - containerPort: 9411
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 16686
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 750m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-07-03T11:53:59Z
    lastUpdateTime: 2018-07-03T11:53:59Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
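
The deployment's readinessProbe checks the query UI on port 16686. The logs also show a separate health check server on port 14269, whose status changes from "unavailable" to "ready" during startup. If the test suite starts right after a deploy, it could wait on that endpoint first. The sketch below is hypothetical: the `wait_until_ready` name is made up, and it assumes any 2xx response from the health port means "ready". Port 14269 is not declared in the deployment or exposed by a service, so it would need to be reached some other way, for example with `kubectl port-forward POD_NAME 14269`.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=1.0):
    """Poll `url` until it returns a 2xx status or `timeout` seconds pass.

    Assumption: any 2xx response means the service reports itself ready.
    Connection failures and HTTP error statuses count as "not ready yet".
    Returns True once ready, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and socket errors are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a pytest session-scoped fixture, a call such as `wait_until_ready("http://localhost:14269/")` would go before the first traced test.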

all-in-one services:

---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-07-02T21:00:27Z
  labels:
    app: jaeger-all-in-one
    chart: jaeger-all-in-one-0.1.00
    component: all-in-one-agent
    heritage: Tiller
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-agent
  namespace: default
  resourceVersion: "834"
  selfLink: /api/v1/namespaces/default/services/tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-agent
  uid: f2815108-7e3a-11e8-ac4d-42010a8e0fe3
spec:
  clusterIP: None
  ports:
  - name: agent-zipkin-thrift
    port: 5775
    protocol: UDP
    targetPort: 5775
  - name: agent-compact
    port: 6831
    protocol: UDP
    targetPort: 6831
  - name: agent-binary
    port: 6832
    protocol: UDP
    targetPort: 6832
  - name: agent-configs
    port: 5778
    protocol: TCP
    targetPort: 5778
  selector:
    app: jaeger-all-in-one
    component: all-in-one-deployment
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-07-02T21:00:27Z
  labels:
    app: jaeger-all-in-one
    chart: jaeger-all-in-one-0.1.00
    component: all-in-one-collector
    heritage: Tiller
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-collector
  namespace: default
  resourceVersion: "832"
  selfLink: /api/v1/namespaces/default/services/tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-collector
  uid: f27c80fd-7e3a-11e8-ac4d-42010a8e0fe3
spec:
  clusterIP: 10.113.30.20
  ports:
  - name: jaeger-collector-tchannel
    port: 14267
    protocol: TCP
    targetPort: 14267
  - name: jaeger-collector-http
    port: 14268
    protocol: TCP
    targetPort: 14268
  - name: jaeger-collector-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  selector:
    app: jaeger-all-in-one
    component: all-in-one-deployment
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-07-02T21:00:27Z
  labels:
    app: jaeger-all-in-one
    chart: jaeger-all-in-one-0.1.00
    component: all-in-one-query
    heritage: Tiller
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-query
  namespace: default
  resourceVersion: "827"
  selfLink: /api/v1/namespaces/default/services/tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-query
  uid: f270e4b3-7e3a-11e8-ac4d-42010a8e0fe3
spec:
  clusterIP: 10.113.20.47
  ports:
  - name: query-http
    port: 80
    protocol: TCP
    targetPort: 16686
  selector:
    app: jaeger-all-in-one
    component: all-in-one-deployment
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2018-07-02T21:00:27Z
  labels:
    app: jaeger-all-in-one
    chart: jaeger-all-in-one-0.1.00
    component: all-in-one-zipkin
    heritage: Tiller
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  name: tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-zipkin
  namespace: default
  resourceVersion: "846"
  selfLink: /api/v1/namespaces/default/services/tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-zipkin
  uid: f2927bd1-7e3a-11e8-ac4d-42010a8e0fe3
spec:
  clusterIP: None
  ports:
  - name: jaeger-collector-zipkin
    port: 9411
    protocol: TCP
    targetPort: 9411
  selector:
    app: jaeger-all-in-one
    component: all-in-one-deployment
    jaeger-infra: jaeger-all-in-one
    release: tjb-jgr-allinone-stability-v2-1
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
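
Comparing the services' targetPort values against the deployment's containerPort list shows that the collector's TChannel (14267) and HTTP (14268) ports are targeted but never declared. This should not break routing. Kubernetes treats containerPort as informational, so a numeric targetPort still reaches a process listening in the container, and the logs show the collector starting on both ports. A minimal sketch of the cross-check follows. The port sets were transcribed by hand from the manifests above, and the `undeclared_targets` helper is hypothetical.

```python
# (port, protocol) pairs transcribed by hand from the Deployment manifest above.
CONTAINER_PORTS = {(5775, "UDP"), (6831, "UDP"), (6832, "UDP"),
                   (5778, "TCP"), (16686, "TCP"), (9411, "TCP")}

# targetPort/protocol pairs transcribed by hand from the four Services above.
SERVICE_TARGET_PORTS = {
    "agent": {(5775, "UDP"), (6831, "UDP"), (6832, "UDP"), (5778, "TCP")},
    "collector": {(14267, "TCP"), (14268, "TCP"), (9411, "TCP")},
    "query": {(16686, "TCP")},
    "zipkin": {(9411, "TCP")},
}


def undeclared_targets(container_ports, service_ports):
    """Map each service to the (port, protocol) targets the container does not declare."""
    missing = {}
    for name, targets in service_ports.items():
        gap = targets - container_ports
        if gap:
            missing[name] = sorted(gap)
    return missing


if __name__ == "__main__":
    for service, ports in undeclared_targets(CONTAINER_PORTS, SERVICE_TARGET_PORTS).items():
        print(service, ports)
```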

Labels: bug


All 5 comments

Could you confirm what you see when you run the version command for the backing container image? For example:

$ kubectl exec -it POD_NAME -- /go/bin/standalone-linux version
{"gitCommit":"c8d514c4b45e32fb037497ae2ef4042b3005f902","GitVersion":"v1.5.0","BuildDate":"2018-06-30T22:11:21Z"}

When I run kubectl create -f ....../jaeger-all-in-one-template.yml, POD_NAME looks like this for me: jaeger-deployment-8648cb84d5-cg82n

@jpkrohling standalone-linux version on my pod reports the following:

$ kubectl exec -it tjb-jgr-allinone-stability-v2-1-jaeger-all-in-one-6786d849hr2ml -- /go/bin/standalone-linux version
{"gitCommit":"ab77ac7efdabbcdce9e53628681ec45421f05e58","GitVersion":"v1.5.0","BuildDate":"2018-05-28T15:24:54Z"}
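
Both binaries report GitVersion v1.5.0, but their gitCommit and BuildDate fields differ. The reporter's binary was built on 2018-05-28 and the maintainer's on 2018-06-30. Because the same version string evidently covers different builds here, it can help to diff the full version output rather than the tag alone. A minimal sketch follows; the `diff_versions` helper is hypothetical.

```python
import json

# Output of `standalone-linux version` from the two comments above.
MAINTAINER = '{"gitCommit":"c8d514c4b45e32fb037497ae2ef4042b3005f902","GitVersion":"v1.5.0","BuildDate":"2018-06-30T22:11:21Z"}'
REPORTER = '{"gitCommit":"ab77ac7efdabbcdce9e53628681ec45421f05e58","GitVersion":"v1.5.0","BuildDate":"2018-05-28T15:24:54Z"}'


def diff_versions(a_json, b_json):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    a, b = json.loads(a_json), json.loads(b_json)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for field, (mine, theirs) in diff_versions(MAINTAINER, REPORTER).items():
        print(f"{field}: {mine} vs {theirs}")
```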

Thanks, I'll try to reproduce it on my side. Does the test failure happen consistently, or is it intermittent? If it's consistent, are you able to share code that triggers this condition?

I'm unable to share the code. The failure occurs consistently while our pytest suite runs against our smart contract modules, although at varying points in the tests. Sometimes the suite gets nearly all the way through before failing; other times it fails within the first 50 or so of the 200+ tests.

We have four modules, written in Python, Go, and Haskell, reporting traces to the all-in-one pod, if that's helpful at all.

That's helpful, thanks! It sounds like a concurrency issue to me, which will make it a bit harder to reproduce. In any case, we'll need your help again during the validation of the fix!
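
One way to probe the concurrency hypothesis is to have several senders hit the agent's UDP port at once, as the four Python, Go, and Haskell modules do. The sketch below is hypothetical. `send_concurrently` is not part of any Jaeger tooling, and `payload` is a placeholder. A meaningful reproduction would need real Thrift-encoded batches, for example datagrams captured from the actual clients and replayed against port 6831 (agent-compact).

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def send_concurrently(host, port, payload, workers=4, per_worker=100):
    """Send `payload` as UDP datagrams from several threads at the same time.

    Each worker opens its own socket, loosely mimicking independent tracer
    clients reporting to one agent. Returns the total number of datagrams sent.
    `payload` is a placeholder; real spans would need valid Thrift encoding.
    """
    def worker(worker_id):
        sent = 0
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            for _ in range(per_worker):
                sock.sendto(payload, (host, port))
                sent += 1
        return sent

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(worker, range(workers)))
```

Giving each worker its own socket means the receiver sees several distinct source ports. That is closer to multiple client processes than sharing a single socket across threads.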
