Onpremise: Bulk actions are failing

Created on 3 May 2019 · 12 Comments · Source: getsentry/onpremise

Bulk actions such as merge or delete on 652 issues are not working.

[Screenshot from 2019-05-03 19-53-29]

On a delete action, I see the following in the logs:

 172.18.0.1 - - [03/May/2019:17:51:31 +0000] "GET /api/0/projects/geokrety/geokrety-legacy/issues/?sort=date&shortIdLookup=1&environment=kumy&limit=25&statsPeriod=24h&query=is%3Aunresolved&cursor=1556897291000:0:1 HTTP/1.1" 200 803 "https://sentry.kumy.org/geokrety/geokrety-legacy/?environment=kumy" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0"
 172.18.0.1 - - [03/May/2019:17:51:37 +0000] "GET /api/0/projects/geokrety/geokrety-legacy/issues/?sort=date&shortIdLookup=1&environment=kumy&limit=25&statsPeriod=24h&query=is%3Aunresolved&cursor=1556897291000:0:1 HTTP/1.1" 200 803 "https://sentry.kumy.org/geokrety/geokrety-legacy/?environment=kumy" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0"
 Traceback (most recent call last):
   File "/usr/local/lib/python2.7/site-packages/sentry/api/base.py", line 90, in handle_exception
     response = super(Endpoint, self).handle_exception(exc)
   File "/usr/local/lib/python2.7/site-packages/sentry/api/base.py", line 190, in dispatch
     response = handler(request, *args, **kwargs)
   File "/usr/local/lib/python2.7/site-packages/sentry/api/endpoints/organization_group_index.py", line 292, in delete
     search_fn,
   File "/usr/local/lib/python2.7/site-packages/sentry/api/helpers/group_index.py", line 414, in delete_groups
     'paginator_options': {'max_limit': 1000},
   File "/usr/local/lib/python2.7/site-packages/sentry/api/endpoints/organization_group_index.py", line 40, in _search
     result = search.query(**query_kwargs)
   File "/usr/local/lib/python2.7/site-packages/sentry/search/django/backend.py", line 404, in query
     paginator_options, search_filters, **parameters)
   File "/usr/local/lib/python2.7/site-packages/sentry/search/snuba/backend.py", line 302, in _query
     search_filters=search_filters,
   File "/usr/local/lib/python2.7/site-packages/sentry/search/snuba/backend.py", line 466, in snuba_search
     sample=1,  # Don't use clickhouse sampling, even when in turbo mode.
   File "/usr/local/lib/python2.7/site-packages/sentry/utils/snuba.py", line 477, in raw_query
     raise SnubaError(err)
 SnubaError: HTTPConnectionPool(host='localhost', port=1218): Max retries exceeded with url: /query (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f516eadbed0>: Failed to establish a new connection: [Errno 111] Connection refused',))
 172.18.0.1 - - [03/May/2019:17:51:39 +0000] "DELETE /api/0/organizations/geokrety/issues/?query=is%3Aunresolved&environment=kumy&project=5 HTTP/1.1" 500 362 "https://sentry.kumy.org/geokrety/geokrety-legacy/?environment=kumy" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0"

Bulk merge gives a similar error:

 SnubaError: HTTPConnectionPool(host='localhost', port=1218): Max retries exceeded with url: /query (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f516eadbe10>: Failed to establish a new connection: [Errno 111] Connection refused',))
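Both errors end in `Connection refused` for `localhost:1218`, the address Sentry tried to reach for Snuba, meaning nothing was accepting connections there from the web process's point of view. A minimal sketch to confirm that from a Python shell inside the web container, assuming you can exec into it; the helper name `port_is_open` is hypothetical, not part of Sentry, and the code avoids Python-3-only syntax so it also runs on the container's Python 2.7:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Connection refused (errno 111 on Linux), timeouts and DNS failures
        # all derive from socket.error (an alias of OSError on Python 3).
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # "localhost" inside the container is the container itself, not the host.
    print("snuba reachable: %s" % port_is_open("localhost", 1218))
```

On a setup like the one in this issue, where no Snuba service is deployed, this would be expected to report `False`.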

Bug

All 12 comments

Can you please provide more details on your setup?

I'm using a Docker Swarm deployment with this docker-compose.yml. Is there anything else you need to know?

version: '3.7'

x-defaults: &defaults
  # build: .
  image: sentry-local:9.1.1
  environment:
    SENTRY_SECRET_KEY: xxx
    SENTRY_SECRET_KEY_: xxx
    SENTRY_MEMCACHED_HOST: memcached
    SENTRY_REDIS_HOST: redis
    SENTRY_POSTGRES_HOST: postgres
    SENTRY_DB_PASSWORD: xxx
    SENTRY_EMAIL_HOST: smtp.xyz.net
    SENTRY_EMAIL_PASSWORD: 'xxx'
    SENTRY_EMAIL_USER: 'xxx'
    SENTRY_EMAIL_PORT: 587
    SENTRY_EMAIL_USE_TLS: 'True'
    SENTRY_SERVER_EMAIL: xxx
    SENTRY_USE_SSL: 1
    # OPTIONAL: If you want GitHub integration
    GITHUB_CLIENT_ID: xxx
    GITHUB_CLIENT_SECRET: xxx
    GITHUB_EXTENDED_PERMISSIONS: "repo"


  volumes:
    - /srv/SENTRY/data:/var/lib/sentry/files
  deploy:
    labels:
      traefik.enable: "false"
    restart_policy:
      condition: any
  depends_on:
    - redis
    - postgres
    - memcached
    - smtp
  networks:
    default:

x-defaults-other: &defaults-other
  deploy:
    labels:
      traefik.enable: "false"
    restart_policy:
      condition: any
  networks:
    default:


services:

  memcached:
    image: memcached:1.5-alpine
    <<: *defaults-other

  redis:
    image: redis:3.2-alpine
    <<: *defaults-other

  postgres:
    image: postgres:9.5
    environment:
      POSTGRES_PASSWORD: xxx
    volumes:
      - /srv/SENTRY/postgres:/var/lib/postgresql/data
    <<: *defaults-other

  web:
    <<: *defaults
    # ### To upgrade: run as sleep, then connect in container and `$ upgrade`
    #command: sleep 10000
    deploy:
      labels:
        traefik.enable: "true"
        traefik.docker.network: "traefik_default"
        traefik.frontend.rule: "Host:xxx"
        traefik.frontend.passHostHeader: "true"
        traefik.protocol: "http"
        traefik.port: 9000
      restart_policy:
        condition: any
    networks:
      default:
      traefik_default:

  cron:
    <<: *defaults
    command: run cron

  worker:
    <<: *defaults
    command: run worker

networks:
  default:
  traefik_default:
    external: true
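The services above rely on YAML anchors and merge keys (`<<: *defaults`). One detail worth knowing is that the merge is shallow: when a service such as `web` defines its own `deploy:` key, that mapping replaces the anchored `deploy` entirely instead of being combined with it, which is why `web` has to restate `restart_policy`. The plain-Python sketch below models that rule with a hypothetical `merge_key` helper and trimmed-down values from the file; it illustrates the merge-key semantics only and is not how Docker or any YAML library implements them.

```python
# Minimal model of YAML merge keys (`<<: *defaults`) as used in the compose
# file above. Values are trimmed-down copies from that file; `merge_key` is a
# hypothetical helper, not part of Docker or any YAML library.

defaults = {
    "image": "sentry-local:9.1.1",
    "deploy": {
        "labels": {"traefik.enable": "false"},
        "restart_policy": {"condition": "any"},
    },
}


def merge_key(anchor, own):
    """Emulate `<<: *anchor`: keys written on the node win, and the merge is
    shallow, so a nested mapping such as `deploy` is replaced, not combined."""
    merged = dict(anchor)  # start from the anchored mapping
    merged.update(own)     # explicit keys override, regardless of position
    return merged


# `web` as written above: it restates restart_policy in its own deploy block.
web = merge_key(defaults, {
    "deploy": {
        "labels": {"traefik.enable": "true", "traefik.port": 9000},
        "restart_policy": {"condition": "any"},
    },
})

# The same service if restart_policy had been left out of its deploy block.
web_without_policy = merge_key(defaults, {
    "deploy": {"labels": {"traefik.enable": "true"}},
})

print(web["image"])                                      # sentry-local:9.1.1
print("restart_policy" in web_without_policy["deploy"])  # False
```

The same shallow rule applies to `environment`: a service that declared its own `environment:` mapping would drop every variable inherited from `*defaults`.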

Can you also share your config? The error logs point to Snuba, but Snuba should not be enabled by default, and you don't seem to be running that service.

I haven't modified any config file; everything is configured through environment variables. sentry.conf.py and config.yml are the ones from commit 0b1843047ae28425d428cf4f264b0fa07f59a76c.

Alright, thanks for the info. Investigating...

@kumy - initial investigation shows that this feature should not be enabled in 9.x releases: it requires a new service (Snuba) to be running, which is why you are seeing these errors.

I'll dig deeper and see why this is exposed without the feature flag being enabled. Can you share which page you are seeing this command on? It looks like the issues page, but I'm just trying to confirm. Also, although this may seem irrelevant, can you share how many projects you have on your Sentry instance? "One" or "more than one" is enough; no need for a precise count.

Finally, this feature likely won't be available in the 9.x releases, so the "fix" will simply remove the path to it unless you have the feature flag enabled, indicating that you have the new service running. Apologies for the inconvenience.

@BYK I have 2 projects. Here are the steps I used:

  • Open the project -> Issues
  • Tick the checkbox in the table header
  • A pop-up appears asking whether I want to select all issues
  • Click it
  • Then open the three-dots menu -> Delete issues

[Screenshots from 2019-06-11: 23-32-23, 23-32-33, 23-32-43]
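The steps above end in a single call to the organization-level issues endpoint, visible as the `DELETE` line in the access log near the top of this issue. A hedged Python 3 sketch that rebuilds that request without sending it, for example to reproduce the failure from a script; the path and query parameters come from the log, while the base URL, the token, and the `bulk_delete_request` helper are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request


def bulk_delete_request(base_url, org_slug, query, environment, project_id, token):
    """Build (but do not send) the bulk delete request the issues page issued."""
    # A list of pairs keeps the parameter order seen in the access log.
    params = urlencode([
        ("query", query),
        ("environment", environment),
        ("project", project_id),
    ])
    url = "%s/api/0/organizations/%s/issues/?%s" % (base_url, org_slug, params)
    # Sentry API auth tokens are sent as a Bearer token; the value is a placeholder.
    return Request(url, method="DELETE", headers={"Authorization": "Bearer %s" % token})


req = bulk_delete_request(
    "https://sentry.example.org", "geokrety", "is:unresolved", "kumy", 5, "<token>"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually perform the deletion, so only do that against a test instance.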

However, I've just upgraded to 9f8c89a5f7d7718c6f3c1e48cee0f32d77340810, and on launch I saw many log lines like:

 21:32:13 [ERROR] sentry.errors.events: preprocess.failed.empty (cache_key=u'e:445977fb8d5546199d45986ac4d7b648:5')
 21:32:13 [ERROR] sentry.errors.events: preprocess.failed.empty (cache_key=u'e:d9712136606c40759a91d381b8cc828c:5')
 21:32:13 [ERROR] sentry.errors.events: preprocess.failed.empty (cache_key=u'e:64b941a358484a0781223823e95c6f34:5')

I tried the action again and it seems to work now! But Celery is eating a full CPU core :/

Are you interested in me testing again on commit 0b18430?

@kumy - thanks a lot for the follow-up and all the useful information! I'll check this again myself. That said, I don't think testing with older versions is useful, as we expect people to upgrade to the latest version if an issue arises (any fix I'd make would mean a newer version here anyway :D).

Looking at the diff between the commit you shared and master, I mostly see config fixes (https://github.com/getsentry/onpremise/compare/0b1843047ae28425d428cf4f264b0fa07f59a76c..master), and none of them seem related.

Maybe your Celery worker was stuck, and upgrading forced it to restart, which fixed the issue?

Killing Celery didn't fix the problem: it started again and took a full CPU core again. I'll try starting from scratch with no data at all. Maybe something broke in the database during the upgrade.

@kumy - I meant that killing Celery may have triggered the fix you observed :) I think it is normal for Celery to use a full CPU while it is processing all of these. I'm not sure whether you can add another worker to speed things up, though. Or did you mean Celery was stuck and just consuming CPU?

I started the stack again tonight; Celery has been running smoothly since startup (~5 minutes so far).

Well then, I'll close this one out but will still investigate the Snuba dependency thing. Thanks a lot for coming back and following up with everything!
