A customer's analytics uploads were failing because the bundle size exceeded 100MB, which is the upload size limit enforced by the upload service. Tower should keep bundles under 100MB: build a bundle, and if it is too large, split the time period in half and send two bundles instead, recursing until every bundle is small enough.
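A minimal sketch of that recursive split, assuming hypothetical `gather_bundle()` and `ship_bundle()` helpers standing in for Tower's real collection and upload code:

```python
import os

MAX_BUNDLE_SIZE = 100 * 1024 * 1024  # upload service limit (100MB)

def ship_interval(since, until, gather_bundle, ship_bundle):
    """Build a bundle for [since, until); if it is too big, bisect the
    time window and recurse until every bundle fits under the limit."""
    path = gather_bundle(since, until)        # hypothetical: returns a .tar.gz path
    if os.path.getsize(path) <= MAX_BUNDLE_SIZE:
        ship_bundle(path)                     # hypothetical: POSTs to the upload service
        return
    os.remove(path)                           # too large: discard and split the window
    midpoint = since + (until - since) / 2
    ship_interval(since, midpoint, gather_bundle, ship_bundle)
    ship_interval(midpoint, until, gather_bundle, ship_bundle)
```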
The simplest way to do this is probably to split how we gather the data from how we do the uploads.
That would likely be less CPU and memory intensive on the query side. Paginating the events and sending a constant number of events per bundle, across a larger number of bundles, would be better for both the sender and the receiver.
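A rough sketch of the pagination idea, assuming the events arrive as an iterable of dicts; the page size, file names, and CSV layout are illustrative, not Tower's actual bundle format:

```python
import csv
import os
import tarfile

EVENTS_PER_BUNDLE = 100_000  # illustrative page size, not a real Tower setting

def paginate(events, page_size=EVENTS_PER_BUNDLE):
    """Yield fixed-size pages of events so each bundle stays a predictable size."""
    page = []
    for event in events:
        page.append(event)
        if len(page) >= page_size:
            yield page
            page = []
    if page:
        yield page

def build_bundles(events, fieldnames, workdir):
    """Write each page to its own events CSV and wrap it in a .tar.gz bundle."""
    bundle_paths = []
    for index, page in enumerate(paginate(events)):
        csv_path = os.path.join(workdir, f"events_{index}.csv")
        with open(csv_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(page)
        bundle_path = os.path.join(workdir, f"bundle_{index}.tar.gz")
        with tarfile.open(bundle_path, "w:gz") as tar:
            tar.add(csv_path, arcname=f"events_{index}.csv")
        bundle_paths.append(bundle_path)
    return bundle_paths
```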
A functional example of how to split at ship time: https://gist.github.com/jctanner/db5a4d5e6a054ac0468ef09ed42ea276
Tested with a local API and did not see any errors, but I also didn't validate that the data was processed correctly.
fastapi_1 | {"@timestamp": "2020-07-22T19:34:22.438Z", "@version": 1, "source_host": "4c2715477e42", "name": "tower_analytics_report.processor.tower_analytics_processor", "args": [], "levelname": "INFO", "levelno": 20, "pathname": "./tower_analytics_report/processor/tower_analytics_processor.py", "filename": "tower_analytics_processor.py", "module": "tower_analytics_processor", "stack_info": null, "lineno": 840, "funcName": "handle_events_table", "created": 1595446462.438589, "msecs": 438.58909606933594, "relativeCreated": 8097716.229200363, "thread": 140240822705480, "threadName": "MainThread", "processName": "SpawnProcess-1", "process": 32, "message": "handle_events_table.merge_encrypted"}
fastapi_1 | {"@timestamp": "2020-07-22T19:34:22.440Z", "@version": 1, "source_host": "4c2715477e42", "name": "tower_analytics_report.processor.tower_analytics_processor", "args": [], "levelname": "INFO", "levelno": 20, "pathname": "./tower_analytics_report/processor/tower_analytics_processor.py", "filename": "tower_analytics_processor.py", "module": "tower_analytics_processor", "stack_info": null, "lineno": 855, "funcName": "handle_events_table", "created": 1595446462.440438, "msecs": 440.43803215026855, "relativeCreated": 8097718.078136444, "thread": 140240822705480, "threadName": "MainThread", "processName": "SpawnProcess-1", "process": 32, "message": "handle_events_table.mapping"}
fastapi_1 | {"@timestamp": "2020-07-22T19:34:22.441Z", "@version": 1, "source_host": "4c2715477e42", "name": "tower_analytics_report.processor.tower_analytics_processor", "args": [], "levelname": "INFO", "levelno": 20, "pathname": "./tower_analytics_report/processor/tower_analytics_processor.py", "filename": "tower_analytics_processor.py", "module": "tower_analytics_processor", "stack_info": null, "lineno": 864, "funcName": "handle_events_table", "created": 1595446462.4415565, "msecs": 441.556453704834, "relativeCreated": 8097719.196557999, "thread": 140240822705480, "threadName": "MainThread", "processName": "SpawnProcess-1", "process": 32, "message": "handle_events_table.insert"}
fastapi_1 | {"@timestamp": "2020-07-22T19:34:22.443Z", "@version": 1, "source_host": "4c2715477e42", "name": "tower_analytics_report.processor.tower_analytics_processor", "args": [], "levelname": "INFO", "levelno": 20, "pathname": "./tower_analytics_report/processor/tower_analytics_processor.py", "filename": "tower_analytics_processor.py", "module": "tower_analytics_processor", "stack_info": null, "lineno": 916, "funcName": "handle_events_table", "created": 1595446462.4432404, "msecs": 443.2404041290283, "relativeCreated": 8097720.880508423, "thread": 140240822705480, "threadName": "MainThread", "processName": "SpawnProcess-1", "process": 32, "message": "handle_events_table.commit"}
fastapi_1 | {"@timestamp": "2020-07-22T19:34:22.448Z", "@version": 1, "source_host": "4c2715477e42", "name": "uvicorn.access", "args": ["172.19.0.1:56830", "POST", "/api/tower-analytics/upload_bundle/", "1.1", 202], "levelname": "INFO", "levelno": 20, "pathname": "/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", "filename": "httptools_impl.py", "module": "httptools_impl", "stack_info": null, "lineno": 454, "funcName": "send", "created": 1595446462.4488645, "msecs": 448.8644599914551, "relativeCreated": 8097726.504564285, "thread": 140240822705480, "threadName": "MainThread", "processName": "SpawnProcess-1", "process": 32, "status_code": 202, "scope": {"type": "http", "http_version": "1.1", "server": ["172.19.0.15", 8000], "client": ["172.19.0.1", 56830], "scheme": "http", "method": "POST", "root_path": "/api/tower-analytics", "path": "/upload_bundle/", "raw_path": "b'/api/tower-analytics/upload_bundle/'", "query_string": "b''", "headers": [["b'host'", "b'192.168.122.1:8004'"], ["b'accept-encoding'", "b'identity'"], ["b'user-agent'", "b'Red Hat Ansible Tower 3.7.1 (enterprise)'"], ["b'content-length'", "b'3363'"], ["b'content-type'", "b'multipart/form-data; boundary=94d6ccd9edf6590a6d6495534d138b35'"], ["b'authorization'", "b'Basic U0FNSUFNOmJhcg=='"]], "app": "<fastapi.applications.FastAPI object at 0x7f8c5b80f898>", "state": {"engine": "Engine(postgres://debug:***@postgres:5432/tenant_1)", "decrypt": "<function EngineMiddleware.dispatch.<locals>.decrypt at 0x7f8c5733a598>"}, "router": "<fastapi.routing.APIRouter object at 0x7f8c59d082b0>", "path_params": {}, "app_root_path": "", "endpoint": "<function upload_bundle at 0x7f8c59b609d8>"}, "message": "172.19.0.1:56830 - \"POST /api/tower-analytics/upload_bundle/ HTTP/1.1\" 202"}
refresher_1 | refresh_module_count_by_date_and_cluster_mview
refresher_1 | failed_task_count_by_date_and_template_mview
refresher_1 | job_event_count_by_date_and_org_mview
refresher_1 | job_state_count_by_date_org_cluster_and_template_mview
refresher_1 | hosts_by_date_and_org_mview
refresher_1 | roi_templates_mview
This customer is me, by the way :)
Happy to run any tests if it helps.
Since the merge of ansible/awx#7709, the analytics tests are broken because it changes a basic presumption about what's in the tarball, it seems.
Initial problems with #7709 have been solved.
Bundles are now less than 100MB:
[root@ip-10-0-2-31 ~]# du -h tar*
201M tar1
204K tar2
201M tar3
14M tar4
[root@ip-10-0-2-31 ~]# du -h /tmp/ea48ba97-21c2-4ff5-b42c-4100a6178d38_2020-09-16-214001+0000_*
24K /tmp/ea48ba97-21c2-4ff5-b42c-4100a6178d38_2020-09-16-214001+0000_0.tar.gz
34M /tmp/ea48ba97-21c2-4ff5-b42c-4100a6178d38_2020-09-16-214001+0000_1.tar.gz
34M /tmp/ea48ba97-21c2-4ff5-b42c-4100a6178d38_2020-09-16-214001+0000_2.tar.gz
2.4M /tmp/ea48ba97-21c2-4ff5-b42c-4100a6178d38_2020-09-16-214001+0000_3.tar.gz
Going to call this verified and fixed for devel. We can discuss whether more backports are needed.