Elasticsearch: user_agent plugin failed parsing fields in Filebeat CI tests

Created on 21 Oct 2019 · 7 Comments · Source: elastic/elasticsearch

The Beats CI Filebeat tests started failing because of a change in the user_agent plugin or its regexes. The affected fields are user_agent.os.name, user_agent.version and user_agent.name.

For example, the original user agent string Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729) is now parsed with user_agent.version=11.0, even though 11.0 appears nowhere in the original string. See https://travis-ci.org/elastic/beats/jobs/600418441#L2296 for more details.

More user agent strings that failed:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0
Wget/1.13.4 (linux-gnu)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:49.0) Gecko/20100101 Firefox/49.0

More unit tests probably need to be added, especially Windows-related ones: https://github.com/elastic/elasticsearch/blob/master/modules/ingest-user-agent/src/test/java/org/elasticsearch/ingest/useragent/UserAgentProcessorTests.java#L93
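As a starting point for the Windows cases, a table-driven check could pair each failing Windows UA string from this issue with the OS fields we would expect. This is a minimal sketch in Python for brevity (the real tests are Java, in the UserAgentProcessorTests file linked above). `parse_windows_os` is a hypothetical stand-in for the processor, and the NT-version table is the standard Windows NT to release-name mapping, not taken from uap-core. The expected values for NT 6.3 and NT 6.1 follow the new output shown in the diffs in this thread; the XP expectation is an assumption.

```python
import re

# Standard mapping from Windows NT kernel versions to release names.
NT_TO_RELEASE = {
    "5.1": "XP",
    "6.0": "Vista",
    "6.1": "7",
    "6.2": "8",
    "6.3": "8.1",
    "10.0": "10",
}


def parse_windows_os(ua):
    """Hypothetical helper: return (os.name, os.version) for a Windows NT UA, else None."""
    m = re.search(r"Windows NT (\d+\.\d+)", ua)
    if m is None:
        return None
    # Fall back to the raw NT version if it is not in the table.
    return "Windows", NT_TO_RELEASE.get(m.group(1), m.group(1))


# Windows UA strings from this issue, paired with the os fields we would expect.
CASES = [
    ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; "
     ".NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729)",
     ("Windows", "8.1")),
    ("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
     ("Windows", "7")),
    ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)",
     ("Windows", "XP")),  # assumed expectation
]


def run_cases(cases):
    """Return a list of (ua, expected, actual) for every case that does not match."""
    failures = []
    for ua, expected in cases:
        actual = parse_windows_os(ua)
        if actual != expected:
            failures.append((ua, expected, actual))
    return failures


if __name__ == "__main__":
    for ua, expected, actual in run_cases(CASES):
        print(f"FAIL: {ua!r}: expected {expected}, got {actual}")
```

Keeping the cases as data makes it cheap to add every new UA string that breaks, which is the kind of coverage this issue asks for.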


Pinging @elastic/es-core-features (:Core/Features/Ingest)

I'm currently working on disabling the specific checks in Beats https://github.com/elastic/beats/pull/14179.

Here are the differences in the expected values in the Beats test files: https://github.com/elastic/beats/pull/14190/files

Most cases fall into the following categories. I think the first two may be worth investigating.

  • Dots added after the version:
-        "user_agent.version": "50.0"
+        "user_agent.version": "50.0."
  • Some version numbers changed:
         "user_agent.original": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729)",
-        "user_agent.os.name": "Windows 8.1",
-        "user_agent.version": "7.0"
+        "user_agent.os.full": "Windows 8.1",
+        "user_agent.os.name": "Windows",
+        "user_agent.os.version": "8.1",
+        "user_agent.version": "11.0"
  • More detail for some user agents (these look good):
-        "user_agent.os.name": "Windows 7",
+        "user_agent.os.full": "Windows 7",
+        "user_agent.os.name": "Windows",
+        "user_agent.os.version": "7",
  • More detailed versions (they look good too):
-        "user_agent.version": "54.0.2840"
+        "user_agent.version": "54.0.2840.98"
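Sorting these expectation changes by hand gets tedious across many test files. A minimal sketch of grouping them automatically, assuming each expected document is a flat dict keyed by dotted field names as in the diffs above; `diff_expectations` is a hypothetical helper, not part of the Beats test tooling.

```python
def diff_expectations(old, new):
    """Group the differences between two flat dicts of expected fields."""
    removed = {k: old[k] for k in sorted(old.keys() - new.keys())}
    added = {k: new[k] for k in sorted(new.keys() - old.keys())}
    changed = {
        k: (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    }
    # Version fields ending in "." are flagged separately; they look like a parsing bug.
    trailing_dot = sorted(
        k for k, v in new.items()
        if k.endswith("version") and isinstance(v, str) and v.endswith(".")
    )
    return {"removed": removed, "added": added, "changed": changed, "trailing_dot": trailing_dot}


# The IE / Windows 8.1 example from the list above.
old_ie = {"user_agent.os.name": "Windows 8.1", "user_agent.version": "7.0"}
new_ie = {
    "user_agent.os.full": "Windows 8.1",
    "user_agent.os.name": "Windows",
    "user_agent.os.version": "8.1",
    "user_agent.version": "11.0",
}

# The Firefox trailing-dot example.
old_ff = {"user_agent.version": "50.0"}
new_ff = {"user_agent.version": "50.0."}

if __name__ == "__main__":
    print(diff_expectations(old_ie, new_ie))
    print(diff_expectations(old_ff, new_ff))
```

Here the IE case shows up as two added fields plus two changed values, while the Firefox case is the only one flagged under `trailing_dot`.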

@jsoriano @kaiyan-sheng, it looks like https://github.com/elastic/elasticsearch/pull/47807 is the catalyst here. We use regexes from https://github.com/ua-parser/uap-core, which were recently updated. I am hesitant to roll back the changes introduced there, since UA strings are a moving target and we should move with them. If you find errors in the parsing, we should contribute the fixes back upstream and update our parser config.

Maybe the Beats tests need to be updated with more modern UA strings?

cc @spinscale
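In uap-core's regexes.yaml the parsers are tried in order and the first matching regex wins, and an entry can override captured groups with fields such as `family_replacement` and `v1_replacement`. That is one way a version like 11.0 can be reported even though the string never contains it. A minimal sketch of the idea follows; the two rules are simplified and hypothetical, written to show how a rule keyed on `Trident/7.0` could report IE 11, and are not copied from regexes.yaml.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    pattern: str
    family_replacement: Optional[str] = None
    v1_replacement: Optional[str] = None


# Hypothetical, simplified rules. Order matters: the first rule that matches wins.
RULES = [
    # Overrides the captured major version with "11" when a Trident/7.0 or 8.0 token is present.
    Rule(r"(Trident)/(7|8)\.(0)", family_replacement="IE", v1_replacement="11"),
    Rule(r"(MSIE) (\d+)\.(\d+)", family_replacement="IE"),
]


def _group(m, i):
    # Groups beyond what the pattern defines are treated as missing.
    return m.group(i) if i <= m.re.groups else None


def parse_ua(ua, rules=RULES):
    """Return (name, version) from the first rule whose regex matches."""
    for rule in rules:
        m = re.search(rule.pattern, ua)
        if m is None:
            continue
        name = rule.family_replacement or _group(m, 1)
        major = rule.v1_replacement or _group(m, 2)
        minor = _group(m, 3)
        version = major if minor is None else f"{major}.{minor}"
        return name, version
    return "Other", None


IE_TRIDENT7 = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; "
               ".NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729)")
IE8 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)"

if __name__ == "__main__":
    print(parse_ua(IE_TRIDENT7))             # Trident rule wins
    print(parse_ua(IE_TRIDENT7, RULES[1:]))  # without it, the MSIE token is used
    print(parse_ua(IE8))
```

Dropping the first rule brings back the old `7.0` expectation, which is why adding or reordering a single upstream rule can change test results without any code change in Elasticsearch.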

The one with the extra dot at the end of the version number is Firefox:

        "user_agent.original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0",
-       "user_agent.version": "50.0"
+       "user_agent.version": "50.0."
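One plausible mechanism for the trailing dot, offered as a hypothesis rather than the confirmed cause: if the version string is built by appending "." plus each component that is not null, then a regex group that matches the empty string yields "" instead of null, and the result ends in a dot. Python's `re`, like Java's `Matcher.group`, returns `None` for a group that did not participate in the match and `""` for a group that matched nothing. Both patterns below are hypothetical and are not the actual uap-core Firefox rule.

```python
import re
from typing import Optional


def join_version(major: str, minor: Optional[str], patch: Optional[str]) -> str:
    """Append '.' plus each component that is not None (empty strings still get a dot)."""
    version = major
    if minor is not None:
        version += "." + minor
    if patch is not None:
        version += "." + patch
    return version


def join_version_skip_empty(major: str, minor: Optional[str], patch: Optional[str]) -> str:
    """Same idea, but also skip components that matched as empty strings."""
    return ".".join(p for p in (major, minor, patch) if p)


UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0"

# Hypothetical pattern whose third group can match the empty string: group(3) == "".
EMPTY_GROUP = r"Firefox/(\d+)\.(\d+)\.?(\d*)"
# Hypothetical pattern whose third group does not participate: group(3) is None.
OPTIONAL_GROUP = r"Firefox/(\d+)\.(\d+)(?:\.(\d+))?"

if __name__ == "__main__":
    for pattern in (EMPTY_GROUP, OPTIONAL_GROUP):
        groups = re.search(pattern, UA).groups()
        print(pattern, groups, join_version(*groups), join_version_skip_empty(*groups))
```

With `EMPTY_GROUP` the naive join produces `50.0.`, matching the new output above, while treating empty components as missing gives `50.0` for both patterns.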

> I am hesitant to roll back the changes introduced there, since UA strings are a moving target and we should move with them. If you find errors in the parsing, we should contribute the fixes back upstream and update our parser config.

Agreed, I wouldn't roll back the changes. But I wonder whether the new values for these Windows/IE versions (7.0 becoming 11.0) are expected.

> Maybe the Beats tests need to be updated with more modern UA strings?

Yep, we have done that by now (https://github.com/elastic/beats/pull/14190).

> The one with the extra dot at the end of the version number is Firefox:

But is the dot at the end expected?

> But is the dot at the end expected?

I guess not; it doesn't make much sense. I was just adding an example of what looks like another error.
