Elasticsearch: Geoip not able to use IPv6 address with zone ID

Created on 3 Jan 2019 · 20Comments · Source: elastic/elasticsearch

IP grok pattern correctly parses IPv6 addresses with zone ids, but if these addresses are used with the geoip processor it fails with errors like '::1%0' is not an IP string literal..

To avoid confusion I think that geoip module should accept zone ids, even if they are not relevant in this case and ignored.

Seen in https://github.com/elastic/beats/issues/9836

:CorFeatureIngest CorFeatures

Source

jsoriano

All 20 comments

Pinging @elastic/es-core-features

elasticmachine on 3 Jan 2019

@jsoriano I would love to start on this issue , but I'm newbie to this.. how can I contribute?

Asitha26 on 8 Jan 2019

Hi @Asitha26, this is great :slightly_smiling_face:

To start please take a look to the contributing guidelines.

You can find the code of the geoip module using these addresses in modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java and the code parsing the address in server/src/main/java/org/elasticsearch/common/network/InetAddresses.java.

I have been taking a quick look and the java class for IPV6 addresses (https://docs.oracle.com/javase/10/docs/api/java/net/Inet6Address.html) supports zone ids (as scope ids), we could add support for that on InetAddresses when parsing the address and when instantiating the InetAddress.

jsoriano on 8 Jan 2019

👍1

thanks, I'll have a look

Asitha26 on 8 Jan 2019

Also please note that this change should constrained to the geo-ip plugin and _not_ a generic change to Elasticsearch. (see https://github.com/elastic/elasticsearch/issues/22400#issuecomment-271656246)

jakelandis on 8 Jan 2019

@jakelandis sure

Asitha26 on 9 Jan 2019

I'm unable to build on my machine i have switched to jdk 11 and have set path variables but when i try importing project as a gradle on eclipse it throws an exception
capture

Asitha26 on 9 Jan 2019

@jsoriano Can this issue be closed since there are several merged pull requests that reference it?

cshtdd on 15 Jan 2019

@camilin87 merged PRs referencing this issue are workarounds mitigating this issue. The issue is still present in ES.

jsoriano on 15 Jan 2019

👍1

@Asitha26 were you able to progress on this issue?

jsoriano on 15 Jan 2019

@jsoriano no actually I'm stuck with environment issue , my gradle build wasn't success and hence couldent continue can you help me getting environment fixed

Asitha26 on 16 Jan 2019

I'm able to reproduce the issue.

Here's a failing test https://github.com/camilin87/elasticsearch/commit/2a8913f463f9ce8734e56327e105f993ffc9f2d6

So far I've found the problem lies in the InetAddresses.ipStringToBytes it expects the ipString to contain only digits, colons or dots. Everything else will produce an invalid address.

The method textToNumericFormatV6 will also need some modification.

The IPv6 RFC clearly states

`%' is a delimiter character to distinguish between
and

I'll continue working on this.
Any feedback is appreciated.

cshtdd on 16 Jan 2019

I created a pull request to ignore the zone ID in IPv6 addresses https://github.com/elastic/elasticsearch/pull/37547

This will be a first step to address this issue and https://github.com/elastic/beats/issues/9836

I'll continue working on building IPv6 addresses that have the correct zone ID

cshtdd on 16 Jan 2019

@jakelandis Can you elaborate on the thinking behind your comment? Why would we add support in ingest-geoip but not in core Elasticsearch given that we previously decided against supporting it within Elasticsearch? I think that either:

we support it in both places (so revisiting the previous decision)
we support it in neither place

jasontedor on 16 Jan 2019

My thought process was mostly around the scope of the change (both technical and functional). Supporting this fully requires more consideration and potentially impacts things lower level items like the IPData type. Supporting it only for the ingest-geoip, I see less about supporting zoneID's and more about being more lenient in what is what is accepted for a very specific case. However, in hindsight, supporting this only for that one use case would likely only change where the error happens.

It looks like @camilin87 PR #37547 started down the path of a more core change with the implicit decision to always ignore the zoneId. Always ignoring it is one way to support it, but not sure that is the direction we want to go.

Can we re-visit the previous decision ?

I don't think an error is the correct behavior since this is a valid IP address and will be encountered. But I also don't think we can simply ignore the zoneId for the core index case. (please correct me if I am wrong) We don't have any precedence (nor want) for mutating data by default. Perhaps a new ingest processor to explicitly handle this ? (let me know if I should open a new issue for discussion).

@camilin87 - We appreciate the work here, but probably best to get this sorted out before committing too much more time here.

jakelandis on 16 Jan 2019

Always ignoring it is one way to support it, but not sure that is the direction we want to go.

Makes sense.

We appreciate the work here, but probably best to get this sorted out before committing too much more time here.

Thanks. I'm almost done with the part of the change where the zoneId is read and printed

https://github.com/camilin87/elasticsearch/commits/issue_37107_part2

Are the changes in this area still relevant?
Or you want to table the issue altogether?

cshtdd on 16 Jan 2019

I know this issue may need some discussion.
I updated the PR https://github.com/elastic/elasticsearch/pull/37547 either way in case it adds some clarity.

That being said. I understand if after your discussions this issue is no longer relevant

cshtdd on 17 Jan 2019

We discussed this internally and have landed on implementing an ingest node processor to help handle IPv6 zone_ids. I have logged https://github.com/elastic/elasticsearch/issues/38064.

jakelandis on 31 Jan 2019

👍1

@jakelandis the ip processor idea sounds great.

I think that any processor accepting an IP as input should accept the IPs parsed by IP grok pattern, but having a processor that transparently handles IPs sound good too.

At the end I think that one of the following pipelines should work no matter the type of IP or if it has a zone id, so the pipelines don't have to handle specifics of IP addresses formats.

[
{ "grok": { "field": "message", "patterns":["...%{IP:source.ip}..."]} },
{ "geoip": { "field": "source.ip", target_field:.... } },
]

[
{ "grok": { "field": "message", "patterns":["...%{IP:source.ip}..."]} },
{ "ip": { "field": "source.ip" ... } },
{ "geoip": { "field": "source.ip", target_field:.... } },
]

jsoriano on 31 Jan 2019

+1 on latest version (installation this weekend)