IP grok pattern correctly parses IPv6 addresses with zone ids, but if these addresses are used with the geoip processor it fails with errors like '::1%0' is not an IP string literal..
To avoid confusion I think that geoip module should accept zone ids, even if they are not relevant in this case and ignored.
Pinging @elastic/es-core-features
@jsoriano I would love to start on this issue , but I'm newbie to this.. how can I contribute?
Hi @Asitha26, this is great :slightly_smiling_face:
To start please take a look to the contributing guidelines.
You can find the code of the geoip module using these addresses in modules/ingest-geoip/src/main/java/org/elasticsearch/ingest/geoip/GeoIpProcessor.java and the code parsing the address in server/src/main/java/org/elasticsearch/common/network/InetAddresses.java.
I have been taking a quick look and the java class for IPV6 addresses (https://docs.oracle.com/javase/10/docs/api/java/net/Inet6Address.html) supports zone ids (as scope ids), we could add support for that on InetAddresses when parsing the address and when instantiating the InetAddress.
thanks, I'll have a look
Also please note that this change should constrained to the geo-ip plugin and _not_ a generic change to Elasticsearch. (see https://github.com/elastic/elasticsearch/issues/22400#issuecomment-271656246)
@jakelandis sure
I'm unable to build on my machine i have switched to jdk 11 and have set path variables but when i try importing project as a gradle on eclipse it throws an exception
@jsoriano Can this issue be closed since there are several merged pull requests that reference it?
@camilin87 merged PRs referencing this issue are workarounds mitigating this issue. The issue is still present in ES.
@Asitha26 were you able to progress on this issue?
@jsoriano no actually I'm stuck with environment issue , my gradle build wasn't success and hence couldent continue can you help me getting environment fixed
I'm able to reproduce the issue.
Here's a failing test https://github.com/camilin87/elasticsearch/commit/2a8913f463f9ce8734e56327e105f993ffc9f2d6
So far I've found the problem lies in the InetAddresses.ipStringToBytes it expects the ipString to contain only digits, colons or dots. Everything else will produce an invalid address.
The method textToNumericFormatV6 will also need some modification.
The IPv6 RFC clearly states
`%' is a delimiter character to distinguish between
and
I'll continue working on this.
Any feedback is appreciated.
I created a pull request to ignore the zone ID in IPv6 addresses https://github.com/elastic/elasticsearch/pull/37547
This will be a first step to address this issue and https://github.com/elastic/beats/issues/9836
I'll continue working on building IPv6 addresses that have the correct zone ID
@jakelandis Can you elaborate on the thinking behind your comment? Why would we add support in ingest-geoip but not in core Elasticsearch given that we previously decided against supporting it within Elasticsearch? I think that either:
My thought process was mostly around the scope of the change (both technical and functional). Supporting this fully requires more consideration and potentially impacts things lower level items like the IPData type. Supporting it only for the ingest-geoip, I see less about supporting zoneID's and more about being more lenient in what is what is accepted for a very specific case. However, in hindsight, supporting this only for that one use case would likely only change where the error happens.
It looks like @camilin87 PR #37547 started down the path of a more core change with the implicit decision to always ignore the zoneId. Always ignoring it is one way to support it, but not sure that is the direction we want to go.
Can we re-visit the previous decision ?
I don't think an error is the correct behavior since this is a valid IP address and will be encountered. But I also don't think we can simply ignore the zoneId for the core index case. (please correct me if I am wrong) We don't have any precedence (nor want) for mutating data by default. Perhaps a new ingest processor to explicitly handle this ? (let me know if I should open a new issue for discussion).
@camilin87 - We appreciate the work here, but probably best to get this sorted out before committing too much more time here.
Always ignoring it is one way to support it, but not sure that is the direction we want to go.
Makes sense.
We appreciate the work here, but probably best to get this sorted out before committing too much more time here.
Thanks. I'm almost done with the part of the change where the zoneId is read and printed
https://github.com/camilin87/elasticsearch/commits/issue_37107_part2
Are the changes in this area still relevant?
Or you want to table the issue altogether?
I know this issue may need some discussion.
I updated the PR https://github.com/elastic/elasticsearch/pull/37547 either way in case it adds some clarity.
That being said. I understand if after your discussions this issue is no longer relevant
We discussed this internally and have landed on implementing an ingest node processor to help handle IPv6 zone_ids. I have logged https://github.com/elastic/elasticsearch/issues/38064.
@jakelandis the ip processor idea sounds great.
I think that any processor accepting an IP as input should accept the IPs parsed by IP grok pattern, but having a processor that transparently handles IPs sound good too.
At the end I think that one of the following pipelines should work no matter the type of IP or if it has a zone id, so the pipelines don't have to handle specifics of IP addresses formats.
[
{ "grok": { "field": "message", "patterns":["...%{IP:source.ip}..."]} },
{ "geoip": { "field": "source.ip", target_field:.... } },
]
[
{ "grok": { "field": "message", "patterns":["...%{IP:source.ip}..."]} },
{ "ip": { "field": "source.ip" ... } },
{ "geoip": { "field": "source.ip", target_field:.... } },
]
+1 on latest version (installation this weekend)