Elasticsearch version: 2.2 - 5.1
JVM version:
openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)
OS version:
Alpine Linux v3.4 in Docker 1.12.5 Stable on Mac OS 10.12.2
Description of the problem including expected versus actual behavior:
Envelopes create a bounding box that won't cross longitude 180 (or -180, roughly the international dateline).
Steps to reproduce:
ShapeBuilder, in parseEnvelope, modifies the upper-left and lower-right coordinates. It sets the upper left to the minimum longitude and maximum latitude of the coordinates provided, and the lower right to the maximum longitude and minimum latitude. This means that an envelope of [[170, 10], [10, -10]] used in a geo_shape query will not find a polygon containing the coordinate (179, 1), because the bounds of the envelope are modified to [[10, 10], [170, -10]]. It's fine to normalize the latitudes, putting the maximum in the upper left and the minimum in the lower right, but the longitudes should not be changed.
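A rough Python sketch of the normalization described above, next to the proposed latitude-only variant. Both function names are hypothetical and only mimic the described behaviour; this is not the actual ShapeBuilder code.

```python
# Hypothetical re-creation of the behaviour described above; these are not
# Elasticsearch's real functions.

def normalize_envelope_cartesian(upper_left, lower_right):
    """Described parseEnvelope behaviour: min/max applied to both axes."""
    (x1, y1), (x2, y2) = upper_left, lower_right
    return [[min(x1, x2), max(y1, y2)], [max(x1, x2), min(y1, y2)]]


def normalize_envelope_latitude_only(upper_left, lower_right):
    """Proposed behaviour: sort the latitudes, keep the longitudes as given."""
    (x1, y1), (x2, y2) = upper_left, lower_right
    return [[x1, max(y1, y2)], [x2, min(y1, y2)]]


print(normalize_envelope_cartesian([170, 10], [10, -10]))      # [[10, 10], [170, -10]]
print(normalize_envelope_latitude_only([170, 10], [10, -10]))  # [[170, 10], [10, -10]]
```

With the latitude-only variant, a west longitude larger than the east longitude survives, so the envelope can still be interpreted as crossing the dateline.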
Here is a simple Sense script that demonstrates the problem. It creates two polygons on the equator, one on the prime meridian and one on the international dateline. It then searches for the polygon on the international dateline twice, once with a polygon and once with an envelope covering the same area. The polygon search returns the correct document; the envelope search returns the wrong one.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_shape"
        }
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text": "Geo-shape as a polygon on dateline and equator",
  "location": {
    "type": "polygon",
    "coordinates": [[
      [179, 1],
      [179, -1],
      [-179, -1],
      [-179, 1],
      [179, 1]
    ]]
  }
}
PUT my_index/my_type/2
{
  "text": "Geo-shape as a polygon on prime meridian and equator",
  "location": {
    "type": "polygon",
    "coordinates": [[
      [1, 1],
      [1, -1],
      [-1, -1],
      [-1, 1],
      [1, 1]
    ]]
  }
}
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [[
                [170, 10],
                [170, -10],
                [-170, -10],
                [-170, 10],
                [170, 10]
              ]]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "envelope",
              "coordinates": [
                [170, 10],
                [-170, -10]
              ]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}
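To see why the envelope query returns document 2 instead of document 1, here is a rough Python sketch. It assumes a simplified "within" test that only checks whether every vertex of a document polygon lies inside the query box; real geo_shape matching also considers edges, so treat this as an approximation. The helper names are hypothetical.

```python
# Simplified, vertex-only "within" check (an approximation of geo_shape).

DOC_1 = [[179, 1], [179, -1], [-179, -1], [-179, 1], [179, 1]]  # on the dateline
DOC_2 = [[1, 1], [1, -1], [-1, -1], [-1, 1], [1, 1]]            # on the prime meridian
DOCS = {"1": DOC_1, "2": DOC_2}


def lon_in_range(lon, west, east):
    """Longitude test; a range with west > east wraps across +/-180."""
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east


def vertices_within(polygon, west, south, east, north):
    """True if every vertex lies inside the box (edges are ignored)."""
    return all(lon_in_range(lon, west, east) and south <= lat <= north
               for lon, lat in polygon)


# The query envelope as written has west=170, east=-170; after the
# Cartesian normalization it becomes west=-170, east=170.
for label, west, east in [("normalized", -170, 170), ("as written", 170, -170)]:
    hits = [doc_id for doc_id, poly in DOCS.items()
            if vertices_within(poly, west, -10, east, 10)]
    print(label, hits)
# normalized ['2']
# as written ['1']
```

The normalized box covers everything between -170 and 170, which is exactly the part of the globe the user wanted to exclude.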
@nknize, could you please take a look?
Still reproduces in master, but it might make sense to revisit this after #32039 since it changes most of this code.
That still seems to be an issue after #35320.
Hi,
The problem here has existed since this change: https://github.com/elastic/elasticsearch/pull/9091
The problem is that an "envelope" has a different meaning than a "bounding box". That change swaps the coordinates if the envelope is not in "Cartesian order". If Elasticsearch supported "bounding boxes" (which have a specific meaning in the GIS world), the order of coordinates would be defined (top left first, then bottom right, or better: northwest, then southeast). A box crossing the date line would then have an x coordinate (longitude) on its eastern bound that is smaller than its western bound longitude (e.g., the west bound is 179 and the east bound is -179, so it's a box crossing the date line that spans 2 degrees). In that case it's clear that the box crosses the date line.
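The bounding-box convention described above can be sketched in a few lines of Python. The function names are illustrative only; the sketch assumes longitudes in [-180, 180] and treats west == east as a zero-width box rather than the whole globe.

```python
def crosses_dateline(west, east):
    """Under bbox semantics, an east bound smaller than the west bound
    means the box wraps across the +/-180 meridian."""
    return east < west


def longitude_span(west, east):
    """Width of the box in degrees, measured eastwards from the west bound."""
    return east - west if west <= east else east - west + 360


print(crosses_dateline(179, -179), longitude_span(179, -179))  # True 2
print(crosses_dateline(-179, 179), longitude_span(-179, 179))  # False 358
```

The same two numbers, 179 and -179, describe a 2-degree box or a 358-degree box depending only on which one is the west bound, which is why normalizing the longitudes loses information.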
An envelope, however, is just a box, and Elasticsearch simply corrects it into a box in Cartesian form, which is broken for spherical coordinates.
The workaround we intend to use at PANGAEA is a geometrycollection containing two separate envelopes (one on each side of the date line), both when indexing and when searching. This would also solve the user's original request. Using polygons is of course also fine, but they are even more complicated to handle correctly if your own APIs only deal with bboxes.
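A minimal Python sketch of that workaround, assuming the input is a bbox given as west, south, east, north (with east < west meaning it crosses the date line). It builds the geo_shape JSON for either a single envelope or a geometrycollection of two envelopes; the helper names are hypothetical, and the envelope coordinate order used is [[minLon, maxLat], [maxLon, minLat]], as in the report above.

```python
import json


def envelope(left, top, right, bottom):
    """Envelope shape: [[left, top], [right, bottom]] (upper left, lower right)."""
    return {"type": "envelope", "coordinates": [[left, top], [right, bottom]]}


def bbox_to_shape(west, south, east, north):
    """Hypothetical helper: one envelope for a normal bbox, or a
    geometrycollection of two envelopes if it crosses the date line."""
    if west <= east:
        return envelope(west, north, east, south)
    return {
        "type": "geometrycollection",
        "geometries": [
            envelope(west, north, 180, south),   # from the west bound to +180
            envelope(-180, north, east, south),  # from -180 to the east bound
        ],
    }


# The envelope from the original report, split at the date line.
print(json.dumps(bbox_to_shape(170, -10, -170, 10), indent=2))
```

The resulting object can be used as the "shape" of a geo_shape query or as the value of a geo_shape field when indexing. Neither half crosses +/-180, so the Cartesian normalization leaves both untouched.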
IMHO, Elasticsearch should add another GeoShape type as "bbox" that has the common bounding box semantics used in WGS84.
:+1: I agree.
BBox should follow subclause 10.2.5 and D.13 of the OGC Web Service Common Implementation Specification; specifically D.13:
The bounding box contents defined in Subclause 10.2 will not always specify the
MINIMUM rectangular BOUNDING region, if the referenced CRS uses an Ellipsoidal,
Spherical, Polar, or Cylindrical coordinate system.
[...]
b.) ... (The LowerCorner would no longer always use the minimum value, and
the UpperCorner would no longer always use the maximum value. The value at the
LowerCorner can be greater than at the UpperCorner when this bounding box crosses
the value discontinuity.)
Hi @nknize,
I agree. In GML or ISO 19115 metadata (which uses GML), the coordinates in the bbox data type are already named westBoundLongitude, southBoundLatitude, northBoundLatitude and eastBoundLongitude. With that definition no discussion is needed: the west longitude can definitely be numerically larger than the east longitude when the box crosses the date line.
Problem here is GeoJSON which uses X/Y and uses terms like min/max.
But all tools that definitely need to support bboxes crossing the date line (like Google Maps) use the GML definition.
Problem here is GeoJSON which uses X/Y and uses terms like min/max.
That's not correct as of the publication of RFC 7946:
- positions are [easting/longitude, northing/latitude, [height]]
- bounding boxes are [west, south, [min-height,] east, north, [max-height]]
- it specifically discusses the antimeridian with respect to bounding boxes
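A small Python sketch of reading such a bbox, assuming RFC 7946 ordering (all axes of the southwest corner, then all axes of the northeast corner), so a 3D bbox carries the heights in positions 3 and 6. The function name is hypothetical. The sample values are the Fiji bbox that RFC 7946 uses to illustrate an antimeridian-spanning bounding box.

```python
def parse_geojson_bbox(bbox):
    """Return (west, south, east, north) from a 2D or 3D RFC 7946 bbox,
    dropping the optional min/max heights."""
    if len(bbox) == 4:
        west, south, east, north = bbox
    elif len(bbox) == 6:
        west, south, _min_height, east, north, _max_height = bbox
    else:
        raise ValueError("expected a 2D (4 values) or 3D (6 values) bbox")
    return west, south, east, north


west, south, east, north = parse_geojson_bbox([177.0, -20.0, -178.0, -16.0])
print(east < west)  # True: west > east, so this bbox spans the antimeridian
```

Because the bbox keeps west and east in a fixed order, no min/max normalization is needed to interpret it; the west > east case is meaningful rather than an error.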
Yes, but the bbox is just metadata for the JSON file. It's not defined as a geometry. But yes, you are right.
In GeoJSON you should generally split geometries at the date line, but Elasticsearch does not require this for any datatype except envelope.