Orientdb: Need a sample configuartion for import .osm files to OrientDB via ETL .

Created on 22 Jan 2017  路  4Comments  路  Source: orientechnologies/orientdb

Hi
I need a sample for ETL configuration to importing OSM data to OrientDB .
In my case , i'm using spatial module in orientdb and i need to store the node , ways , tags as Vertex and relationship as the Edge or something like this . i want to define the data version field and using data version in the runtime query too . after all i want to use routing functions like the astar function to find cheapest way (i don't like to use postgis routing) so i want to calculate the distance between node when ETL transfortms data .
i read ETL XML but i confused exactly .

i need help or more documentation to solve my problem .
Thanks

question

All 4 comments

Hey @saeedtabrizi do you have a example of a XML snippet for OSM and how you want to import it in OrientDB?

Hi @lvca . I tried multiple ways to handle my xml OSM data to OrientDB by using ETL but it so complicated and hard in my case .
I decided to write a java api to handle data conversion , because OrientDB ETL does not suitable for me in my case.
Now i can convert data from .osm files to orientdb . but when i want to create some edges for connecting way and nodes , i can't transform easily and is so complicated .

I think the ETL feature has lack of documentation and current documentation does not have true guides or best practices . (TD;LR ETL must be add some facilities and documentation and best practice for ETL at the future) .

Anyway i put my ETL json files here that i have success transform data from .osm to OrientDB but it is not my favor ETL process .

ETL Process for OSM Data .

  1. osm data can take from OpenStreetMap as test.osm .
  2. create the osm_node_etl.json file as below content .
{
    "config": {
        "log": "debug"
    },
    "source": {
        "file": {
            "path": "/mnt/hdd2/osmdata/test.osm"
        }
    },
    "extractor": {
        "xml": {
            "rootNode": "osm.node"
        }
    },
    "transformers": [{
            "field": {
                "fieldName": "@class",
                "value": "Osm_Node"
            }
        }, {
            "vertex": {
                "class": "Osm_Node",
                "skipDuplicates": true
            }
        },
        {
            "field": {
                "fieldName": "changeset",
                "expression": "$input.changeset.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "id",
                "expression": "$input.id.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "lat",
                "expression": "$input.lat.asFloat()"
            }
        },
        {
            "field": {
                "fieldName": "lon",
                "expression": "$input.lon.asFloat()"
            }
        },
        {
            "field": {
                "fieldName": "timestamp",
                "expression": "$input.timestamp"
            }
        },
        {
            "field": {
                "fieldName": "version",
                "expression": "$input.version.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "visible",
                "expression": "$input.visible.asBoolean()"
            }
        },
        {
            "field": {
                "fieldName": "location",
                "expression": "'POINT('+ $input.lon + ' ' + $input.lat + ')' "
            }
        },
        {
            "field": {
                "fieldName": "location",
                "expression": "St_GeomFromText($input.location)"
            }
        },
        {

            "field": {
                "fieldName": "user",
                "operation": "remove"
            }


        },
        {
            "code": {
                "language": "Javascript",
                "code": " var sdf = new java.text.SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss'Z'\"); record.field('timestamp',sdf.parse(record.field('timestamp')))   ;var tags =  new java.util.HashMap();var tag = record.field('tag'); if(tag){ print('has tags ==> ' + tag) ; tag.forEach(function(ix){ var k = ix.field('k').replaceAll(':','_').replaceAll('-','__'); tags.put(k, ix.field('v')) ;print('has key ==>' + ix.field('k'));  } ); record.field('tags',tags,com.orientechnologies.orient.core.metadata.schema.OType.EMBEDDEDMAP) ; };  "
            }
        },
        {
            "field": {
                "fieldName": "tag",
                "operation": "remove"
            }
        }
    ],
    "loader": {
        "orientdb": {

            "dbURL": "plocal:./../databases/MapDB",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "dbAutoCreate": true,
            "tx": false,
            "batchCommit": 1000,
            "wal": false,
            "classes": [{
                    "name": "Osm_Node",
                    "extends": "V"
                },
                {
                    "name": "Osm_Way",
                    "extends": "V"
                },
                {
                    "name": "HasWayTo",
                    "extends": "E"
                }
            ],
            "indexes": [{
                "class": "Osm_Node",
                "fields": ["id:long"],
                "type": "UNIQUE"
            }]
        }
    }
}
  1. add the osm_way_etl.json file as below content .

{
    "config": {
        "log": "debug"
    },
    "source": {
        "file": {
            "path": "/mnt/hdd2/osmdata/test.osm"
        }
    },
    "extractor": {
        "xml": {
            "rootNode": "osm.way"
        }
    },
    "transformers": [{
            "field": {
                "fieldName": "@class",
                "value": "Osm_Way"
            }
        }, {
            "vertex": {
                "class": "Osm_Way",
                "skipDuplicates": true
            }
        },
        {
            "field": {
                "fieldName": "changeset",
                "expression": "$input.changeset.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "id",
                "expression": "$input.id.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "timestamp",
                "expression": "$input.timestamp"
            }
        },
        {
            "field": {
                "fieldName": "version",
                "expression": "$input.version.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "visible",
                "expression": "$input.visible.asBoolean()"
            }
        },
        {

            "field": {
                "fieldName": "user",
                "operation": "remove"
            }


        },
        {
            "code": {
                "language": "Javascript",
                "code": " var sdf = new java.text.SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss'Z'\"); record.field('timestamp',sdf.parse(record.field('timestamp')))   ;var tags =  new java.util.HashMap();var tag = record.field('tag'); if(tag){ print('has tags ==> ' + tag) ; tag.forEach(function(ix){ var k = ix.field('k').replaceAll(':','_').replaceAll(' ','_').replaceAll(',','_').replaceAll('-','__') ; tags.put(k, ix.field('v') ) ; print('has key ==>' + k);  } ); record.field('tags' , tags , com.orientechnologies.orient.core.metadata.schema.OType.EMBEDDEDMAP ) ; };  "
            }
        }
    ],
    "loader": {
        "orientdb": {

            "dbURL": "plocal:./../databases/MapDB",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "dbAutoCreate": true,
            "tx": false,
            "batchCommit": 1000,
            "wal": false,
            "classes": [{
                    "name": "Osm_Node",
                    "extends": "V"
                },
                {
                    "name": "Osm_Way",
                    "extends": "V"
                },
                {
                    "name": "HasWayTo",
                    "extends": "E"
                }
            ],
            "indexes": [{
                "class": "Osm_Way",
                "fields": ["id:long"],
                "type": "UNIQUE"
            }]
        }
    }
}

For each nd element is the way element we must create and edge . so i need to access the vertex that i converted in step 2 .
This configuration works well and convert data from xml to orientdb now . but this is not my favor solution and its so complicated . i prefer to use a java api program to handle my problem now . may be i write a plugin for orientdb to handle OSM data conversion like the postgis (postgres) .

This issue can be closed .
Thanks .

@saeedtabrizi thanks for the JSON files, it could be useful to some users. If you can create something generic, we would more than happy to include it in the GeoSpatial module and maybe also in Studio!

@lvca with the pleasure . i handle it asap . (may be in this week)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

bbourgois picture bbourgois  路  3Comments

akizze01 picture akizze01  路  3Comments

tglman picture tglman  路  5Comments

andreyvk picture andreyvk  路  4Comments

fpertl picture fpertl  路  4Comments