StreetComplete: Task: Script that imports oneway-data dump and makes it available via a web API

Created on 14 Apr 2018 · 20 Comments · Source: westnordost/StreetComplete

I could use some help with something that is not part of this app (= not Java code). Anyone interested?

Introduction

Amongst other things, Telenav collects and aggregates data about the traffic flow direction of likely oneway roads that have not been tagged as oneway=yes in OSM yet. This data is available as a daily worldwide dump on http://missingroads.skobbler.net/dumps/OneWays/

This data can be used for the oneway-quest (#370). However, it must be made available to the app in the form of a web API that can run on my webspace (Python, PHP, Ruby, Perl, perhaps more).

Mission

Your task is to write a script that

  1. downloads and imports the OneWays data dump into a MySQL database daily
  2. makes available a web API to be queried by StreetComplete to get the data within a certain bounding box

In detail:

1. Download and Import

  • The SQL table into which the data is imported needs to have the following columns: wayId, fromNodeId, toNodeId, latitude, longitude. Latitude and longitude should be the centroid of the given LINESTRING geometry. Only rows with a status of OPEN should be imported. Whether numberOfTrips should play a role there is TBD.
  • On each import, the previous data must be completely overwritten with the new data. During import, it should be avoided that queries happening at the same time end in an error or spit out wrong data. (Possible solution: import the new data into a new SQL table, after that delete the old table and move the new table into its place)
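The import-then-swap idea from the list above can be sketched as follows. This is only a minimal illustration, not the implementation that was eventually written: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table names `oneways` and `oneways_new` are hypothetical, and it assumes each input row is a dict that already carries a precomputed `latitude`/`longitude` plus a `status` field (the real dump format is not described in this issue).

```python
import sqlite3

# Column layout taken from the issue text; the table names are hypothetical.
COLUMNS = ("wayId INTEGER, fromNodeId INTEGER, toNodeId INTEGER, "
           "latitude REAL, longitude REAL")


def import_dump(conn, rows):
    """Load OPEN rows into a staging table, then swap it in for the live table."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS oneways_new")
    cur.execute(f"CREATE TABLE oneways_new ({COLUMNS})")
    cur.executemany(
        "INSERT INTO oneways_new VALUES (?, ?, ?, ?, ?)",
        [
            (r["wayId"], r["fromNodeId"], r["toNodeId"],
             r["latitude"], r["longitude"])
            for r in rows
            if r["status"] == "OPEN"  # only OPEN rows are imported
        ],
    )
    # Replace the live table only after the staging table is fully populated.
    cur.execute("DROP TABLE IF EXISTS oneways")
    cur.execute("ALTER TABLE oneways_new RENAME TO oneways")
    conn.commit()
```

On MySQL, the swap could instead use a single statement of the form `RENAME TABLE oneways TO oneways_old, oneways_new TO oneways`, which renames both tables atomically, followed by dropping the old table.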

2. Web-API

  • If the query contains an invalid bounding box, an HTTP error should be returned
  • On success, a query should ideally return a JSON document like this. If there is no data in the given bounding box, the segments array should simply be empty. It can also be CSV instead of JSON if you feel this would make more sense.
{
  "segments":[
    {"wayId":1, "fromNodeId":7, "toNodeId":8},
    {"wayId":1, "fromNodeId":10, "toNodeId":12},
    {"wayId":2, "fromNodeId":23, "toNodeId":42}
  ]
}
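The two API requirements above (reject an invalid bounding box, return a `segments` array that may be empty) could look roughly like this in Python. The function names are hypothetical, and the parameter order `minLon,minLat,maxLon,maxLat` is an assumption inferred from the example URLs later in this thread. A caller would translate the `ValueError` into an HTTP error response; the issue does not specify which status code to use.

```python
import json


def parse_bbox(text):
    """Parse 'minLon,minLat,maxLon,maxLat'; raise ValueError if it is invalid."""
    parts = text.split(",")
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four comma-separated numbers")
    # float() raises ValueError itself for non-numeric input.
    min_lon, min_lat, max_lon, max_lat = (float(p) for p in parts)
    # NaN fails these chained comparisons, so it is rejected as well.
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError("longitudes out of range or in the wrong order")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes out of range or in the wrong order")
    return min_lon, min_lat, max_lon, max_lat


def segments_response(rows):
    """Build the JSON body from (wayId, fromNodeId, toNodeId) tuples."""
    segments = [
        {"wayId": way, "fromNodeId": from_node, "toNodeId": to_node}
        for way, from_node, to_node in rows
    ]
    return json.dumps({"segments": segments})
```

With no matching rows, `segments_response([])` yields a document whose `segments` array is empty, as requested above.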

Resources

Distance between two (on assumed spherical Earth) geo-points in meters, necessary for centroid calculation:

// see https://en.wikipedia.org/wiki/Earth_radius#Mean_radius
final double EARTH_RADIUS = 6371000; // m
// see https://en.wikipedia.org/wiki/Great-circle_navigation#cite_note-2
// φ = latitude, λ = longitude, both in radians
double distanceInMeters(double φ1, double λ1, double φ2, double λ2)
{
    double Δλ = λ2 - λ1;

    double a = Math.cos(φ2) * Math.sin(Δλ);
    double b = Math.cos(φ1) * Math.sin(φ2) - Math.sin(φ1) * Math.cos(φ2) * Math.cos(Δλ);
    double y = Math.sqrt(a * a + b * b);
    double x = Math.sin(φ1) * Math.sin(φ2) + Math.cos(φ1) * Math.cos(φ2) * Math.cos(Δλ);
    return EARTH_RADIUS * Math.atan2(y, x);
}
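One way to use this distance function for the centroid is sketched below in Python. It interprets "centroid" as the point halfway along the line, which is an assumption; the issue does not define the term further. The function names are hypothetical, inputs are in degrees (converted to radians internally), and the position inside a segment is linearly interpolated in latitude/longitude, which is only an approximation but should be adequate for short road segments.

```python
import math

EARTH_RADIUS = 6371000  # m, mean Earth radius


def distance_in_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dl = l2 - l1
    y = math.hypot(
        math.cos(p2) * math.sin(dl),
        math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl),
    )
    x = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dl)
    return EARTH_RADIUS * math.atan2(y, x)


def line_midpoint(points):
    """Return the (lat, lon) halfway along a polyline of (lat, lon) tuples."""
    if not points:
        raise ValueError("need at least one point")
    lengths = [distance_in_meters(*a, *b) for a, b in zip(points, points[1:])]
    remaining = sum(lengths) / 2  # distance still to walk from the start
    for a, b, length in zip(points, points[1:], lengths):
        if length > 0 and remaining <= length:
            t = remaining / length  # fraction of this segment to walk
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return points[-1]  # single point or zero-length line
```

For example, `line_midpoint([(0, 0), (0, 1), (0, 3)])` walks past the first segment and lands a quarter of the way into the second one, at a longitude of about 1.5.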
help wanted

All 20 comments

Sounds like a cool project! Is there a programming language you prefer?

The only requirement is that it runs on my webspace, the language is the implementer's choice.

@ENT8R does this mean you picked this task?

Otherwise, three questions for @westnordost about the webspace capabilities:

  • Can you configure something like cron jobs? Populating the database should be independent from the web API, because triggering long running tasks from a web server might cause problems such as timeouts. I would separate both scripts.
  • Does your webspace support serving WSGI applications (for Python and others), or would it need to be a CGI interface?
  • Can your main webserver (Apache/nginx) be a reverse proxy for the API application? (Otherwise it would need to handle TLS and other stuff itself, which would be more cumbersome.)

(If you'd rather not answer everything in detail, feel free to point me to your hoster's documentation.)

@ENT8R does this mean you picked this task?

Actually, I started some experiments and can already generate a CSV file containing only the ways with the status OPEN in just 30 lines of code... So I will take a few more steps and then decide whether to see this through, but it should be relatively easy to do...

Can you configure something like cron jobs?

Yes

I can't answer the other questions and I wouldn't know how to find this out. The hoster is wint.global, managed with Plesk.

Wow, they have pretty bad documentation. At least I found next to nothing. If I did it in Python, I would ask for your assistance in finding out which deployment options they offer. However, since @ENT8R is doing it right now (and I assume with PHP), this is no longer important for me. Thanks anyway.

I am not sure if he is doing it; he still has a couple of other things open.

That was the reason why I asked for a commitment to the task. I don't like duplicating effort, and while the data crunching and web stuff is fun and pretty straightforward, I would need to read one or two Wikipedia articles to get that LINESTRING centroid right.

I prefer to let him do the steps he wanted to do, and wait for his decision. If he won't, I can give it a try. :)

I would need to read one or two Wikipedia articles to get that LINESTRING centroid right.

Here is a hint.
That said, the algorithm could be improved by not calculating the total length beforehand and instead iterating from the start and the end of the list at the same time.

I think I can show you a first working example in a few hours. I will let you know when the code is available on GitHub.

Alright. The very first version is now available on GitHub: https://github.com/ENT8R/oneway-data-api It is written entirely in PHP.
If you want to update the list, just call the endpoint /update.php
To get the data for a specific bounding box (e.g. Cape Town):
/get.php?bbox=18.3072,-34.3583,19.0053,-33.4713

The main problem currently is that the data is not imported into an SQL database but only saved in a file, which needs to be read every time a request is made. @exploide Do you have any experience with SQL databases?
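For illustration, once the data sits in an SQL table, a bounding-box request reduces to a single indexed query instead of a scan over a file. The sketch below again uses Python's sqlite3 as a stand-in for MySQL; the table name `oneways`, the index name, and the function names are hypothetical, while the columns are the ones listed in the task description.

```python
import sqlite3


def create_table(conn):
    """Create the (hypothetical) oneways table with an index for bbox lookups."""
    conn.execute(
        "CREATE TABLE oneways (wayId INTEGER, fromNodeId INTEGER, "
        "toNodeId INTEGER, latitude REAL, longitude REAL)"
    )
    # Lets the database narrow the search by latitude/longitude range.
    conn.execute("CREATE INDEX oneways_lat_lon ON oneways (latitude, longitude)")


def segments_in_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return (wayId, fromNodeId, toNodeId) rows whose centroid lies in the box."""
    cur = conn.execute(
        "SELECT wayId, fromNodeId, toNodeId FROM oneways "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ? "
        "ORDER BY wayId, fromNodeId",
        (min_lat, max_lat, min_lon, max_lon),
    )
    return cur.fetchall()
```

The values are passed as query parameters rather than concatenated into the SQL string, so a malformed bbox value cannot alter the query.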

Yes, I have. Feel free to tell me how I can help.

Maybe I have time to review later. Hopefully it's not too complex due to the external dependencies you made use of. My PHP is a bit rusty, but I will see :P

The first thing I immediately spot is that your unprotected /update endpoint could be abused to cause unnecessary load on the server. Additionally, if the update takes longer, e.g. because the data grows or database operations slow it down, it may fail due to HTTP timeouts. I still propose making this available offline only, triggered by a cron job.

endpoint might be used to cause unnecessary load on the server

So you want to introduce an API key? IMHO a good idea.

No, I want to have this offline xD Working on it now. If @ENT8R or @westnordost really want the update functionality online, then I'll stop and you can go with it, but I don't see why this would be useful. Who would have a legitimate reason to trigger this update from the outside? A simple cron job invoking the PHP script, and it's up to date every day...

Ah, yes, of course, it can be done internally by a cron job. Still, for external connections one could introduce an API key. That could be useful to limit the server load.

I just submitted a WIP PR there. Maybe it's good to continue the discussion there and in the other issue tracker, so fewer emails and notifications are sent to the main project here.

From my point of view, we now have a great working tool/API. If you (@westnordost) have some more suggestions, feel free to open an issue in the other repo.

Fine. @ENT8R, are there any other open issues you are aware of? Otherwise we can let @westnordost shoot a glance.

I think one could check whether either the geophp library or the custom geometry code could be removed, but I leave this up to you. Regarding the bounds checks, I only adapted what you did there to the MySQL query. The JSON looked like yours before, but a second round of sanity checking would be good.

After spending the last few years in the ORM world, raw SQL looks ugly to me, but for such a small page I decided not to use an additional ORM library and to keep the dependencies clean and simple.

So from my point of view, we are pretty much done?!

EDIT: ok, you commented just in the same moment :P Maybe update your testing site to the DB enabled version?!

I shot a glance

:shipit:
I think this project is finished. Thank you @ENT8R and thank you @exploide :-)

Everything seems to work now https://www.westnordost.de/streetcomplete/oneway-data-api/?bbox=18,-34,19,-33
