Electricitymap-contrib: Australia AUS 🇦🇺 returning data from the future - switch data source?

Created on 12 Mar 2018  ·  15 Comments  ·  Source: tmrowco/electricitymap-contrib

Seen constantly over about 2 days for every zone. Flagged by quality.py.
Logger output:

Traceback (most recent call last):
  File "feeder_electricity.py", line 157, in fetch_production
    validate_production(obj, country_code)
  File "/home/electricitymap/parsers/lib/quality.py", line 40, in validate_production
    raise Exception("Data from %s can't be in the future" % zone_key)
Exception: Data from AUS-SA can't be in the future
bug 🐞 help wanted parser


All 15 comments

Logs in Kibana:
https://kibana.electricitymap.org/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-12h,mode:quick,to:now))&_a=(columns:!(_source),index:'93e631f0-245f-11e8-a779-9d01de8d7a71',interval:auto,query:(language:lucene,query:'"data%20from%20aus"'),sort:!('@timestamp',desc))

We really need to see what time it's returning to understand this. Here is the line in quality.py that is testing the datetime key https://github.com/tmrowco/electricitymap/blob/master/parsers/lib/quality.py#L39
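The check in quality.py essentially boils down to comparing the parsed timestamp against the wall clock. A minimal sketch of that kind of validation — `validate_not_future` is a hypothetical name for illustration, not the actual function in the repo:

```python
import datetime

def validate_not_future(datapoint, zone_key):
    """Raise if the datapoint's timestamp lies ahead of the wall clock."""
    data_time = datapoint['datetime']
    now = datetime.datetime.now(datetime.timezone.utc)
    if data_time > now:
        raise ValueError(
            "Data from %s can't be in the future, data was %s, now is %s"
            % (zone_key, data_time.isoformat(), now.isoformat()))
    return datapoint
```

Note that a timezone-aware comparison like this is exactly why a source stamping rows a few minutes ahead trips the check, regardless of the zone's UTC offset.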

On a quick read of those logs, the error is happening approximately every 13 minutes, or at integer multiples of 13 minutes.

There is some weirdness here.

I've been running the parser trying to catch it, and right around the 5-minute mark the data returned is unstable, as if it comes from two load-balanced servers, one of which updates faster than the other:

(venv3)jarek@x1:~/projects/electricitymap/parsers$ date && python AU.py && date
Tue Mar 20 20:44:57 CET 2018
fetch_production("AUS-NSW") ->
{'capacity': {...clipped...}, 'production': {...clipped...}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 40, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-NSW', 'storage': {}}
fetch_production("AUS-QLD") ->
{'capacity': {...clipped...}, 'production': {...clipped...}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 45, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-QLD', 'storage': {}}
fetch_production("AUS-SA") ->
{'capacity': {...clipped...}, 'production': {...clipped...}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 40, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-SA', 'storage': {'battery': -0.0}}
fetch_production("AUS-TAS") ->
{'capacity': {...clipped...}, 'production': {...clipped...}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 45, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-TAS', 'storage': {}}
fetch_production("AUS-VIC") ->
{'capacity': {...clipped...}, 'production': {...clipped...}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 40, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-VIC', 'storage': {}}
Tue Mar 20 20:45:12 CET 2018

AUS-VIC and AUS-SA had data timestamped 5:40, even though they were requested _after_ zones that had data timestamped for 5:45. All zones are coming from the same CSV URL.

Caught some failures, too (I added some more info to the quality logger to help):

(venv3)jarek@x1:~/projects/electricitymap/parsers$ date && python AU.py && date
Tue Mar 20 20:47:51 CET 2018
fetch_production("AUS-NSW") ->
{'storage': {}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 45, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-NSW', 'capacity': {...clipped...}, 'production': {...clipped...}}
fetch_production("AUS-QLD") ->
{'storage': {}, 'source': 'aremi.nationalmap.gov.au, pv-map.apvi.org.au', 'datetime': datetime.datetime(2018, 3, 21, 5, 45, tzinfo=tzoffset(None, 36000)), 'zoneKey': 'AUS-QLD', 'capacity': {...clipped...}, 'production': {...clipped...}}
fetch_production("AUS-SA") ->
Traceback (most recent call last):
  File "AU.py", line 575, in <module>
    print(fetch_production('AUS-SA'))
  File "AU.py", line 430, in fetch_production
    quality.validate_production(data, zone_key)
  File "/home/jarek/projects/electricitymap/parsers/lib/quality.py", line 47, in validate_production
    (zone_key, data_time, arrow.now()))
Exception: Data from AUS-SA can't be in the future, data was 2018-03-21T05:50:00+10:00, now is 2018-03-20T20:48:06.227934+01:00

(venv3)jarek@x1:~/projects/electricitymap/parsers$ date && python AU.py && date
Tue Mar 20 20:49:39 CET 2018
fetch_production("AUS-NSW") ->
Traceback (most recent call last):
  File "AU.py", line 571, in <module>
    print(fetch_production('AUS-NSW'))
  File "AU.py", line 430, in fetch_production
    quality.validate_production(data, zone_key)
  File "/home/jarek/projects/electricitymap/parsers/lib/quality.py", line 47, in validate_production
    (zone_key, data_time, arrow.now()))
Exception: Data from AUS-NSW can't be in the future, data was 2018-03-21T05:50:00+10:00, now is 2018-03-20T20:49:44.403172+01:00

(venv3)jarek@x1:~/projects/electricitymap/parsers$ date && python AU.py && date
Tue Mar 20 20:51:34 CET 2018
fetch_production("AUS-NSW") ->
Traceback (most recent call last):
  File "AU.py", line 571, in <module>
    print(fetch_production('AUS-NSW'))
  File "AU.py", line 430, in fetch_production
    quality.validate_production(data, zone_key)
  File "/home/jarek/projects/electricitymap/parsers/lib/quality.py", line 47, in validate_production
    (zone_key, data_time, arrow.now()))
Exception: Data from AUS-NSW can't be in the future, data was 2018-03-21T05:55:00+10:00, now is 2018-03-20T20:51:37.328255+01:00

At 05:51 I downloaded the CSV and _all_ timestamps were 05:55! There was nothing with 05:50.

@corradio how do we proceed here? It doesn't always happen, but sometimes the data is just too far ahead. Can we just subtract 5 minutes? We do that with the AUS exchanges, but that's a different data source.

That's a tough one. I think we need clarity on how the data is timestamped. Is there anywhere we can look up whether the timestamp marks the end or the start of the interval?
It seems like here it marks the end, but it would be nice to be sure.
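If the timestamp really does mark the end of a 5-minute settlement interval, shifting it back by the interval length would yield a start-of-interval timestamp, which can never lie in the future once the interval has begun. A sketch under that assumption — `interval_start` is a hypothetical helper, not existing parser code:

```python
import datetime

# Assumption: the source reports 5-minute settlement intervals,
# stamped with the *end* of each interval.
INTERVAL = datetime.timedelta(minutes=5)

def interval_start(data_time, interval=INTERVAL):
    """Shift an end-of-interval timestamp back to the interval start."""
    return data_time - interval

# e.g. a row stamped 05:50 AEST (interval end) -> 05:45 (interval start)
aest = datetime.timezone(datetime.timedelta(hours=10))
end = datetime.datetime(2018, 3, 21, 5, 50, tzinfo=aest)
start = interval_start(end)
```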

Possible different source for AUS data, from systemcatch's comment in https://github.com/tmrowco/electricitymap/issues/799#issuecomment-372155243 :

There is the api by global roam https://ausrealtimefueltype.global-roam.com/api/SeriesSnapshot?time= which provides data for http://reneweconomy.com.au/nem-watch/. We tried using it for the tesla battery but changed to another source after a while. It's 5 min aggregated by type in json format. I think you can specify historical timestamps (no luck yet).

I already see that that endpoint also returns dates 4-5 minutes into the future. But we could try investigating if it's safe to subtract 5 minutes, and if the source is trustworthy, comparable to the data we're using now, and/or more reliable.
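An alternative to blindly subtracting 5 minutes would be to tolerate a small lead at validation time, treating anything up to one interval ahead as an interval-labelling quirk rather than bad data. A sketch — `acceptable` and the grace period are hypothetical, not existing quality.py behaviour:

```python
import datetime

# Assumption: one settlement interval (5 min) is a reasonable grace period.
GRACE = datetime.timedelta(minutes=5)

def acceptable(data_time, now, grace=GRACE):
    """Accept timestamps at most `grace` ahead of `now`, rejecting
    anything further into the future."""
    return data_time <= now + grace

aest = datetime.timezone(datetime.timedelta(hours=10))
now = datetime.datetime(2018, 3, 21, 5, 48, tzinfo=aest)
ok = acceptable(datetime.datetime(2018, 3, 21, 5, 50, tzinfo=aest), now)   # 2 min ahead
bad = acceptable(datetime.datetime(2018, 3, 21, 6, 0, tzinfo=aest), now)   # 12 min ahead
```

This would stop the parser rejecting datapoints that are only a couple of minutes early, while still catching genuinely bogus timestamps.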

Further possible different source, from @alixunderplatz 's comment in Slack: http://opennem.org.au/#/all-regions - appears to have 5 minute granularity.

Currently, particularly for South Australia, the data we show in EM is much more "jumpy" - though that might also be caused by us rejecting lots of datapoints because they're 2 minutes too early.

That data source looks nice. Doesn't seem to be returning data from the future, split by type so we don't have to rely on plant mapping and you can query by region. Also price data by type of fuel?

One thing that could make it a bit more complicated with this data source is the exports and imports, which are only given as total values (maybe even as a "net exchange balance" per state?).
However, this only affects NSW and VIC, and could probably be solved with some logic, because QLD, SA and TAS are the edges of the system, with only one direction of flow at a time.
To avoid this, we'd have to keep the current source for exchanges, look at openNEM's own data sources, or ask the creators if they could easily include the breakdown :)

@systemcatch price data by type of fuel is probably a volume-weighted average price, based on how much electricity each type generated at each timestamp during the given period.

Hey guys. Would it help if we switched to OpenNEM?
See https://opennem.org.au/#/regions/qld
They have a neat API that returns JSON: https://data.opennem.org.au/power/qld1.json
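A minimal sketch of hitting that endpoint, assuming the URL pattern generalizes across NEM region ids (the payload layout is an assumption and should be inspected before writing a parser against it):

```python
import json
import urllib.request

BASE_URL = 'https://data.opennem.org.au/power/%s.json'

def region_url(region):
    """Build the OpenNEM power-series URL for a NEM region id."""
    return BASE_URL % region

def fetch_opennem(region='qld1'):
    """Fetch the raw JSON payload for one region (network call;
    inspect the structure before relying on any particular key)."""
    with urllib.request.urlopen(region_url(region), timeout=30) as resp:
        return json.load(resp)
```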

OpenNEM have a python repo https://github.com/opennem/opennempy that we could use to work with the data returned by the web api.

One possibility is that instead of switching the data source entirely we create some new zones that don't display on the map (e.g. AUS-NSW-ON), write a new parser to work with this data (should be simple) and do a comparison to see which is better. Then we can make an informed choice on what source to use.

That could work (@oli?) but just to be sure:

  • Are we still experiencing issues with the current Australia parsers?
  • Can their API go back in time, so we can harmonize historical datasets if we switch to it?
  • The net cross-border flow is a bit of an enigma. We could maybe model the flow between two regions so as to minimize loop flows, right? But will it be the same?

Let's figure out if it's worth it!

You mentioned the wrong oli (I'm @corradio). I don't think we should create other zones, but we can always have a cron job running somewhere to collect the data and compare.
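That collect-and-compare job could be as simple as a cron task appending each source's datapoints to per-source files for offline diffing. A hypothetical sketch, not existing infrastructure:

```python
import json
import os

def record(source_name, datapoint, directory):
    """Append one datapoint to a per-source JSON-lines file, so two
    feeds covering the same zone can be diffed offline later."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, '%s.jsonl' % source_name)
    with open(path, 'a') as fh:
        # default=str serializes datetime objects as ISO-ish strings
        fh.write(json.dumps(datapoint, default=str) + '\n')
    return path
```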


Any news here?
I've seen https://github.com/opennem/opennempy, which could be very helpful.
Their dashboard looks nice:
https://opennem.org.au/#/all-regions
