Hi,
I was thinking about implementing target_datetime in some of the parsers, and came up with the following questions. If you let me know what you prefer, I can update the comments in the example parser and the README to match :+1:
- target_datetime doesn't exactly match available data?
- closest_in_time_key logic that would do the former - should that be the guideline?
- target_datetime parameter. Is this required, or is it fine to return a list normally but a dict when called with target_datetime?
Thanks a lot jarek for your question. We haven't finished the specifications, but maybe your input could help, and your questions definitely do. I'm currently changing the ENTSOE.py code, so that's not a reference.
In general, since there are many parsers and only one function to launch them all (let's call it launch_parser), as much of the logic as possible should live in launch_parser so that the parsers can be as simple as possible.
Ideally, the parser should return as many values as possible (but keeping it simple, only one query) starting from the target_datetime. If with a single query the parser can get data for a whole day (three days / 5 hours ...), it should return the data for the whole day (three days / 5 hours ...).
If the parser only fetches a single datapoint, it should return a single datapoint.
We will keep a hard-coded record of the timespan each parser can fetch. The idea is that if we're missing a week, we'll launch the parser once for every day if it can return for a whole day, or once for every hour if it can handle only a single datapoint.
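A rough sketch of that backfill idea, purely for illustration. The PARSER_TIMESPANS table, the parser names in it, and the backfill_targets helper are all hypothetical and do not exist in the codebase; the sketch only shows how a per-parser timespan could decide how many times a parser gets launched.

```python
import datetime

# Hypothetical hard-coded record of how much data one call to each parser covers.
PARSER_TIMESPANS = {
    "DAILY_PARSER": datetime.timedelta(days=1),
    "HOURLY_PARSER": datetime.timedelta(hours=1),
}


def backfill_targets(parser_name, start, end):
    """Return one target_datetime per parser timespan needed to cover [start, end)."""
    step = PARSER_TIMESPANS[parser_name]
    targets = []
    current = start
    while current < end:
        targets.append(current)
        current += step
    return targets


# A missing week: 7 launches for a day-sized parser, 168 for an hour-sized one.
week_start = datetime.datetime(2018, 1, 1)
week_end = week_start + datetime.timedelta(days=7)
daily = backfill_targets("DAILY_PARSER", week_start, week_end)
hourly = backfill_targets("HOURLY_PARSER", week_start, week_end)
print(len(daily), len(hourly))  # prints "7 168"
```

Whether each target marks the start or the end of the span a parser covers is left open here; only the number of launches is illustrated.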
If the parser cannot fetch for the required datetime (whatever the reason), it should return None.
So, regarding your questions more specifically:
- target_datetime (whatever the time difference between the two, launch_parser will throw away values that are too far from the requested datetime). If two datapoints are equally close, return either or both. When returning the datapoint, the datetime value should correspond to the datapoint's datetime, not the requested datetime.
- target_datetime if you can only get data for the last 24 hours. In that case, always returning data for the last 24 hours, whatever the (non-None) target_datetime, is perfectly valid (launch_parser will throw away the values too far from the target_datetime).
This may not be super clear, so don't hesitate to ask if something's not clear or if you feel we can do something easier / better.
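To make the "launch_parser will throw away values that are too far" part concrete, here is a minimal sketch of what such a filter could look like. The keep_close_datapoints name, the max_distance threshold and the "datetime" key on each datapoint are assumptions made for the example, not the actual launch_parser implementation.

```python
import datetime


def keep_close_datapoints(datapoints, target_datetime, max_distance):
    """Drop datapoints further than max_distance from target_datetime.

    Accepts None, a single dict or a list of dicts, since a parser may
    return any of those. Each dict is assumed to carry a "datetime" key.
    """
    if datapoints is None:
        return []
    if isinstance(datapoints, dict):
        datapoints = [datapoints]
    return [
        point for point in datapoints
        if abs(point["datetime"] - target_datetime) <= max_distance
    ]


target = datetime.datetime(2018, 1, 2, 12, 0)
points = [
    {"datetime": datetime.datetime(2018, 1, 2, 11, 30), "value": 1},
    {"datetime": datetime.datetime(2018, 1, 1, 12, 0), "value": 2},
]
kept = keep_close_datapoints(points, target, datetime.timedelta(hours=1))
print(len(kept))  # prints "1": only the 11:30 datapoint is within one hour
```

With the filtering done in one place, a parser can return whatever its single query yields and keep each datapoint's own datetime.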
@corradio I believe it's what we talked about, don't hesitate to react if it's not or if something wasn't clear
I think that pretty much sums it up. Thanks @maxbellec !
I might add that in this iteration we're trying to stay as agile as possible and so we're optimising for simplicity rather than future scalability.
With that in mind, we might want to add more information to parsers themselves in the future to optimise things further - but for now, we're keeping it simple.
Okay, thanks!
To try to summarize:
- target_datetime, with guideline being the amount of data returned in one HTTP request by source API
I think that makes sense - certainly it does for now.
We talked about it again with @corradio. @jarek I'll steal your summary and add:
- target_datetime means the datetime of the latest data the parser will return. So if the parser returns data for 24 hours, it should return data from 24 hours before target_datetime until target_datetime. The idea is that live data can be handled by simply doing target_datetime=datetime.datetime.now()
- target_datetime, with guideline being the amount of data returned in one HTTP request by source API
I'll adapt example.py accordingly
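A small sketch of that window definition, under stated assumptions: fetch_window is a hypothetical helper, the 24-hour default is just the example from above, and treating a missing target_datetime as "now" is one possible reading of the live-data case.

```python
import datetime


def fetch_window(target_datetime=None, timespan=datetime.timedelta(hours=24)):
    """Return (start, end) of the window a parser would cover.

    target_datetime is the latest datetime the parser returns data for;
    None is treated here as a live request, i.e. datetime.datetime.now().
    """
    if target_datetime is None:
        target_datetime = datetime.datetime.now()
    return target_datetime - timespan, target_datetime


start, end = fetch_window(datetime.datetime(2018, 1, 2, 12, 0))
print(start, end)  # prints "2018-01-01 12:00:00 2018-01-02 12:00:00"
```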
This looks fine now after #1237, I'll close it. Thanks!