We can discuss Oracle filters here:
Initial discussion:
https://github.com/neo-project/neo/pull/1243#discussion_r347913873
Shall we use filters in the oracle syscall?
(Main topic discussion Syscall and ApplicationEngine https://github.com/neo-project/neo/issues/1275)
My opinion:
Yes, the oracle must do this processing. They will be paid for it, it will optimize the queries and save space, and it could also allow consuming third-party services.
@neo-project/core What do you think? I need your opinion.
These are my pros and cons.
Pros:
Cons (one, but a big one):
As for filtering, we have the JSON API, so the contract itself can easily filter JSON.
From: https://github.com/neo-project/neo/pull/1243#discussion_r348284365
Yes, if the developer needs the complete JSON, they can get it with an empty filter. But it will usually be faster with a filter.
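To illustrate the difference, here is a minimal sketch in plain C# (not Neo code; the single-field lookup stands in for a real JSONPath expression such as $.name):

```csharp
using System;
using System.Text.Json;

class FilterSizeExample
{
    static void Main()
    {
        // Hypothetical response downloaded by the oracle.
        string response = "{\"name\":\"NEO\",\"time\":1231312124124124,\"source\":\"example\"}";

        // Empty filter: the complete JSON ends up on-chain.
        string unfiltered = response;

        // With a filter (stand-in for a JSONPath expression such as $.name),
        // only the useful part is kept.
        using JsonDocument doc = JsonDocument.Parse(response);
        string filtered = doc.RootElement.GetProperty("name").GetString() ?? "";

        Console.WriteLine($"without filter: {unfiltered.Length} bytes");
        Console.WriteLine($"with filter:    {filtered.Length} bytes (\"{filtered}\")");
    }
}
```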
This is an interesting discussion. Initially, oracle was just supposed to use url and dump data on tx headers (signed by oracle). The size I'm imagining is not megabytes, but bytes or kilobytes at maximum. If we stick to this scenario, oracle filter is not too important. But I agree it may reduce chain space, if only part of data is useful.
Filtering by JSON or regex is already needed by other smart contracts, so it should be in the general syscall section. What we discuss here is whether oracles should do other processing besides data download, right? For example, they could filter, count, and in the worst case, perform loops... all of this should be paid, but the advantage is that the output of this operation would be "data", correct?
Shouldn't we consider the creation of an OracleTrigger execution mode, and attach that "oracle script" as a tx header attribute? It would execute DownloadData only at this trigger, and this way we could decide which other operations are valid for oracles, and the output of this trigger would be a bytearray of data, to be appended to the tx.
This logic could allow other things, such as specific certificate validation, etc... In this case, it would not be just data filtering, but data preprocessing.
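A rough sketch of that flow, just to make the idea concrete (everything here is hypothetical; OracleTrigger, the download step and the preprocessing step are only stubs):

```csharp
using System;
using System.Text;

// Hypothetical sketch of an "OracleTrigger" flow; none of these names exist in Neo today.
class OracleTriggerSketch
{
    // Stand-in for the data download that would only be allowed at OracleTrigger.
    static byte[] DownloadData(string url) =>
        Encoding.UTF8.GetBytes("{\"name\":\"NEO\",\"time\":1231312124124124}");

    // Stand-in for the preprocessing defined by the "oracle script" in the tx attribute.
    // Here it simply keeps the first field, but it could filter, count, validate, etc.
    static byte[] Preprocess(byte[] raw, string oracleScript)
    {
        string text = Encoding.UTF8.GetString(raw);
        int comma = text.IndexOf(',');
        return Encoding.UTF8.GetBytes(text.Substring(1, comma - 1));
    }

    static void Main()
    {
        byte[] raw = DownloadData("https://example.org/price");
        byte[] output = Preprocess(raw, "$.name"); // output appended to the tx as "data"
        Console.WriteLine(Encoding.UTF8.GetString(output));
    }
}
```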
The Oracle protocol should be considered a generic mechanism to interact with external systems. However, introducing complex data preprocessing instead of simple predictable filtering could be overkill. =)
@igormcoelho proposed creating a syscall for filtering. It's not a bad idea, but this is a different thing, because if you don't have the filter inside the oracle, you can't extract this information and agree on the name between different nodes.
{"name":"NEO","time":1231312124124124}
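A minimal sketch (plain C#, not Neo code) of why the filter has to live inside the oracle processing: every node applies the same declarative filter to the same response and therefore produces exactly the same bytes, which they can then agree on and sign:

```csharp
using System;
using System.Linq;
using System.Text.Json;

class OracleAgreementSketch
{
    // Stand-in for a JSONPath evaluation restricted to a single property name.
    static string ApplyFilter(string json, string property)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty(property).GetRawText();
    }

    static void Main()
    {
        string response = "{\"name\":\"NEO\",\"time\":1231312124124124}";
        string filter = "name"; // agreed filter, part of the oracle request

        // Each oracle node applies the same filter independently...
        var results = Enumerable.Range(0, 3).Select(_ => ApplyFilter(response, filter));

        // ...and they all obtain identical output to agree on.
        Console.WriteLine(string.Join(" == ", results)); // "NEO" == "NEO" == "NEO"
    }
}
```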
So, in the end, it's like @erikzhang said: we will need a standard/spec for oracle access, because these filters, if integrated directly into oracle processing (without explicit syscalls configurable by the user), will require a syntax for users to use them.
What's the proposed syntax/language? Filter by field? Filter by regex? What kinds of fields are eligible for filtering... text only? Are there numeric filters as well?
I know we cannot solve all the world's problems here, but adding a feature which is too constrained and embedded into the protocol will certainly make people want to extend it in the future, messing with the oracle protocol.
@realloc I don't think it's overkill; it's not bad to use NeoVM there, because it's not just a general oracle, but NeoOracle. It's also coherent with efficient map-reduce techniques, where local reduction/preprocessing implies much less data transit and storage for the global system.
@igormcoelho There are already standards for filtering. For JSON it should be JSONPath and XPath for XML. For plain text a safe subset of regex may be used to keep things simple and predictable for a start.
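To make the three options concrete, a small sketch of what each kind of filter could look like (the expressions are only examples, and Newtonsoft.Json is used here purely as a stand-in for whatever JSONPath implementation the oracle would ship):

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.XPath;
using Newtonsoft.Json.Linq;

class FilterStandardsSketch
{
    static void Main()
    {
        // JSONPath for JSON responses.
        string json = "{\"name\":\"NEO\",\"time\":1231312124124124}";
        string byJsonPath = (string)JObject.Parse(json).SelectToken("$.name");

        // XPath for XML responses.
        var xml = new XPathDocument(new StringReader("<asset><name>NEO</name></asset>"));
        string byXPath = xml.CreateNavigator().SelectSingleNode("/asset/name").Value;

        // A regex with an execution timeout for plain text, to keep it predictable.
        Match m = Regex.Match("name=NEO;time=123", @"name=(\w+)",
                              RegexOptions.None, TimeSpan.FromMilliseconds(100));
        string byRegex = m.Groups[1].Value;

        Console.WriteLine($"{byJsonPath} / {byXPath} / {byRegex}");
    }
}
```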
Having NeoVM bytecode for complex data preprocessing may introduce unpredictable execution complexity and will require adding a GAS limitation on the execution of filters, and eventually we may find one more Neo network there. Maybe it would be better to keep Oracles with simple filters and leave that complex preprocessing to a future version of Neo, when it would have sharding features.
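For illustration only, one possible way to bound the cost of a filter is to charge GAS up front for the input it has to scan; the names and numbers below are invented, and the "filter" is a toy:

```csharp
using System;

// Hypothetical sketch of GAS-metered filter execution; not an existing Neo mechanism.
class MeteredFilterSketch
{
    // Charges a fixed amount of GAS per input byte and aborts when the
    // budget is exceeded, so filter cost stays bounded and predictable.
    static string RunFilter(string input, Func<string, string> filter,
                            long gasLimit, long gasPerByte = 10)
    {
        long cost = (long)input.Length * gasPerByte;
        if (cost > gasLimit)
            throw new InvalidOperationException("filter exceeds GAS limit");
        return filter(input);
    }

    static void Main()
    {
        string json = "{\"name\":\"NEO\",\"time\":1231312124124124}";
        // Toy "filter": keep everything before the first comma.
        string result = RunFilter(json, s => s.Substring(1, s.IndexOf(',') - 1),
                                  gasLimit: 10_000);
        Console.WriteLine(result); // "name":"NEO"
    }
}
```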
I know there are standards for JSON and XML, and I know both JSONPath and XPath. Yet the question is: are they enough for the purposes of oracle filtering? If so, how do we represent both in a unified manner? In another thread we discuss allowed data formats; this supposes we will adopt only JSON, XML and text, but not binary or pure HTML, which is the standard for most of the existing web. Will we leave space to extend this in the future? Will we be able to easily extend this for new formats?
Maybe we should discuss the data format first.
I think we should keep this as simple as possible. JSON/JSONPath is enough for the first implementation. I think we should focus on that only now.
As we decided in the last conversations, the first filter we are going to implement is JSONPath.
An idea that @igormcoelho highlighted is the possibility of implementing oracle filters as a smart contract.
Advantages:
Disadvantages:
We can use a contract hash for filters, we can create a syscall for Filter.Json.XPath, and we can also create a native contract that uses this syscall by default.
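A rough sketch of how that could fit together; only the Filter.Json.XPath name comes from the comment above, the interface and dispatch below are hypothetical and the filter bodies are stubs:

```csharp
using System;

// Hypothetical sketch: an oracle request either names a contract hash whose code
// implements the filter, or falls back to a default native filter that would wrap
// a built-in syscall such as the proposed Filter.Json.XPath.
interface IOracleFilter
{
    byte[] Filter(byte[] data, string expression);
}

class NativeJsonPathFilter : IOracleFilter
{
    // Stub; a real implementation would apply the JSONPath expression.
    public byte[] Filter(byte[] data, string expression) => data;
}

class UserContractFilter : IOracleFilter
{
    private readonly byte[] contractHash;
    public UserContractFilter(byte[] contractHash) => this.contractHash = contractHash;

    // Stub; a real implementation would invoke the contract identified by contractHash.
    public byte[] Filter(byte[] data, string expression) => data;
}

class OracleFilterDispatch
{
    // If the request names a contract hash, use the user's filter;
    // otherwise fall back to the default native implementation.
    static IOracleFilter Resolve(byte[] contractHash) =>
        contractHash is null ? (IOracleFilter)new NativeJsonPathFilter()
                             : new UserContractFilter(contractHash);

    static void Main()
    {
        IOracleFilter filter = Resolve(null);
        byte[] output = filter.Filter(new byte[] { 0x7b, 0x7d }, "$.name"); // "{}"
        Console.WriteLine(output.Length);
    }
}
```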
If the filter is a native contract, the advantage of a single implementation between all systems disappears.
There would only be the advantage that the filter can be used outside of the Oracle, or does it have any more advantages?
If the filter is a native contract, the advantage of a single implementation between all systems disappears.
But you have the other advantage: users can create their own filters.