Orientdb: Old Parser Deprecation for 3.0

Created on 12 Jun 2016 · 7 Comments · Source: orientechnologies/orientdb

I was wondering if there are plans to deprecate the "non-strict" parsing mode some time in the future (perhaps in 3.0). As I have worked on the grammar and parsers, I have noticed that we are essentially parsing twice now. Clearly this is inefficient, since we now parse, then generate a string, then parse again. However, efficiency is not my main concern; the maintainability of the code is.

The JavaCC parsing is very clean and robust. We get nice clean objects that come out and can be used to execute commands. I find the old style parsing to be very tricky and buggy. There is a lot of technical debt in the old parser.

I was wondering if you have considered breaking backwards compatibility with the old style of parsing in 3.0 and migrating to using only the new parser. This would allow us to clean up and remove a lot of code.

This could be considered either a question or a feature enhancement, so I wasn't sure where to post.

Thanks!

question

All 7 comments

Hi @mmacfadden

I'll give you a precise answer: YES! 😄

The old parser is probably the oldest component in OrientDB. Its first version was far smaller and less complex than the current one; it evolved in an unstructured way, and the result is exactly what you described.
The same goes for the query executor (in fact, there is no strict separation between the parser and the executor).

What we will do in v 3.0 is:

  • drop the old parser and leave only the JavaCC based grammar
  • review the whole grammar to get rid of inconsistencies and to strictly define all the behaviors, especially where we know we have gaps (e.g. usage of LET variables and aliases in WHERE conditions, ORDER BY, GROUP BY, manipulation of embedded documents, review of tricky operators like expand() and distinct(), review of return types for statements, especially in batch scripts, and many, many more)
  • separate the parser from the executor
  • re-write the executor layer, logically separating the phases of

    • query tree optimization

    • execution planning

    • query execution

  • take into consideration important factors like multi-core and distributed architectures at query optimization time.
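To make the separation described in the list above a little more concrete, here is a minimal sketch of a parser result that carries no execution logic, followed by distinct optimization, planning, and execution phases. It is only an illustration under stated assumptions: every type and method name (`QueryPipelineSketch`, `ParsedQuery`, `Step`, `optimize`, `plan`, `execute`) is hypothetical and is not taken from the OrientDB code base, and rows are reduced to plain integers to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: every name here is hypothetical, not OrientDB API. */
final class QueryPipelineSketch {

    /** What a parser would hand over: plain data, no execution logic. */
    static final class ParsedQuery {
        final List<Predicate<Integer>> filters;

        ParsedQuery(List<Predicate<Integer>> filters) {
            this.filters = filters;
        }
    }

    /** A single executable step in a plan. */
    interface Step {
        List<Integer> apply(List<Integer> rows);
    }

    /** Phase 1, query tree optimization: fold all filters into one so rows are scanned once. */
    static ParsedQuery optimize(ParsedQuery query) {
        Predicate<Integer> merged = row -> true;
        for (Predicate<Integer> filter : query.filters) {
            merged = merged.and(filter);
        }
        List<Predicate<Integer>> single = new ArrayList<>();
        single.add(merged);
        return new ParsedQuery(single);
    }

    /** Phase 2, execution planning: turn the optimized tree into an ordered list of steps. */
    static List<Step> plan(ParsedQuery query) {
        List<Step> steps = new ArrayList<>();
        for (Predicate<Integer> filter : query.filters) {
            steps.add(rows -> {
                List<Integer> kept = new ArrayList<>();
                for (Integer row : rows) {
                    if (filter.test(row)) {
                        kept.add(row);
                    }
                }
                return kept;
            });
        }
        return steps;
    }

    /** Phase 3, query execution: run the planned steps in order over the source rows. */
    static List<Integer> execute(List<Step> steps, List<Integer> source) {
        List<Integer> current = source;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }
}
```

Because each phase only consumes the output of the previous one, the optimizer or planner could be replaced or tested on its own without touching the parser, which is the maintainability benefit discussed in this thread.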

Together with this, we will also review some low level APIs to have a better management of streaming result-sets.

The goals of this challenging review are:

  • SQL layer consistency
  • stability
  • performance (speed, but also ability to monitor and control execution plans)
  • code quality

The first side-effect of all this will be for sure the deletion of a lot of old code ;-)

V 2.2 was released a couple of weeks ago; until now we have concentrated on monitoring v 2.2 in the wild and fixing last-minute issues. In the next few days we will officially start the development of v 3.0.

Thanks

Luigi

Great. I would love to lend a hand when the time comes.

We're getting rid of some legacy things, and the SQL engine is one of the most important pieces we're rewriting from scratch. I can't wait for @luigidellaquila to make it happen.

@lvca I have been spending a lot of time with the new parser and I really like it. I think it will allow us to add new features more easily, make the language more consistent, and provide better error handling. Again, I am willing to help out!

In v3.0 the SQL engine will be much smarter about using indexes and lighter when executing queries. This is a major version, so we're going to need a lot of help on this ;-)

Sounds good! It would be great if we could start a discussion thread at some point in the future to work on design goals, so I can try to follow along with you guys!

Although I can't help with the programming side, I'll be supporting ODB too, as much as I can. I feel 3.0 will be THE breakthrough for ODB, because the accumulated learning of the past years will come together in 3.0. Fun times are ahead! :smile:

Scott
