Presto: Enable Dynamically registered functions

Created on 22 Dec 2017  ·  6Comments  ·  Source: prestodb/presto

Goals

  • Allow for coexistence of functions and types with similar names but different semantics (e.g., compatibility layer for other databases like Teradata, Oracle, MySQL, Vertica, PostgreSQL, etc)
  • Allow for tighter control of what functions are available in a given deployment
  • Make it possible to implement user-defined functions and types in the future

Features

  • Configurable/mountable function plugins

Considerations and open questions

  • Function should be resolved to a handle during analysis just like tables.
  • Function resolution should be transaction aware.
  • Function metadata can be fetch using a handle on the coordinator.
  • Function implementation can be retrieved using the handle on coordinator and workers.
  • Optimizers can lookup functions using fully a specified signature.
  • Function handles should be transported to worker without requiring a full rewrite of the plan IR.
  • SPI interface should be easy to implement.
  • Need to define migration strategy for re-packaging existing function library

TODOs

  • [x] Introduce proper function abstractions to engine (FunctionManager, FunctionHandle, etc). #11851 #12427 #12444 #12472 #12542 #12541

  • [x] Simplify function resolution

    • Extract resolution logic from FunctionRegistry.
    • Resolve functions using simpler metadata interface.
    • Move resolution to be inside analyzer (like table resolution)
  • [x] Simplify function instance usage

    • Move call convention resolution to be inside of FunctionManager

    • Callers should really only need to deal with conventions and method handles

  • [x] Clarify function resolution logic involving multiple function namespaces

    • Move existing function registration to system catalog

    • Add namespace abstraction

  • [ ] Stored SQL function support (ref: old PR with implementation of stored procedure #2702)

    • [x] CREATE FUNCTION statement #13254

    • [x] ALTER FUNCTION, DROP FUNCTION statement #13799 13412

    • [ ] GRANT for function access

    • [x] function namespace manager to manage stored SQL functions #13972

References

4.34 SQL-paths

An SQL-path is a list of one or more <schema name>s that determines the search order for one of the following:
— The subject routine of a <routine invocation> whose <routine name> does not contain a .
— The user-defined type when the <path-resolved user-defined type name> does not contain a <schema name>.

4.43.3 SQL-session properties

An SQL-session has an SQL-path that is used to effectively qualify unqualified <routine name>s [...]
The SQL-path is initially set to an implementation-de ned value, but can subsequently be changed by the successful execution of a <set path statement>.
The text defining the SQL-path can be referenced by using the <general value specification> CURRENT_PATH.

6.4 <value specification> and <target specification>

4) The declared type of CURRENT_USER, CURRENT_ROLE, SESSION_USER, SYSTEM_USER, CURRENT_CATALOG, CURRENT_SCHEMA, and CURRENT_PATH is character string. Whether the character string is fixed-length or variable-length, and its length if it is fixed-length or maximum length if it is variable-length, are implementation-de ned. The character set of the character string is SQL_IDENTIFIER. The declared type collation is the character set collation of SQL_IDENTIFIER, and the collation derivation is implicit.
12) The value specified by CURRENT_PATH is a <schema name list> where <catalog name>s are <delimited identifier>s and the <unqualified schema name>s are <delimited identifier>s. Each<schema name> is separated from the preceding <schema name> by a <comma> with no intervening <space>s. The schemas referenced in this <schema name list> are those referenced in the SQL-path of the current SQL-session context, in the order in which they appear in that SQL-path.

10.3 <path specification>

Function

Specify an order for searching for an SQL-invoked routine.

Format

<path specification> ::= PATH <schema name list>
<schema name list> ::= <schema name> [ { <comma> <schema name> }... ]
<schema name> ::= [ <catalog name> <period> ] <unqualified schema name>
<unqualified schema name> ::= <identifier>
<catalog name> ::= <identifier>

Syntax Rules

1) No two <schema name>s contained in <schema name list> shall be equivalent.

19.8 <set path statement>

Function

Set the SQL-path used to determine the subject routine of <routine invocation>s with unqualified <routine name>s in <preparable statement>s that are prepared in the current SQL-session by an <execute immediate statement> or a <prepare statement> and in <direct SQL statement>s that are invoked directly. The SQL-path remains the current SQL-path of the SQL-session until another SQL-path is successfully set.

Format

<set path statement> ::= SET <SQL-path characteristic>
<SQL-path characteristic> ::= PATH <value specification>

Syntax Rules

1) The declared type of the shall be a character string type.

Access Rules

None.

General Rules

1) Let S be <value specification> and let V be the character string that is the value of TRIM ( BOTH ' ' FROM S )
a) If V does not conform to the Format and Syntax Rules of a <schema name list>, then an exception condition is raised: invalid schema name list specification.
b) The SQL-path of the current SQL-session is set to V.
NOTE 724 — A <set path statement> that is executed between a <prepare statement> and an <execute statement> has no effect on the prepared statement.

Conformance Rules

1) Without Feature S071, “SQL paths in function and type name resolution”, Conforming SQL language shall not contain a <set path statement>.

Roadmap enhancement sql-function

Most helpful comment

This is possible correctly using JARs. Could you elaborate a bit? Do you mean a more holistic approach where:

Yes, functions in languages such as Javascript, Ruby, Python or even the SQL-specific language --- basically, anything that can be properly sandboxed. Efforts such as Truffle/Graal can make this easier (and performant). See CREATE FUNCTION and CREATE TYPE in the spec.

How does planner/optimizer refer to function names (for a rewrite), given that the fully->> qualified name depends on deployment configuration?

Could you give an example?

For example, the upcoming spatial join optimizations. Or any of the desugaring optimizers if we decide to place the built-in functions into a namespace (although, those might go into a fixed namespace that's guaranteed to exist, similar to "sys" or "information_schema").

All 6 comments

Make it possible to implement user-defined functions and types in the future

This is possible correctly using JARs. Could you elaborate a bit? Do you mean a more holistic approach where:

  1. User would be able to define function (which language?) in CLI/SQL
  2. Such function would be serialized and managed by Presto.
  3. It would be possible to remove such user defined function

How does planner/optimizer refer to function names (for a rewrite), given that the fully-qualified name depends on deployment configuration?

Could you give an example?

This is possible correctly using JARs. Could you elaborate a bit? Do you mean a more holistic approach where:

Yes, functions in languages such as Javascript, Ruby, Python or even the SQL-specific language --- basically, anything that can be properly sandboxed. Efforts such as Truffle/Graal can make this easier (and performant). See CREATE FUNCTION and CREATE TYPE in the spec.

How does planner/optimizer refer to function names (for a rewrite), given that the fully->> qualified name depends on deployment configuration?

Could you give an example?

For example, the upcoming spatial join optimizations. Or any of the desugaring optimizers if we decide to place the built-in functions into a namespace (although, those might go into a fixed namespace that's guaranteed to exist, similar to "sys" or "information_schema").

Efforts such as Truffle/Graal can make this easier (and performant). See CREATE FUNCTION and CREATE TYPE in the spec.

I didn't expect that, but it seems that there is serious effort to support R and Python: http://gigasquidsoftware.com/blog/2017/10/22/embedded-interop-between-clojure-r-and-python-with-graalvm/.

Do you plan to embrace Truffle/Graal in the next year or so?

functions in languages such as Javascript, Ruby, Python or even the SQL-specific language

I like that!

Do you plan to embrace Truffle/Graal in the next year or so?

I'd like to start experimenting with it, but it's unlikely we'll support it for production workloads, yet. The first step is to get on Java 9.

Was this page helpful?
0 / 5 - 0 ratings