Issue for effort to support:
The advantage of cross-operator lazy pages is that we can avoid IO when queries are highly selective. This requires that significant processing happens in the source stage, which is increasingly the case with improvements like CBO ("broadcast joins") or grouped execution.
Stages are:
base PageProcessor on WorkProcessor
[ ] Stage 2
ScanFilterAndProject on WorkProcessor. The pipeline would look as follows:
split singleton -> [flatMap] -> pages source
-> [transform] -> page processor
-> [transform] -> merge pages
or, if the split is cursor based:
split singleton -> [flatMap] -> cursor source -> [transform] -> merge pages
FilterAndProject on WorkProcessor. The pipeline would look as follows:
page buffer -> [transform] -> page processor -> [transform] -> [merge pages]
WorkProcessor pipelines
WorkProcessor pipelines
WorkProcessors via dedicated "gluing" operator
TopNOperator on WorkProcessor pipelines (fast data exploration!)
provide base for further improvements (e.g. on-stack rows without Page materialization, Graal)
Can you give details about the "Graal" plans?
WorkProcessor provides a transformation method:
WorkProcessor#transform
Let's suppose that you have a chain of Page transformations, e.g.:
WorkProcessor<Page> processor1 = ...;
WorkProcessor<Page> processor2 = processor1.transform(transformation1);
WorkProcessor<Page> processor3 = processor2.transform(transformation2);
...
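To make the chain above concrete, here is a minimal sketch. It assumes a simplified, hypothetical `MiniProcessor` interface (not the real WorkProcessor API), models a page as a plain `int[]`, and uses two made-up transformations. It only illustrates that each `transform` step is lazy yet still materializes an intermediate page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// Hypothetical, simplified stand-in for WorkProcessor; not the real interface.
class ChainSketch {
    interface MiniProcessor<T> {
        // Returns the next element, or null once the input is exhausted.
        T next();

        // Wraps this processor so each element passes through fn lazily, on demand.
        default <R> MiniProcessor<R> transform(Function<T, R> fn) {
            MiniProcessor<T> source = this;
            return () -> {
                T element = source.next();
                return element == null ? null : fn.apply(element);
            };
        }

        static <T> MiniProcessor<T> fromIterator(Iterator<T> iterator) {
            return () -> iterator.hasNext() ? iterator.next() : null;
        }
    }

    // A "page" is modelled as a single int channel; every call allocates a new page.
    static int[] mapPage(int[] page, IntUnaryOperator op) {
        return Arrays.stream(page).map(op).toArray();
    }

    static List<int[]> run(List<int[]> pages) {
        MiniProcessor<int[]> processor1 = MiniProcessor.fromIterator(pages.iterator());
        MiniProcessor<int[]> processor2 = processor1.transform(page -> mapPage(page, v -> v + 1));
        MiniProcessor<int[]> processor3 = processor2.transform(page -> mapPage(page, v -> v * 2));

        // Pulling from the last processor drives the whole chain.
        List<int[]> output = new ArrayList<>();
        for (int[] page = processor3.next(); page != null; page = processor3.next()) {
            output.add(page);
        }
        return output;
    }
}
```

Nothing runs until `processor3.next()` is called, but each input page still produces one temporary page per transformation.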
One can observe that we can compile such a chain of Page transformations into a tight loop that doesn't materialize intermediate results. Please check out the paper: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf and the project: https://hyper-db.de/.
In order to generate such a tight loop, one can extend WorkProcessor#transform so that it generates optimized bytecode (using the existing airlift bytecode framework), e.g.:
static WorkProcessor<Page> transform(
        WorkProcessor<Page> processor,
        Transformation<Page, Page> transformation)
{
    ...
    if (transformation instanceof BytecodeRowTransformation) {
        // generate tight loop
    }
    else {
        // proceed with intermediate pages materialization
    }
}
interface BytecodeRowTransformation extends Transformation<Page, Page> {
    BytecodeExpression generateTransformation(BytecodeTransformationContext context);
}
interface BytecodeTransformationContext {
    ..
    // transformation result bytecode
    BytecodeExpression needsMoreData();
    BytecodeExpression producedResult();
    ..
    // input row channels getter bytecode
    BytecodeExpression getChannel(int channel);
    BytecodeExpression isNull(int channel);
    ..
    // output row channel bytecode setters
    void defineChannel(int channel, Supplier<BytecodeExpression> definition);
    void defineIsNull(int channel, Supplier<BytecodeExpression> definition);
    ..
}
BytecodeRowTransformation#generateTransformation would generate the bytecode of the transformation (using BytecodeTransformationContext to consume input and produce output within the generated code).
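For intuition about what the generated tight loop buys us, here is a hand-written sketch (plain Java, no bytecode generation). It compares page-at-a-time execution with a fused loop for one filter and one projection. The class name, the `long[]` page model, and both method names are assumptions made for this illustration only.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

// Hypothetical illustration; generated code would play the role of fused().
class FusionSketch {
    // Page-at-a-time: the filter step allocates an intermediate page
    // before the projection step runs.
    static long[] materialized(long[] page, LongPredicate filter, LongUnaryOperator project) {
        long[] filtered = Arrays.stream(page).filter(filter).toArray(); // intermediate result
        return Arrays.stream(filtered).map(project).toArray();
    }

    // Fused: a single loop where each row flows through both steps
    // in a local variable, with no intermediate page.
    static long[] fused(long[] page, LongPredicate filter, LongUnaryOperator project) {
        long[] out = new long[page.length];
        int count = 0;
        for (long value : page) {
            if (filter.test(value)) {
                out[count++] = project.applyAsLong(value);
            }
        }
        return Arrays.copyOf(out, count);
    }
}
```

Both methods return the same rows. The fused variant is the shape that a bytecode-generating `transform` would aim to emit for a chain of row transformations.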
However, generating bytecode is really cumbersome and error prone. Truffle/Graal provides a nice abstraction for creating highly performant interpreters, which we could also use to generate maintainable and readable WorkProcessor transformations (tutorial on using Truffle: http://cesquivias.github.io/blog/2014/12/02/writing-a-language-in-truffle-part-2-using-truffle-and-graal/). In that case we wouldn't be using BytecodeExpression but much friendlier classes and annotations mixed with normal type-safe Java code, e.g.:
interface TruffleRowTransformation extends Transformation<Page, Page> {
    TruffleNode generateTransformation(TruffleTransformationContext context);
}
interface TruffleTransformationContext {
    ..
    // similar methods as in BytecodeTransformationContext, but using truffle node classes
}
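To show the style of code this would allow, here is a rough sketch of a row expression written as a tree of small interpreter nodes in plain Java. It has no Truffle dependency: `RowNode` and the factory methods are hypothetical names for this example, and a real Truffle version would build on Truffle's own node classes and annotations instead.

```java
// Hypothetical node-tree interpreter for a row expression; plain Java only.
class NodeSketch {
    // One node of the expression tree, evaluated against a single row.
    interface RowNode {
        long execute(long[] row);
    }

    // Reads an input channel of the row.
    static RowNode channel(int index) {
        return row -> row[index];
    }

    // Produces a constant value.
    static RowNode constant(long value) {
        return row -> value;
    }

    static RowNode add(RowNode left, RowNode right) {
        return row -> left.execute(row) + right.execute(row);
    }

    static RowNode multiply(RowNode left, RowNode right) {
        return row -> left.execute(row) * right.execute(row);
    }

    public static void main(String[] args) {
        // channel0 + channel1 * 3
        RowNode expression = add(channel(0), multiply(channel(1), constant(3)));
        System.out.println(expression.execute(new long[] {2, 5}));
    }
}
```

The point is that the transformation is expressed as ordinary type-safe Java objects. Turning such a tree into a tight loop would be left to the framework rather than to hand-written bytecode.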
Some notes:
WorkProcessor transformations are functional, so one could actually create a language interpreter for them, e.g.:
transform(
    transform(
        processor,
        context -> python transformation),
    context -> java transformation)
The WorkProcessor abstraction enables us to use other languages for transformations (e.g. Python). For instance, we could implement table functions that are written in non-Java languages but are JITed into a tight loop together with Java code.
This is just a draft and I still need to play more with Truffle/Graal in order to obtain more details.