Presto: LEFT JOIN against empty relation produces no rows

Created on 2 Jun 2016  路  14Comments  路  Source: prestodb/presto

This can be reproduced with the following test case:

    @Test
    public void testEmptyLeftJoin()
            throws Exception
    {
        // Use orderkey = rand() to create an empty relation
        assertQuery("SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON a.orderkey > b.orderkey");
        assertQuery("SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON 1 = 1");
    }
RELEASE-BLOCKER bug

Most helpful comment

SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON 1 = 1 fails. As this one actually does not have filter function. As 1=1 is resolvable using just inner side and pushed down by the JoinPlanner. And then something wrong is happening in PredicatePushdown I believe.

As I said I will back on it ASAP.

All 14 comments

@martint or @losipiuk could you take a look? I assume this is related to the recent non-equality outer join changes

I will be able to look into that in 2-3 hiurs.

Hmm.. for some reason it gets rewritten into INNER join.

presto:tiny> explain SELECT * FROM (VALUES 1) t1(a) LEFT JOIN (SELECT b FROM (VALUES 1) t2(b) WHERE b=5) on a < b;
                                 Query Plan
----------------------------------------------------------------------------
 - Output[a, b] => [field:integer, field_1:integer]
         a := field
         b := field_1
     - InnerJoin[("field" < "field_1")] => [field:integer, field_1:integer]
         - Values => [field:integer]
                 (1)
         - Filter[("field_1" = 5)] => [field_1:integer]
             - Values => [field_1:integer]
                     (1)

I believe the bug is in PredicatePushdown.visitJoin

Hmm... the initial plan (before pushdown may be wrong, too)

Yeah - I tried to just remove original node in when filterFunction is present from PP.visitJoin
And it works for the queries with VALUES.
But Chris's query still fails. I have to run for 2 hours now. But I will be back on it when I am back (in case it is not fixed yet).
Sorry :/

SELECT * FROM lineitem a LEFT JOIN (SELECT * FROM orders WHERE orderkey = rand()) b ON 1 = 1 fails. As this one actually does not have filter function. As 1=1 is resolvable using just inner side and pushed down by the JoinPlanner. And then something wrong is happening in PredicatePushdown I believe.

As I said I will back on it ASAP.

Another observation... in PredicatePushdown.visitJoin, newJoinPredicate does not take into account the non-equi predicate, so that would appear to be why it ends up converting it into an inner join.

I understand the problem.

The proper solution is to move all the pushdown logic from RelationPlanner to happen in PredicatePushDown optimizer.
Otherwise PPD has no knowledge that FilterNode on inner source of JoinNode was actually coming from join condition. And therefore OUTER join cannot be converted to INNER.

This is not super trivial change. I think I can do it tomorrow Europe time (so it should be ready tomorrow in the morning in CA).

If thing is really urgent I can hack sth. Possible options.
(1) Always leave filter in JoinNode even for conditions which are pushable to inner side and disable PPD for JoinNodes that have filter.present()
(2) Leave pushdown in RelationPlanner but add filterPushedDownToInnerSide flag in JoinNode and make PPD skip optimizations for such node.

I do not like any of those, and would rather craft proper solution at normal pace. But if needed I can do one of those quickly. Preferably (1).

@martint, @cberner what do you say?

Let's fix it the right way.

I have just noticed that putting this in PPD has similar problem to what we have now.
Within PPD we can have context. And not convert OUTER join to INNER after all the predicates from JoinNode.filter got pushed down to to inner side.

But we run PPD two times. And second run is lacking the context. In similar way like currently PPD is lacking context after pushdown from RelationPlanner.

It seems that some extra information that push-down happend must be stored in JoinNode. This is not super nice.
Any suggestions?

@martint, @cberner
As to my understanding https://github.com/prestodb/presto/pull/5403 fixes this issue.
See my comments in of the commits.

Fixed

Was this page helpful?
0 / 5 - 0 ratings