Orm: Update docs about QueryBuilder?

Created on 12 Dec 2018  路  27Comments  路  Source: doctrine/orm

Bug Report

| Q | A
|------------ | ------
| BC Break | no
| Version | -

Summary

In this Reddit comment, @Ocramius recommends using DQL with NOWDOC strings instead of QueryBuilder. In fact, he says that "query builder is less safe and less readable"

Current behavior

The main article of QueryBuilder in Doctrine docs says things like this:

QueryBuilder helper methods are considered the standard way to build DQL queries.

Maybe some docs need some update to recommend DQL instead? (except when building queries dynamically, when using QueryBuilder will be better)

Thanks!

Documentation Question

Most helpful comment

Simplistic example that matches what I usually see in the wild:

$qb = $someService->makeQueryBuilder('e');

$query = $qb
    ->makeQueryBuilder('e')
    ->leftJoin('e.p', 'p')
    ->andWhere($qb->expr()->or(
        $qb->expr()->eq('p.field1', ':value1'),
        $qb->expr()->eq('p.field2', ':value2')
    ))
    ->andWhere($qb->expr()->in('e.fieldA', ':valuesA')))
    ->getQuery();

This happens in multiple issue requests on doctrine/doctrine2, and the typical response from @doctrine/doctrinecore is something along the lines of "what are the generated DQL and SQL strings?", or we typically try to assemble the string mentally, or maybe the original issue reporter modified the QueryBuilder instance in unexpected and un-reported ways.

Sometimes it even becomes much more esoteric, with extremely complex queries that don't need to be dynamic at all, and involve multiple query builders and ->getDQL() on them.

Compare the code above with the equivalent:

$query = $em->createQuery(
<<<'DQL'
SELECT
    e
FROM
    MyEntity e
LEFT JOIN
    e.p p
WHERE
    (
        p.field1 = :value1
        OR
        p.field2 = :value2
    )
    AND e.fieldA IN (:valuesA)
DQL
);

I expanded it to make it parallel with the above, but you can apply something like the https://www.sqlstyle.guide/ syntax if you prefer so.

Benefits (applies also to SQL):

  • can read the thing. Readability is an extremely important aspect of our job
  • can extract (final) DQL blocks by looking at <<<'NOWDOC' comments using DQL as delimiter
  • can introspect extracted DQL blocks (what PHPStorm currently does)
  • can copy-paste this into a REPL or other editor for quick checking/debugging
  • immutable data structure (string, although being squishy, has this huge advantage)
  • no need to learn all the details of the QueryBuilder unless doing dynamic DQL
  • a lot of people keep asking for adding QueryBuilder support for random custom DQL functions because they learn that everything should be done through the QueryBuilder: this is simply a misconception

This is a simplistic example, but much more to the point. I see much worse examples in day-by-day consulting gigs, where enormous query builders spawn for no real reason other than "we were told to use the QueryBuilder".

For some simplistic dynamic DQL operations (think $dql .= 'AND e.' . $fieldName . ' LIKE :' . $fieldName), I'd even argue that string concatenation is also very much viable.

In general, __if there is no need to make a query dynamic, it shouldn't be dynamic__.

All 27 comments

@Ocramius is free to recommend to use what he personally believes is better even if it's different from what doctrine team recommends. Getting all doctrine team members to get on the same side would be tough.

Overall, I fully endorse DQL strings over query builders where possible. Query builders lead to:

  • Hard to read meta-language around an existing DSL
  • Encapsulation problems, when people expose a query builder outside the unit where it is managed

QueryBuilders (any really: be it SQL, DQL or anything similar) have their place with dynamic DSL assembly (which is what their name implies), but using the DSL directly is more straightforward, readable and less error-prone, plus easily introspectible for anything that can understand the DSL itself (PHPStorm in this case, which is already awesome for SQL).

No, I'm not proposing dynamic string concatenation in userland: that's exactly where you draw the line between DSL and DSL builder.

FWIW I typically prefer and recommend QueryBuilder as it's easier to build conditional queries with it. And once you accept it for this use case, I prefer using it elsewhere too for consistency. Also using multiline strings is not something I like, neither annoyingly indented NOWDOCs.
With flexible NOWDOC in PHP 7.3 the situation got better though.

Depending on the database, the querybuilder can hide unfavourable joins. MySQL is ignoring the join order and let's its execution plan decide by default, while other databases take the joins as is. But I agree that the "standard way" makes developers choose this way of create queries because it sounds like a suggested practice. It's an evaluative sentence.

I fully agree with @Majkl578. I do that too for the same reasons.

I know that Marco is a trend-setter, not a trend-follower. So maybe his comment was ahead of time. However, I asked on Symfony's Slack and most people agreed with it and do that too. That's why I opened this issue: maybe using DQL-strings instead of QueryBuilders is much more common than we think 馃

Simplistic example that matches what I usually see in the wild:

$qb = $someService->makeQueryBuilder('e');

$query = $qb
    ->makeQueryBuilder('e')
    ->leftJoin('e.p', 'p')
    ->andWhere($qb->expr()->or(
        $qb->expr()->eq('p.field1', ':value1'),
        $qb->expr()->eq('p.field2', ':value2')
    ))
    ->andWhere($qb->expr()->in('e.fieldA', ':valuesA')))
    ->getQuery();

This happens in multiple issue requests on doctrine/doctrine2, and the typical response from @doctrine/doctrinecore is something along the lines of "what are the generated DQL and SQL strings?", or we typically try to assemble the string mentally, or maybe the original issue reporter modified the QueryBuilder instance in unexpected and un-reported ways.

Sometimes it even becomes much more esoteric, with extremely complex queries that don't need to be dynamic at all, and involve multiple query builders and ->getDQL() on them.

Compare the code above with the equivalent:

$query = $em->createQuery(
<<<'DQL'
SELECT
    e
FROM
    MyEntity e
LEFT JOIN
    e.p p
WHERE
    (
        p.field1 = :value1
        OR
        p.field2 = :value2
    )
    AND e.fieldA IN (:valuesA)
DQL
);

I expanded it to make it parallel with the above, but you can apply something like the https://www.sqlstyle.guide/ syntax if you prefer so.

Benefits (applies also to SQL):

  • can read the thing. Readability is an extremely important aspect of our job
  • can extract (final) DQL blocks by looking at <<<'NOWDOC' comments using DQL as delimiter
  • can introspect extracted DQL blocks (what PHPStorm currently does)
  • can copy-paste this into a REPL or other editor for quick checking/debugging
  • immutable data structure (string, although being squishy, has this huge advantage)
  • no need to learn all the details of the QueryBuilder unless doing dynamic DQL
  • a lot of people keep asking for adding QueryBuilder support for random custom DQL functions because they learn that everything should be done through the QueryBuilder: this is simply a misconception

This is a simplistic example, but much more to the point. I see much worse examples in day-by-day consulting gigs, where enormous query builders spawn for no real reason other than "we were told to use the QueryBuilder".

For some simplistic dynamic DQL operations (think $dql .= 'AND e.' . $fieldName . ' LIKE :' . $fieldName), I'd even argue that string concatenation is also very much viable.

In general, __if there is no need to make a query dynamic, it shouldn't be dynamic__.

Just found an extreme example in an app I'm working with. It will not make sense, because I had to drop/change some bits due to NDA and such. Also, not doctrine/orm-based:

$prices = $this->dbh->table('prices AS pp')->select([
    $this->dbh->raw('MD5(CONCAT('.$prefix.'ppc.id, '.$prefix.'prices.cid)) AS id'),
    'ppc.id AS c',
    'ppc.dg',
    'ppc.pid',
    'prices.price AS value',
    'prices.cid AS currency',
    'pp.id AS priceid',
])->join('price_plan_configurations AS ppc', function (JoinClause $join) use ($areas, $dg, $product, $timestamp) {
    $join->where('pp.active', '=', true);
    $join->where(function (JoinClause $join) use ($timestamp) {
        $join->where('pp.start_date', '<=', $timestamp);
        $join->orWhereNull('pp.start_date');
    });
    $join->where(function (JoinClause $join) use ($timestamp) {
        $join->where('pp.end_date', '>', $timestamp);
        $join->orWhereNull('pp.end_date');
    });
    $join->where('ppc.active', '=', true);
    $join->whereIn('ppc.groupid', $dg);
    $join->whereIn('ppc.area', $areas);
    $join->on('pp.id', '=', 'ppc.plan');
    $join->whereIn('ppc.product_id', $product);
})->join('time_of_days AS tod', function (JoinClause $join) use ($tod) {
    $join->where('tod.'.$tod['start'], '<=', $tod['time']);
    $join->where('tod.'.$tod['stop'], '>=', $tod['time']);
    $join->on('ppc.time_of_day_id', '=', 'tod.id');
})->join('user_groups as ug', function (JoinClause $join) use ($userGroups) {
    $join->on('ppc.user_group_id', '=', 'ug.id');
    $join->whereIn('ug.id', array_map(function ($userGroup) {
        return $userGroup->id;
    }, $userGroups));
})->join('product_prices AS prices', function (JoinClause $join) {
    $join->on('ppc.id', '=', 'prices.price_plan_configuration_id');
})->join('currencies AS cc', function (JoinClause $join) use ($c) {
    $join->on('prices.currency_id', '=', 'cc.currency_id');
    $join->where('cc.cid', '=', $c->getId());
})->leftJoin('price_plans_currencies AS ppcurr', function (JoinClause $join) {
    $join->on('ppcurr.price_plan_id', '=', 'pp.id');
})->where(function ($query) {
    $query->whereColumn('prices.currency_id', '=', 'ppcurr.currency_id')
          ->orWhereNull('ppcurr.price_plan_id');
})->get();

There is nothing dynamic in the above, yet this is what the average user is doing, and it leads to unnecessary complexity without any added value.

EDIT: imagine being someone that has to verify the above structure against the SQL optimiser. In order to do that, you'd need to write code to collect the generated SQL string, because doing it by hand will be error-prone. Also note how the developers, thinking that the query builder provides SQL escape safety, simply bound parameters directly into the assembled SQL string. Luckily not a security issue in this case, since all data is from the application configuration itself, but otherwise really problematic.

A single parameter is enough and a query is dynamic. The parameters $areas, $dg, $product, $timestamp are dynamic in the above example. So it's better to use a QueryBuilder again.

@odan those are parameters, they do not affect the generated string structure.

@odan no, because the DQL must not contain these parameters (otherwise, you open the door to DQL injection attacks). In DQL, you would write it as ppc.area IN (:areas), and set the parameter separately on the query.

FWIW I typically prefer and recommend QueryBuilder as it's easier to build conditional queries with it.

@Majkl578 in my experience there are two types of conditional queries:

  1. Conditional JOINs
  2. Conditional WHEREs

Conditional JOINs are, for me, absolutely a no-go: easy to hide domain logic, easy to mess with, inexperienced users tend to solve issues with DISTINCT or GROUP BY, killing performance and hiding logic even more.

Conditional WHEREs may be somewhere acceptable, and for the exact reason above, I do enforce in my teams to hardcode the switch in the SQL to make it clearer:

SELECT users.name
FROM users
WHERE (
        users.active = TRUE
    # Semplicistic example to get the idea:
    # a parameter determines the final SQL
    AND IF(:group IS NOT NULL,
            users.group = :group,
            TRUE
        )
)

@odan those are parameters, they do not affect the generated string structure.

  1. But $prefix whould change the DQL and could also be a security issue.

  2. In practice, requirements will change very often and suddenly you have to convert a complex DQL string into a QueryBuilder. Some people will be too afraid to refactor the query by just performing a quick and dirty string concatenation hack. And then... BOOM!

@odan I added a note about the security implications earlier in an edit at https://github.com/doctrine/doctrine2/issues/7520#issuecomment-446896394

In practice, requirements will change very often and suddenly you have to convert a complex DQL string into a QueryBuilder.

As @Slamdunk described above, adding a bit of SQL/DQL through a query builder has potentially catastrophic consequences, and shouldn't be done. The query above is a good example of something that has "grown organically" based on added business requirements, and now it is an unreadable pile of code because of that. Designing a separate SQL/DQL string, or editing an existing DSL structure, is perfectly OK.

Small question (I may be missing something obvious) : how do you use full class names for entities in DQL (like App\Entity\UserAdress)? I mean, without an editor specifically handling it, it seems quite difficult to use: either you don't have any autocompletion (and need to type a long class name by hand each time) or you use concatenation and ::class, which is kind of ugly. Am I missing something?

@tgalopin I typed the FQCN or used sprintf, but since PHPStorm now supports DQL, I'd always go with typing the full FQCN.

@tgalopin also [in my experience] a typical DQL query only needs a FQCN in the FROM, all others entities are retrieved/joined/whered/selected through relationships defined in the main entity, so I write just one FQCN per DQL.

Cheers for PHPStorm with DQL support

@Slamdunk


AND IF(:group IS NOT NULL,
            users.group = :group,
            TRUE
        )

can not be optimized by the sql analyzer as I know.

Sure, you will always hit limits in the optimiser, but you can also simplify the SQL further.

Using a builder as a trade-off is surely feasible when needed, but otherwise you are adding complexity.

@Ocramius Not being able to be use proper database indexes for me is a no-go compared to conventions that MIGHT reduce "possible" bugs

Yes, and using query builders increases that uncertainty.

@kuraobi is working on that one :)

I'm on it. As a first time contributor to Doctrine, especially the documentation, I was a bit confused with how to update/build/test it locally, so I think I will also add a bit about the documentation to CONTRIBUTING.md.

Actually I need some clarification about a specific point: in case the user does choose to use the QueryBuilder, the documentation currently states that ->expr() should be used instead of strings. Is it still true? Because $qb->expr()->eq('u.id', '?1') looks a lot messier than 'u.id = ?1'...

I don't think so

@kuraobi using 'u.id = ?1 is exactly the same as $qb->expr()->eq('u.id', '?1'). The idea of the expression/query builders is that they are useful with dynamic DQL, such as when u.id is a parameter coming from somewhere else.

If you don't have dynamic DQL, or your expressions are constant, then using a string is viable and preferable.

Handled by #7880

Was this page helpful?
0 / 5 - 0 ratings