V8-archive: Pagination for queries with filters

Created on 26 Dec 2018  Â·  25Comments  Â·  Source: directus/v8-archive

Feature Request

A new (or updated?) metadata to be able to get count of items matching the filter (when using a limit).

What problem does this feature solve?

Actually, the only metadatas we can get are:

  • result_count - Number of items returned in this response âž¡ When using a limit, it returns the limit if there are more results (not really useful)
  • total_count - Total number of items in this collection âž¡ When using a filter, it really returns the total count, without using the limit (not really useful either).

How do you think this should be implemented?

Maybe a new metadata (filter_count?) or use the same filters in total_count.
Why not also use MySQL's FOUND_ROWS() function? https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows
This can be really useful for complex queries (we only have to build and call query once).

Would you be willing to work on this?

Why not, but I'm actually very busy 😢

enhancement

Most helpful comment

@benhaynes That makes sense to me! And like you said, we can choose the meta that suits our need as they are optional.

I'm already trying to make it work because I really need it. I'll make a PR when it will be ready so we'll be able to discuss the details of the implementation at this time. I choose to implement it under the filtered_count meta so we'll be able to replace total_count if it doesn't seem necessary anymore. :)

All 25 comments

We've been discussing this feature. We also opened a ticket for a related feature to pagination here: https://github.com/directus/api/issues/340

I am leaning towards just use the total_count metadata, instead of a new one, because I am not seeing any reason why we would like to see both filter_count and total_count at the same time. Is there any useful reason?

That makes sense – I like it! We could use filter_count or filtered_count for this.

It might be nice to know how much your filter reduced the total dataset. So your filter is returning:

850/1000 items (filtered_count / total_count)

Normally I would be worried about performance, but all of these metadata fields should be off by default and can be individually included... so I think it's a useful option to have!

@benhaynes That makes sense to me! And like you said, we can choose the meta that suits our need as they are optional.

I'm already trying to make it work because I really need it. I'll make a PR when it will be ready so we'll be able to discuss the details of the implementation at this time. I choose to implement it under the filtered_count meta so we'll be able to replace total_count if it doesn't seem necessary anymore. :)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

To achieve better clarity/visibility, we are now tracking feature requests within the Feature Request project board.

This issue being closed does not mean it's not being considered.

I've added this one to in progress @philleepflorence ... cool?

Hi all,
I started using the pagination with the filter_count meta params. I'm getting a performance issue that I would like to discuss.

Example

A collection of total 1068 items, my query has a total of 834 items filtered.

Requesting items with a limit of 50 is pretty fast:
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3 (~160ms)

Requesting the same result with all the metadata slows down the response a lot (~1500ms)
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*

Requesting the same result, with only metadata filter_count doesn't help (~1500ms)
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=filter_count

Requesting the same result, with only metadata for pagination (which is not documented btw) is also slow (~1700ms)
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=page

Requesting the same result, with only meta total_count (which get the whole collections count) is surprisingly pretty fast (~150ms)
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=total_count

So would there be a way to improve the performance of the query when requesting the metadata "filter_count" ?

Hello @romaiiinnn ...

Requesting items with a limit of 50 is pretty fast:
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3 (~160ms)

Does this query return the pagination object?

Requesting the same result with all the metadata slows down the response a lot (~1500ms)
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*

What is the purpose of page=3**?

If you could include the meta object for each query so I can debug further.

Thanks!

Hi @philleepflorence,

Requesting items with a limit of 50 is pretty fast:
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3 (~160ms)

Does this query return the pagination object?

No, only the data.

/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*

What is the purpose of page=3**?

Sorry, just wanted to bold the end of the text, github added the *, correct url is :
/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*

Please see below the returned metadata object for each query :

/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*

meta: {
    collection: "content",
    type: "collection",
    result_count: 50,
    total_count: 1068,
    filter_count: 834,
    limit: 50,
    offset: 100,
    page: 3,
    page_count: 17,
    links: {
        self: "http://localhost:8000/_/items/content",
        current: "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=*",
        next: "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=4&meta=*&offset=150",
        previous: "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=2&meta=*&offset=50",
        first: "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=1&meta=*&offset=0",
        last: "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=17&meta=*&offset=800"
    }
},

/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=filter_count

"meta": {
     "filter_count": 834
},

/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=page

"meta": {
    "limit": 50,
    "offset": 100,
    "page": 3,
    "page_count": 17,
    "links": {
        "self": "http://localhost:8000/_/items/content",
        "current": "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=page",
        "next": "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=4&meta=page&offset=150",
        "previous": "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=2&meta=page&offset=50",
        "first": "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=1&meta=page&offset=0",
        "last": "http://localhost:8000/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=17&meta=page&offset=800"
    }
},

/_/items/content?filter[type][eq]=7&fields=id&limit=50&page=3&meta=total_count

meta: {
    total_count: 1068
},

Thanks!

I ran some tests on a collection of 3040 items: see results below...

{{url}}/_/items/app_icons?access_token={{access_token}}&meta=*&filter[keywords][contains]=account&page=3&limit=20&offset=60

Response time 562ms >>> 362ms >>> 184ms (other times are subsequent responses - reload on PostMaster) - This is quering a non PK non indexed column.

{
    "meta": {
        "collection": "app_icons",
        "type": "collection",
        "result_count": 3,
        "total_count": 3040,
        "filter_count": 63,
        "limit": 20,
        "offset": 60,
        "page": 4,
        "page_count": 4,
        "links": {
            "self": "http:\/\/api.directus.local\/_\/items\/app_icons",
            "current": "http:\/\/api.directus.local\/_\/items\/app_icons?access_token=directus@admin&meta=*&filter[keywords][contains]=account&page=4&limit=20&offset=60",
            "next": null,
            "previous": "http:\/\/api.directus.local\/_\/items\/app_icons?access_token=directus@admin&meta=*&filter[keywords][contains]=account&page=3&limit=20&offset=40",
            "first": "http:\/\/api.directus.local\/_\/items\/app_icons?access_token=directus@admin&meta=*&filter[keywords][contains]=account&page=1&limit=20&offset=0",
            "last": "http:\/\/api.directus.local\/_\/items\/app_icons?access_token=directus@admin&meta=*&filter[keywords][contains]=account&page=4&limit=20&offset=60"
        }
    },
    "data": [...]
}

Changing the meta parameters did not slow down the response time.
Unable to duplicate the drastic difference in response times.
On what kind of machine are you getting the slow response times?
Did you try to run locally and remotely to see if you get the same results?

The was an update to the docs on June 13, 2019 awaiting merge @rijkvanzanten

@philleepflorence I don't see an open PR to the docs of that date.. Can you send me a link? We'll merge it right away

@philleepflorence Those are taken from my local computer (Mac OS). I will do further tests later on, thanks.

@romaiiinnn ... Okay cool, let me know. Also did the tests on local (MacOSX) and remote (CentOS 7).
If you still get the same results, export the SQL of the collection so I can import and run.
In the meantime, are you using MyISAM or InnoDB?

@philleepflorence Thanks I don't have the database on a remote server yet, I will update you. My tests where on an InnoDB database.

Hi @philleepflorence,
Some updates : I still don't have my database on remote server to test on another context. But from my tests and from the Directus code I checked, the filter_count value is calculated by re-running the query, and counting the number of values, which is what I think is creating this longer response time.

I applied on my side a not-perfect hotfix by running a count(*) on the query instead of requesting all the values and counting them in php.
/src/core/Directus/Database/TableGateway/RelationalTableGateway.php
functions : createEntriesMetadata and createMetadataPagination

I see on your tests that the filter_count is low, I would try other tests with a high filter_count value to reproduce the issue.

Hey @romaiiinnn ... it should be getting the number of rows on the PK field only not on all fields, but I will take a look later to confirm.

The only bottleneck would be getting the num rows that match the filters since the core query gets a max of 200. In the future, I could look into getting the number during the initial query to the DB.

If that number were included in the meta originally it would have alleviated a lot.

For the count(*) can you share that code, I can compare response times and then use the most efficient.

The only bottleneck would be getting the num rows that match the filters since the core query gets a max of 200. In the future, I could look into getting the number during the initial query to the DB.

With a count(*) query, the limit shouldn't be an issue as you get only one result (the number of rows).

My fix below for :
/src/core/Directus/Database/TableGateway/RelationalTableGateway.php

new function countFilterTotal

    private function countFilterTotal($params)
    {
        return (int)$this->fetchItems($params, null, true)[0]['total'];
    }
    public function createMetadataPagination(array $metadata = [], array $params = [], array $countedData = [])
    {
        if (empty($params)) $params = $_GET;

        $filtered = ArrayUtils::get($params, 'filter') || ArrayUtils::get($params, 'q');

        $limit = intval(ArrayUtils::get($params, 'limit', 0));
        $page = intval(ArrayUtils::get($params, 'page', 1));
        $offset = intval(ArrayUtils::get($params, 'offset', -1));

        $total = intval(ArrayUtils::get($metadata, 'Published') ?: ArrayUtils::get($countedData, 'total_count'));
        $rows = intval(ArrayUtils::get($countedData, 'result_count'));
        $pathname = explode('?', ArrayUtils::get($_SERVER, 'REQUEST_URI'));
        $url = trim(\Directus\get_url(), '/') . reset($pathname);

        $meta_param = explode(',',$params['meta']);

        //filter_count performance hotfix
        if ($filtered && (in_array('filter_count', $meta_param) || in_array('*', $meta_param) || in_array('page', $meta_param))) {
            $metadata['filter_count'] = $this->countFilterTotal($params);
        }
        $total = $metadata['filter_count'] ?: $countedData['total_count'];

        $limit = $limit < 1 ? $rows : $limit;
        $pages = $total ? ceil($total / $limit) : 1;
        $page = $page > $pages ? $pages : ($page && $offset >= 0 ? (floor($offset / $limit) + 1) : $page);
        $offset = $offset >= 0 ? $offset : ($page ? (($page - 1) * $limit) : 0);
        $next = $previous = $last = $first = -1;

        if ($pages > 1) {
            $next = ($pages > $page) ? ($offset + $limit) : null;
            $previous = ($offset >= $limit) ? ($offset - $limit) : ($limit * ($pages - 1));
            $first = ($pages < 2 || $limit < 1) ? null : 0;
            $last = ($pages < 2) ? null : (($pages - 1) * $limit);
        }


        if(in_array('page',$meta_param) || in_array('*',$meta_param)) {
            $metadata = array_merge($metadata, [
                "limit" => $limit,
                "offset" => $offset,
                "page" => $page,
                "page_count" => $pages,
                "links" => [
                    "self" => $url,
                    "current" => "{$url}?" . urldecode(http_build_query(array_merge($params, ["page" => $page]))),
                    "next" => $next > 0 && $page < $pages ? ("{$url}?" . urldecode(http_build_query(array_merge($params, ["offset" => $next, "page" => $page + 1])))) : null,
                    "previous" => $previous >= 0 && $page > 1 ? ("{$url}?" . urldecode(http_build_query(array_merge($params, ["offset" => $previous, "page" => $page - 1])))) : null,
                    "first" => $first >= 0 ? ("{$url}?" . urldecode(http_build_query(array_merge($params, ["offset" => $first, "page" => 1])))) : null,
                    "last" => $last > 0 ? ("{$url}?" . urldecode(http_build_query(array_merge($params, ["offset" => $last, "page" => $pages])))) : null
                ]
            ]);
        }

        return $metadata;
    }
    public function fetchItems(array $params = [], \Closure $queryCallback = null, $countOnly = false)
    {
        $collectionObject = $this->getTableSchema();

        $params = $this->applyDefaultEntriesSelectParams($params);
        //filter_count performance hotfix
        $fields = $countOnly ? ['*'] : ArrayUtils::get($params, 'fields');

        // TODO: Check for all collections + fields permission/existence before querying
        // TODO: Create a new TableGateway Query Builder based on Query\Builder
        $builder = new Builder($this->getAdapter());
        $builder->from($this->getTable());

        $selectedFields = $this->getSelectedNonAliasFields($fields ?: ['*']);
        if (!in_array($collectionObject->getPrimaryKeyName(), $selectedFields)) {
            array_unshift($selectedFields, $collectionObject->getPrimaryKeyName());
        }

        $statusField = $collectionObject->getStatusField();
        if ($statusField && !in_array($statusField->getName(), $selectedFields) && $this->acl->getCollectionStatuses($this->table)) {
            array_unshift($selectedFields, $statusField->getName());
        }

        // NOTE: Make sure to have the `type` field for files to determine if the supports thumbnails
        if ($this->table == SchemaManager::COLLECTION_FILES && !in_array('type', $selectedFields)) {
            $selectedFields[] = 'type';
        }

        //filter_count performance hotfix
        $selectedFields = $countOnly ? ['total' => new Expression('COUNT(*)')] : $selectedFields;
        $builder->columns($selectedFields);        

        $builder = $this->applyParamsToTableEntriesSelect(
            $params,
            $builder
        );

        $builder->orderBy($this->primaryKeyFieldName);

        try {
            $this->enforceReadPermission($builder);

            //If collection is directus_fields, also check permission of actual collection of which fields are retrieving
            if ($this->getTable() == SchemaManager::COLLECTION_FIELDS && ArrayUtils::has($params['filter'], 'collection'))
                $this->acl->enforceReadOnce(ArrayUtils::get($params['filter'], 'collection'));
        } catch (PermissionException $e) {
            $isForbiddenRead = $e instanceof ForbiddenCollectionReadException;
            $isUnableFindItems = $e instanceof UnableFindOwnerItemsException;

            if (!$isForbiddenRead && !$isUnableFindItems) {
                throw $e;
            }

            if (ArrayUtils::has($params, 'single')) {
                throw new Exception\ItemNotFoundException();
            } else if ($isForbiddenRead) {
                throw $e;
            } else if ($isUnableFindItems) {
                return [];
            }
        }

        // Validate the fields after verifies the user actually has read permission
        if (is_array($fields)) {
            $this->validateFields($fields);
        }

        if ($queryCallback !== null) {
            $builder = $queryCallback($builder);
        }

        // Run the builder Select with this tablegateway
        // to run all the hooks against the result
        $results = $this->selectWith($builder->buildSelect())->toArray();

        if (!$results && ArrayUtils::has($params, 'single')) {
            $message = null;
            if (ArrayUtils::has($params, 'id')) {
                $message = sprintf('Item with id "%s" not found', $params['id']);
            }

            throw new Exception\ItemNotFoundException($message);
        }

        // ==========================================================================
        // Perform data casting based on the column types in our schema array
        // and Convert dates into ISO 8601 Format
        // TODO: Casting value are going to be done using hooks to the Directus types
        //       With the exception of number for MySQL, which the default client
        //       Returns them as string
        // ==========================================================================
        $results = $this->parseRecord($results);

        $columnsDepth = ArrayUtils::deepLevel(\Directus\get_unflat_columns($fields));
        if ($columnsDepth > 0) {
            $relatedFields = $this->getSelectedRelatedFields($fields);

            $relationalParams = [
                'meta' => ArrayUtils::get($params, 'meta'),
                'lang' => ArrayUtils::get($params, 'lang')
            ];

            $results = $this->loadRelationalData(
                $results,
                \Directus\get_array_flat_columns($relatedFields),
                $relationalParams
            );
        }

        // When the params column list doesn't include the primary key
        // it should be included because each row gateway expects the primary key
        // after all the row gateway are created and initiated it only returns the chosen columns
        if ($fields && !array_key_exists('*', \Directus\get_unflat_columns($fields))) {
            $visibleColumns = $this->getSelectedFields($fields);
            $results = array_map(function ($entry) use ($visibleColumns) {
                foreach ($entry as $key => $value) {
                    if (!in_array($key, $visibleColumns)) {
                        $entry = ArrayUtils::omit($entry, $key);
                    }
                }

                return $entry;
            }, $results);
        }

        if ($statusField && $this->acl != null && $this->acl->getCollectionStatuses($this->table)) {
            foreach ($results as $index => &$item) {
                $statusId = ArrayUtils::get($item, $statusField->getName());
                $blacklist = $this->acl->getReadFieldBlacklist($this->table, $statusId);
                $item = ArrayUtils::omit($item, $blacklist);
                if (empty($item)) {
                    unset($results[$index]);
                }
            }

            $results = array_values($results);
        }

        if (ArrayUtils::get($params, 'single')) {
            $results = reset($results);
        }
        return $results ? $results : [];
    }

Check comment "filter_count performance hotfix" or do a git compare.
Thanks

Thanks! I will compare and update.

Hello guys,

Do we have an update on this issue ? :) This can really slow down navigation with pagination :(

Many thanks!

I will take a look and get back to you after the weekend. Thanks!

I will take a look and get back to you after the weekend. Thanks!

and? :)

I will take a look and get back to you after the weekend. Thanks!

and? :)

I guess he had a long weekend!

Hi @rijkvanzanten @philleepflorence ,

Any news on this one ? It would be a welcome fix in our project :-)

Apologies, if the fix was not applied.

As this is an Open Source Project, collaboration is welcome, so anyone can submit a PR with the fix from above, and at the same time apply the code to the local copy.

Internal resources are currently devoted to Directus 9.

The Directus 8 Repository has been moved to: https://github.com/directus/v8-archive

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jwkellyiii picture jwkellyiii  Â·  3Comments

benhaynes picture benhaynes  Â·  4Comments

Varulv1997 picture Varulv1997  Â·  3Comments

Nitwel picture Nitwel  Â·  3Comments

cdwmhcc picture cdwmhcc  Â·  3Comments