Pkp-lib: Introduce REST API

Created on 1 Jun 2016  Â·  34Comments  Â·  Source: pkp/pkp-lib

Introduce a REST API for OJS 3.0/OMP 1.2+ (at the earliest).

Most helpful comment

I think we can now close this issue and any future API development can go into new issues. Congrats @kaschioudi this is a big one!

All 34 comments

Some early experimentation over at https://github.com/asmecher/pkp-lib/commit/e54010a7b7718004685b9fb6ceeb986d5ba1d83f:

  • Tried the Slim microframework on @mjordan's recommendation. Widely used, minimal duplication of implementations and reasonably small dependency list.
  • Added an APIRouter class to map certain PKP-style requests into it:

    • PKP's index.php wrapper is still used for all requests. Maintains use of our session management, authorization framework, etc. toolsets.

    • Two namespaces potentially created [I'm not confident about this being the best approach]:

    • [index.php]/api for site-wide API requests (e.g. "list all journals on this site")

    • [index.php]/{contextPath}/api for journal- or press-specific requests (most functionality will be here)

    • Pros



      • Existing URL rewriting policies will be respected (think virtual hosting of journals)



    • Cons



      • Some of the same confusion over site- vs. context-specific URLs will continue.


      • Minor: A journal with a path called api will be superseded by the API router.



  • Still needing work/consideration:

    • This doesn't yet hook into the auth framework; the auth framework has expectations e.g. around parameter names that likely will need to be made more flexible

    • Routing is currently hard-coded into the APIRouter; need to investigate potential for autoload-based routing (like we've done for components and pages, but perhaps using a more modern PHP autoload or slim-based approach)

    • Slim needs to be assessed against e.g. @NateWr's suggestions

    • And, of course, our API needs a specification :)

+1, watching.

@asmecher about authen/z, would using Slim's middleware be an option? The example I pointed to uses it at https://github.com/mjordan/ocr_rest/blob/master/index.php#L23 to do basic IP whitelisting.

Not to complicate matters, but another PHP microframework that might be worth evaluating is Silex, which is a Symphony component.

@mjordan, we have our own rulesets for authorization/authentication, and porting these over to another framework will be a colossal pain. I want to decrease the amount of homebrew infrastructure we maintain but would advocate for baby steps -- introducing something like slim that doesn't dupe a lot of it, but uses modern/standard approaches, is a step in the right direction.

I'll assess this as I go -- we might discover a fundamental impedance mismatch when we try to use the auth framework from within the microframework -- but for now it seems like an easy win.

  1. In terms of auth, what are we able to re-use when hitting the API from a third-party client (say, a different domain or from a mobile app)? We'll probably need an oauth implementation before we can support any POST/PATCH/PUT/DELETE requests, and oauth2 is far simpler than oauth1. I think it's worth requiring SSL (oauth2 requirement) since we can impose some up-front standards on this thing.
  2. I know it's early, but can we version that endpoint (/api/v1)? ;)
  3. What about [index.php]/api and [index.php]/api/context/{contextPath}?

What about [index.php]/api and [index.php]/api/context/{contextPath}?

I'm not sure that will play well with URL rewriting.

@asmecher : I have done some refactoring to move authorization to a Middleware. Then add a generic getParameter to APIHandler to replace getEntityId. cf. https://github.com/kaschioudi/pkp-lib/commit/cc49926aace45c5c37650c6556126ce8f00f5bc3

rest api post/put/delete endpoints work PR: https://github.com/pkp/pkp-lib/pull/2622

@kaschioudi, is there one also for the OJS repo?

@asmecher : not at the moment. Once I do some refactoring based on your comments and move IssueRepository out of pkp-lib into OJS, I will create a pull request.

OK, sure. It's just a lot easier to post comments on a branch when there's a PR to attach them to, even if the PR isn't ready for merging.

We've now (just about) got an API in place for OJS 3.1. @kaschioudi has done amazing work wrestling the API infrastructure into place. I want to give a brief overview of where we're at, then raise issues for further feedback and refinement before 3.1 ships.

Where we're at

Several endpoints are available at /index.php/api/v1/. Each endpoint is serviced by an APIHandler class which you'll find under /api/v1/ in the OJS directory.

We're documenting the API using a swagger file. The most up-to-date copy is this gist, but we'll be keeping it updated in master under /docs/dev/swagger.yaml. To read the documentation, dump the contents of the gist file into the swagger editor.

Any endpoint prefixed with _ is a private endpoint, which we use to service OJS 3 app components. It's recommended that you do not rely on these for custom code, as they may change without warning. (These endpoints are not fully documented yet.)

A comment below introduces authorization with the API.

Discussion

I'd like to get some discussion going on the following questions around the API's design, so that we can get things as refined as possible by the OJS 3.1 release. This will reduce the amount of breakage we have to do in the early days of the API.

API design questions (consumption)

1. Should objects associated with the primary request be embedded or linked?

Should a request for a submission (/submissions/{submissionId}) render out the associated authors, galleys, files, sections and participants in the response? Or should it provide a link to an API endpoint where it can be retrieved.

Example of embedding authors:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        {
            "name": "Carlo Corino",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
        },
    ],
}

Example of linking authors:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        'http://localhost/api/v1/authors/1'
    ],
}

Embedding authors provides a richer set of information in a single API request, but will require more database hits per request. We're also more likely to get bogged down in questions about how deep the embedding should go and what should be contained in a summary presentation of an object.

2. What use-cases should we aim to support for the OJS 3.1 release?

Are there any client applications for the API that people want to start working on right now? Can we devise a set of minimum viable endpoints that we want to be polished and finalised by 3.1's release?

The answer to this will also bear on the question of linking vs embedding (1). If we presume someone wants to use the API to render a user-facing website, embedding might be more important. But if we think client apps are more likely to want to import the data into their own systems, linking will probably be easier to build, document, and maintain with long-term compatibility.

API design questions (code)

3. How can we service the official and private endpoints with as much shared code as possible?

We'll use the API for backend UI components in the future. How can we keep the code duplication to a minimum? The /_submissions endpoint is the first private endpoint that's had serious attention. It uses the new Service class to provide data to the endpoint.

It would be good to investigate the pattern used there, to see if it works for servicing the public /submissions endpoint. And whether that pattern can be extended to the other public endpoints. If not, what other patterns could we use to cut down on the code we're maintaining?

4. How can we make our API endpoints open to extension from plugins?

Where should plugins get to intervene? And can we devise a pattern that works across the board for all of our endpoints?

The SubmissionsService class uses a series of hooks to make it's getSubmissionList method extensible. We use this to add details specific to OJS or OMP to the submission objects that are returned. Can this pattern be extended to other object service classes and incorporated into the API? Are there better approaches we could use that generalise the intervention in the API a bit more?

Existing issues

The following are a list of issues that we should try to address in time for the 3.1 release.

  • [ ] Every API endpoint should have an error response when any required params are missing or the request is malformed in some way.
  • [ ] Lots of things that should be integers get returned as strings. This isn't the API's fault, but it will cause headaches for consuming the API in a strictly-typed language. Object getData() calls often pull from a database where they're stored as strings. We should fix these in the getter methods of objects everywhere possible.
  • [ ] Objects should have a consistent representation whenever possible. For instance, if a file object is returned as part of a submission request or a galley request, that file object should be identical in both responses. This is an area where building out the service classes will help standardise things.

Discussion

1. I'm leaning towards linking over embedding. Although it means more work on the client side, I think it will be significantly easier to design and maintain the API if we keep the responses limited. How we do linking _and_ reduce code duplication with the private endpoints will be a tricky problem to solve, though.

2. I'd love to hear more input from our dev community here. My guess is that there won't be big demand for building a UI on top of the API. I think the API's most common use-cases will be for importing data into a different app/database, so requiring multiple requests won't be as much of a concern.

Additional concerns/comments

The following is a list of suggestions/thoughts that came out of my first detailed look at the public endpoints.

General:

  • In the response objects, I'd like to see the primary object ID _always_ use id instead of issueId for issues and submissionId for submissions.
  • If a parameter is localised (eg - title), I think the endpoint should supply the values for all locales.
  • /submissions/<id>/files/: The file component information should be added. This will also be necessary to know what is included in the metadata.

Submissions:

  • Currently if a submission is unpublished, /submissions/<id> returns a metadata format in JSON. Submission metadata schemas should be available at a separate endpoint. The response format of /submissions/<id> should be consistent between published and unpublished submissions, with keys missing or set to null if needed.
  • There's still a decent amount of data (like metadata, copyright, etc) someone might want from /submissions/<id> that isn't available.
  • I'm not sure we should be returning all enabled citation formats with the response. Maybe one citation. (All the citation format plugins are going to be obsolete soon and there will be an endpoint to retrieve a citation. See #723).
  • The galley details in /submissions/<id> should match what's presented in /submissions/<id>/galleys/.

Issues:

  • Maybe the current key should be isCurrent? What about making all booleans use this grammatical form?
  • How important do we think the volume, number, and year filtering is? Would it be better to filter by published status, maybe a published date range?
  • The article array includes section titles used as object keys. I don't think that will be safe in all circumstances and won't work if we want to pass the full localised set of section titles. If the client app wants to import, it will probably want to see section IDs as well.
  • Currently the issueGalleys just return a series of IssueGalley objects, not set up for the API, so everything's under the _data key. That could be broken down into an object specifically for the API.

Files:

  • Should we rename this endpoint to /download? We might want a /files endpoint to return data about a file.

Galleys:

  • $submissionFile may be empty after the $galley->getFile() request. Need some error handling here.
  • The redirect to /files needs a base path added, as it will not work if OJS is located in a subdirecetory (eg - domain.com/ojs/index.php).
  • If this endpoint is only going to redirect to files, I'd say don't include it. Unless we want to retreive some info about a galley.

Thanks, @NateWr and @kaschioudi. On use cases and embedding vs. linking: an alternate front-end is probably the single most common use case. While that often arose because of limitations in the OJS 2.x UI, it won't go away entirely -- we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.

@NateWr, I usually wouldn't like this kind of approach, but what about a hybrid? Something like:

{
    "title": "Magna vivamus sagittis eget dictum nisl elit nam non iaculis",
    "authors": [
        {
            "name": "Carlo Corino",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
            "href": "http://localhost/api/v1/authors/1"
        },
    ],
}

...where name, affiliation, and orcid are simple, unstructured, and presumably just enough to provide a basic summary alongside the article; and href provides an access point for richer metadata, e.g. in some cases tools to query metadata in a variety of scholar-friendly formats.

We would of course be subject to the usual demands for richer data in the simple query (thinking e.g. structured author names), which I'd suggest we fairly strictly decline.

Thank you very much @NateWr for your valuable inputs.

Regarding object linking and embedding, I see them as additional features provided to ease hyperlink between resources (linking) and reduce the number of HTTP requests required (embedding). Ideally the API should support both features. Linking when supported is usually available by default whilst Embedding can be enabled for each request.

We could define a light and a full representation. When the entity is explicitly requested (e.g /authors/{authorId}) we send the full representation otherwise a light representation (e.g submission's author from /submissions/{submissionId}).

Let’s consider the following representations of author object:

Light

{
    "id": 1,
    "name": "John Doe"
}

Full

{
    "id": 1,
    "firstName": "John",
    "lastName": "Doe",
    "country": "Canada",
    "suffix": "",
    "email": "[email protected]",
    "primarycontact": "555-456-9635",
    "affiliation": "University of Bologna",
    "orcid": "http://orcid.org/0000-0002-1825-0097"
}

Requesting /submissions/123 could be:

{
    "id": 123,
    "title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
    "authors": [
        {
            "id": 1,
            "name": "John Doe",
            "href": "http://example.com/api/v1/authors/1"
        },
    ],
}

and /submissions/123?_embed would return

{
    "id": 123,
    "title": "Praesent semper ultrices magna, in ultrices ipsum commodo quis",
    "authors": [
        {
            "id": 1,
            "firstName": "John",
            "lastName": "Doe",
            "country": "Canada",
            "suffix": "",
            "email": "[email protected]",
            "primarycontact": "555-456-9635",
            "affiliation": "University of Bologna",
            "orcid": "http://orcid.org/0000-0002-1825-0097"
        }
    ],
}

@asmecher : I saw your comment after replying to @NateWr . I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.

an alternate front-end is probably the single most common use case... we still have users who have an existing website/toolset they want to use, pulling OJS data into it rather than trying to get OJS to match.

I think most will want to import to their system, rather than fetch from the API on-the-fly in their custom frontend. For instance, if I were building a WP integration, I'd want to pull the data and store it in the WP database. In my mind, this would still favor a linking over embedding approach.

I believe we both suggested the same hybrid approach. I am also suggesting to be able to embed full entity object if requested.

I had the same thought as both of you. I know the WP API uses an _embed parameter like you've suggested.

I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray method in the SubmissionService class (here) was an attempt at such a pattern. Could this be re-used for the public API?

API authentication and authorization inner working

There are 2 ways to authenticate requests:

  • Browser cookie
  • api access token

Cookie based authentication is useful when an API endpoint is requested from a browser with an active/valid OJS session. For instance the new submissions list grids take advantage of that feature to fetch data from API backend endpoints. Other clients would lean on token based authentication using a valid access token. Please note that OJS users are able to obtain access token via the “API Key” tab from their profile page.

Behind the scene API Authentication’s is built on top of OJS authorization framework. After API’s handler bootstraps, there’s a middleware (ApiAuthorizationMiddleware) which intercepts all requests and performs authorization by calling the authorize function on the loaded handler. This means that each handler is in charge of making sure that requests are properly authorized. Handlers authorize requests using OJS policies.

I would like to add that at the moment, there's a little bit of rules duplication when it comes to request authorization. It would be good if we could centralize these rules for reusability.

I'm a little nervous about code duplication in the process of converting objects into the JSON responses. Ideally, we'd have a pattern in which we can serve light and full data, as well as the backend endpoints, without necessarily writing separate code to handle each case. The toArray method in the SubmissionService class (here) was an attempt at such a pattern. Could this be re-used for the public API?

I looked at toArray() implementation and it's a good attempt. Nonetheless my concern is that such helper function encapsulates all the logic in one place and does not provide flexibility. Since most entities in OJS are objects, we could look into decorator pattern.

We could have an abstract class like

abstract class EntityApiPropertyList {

    protected $fullList = null;

    public function __construct($fullList = false) {
        $this->fullList = $fullList;
    }

    public function propertyList() {
        if ($this->fullList)
            return $this->fullPropertyList();

        return $this->lightPropertyList();
    }

    abstract public function lightPropertyList();
    abstract public function fullPropertyList();
}

and have an implementation for each entity. for instance, Author's could be like

class AuthorEntityApiPropertyList extends EntityApiPropertyList {

    protected $author = null;

    public function __construct(\Author $author, $fullList = false) {
        parent::__construct($fullList);
        $this->author = $author;
    }

    public function lightPropertyList() {
        return [
            'id'    => $this->author->id,
            'name'  => $this->author->getFullName(),
        ];
    }

    public function fullPropertyList() {
        return [
            'id'    => $this->author->id,
            'name'  => $this->author->getFullName(),
            'orcid' => $this->author->getOrcid(),
            'email' => $this->author->getEmail(),
            ...
        ];
    }

}

This is how we would get an object representation

$props = with(new AuthorEntityApiPropertyList(
    $author, 
    $this->getQueryParam('_embed', false)
))->propertyList();

note: with() is helper function introduced by illuminate/database useful for chaining

Therefore entities linked to other entities could reuse their linked entities' formatter like

class SubmissionEntityApiPropertyList implements EntityApiPropertyList {

    protected $submission = null;

    public function __construct(\Submission $submission, $fullList = false) {
        parent::__construct($fullList);
        $this->submission = $submission;
    }

    public function lightPropertyList() {
        $props = [
            'id'    => $this->submission->getId(),
            'title' => $this->submission->getTitle(null),
        ];

        // first author
        $authors = $this->submission->getAuthors();
        $author_props = new AuthorEntityApiPropertyList($authors[0], $this->fullList);

        $props = array_merge($props, $author_props);
        return $props;
    }

    public function fullPropertyList() {
        // to be implemented
    }
}

This way modifying entities will cascade to other linked entities.

@asmecher @NateWr Let me know what you think.

It may just be personal preference, but I'm hesitant to introduce a new tree of classes. I thought the point of introducing the Services pattern was to consolidate the object handling in a centralised location.

I'd be inclined to perform your approach above using the Services classes like this:

class SubmissionService extends Service {
    function getPropertyList($submission) {
        $props = [
            'id' => $submission->getId(),
            'authors' => \Services\AuthorService->getSubmissionAuthorPropertyList($submission),
            'reviewRounds' => \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission)
        ];
    }
}

Is that not how you see the Services classes being used?

Also, I'd rather turn getPropertyList into a more flexible method that can be used for light/full lists, as well as other configurations a third-party may desire. So that would look something like this:

class SubmissionService extends Service {
    function getPropertyList($submission, $params = array()) {
        $props = [
            'id' => $submission->getId(),
        ];

        if (!empty($params['authors'])) {
            $props['authors'] = \Services\AuthorService->getSubmissionAuthorPropertyList($submission);
        }

        if (!empty($params['reviewRounds'])) {
            $props['reviewRounds'] = \Services\ReviewRoundsService->getSubmissionRoundPropertyList($submission);
        }
    }
    // If we wanted shorthand functons, we can add those as an additional layer
    function getSummaryPropertyList($submission) {
        return $this->getPropertyList($submission, array('id' => true));
    }
    function getFullPropertyList($submission) {
        $params = array(
            'id' => true,
            'authors' => true,
            'reviewRounds' => true,
        );
        return $this->getPropertyList($submission, $params);
    }
}

This way, getPropertyList is it's own little API that isn't tied directly to the use-cases we're addressing right now, but instead just provides an interface for transposing associated information as needed.

Is there maybe some common ground here?

I see the service layer as a place to define complex business logic or functions with interact with multitple entities. Nonetheless under the hood it's still a big class with a bunch of global functions. Therefore there is a risk of overuse with the consequence of ending up with very large unmanageable classes.

The other aspect I dislike with service methods is that they make it hard to extend functionnalities. For instance, once we define let's say SubmissionService::getPropertyList() and we start using it, if for whatever reason we need to modify implementation in a very specific scenario, we will have to modify this method (with the risk of affecting existing code) and add new rules whilst things will get more complicated for plugins.

But that being said, I am comfortable with either implementations.

During the pkp2017 sprint, we discussed a way forward for generating properties for the API. Kassim is going to work on implementing that. I'm going to try to draw up a solid spec for the endpoints we want to ship in OJS 3.1 so that Kassim has a good idea of the end-goal and can start building out the endpoints.

Dulip did a lot of work drawing up OMP specs, which will come in handy when we get to that.

I've drawn up a proposal for the API specification to ship OJS 3.1. I decided to work towards the minimum endpoints needed to port published articles to a third-party platform. I then added on some things that we'll need to eventually synchronise the backend and public endpoints.

A copy of the swagger file for this specification can be found in this pull request: https://github.com/pkp/ojs/pull/1484

The description of each endpoint outlines what features to include for 3.1, and additional notes indicating what kind of data is expected for specific keys.

I've also got the following questions:

  • Should DOIs (and other pubIds) attach their own data to the api (eg - under a doi key) or should we generate all pubId details automatically under a pubId key?
  • Is there interest in having a /submission/{id}/metadata/{format} endpoint to extract metadata in a given format?
  • Should file downloads be accessed through the existing URL in the system (ie - not an API url) or at a separate /download endpoint?
  • What access details should be included about a submission or issue, related to subscriptions? (@asmecher, any thoughts?)

    • How do we want to handle the tableOfContents in issues? The approach I've used seems closely tied to the existing UI, where articles are sorted into sections. But I can see a future where people want alternatives to how articles are sorted. Should we leave this out for 3.1 until we think about this problem?

@NateWr, I've just merged more of the subscription stuff, moving from the back end towards the front end. Our access model looks kind of like this (carried over from OJS 2.x):

  • Journals can be open-access, (possibly partially) subscription, or no publishing at all (or via external system)
  • Issues can be open access or (possibly partially) subscription
  • Articles can be open access or fall back on the issue's setting

We support delayed open access, i.e. an issue can become open access x number of months after publication.

Subscriptions are individual or institutional. Individual subscriptions typically require login; institutional subscriptions are typically by domain or IP address.

I would suggest exposing the access status (attached to either issue or article) via the API for requests of both article and issue metadata.

Is any of that helpful?

@asmecher Thanks yeah, so my understanding is:

  • Submissions only have an access status
  • Issues have an access status and (optionally) a date when it will be open access

Does the access status contain the information about whether a login is required? ie - is returning the access status sufficient to "understand" access?

Finally, should any of this information be "protected" from an unauthorized user if a submission is not open access?

  • id
  • section
  • issue
  • title
  • subtitle
  • fullTitle
  • prefix
  • authors
  • abstract
  • language
  • pages
  • datePublished
  • status
  • submissionProgress
  • urlWorkflow
  • urlPublished
  • galleys (summary of details, with download URL -- but presumably download URL is not accessible)
  • doi

None of that should be protected, I would say. It's nothing more/less than we're already exposing via the web interface, with the exception of when that's turned off -- maybe we'll need to consider a second switch for the API, as some users will turn off the publishing front end in order to use the API in its place.

Your summary of the access information is accurate. You can almost get away with just fetching those using getters on the article/issue objects and just passing it along -- except that you'll also have to respect the journal's override (i.e. the "everything is open access" setting) as it'll override any subscription settings on individual objects.

@kaschioudi, is this one ready to be closed, or are there still parts needing work before OJS 3.1 is released?

@asmecher this is not ready yet. Before switching back to OTS plugin I was in the process of integrating the architecture patterns that we came out with during the sprint.
https://github.com/kaschioudi/ojs/tree/api-property-list
https://github.com/kaschioudi/pkp-lib/tree/api-property-list

I'm getting a minor warning that could use some attention:

PHP Warning: Declaration of APIRouter::url($request, $endpoint, $params) should be compatible with PKPRouter::url($request, $newContext = NULL, $handler = NULL, $op = NULL, $path = NULL, $params = NULL, $anchor = NULL, $escape = false) in /home/travis/build/pkp/omp/lib/pkp/classes/core/APIRouter.inc.php on line 24 [27-Sep-2017 21:01:09 UTC] PHP Stack trace: [27-Sep-2017 21:01:09 UTC] PHP 1. {main}() /home/travis/build/pkp/omp/index.php:0 [27-Sep-2017 21:01:09 UTC] PHP 2. Application->execute() /home/travis/build/pkp/omp/index.php:64 [27-Sep-2017 21:01:09 UTC] PHP 3. Dispatcher->dispatch() /home/travis/build/pkp/omp/lib/pkp/classes/core/PKPApplication.inc.php:239 [27-Sep-2017 21:01:09 UTC] PHP 4. Dispatcher->_instantiateRouter() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:96 [27-Sep-2017 21:01:09 UTC] PHP 5. instantiate() /home/travis/build/pkp/omp/lib/pkp/classes/core/Dispatcher.inc.php:179 [27-Sep-2017 21:01:09 UTC] PHP 6. import() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:203 [27-Sep-2017 21:01:09 UTC] PHP 7. require_once() /home/travis/build/pkp/omp/lib/pkp/includes/functions.inc.php:25

@kaschioudi It looks like APIRouter::url() is not actually used. Can we remove it?

https://github.com/pkp/pkp-lib/blob/master/classes/core/APIRouter.inc.php#L121-L130

@NateWr : Sure.

I think we can now close this issue and any future API development can go into new issues. Congrats @kaschioudi this is a big one!

Was this page helpful?
0 / 5 - 0 ratings