Pkp-lib: API calls fail if context path is obscured by RESTful URL configuration

Created on 12 Jul 2019 · 39 Comments · Source: pkp/pkp-lib

Describe the bug
See: https://forum.pkp.sfu.ca/t/errors-after-removing-index-php/31314
and: https://forum.pkp.sfu.ca/t/ojs-3-1-2-1-journal-statistic/54224/4

To Reproduce
Steps to reproduce the behavior:

  1. Add a context specific base url which masks the context shortname, e.g.: base_url[journal] = https://my.journal.com
  2. Navigate to the Article Statistics (Dashboard -> Statistics -> Articles)
  3. Enter some text in the Article Details search
  4. Result will be 404s from the application API calls

What application are you using?
OJS 3.1.2

Additional information
Core problem appears to be here:
https://github.com/pkp/pkp-lib/blob/27eeff555bdcfa6da9f787d259184b1cfacff025/classes/handler/APIHandler.inc.php#L174
This pattern requires the URL to contain the context key.
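The following minimal Python sketch shows why a masked context path produces 404s. It models Slim-style `{placeholder}` patterns as regular expressions, which simplifies Slim/FastRoute's actual route compiler. It also assumes the endpoint pattern has the shape `/{contextPath}/api/{version}/<handler path>`, using `stats/publishedSubmissions` as an example handler path.

```python
import re


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a '{name}' placeholder pattern into an anchored regex.

    Each placeholder matches exactly one non-empty path segment.
    Simplified model only; Slim/FastRoute compile routes differently.
    """
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^" + regex + "$")


# Assumed shape of the endpoint pattern: the context path segment is mandatory.
ENDPOINT = compile_pattern("/{contextPath}/api/{version}/stats/publishedSubmissions")


def route(path: str):
    """Return the captured placeholders, or None if the route does not match (a 404)."""
    match = ENDPOINT.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Context path visible in the URL: the route matches.
    print(route("/publicknowledge/api/v1/stats/publishedSubmissions"))
    # {'contextPath': 'publicknowledge', 'version': 'v1'}

    # Context path masked by base_url[journal] = https://my.journal.com: no match.
    print(route("/api/v1/stats/publishedSubmissions"))
    # None
```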

My trace through the code to find this is documented here:
https://forum.pkp.sfu.ca/t/errors-after-removing-index-php/31314/33?u=ctgraham

All 39 comments

@NateWr , @asmecher , I'm not sure if there is a way to designate the contextPath as optional in this pattern, or if we need to add a bit more logic here:
https://github.com/pkp/pkp-lib/blob/27eeff555bdcfa6da9f787d259184b1cfacff025/classes/handler/APIHandler.inc.php#L70-L78

Thanks for chasing this down @ctgraham! So, getEndpointPattern is just a little helper method for when we define our endpoints.

It should be possible to make this behave correctly as long as there is some way to determine whether the contextPath should or should not appear in the URL. Can this data be reliably extracted from the Config or is it somewhere else?

I'm pretty fuzzy on the internals of the routing so I'm not sure where to look to sort this out. One hiccup we may need to check is how this affects site-wide API endpoints that do not expect the context to be passed.

To make routing a little fuzzier, I feel ambivalent about whether the contextPath is part of the application URL arguments, or if it is really part of the base_url or base_url[context]. Obviously, since it is included after index.php, and is the first argument of Router::url(), it is an internal argument, but whenever I'm communicating API endpoints externally (e.g. SUSHI-COUNTER), it is always in the form of base_url and endpoint, where the base_url includes the context (named or unnamed).

I imagine it is a heavy ask, but what if API endpoints were built from the base_urls, rather than from the application root?

Otherwise, what we are looking at is examining the PKPRequest object in concert with the config.inc.php base_url settings to identify whether the contextPath is named or not. We can definitely do that if needed; I think there was something like that to process usage statistics URLs, though I'm not immediately finding it.

I imagine it is a heavy ask, but what if API endpoints were built from the base_urls, rather than from the application root?

Hmm, maybe my memory is failing me, but I think the base_url doesn't include the context path unless it is configured like:

base_url['journal'] = http://journal.com

Without this configuration, the baseUrl that we get from Request->getBaseUrl() is something like: http://ojs.com without the context path.

Maybe that's not what you were referring to, or maybe I'm wrong there.

The URL construction is definitely complex. APIRouter::url() needs to follow the method signature Router::url(), so I think that's why the $newContext is in there.

The base URLs could look like:

base_url = http://journals.mysite.com
base_url[index] = http://journals.mysite.com/index
base_url[joj] = http://journalofjournals.com
base_url[another] = http://journals.mysite.com/another

The context path is present in the site index and in the journal "another", but is masked in "joj".

If I were to share API information (say for OAI-PMH) for any of them, I would describe it as the "base url" + "/oai/", not the application formulation of domain + optional context + "/oai/".

So, should this external formulation carry back at some level into the internal formulation?
APIRouter::url() correctly formulates the URIs without the context as an explicit URI component. The URL https://something.org/stats/publishedSubmissions will create API calls to https://something.org/api/v1/stats/publishedSubmissions?..., where the contextPath is implicit in both URIs. So, what would the implications be if APIHandler::getEndpointPattern presumed that endpoint patterns originated from a base_url?

$this->_pathPattern = '/api/{version}/' . $this->_handlerPath;

I haven't spent much time in the API code, so perhaps this makes no sense.
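The proposal above can be sketched in Python: derive each context's external base URL from the configuration, then append `/api/{version}/` and the endpoint. The helper names are hypothetical, and the fallback rule (site `base_url` plus the context path when no `base_url[context]` override exists) is an assumption for illustration, not necessarily how OJS resolves URLs.

```python
# Example configuration from the comment above: base_url plus per-context overrides.
SITE_BASE_URL = "http://journals.mysite.com"
CONTEXT_BASE_URLS = {
    "index": "http://journals.mysite.com/index",
    "joj": "http://journalofjournals.com",  # context path masked
    "another": "http://journals.mysite.com/another",
}


def context_base_url(context: str) -> str:
    """External base URL for a context (hypothetical helper).

    Uses the base_url[context] override when present; otherwise assumes
    the site base_url followed by the context path.
    """
    override = CONTEXT_BASE_URLS.get(context)
    if override is not None:
        return override.rstrip("/")
    return SITE_BASE_URL.rstrip("/") + "/" + context


def api_url(context: str, endpoint: str, version: str = "v1") -> str:
    """Hypothetical helper: external base URL + /api/{version}/ + endpoint."""
    return f"{context_base_url(context)}/api/{version}/{endpoint.lstrip('/')}"


if __name__ == "__main__":
    print(api_url("joj", "stats/publishedSubmissions"))
    print(api_url("another", "_submissions"))
```

With this formulation the masked journal gets `http://journalofjournals.com/api/v1/stats/publishedSubmissions`, and the context path never has to appear in the route itself.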

So, should this external formulation carry back at some level into the internal formulation?

I think you're right that we should build the API urls off of the external formulation, whatever that is. And I agree that for your example the API should be http://journalofjournals.com/api/v1/whatever. But I'm not sure if internally we use the baseUrl in that way.

For example, a default config might look like this:

base_url = "http://localhost:8000"

And my publicknowledge journal is accessed at http://localhost:8000/publicknowledge. But when I call Request::getBaseUrl() from within that context, it returns http://localhost:8000 without the context path.

So, internally, I don't think we include the context path in the baseUrl, and therefore I'm not sure if we can use it as the basis for constructing API urls.

I'm sure there is probably a request method that will return to us what we want, or maybe we need to write one that checks against the config file and adds the context path where necessary. Alec will probably be able to think of this off the top of his head.

I haven't spent much time in the API code, so perhaps this makes no sense.

I think it does. I think that we just need to tell the SlimAPI handler how to recognize a URL for which it should be invoked. I think all of the other stuff (like setting up the request context) is done by our handlers so the only trick is getting the URL route regex set up properly. (I also haven't dug in enough here to know how it works.)

And my publicknowledge journal is accessed at http://localhost:8000/publicknowledge. But when I call Request::getBaseUrl() from within that context, it returns http://localhost:8000 without the context path.

Oi. You are right. The base_url in the config.inc.php is not necessarily consistent with the getBaseUrl() method of PKPRequest.

I found the method I was vaguely remembering from the Usage Stats implementation: Core::removeBaseUrl(). The code I'm imagining would look something like:

$apiEndpoint = Core::removeBaseUrl(APIRouter::url(...));
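The following minimal Python sketch illustrates the idea behind stripping the base URL. It finds the longest configured base URL that prefixes a full URL and keeps the remainder as the path to route. This models the concept only. It is not the actual `Core::removeBaseUrl()` implementation, whose return value may differ. The sample configuration is the multi-domain one posted later in this thread.

```python
from typing import Dict, Optional, Tuple


def remove_base_url(url: str, base_urls: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Strip the longest configured base URL that prefixes `url`.

    Returns (context, remainder). A base URL only matches on a path-segment
    boundary, so '.../tjcr' does not match '.../tjcrx'. Returns (None, url)
    when nothing matches.
    """
    best_context, best_base = None, ""
    for context, base in base_urls.items():
        base = base.rstrip("/")
        on_boundary = url == base or url.startswith(base + "/")
        if on_boundary and len(base) > len(best_base):
            best_context, best_base = context, base
    if best_context is None:
        return None, url
    return best_context, url[len(best_base):] or "/"


BASE_URLS = {
    "index": "https://www.akademisyen.net",
    "cmj": "https://www.citymedicaljournal.com",
    "tjcr": "https://www.akademisyen.net/tjcr",
    "atb": "https://www.adlitipbulteni.com",
}

if __name__ == "__main__":
    print(remove_base_url("https://www.akademisyen.net/tjcr/api/v1/_submissions", BASE_URLS))
    print(remove_base_url("https://www.citymedicaljournal.com/api/v1/_submissions", BASE_URLS))
```

Preferring the longest match matters when one base URL is nested under another, as with `base_url[index]` and `base_url[tjcr]` here.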

Thanks @ctgraham, that should be helpful. I will have to dive into this further to figure out a solution. The tricky part, I think, is that the pathPattern uses placeholders ({contextPath}, {version}) that are then incorporated into Slim's routing. So we won't necessarily be able to strip the base url from the pathPattern directly. Somehow, we'll need to intervene in Slim's route determination and I haven't touched anything like that before.

@asmecher do you think that this should be assigned to 3.1.2-2 or 3.2 milestone?

Hrm, seems like a 3.1.2-2 to me.

I've been staring at this one for a while and I have to say it's a can of worms. A related-but-different issue: this same problem should affect the submission lists, but doesn't, because their URLs are glued together (and they shouldn't be). Deferring to OJS/OMP 3.1.2-3, and this may also need a major release as it might involve some fundamental changes (e.g. to the way we override URLs in the config file).

(See also: https://github.com/pkp/pkp-lib/issues/3110)

URLs are glued together (and they shouldn't be)

They're not in master.

URLs are glued together (and they shouldn't be)

They're not in master.

OK, that's good news, except that this means that the submission list API calls will suffer the same problem as the stats API calls when the context path is rewritten out -- so it'll definitely need to be fixed for 3.2. I'll take another look.

PR for early review: https://github.com/pkp/pkp-lib/pull/5201

So I don't love this solution because it's a bit of a hack. However, that might be necessary barring big (breaking) changes, and it does have some advantages.

The source of the problem is this: OJS routes using CGI PATH_INFO, and Slim routes using REQUEST_URI. PATH_INFO is not subject to mod_rewrite's meddling, but REQUEST_URI is. So OJS doesn't need to worry about mod_rewrite's effects on its routing, but Slim does.

The suggestions described above go about the problem by trying to detect the mismatch and adjust the route patterns accordingly. I think that'll probably be convoluted and unreliable.

The solution I've proposed here instead injects a PATH_INFO-based path into Slim directly and has it work with that instead of REQUEST_URI. Thus both OJS and Slim work from the same source.
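The mismatch can be modeled in a few lines of Python. The sketch makes three assumptions: a rewrite rule of the form `RewriteRule ^(.*)$ /index.php/JOURNALNAME/$1`, a front controller at `/index.php`, and a `/{contextPath}/api/{version}/...` route shape. It models the idea only and is not the code in the PR.

```python
import re

# Assumed route shape with a mandatory context path segment.
ROUTE = re.compile(
    r"^/(?P<contextPath>[^/]+)/api/(?P<version>[^/]+)/stats/publishedSubmissions$"
)
SCRIPT_NAME = "/index.php"


def rewrite(request_uri: str, journal: str) -> str:
    """Mimic the rewrite rule: map the client-visible path onto /index.php/JOURNALNAME."""
    path = request_uri.split("?", 1)[0]  # query string dropped for simplicity
    return f"{SCRIPT_NAME}/{journal}{path}"


def path_info(rewritten: str) -> str:
    """PATH_INFO is the part of the rewritten path after the script name."""
    return rewritten[len(SCRIPT_NAME):] if rewritten.startswith(SCRIPT_NAME) else ""


def match(path: str):
    """Return the matched contextPath, or None when the route does not match."""
    found = ROUTE.match(path)
    return found.group("contextPath") if found else None


if __name__ == "__main__":
    request_uri = "/api/v1/stats/publishedSubmissions?count=10"
    rewritten = rewrite(request_uri, "journal")
    # Routing on REQUEST_URI: the masked context path is missing, so no match.
    print(match(request_uri.split("?", 1)[0]))  # None
    # Routing on PATH_INFO: the rewrite has restored the context path.
    print(match(path_info(rewritten)))  # journal
```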

Unfortunately Slim (and underlying Fastroute) are built from the ground up to route from URIs as their source, and there's no good cleave point I could find in which to inject something different in a nicer way than this.

There's still considerable risk that someone's environment somewhere is going to have an allergic reaction to this approach, and unfortunately I can't think of any way around this that doesn't take the same risk. One benefit of merging to stable-3_1_2 for release in OJS/OMP 3.1.2-2 is that this issue only affects a few things; starting with 3.2, the submission lists will be subject to this problem, so hopefully we can field-harden this change before that happens.

@ctgraham and @NateWr, what say you?

The solution looks straightforward enough to me. A bit of glue will always be necessary when we're marrying together two approaches to routing.

In terms of the 3.1.2-2 vs 3.2 question, I am tempted to hold it back until 3.2. I take your point that we would get a wider distribution of testers with 3.1.2-2, but I am keen to build as much stability into our maintenance releases as possible.

3.2 will be our most heavily tested .x release, due to the long testing cycle after November as well as the scale of the changes going into it. We'll have a lot of PS staff testing, and it will also get more rigorous testing during the upgrade process for those hosting it elsewhere.

I am tempted to hold it back until 3.2. I take your point that we would get a wider distribution of testers with 3.1.2-2, but I am keen to build as much stability into our maintenance releases as possible.

Yup, this kind of change makes me uncomfortable for a build too. My only hesitation is that our testing is generally very monocultural, and this change is one that, if it does introduce regressions, will be very dependent on the platform and configuration (i.e. web server, rewriting, etc). We're not generally good at testing that representatively.

On the flip side, just in getting my head around the problem I've already tested it both using Apache (rewriting and non-rewriting) and Lighttpd; the Travis test scripts use PHP's built-in web server (no rewriting). @ctgraham, can you test the PR (https://github.com/pkp/pkp-lib/pull/5201) to see if it solves the problem on your environment? That's not a bad diversity of environments already.

Yes, I am planning to test this PR here. Probably tomorrow...

This resolves the reported issue for my instance of Apache 2.2.15 / PHP 7.2.23 with mod_rewrite masking the context path to the root:

    RewriteEngine on
    RewriteRule ^/(index.php)?/?$ /ojs/index.php/JOURNALNAME/index [L]

    RewriteRule ^/JOURNALNAME/(.*)$ /$1 [R,L]

    RewriteCond %{REQUEST_URI} !/index.php/
    RewriteCond %{DOCUMENT_ROOT}ojs%{REQUEST_URI} !-d
    RewriteCond %{DOCUMENT_ROOT}ojs%{REQUEST_URI} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ /ojs/index.php/JOURNALNAME/$1 [QSA,L]

    RewriteCond %{DOCUMENT_ROOT}ojs%{REQUEST_URI} -f
    RewriteRule ^(.*)$ /ojs/$1 [QSA,L]

If it doesn't make it into 3.1.2, I'll probably include it as a local patch in our installs, so that we can move our remaining 3.1.1-4 installs to 3.1.2.

Thanks, @ctgraham and @NateWr. I opted against merging into stable-3_1_2 (the patch is available if anyone needs/wants it) and merged to master instead: https://github.com/pkp/pkp-lib/commit/77c4f0ab23e108035c3ca0aa17a7c8865c54723f

This removes some of the duplication, too, so I'm more satisfied with it as a long-term fix.

@asmecher I tried 6bbcb58 and 77c4f0a on an OJS 3.1.2-4 system. It did not work out and made things worse: errors appear when the backend loads.

@mpbraendle , what is your base_url and base_url[] configuration, and what rewrite rules are you using?

Hi, while upgrading a journal with a 7-year-old database, I tried hard and succeeded. I faced all the listed problems and fixed them with the help of the OJS forum. On my site, I modified Apache's httpd.conf to remove the journal prefix and publish multiple journals on their own domains.
I had the same API error after upgrading to OJS 3.2.0.2.
I also tried some fixes here, but without success.
Is there any good news about this point?
Best Regards,

@drugurkocak, same questions as above -- what is your base_url and base_url[] configuration, and what rewrite rules are you using?

Hi @asmecher ,

Thank you for your support. As a doctor, I wish good health to you, your family and the whole OJS team in these coronavirus days.

The OJS version is 3.1.2-4 now because I had to restore the server from its backup image. But its configuration is identical to the one after upgrade. After getting success on my notebook (MAMP), I had applied the upgrade directly on the server side.

Tonight, I will create an exact duplicate of the OJS site on my test domains, and apply the upgrade again to get php-error logs. I had several fatal errors too, but I didn't save any of them.

The Journal gallery (base url) is
https://www.akademisyen.net
First Journal:
https://www.adlitipbulteni.com
Second Journal:
https://www.citymedicaljournal.com
Third Journal:
https://www.akademisyen.net/tjcr

OJS 3 Config:
base_url = "https://www.akademisyen.net"

allow_url_fopen = On

base_url[index] = https://www.akademisyen.net
base_url[cmj] = https://www.citymedicaljournal.com
base_url[tjcr] = https://www.akademisyen.net/tjcr
base_url[atb] = https://www.adlitipbulteni.com

restful_urls = On

My Server Config:
Webmin Panel 1.942
Operating system CentOS Linux 7.7.1908
Perl version 5.016003
BIND version 9.11
Apache version 2.4.6
PHP versions 5.4.16, 7.2.24, 7.3.11
Logrotate version 3.8.6
MySQL version 10.4.12

php runs as CGI wrapper (run as virtual server owner)

I would like to send my mod_rewrite rules via private message: a programmer wrote them, and out of respect for him, I don't want to share them publicly.

Best Regards,
Ugur

@drugurkocak , in my experience the use of %{REQUEST_FILENAME} as a RewriteCond directive in your mod_rewrite rules will not work as you might expect with Apache after 2.2 (e.g. with your use of Apache 2.4). See, for example, this commented re-write rule example:

# If index.html precedes index.php, PATH_INFO will incorrectly list index.html in index requests
DirectoryIndex index.php index.html

# Ensure mod_rewrite is enabled
RewriteEngine On

# If the URI already has index.php in it, don't change it
RewriteCond %{REQUEST_URI} !/index.php/
# Skip existing directories if Apache 2.2 or later
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# Skip existing files if Apache 2.2 or later
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# Skip existing directories if Apache prior to 2.2
RewriteCond %{REQUEST_FILENAME} !-d
# Skip existing files if Apache prior to 2.2
RewriteCond %{REQUEST_FILENAME} !-f
# Rewrite all other requests to OJS
RewriteRule ^(.*)$ /index.php/$1 [QSA,L]

This doesn't redirect specific journals to mask the context path, but the example should be usable in combination with the example I tested successfully above.

Hi @ctgraham and @asmecher
Thank you for your support. I don't have deep knowledge of regex, but I will try to apply your solution on my test portal and share the results.
Best regards,

Hi. Unfortunately my trials were not successful. It is best to wait a little; I can even wait a year. This is not a problem someone like me can solve, but I don't want to get stuck on OJS 3.1. The current mod_rewrite configuration works in OJS 3.1.2-4; it only gives errors when changing the date range in the editorial statistics.

Since many people like me use this feature, you should listen to those who want a configuration like this.

This reminds me of something: in 2010, I installed Typo3 to build our medical school website, but the administrators in the computing department removed it and installed WordPress because they couldn't understand it and deal with it.
Back then, WordPress was relatively young (10 years ago), and now millions of websites use it. I also moved my personal website from Typo3 to WordPress when I found a nice template. It lacks nothing.
My point is that OJS 3 is a great system; you are adding new features every day and doing a great job. This is also a real need, and you should support it, even if it requires significant changes.
Greetings to all of you.

Hello,
have there been any updates on this? Is there a known fix?
Thank you!

Hi @KBodarwe, looking over the history, it looks like a fix for this was released as part of v3.2. Updating to 3.2 or later should resolve this issue.

Thanks @NateWr for the answer. I have 3.2.0-3 installed on two separate installations, and both show the problem on the statistics and submissions pages.
I need to strip index.php/JOURNALNAME/ from the URL to keep DOI continuity.
Here's my .htaccess:

RewriteEngine On
RewriteBase /

RewriteRule ^submissions(.*)$ index.php/journal/submissions$1 [R,L]
RewriteRule ^stats(.*)$ index.php/journal/stats$1 [R,L]

RewriteCond %{SERVER_NAME} ^www.server.de
RewriteCond %{REQUEST_URI} !/journal/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php/journal/$1 [L]

This shows the complete URL via a redirect for the pages having issues, but sadly does not stop the errors from appearing.

Apache Version is 2.4.38
PHP Version is PHP 7.3.14-1

Thanks!

I'm afraid mod_rewrite is not one of my strengths, so you may need to wait for further help from Clinton or Alec. That said, it looks to me like these two lines do not actually manipulate the API urls:

RewriteRule ^submissions(.*)$ index.php/journal/submissions$1 [R,L]
RewriteRule ^stats(.*)$ index.php/journal/stats$1 [R,L]

The API urls in the application are prefixed with api/v1 so they are /api/v1/_submissions and /api/v1/stats. Could that be the source of the discrepancy you're facing?
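A quick Python check makes this concrete. In a per-directory .htaccess context, mod_rewrite matches RewriteRule patterns against the path without the leading slash, so neither redirect rule ever sees an API call. The sketch uses `re.match` as a stand-in for Apache's regex matching, which is adequate for these simple patterns. It adds an `^api/v1(.*)$` rule, which is not in the .htaccess above, for comparison.

```python
import re

# RewriteRule patterns from the .htaccess above, plus an API rule for comparison.
RULES = {
    "submissions": r"^submissions(.*)$",
    "stats": r"^stats(.*)$",
    "api": r"^api/v1(.*)$",
}


def matching_rules(path: str) -> list:
    """Names of the rules whose pattern matches a per-directory path (no leading slash)."""
    return [name for name, pattern in RULES.items() if re.match(pattern, path)]


if __name__ == "__main__":
    for path in ("submissions", "api/v1/_submissions", "api/v1/stats/publishedSubmissions"):
        print(path, matching_rules(path))
```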

Hm, I figured the API somehow called upon the context of the URI, so by changing that to show the correct URI I could circumvent that problem.

Maybe redirecting the api calls directly? Like:
RewriteRule ^api/v1(.*)$ index.php/journal/api/v1$1 [R,L]
could that help?

I can't say for certain but that looks more likely to re-route API calls.

@KBodarwe , how are you handling paths to your site index? Does that also sit under this same DNS name? Your re-write rule seems incomplete, since if you turn on restful_urls, you haven't covered all URLs with mod_rewrite and it will break, but if you leave restful_urls off, you are expecting OJS to respond to (and with) RESTful URLs just with one journal path.

If you are isolating just this journal on the domain, then something like the following should work:

    RewriteCond %{REQUEST_URI} !/index.php/
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ /index.php/JOURNALNAME/$1 [QSA,L]

Where JOURNALNAME is your journal path.

Alternately, consider just using a standard documented re-write, such as:

# Ensure mod_rewrite is enabled
RewriteEngine On


# If the URI already has index.php in it, don't change it
RewriteCond %{REQUEST_URI} !/index.php/
# Skip existing directories if Apache 2.2 or later
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# Skip existing files if Apache 2.2 or later
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# Skip existing directories if Apache prior to 2.2
RewriteCond %{REQUEST_FILENAME} !-d
# Skip existing files if Apache prior to 2.2
RewriteCond %{REQUEST_FILENAME} !-f
# Rewrite all other requests to OJS
RewriteRule ^(.*)$ /index.php/$1 [QSA,L]

You can update the DOIs to point to new local URLs. That is the whole point of a DOI.

Thanks for the answer.
The site index runs under the same DNS name. This is a single-journal installation, as we want to keep the user bases separate.
Since there's no reason to have the journal name in the URL, I'm trying to strip it.

You can update the DOIs to point to new local URLs. That is the whole point of a DOI.

I'll probably do that and accept the journal name in the URL.

Update: I tried this approach:

RewriteEngine On
RewriteBase /

# Bugfix for OJS 3.2.0-3: submissions API doesn't work with shortened URLs
RewriteRule ^api/v1(.*)$ index.php/journal/api/v1$1 [R,L]

# Rewrite URL to look cleaner
RewriteCond %{SERVER_NAME} ^www.server.com
RewriteCond %{REQUEST_URI} !/journal/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php/journal/$1 [L]

and it works.

The re-routing of the API seems to work.

Hi there!

I am not quite sure whether this is the same issue or whether it can be solved with a correct .htaccess, but when I try to access any of the REST API endpoints, I keep receiving a 500 error like the following:

[19-Sep-2020 12:05:30 UTC] PHP Fatal error:  Uncaught Error: Call to undefined function import() in /var/www/webroot/ROOT/ojs/api/v1/contexts/index.php:16
Stack trace:
#0 {main}
  thrown in /var/www/webroot/ROOT/ojs/api/v1/contexts/index.php on line 16

My .htaccess is like the following:

<IfModule mod_rewrite.c>
  # Ensure mod_rewrite is enabled
  RewriteEngine On


  # If the URI already has index.php in it, don't change it
  RewriteCond %{REQUEST_URI} !/index.php/

  # Skip existing directories if Apache 2.2 or later
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d

  # Skip existing files if Apache 2.2 or later
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f

  # Skip existing directories if Apache prior to 2.2
  RewriteCond %{REQUEST_FILENAME} !-d

  # Skip existing files if Apache prior to 2.2
  RewriteCond %{REQUEST_FILENAME} !-f

  # Rewrite all other requests to OJS
  RewriteRule ^(.*)$ /index.php/$1 [QSA,L]
</IfModule>

And my other settings mentioned on this thread are these:

base_url = "https://semeiosis.editora.dev/ojs"
; base_url[index] = http://www.myUrl.com <-- commented, not active
; base_url[myJournal] = http://www.myUrl.com/myJournal <-- commented, not active
; base_url[myOtherJournal] = http://myOtherJournal.myUrl.com <-- commented, not active
restful_urls = On

@educkf , the error message of "PHP Fatal error: Uncaught Error: Call to undefined function import()" suggests you may be trying to access the index.php file directly from the URL, rather than allowing it to be routed through OJS's handlers.

For example, if you try:
https://myhost.tld/api/v1/_submissions/index.php
instead of:
https://myhost.tld/myJournal/api/v1/_submissions
Then OJS doesn't have the opportunity to bootstrap the request.

See this documentation here:
https://docs.pkp.sfu.ca/dev/api/#access-the-api

For additional troubleshooting, please open a discussion topic in the forum:
https://forum.pkp.sfu.ca/

We have a setup with multiple journals. Some use regular URLs (site.com/JOURNALNAME) and some use subdomains (JOURNALNAME.site.com). For subdomains, we use wildcards. This is the .htaccess that seems to work.

RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Redirect regular URLs without index.php
RewriteCond %{SERVER_NAME} ^site.com
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php/$1 [L]

# Rewrite API calls to subdomain journals
RewriteCond %{REQUEST_URI} api\/v1\/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.site\.com$ [NC]
RewriteRule ^(.*)$ index.php/%1/$1 [L,R=307]

# Rewrite subdomain journals
RewriteCond %{SERVER_NAME} ^([a-z0-9-]+)\.site\.com$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php/%1/$1 [L]
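To make the rule order easier to follow, here is a rough single-pass Python model of the .htaccess above. It treats SERVER_NAME and HTTP_HOST as the same host and uses Python's `re` in place of Apache's regex engine. It also ignores mod_rewrite's re-processing of rewritten URLs, so use it as an aid for reasoning rather than a faithful emulation. `site.com` and the journal names are placeholders.

```python
import re
from typing import FrozenSet, Tuple

WWW = re.compile(r"^www\.(.*)$", re.IGNORECASE)
SUBDOMAIN = re.compile(r"^([a-z0-9-]+)\.site\.com$", re.IGNORECASE)


def route(host: str, path: str, existing_files: FrozenSet[str] = frozenset()) -> Tuple[str, str]:
    """Return (action, target) for one request, checking the rules in file order.

    `path` is the per-directory path (no leading slash). `action` is one of
    'redirect-301', 'redirect-307', 'rewrite' or 'serve'.
    """
    www = WWW.match(host)
    if www:
        return "redirect-301", f"http://{www.group(1)}/{path}"
    if path in existing_files:
        return "serve", path  # every remaining rule requires REQUEST_FILENAME !-f
    if re.match(r"^site.com", host):  # unescaped '.', as in the original condition
        return "rewrite", f"index.php/{path}"
    sub = SUBDOMAIN.match(host)
    if sub and re.search(r"api/v1/", "/" + path):
        # API calls are redirected so the client re-requests with the context path.
        return "redirect-307", f"index.php/{sub.group(1)}/{path}"
    if sub:
        return "rewrite", f"index.php/{sub.group(1)}/{path}"
    return "serve", path


if __name__ == "__main__":
    print(route("journal.site.com", "api/v1/_submissions"))
    print(route("journal.site.com", "issue/current"))
```

The 307 redirect for API calls means the browser's next request carries `index.php/JOURNALNAME/api/v1/...` in its URI, while ordinary pages are rewritten internally and keep their short URLs.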