Describe the problem you would like to solve
Use of iframes for display of HTML Galleys is suboptimal. Embedding the HTML content directly can improve consistency of presentation and ease of indexing.
Describe the solution you'd like
Who is asking for this feature?
https://forum.pkp.sfu.ca/t/html-galleys-and-iframes-in-ojs-3-2/62596
Additional information
Question: How should download links to the HTML be exposed? Should it be done on the abstract page, on the galley view, or both? Should it be optional?
Am I right in thinking that you want to take everything between <body> and </body> and display it on the article landing page? Wouldn't that end up with a duplicated article title, abstract, contributors, etc.? My understanding is that the HTML galleys typically include a full representation of the article.
Have you considered instead a plugin which adds a fullText property to the Publication object, and then adds a TinyMCE field in the Publication forms to create and save to that property?
This would allow the editor some control over the content outside of the HTML file itself, for example stripping out data that is already present on the article landing page. And it would align with our existing plans for full-text/JATS publishing, which call for extracting the body of a JATS document, rendering it to HTML, and saving it to a fullText property (this is what JATSParser is doing).
This wouldn't be on the article landing page, but on the galley view page. Whereas the htmlArticleGalley plugin currently wraps the full HTML in an iframe on the galley view, this would instead insert the galley's body into the standard article display template. The commits are messy, but here is the pending PR:
https://github.com/ulsdevteam/ojs/compare/uls-install...ulsdevteam:html-article-galley-inline
[Screenshots: article landing page, galley view, galley download]
I'm interested in more discussion of the fullText property on the Publication object, especially with regard to its future use with JATS. We tried to get the editors on board with direct XML publication, but we couldn't get the same uptake that we got for an HTML template.
Oh! I see what you're doing now. It's a good idea, and carefully implemented with the regex fallback to get the content between the <body> tags. However, I'm keen to move away from HTML "galleys" and towards publishing the full text right there on the article landing page, similar to how eLife does it.
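For illustration, the "content between the <body> tags" step can be sketched in Python. This is not the PR's code (OJS is written in PHP, and the PR may try a DOM parser before falling back to a regex); the hypothetical `extract_body` helper, its regex, and its choice to return the input unchanged when no body element is found are all assumptions.

```python
import re

# Matches an opening <body ...> tag (with any attributes) through the first
# closing </body>, case-insensitively and across newlines.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return the markup between <body> and </body> (hypothetical helper).

    If no body element is found (e.g. the galley is already a fragment),
    the input is returned unchanged.
    """
    match = _BODY_RE.search(html)
    if match is None:
        return html
    return match.group(1).strip()
```

A regex like this is fine for well-formed galleys; malformed markup is where a parser-first approach with a regex fallback pays off.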
We'll always support HTML galleys for back content. But what I envision is a fullText property that is a rendered HTML view of the article, minus the title, contributors and other information displayed on the article landing page. This could be as simple as adding a TinyMCE field to Publication tabs.
In the long run, we're looking to support this as part of our XML publishing by integrating with a WYSIWYG editor for JATS. The idea is that non-technical editorial staff can build and edit the full text in the OJS backend, and this is rendered to an HTML string and stored in the fullText property. (We're releasing a version of the Texture plugin to do this soon, but will be refocusing our efforts on a different editor in the future.)
The idea here is that OJS does not care where the fullText property comes from. It can come from our JATS editor, or it could come from a TinyMCE editor, or it could be extracted from an HTML galley, or it could be produced through an XSL conversion for publishers that have their own XML pipeline. Then our themes can be built around this fullText property rather than trying to account for where it comes from.
I can think of a couple of ways forward for you here:
1. We merge this into our HTMLArticleGalleyPlugin as an optional setting that works for now, and for back content that does not need to be processed in any way. My main concern here is that we (PKP) would then be responsible for maintaining this, for troubleshooting every HTML galley that fails to display correctly, and for responding to every poorly formatted HTML page that doesn't render nicely in our themes. Putting this behind a setting will reduce our exposure here, but I'm still a bit nervous (I could be persuaded, though).
2. You split this off into a plugin and we get it into the Plugin Gallery. It'd be very simple to hook into the template display hook and make this happen. I'd be happy to do the initial work if Pitt is happy to be responsible for maintenance. :)
3. Another approach would be to build a plugin that implements the fullText setting right now. When an article is published (or an HTML galley is uploaded), your plugin could hook in, extract the body content, and save it to fullText. Then, as we update our themes to support fullText, you could take advantage of them as well. (One caveat here is that publication_settings are stored as text in the database. Vitaliy is changing this in the next version, but you might run into string length limits for long articles unless you alter this column yourself.) This approach would set you up well for the long term, but would probably require a bit more work up front (processing back content, making sure that the extracted HTML does not include the title, abstract, etc.).
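The third approach can be sketched in Python to show the moving parts: extract the body, strip front matter the landing page already shows, check the size, and save to fullText. Everything here is hypothetical: the `Publication` stand-in, the `on_html_galley_uploaded` handler (a real plugin would register a PHP hook callback with OJS), and the class names used to spot duplicated front matter. The size check assumes the setting value lives in a MySQL TEXT column (65,535 bytes); adjust it for your own schema.

```python
import re
from dataclasses import dataclass, field

# Assumption: publication_settings.setting_value is a MySQL TEXT column,
# which holds at most 65,535 bytes.
TEXT_COLUMN_MAX_BYTES = 65_535

_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)

# Hypothetical convention: the galley marks front matter that the landing
# page already shows with these class names. Only handles elements that do
# not nest another element of the same tag; a real plugin should use an
# HTML parser instead.
_FRONT_MATTER_RE = re.compile(
    r'<(h1|div|section)\b[^>]*\bclass="[^"]*\b(?:article-title|abstract|contributors)\b[^"]*"[^>]*>'
    r".*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


@dataclass
class Publication:
    """Hypothetical stand-in for an OJS Publication and its settings."""

    id: int
    settings: dict = field(default_factory=dict)


def on_html_galley_uploaded(publication: Publication, galley_html: str) -> bool:
    """Hypothetical hook handler: derive fullText from an HTML galley.

    Returns False, leaving fullText untouched, if the result is too long
    for the settings column.
    """
    match = _BODY_RE.search(galley_html)
    body = match.group(1) if match else galley_html
    # Drop title/abstract/contributors blocks duplicated on the landing page.
    full_text = _FRONT_MATTER_RE.sub("", body).strip()
    if len(full_text.encode("utf-8")) > TEXT_COLUMN_MAX_BYTES:
        return False
    publication.settings["fullText"] = full_text
    return True
```

Running the same handler over existing galleys would cover the back-content processing mentioned above.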
Feedback from our journal editors / designers:
- The eLife design looks pretty slick. I especially like the dynamic table of contents on the left-hand side, which is something I’ve seen before in some demos of the JATS plugin running on OJS.
- One of the huge upsides, it seems to me, of eventually doing away with the HTML galleys and replacing them with a slightly more intuitive, and partially automated, system like a WYSIWYG editor is that it could significantly reduce the amount of labor that goes into producing those galleys, freeing up time for other aspects of the project, such as copyediting and producing PDF galleys, which also take a lot of time. It would also (seemingly) lower the likelihood of mechanical errors being introduced while converting the Microsoft Word manuscript to the HTML galley, which right now we’re doing more or less “by hand.” So in a word, I regard the rationalization and partial automation of the publication process as a good thing in and of itself.
- With that said, I would be concerned about losing some functionalities that we consider crucial. I suspect footnotes would be easy to retain, and could maybe even be enhanced, but I worry, for instance, about how this might impact the excursus boxes and whether those would be replicable within a WYSIWYG interface. Basically, I would not want to switch over to this new way of publishing if it meant that core functionalities get lost or that re-introducing them creates huge burdens for the publication process.
- Regarding options # 1 or # 2, I don’t feel strongly either way, since I think this is more of a question of how the integration of the plugin will impact the broader community of users – no matter what, we’re on board and will enable the plugin or turn on the “embed HTML galley” setting. Integrating it into the pre-existing plugin would certainly give it a lot more exposure, even if it’s tucked into an optional setting.
- Regarding option # 3, my understanding is that this is working more aggressively towards the long term plan of entering article content directly into an OJS data field. So basically it uses the code that you wrote to embed the HTML galley, grabs that content and enters it into the fullText field, and then – once enough of the backend functionality has been built out – the switch could be flipped to make the move away from the HTML galleys. As I said above, my only concern here is about whether we would lose anything in that switch (which, if I’m not mistaken, would also entail somehow converting the HTML to XML, no?). If not, then I think it’s an exciting possibility.
- One other thing I think would be worth bringing up is your idea about creating plugins that can be plugged into the right-hand navigation area, or more broadly, being able to tinker with what gets displayed on what is currently the galley view, but what could eventually just be the article landing page, if that’s where the content of the article will eventually be displayed. Let’s say, for instance, that the HTML galley gets dropped in favor of directly embedding content on the landing page. In that case, I assume some of the content boxes beneath the abstract will need to be repositioned to accommodate that change. Giving journal managers and plugin designers the ability to swap different side-panel boxes in or out of the navigation bar would be a really nice way, in my opinion, to lend those landing pages additional functionality while retaining some of the old functions (citation format and links to issue/volume).
The "excursus box" functionality mentioned refers to certain aside content that carries a class which JavaScript handlers use to expand/fold its display. It will be an interesting design challenge to decide what scripted (or styling?) "content" is part of the article, vs. part of the UI.
Thanks @ctgraham, that's helpful feedback. And it largely aligns with the challenges we've heard/faced with XML so far: generating full-text from the XML is a dream, but everyone's galleys are so different that it is unrealistic to expect a WYSIWYG editor to accommodate all of these variations.
For example, we specifically moved the boxed-text JATS tag off of our list of tags that we want the editor to support, which may be a requirement to implement the excursus box you describe (got an example?). I suspect only a small segment of our community will be able to move to such an editor right away, and it will be years before it expands enough to support most use-cases.
That said, the purpose of the fullText field is to allow for this gradual adoption. If the XML->HTML conversion isn't sufficient for you, you can generate the fullText HTML another way, and that content would persist even if you eventually were able to convert directly from XML.
I do think #3 is the best approach for you to take _for the long term_. But I appreciate that you have deadlines and limited resources, and it may be more effort than you're able to put in right now. I'm open to discussing #1 or #2 if that's what you need now.
> It will be an interesting design challenge to decide what scripted (or styling?) "content" is part of the article, vs. part of the UI.
We'll be tackling this in two ways:
- A "clean" HTML presentation with semantic tags that will adopt existing styles as much as possible.
- A hook to allow plugins to intervene and tailor the markup to their own needs (:warning: this may be a bit like hacking template output at first!)
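The second point, a hook that lets plugins tailor the markup, could take the shape of a simple filter chain. The Python sketch is hypothetical: `InlineGalleyHooks`, `register`, and `apply` are illustrative names, not part of OJS or of the inlineHtmlGalley plugin, and the sketch does not mirror OJS's own hook API.

```python
from typing import Callable, List

# Hypothetical filter signature: takes markup, returns (possibly modified) markup.
MarkupFilter = Callable[[str], str]


class InlineGalleyHooks:
    """Hypothetical registry letting plugins rewrite inline galley markup."""

    def __init__(self) -> None:
        self._filters: List[MarkupFilter] = []

    def register(self, markup_filter: MarkupFilter) -> None:
        """Add a filter; filters run in registration order."""
        self._filters.append(markup_filter)

    def apply(self, markup: str) -> str:
        # Each filter receives the output of the previous one.
        for markup_filter in self._filters:
            markup = markup_filter(markup)
        return markup
```

For example, a journal-specific plugin could register a filter that adds a `collapsible` class to excursus asides, leaving the JavaScript that expands/folds them to the theme or plugin rather than to the article content.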
I'm sketching this out currently as a contributed plugin, which extends HtmlArticleGalleyPlugin:
https://github.com/ulsdevteam/inlineHtmlGalley
(so don't go weirdly changing HtmlArticleGalleyPlugin, or pulling it from the codebase!)
Great! In your template I'd recommend replacing:
`{$inlineHtmlGalley}`
With:
`<div class="inline_html_galley">{$inlineHtmlGalley}</div>`
It will make it easier to override theme styles that may be applied to common elements (like `<p>` and `<div>`) and apply formatting specifically to the body content.
I planned to do that initially, but then decided it should be the responsibility of the galley designer to add any wrapping elements to their galley. Is it better to add divs the designer may not want?
If the editor wants a containing div, it should be entered as part of the HTML galley. This way, the journal staff can choose, and can apply their own class names.
The benefit of a wrapping div applied by the plugin is that a theme can anticipate it and write styles specifically for it. I'm mostly thinking of cases where a theme's styles clobber basic elements like `<p>`, `<blockquote>`, `<ul>`, etc.
The wrapper won't stop the galley designer from adding their own wrapper if they'd like. Yes, it's another layer of div nesting. But this has few downsides.