If you're interested please comment here and come join our "Contributors" community channel on our daily build server, where you can discuss questions with community members and the Mattermost core team.
We're looking to add previews for Word, Excel and PowerPoint files. Given that the PDF previewer has already been implemented, the best approach is to convert Microsoft Office files (Word, Excel, PowerPoint) uploaded to the server into PDFs for previewing on the client.
The key thing is finding a library that would suit that purpose.
Hello,
Would you be interested in ODF support as well? WebODF could be used for that.
Are you guys open to using a tool like this and embedding it into Mattermost? I guess the drawback is that it requires these docs to be publicly accessible, which I'm guessing they're not by default.
https://blogs.office.com/en-us/2013/04/10/office-web-viewer-view-office-documents-in-a-browser/
https://products.office.com/en-us/office-online/view-office-documents-online
libreoffice has a headless mode and is capable of generating more or less accurate pdf exports and even bitmap previews:
$ libreoffice --headless --convert-to pdf:writer_pdf_Export test.docx
convert /tmp/convert-test/test.docx -> /tmp/convert-test/test.pdf using filter : writer_pdf_Export
$ libreoffice --headless --convert-to png test.docx
convert /tmp/convert-test/test.docx -> /tmp/convert-test/test.png using filter : writer_png_Export
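For a server written in Go, the same headless conversion could be driven as a sub-process. The following is only a minimal sketch: the helper names (pdfOutputPath, convertArgs, convertToPDF) are made up for illustration, the binary name is passed in because it varies between installations, and the plain "pdf" target is used instead of the writer-specific filter so that LibreOffice picks the export filter itself.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// pdfOutputPath returns where LibreOffice writes the converted file:
// the input's base name with a .pdf extension, inside outDir.
func pdfOutputPath(input, outDir string) string {
	base := filepath.Base(input)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return filepath.Join(outDir, name+".pdf")
}

// convertArgs builds the argument list for a headless PDF conversion.
func convertArgs(input, outDir string) []string {
	return []string{"--headless", "--convert-to", "pdf", "--outdir", outDir, input}
}

// convertToPDF runs LibreOffice as a sub-process and returns the PDF path.
// The binary name ("libreoffice" or "soffice") is an assumption about the host.
func convertToPDF(binary, input, outDir string) (string, error) {
	out, err := exec.Command(binary, convertArgs(input, outDir)...).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("conversion failed: %w: %s", err, out)
	}
	return pdfOutputPath(input, outDir), nil
}
```

Calling convertToPDF("libreoffice", "/tmp/convert-test/test.docx", "/tmp/out") would be expected to leave the result at /tmp/out/test.pdf, provided LibreOffice is installed on the host.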
Setup for doing it this way would not be trivial; you would probably need to build a separate system just to handle document conversion.
How about using an OnlyOffice/LibreOffice document server for this?
@silver-gw Would it be a separate system/server that handles document conversion, or would the server be able to preview the Office files itself?
@jasonblais To keep things separate and clean, we deployed doc-server on a separate server, where it runs in Docker. In our company, we use NextCloud for document sharing and it has a plugin that talks to our OnlyOffice doc-server, but the only parameter passed for this connection is the secure URL of the document server. Apart from that, I can't recall setting anything else to connect the 2 servers. Office documents opened in NextCloud have the NextCloud header and URL, so I am guessing it should be practical to open/preview docs in Mattermost in a pop-up window, without leaving Mattermost.
@jasonblais Sorry, not sure if I answered your original question. It actually opens xlsx, docx, pptx, ppsx, csv directly and without any conversion. It does however convert office files with old extension (xls to xlsx, doc to docx) prior to opening them.
@silver-gw That's very interesting. Wondering if this would be a good solution via our plugins framework. Basically an OnlyOffice plugin for Mattermost.
Thoughts?
@jasonblais Would be hassle free and awesome! And their server API concept looks pretty straightforward. https://api.onlyoffice.com/editors/basic
@silver-gw Great! Sorry for the delay in following up, I've posted to our Developer Toolkit channel to hear if there are any technical concerns with this approach. If not, we'll likely open a help wanted ticket for this plugin :)
Would there be any interest from community helping with that plugin?
This feature would be great. What is the current status?
This feature can be implemented with the help of the plugin framework. If the plugin framework does not yet allow for custom file/mime type-specific previews, we should enable that functionality. It can also be done on the server, in Go.
@levb , we are actually in the midst of building this functionality as a plugin using gotenberg. Generating the preview PDF is pretty straightforward within the MM Plugin Framework using server side APIs/hooks. However, we have not found a feasible means to allow the user to view the preview via plugins.
Ideally, it would behave just like the PDF preview, where the user can click the file attachment and get a modal window to view the document. Unfortunately, today the plugin framework does not provide the ability to hook the preview event or override the preview component.
You suggest above that you would be willing to enable "custom file/mime type-specific previews". Would it be possible to enhance the plugin framework to allow plugins to customize previews for docs like pptx, docx, xlsx, etc.
Another option is to implement built-in preview handling for any file that has a PDF preview image set in its fileinfo. That way a server side plugin could be used to generate the PDF preview and update the fileinfo in whatever way makes sense. Then the webapp would just look for a PDF preview in the fileinfo and display it if it exists.
Thoughts?
@crspeller @jespino opinions?
@levb @mkdbns Both a client side hook for custom previews and supporting alternate filetypes for previews sound good. The client side hook might be the more general solution but could be harder on plugin authors, while the alternate filetypes keeps everything server side. I would lean towards the alternate filetypes solution but I am 1/5.
Thanks @crspeller , I would also favor a pure server-side plugin requirement as well so that mobile apps could also take advantage of the previews. @levb @crspeller , what would the next steps on this be?
@mkdbns If you want to take it, I can remove the up for grabs label and you can get started. Sound good?
@crspeller , yes, we can make the updates for the front-end handling of the preview. However, the backend plugin that we'll develop is being sponsored by one of your enterprise customers. While they are not opposed to opensourcing it at some point, it will take time for them to work through the legal approvals, etc. I'm only clarifying as what we deliver in the near term may not fully satisfy this issue.
@mkdbns As long as the changes are independently useful that's fine.
@crspeller - Sounds good. We'll get started on this. Thanks!
@crspeller @mkdbns Continuing our conversation from the call, we will be going forward with creating a webapp component to override file previews. This will work as follows:
A new plugin registry function lets a plugin override the file preview based on file_info and post. It takes 2 arguments, _handler_ and _component_: registerFilePreviewComponent(handler, component)
_handler_ is a function that takes file_info and post as arguments and returns _true_ if the file preview should be overridden for this file and _false_ if we need to fall back to the default file preview options available in the webapp for this file.
If the _handler_ function returns _true_, the component is rendered with file_info and post passed as props. (cc: @levb)
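The handler/component contract can be modeled roughly as follows. This is a Go sketch of the dispatch logic only, not the webapp's JavaScript API: the FileInfo and Post structs, the previewOverride type, isOfficeFile, pickPreview, the component names, and the first-match-wins rule are all assumptions made for illustration.

```go
package main

import "strings"

// FileInfo and Post are simplified stand-ins for the webapp objects.
type FileInfo struct {
	Name      string
	Extension string
}

type Post struct {
	ID string
}

// previewOverride pairs a handler with the component it selects.
type previewOverride struct {
	handler   func(FileInfo, Post) bool
	component string // a name only; a real plugin would pass a React component
}

// isOfficeFile is an example handler: it claims files with Office extensions.
func isOfficeFile(fi FileInfo, _ Post) bool {
	switch strings.ToLower(fi.Extension) {
	case "doc", "docx", "xls", "xlsx", "ppt", "pptx":
		return true
	}
	return false
}

// pickPreview returns the component of the first override whose handler
// returns true, or the default preview when no handler accepts the file.
func pickPreview(overrides []previewOverride, fi FileInfo, post Post) string {
	for _, o := range overrides {
		if o.handler(fi, post) {
			return o.component
		}
	}
	return "DefaultFilePreview"
}
```

With a single override registered for isOfficeFile, a .docx attachment would be routed to the custom component while a .png falls back to the default preview.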
I think that this custom preview capability is useful, but I thought we decided to focus on a server-side convert-to-PDF? Am I confused?
The client side hook might be the more general solution but could be harder on plugin authors, while the alternate filetypes keeps everything server side. I would lean towards the alternate filetypes solution but I am 1/5.
@chengkun Sounds like a good direction.
@levb We had discussed this approach with @crspeller over a call and decided on it to make it to v5.10 release.
P.S. @crspeller It seems you have tagged someone else.
Why not extract the preview image delivered with the docx format?
~~I'm not sure of licensing implications, but wouldn't ImageMagick handle most of the cases (docx, xlsx, pdf, odt, others) ? Also, there seems to be available a build of IM to WASM, which could perhaps do the work client-side (lazily loaded). I briefly tested on the command line and IM was able to generate previews of first-pages for docx and xlsx. It seems to use a filter to convert to PDF first.~~
~~https://github.com/KnicKnic/WASM-ImageMagick~~
Depends on soffice, see below.
New considerations born out of talking with @crspeller
ImageMagick could be used as a sub-process through the convert command. I believe linking would only be desired if done statically, and that would complicate the build process quite a bit. I imagine hiding this behind a feature flag to not be desirable.
ImageMagick uses soffice (LibreOffice).
As conversion can be somewhat costly, the work could be queued. I don't know if there are mechanisms for queuing and dispatching of such tasks at the moment. If the command fails, the result could be graceful degradation to a standard thumbnail, if there isn't already one.
Regarding preview before submitting. Doing so is nice in my opinion, from a UX perspective, as it might warn the user of a wrongly chosen file. If this was done through an API endpoint, there would be a few implications.
2.i still exists on server-side conversion, but 2.ii could be mitigated by doing the conversion client-side, if ImageMagick as WASM works out and this whole idea is accepted legally.
I agree, though, that it should probably be left for later and handled in another ticket.
EDIT: current best idea of mine is using soffice directly (sub process) + GraphicsMagick, for server-side thumbnail generation. GraphicsMagick could possibly be substituted by some Go library that renders PDF, as that is the output of soffice.
One idea is creating a project that delivers a Docker image with LibreOffice, GraphicsMagick or something of the kind, and a service for making the conversion. LibreOffice can be used remotely through a socket, but I don't know how practical that is. Headless conversion for PDF seems practical enough.
The whole solution is not elegant, but has the benefit of handling multiple formats through a commonly used software.
@chetanyakan I'm wondering if you'd be open to sharing the approach you took to implement a preview plugin? Would be interesting to see how that compares against the above proposal from @krjn
It seems to me, after I read the conversation again, that the idea I thought I had is the same as previously proposed, and using https://github.com/thecodingmachine/gotenberg is probably a better idea.
Why not extract the preview image delivered with the docx format?
@kaystrobach I could not find any information after briefly searching, either in the docx spec, an actual docx or of general practice of embedding previews in files.
@jasonblais sure.
The solution we have implemented relies on Gotenberg to convert office documents to PDF.
The gotenberg server is hosted separately from mattermost.
Gotenberg provides a Docker image that can be scaled based on your requirements. The documentation is available here.
The code for PDF generation and rendering the document is in a plugin. Thus, it will only work on webapp and a browser on mobile but not on the RN mobile apps.
In the plugin system console settings page, we require the URL of the Gotenberg server from the system admin.
On the plugin server side, we use the MessageHasBeenPosted hook to check if the post has any files attached. If so, we generate the PDFs for these files by making an API request to the Gotenberg server.
The entire preview generation works concurrently using goroutines. Also, since Gotenberg uses unoconv, it can only convert one office document to PDF at a time and queueing is handled internally by Gotenberg.
Then, we use the post props to set the current state of the preview:
On the Webapp side, we use the registerFilePreviewComponent to override file previews for all files with a supported extension.
For rendering the PDF on the client side, the PDFJS library is used. The reference implementation is available in the mattermost-webapp: components/pdf_preview.jsx.
Based on the availability of the preview, we show the loading spinner or error messages.
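The API request to the conversion server can be sketched in Go along these lines. This is a minimal sketch, not the plugin's actual code: newConvertRequest is a hypothetical helper, the upload is assumed to be a multipart form with the document in a field named files, and the endpoint URL is a parameter because the conversion route differs between Gotenberg versions.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newConvertRequest builds a multipart POST carrying one office document.
// endpoint is the full URL of the conversion route on the Gotenberg server.
func newConvertRequest(endpoint, filename string, content []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// Assumption: the server reads uploaded documents from the "files" field.
	part, err := w.CreateFormFile("files", filename)
	if err != nil {
		return nil, fmt.Errorf("create form file: %w", err)
	}
	if _, err := part.Write(content); err != nil {
		return nil, fmt.Errorf("write file content: %w", err)
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("close multipart writer: %w", err)
	}

	req, err := http.NewRequest(http.MethodPost, endpoint, &body)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}
```

The returned request can be sent with an http.Client that has a timeout set; the response body would then be the PDF to cache for the preview.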
@chetanyakan
Regarding this option to use the Gotenberg Docker Image to convert files to a PDF have you already created the plugin that you could share in order to configure this in our environment?
We would be keen to try it out and review the performance etc. or if you have any documentation regarding how to set it up again we would quite like to take a look.
I'll take this one.
Some initial thoughts,
1-
I think the mobile client can launch native apps to view the file. This is much more mature in the mobile environment and PDFs are already treated that way.
2-
For the web client, previewing the file without leaving Mattermost is a valid need. It seems that PDF is the luckiest file format on the web: there are already multiple packages that render PDFs on a webpage, and the Mattermost web app already uses one of them. Unlike PDF, Office files aren't that lucky; I'm struggling to find an open source viewer.
As an outcome of 1- & 2-, solving this issue at the backend by converting Office files to PDFs seems to be the best option to me.
As said, this time we struggle to find a package to convert Office documents to PDFs, there aren't many of them. In fact, I struggle to find an "importable" package in any language. There are server based solutions that we can benefit from but their API won't satisfy Mattermost's Plugin Client API, so we cannot directly use them as plugins.
We can create a plugin to interact with a server but the question is, how to ship and run that server within Mattermost's production build? It can be documented as a dependency to run Mattermost but, I'd like to avoid adding another dependency to the installation.
Any ideas on this? /cc @grundleborg @lieut-data @jespino @jasonblais
Thanks @ilgooz, great insights! Regarding the conversion, wondering if the previous post about using the Gotenberg library may be an option? https://github.com/mattermost/mattermost-server/issues/4300#issuecomment-504996461
Adding @aaronrothschild as well who has been exploring integrations with Microsoft, as well as new options for shipping plugins without it being a pre-packaged binary/dependency
@jasonblais Absolutely, Gotenberg seems very promising.
My question is the same, how we would ship this server (in this case Gotenberg in a Docker container) with Mattermost's production build?
As far as I'm aware, our only option is shipping plugins as executables, and it's not possible to a- ship servers that do not satisfy the Plugin Client API or b- ship servers that run inside containers.
@aaronrothschild Hey, any ideas?
After we figure this out, as already recommended, we can just create another plugin to interact with the server and use hooks to detect Office files that need to be converted to PDF for previewing and update the FileInfo model by adding a PDF preview path for the needed files.
@ilgooz
how we would ship this server (in this case Gotenberg in a Docker container) with Mattermost's production build?
Instead of shipping the Gotenberg server with Mattermost, we could have instructions on how to set the server up separately. We did something similar for Elasticsearch https://docs.mattermost.com/deployment/elasticsearch.html#setting-up-an-elasticsearch-server
The configurable Gotenberg endpoint seems like a pretty neat solution. It's obviously a bit more involved than a typical plugin deployment, but seems like a good combination of tools.
My specific interest is in the area of how to use/extend the API to facilitate interactions:
After we figure this out, as already recommended, we can just create another plugin to interact with the server and use hooks to detect Office files that need to be converted to PDF for previewing and update the FileInfo model by adding a PDF preview path for the needed files.
I'm hesitant to go down the hooks approach. This will preclude installing the plugin and previewing files created before the plugin was installed and requires ongoing previews to be generated and stored.
Would it make sense to leverage the existing registerFilePreviewComponent and write a custom component that routes this file through the server-side plugin to the Gotenberg endpoint and caches for reuse by other clients?
We should get @aaronrothschild involved here. In our discussions, the preview was relevant in 2 use-cases - one is an attachment/link in a post, but the other use case was "Search/browse". If we develop a new component for the previews, it should be designed to support both use cases.
👍 to a configurable URL to an external file->PDF service. We do it for the Antivirus plugin as well.
@lieut-data
Would it make sense to leverage the existing registerFilePreviewComponent and write a custom component that routes this file through the server-side plugin to the Gotenberg endpoint and caches for reuse by other clients?
Yes, this approach feels neat as well. I'm in favor of splitting logic into smaller parts, and plugins seem to be a way to do that. Also, converting Office to PDF is a workaround in the end, so having it completely isolated in a plugin seems like a cleaner approach.
I just need to check how to store persistent data (PDFs) within plugins. I'm also trying to figure out how you manage "core" plugins, the ones that are developed internally and, I assume, shipped with the production build, because we may want to install and activate this plugin by default.
@levb I'll check how this is done by the Antivirus plugin, thanks.
@ilgooz, agree that it would be nice to have it all bundled together into a single binary. No idea if that's tractable.
I wouldn't worry about the distinction between "core" plugins and other plugins. We're actively trying to minimize the differences right now, and this plugin wouldn't likely be enabled by default in the near term.
Storing persistent files might not be "neat" right now. Ideally, we'd just write them back to the file store, but I think our current model requires associating them with a post which might be awkward. Some investigation required here.
Hey @ilgooz thanks for taking this on, I think this will benefit anyone who uses Office and Mattermost... which is a lot of people :)
1-
I think the mobile client can launch native apps to view the file. This is much more mature in the mobile environment and PDFs are already treated that way.
Yes, I think this is an appropriate path to take. There are non-open options available via MS such as: https://docs.microsoft.com/en-us/officeonlineserver/office-online-server-overview but that is a fairly heavy and very closed option. It seems many companies do not use this server.
2-
For the web client, previewing the file without leaving Mattermost is a valid need. It seems that PDF is the luckiest file format on the web: there are already multiple packages that render PDFs on a webpage, and the Mattermost web app already uses one of them. Unlike PDF, Office files aren't that lucky; I'm struggling to find an open source viewer.
Yes, they can be difficult to work with, but there are JS libraries out there to help: https://stackoverflow.com/questions/27957766/how-do-i-render-a-word-document-doc-docx-in-the-browser-using-javascript
And that is just for Doc files... so there is considerable complexity to interpreting Office files.
As an outcome of 1- & 2-, solving this issue at the backend by converting Office files to PDFs seems to be the best option to me.
As said, this time we struggle to find a package to convert Office documents to PDFs, there aren't many of them. In fact, I struggle to find an "importable" package in any language. There are server based solutions that we can benefit from but their API won't satisfy Mattermost's Plugin Client API, so we cannot directly use them as plugins.
Agreed. I found the same.
We can create a plugin to interact with a server but the question is, how to ship and run that server within Mattermost's production build? It can be documented as a dependency to run Mattermost but, I'd like to avoid adding another dependency to the installation.
Agree with @jasonblais - We should ship instructions on how to get the PDF service running on a server (perhaps a docker container). This is how we support the Antivirus plugin which works with ClamAV server.
Let me know if you want to chat more about this, happy to setup a Zoom meeting.
@aaronrothschild thank you.
I drafted some code, everything works nicely so far, webapp is able to preview Office docs and I'm using PDFs under the hood as discussed. I went with Gotenberg to do the conversions which is running in a container fine.
All works on my local machine, but one glitch I'm facing right now is on the mattermost-webapp side. I want to use the existing PDFPreviewer introduced in mattermost-webapp instead of rolling out a custom component, but the webapp does not export this component publicly, so I cannot use it in my plugin.
I'm looking at the webapp's plugin API and there is this initialize(registry, store) func for plugins to register hooks and access the global store, but it doesn't expose components defined under the mattermost-webapp/components dir in any way.
I had a quick discussion about this with @lieut-data and it seems that, at this point, the webapp does not intend to expose any components to the public, though it could perhaps be done with an API like this: initialize(registry, components, store).
I made a quick hack and exposed PDFPreview publicly as window.PDFPreview and used it in my plugin by passing the original fileInfo of the Office file and the PDF preview URL in fileUrl as props to the component, and it works seamlessly.
As the other option, since we don't have PDFPreview available to plugins in production, I tried to copy all the code related to it into my plugin, but I got a few errors about the workers that pdfjs starts and I am still trying to solve them.
The conclusion is that the server side of the plugin works; I just need to polish it a bit and apply authorization rules. On the webapp side, either a- we make the PDFPreview component available to the plugins API or b- I hard copy the component and make it work inside the plugin without errors. I believe we don't need to craft a custom component to do all of this; PDFPreview seems like enough to accomplish what we need.
Waiting for your feedback about which way we should go, a or b.
_Here are some screenshots of the end results from my local machine while using window.PDFPreview:_
_yo.docx:_
_the preview by using PDF:_
_downloading the original doc by clicking the _Download_ link in dialog:_
@ilgooz, that's very impressive!
@grundleborg, I'm wondering if you or your team could comment on some ideas for how to surface a "common component" such as PDFPreview above. I realize this isn't exactly the scope of your current efforts, but I'm wondering if you have any thoughts to help guide the implementation here. I agree with @ilgooz that it doesn't seem to make a ton of sense to replicate the entire component plugin-side, especially when we want consistency on the footer and other widgets when we tweak the UX.
@ilgooz Boom! :)
I would like to point out that I know of a customer who implemented a similar solution (but cannot release it publicly) that took over a month or two of work for them to do what you've seemingly done. Great work, can't wait to try it out.
Of course, as a PM - I'm always going to ask for "one more thing" ;) but I suspect this is best filed as a separate PR: Make the PDF previewer larger so it is reasonable to read small text directly within mattermost. @asaadmahmood Thoughts? I was thinking that maybe we use the "full screen modal" component to present the PDF instead of the constrained PDFViewer...
@aaronrothschild I think we have the fullscreen modal planned for file previews, so instead of doing this only for specific files, we should wait for the official improvement to avoid inconsistencies between different file previews.
@aaronrothschild Thanks a lot!
--
Adding the link to the initial version of PDF plugin: https://github.com/ilgooz/mattermost-plugin-topdf
I'd like to move the plugin under @mattermost if you guys would like to accept!
@lieut-data
Storing persistent files might not be "neat" right now. Ideally, we'd just write them back to the file store, but I think our current model requires associating them with a post which might be awkward. Some investigation required here.
I made the association with KV store. What do you think about it?
One small thing to think about: there is no locking mechanism between uploading (caching) the PDF file and saving its id to the KV store to associate it with the original file.
This does not cause a bug, but if several convert requests happen at the same time for the same file, multiple uploads (caches) can be made instead of just one.
If we want to avoid this, we can implement a distributed lock with the KV store as a Helper and use it, but I think this should be a separate issue.
The KV store might also be a bit slow for this since it's not in-memory storage.
An external distributed lock service could be used too, but while brainstorming I found that having a distributed lock available to plugins would be useful.
I just hard copied it to the plugin and made it working with some small changes. But we can delete the whole thing in the future when PDFPreview is made available to the webapp's Plugin API.
I didn't implement support for client-side request cancellation for this one; it seemed to me a good idea to continue anyway, convert the file to PDF and cache it, since it might be requested again in the future.
--
make sure that you're not hitting this issue: https://github.com/mattermost/mattermost-server/issues/12939
I made the association with KV store. What do you think about it?
The KV store has a maximum Value length of 8192 bytes, so it wouldn't be general purpose enough to store files. Our file store is generally agnostic to the files contained therein, but it's just our API that currently only associates files with posts. A bit hacky, but perhaps the bot could post a message to itself with the attached "file" and use that as a backing store?
some thoughts about distributed lock:
Can we just make the id deterministic, such as taking the hash of the input file? We still might have multiple uploads, but so long as the concurrency doesn't cause a problem, we wouldn't end up with duplicates.
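A deterministic id along these lines could be derived from a content hash. The sketch below assumes SHA-256 and a hex encoding, neither of which was decided in this thread, and contentID is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// contentID derives a stable identifier from the file's bytes, so two
// concurrent conversions of the same input resolve to the same id.
func contentID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}
```

Because the id depends only on the input bytes, a duplicate upload would overwrite or match an existing entry instead of creating a second one, assuming the storage layer lets the caller choose the id.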
about PDFPreview: I just hard copied it to the plugin and made it working with some small changes...
This sounds good to me!
about handleConvert:
I'll reference this when doing the code review.
@ilgooz, to facilitate code review, would you be able to follow the steps on https://developers.mattermost.com/extend/plugins/best-practices/#how-can-i-review-the-entire-code-base-of-a-plugin and effectively create a pull-request to master for discussion? We can promote to the mattermost organization afterwards.
@lieut-data
The KV store has a maximum Value length of 8192 bytes, so it wouldn't be general purpose enough to store files. Our file store is generally agnostic to the files contained therein, but it's just our API that currently only associates files with posts. A bit hacky, but perhaps the bot could post a message to itself with the attached "file" and use that as a backing store?
I think I misunderstood your point about _association_ in the first place. Reading your previous post again: yes, files might need to reference posts, but the _UploadFile()_ API only requires references to channels.
What I meant is that I store the relationship between a fileID and its PDF version's id in the KV store, like this: pdf:fileID(original) = fileID(pdf).
And I used the actual _UploadFile()_ API (not KV store) to store files by uploading the PDF files to the same channels with original files.
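The lookup-then-convert flow behind that key layout might look like the following sketch. The kvStore type is an in-memory stand-in for the plugin KV store, and pdfKey, pdfFileID and the convert callback are hypothetical names, not the plugin's real code. As noted earlier in the thread, nothing locks the gap between the lookup and the write, so concurrent requests for the same file can still convert twice.

```go
package main

import "sync"

// kvStore is an in-memory stand-in for the plugin KV store.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newKVStore() *kvStore {
	return &kvStore{data: make(map[string]string)}
}

func (s *kvStore) get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

func (s *kvStore) set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// pdfKey mirrors the "pdf:<original file id>" key layout.
func pdfKey(originalFileID string) string {
	return "pdf:" + originalFileID
}

// pdfFileID returns the id of the cached PDF for a file, calling convert
// (which would convert and upload the PDF) only when no mapping exists yet.
func pdfFileID(store *kvStore, originalFileID string, convert func(string) (string, error)) (string, error) {
	if id, ok := store.get(pdfKey(originalFileID)); ok {
		return id, nil
	}
	id, err := convert(originalFileID)
	if err != nil {
		return "", err
	}
	store.set(pdfKey(originalFileID), id)
	return id, nil
}
```

A second call for the same original file id returns the stored mapping without invoking convert again.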
Can we just make the id deterministic, such as taking the hash of the input file? We still might have multiple uploads, but so long as the concurrency doesn't cause a problem, we wouldn't end up with duplicates.
I'll look at this tomorrow to see if we can pre-set id while uploading files perhaps or somewhere else. The _id_ I get is randomly generated by the _UploadFile()_, which is not deterministic. I'm not sure if we need deterministic ids tho.
to facilitate code review, would you be able to follow the steps on https://developers.mattermost.com/extend/plugins/best-practices/#how-can-i-review-the-entire-code-base-of-a-plugin and effectively create a pull-request to master for discussion? We can promote to the mattermost organization afterwards.
Something like this? https://github.com/ilgooz/mattermost-plugin-topdf/pull/1 But I'm afraid that you won't be able to put review comments into this PR since you and other guys have no access to my repo.
Moving the discussion related to distributed locks to a chat thread: https://community.mattermost.com/core/pl/bb376sjsdbym8kj7nz7zrcos7r
Thanks @ilgooz! I'll take a look at ilgooz/mattermost-plugin-topdf#1 -- I do seem to be able to provide comments there, and will aim to get you feedback soon.
Hey, this plugin has been implemented and is ready to be used. It lives at: https://github.com/ilgooz/mattermost-plugin-topdf
I have just merged the https://github.com/ilgooz/mattermost-plugin-topdf/pull/1 PR to master, which includes updates for most of the change requests made in @lieut-data's first review.
Since I don't have further time to work on this plugin, please feel free to fork the plugin's repo and add remaining changes on top.
Also, feel free to clone and push the repo under @mattermost if you'd like to further maintain this plugin.
Thanks!
Thanks for all your contributions, @ilgooz!
@levb, I wonder if it makes sense to add this to the integrations team backlog as something to evaluate? Your call on next steps.
After updating Mattermost to the latest version, the document preview function implemented by https://github.com/ilgooz/mattermost-plugin-topdf is no longer working.
Can anyone please suggest what needs to be done in this case? Any tweak would be appreciated.