I am going a little crazy trying to figure out how to convert a relatively simple notebook to html or pdf with the images included. I'm catching flak from my prof for using Jupyter instead of Powerpoint because my images didn't show up on another computer. :flushed:
nbconvert --to html my_notebook.ipynb works great on my machine, but the file doesn't transfer to others nicely. I can't find any straightforward guide as to why this is.
I have markdown cells with the images in question and they are referenced as follows:
<img src="./Images/My_image.png" width="800" height="800" alt="Alt_name" title="Mytitle" align="center" />
and they don't show up when converting to html or pdf once the file leaves the original dir. When using markdown syntax, images show up, but I have much less control over the format of the images, which is why I am using HTML. Any ideas? My bet is that this is a solved issue, just there is not easily find-able documentation of it.
Edit:
Using
nbconvert==4.2
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
If you reference external files, you'll need to copy those files around with the notebook or HTML export. Images should get embedded into PDF, but I'm not sure if the conversion via pandoc handles HTML image tags like the ones you're using.
It would be possible to read in the images when converting to HTML and embed them as data urls, but we don't currently have any code to do this.
Hmm. I guess this was an unexpected answer, but thank you for the response!
Jupyter is being advertised to the data/scientific community, and students like myself will want to use it to share presentations, notebooks, data, etc., both statically and dynamically (achievable via servers, etc.). I include a lot of .png files in my notebooks for communicating previous work, in addition to the figures that I whip up via something like matplotlib.
I guess I don't know what would be a good way forward on this. To me it seems like a self-contained file would be ideal for sending via email. To others, I would imagine sending a zip with figures is just as easy, and still others, serving the notebook would be the solution.
All seem valid. I guess I'd advocate for some sort of self-contained, offline solution. This would be solved via .pdf, if pandoc could handle the HTML tags, or if Jupyter's markdown syntax were a bit more versatile with image manipulation.
When you say
It would be possible to read in the images when converting to HTML and embed them as data urls, but we don't currently have any code to do this.
How easy would that be for an amateur to contribute to? Is this a pre/post processor job? I've looked into the templates functionality offered by nbconvert, but it feels a little overwhelming, as I know as much HTML as I've shown :disappointed_relieved:
I agree that it would be good to have an option to export to HTML with embedded images.
The code itself should be simple enough to write, with a bit of learning about the HTML parser interface and data urls. See the citations filter for an example that parses HTML in Markdown cells to replace bits.
The trickier thing is working out exactly where to plug it into the nbconvert API. It could be done as any of a preprocessor, custom exporter, or postprocessor. I'll keep thinking about that...
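For anyone picking this up, here is a minimal sketch of the idea described above: parse the Markdown source with the standard-library HTML parser, find each img tag, and swap local file paths for base64 data URLs. The names ImgSrcCollector, to_data_uri and embed_images are made up for illustration and are not part of nbconvert. The sketch assumes double-quoted src attributes and does not decide where this would plug into the nbconvert API.

```python
import base64
import mimetypes
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<img ... />) also end up here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def to_data_uri(path):
    """Read a file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return "data:{};base64,{}".format(mime, payload)


def embed_images(source, base_dir="."):
    """Replace local image paths in <img> tags with data URLs.

    Assumption: src values are written with double quotes. Remote URLs,
    existing data URLs and missing files are left untouched.
    """
    collector = ImgSrcCollector()
    collector.feed(source)
    for src in set(collector.sources):
        if src.startswith(("http://", "https://", "data:")):
            continue
        path = os.path.join(base_dir, src)
        if os.path.isfile(path):
            source = source.replace(
                '"{}"'.format(src), '"{}"'.format(to_data_uri(path))
            )
    return source
```

Calling embed_images(cell_source, base_dir=notebook_dir) on each Markdown cell's source would be one way to use it, whether from a preprocessor or from a postprocessor working on the exported HTML.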
So I took citations.py and modified it so that it will take some source string and replace a file name in an image tag with the base64 encoding. I figure the gist here is a good first step. I'd like to extend the img tag parser to do something similar to what @randy3k did here to control image size, so that everyone can have size control when converting to .tex and subsequently .pdf, if this eventually gets pulled into nbconvert.
The gist outputs the correct base64 string (checked it by copy and paste into this online converter), but when I was checking it in a Jupyter Markdown cell with the HTML tag, it can't show the image, nor any other base64 image. Can anyone confirm that Jupyter can display data URIs encoded in base64? I can't get it to.
EDIT: Corrected links
I think we strip some possible HTML when rendering markdown cells. Maybe data urls are something that get stripped. @minrk, is that plausible? If so, it shouldn't be a problem when nbconverting to HTML.
I think the gist links you gave are the URL for embedding the gist - here's the link for the nice view of it: https://gist.github.com/soamaven/4de1727f76790b574342bd6231402843
data URIs are allowed in markdown cells, but I think they were not for some early versions after we added sanitization. What version of the notebook @soamaven (Help > About in the notebook)?
@takluyver Thanks for letting me know. I updated the links.
Notebook version ==4.2.1
The data URIs don't show up as images either in notebook view or after converting to static html. I created a repo in case anyone wanted to see what was going on. My money is that I am doing something wrong... :neutral_face: but the base64 encoding seems correct when manually copying it to a decoder as mentioned before.
I won the bet that I was doing something wrong.
Apparently using base64.b64encode() gives the correct output while base64.urlsafe_b64encode() does not. I would not have guessed this based on what the Python docs describe, but perhaps I would have known if I knew more about HTML. The other odd thing is that the gist has this correct.
So the Markdown cells won't display the base64 image in the notebook when evaluating the MD cell, but the images are properly displayed when using jupyter nbconvert --to html mynotebook.ipynb and also within the notebook when using the IPython function display(Markdown(...)) from IPython.display. This behavior should be reflected in the repo.
EDIT Accidentally closed the issue, reopened
Aha! Yes, I think data URLs do not need the url-safe version of base64, confusingly.
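A small sketch of the difference, for the record. The two functions only disagree on the last two characters of the alphabet ("+" and "/" versus "-" and "_"), so the mismatch only shows up when the image bytes happen to produce those characters. The three bytes below are picked to force that; png_data_uri is a hypothetical helper name.

```python
import base64

# Bytes chosen so that every 6-bit group maps to index 62 or 63,
# the only two positions where the alphabets differ.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +//+
print(url_safe)  # -__-


def png_data_uri(data):
    """Build a data URL for PNG bytes using the standard alphabet."""
    return "data:image/png;base64," + base64.b64encode(data).decode("ascii")
```

The data URL payload should use the standard alphabet, which is why b64encode worked here and urlsafe_b64encode did not.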
Thanks for taking the time to put it into a repo.
@Carreau, you're good at helping people make extensions. Do you think this would make sense as a preprocessor? Should we develop some hooks for plugging in external preprocessors like we can have external exporters?
Probably. @michaelpacer is starting halftime today, that might be one of the first tasks we can try to investigate.
Welcome @michaelpacer :-)
@Carreau @takluyver Is it more conventional to make notes about the source repo as comments here (in this issue on nbconvert) or as issues on @soamaven's repo? Or is it contextually dependent?
e.g., I just ran into the point that in the .ipynb file it was expecting a py35 rather than Python 2 or Python 3 as I have it set up on my system. This is just a naming issue I know, and my instinct was to ask that question as an issue on the repo, not in comments here, but I wanted to check in about what the common policy is (given that I sense that the nbconvert_data_uri repo is intended to be a mechanism for developing a test case for whether this feature works within the nbconvert repo rather than a repo intended to be distributed and developed independently).
So, I just tried running this locally and discovered that I cannot reproduce the original bug, see:

@soamaven Was this something that was pushed to the notebook since this issue was raised? I don't trust my ability to make the extension and know that I've covered your use case if I cannot reproduce the original error you were trying to address.
@michaelpacer this is odd. Is this behavior from the .html produced by executing jupyter nbconvert --to html img2base64.ipynb, or from the .ipynb? If .ipynb, the image shouldn't show up, as mentioned before, due to sanitization. This behavior is consistent on my machine, shown below. (Here I changed the image alignment to the left so that it would not be hidden by the horizontal scroll if it appears; however, it doesn't appear, as expected. Also, I have the auto-sectioning extension enabled, which explains the numbering discrepancy.)

However, after jupyter nbconvert --to html img2base64.ipynb the data uri shows properly as below (aligned-left)

I think we could still put together an extension for embedding images into converted notebooks. I am not sure why I cannot view the fourth markdown cell's image while you can.
I am using:
Chrome Version 51.0.2704.106 (64-bit) on Fedora 23 to view the notebook.
jupyter-client==4.3.0
jupyter-core==4.1.0
notebook==4.2.1
EDIT: I misread the comment referenced. Apparently data URIs should be showing in my notebook upon executing the MD cell, but they are not, even though I am using a newer version of notebook. I have tried both Firefox v47.0 and Chrome with the same results on my machine.
That was the behaviour of the notebook itself (which I thought was your concern).
So I think I was working on
jupyter-core==4.1.0
notebook==5.0.0dev
And I think that jupyter-client is what you get when you run jupyter kernelspec --version… I can't figure out where to get that otherwise, but if that is the case, mine is also 4.3.0.
Does it work on your system if you upgrade to the dev version of the notebook? (You have to build it from github; instructions for a dev build can be found here: https://github.com/jupyter/notebook/blob/master/CONTRIBUTING.rst)
And sorry about the unclear comment, it was poorly worded.
I meant, "I'm running on the most recent version of the notebook (i.e., the one that hasn't been released). Has something been pushed to the 5.0.0dev version that happens to fix this problem?"
I installed notebook from source, but the version is 4.2.1, not 5.0.0dev... what repo are you pulling from?
Edit: If you are referring to ipython, that is now version 5.0.0 for me.
Also, while I appreciate your help @michaelpacer in resolving this issue, it is separate from the one that I opened this #328 for, which was help creating an extension for embedding images in .html-converted notebooks. Should we open a separate issue for the markdown cells not showing data URLs over at notebook?
You also can take a look at the nbconvert postprocessor for embedding images in HTML over at ipython-contrib:
https://github.com/ipython-contrib/jupyter_contrib_nbextensions/blob/master/src/jupyter_contrib_nbextensions/nbconvert_support/post_embedhtml.py
It only recognizes <img> tags in markdown for now.
@michaelpacer I have some more information. Apparently I was silently prepending a call to my python2.7.12 environment in my path, and so it looked there every time I used 'jupyter notebook'. I have now fixed this.
I can see the dataurl images in my python3.5 environment now :grinning:
I cannot see them still when running my python2.7.12 environment, however. Is this expected behavior?
EDIT: Clarity
@juhasch Thanks for the link! It looks like what I was trying to write, but better put together. This was a bit too hard to find though, I would vote it gets included into nbconvert as a cmd line template? Or referenced in either the documents of nbconvert or the extensions docs/wiki? It seems that post processors are pretty powerful, it would be nice to have some more information about what is available.
Also, I have reread the nbconvert docs trying to figure out exactly how to use such a post processor... sorry, I am admittedly an amateur, but it's pretty cryptic.
Do we think we can add support or a postprocessor for image resizing when converting to latex/pdf, à la "what @randy3k did here to control image size"? I'll look into this...
Yes, I think this is a different issue. I'll make one now, though it may belong in the notebook repo, not the nbconvert repo.
Ok — I've managed to reproduce that error in python 2 in the notebook v 4.2.1. It displays in both python 2 and python 3 in notebook v 5.0.0dev. I won't create a new issue on the notebook repo, but that's where I would have done it.
now that the display thing is dealt with onto the next steps :)
@soamaven Sidenote: If you build from the master branch of jupyter notebook it should have version 5.0.0dev.
@minrk @takluyver @Carreau Is this something that would be included in the 5.0 or since it's an extension would it be better to add afterward?
Hi all, this conversation is long, so I might not have got everything.
Note that in the current version of the notebook (master); you can now drag and drop images, that will be stored in the notebook metadata field. A reference to this image is automatically inserted like so: [filename](attachment:<a key>)
(Side note: I'm unsure why we use attachment:, as the CommonMark "spec" expects a URL there and attachment: is not a URL, but attachment:// is. Plus we don't escape the filename, which should not include spaces, so a file named ])foo.png will break.)
So now we need to make sure that nbconvert supports the attachments, and renders them correctly as exported HTML and PDF.
Would that suit everyone?
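To make the attachment idea concrete, here is a rough sketch of what rendering them could look like on the export side. It assumes the nbformat 4.1+ layout, where each cell carries an "attachments" mapping of filename to a MIME bundle of base64 strings. inline_attachments is a hypothetical helper, not an nbconvert function, and the regex deliberately stops at the first ")" so it shares the filename-escaping limitation mentioned in the side note.

```python
import re

# Matches the "(attachment:<key>)" part of a Markdown image or link.
ATTACHMENT_REF = re.compile(r"\(attachment:([^)\s]+)\)")


def inline_attachments(cell):
    """Return the cell source with attachment: references turned into data URLs.

    Assumption: cell["attachments"] maps a filename to a dict of
    {mime_type: base64_payload}, as in nbformat 4.1 and later.
    """
    attachments = cell.get("attachments", {})
    source = cell["source"]
    if isinstance(source, list):  # notebooks on disk store source as a list of lines
        source = "".join(source)

    def replace(match):
        bundle = attachments.get(match.group(1))
        if not bundle:
            return match.group(0)  # unknown key: leave the reference alone
        mime, payload = next(iter(bundle.items()))
        if isinstance(payload, list):
            payload = "".join(payload)
        return "(data:{};base64,{})".format(mime, payload)

    return ATTACHMENT_REF.sub(replace, source)
```

The payload is already base64 in the notebook file, so nothing needs to be re-encoded.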
Just as a heads up the multimarkdown spec has a way to add attributes and metadata if you use reference style markup, see this, though @jgm already knew that (see https://github.com/jgm/pandoc/issues/261) so it probably was rejected from commonmark for a reason.
@michaelpacer That syntax seems simple and useful, +1. Also, I got 5.0.0dev installed, thanks!
So now we need to make sure that nbconvert support the attachments, and render them correctly as exported html and PDF.
@Carreau That is a beautiful summary. The only thing I would add is something along the lines of:
_Alternately_, having the docs point readers to the postprocessor included in nbextensions would be an appropriate solution. Even I could attempt to make these changes to the docs.
I make my case for packaging this below, though I hate nitpicking to you wonderful people who make Jupyter so awesome. Thank you!
TLDR: Why Default? It's too difficult to share a self-contained _report-style_ file with html referenced images formatted via attributes.
_Why package embedded images with nbconvert?_: Jupyter is being advertised to scientific and computation users. The first adopters have so far been, from what I can tell, users proficient in coding. However, the user base has been gaining popularity among MATLAB and Mathematica converts, looking for F.O.S. solutions. Being able to quickly and easily share entire notebooks with results in a _report style_ (self-contained files with formatting) will be key. For example, this issue arose when I wanted to share my work with my less savvy adviser, who unfortunately balks at a report that contains multiple files. nbconvert obviously has this goal in mind, and I think a quicker route to embed images with metadata taken into account in .html files advances this goal. I'm surprised more users haven't been looking for this, since using html is currently the best way to give attributes to images in MD cells. Just my 2 cents.
@soamaven I think that sounds really good, though I would not make it the default behavior for HTML export. Embedding images in HTML files allows for easy sharing via email, but it doesn't result in performant loading, as all the binary data must be loaded before there's a page to look at, and you don't get the benefit of browser caching behavior, etc. I do think it should be _simple_, though, as easy as:
nbconvert --to html-single-file notebook.ipynb
for ensuring that everything's encapsulated.
That said, we do bundle images and things in the notebooks themselves, for exactly the sharing purpose you describe. There are a lot of downsides to this, though, and we are considering (at least optionally) splitting notebooks into source-only files and output bundles. The downsides of bundling the images in the notebook are the reasons why I want to _not_ include them by default in nbconvert output, but the precedent of bundling in the notebooks themselves makes a decent case for doing it by default. Whichever way we go, we should make the other as simple as possible (i.e. not requiring --ExtractOutputPreprocessor.enabled=True).
@minrk I'll agree with everything you say above. Favoring performance by default is a better cause, but ease of encapsulation needs to be enhanced.
@juhasch I can't figure out how to use the post processor you've linked from CLI. I've tried:
jupyter nbconvert --to html my_nb.ipynb --post /path/to/post_embeddhtml.py
I guess I need to import its functionality, since there is no main functionality, and do it from a python terminal, correct?
EDIT: Also, it looks like this post-processor uses regex to parse the img tag, whereas the nb in the repo I've linked above uses the HTMLParser library. What would be the virtues of one versus the other? I should more thoroughly test my proposed parser, but I've grown up wary of regex.
The postprocessor needs to be in the Python path, so it can be imported.
Using the HTMLparsing library sounds like a good idea. I used a regex because I did not know better.
Okay. Added to path via:
$ export PYTHONPATH=${PYTHONPATH}:/home/user/.local/share/jupyter/extensions
nbconvert seems to find it now, but now I get the error
$ jupyter nbconvert --to html my_nb.ipynb --post post_embedhtml
[NbConvertApp] CRITICAL | Bad config encountered during initialization:
[NbConvertApp] CRITICAL | The 'postprocessor_factory' trait of a NbConvertApp instance must be a subclass of 'builtins.object' or None, but a value of class 'module' (i.e. <module 'post_embedhtml' from '/home/user/.local/share/jupyter/extensions/post_embedhtml.py'>) was specified.
I'm missing something here still :confused:
Looks like the API has changed. Don't know.
Maybe you should open an issue :smirk:
I've updated @michaelpacer on the internals of nbconvert during a long discussion. I think he now gets the gist of how nbconvert is supposed to work, and will likely be able to take care of digging more into that soon.
One of the things we're going to do is have nbconvert itself register its own exporters using entry points, to decouple things a bit and show good examples.
Okay. Added to path via:
$ export PYTHONPATH=${PYTHONPATH}:/home/user/.local/share/jupyter/extensions
nbconvert seems to find it now, but now I get the error
And no, you need to pass a fully qualified name, not a path. Your thing must be an importable class, and you say --post mypackage.subpackage.ClassName. Case matters: module names have to be lowercase and the class name has to be capitalized.
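The error above makes more sense with a picture of how a dotted name gets resolved. The sketch below is not nbconvert's actual code; resolve_class is a stand-in written only with the standard library. It shows why a bare module name is rejected while module.ClassName works.

```python
import importlib
import inspect


def resolve_class(dotted_name):
    """Import 'package.module.ClassName' and return the class object.

    Illustrative only: this mimics the general idea, not nbconvert's
    own import logic.
    """
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        # e.g. "post_embedhtml": there is no class part to look up
        raise ValueError("expected 'module.ClassName', got %r" % dotted_name)
    module = importlib.import_module(module_name)
    obj = getattr(module, attr)
    if not inspect.isclass(obj):
        # e.g. "os.path" resolves to a module, not a class
        raise TypeError("%r is not a class" % dotted_name)
    return obj
```

With this picture, --post post_embedhtml names a module, whereas --post post_embedhtml.EmbedPostProcessor names a class inside it. The module still has to be importable, which is what the PYTHONPATH change took care of.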
Ah, exactly. Thanks @Carreau. This works:
jupyter nbconvert --to html --post post_embedhtml.EmbedPostProcessor Untitled.ipynb
However, the API has changed. You need to change the line with export_format:
def postprocess(self, input):
    if self.config.NbConvertApp.export_format == "html":
I updated the HTMLParser-based repo. Should handle svg and pdf similarly/properly now (though maybe not efficiently).
Okay @juhasch I made the change above, and use the command above. It definitely runs and 'processes' the notebook in question but I hate to say it, but PostProcessor doesn't convert and embed the img tags :cold_sweat:, it just silently fails to convert anything. Tried a bit to debug why, but the code is a little out of my league. Can you confirm it still works for you with nbconvert version 4.2.0? I'd like to time it and test it against the HTMLParser code for edification.
Hi @soamaven . It works with nbconvert 4.2.0 for me (with the above fix):
(root) juhasch@linbox ~/Dokumente/notebooks $ jupyter nbconvert --to html --post jupyter_contrib_nbextensions.nbconvert_support.post_embedhtml.EmbedPostProcessor Untitled.ipynb
[NbConvertApp] Converting notebook Untitled.ipynb to html
[NbConvertApp] Writing 248615 bytes to Untitled.html
[NbConvertApp] embedding url: test.png, format: png
Example notebook:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<img src=\"test.png\" />"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
This is a short summary of the steps needed, as I found the code referenced by soamaven on 8 Aug to be incomplete.
First, save the attached code into your nbconvert installation as post_embedhtml.py (in my case, on Ubuntu, into /usr/local/lib/python2.7/dist-packages/nbconvert/postprocessors/post_embedhtml.py).
post_embedhtml.py.zip
Then edit the postprocessors module's __init__.py to include the new postprocessor by inserting
from .post_embedhtml import EmbedPostProcessor
Finally, you can convert your notebooks using:
jupyter nbconvert --to html --post nbconvert.postprocessors.EmbedPostProcessor Seminar01b.ipynb
If you install jupyter_contrib_nbextensions, it should be easier now. I added a custom exporter for nbconvert using entrypoints, which can be called using:
nbconvert --to html_embed mynotebook.ipynb
@juhasch your solution is amazing, this is exactly what I had wanted to do and was not able to figure out entrypoints. Thank you!
Any idea how one can use this in conjunction with --to slides for a stand alone reveal.js presentation? I'll close this issue if so.
Hi everyone,
this issue is a bit long and confusing. I think I have a problem related to this, but I'm not 100% sure, so bear with me (and please tell me if I should just open a new issue).
I believe the issue here is not (or not only) about image embedding, but more about HTML image tag parsing within markdown cells. The summary above by @Carreau gets close to it, I think.
Let me explain what my problem is: I am preparing slides using a notebook and want to include various images. To be able to control size, I use HTML syntax directly, i.e. <img src=URL>.
When I try to convert the notebook with
jupyter-nbconvert --to slides slides.ipynb --reveal-prefix=reveal.js
I obtain a functioning HTML document, but some images do not show up and the HTML code is shown instead. Inspecting the HTML file, it's easy to spot the problematic piece of code, as what in the notebook was
<img src="./graphics/githubSF.png" alt="test" width=600>
becomes
&lt;img src="./graphics/githubSF.png" alt="test" width=600&gt;.
To give a bit more context with images, the notebook cell

is rendered as follows in the HTML file:

Some additional info:
jupyter==1.0.0
jupyter-client==4.3.0
jupyter-console==4.1.1
jupyter-contrib-core==0.3.0
jupyter-core==4.1.0
jupyter-nbextensions-configurator==0.2.2
A quick follow-up: my problem is ascribable to mistune: downgrading to mistune version 0.7.2 the problem disappears.
EDIT: Looks like it's related to this issue on mistune.
I was having the same problem of my HTML image tags showing up as text in my export when selecting "save as HTML with toc" from the File menu, and with every other HTML export method (including nbconvert --to html).
Downgrading mistune fixed it for me too.
Hello everyone,
I confirm that the problem is caused by a change in how mistune parses HTML code and attributes, as specified here. The workaround for version 0.7.3 of mistune is to put quotes around all HTML attributes, e.g. <img src="http://this.com/that.jpg" width="300">.
This is now fixed on the mistune master.
Thanks @teoguso
With mistune 0.7.4 and nbconvert 5.1.1, I am still having a similar problem with tag parsing using nbconvert to HTML.
After some testing I've found it's failing with spaces around the equals sign in HTML attributes:


I mentioned this on lepture/mistune#81 as well.
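Until the upstream fix lands everywhere, one stopgap is to normalize the img tags before conversion. The sketch below is a hypothetical helper, not part of mistune or nbconvert. It quotes bare attribute values and removes whitespace around "=", the two patterns reported in this thread. It is regex-based, so it assumes no ">" appears inside a quoted attribute value.

```python
import re

# One <img ...> tag; assumes no ">" inside quoted attribute values.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# name = value, where value is double-quoted, single-quoted, or bare.
ATTR = re.compile(
    r"""([\w-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+?(?=\s|/?>))"""
)


def _quote(match):
    name, value = match.group(1), match.group(2)
    if value[0] not in "\"'":
        value = '"{}"'.format(value)  # wrap bare values in double quotes
    return "{}={}".format(name, value)


def normalize_img_tags(source):
    """Quote attribute values and drop spaces around '=' inside <img> tags."""
    return IMG_TAG.sub(lambda m: ATTR.sub(_quote, m.group(0)), source)
```

Text outside img tags is left alone, so running this over Markdown cell sources should not disturb ordinary prose or code that contains "=".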
Given that we are 1 year after the issue was originally raised, and having tested that nbconverted HTML files with referenced images work in master (unless you hit the upstream issue described by danzimmerman), @mpacer what do you think about closing this one?
OK, I will close this one. Feel free to re-open if you disagree.
Issue is closed, so I'm sure it's a PEBKAC, but I am having the exact issues OP is describing running:
4.2.15.2.13.6.1
I wonder if someone could shed some light on what steps to undertake to fix this in the discussion on StackOverflow? Many thanks.
Solution found on SO -- needed doublequotes around all HTML attributes, even width. Thanks @mpacer, you're a goddamn hero!
How can we make a button or add a menu entry Download -> "HTML (Embedded)"?
I made a button extension (similar to the hide_all extension) which does something like that,
but it does not work because html_embed is not recognized:
var load_ipython_extension = function() {
    Jupyter.toolbar.add_buttons_group([{
        id : 'export_embedded',
        label : 'Embedded HTML Export',
        icon : '+',
        callback : function() {
            Jupyter.menubar._nbconvert('html_embed', true);
        }
    }]);
    if (Jupyter.notebook !== undefined && Jupyter.notebook._fully_loaded) {
        // notebook_loaded.Notebook event has already happened
        initialize();
    }
    events.on('notebook_loaded.Notebook', initialize);
};
[W 19:29:41.330 NotebookApp] 404 GET /nbconvert/html_embed/EmbedImages/Test.ipynb?download=true (::1): No exporter for format: html_embed
I think that thing should be done somehow in
notebook/notebook/nbconvert/handlers.py
@gabyx I think your comments should probably be an issue in notebook rather than nbconvert, given that the code you're running into is there (not in nbconvert). My guess is that what will need to change is the js object, not the py endpoints (if the issue has to do with Jupyter.menubar._nbconvert having an improper target).
Thanks I will open an issue =)
Just identified a hack that might be of help.
If you generate the image through Python code like below, it works:
import matplotlib.pyplot as plt
img = plt.imread('<your_image_path>')
plt.imshow(img)