Dompdf: No images in PDF from Wikipedia

Created on 4 Feb 2020 · 4 Comments · Source: dompdf/dompdf

Hello guys, please help :)
When I generate a PDF from a Wikipedia page (ru.wikipedia.org), the images are missing from the PDF.
None of the workarounds described in the Dompdf wiki, nor the ones I found myself, were of any use.
You can check my "app" code at https://github.com/OlegKorn/PHP-JS-html2pdf

Yesterday I managed to create a PDF with images, but I deleted it! I suspect the problem is in the image URLs or in the hrefs.

I'm running Apache 2 and PHP 7.

question

OlegKorn

All 4 comments

Maybe the issue is that images on Wikipedia are wrapped inside `<a>` tags.
This is the structure of an image:

<a href="//commons.wikimedia.org/wiki/File:Guido_van_Rossum_OSCON_2006.jpg?uselang=ru" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/220px-Guido_van_Rossum_OSCON_2006.jpg" decoding="async" width="220" height="330" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/330px-Guido_van_Rossum_OSCON_2006.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/440px-Guido_van_Rossum_OSCON_2006.jpg 2x" data-file-width="2336" data-file-height="3504"></a>

I tried stripping the leading "//" from the src values in $html, along with other workarounds, but it doesn't work. As you can see, there is a bunch of image URLs in this markup, which seems to confuse Dompdf.

This is what I tried before passing $html on to Dompdf:

$html = file_get_contents($url);
//preg_match_all('/\/Файл:/.*', $html, $o);

// str_replace() already replaces every occurrence, so no loop is needed
// (repeating the /wiki/ replacements would keep prepending the host).
//$html = str_replace('src="//', 'src="', $html);
$html = str_replace('/wiki/Файл:', 'commons.wikimedia.org/wiki/Файл:', $html);
$html = str_replace('/wiki/File:', 'commons.wikimedia.org/wiki/File:', $html);
//$html = str_replace('"//upload', '"upload', $html);
//$html = str_replace('//upload', 'upload', $html);
$html = str_replace('"//commons', '"commons', $html);
$html = str_replace('//commons', 'commons', $html);
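
Stripping the leading "//" leaves a value such as upload.wikimedia.org/..., which has no scheme and is therefore read as a relative path rather than a host. A minimal sketch of the opposite approach, adding an explicit scheme to protocol-relative src and href values, might look like this. The helper name addScheme() and the default of https are assumptions made for illustration; they are not part of Dompdf, and srcset values are deliberately left alone.

```php
<?php
// Hypothetical helper (not part of Dompdf): prefixes protocol-relative
// URLs in src and href attributes with an explicit scheme.
// Attributes such as srcset or data-src are not touched.
function addScheme(string $html, string $scheme = 'https'): string
{
    // Match src= or href= followed by a quote and "//".
    // The lookbehind keeps names like "data-src" from matching.
    $pattern = '~(?<![\w-])(src|href)=(["\'])//~i';

    // Re-emit the attribute name and quote, then the scheme and "//".
    $replacement = '${1}=${2}' . $scheme . '://';

    // preg_replace() returns null on failure; fall back to the input.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

$sample = '<a href="//commons.wikimedia.org/wiki/File:X.jpg">'
        . '<img src="//upload.wikimedia.org/x.jpg"></a>';
echo addScheme($sample), "\n";
```

This only rewrites the markup before it is handed over; whether the images can then be fetched still depends on the remote-loading settings in use.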

OlegKorn on 4 Feb 2020

How are you loading the HTML into Dompdf? It's probably the protocol-relative URL (i.e. the missing "http:" at the front of the image URL). If you're using $dompdf->load_html(), the default protocol is going to be file://. There are ways around the issue: you could specify a base href in the document head, or you could load the HTML and then call `$dompdf->set_protocol('http://')`.
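
As a rough illustration of the base href option, the sketch below inserts a `<base>` element right after the opening `<head>` tag of the fetched page before it is handed to Dompdf. The function name addBaseHref() is hypothetical, and using https://ru.wikipedia.org/ as the base is an assumption taken from the page discussed in this thread.

```php
<?php
// Hypothetical helper (not part of Dompdf): inserts a <base> element
// right after the opening <head> tag so that relative and
// protocol-relative URLs have something to resolve against.
function addBaseHref(string $html, string $baseUrl): string
{
    $base = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Insert after the first <head ...> tag only (limit = 1).
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base) {
            return $m[0] . $base;
        },
        $html,
        1,
        $count
    );

    // No <head> found (or the regex failed): prepend the element instead.
    return ($result !== null && $count > 0) ? $result : $base . $html;
}

$page = '<html><head><title>Python</title></head>'
      . '<body><img src="//upload.wikimedia.org/x.jpg"></body></html>';
echo addBaseHref($page, 'https://ru.wikipedia.org/'), "\n";
```

The result would then be passed to loadHtml() as usual. A page that already carries its own `<base>` element would need extra handling, which this sketch does not attempt.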

bsweeney on 5 Feb 2020

Hello, thanks, adding $dompdf->set_protocol('http://') worked. This is the snippet:

$dompdf->loadHtml($html);
$dompdf->set_protocol('http://');
$dompdf->setPaper('A3', 'landscape');
$dompdf->render('test');
$output = $dompdf->output();

file_put_contents('/opt/lampp/htdocs/wikipdf.ru/test.pdf', $output);

I've created a PDF from https://ru.wikipedia.org/wiki/Python. But once that was done, Chrome opened a new tab at http://wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Python_3._The_standard_type_hierarchy.png with the error "ERR_NAME_RESOLUTION_FAILED".

Anyway, the issue seems to be solved. Now my PDF has images.

OlegKorn on 5 Feb 2020

Not sure what's going on with Chrome. Did the tab open automatically, or when you clicked a link? That URL looks like the link associated with one of the images. Those links are all relative, so for them to work you may want to go ahead and set the remaining URL parameters:

$dompdf->loadHtml($html);
$dompdf->setProtocol('http://');
$dompdf->setBaseHost('ru.wikipedia.org');
$dompdf->setBasePath('/');
$dompdf->setPaper('A3', 'landscape');
$dompdf->render('test');
$output = $dompdf->output();

file_put_contents('/opt/lampp/htdocs/wikipdf.ru/test.pdf', $output);
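
To show why the host and path matter for those links, here is a simplified resolver. It is an illustration only and not Dompdf's own URL-building code; the function name resolveUrl() and the sample file name X.png are made up for the example.

```php
<?php
// Simplified illustration only; this is NOT Dompdf's implementation.
// Combines a protocol, host and base path with a URL taken from the page.
function resolveUrl(string $protocol, string $host, string $basePath, string $url): string
{
    // Already absolute (has its own scheme): leave unchanged.
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        return $url;
    }
    // Protocol-relative ("//host/path"): only the scheme is missing.
    if (strpos($url, '//') === 0) {
        return rtrim($protocol, '/') . $url;
    }
    // Root-relative ("/path"): needs protocol and host.
    if (strpos($url, '/') === 0) {
        return $protocol . $host . $url;
    }
    // Path-relative ("file.png"): needs protocol, host and base path.
    return $protocol . $host . $basePath . $url;
}

// With only the protocol set, a root-relative link has no host at all.
echo resolveUrl('http://', '', '', '/wiki/File:X.png'), "\n";

// With the host set as well, the same link becomes a full URL.
echo resolveUrl('http://', 'ru.wikipedia.org', '/', '/wiki/File:X.png'), "\n";
```

In this sketch the first call yields http:///wiki/File:X.png, which a browser would likely read as a host named "wiki", similar to the address in the Chrome tab above. The second call yields http://ru.wikipedia.org/wiki/File:X.png.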

bsweeney on 5 Feb 2020
