Hello guys. Please help :)
When I generate a PDF from a Wikipedia page (ru.wikipedia.org), the PDF contains no images.
None of the workarounds described in this project's wiki, or the ones I found myself, helped.
You can check my "app" code at https://github.com/OlegKorn/PHP-JS-html2pdf
Yesterday I managed to create a PDF WITH IMAGES, but I deleted it! The problem seems to be in the image src values or in the hrefs.
I'm running Apache 2 and PHP 7.
Maybe the issue is that images on Wikipedia are wrapped inside <a> tags.
This is the structure of an image:
<a href="//commons.wikimedia.org/wiki/File:Guido_van_Rossum_OSCON_2006.jpg?uselang=ru" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/220px-Guido_van_Rossum_OSCON_2006.jpg" decoding="async" width="220" height="330" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/330px-Guido_van_Rossum_OSCON_2006.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/66/Guido_van_Rossum_OSCON_2006.jpg/440px-Guido_van_Rossum_OSCON_2006.jpg 2x" data-file-width="2336" data-file-height="3504"></a>
I tried removing the "//" at the beginning of src from $html, along with other workarounds, but it doesn't work. As you can see, there is a bunch of image URLs inside this markup, so it seems to confuse dompdf.
This is what I tried before passing $html on to dompdf:
$html = file_get_contents($url);
//preg_match_all('/\/Файл:/.*', $html, $o);
for ($i = 0; $i < 100; $i++) //or for ($i = 0; $i < count($o); $i++)
{
    //$html = str_replace('src="//', 'src="', $html);
    $html = str_replace('/wiki/Файл:', 'commons.wikimedia.org/wiki/Файл:', $html);
    $html = str_replace('/wiki/File:', 'commons.wikimedia.org/wiki/File:', $html);
    //$html = str_replace('"//upload', '"upload', $html);
    //$html = str_replace('//upload', 'upload', $html);
    $html = str_replace('"//commons', '"commons', $html);
    $html = str_replace('//commons', 'commons', $html);
}
How are you loading the HTML into Dompdf? It's probably the protocol-relative URLs (i.e. the missing "http:" at the front of the image URL). If you're using $dompdf->load_html(), the default protocol is going to be file://. There are ways around the issue. You could specify a base href in the document head. Or you could load the HTML and then call `$dompdf->set_protocol('http://')`.
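For the base href route, a minimal sketch of preprocessing the HTML before handing it to Dompdf might look like the following. The helper names addSchemeToProtocolRelativeUrls and injectBaseHref are hypothetical (they are not part of Dompdf), and the sketch assumes the page has a <head> element and that Dompdf picks up a <base href> when it parses the document. Note that it only touches src and href attributes, not srcset.

```php
<?php
// Hypothetical helpers for illustration; these are not Dompdf functions.

// Turn src="//host/..." and href="//host/..." into URLs with an explicit scheme.
function addSchemeToProtocolRelativeUrls(string $html, string $scheme = 'https'): string
{
    $pattern = '/\b(src|href)=(["\'])\/\//i';
    $replacement = '${1}=${2}' . $scheme . '://';
    // preg_replace returns null on a regex error; fall back to the input then.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

// Insert a <base href="..."> right after the first opening <head> tag.
function injectBaseHref(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($tag) {
            return $m[0] . $tag;
        },
        $html,
        1 // only the first <head> tag
    );
    return $result ?? $html;
}
```

With these in place, one could call addSchemeToProtocolRelativeUrls($html) and then injectBaseHref($html, 'https://ru.wikipedia.org/') before $dompdf->loadHtml($html). Prefixing a scheme, rather than deleting the leading "//", keeps the image URLs absolute instead of turning them into relative paths.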
Hello, thanks, adding $dompdf->set_protocol('http://') worked. This is the snippet:
$dompdf->loadHtml($html);
$dompdf->set_protocol('http://');
$dompdf->setPaper('A3', 'landscape');
$dompdf->render('test');
$output = $dompdf->output();
file_put_contents('/opt/lampp/htdocs/wikipdf.ru/test.pdf', $output);
I've created a PDF from https://ru.wikipedia.org/wiki/Python. But after that, Chrome opened a new tab at http://wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Python_3._The_standard_type_hierarchy.png and showed "ERR_NAME_RESOLUTION_FAILED".
Anyway, the issue seems to be solved. Now my PDF has images.
Not sure what's going on with Chrome. Did the tab open automatically, or when you clicked a link? That looks like the link associated with one of the images. Those links are all relative, so for them to work you may want to go ahead and set the remaining URL parameters:
$dompdf->loadHtml($html);
$dompdf->setProtocol('http://');
$dompdf->setBaseHost('ru.wikipedia.org');
$dompdf->setBasePath('/');
$dompdf->setPaper('A3', 'landscape');
$dompdf->render('test');
$output = $dompdf->output();
file_put_contents('/opt/lampp/htdocs/wikipdf.ru/test.pdf', $output);
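To illustrate why the host and path matter as well as the protocol, here is a simplified model of how a link gets resolved against those three values. This is only a sketch: resolveAgainstBase is a hypothetical function, and it is not Dompdf's actual implementation.

```php
<?php
// Illustrative model only; NOT how Dompdf resolves URLs internally.
function resolveAgainstBase(string $url, string $protocol, string $host, string $basePath): string
{
    // Already absolute (has a scheme): leave untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;
    }
    // Protocol-relative ("//host/path"): only the protocol is missing.
    if (strpos($url, '//') === 0) {
        return $protocol . substr($url, 2);
    }
    // Root-relative ("/wiki/..."): protocol and host are missing.
    if (strpos($url, '/') === 0) {
        return $protocol . $host . $url;
    }
    // Path-relative: append to the base path.
    return $protocol . $host . rtrim($basePath, '/') . '/' . $url;
}

// Hypothetical example values taken from the thread.
echo resolveAgainstBase('/wiki/File:X.png', 'http://', 'ru.wikipedia.org', '/'), "\n";
echo resolveAgainstBase('/wiki/File:X.png', 'http://', '', '/'), "\n";
```

In this model, an empty host leaves a root-relative link as http:///wiki/File:X.png, which a browser may well read as a request to a host named "wiki". That would be consistent with the tab Chrome opened, though it is only a guess about what happened there.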