Pdf.js: Some PDF contents are pictures, and the header and footer are not shown

Created on 20 Aug 2017 · 14 Comments · Source: mozilla/pdf.js

Link to PDF file (or attach file here):
The problem PDF displays fine in other PDF tools.

I used the pdf.js example demo code, and the result looks like this:

Configuration:

Steps to reproduce the problem:

  1. Open the problem PDF in both the pdf.js example and other PDF tools, and compare the differences.
  2. After some searching, I found that the contents of the problem PDF are pictures, and they have been combined.

What is the expected behavior? (add screenshot)
The header and footer are shown.

What went wrong? (add screenshot)
The header and footer are not shown.

4-corrupted-pdf 4-image-conversion

All 14 comments

This file contains an inline image. These are meant to be 4KB or less, see page 214 of the PDF 32000 specification. Yours is 74KB. Btw you forgot to compress it. Now inline images have the problem that it is tricky to detect their end. This is "EI". However sometimes big images have "EI" in the image sequence, so more heuristics are needed. In your image, after the "EI" there is a space, then a zero, then a "Q". Why the zero? I suspect that it fools the heuristics.
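
The "EI" problem described in this comment can be sketched roughly as follows. This is only an illustration, not the PDF.js or PDFBox code: the function name `find_inline_image_end`, the `lookahead` window, and the rule "the bytes after EI should look like plain ASCII" are all assumptions made for this example.

```python
# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0, lookahead: int = 10) -> int:
    """Return the index of the 'E' in the most plausible closing 'EI', or -1.

    Hypothetical heuristic: accept an 'EI' only if it is surrounded by
    whitespace and the next few bytes look like ordinary content-stream text.
    """
    pos = start
    while True:
        pos = data.find(b"EI", pos)
        if pos == -1:
            return -1
        # The byte before "EI" must be whitespace.
        before_ok = pos > start and data[pos - 1] in PDF_WHITESPACE
        after = data[pos + 2 : pos + 2 + lookahead]
        # The byte after "EI" must be whitespace (or the data ends there).
        after_ok = len(after) == 0 or after[0] in PDF_WHITESPACE
        # Raw image bytes rarely continue as printable ASCII for long.
        looks_like_text = all(
            b in PDF_WHITESPACE or 0x20 <= b <= 0x7E for b in after
        )
        if before_ok and after_ok and looks_like_text:
            return pos
        pos += 1  # this "EI" was probably part of the image data; keep looking


# The first "EI" sits inside the binary data and is skipped.
sample = b"\x10EI\x80\xff\x00abc\nEI 0 Q\n"
print(find_inline_image_end(sample))  # 10
```

Different viewers use different rules here, which is why a file with unusual bytes after "EI" can display in one tool and fail in another.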

@timvandermeij Thanks for your answer. I'm new to this. I tried compressing the PDF with https://smallpdf.com and it worked, but I don't know which compression algorithm they use. How can I compress it using pdf.js? Can you show an example or a function? Thank you sincerely.

@hymne Your text is unclear. Was the file you attached modified with smallpdf.com? Does the original file display here? If so, contact them and point them to page 214 of the PDF 32000 specification. Your file was created last Friday and modified yesterday.

@timvandermeij OK, thanks for your suggestion. The problem page is just one page of the original PDF; I split it out using iText. The original file can't be displayed here either, but Adobe Reader displays it well...

Yes, Adobe does its best to display bad files. Was the original page created in your company, and what software was used?

Btw I am not "timvandermeij" I'm "THausherr".

@THausherr Oh! Sorry, sorry. The original PDF was not created by my company; we just split it into pages. I'll try to contact them to ask what they use.

Thanks... my hope is that they're just starting at this, so that they can fix this weird file by not using inline images this way, and so we don't get such files into the wild.

Btw I don't speak for PDF.js. I'm with a different project (pdfbox) and your file fails there too :-) (Likely, because many developers think similarly)

@Snuffleupagus Thank you for resolving the issue. I updated to your version and it works well. Today is my project's deadline, and... um... I appreciate your help; you are a very, very, very good person.

Sorry, at my client's request: the PDF involves confidential material, so I have to remove the PDF link... If somebody wants to test, you can contact me at [email protected].

Sorry, at my client's request: the PDF involves confidential material, so I have to remove the PDF link...

First of all, the fact that you've removed the link does not mean that the file is suddenly unavailable. It's actually still available via GitHub, and the link is now checked in to the repo (see https://github.com/mozilla/pdf.js/pull/8800/files#diff-c8d67ab2dbcb03ca57b7ccd6d5ab7c9f).
Second of all, as part of fixing the display of corrupt PDF files like this one, it's absolutely imperative that we add regression tests since otherwise this could easily break again at any time.

@hymne If you want the file completely removed, then you'll probably need to contact GitHub support to inquire about that. However, I'd ask that you please do not do that unless you're able to produce a replacement file (displaying the same issue) that we can use for testing!

I'll try to create one. Hopefully tonight.

@Snuffleupagus Got it. I removed the screenshot and link from the issue's description, just so that it appears to my customers that they have been removed. I'm sorry to trouble you, and I hope it will not affect the tests.

PDFJS-8798-test1.pdf
PDFJS-8798-test2.pdf
PDFJS-8798-test1.pdf: Diagonal line doesn't appear in old version, appears in new version.

PDFJS-8798-test2.pdf: that is the file I created first and so I wondered why one image was missing. It shows a flaw in the "EI" detecting strategy, which you may or may not want to correct. The difference to the first file is that this one has several inline images.
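
For anyone who wants to check a content stream with several inline images, here is a rough sketch that lists the data size of each one and flags those above the 4 KB guideline mentioned earlier in the thread. It is not how PDF.js or PDFBox scan a stream; the names `inline_image_sizes` and `oversized`, and the naive "first whitespace-delimited EI" rule, are assumptions made for this example.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
SIZE_LIMIT = 4096  # the 4 KB guideline for inline images cited in the thread


def inline_image_sizes(stream: bytes) -> list:
    """Return the data length of every inline image found in a content stream.

    Simplification (assumption): image data runs from the byte after
    "ID<whitespace>" up to the first whitespace-delimited "EI". A real parser
    needs extra plausibility checks, since "EI" can occur inside image data.
    """
    sizes = []
    pos = 0
    while True:
        bi = stream.find(b"BI", pos)
        if bi == -1:
            break
        id_pos = stream.find(b"ID", bi + 2)
        if id_pos == -1:
            break
        data_start = id_pos + 3  # skip "ID" and the single whitespace byte after it
        end = data_start
        while True:
            end = stream.find(b"EI", end)
            if end == -1:
                return sizes  # unterminated image: stop scanning
            followed_by_ws = end + 2 >= len(stream) or stream[end + 2] in WHITESPACE
            if stream[end - 1] in WHITESPACE and followed_by_ws:
                break
            end += 1
        # Exclude the whitespace byte that separates the data from "EI".
        sizes.append(max(0, end - 1 - data_start))
        pos = end + 2
    return sizes


def oversized(stream: bytes) -> list:
    """Return the sizes of inline images larger than SIZE_LIMIT bytes."""
    return [size for size in inline_image_sizes(stream) if size > SIZE_LIMIT]


stream = (
    b"q BI /W 2 /H 2 ID \x01\x02\x03\x04\nEI Q "
    b"q BI /W 1 /H 1 ID " + b"\xff" * 5000 + b"\nEI Q"
)
print(inline_image_sizes(stream))  # [4, 5000]
print(oversized(stream))           # [5000]
```

Because the scan resumes right after each accepted "EI", one wrong guess can hide every image that follows it, which is one way an image can go missing in a file with several inline images.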

The inline images are rendered blurry, see #8245.

The files were created by myself for the PDFBox project and modified by myself and are Apache licensed.

@THausherr Thanks a lot for your help with reduced test-cases!
For the remaining issue, can you please file a new issue about that (since we want to limit each one to just one problem)?
