Go: x/image/tiff: package doesn't support CCITT Fax 4 Compression

Created on 7 Mar 2017  Â·  19Comments  Â·  Source: golang/go

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.8 darwin/amd64

What operating system and processor architecture are you using (go env)?

Darwin/amd64

What did you do?

Take a tiff image with CCITT Fax 4 Compression and attempt to decode it with the golang.org/x/image/tiff package.

What did you expect to see?

I expected a level 4 compressed tiff to decode properly.

What did you see instead?

tiff: unsupported feature: compression value 4

FrozenDueToAge NeedsFix help wanted

Most helpful comment

I can contribute an enhanced version of x/image/tiff that features:

  • CCITT decoding for group3/1d and group4
  • CMYK reading and writing
  • an improved version of compress/lzw (see #25409)

Right now it's all located at pdfcpu.
Please let me know if this is worth a PR.
Thank you!

All 19 comments

@sanbornm would it be possible for you to provide a sample offending image and a working repro program? It makes it easier for folks to investigate.

Thanks.

new.tiff.zip

package main

import (
    "bytes"
    "image"
    "image/png"
    "io/ioutil"
    "log"

    _ "golang.org/x/image/tiff"
)

func main() {
    b, err := ioutil.ReadFile("new.tiff")

    reader := bytes.NewReader(b)
    img, _, err := image.Decode(reader)
    if err != nil {
        log.Fatal(err)
    }

    w := bytes.NewBuffer([]byte{})
    err = png.Encode(w, img)
    if err != nil {
        log.Fatal(err)
    }
}

Output:

2017/03/07 09:03:46 tiff: unsupported feature: compression value 4
exit status 1

I am having similar issue -

[DEBUG]  image.go:278 Error decoding file: tiff: unsupported feature: compression value 4
[ERROR]  image.go:80 Error loading image: tiff: unsupported feature: compression value 4
Error: tiff: unsupported feature: compression value 4

Here is a sample image
(scanned using Kodak i2420 scanner).

I can contribute an enhanced version of x/image/tiff that features:

  • CCITT decoding for group3/1d and group4
  • CMYK reading and writing
  • an improved version of compress/lzw (see #25409)

Right now it's all located at pdfcpu.
Please let me know if this is worth a PR.
Thank you!

CCITT Fax compression seemed unlikely to be useful when this was written.

Sure, please send CLs to me :)

Kind of unreal to get feedback after 6 months.
@bsiegert are you one of the maintainers of this package?

Yes, and I did not see this in time.

Horst Rutter notifications@github.com schrieb am Sa. 13. Apr. 2019 um
20:47:

Kind of unreal to get feedback after 6 months.
@bsiegert https://github.com/bsiegert are you one of the maintainers of
this package?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/golang/go/issues/19443#issuecomment-482863478, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA04Gr7c7M01TFhLfKgxHT4iFQRHedTQks5vgiZfgaJpZM4MVrHc
.

>

Benny

Excellent!

Please have a look at here and here and let me know how to proceed.

There are dependencies on compress/lzw which I improved and started a golang proposal for and
also we are going to need to find the right place for a ccitt decoder which hopefully soon gets an encoder as well.

Meanwhile I am busy resolving this pdfcpu issue.

Ah.... good to see progress here, I started some work for a t.4 and t.6 decoder (just for fun) as the issue seemed to be well asleep.

@hhrutter Maybe let's start with CCITT Fax G4 compression. You say you have code for that?

Have you ever contributed to Go? If not, see https://golang.org/doc/contribute.html for how to get started. Send a CL for golang/image (golang.org/x/image) and start the commit message with "tiff: ". You can set @nigeltao and me as the reviewers.

Alternatively, you can start a pull request on github, which will create a Gerrit CL. This is a bit more clunky though.

I have a rudimentary decoder for G3 and G4 that I can contribute.
It's basic because it is hard to find samples for all possible variations.
G4 is much more complete than G3 for the same reason.
I picked up all tiff files with embedded CCITT encoded images from the issues in this repo
and use them in testcases to prove correct decoding.

I did not write the encoder yet because these days CCITT G3/G4 encoding does not seem to have that much priority for users like CCITT decoding.

Please let me know if you accept a CCITT reader without the corresponding writer.
The main issue with this may be testing.

I am aware having a Reader/Writer pair enables writing nice tests.
Lacking a ccitt/writer for testcases I resorted to a workaround comparing the G3/G4 compressed binary byte streams as extracted from TIFF with reference PNG files that represent the result of a successful decoding of the very byte streams in the first place.

Another sub issue would be finding the right location for a ccitt package.
There is tiff/lzw but that is just a tmp hack in order to postpone revising compress/lzw if I understood correctly. There were already long discussions about this and I have prepared a solution in pdfcpu but here is not the right place to address this.

There is image/bmp. I believe it make sense to have image/ccitt.
I think CCITT compression is an important enough topic on its own not to hide it inside of image/tiff.

Other than that I think I am setup for contributing. I signed up with Gerrit a year ago although no contribution yet.

Please let me know if you accept a CCITT reader without the corresponding writer.

Yes, that's fine. For example, golang.org/x/image/webp has a reader without a writer, and people still find it useful.

The main issue with this may be testing.

Yeah, that could be tricky. We'd probably want to check in some test files into the repo, but be aware of any copyright concerns with those files. Ideally we'd find some files with e.g. a Creative Commons or equally liberal license, or create your own test files (either manually, or transforming your own images through some other program such as ImageMagick).

I think CCITT compression is an important enough topic on its own not to hide it inside of image/tiff.

I'm not so familiar with CCITT. golang.org/x/image/ccitt might still be the right location, but do people ship 'CCITT files' or 'other files that embed CCITT somehow` that aren't TIFF files?

The main issue with this may be testing.

Yeah, that could be tricky. We'd probably want to check in some test files into the repo, but be aware of any copyright concerns with those files. Ideally we'd find some files with e.g. a Creative Commons or equally liberal license, or create your own test files (either manually, or transforming your own images through some other program such as ImageMagick).

I saw that LOC (Library of Congress) has some interest in TIFF preservation:
https://www.loc.gov/preservation/digital/formats/fdd/fdd000022.shtml

There are images available with different compression embedded:
ftp://ftp.loc.gov/pub/preservation/guidelines/images
but don't know about its copyright.

I tried to find also "reference files" from ITU-T specifically for T.4 and T.6 but didn't find any.

Regarding the ccitt package location.
I would not put it inside of golang.org/x/image/tiff for the reason I already stated and also
it looks like this package needs a lot of work anyways. TIFF is such a flexible format with lots of options some of them still not implemented, better keep the package small as can be for now and leave room to grow and for logical extension. On the other hand I think it's a valid statement to say you are only ever going to need ccitt decoding/encoding within the tiff space but there are always exceptions. So this is my take - but I am not in a position to make the call.

Thanks for the pointer to the library of congress ftp server.
I found plenty of samples to stress test the decoder.
I also have access to a scanner that can produce PDFs with embedded ccitt group4 compressed images but from a testing point of view this would only be just one of many different ccitt encoders out there.

@hhrutter No one said that it should be part of golang.org/x/image/tiff. golang.org/x/image/ccitt would be a good option if there are files in CCITT encoding that are not tiff (which I doubt). Otherwise, it can be golang.org/x/image/tiff/ccitt, for instance.

I don't understand these two bits:

and also it looks like this package needs a lot of work anyways

Is that a reason for not working on it?

better keep the package small as can be for now and leave room to grow and for logical extension

Are you suggesting that the tiff package should not be extended?

Change https://golang.org/cl/174139 mentions this issue: tiff: Add support for CCITT group 3/4 compression

To the contrary.
Of course it definitely should be extended.
There is just tons of work ahead and one needs to understand where I am coming from
and what has been discussed in this proposal.

Let me give you some background:
I created the ccitt decoder out of an urgent need for pdfcpu - a PDF processor.
PDF defines stream filters that allow object streams to be encoded in a certain way and the CCITTFaxDecode filter is one of them. PDF processors like pdfcpu implement these filters as this is an assumption for PDF reader/writers. If you take a look at here then you get a better picture.

The same applies to lzw. There is the PDF LZWDecode filter.
Since neither compress/lzw nor image/tiff/lzw works as laid out in the PDF spec
I created a seperate extension and put it under pdfcpu/lzw analog to pdfcpu/ccitt.
pdfcpu/lzw works for the standard case and both pdf and tiff.
Therefore pdfcpu has been providing an extension of image/tiff under pdfcpu/image/tiff
A version of tiff that does not need to have its own lzw reader and rather uses a central one that's
supposed to be an updated compress/lzw and in addition is also processes cmyk and it writes lzw.

When I say there is a lot of work needed on the tiff package one
example would be to get rid of tiff/lzw and rather have tiff use a consolidated compress/lzw.
The discussion is laid out in this proposal. but because of the Go 1 backwards compatibility promise extending compress/lzw was put on hold.

In order to make progress it really helps to get familiar with the structure laid out in pdfcpu. It goes like:

github.com/hhrutter/pdfcpu
ccitt
lzw
pkg
...
filter
...
tiff

My wishlist is to to get rid of:
github.com/hhrutter/pdfcpu/ccitt and instead import compress/ccitt
github.com/hhrutter/pdfcpu/lzw and instead import compress/lzw
github.com/hhrutter/pdfcpu/tiff and instead import x/image/tiff

also for the sake of a better project package structure.

I am not saying it has to be that way but this is how I envision it.
I just think it is important to understand these issues
and even if this github issue is only about supporting CCITT G3 and G4 compression for tiff images
one has to keep in mind the bigger picture.

For now I can see 2 issues that need to be addressed simultaneously:
1) We need a separate location for ccitt compression (for PDF Readers/Writers)
2) We need to support tiff with ccitt compression

Later we need to
3) Enhance compress/lzw for GIF, PDF and TIFF support.
4) Have image/tiff use this consolidated version of compress/lzw and get rid of image/tiff/lzw
5) Support lzw compression in addition to existing decompression

and on top of that I can contribute
5) CMYK support for image/tiff

And we definitely need to have the CCITT decoding/encoding in a seperate package from tiff.
This is because CCITT compression is a member of the filter set defined in the PDF spec for authors of PDF readers and writers.

I already submitted what I am proposing.

Change https://golang.org/cl/174979 mentions this issue: ccitt: new package for CCITT images

Change https://golang.org/cl/182219 mentions this issue: ccitt: implement NewReader and DecodeIntoGray

Was this page helpful?
0 / 5 - 0 ratings