Pillow: PIL cannot read BigTIFF

Created on 2 Apr 2020  ·  9 Comments  ·  Source: python-pillow/Pillow

What did you do?

I'm trying to open an orthomosaic GeoTIFF in Python to crop it into 1000x1000 tiles. I can open some TIFF files in my notebook, but others raise an UnidentifiedImageError. I set up my notebook using a Dockerfile that creates a conda environment.

What did you expect to happen?

I expected that my code would work the same on all of my tif files - they are all 4-band 8-bit. This code works with a tif file that is 992 MB, but it doesn't work with a file that's 691 MB.

What actually happened?

UnidentifiedImageError                    Traceback (most recent call last)
<ipython-input-85-c7f17ab4563e> in <module>
      6 
      7 # break it up into crops
----> 8 for k, piece in enumerate(crop(infile, tile_height, tile_width, stride, img_dict, prj_name), start_num):
      9     img=Image.new('RGB', (tile_height, tile_width), (255, 255, 255))
     10     print(img.size)

<ipython-input-83-c9b468169840> in crop(infile, tile_height, tile_width, stride, img_dict, prj_name)
      5 
      6 def crop(infile, tile_height, tile_width, stride, img_dict, prj_name):
----> 7     im = Image.open(infile)
      8     img_width, img_height = im.size
      9     print(im.size)

/opt/conda/envs/geo_env/lib/python3.8/site-packages/PIL/Image.py in open(fp, mode)
   2859     for message in accept_warnings:
   2860         warnings.warn(message)
-> 2861     raise UnidentifiedImageError(
   2862         "cannot identify image file %r" % (filename if filename else fp)
   2863     )

UnidentifiedImageError: cannot identify image file '../data/mosaics/GrandJason_SWRightThird_Nov2019_transparent_mosaic_group1.tif'

What are your OS, Python and Pillow versions?

  • OS: Ubuntu 18.04
  • Python: 3.8.2
  • Pillow: 7.0.0

from PIL import Image
import os
import argparse
import numpy as np
import json
import csv
import rasterio
import matplotlib
import folium
from pyproj import Proj, transform


%matplotlib inline


Image.MAX_IMAGE_PIXELS = 100000000000
# ingest the image
infile = "../data/mosaics/GrandJason_SWRightThird_Nov2019_transparent_mosaic_group1.tif"

img_dir = '..' + infile.split(".")[2]
prj_name = img_dir.split("/")[-1]
dataset = rasterio.open(infile)

# what is the name of this image
img_name = dataset.name
print('Image filename: {n}\n'.format(n=img_name))

# How many bands does this image have?
num_bands = dataset.count
print('Number of bands in image: {n}\n'.format(n=num_bands))

# How many rows and columns?
rows, cols = dataset.shape
print('Image size is: {r} rows x {c} columns\n'.format(r=rows, c=cols))

# Does the raster have a description or metadata?
desc = dataset.descriptions
metadata = dataset.meta

print('Raster description: {desc}\n'.format(desc=desc))

# What driver was used to open the raster?
driver = dataset.driver
print('Raster driver: {d}\n'.format(d=driver))

# What is the raster's projection?
proj = dataset.crs
print('Image projection:')
print(proj, '\n')

# What is the raster's "geo-transform"
gt = dataset.transform

print('Image geo-transform:\n{gt}\n'.format(gt=gt))

print('All raster metadata:')
print(metadata)
print('\n')

tile_height = tile_width = 1000
overlap = 80
stride = tile_height - overlap
start_num=0

#crop image into tiles
def crop(infile, tile_height, tile_width, stride, img_dict, prj_name):
    im = Image.open(infile) 
    img_width, img_height = im.size
    print(im.size)
    print(img_width * img_height / (tile_height - stride) / (tile_width - stride))
    count = 0
    for r in range(0, img_height-tile_height+1, stride):
        for c in range(0, img_width-tile_width+1, stride):
            #tile = im[r:r+100, c:c+100]
            box = (c, r, c+tile_width, r+tile_height)
            top_pixel = [c,r]
            img_dict[prj_name + "---" + str(count) + ".png"] = top_pixel
            count += 1
            yield im.crop(box)

#split image into heightxwidth patches
img = Image

img_dict = {}

# create the dir if it doesn't already exist
if not os.path.exists(img_dir):
    os.makedirs(img_dir)

# break it up into crops
for k, piece in enumerate(crop(infile, tile_height, tile_width, stride, img_dict, prj_name), start_num):
    img=Image.new('RGB', (tile_height, tile_width), (255, 255, 255))
    print(img.size)
    print(piece.size)
    img.paste(piece)
    image_name = prj_name + "---%s.png" % k
    path=os.path.join(img_dir, image_name)
    img.save(path)

All 9 comments

Could be a BigTIFF file, which is not supported by Pillow. The first 4 bytes in BigTIFF files are b'II\x2B\x00' or b'MM\x00\x2B'

> Could be a BigTIFF file, which is not supported by Pillow. The first 4 bytes in BigTIFF files are b'II\x2B\x00' or b'MM\x00\x2B'

None of my tif files are BigTIFF, and they are all well under 4 GB in size.

Here is the information of my raster:
```
Number of bands in image: 4

Image size is: 18679 rows x 31880 columns

Raster description: (None, None, None, None)

Raster driver: GTiff

Image projection:
EPSG:32720

Image geo-transform:
| 0.02, 0.00, 632879.80|
| 0.00,-0.02, 4341041.62|
| 0.00, 0.00, 1.00|

All raster metadata:
{'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 31880, 'height': 18679, 'count': 4, 'crs': CRS.from_epsg(32720), 'transform': Affine(0.01804, 0.0, 632879.79994,
0.0, -0.01804, 4341041.621540001)}

bytearray(b'II+\x00\x08')
```

> bytearray(b'II+\x00\x08')

That's BigTIFF.
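
The bytes printed above decode as follows: `II` marks little-endian byte order, and the next two bytes hold the version number, which is 43 for BigTIFF (`+` is ASCII 0x2B) and 42 for classic TIFF. The trailing `\x08` is the start of BigTIFF's offset-size field. A minimal sketch of that decoding, using only the standard library; the name `describe_tiff_header` is made up for illustration:

```python
import struct


def describe_tiff_header(header: bytes) -> str:
    """Classify a file from its first four bytes (hypothetical helper)."""
    if len(header) < 4:
        return "too short"
    # Bytes 0-1 give the byte order: b"II" is little-endian, b"MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        return "not TIFF"
    # Bytes 2-3 hold the version: 42 for classic TIFF, 43 for BigTIFF.
    (version,) = struct.unpack(order + "H", header[2:4])
    return {42: "classic TIFF", 43: "BigTIFF"}.get(version, "not TIFF")


# The bytes reported in this thread: ord("+") == 0x2B == 43.
print(describe_tiff_header(b"II+\x00\x08"))
```

File size is not a reliable indicator here: a writer can emit BigTIFF for a file well under 4 GB, so checking the header is the safer test.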

Are there any plans to support BigTIFF? I'd like to add my name to the list of people who would find it very useful.

I've just stumbled upon this issue too. At first glance, it seems to me that it should be pretty straightforward to modify src/libImaging/TiffDecode.c to handle the BigTIFF format, which is meant to be very closely compatible with the TIFF format.

Ah, after working on this for about a day, and getting most of the way to a working solution, I realised that it is probably futile, as BigTIFF files are to a great extent used for medical images, and they will include multiple images within the file (for example, thumbnails). But PIL can only handle one image in a file, so this won't work very well.

A better solution is probably to identify BigTIFF files (from the b"MM\x00\x2B" or b"II\x2B\x00" header bytes) and then recommend using something like tifffile to handle the image.
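
The detect-and-redirect idea could look something like the sketch below. It assumes a Pillow release without BigTIFF support (as with 7.0.0 in this report) and that `tifffile` is installed. `open_tiff_checked` is a hypothetical name, not part of either library. The imports are deferred so that the header check itself needs neither package.

```python
BIGTIFF_MAGIC = (b"II\x2B\x00", b"MM\x00\x2B")


def is_bigtiff(path):
    """Return True if the file starts with a BigTIFF signature."""
    with open(path, "rb") as f:
        return f.read(4) in BIGTIFF_MAGIC


def open_tiff_checked(path):
    """Hypothetical wrapper: send BigTIFF to tifffile, everything else to Pillow."""
    if is_bigtiff(path):
        # Assumption: tifffile is installed; imread returns a NumPy array.
        import tifffile
        return tifffile.imread(path)
    from PIL import Image
    return Image.open(path)
```

The two branches return different types (a NumPy array versus a Pillow image), so a caller would need to handle both. For a raster as large as the one in this report, reading the whole array at once may also need several gigabytes of memory.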

If it's of interest, I could upload what I've done so far to github. (It's primarily work on TiffImagePlugin.py and TiffTags.py.)

Best wishes!

> But PIL can only handle one image in a file, so this won't work very well.

Pillow can load multiple images. It opens the file at one image, and then uses seek() to navigate to a different image.
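
As a rough illustration of the seek() navigation described above, the sketch below walks every image in a multi-page file and records its size. `frame_sizes` is a made-up helper name, and the file is assumed to be in a format Pillow can already read, such as classic multi-page TIFF.

```python
from PIL import Image


def frame_sizes(path):
    """Return the (width, height) of every image stored in one file."""
    sizes = []
    with Image.open(path) as im:
        # Not every format plugin defines n_frames, so default to 1.
        for index in range(getattr(im, "n_frames", 1)):
            im.seek(index)  # move to the image at this index
            sizes.append(im.size)
    return sizes
```

Each call to seek() replaces the current frame in the same image object, so any frame that needs to outlive the loop should be copied with `im.copy()` first.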

If you're willing to share what you've done, and an example image, that would be interesting.

OK, shall do when I've had a chance to get at least some of it working!
