Fixing “OSError: image file is truncated” in Paperless-ngx while keeping PDF/A

My Paperless-ngx instance was installed with the tteck Proxmox helper scripts. Every now and then it threw this error. Changing scanners (Brother xxx -> Fujitsu FI-8150) didn’t help.

One workaround mentioned on GitHub is to switch the output format from PDF/A to plain PDF. I decided to keep PDF/A.
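For reference, Paperless-ngx controls the output format through the PAPERLESS_OCR_OUTPUT_TYPE setting. A sketch of the relevant line, assuming the same /etc/paperless.conf used later in this post; the workaround would set it to pdf, while the value below keeps PDF/A:

```shell
# /etc/paperless.conf (path assumed from this setup)
# "pdf" would be the workaround; "pdfa" keeps the archival output
PAPERLESS_OCR_OUTPUT_TYPE=pdfa
```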

Why PDF/A matters

PDF/A is an ISO-standardized archival format. It embeds all fonts, color profiles and metadata so the document renders identically decades from now. Switching to plain PDF means losing that long-term archival guarantee — and in Germany/Switzerland, certain official document submissions explicitly require PDF/A.

The truncation error is caused by Pillow/OCRmyPDF choking on slightly malformed image streams inside incoming PDFs. Even official vendor PDFs often have minor non-conformities that trigger it.

Make sure Paperless uses the right Ghostscript binary

On an LXC install, you may have multiple Ghostscript versions. Check which ones exist:
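A minimal sketch of that check. The three paths are the usual candidates on a Debian-based container and are an assumption, as is the helper name gs_report; add any other locations you suspect:

```shell
# Print every Ghostscript candidate that exists, the file it resolves to,
# and the version it reports. gs_report is a hypothetical helper name.
gs_report() {
    for bin in "$@"; do
        # Skip paths that do not exist or are not executable
        [ -x "$bin" ] || continue
        # readlink -f follows symlinks to the real file
        printf '%s -> %s (version %s)\n' \
            "$bin" "$(readlink -f "$bin")" "$("$bin" --version)"
    done
}

gs_report /usr/local/bin/gs /usr/bin/gs /usr/bin/ghostscript
```

Two different version numbers in the output mean Paperless may be picking up the older one.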

In our case the system had two different binaries:

  • /usr/local/bin/gs → real binary, version 10.06.0 (installed by OCRmyPDF)
  • /usr/bin/ghostscript → symlink pointing to /usr/bin/gs → version 9.55.0 (system package)

Paperless has a config option to specify which binary to use. The default in the config file pointed to the old 9.55.0. Fix it explicitly:
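Assuming the option meant here is PAPERLESS_GS_BINARY (the Paperless-ngx setting for the Ghostscript path) and that the newer binary lives at /usr/local/bin/gs as listed above, the line would look roughly like this:

```shell
# /etc/paperless.conf
# Point Paperless at the newer Ghostscript instead of the system package
PAPERLESS_GS_BINARY=/usr/local/bin/gs
```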

Make sure this line is uncommented in /etc/paperless.conf.

Upgrade Ghostscript to 10.07.0

Ghostscript 10.07.0 contains additional PDF handling improvements. On a bare LXC install (e.g. installed via the tteck Proxmox helper script), compile it from source:
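A rough sketch of such a build, not necessarily the exact commands used here. It assumes a compiler toolchain is already installed (for example the build-essential package on Debian), that you run it as root, and that the release tarball has already been downloaded from the official Ghostscript release page. build_ghostscript is a hypothetical helper name and the tarball path is a placeholder:

```shell
# Build and install Ghostscript from a source tarball that is already on disk.
# Nothing runs until the function is called.
build_ghostscript() {
    tarball="$1"
    if [ ! -f "$tarball" ]; then
        echo "usage: build_ghostscript /path/to/ghostscript-X.Y.Z.tar.gz" >&2
        return 1
    fi

    # Unpack into a scratch directory (left behind afterwards for inspection)
    workdir="$(mktemp -d)" || return 1
    tar -xzf "$tarball" -C "$workdir" --strip-components=1 || return 1

    # Run the build in a subshell so the caller's working directory is unchanged
    (
        cd "$workdir" || exit 1
        ./configure --prefix=/usr/local &&
            make -j"$(nproc)" &&
            make install
    )
}

# Example call (placeholder path), followed by a version check:
# build_ghostscript /tmp/ghostscript-10.07.0.tar.gz && /usr/local/bin/gs --version
```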

This installs to /usr/local/bin/gs, which is the same path as the previous version — so it overwrites cleanly with no further changes needed.

Restart Paperless
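Restart all Paperless services so they pick up the new binary and the config change. A minimal sketch, assuming the four systemd units that a bare-metal Paperless-ngx install usually ships; if `systemctl list-units 'paperless*'` shows different names on your system, adjust the list. restart_paperless is a hypothetical helper name, and the SYSTEMCTL override exists only to allow a dry run:

```shell
# Restart every Paperless service; stop at the first failure.
# Unit names are assumed, verify them with: systemctl list-units 'paperless*'
restart_paperless() {
    for unit in paperless-webserver paperless-consumer \
                paperless-scheduler paperless-task-queue; do
        # SYSTEMCTL=echo turns this into a dry run that only prints the commands
        "${SYSTEMCTL:-systemctl}" restart "$unit" || return 1
    done
}

# Dry run:      SYSTEMCTL=echo restart_paperless
# Real restart: restart_paperless
```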

Result

After these steps, the “image file is truncated” errors disappeared entirely, with PDF/A output still enabled and every document stored in the archival format.
