phppdf Docs

A PHP library for programmatic PDF generation and manipulation


Reading

Content extraction

Text extraction

use PhpPdf\Reader\PdfDocumentReader;   // namespace assumed to match the extractors
use PhpPdf\Reader\PdfTextExtractor;

$doc       = PdfDocumentReader::open('/path/to/file.pdf');
$extractor = new PdfTextExtractor($doc);

// Page indices are 0-based
for ($i = 0; $i < $doc->getPageCount(); $i++) {
    $text = $extractor->getTextForPage($i);   // string
    echo "--- Page " . ($i + 1) . " ---\n";
    echo $text . "\n";
}

Font support

Font type | Decoding method
--- | ---
Type 0 / CID | ToUnicode CMap
Type 1, TrueType (WinAnsi) | WinAnsiEncoding / Latin-1 fallback

Text in fonts that use fully custom encodings or substituted glyphs may not extract correctly.
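The WinAnsi row can be pictured as a byte-to-code-point lookup: Windows code page 1252 (the basis of WinAnsiEncoding) places typographic characters such as €, curly quotes and dashes at 0x80 to 0x9F, while other bytes can be read as their Latin-1 code points. The sketch below is a simplified illustration, not phppdf's decoder; decodeWinAnsi(), codePointToUtf8() and WIN_ANSI_HIGH are hypothetical names, and PDF-specific details of WinAnsiEncoding (such as how undefined codes are rendered) are ignored.

```php
<?php
// Hypothetical sketch: WinAnsi (cp1252) decoding with a Latin-1 fallback.
// Bytes listed here differ from Latin-1; all other bytes map to the same code point.
const WIN_ANSI_HIGH = [
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
];

// Encode a Basic Multilingual Plane code point as UTF-8
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Decode single-byte WinAnsi text to UTF-8
function decodeWinAnsi(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        // Use the cp1252 mapping when one exists, else fall back to Latin-1
        $out .= codePointToUtf8(WIN_ANSI_HIGH[$b] ?? $b);
    }
    return $out;
}
```

For example, decodeWinAnsi("Caf\xE9 \x80") yields the UTF-8 string "Café €".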

Text operators recognised

Showing text: Tj, TJ, ' (apostrophe) and " (double quote).
Positioning and line breaks: Td, TD, Tm and T*.

In TJ arrays, a numeric adjustment below −200 (in thousandths of a text-space unit; negative values move the next glyph to the right) is treated as a word break.
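To illustrate how these operators and the −200 threshold could combine, here is a simplified interpreter over operations that have already been tokenised and decoded to UTF-8. It is a hypothetical sketch: operationsToText(), showTjArray() and the [operator, operands] shape are invented here, it ignores Tm and actual glyph positions, and it is not phppdf's implementation.

```php
<?php
// Hypothetical sketch, not phppdf's code.
// Each operation is [operator, operands]; string operands are already UTF-8.
const WORD_BREAK_THRESHOLD = -200; // TJ adjustments below this start a new word

function showTjArray(array $items): string
{
    $text = '';
    foreach ($items as $item) {
        if (is_string($item)) {
            $text .= $item;
        } elseif ($item < WORD_BREAK_THRESHOLD && !str_ends_with($text, ' ')) {
            // A large negative adjustment shifts the next glyph right: emit a space
            $text .= ' ';
        }
        // Small adjustments (ordinary kerning) are ignored
    }
    return $text;
}

function operationsToText(array $ops): string
{
    $out = '';
    foreach ($ops as [$op, $args]) {
        switch ($op) {
            case 'Tj':
                $out .= $args[0];
                break;
            case 'TJ':
                $out .= showTjArray($args[0]);
                break;
            case "'":  // move to next line, then show string
                $out .= "\n" . $args[0];
                break;
            case '"':  // set word and char spacing, move to next line, show string
                $out .= "\n" . $args[2];
                break;
            case 'T*':
                $out .= "\n";
                break;
            case 'Td':
            case 'TD':
                // Simplification: any vertical offset starts a new line
                if ($args[1] != 0) {
                    $out .= "\n";
                }
                break;
            // Other operators (including Tm) are ignored in this sketch
        }
    }
    return $out;
}
```

With $ops = [['Tj', ['Hello']], ['T*', []], ['TJ', [['W', 120, 'orld', -250, 'again']]]], operationsToText($ops) returns "Hello\nWorld again": the 120 adjustment is treated as kerning, while −250 becomes a space.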

Image extraction

use PhpPdf\Reader\PdfImageExtractor;

$extractor = new PdfImageExtractor($doc);   // $doc from PdfDocumentReader::open()

// Images on a specific page (0-based)
$images = $extractor->getImagesForPage(0);

// All unique images across the whole document
$images = $extractor->getAllImages();

foreach ($images as $image) {
    echo $image->name . "\n";          // resource name, e.g. "Im1"
    echo $image->width . "×" . $image->height . "\n";
    echo $image->colorSpace . "\n";    // e.g. "DeviceRGB"
    echo $image->bitsPerComponent . "\n";

    // Raw decoded pixel bytes (or JPEG bytes for DCTDecode images)
    $bytes = $image->data;

    // Write to file with the matching extension ('jpg' or 'png')
    $path = '/tmp/' . $image->name . '.' . $image->getFileExtension();
    file_put_contents($path, $image->toFileBytes());
}

getAllImages() deduplicates shared images: an image referenced from multiple pages appears only once in the result.
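One plausible way to implement that deduplication is to key images by their indirect object reference (object and generation number) rather than by resource name, since names such as "Im1" are local to each page's resource dictionary and the same image can appear under different names. The sketch below is hypothetical: uniqueImages() and its array shape are invented for illustration and are not taken from phppdf.

```php
<?php
// Hypothetical sketch: deduplicate images across pages by object reference.

/**
 * @param array<int, list<array{obj: int, gen: int, name: string}>> $imagesByPage
 * @return list<array{obj: int, gen: int, name: string}>
 */
function uniqueImages(array $imagesByPage): array
{
    $seen = [];
    $unique = [];
    foreach ($imagesByPage as $images) {
        foreach ($images as $image) {
            // "12 0" identifies the object regardless of its per-page resource name
            $key = $image['obj'] . ' ' . $image['gen'];
            if (!isset($seen[$key])) {
                $seen[$key] = true;
                $unique[] = $image; // first occurrence wins
            }
        }
    }
    return $unique;
}
```

If object 12 appears as "Im1" on page 0 and as "Im0" on page 1, only the page 0 entry is kept.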

PdfExtractedImage methods:

Method | Returns
--- | ---
isJpeg() | true if the data is a JPEG byte stream
getFileExtension() | 'jpg' or 'png'
toFileBytes() | Raw JPEG bytes or a valid PNG file
toPng() | Always a PNG (wraps raw pixels; when an SMask is present, produces an RGBA image)
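toPng() is described as wrapping raw pixels in a PNG and using an SMask for RGBA output. The following is a minimal sketch of that technique in plain PHP, not the library's implementation. It assumes 8 bits per component, DeviceRGB data, an 8-bit SMask with one byte per pixel, PHP 8.0 or later, and the zlib extension (for gzcompress()); rawPixelsToPng(), mergeSoftMask() and pngChunk() are hypothetical names.

```php
<?php
// Hypothetical sketch: wrap raw 8-bit pixel bytes in a PNG container.

// A PNG chunk is: length, type, data, CRC-32 computed over type + data
function pngChunk(string $type, string $data): string
{
    return pack('N', strlen($data)) . $type . $data . pack('N', crc32($type . $data));
}

function rawPixelsToPng(string $pixels, int $width, int $height, int $channels): string
{
    $colorType = match ($channels) {
        3 => 2, // truecolor (RGB)
        4 => 6, // truecolor with alpha (RGBA)
        default => throw new InvalidArgumentException("Unsupported channel count: $channels"),
    };
    $rowBytes = $width * $channels;
    if (strlen($pixels) !== $rowBytes * $height) {
        throw new InvalidArgumentException('Pixel data does not match width * height * channels');
    }
    // Prefix every scanline with filter type 0 (None)
    $raw = '';
    for ($y = 0; $y < $height; $y++) {
        $raw .= "\x00" . substr($pixels, $y * $rowBytes, $rowBytes);
    }
    // IHDR: width, height, bit depth 8, colour type, compression, filter, interlace
    $ihdr = pack('NNCCCCC', $width, $height, 8, $colorType, 0, 0, 0);
    return "\x89PNG\r\n\x1a\n"
        . pngChunk('IHDR', $ihdr)
        . pngChunk('IDAT', gzcompress($raw)) // zlib stream, as PNG requires
        . pngChunk('IEND', '');
}

// Interleave an 8-bit soft mask (one byte per pixel) into RGB data to get RGBA
function mergeSoftMask(string $rgb, string $alpha): string
{
    $pixelCount = strlen($alpha);
    if (strlen($rgb) !== $pixelCount * 3) {
        throw new InvalidArgumentException('Mask size does not match RGB data');
    }
    $out = '';
    for ($i = 0; $i < $pixelCount; $i++) {
        $out .= substr($rgb, $i * 3, 3) . $alpha[$i];
    }
    return $out;
}
```

For example, rawPixelsToPng(str_repeat("\xFF\x00\x00", 4), 2, 2, 3) produces a 2×2 solid red PNG. Filter type 0 keeps the sketch short at the cost of larger files than a real encoder would produce.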

Annotation extraction

use PhpPdf\Reader\PdfAnnotationExtractor;

$extractor = new PdfAnnotationExtractor($doc);

// Annotations on a single page
$annotations = $extractor->getAnnotationsForPage(0);

// All annotations in the document
$annotations = $extractor->getAllAnnotations();

foreach ($annotations as $ann) {
    echo $ann->type->value . "\n";   // e.g. "Link", "Text", "Highlight"
    echo $ann->x . ', ' . $ann->y . '  ' . $ann->width . '×' . $ann->height . "\n";

    if ($ann->isUriLink()) {
        echo 'URL: ' . $ann->uri . "\n";
    }
}

PdfAnnotation properties

Property | Type | Description
--- | --- | ---
type | PdfAnnotationType | Annotation subtype
x, y | float | Bottom-left corner
width, height | float | Bounding box size
contents | ?string | Annotation text/tooltip
title | ?string | Popup title
color | ?array{float,float,float} | RGB color
interiorColor | ?array{float,float,float} | Interior fill RGB
borderWidth | float | Border line width
quadPoints | ?list<float> | Quad points for text markup
uri | ?string | URI for Link annotations
open | ?bool | Whether the popup is open
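In the PDF format an annotation's position comes from its /Rect entry, which may list any two diagonally opposite corners, and text-markup annotations carry /QuadPoints as a flat list of 8 numbers per quadrilateral. The sketch below shows one way to derive x, y, width and height and to group quadPoints; rectToBox() and groupQuadPoints() are hypothetical helpers, and it is an assumption that phppdf normalises /Rect the same way.

```php
<?php
// Hypothetical sketch relating the properties above to raw PDF arrays.

/** @return array{x: float, y: float, width: float, height: float} */
function rectToBox(array $rect): array
{
    [$x1, $y1, $x2, $y2] = array_map('floatval', $rect);
    // /Rect may name any two opposite corners; normalise to bottom-left + size
    return [
        'x' => min($x1, $x2),
        'y' => min($y1, $y2),
        'width' => abs($x2 - $x1),
        'height' => abs($y2 - $y1),
    ];
}

/**
 * Split a flat QuadPoints list into quadrilaterals of four [x, y] points.
 * An incomplete trailing group is dropped.
 *
 * @param list<float> $quadPoints
 * @return list<list<array{float, float}>>
 */
function groupQuadPoints(array $quadPoints): array
{
    $quads = [];
    foreach (array_chunk($quadPoints, 8) as $chunk) {
        if (count($chunk) === 8) {
            $quads[] = array_chunk($chunk, 2);
        }
    }
    return $quads;
}
```

For example, rectToBox([200, 700, 100, 650]) gives a box at (100, 650) that is 100 wide and 50 high.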