April 6, 2026

PDF Merge, Split, Compress & Watermark: API Guide

Tags: PDF manipulation, PDF merge, PDF split, PDF compression, PDF API

Generating a PDF is rarely the end of the story. You often need to combine multiple PDFs into one, extract specific pages, reduce file size for email, or stamp a "DRAFT" watermark — all without manual intervention. This is PDF post-processing, and automating it well separates a basic implementation from a production-grade document workflow.

This guide covers six core PDF manipulation scenarios — merge, split, compress, watermark, page numbers, and headers/footers — with practical code examples and a comparison of library-based vs. API-based approaches.

For the basics of generating PDFs from HTML, see our HTML to PDF Complete Guide. For invoice automation, see Automate Invoice PDFs with API.

PDF Manipulation Overview

PDF post-processing needs fall into a few clear categories:

PDF Post-Processing
├── Structure
│   ├── Merge (combine multiple PDFs into one)
│   └── Split (break one PDF into multiple)
├── Optimization
│   └── Compress (reduce file size)
├── Content Addition
│   ├── Watermarks
│   ├── Page numbers
│   └── Headers / footers
└── Security
    ├── Password protection
    └── Permission control

There are two primary approaches: library-based (pypdf, formerly PyPDF2; pdf-lib; Ghostscript) and API-based (FUNBREW PDF). Libraries give fine-grained control but require managing dependencies, memory, and language-specific implementations. APIs remove most infrastructure concerns and work the same way from any language.

FUNBREW PDF handles both HTML-to-PDF generation and post-processing in one service. Try it instantly in the playground.

1. PDF Merging

Use Cases

Combine invoice + terms of service + payment confirmation into a single PDF
Merge separately generated monthly report chapters for distribution
Bundle application document sets (application form, consent form, instructions)

Design "one PDF from the start" with HTML

Rather than generating separate PDFs and merging them afterward, design your HTML to produce the combined document in a single request using CSS page-break-before.

<!DOCTYPE html>
<html>
<head>
<style>
  .page { page-break-before: always; }
</style>
</head>
<body>
  <!-- Invoice (page 1) -->
  <div class="invoice-section">
    <h1>Invoice</h1>
    <p>Invoice #: INV-2026-001</p>
    <!-- ... -->
  </div>

  <!-- Terms of Service (starts on new page) -->
  <div class="page terms-section">
    <h1>Terms of Service</h1>
    <p>Article 1...</p>
  </div>
</body>
</html>

curl -X POST https://pdf.funbrew.cloud/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html>...(HTML above)...</html>",
    "options": { "format": "A4" }
  }' \
  -o combined-document.pdf

Python implementation

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://pdf.funbrew.cloud/api/v1/generate"

def combine_sections(sections: list[str]) -> bytes:
    """Combine multiple HTML sections into one PDF."""
    pages = []
    for i, html in enumerate(sections):
        style = 'style="page-break-before: always;"' if i > 0 else ""
        pages.append(f'<div {style}>{html}</div>')

    full_html = f"""<!DOCTYPE html>
<html>
<head>
<style>
  body {{ font-family: sans-serif; margin: 40px; }}
  h1 {{ color: #333; }}
  table {{ width: 100%; border-collapse: collapse; }}
  th, td {{ border: 1px solid #ccc; padding: 8px; }}
</style>
</head>
<body>{''.join(pages)}</body>
</html>"""

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"html": full_html, "options": {"format": "A4"}},
    )
    response.raise_for_status()
    return response.content

# Usage
invoice_html = "<h1>Invoice</h1><p>Amount: $500</p>"
terms_html = "<h1>Terms of Service</h1><p>Article 1...</p>"
confirmation_html = "<h1>Payment Confirmed</h1><p>Thank you.</p>"

pdf = combine_sections([invoice_html, terms_html, confirmation_html])
with open("invoice-with-terms.pdf", "wb") as f:
    f.write(pdf)

Node.js implementation

const fs = require("fs");

async function combinePDFs(sections) {
  const combined = sections
    .map((html, i) => {
      const style = i > 0 ? 'style="page-break-before: always;"' : "";
      return `<div ${style}>${html}</div>`;
    })
    .join("\n");

  const fullHtml = `<!DOCTYPE html>
<html>
<head><style>body { font-family: sans-serif; margin: 40px; }</style></head>
<body>${combined}</body>
</html>`;

  const response = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html: fullHtml, options: { format: "A4" } }),
  });

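  // Without this check, an error response body would be saved as if it were a PDF.
  if (!response.ok) {
    throw new Error(`PDF generation failed: HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());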
}

const sections = [
  "<h1>Invoice</h1><p>Amount: $500</p>",
  "<h1>Terms of Service</h1><p>Article 1...</p>",
];

combinePDFs(sections).then((pdf) => {
  fs.writeFileSync("combined.pdf", pdf);
  console.log("PDF created: combined.pdf");
});

2. PDF Splitting

Use Cases

Extract a specific month's pages from a consolidated annual report
Split a bulk-generated application set into individual PDFs for per-user delivery
Deliver large PDFs in page-range segments for download
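
For the page-range delivery case, the segment boundaries can be worked out before any PDF is touched. The following is a minimal sketch, assuming 1-based inclusive page numbers; `page_segments` and its parameters are hypothetical names used for illustration, not part of any API described in this guide.

```python
def page_segments(total_pages: int, pages_per_segment: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges covering the document."""
    if total_pages < 0 or pages_per_segment < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_segment >= 1")
    segments = []
    for first in range(1, total_pages + 1, pages_per_segment):
        # The final segment may be shorter than pages_per_segment.
        last = min(first + pages_per_segment - 1, total_pages)
        segments.append((first, last))
    return segments


# A 25-page report delivered in 10-page segments.
print(page_segments(25, 10))  # [(1, 10), (11, 20), (21, 25)]
```

Each range can then be handed to whichever tool performs the split, or used to decide which records go into each generated file.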

Generate page-per-document from the start

Instead of post-splitting, generate individual PDFs directly — one per record.

import requests
from concurrent.futures import ThreadPoolExecutor

API_KEY = "YOUR_API_KEY"
API_URL = "https://pdf.funbrew.cloud/api/v1/generate"

def generate_single_pdf(data: dict, template_fn) -> tuple[str, bytes]:
    html = template_fn(data)
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "html": html,
            "options": {
                "format": "A4",
                "margin": {"top": "20mm", "bottom": "20mm",
                           "left": "15mm", "right": "15mm"},
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    return data["invoice_no"], response.content

def invoice_template(data: dict) -> str:
    return f"""<!DOCTYPE html>
<html>
<body style="font-family: sans-serif; margin: 40px;">
  <h1>Invoice</h1>
  <p>To: {data['customer_name']}</p>
  <p>Invoice #: {data['invoice_no']}</p>
  <p>Amount: ${data['amount']:,.2f}</p>
</body>
</html>"""

customers = [
    {"customer_name": "Acme Corp", "invoice_no": "INV-001", "amount": 5000.00},
    {"customer_name": "Globex Inc", "invoice_no": "INV-002", "amount": 8000.00},
    {"customer_name": "Initech LLC", "invoice_no": "INV-003", "amount": 3000.00},
]

# Parallel generation (see the Batch Processing Guide for advanced patterns)
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [
        executor.submit(generate_single_pdf, customer, invoice_template)
        for customer in customers
    ]
    for future in futures:
        invoice_no, pdf = future.result()
        with open(f"invoice_{invoice_no}.pdf", "wb") as f:
            f.write(pdf)
        print(f"Generated: invoice_{invoice_no}.pdf")
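
In the loop above, the first failed request raises from `future.result()` and stops the remaining files from being written. One possible variation, sketched here under the assumption that a batch should continue past individual failures, collects successes and errors separately. `generate_all` and `fake_generate` are hypothetical names; `fake_generate` stands in for the real API call so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_all(records, generate_fn, max_workers=5):
    """Run generate_fn for every record; return (results, failures) keyed by invoice_no."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_id = {
            executor.submit(generate_fn, record): record["invoice_no"]
            for record in records
        }
        for future in as_completed(future_to_id):
            invoice_no = future_to_id[future]
            try:
                results[invoice_no] = future.result()
            except Exception as exc:  # keep going; report the failure afterwards
                failures[invoice_no] = exc
    return results, failures


# Stand-in for the real API call (hypothetical), so the sketch needs no network.
def fake_generate(record):
    if record["amount"] <= 0:
        raise ValueError("amount must be positive")
    return b"%PDF-stub-" + record["invoice_no"].encode()


records = [
    {"invoice_no": "INV-001", "amount": 5000.00},
    {"invoice_no": "INV-002", "amount": 0},
]
results, failures = generate_all(records, fake_generate)
print(sorted(results), sorted(failures))  # ['INV-001'] ['INV-002']
```

Failed records can then be retried or logged without regenerating the ones that succeeded.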

For large-scale bulk PDF generation, see the Batch Processing Guide.

3. PDF Compression

Why PDFs get large

Cause	Solution
High-resolution embedded images	Pre-compress and resize images
Full font embedding	Subset fonts (embed only used characters)
Uncompressed content streams	Enable stream compression

HTML-level optimization

Optimizing the HTML you pass to FUNBREW PDF API directly reduces the output PDF size.

<!DOCTYPE html>
<html>
<head>
<style>
  /* Load only the font weights you actually use; each extra weight is another font file to fetch and embed */
  @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap');

  body {
    font-family: 'Inter', sans-serif;
    background: white;
    color: black;
  }

  /* Prefer border over box-shadow (box-shadow increases PDF size) */
  .card { border: 1px solid #ccc; }

  /* Avoid gradients and complex backgrounds in print context */
</style>
</head>
<body>
  <!-- Pre-compress images before embedding -->
  <!-- BAD:  <img src="original-4k-photo.jpg"> -->
  <!-- GOOD: <img src="compressed-800px.jpg" width="400"> -->
  <img src="logo-compressed.png" width="200" height="60" alt="Logo">
</body>
</html>

Image pre-compression (Python)

from PIL import Image
import io
import base64

def compress_image_to_base64(
    image_path: str,
    max_width: int = 800,
    quality: int = 85
) -> str:
    """Compress an image and return as a base64 data URI."""
    with Image.open(image_path) as img:
        if img.width > max_width:
            ratio = max_width / img.width
            new_size = (max_width, int(img.height * ratio))
            img = img.resize(new_size, Image.LANCZOS)

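        # JPEG has no alpha channel; flatten transparency onto white so a
        # transparent PNG logo does not end up with a black background.
        if img.mode in ("RGBA", "LA", "P"):
            rgba = img.convert("RGBA")
            flattened = Image.new("RGB", rgba.size, (255, 255, 255))
            flattened.paste(rgba, mask=rgba.getchannel("A"))
            img = flattened
        else:
            img = img.convert("RGB")

        buffer = io.BytesIO()
        img.save(buffer, format="JPEG", quality=quality, optimize=True)
        b64 = base64.b64encode(buffer.getvalue()).decode()
        return f"data:image/jpeg;base64,{b64}"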

logo_b64 = compress_image_to_base64("logo.png", max_width=400, quality=90)
html = f"""<html><body>
  <img src="{logo_b64}" width="200" alt="Logo">
  <h1>Report</h1>
</body></html>"""
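
Before sending HTML to the API, it can help to check how many bytes the embedded images actually contribute. The following is a rough sketch using only the standard library; `embedded_image_sizes` is a hypothetical helper, and the regular expression assumes plain base64 data URIs like the one produced above.

```python
import base64
import re

# Matches base64 data URIs such as data:image/jpeg;base64,AAAA...
DATA_URI_RE = re.compile(
    r"data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=]+)"
)


def embedded_image_sizes(html: str) -> list[tuple[str, int]]:
    """Return (mime_type, decoded_size_in_bytes) for each base64 data URI in the HTML."""
    sizes = []
    for match in DATA_URI_RE.finditer(html):
        decoded = base64.b64decode(match.group("data"))
        sizes.append((match.group("mime"), len(decoded)))
    return sizes


# 300 bytes of stand-in data in place of a real image.
sample = base64.b64encode(b"x" * 300).decode()
sample_html = f'<img src="data:image/jpeg;base64,{sample}" width="200" alt="Logo">'
print(embedded_image_sizes(sample_html))  # [('image/jpeg', 300)]
```

An unexpectedly large entry in the result is a hint that an image skipped the pre-compression step.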

Node.js image optimization

const sharp = require("sharp");

async function compressImageToBase64(imagePath, maxWidth = 800, quality = 85) {
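  const buffer = await sharp(imagePath)
    .resize({ width: maxWidth, withoutEnlargement: true })
    // JPEG has no alpha channel; flatten transparent pixels onto white.
    .flatten({ background: "#ffffff" })
    .jpeg({ quality, progressive: true })
    .toBuffer();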
  return `data:image/jpeg;base64,${buffer.toString("base64")}`;
}

async function generateCompactPDF(title, content) {
  const logoBase64 = await compressImageToBase64("logo.png", 400, 90);
  const html = `<!DOCTYPE html>
<html>
<body style="font-family: sans-serif; margin: 40px;">
  <img src="${logoBase64}" width="200" alt="Logo">
  <h1>${title}</h1>
  <p>${content}</p>
</body>
</html>`;

  const res = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html, options: { format: "A4" } }),
  });
  if (!res.ok) {
    throw new Error(`PDF generation failed: HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}

4. Watermarks

Watermarks can be implemented entirely with CSS. The key is position: fixed, which paged output repeats on every page; a high z-index keeps the watermark stacked above the page content.

Diagonal text watermark

<!DOCTYPE html>
<html>
<head>
<style>
  .watermark {
    position: fixed;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%) rotate(-45deg);
    font-size: 80px;
    font-weight: bold;
    color: rgba(200, 200, 200, 0.3);
    z-index: 1000;
    pointer-events: none;
    white-space: nowrap;
    user-select: none;
  }
  .content {
    position: relative;
    z-index: 1;
    margin: 40px;
  }
</style>
</head>
<body>
  <div class="watermark">DRAFT</div>
  <div class="content">
    <h1>Contract</h1>
    <p>This document is a draft. Not for distribution.</p>
  </div>
</body>
</html>
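
When the watermark label comes from user input or configuration, it is safer to escape it before placing it in the markup. The following is a minimal sketch that builds the same structure from Python; `wrap_with_watermark` is a hypothetical helper, and it assumes the content HTML is already trusted.

```python
import html


def wrap_with_watermark(content_html: str, watermark_text: str) -> str:
    """Wrap trusted content HTML with a fixed-position diagonal text watermark."""
    safe_text = html.escape(watermark_text)  # the label may come from user input
    return f"""<!DOCTYPE html>
<html>
<head>
<style>
  .watermark {{
    position: fixed;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%) rotate(-45deg);
    font-size: 80px;
    font-weight: bold;
    color: rgba(200, 200, 200, 0.3);
    z-index: 1000;
    white-space: nowrap;
  }}
  .content {{ position: relative; z-index: 1; margin: 40px; }}
</style>
</head>
<body>
  <div class="watermark">{safe_text}</div>
  <div class="content">{content_html}</div>
</body>
</html>"""


page = wrap_with_watermark("<h1>Contract</h1>", "DRAFT <v2>")
print('<div class="watermark">DRAFT &lt;v2&gt;</div>' in page)  # True
```

The returned string can be sent as the `html` field of the generate request shown earlier.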

Per-user confidential watermark (Python)

import requests
from datetime import datetime

def generate_confidential_pdf(content_html: str, user_name: str, doc_id: str) -> bytes:
    """Generate a PDF with a repeating user-info watermark for audit trails."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    watermark_text = f"{user_name}  •  {timestamp}  •  {doc_id}"

    diagonal_strips = "".join([
        f"""<div style="
          position: absolute;
          top: {i * 180}px;
          left: -100px;
          width: 160%;
          transform: rotate(-25deg);
          font-size: 12px;
          color: rgba(150, 150, 150, 0.18);
          white-space: nowrap;
          letter-spacing: 8px;
        ">{watermark_text}</div>"""
        for i in range(-2, 12)
    ])

    full_html = f"""<!DOCTYPE html>
<html>
<head>
<style>
  body {{ font-family: sans-serif; margin: 40px; }}
</style>
</head>
<body>
  <div style="position: fixed; top: 0; left: 0; width: 100%;
              height: 100%; z-index: 9999; pointer-events: none;
              overflow: hidden;">
    {diagonal_strips}
  </div>
  <div style="position: relative; z-index: 1;">
    {content_html}
  </div>
</body>
</html>"""

    response = requests.post(
        "https://pdf.funbrew.cloud/api/v1/generate",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"html": full_html, "options": {"format": "A4"}},
    )
    response.raise_for_status()
    return response.content

pdf = generate_confidential_pdf(
    content_html="<h1>Confidential Report</h1><p>Internal use only.</p>",
    user_name="John Smith",
    doc_id="DOC-2026-001",
)
with open("confidential.pdf", "wb") as f:
    f.write(pdf)

Logo watermark (bottom-right corner)

<style>
  .logo-watermark {
    position: fixed;
    bottom: 30px;
    right: 30px;
    opacity: 0.12;
    z-index: 1000;
  }
</style>
<img class="logo-watermark" src="data:image/png;base64,..." width="100" alt="">

5. Page Numbers & Headers/Footers

CSS `@page` rule

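The CSS @page rule with margin boxes (@top-center, @bottom-center, etc.) is the standard CSS Paged Media mechanism for adding consistent headers and footers to every page. Margin-box support varies between rendering engines, so check the output in the playground before relying on it.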

<!DOCTYPE html>
<html>
<head>
<style>
  @page {
    size: A4;
    margin: 25mm 20mm 30mm 20mm;

    /* Header */
    @top-center {
      content: "Monthly Report — April 2026";
      font-size: 10px;
      color: #666;
    }

    /* Footer: page number */
    @bottom-center {
      content: counter(page) " / " counter(pages);
      font-size: 10px;
      color: #666;
    }

    /* Footer left: company name */
    @bottom-left {
      content: "FUNBREW Inc.";
      font-size: 10px;
      color: #999;
    }

    /* Footer right: date */
    @bottom-right {
      content: "2026-04-06";
      font-size: 10px;
      color: #999;
    }
  }

  /* Override for the first page (cover page — no header) */
  @page :first {
    @top-center { content: ""; }
  }

  body {
    font-family: sans-serif;
    font-size: 11pt;
    line-height: 1.6;
  }
</style>
</head>
<body>
  <h1>Monthly Report</h1>
  <p>Executive summary content...</p>

  <div style="page-break-before: always;">
    <h2>Chapter 2: Data Analysis</h2>
    <p>...</p>
  </div>
</body>
</html>

Dynamic header/footer with JavaScript

async function generateReport({ title, author, date, sections }) {
  const sectionsHtml = sections
    .map(
      ({ title: sTitle, content }, i) => `
      <div ${i > 0 ? 'style="page-break-before: always;"' : ""}>
        <h2>${sTitle}</h2>
        ${content}
      </div>`
    )
    .join("");

  const html = `<!DOCTYPE html>
<html>
<head>
<style>
  @page {
    size: A4;
    margin: 25mm 20mm 30mm 20mm;
    @top-left   { content: "${title}"; font-size: 9px; color: #666; }
    @top-right  { content: "${author}"; font-size: 9px; color: #666; }
    @bottom-center {
      content: "— " counter(page) " —";
      font-size: 9px; color: #999;
    }
    @bottom-right { content: "${date}"; font-size: 9px; color: #999; }
  }
  @page :first {
    @top-left  { content: ""; }
    @top-right { content: ""; }
  }
  body { font-family: sans-serif; font-size: 11pt; line-height: 1.7; }
  h1   { text-align: center; margin-bottom: 50px; }
  h2   { color: #333; border-bottom: 2px solid #333; padding-bottom: 5px; }
</style>
</head>
<body>
  <!-- Cover page -->
  <div style="text-align: center; padding-top: 100px;">
    <h1>${title}</h1>
    <p style="color: #666;">${author} | ${date}</p>
  </div>
  ${sectionsHtml}
</body>
</html>`;

  const response = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html, options: { format: "A4" } }),
  });

  if (!response.ok) {
    throw new Error(`PDF generation failed: HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

const report = await generateReport({
  title: "Q1 2026 Business Report",
  author: "Business Planning Dept.",
  date: "April 6, 2026",
  sections: [
    { title: "Executive Summary", content: "<p>...</p>" },
    { title: "Revenue Performance", content: "<table>...</table>" },
    { title: "Next Quarter Plan", content: "<p>...</p>" },
  ],
});
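
Values interpolated into a CSS `content` string, such as the title and author above, break the stylesheet if they contain a double quote or a backslash. The following is a small sketch of one way to quote them first, shown in Python; `css_string` and `page_header_css` are hypothetical helpers, and the escaping covers only the characters listed in the comments.

```python
def css_string(value: str) -> str:
    """Quote a value for use in a CSS `content` declaration inside a <style> tag."""
    escaped = (
        value.replace("\\", "\\\\")   # backslash first, so later escapes survive
             .replace('"', '\\"')     # a bare quote would end the CSS string
             .replace("\n", "\\a ")   # CSS escape for a line feed
             .replace("<", "\\3c ")   # keeps "</style>" from closing the tag
    )
    return f'"{escaped}"'


def page_header_css(title: str, author: str) -> str:
    """Build an @page rule with the title top-left and the author top-right."""
    return (
        "@page {\n"
        f"  @top-left  {{ content: {css_string(title)}; font-size: 9px; color: #666; }}\n"
        f"  @top-right {{ content: {css_string(author)}; font-size: 9px; color: #666; }}\n"
        "}"
    )


print(css_string('Q1 "Draft" Report'))  # "Q1 \"Draft\" Report"
```

The same idea carries over to the JavaScript template: escape the value before it reaches the template literal rather than trusting it to be quote-free.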

6. Library Comparison: PyPDF2 vs pdf-lib vs Ghostscript vs API

Feature overview

Tool	Language	Strengths	Limitations
PyPDF2 / pypdf	Python	Merge, split, metadata	No HTML→PDF
pdf-lib	JavaScript	Merge, text/image overlay, forms	Complex font embedding
Ghostscript	CLI	Compression, format conversion	Requires installation, license considerations
pdfkit	Python	HTML→PDF	Requires wkhtmltopdf
FUNBREW PDF API	Any	HTML→PDF + all post-processing	External API dependency

pypdf (formerly PyPDF2) merge & split

from pypdf import PdfWriter, PdfReader

def merge_pdfs(pdf_paths: list[str], output_path: str) -> None:
    writer = PdfWriter()
    for path in pdf_paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)
    with open(output_path, "wb") as f:
        writer.write(f)

def split_pdf(input_path: str, output_dir: str) -> None:
    reader = PdfReader(input_path)
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        with open(f"{output_dir}/page_{i + 1}.pdf", "wb") as f:
            writer.write(f)

pdf-lib watermark (JavaScript)

import { PDFDocument, rgb, degrees } from "pdf-lib";

async function addWatermark(pdfBytes, text) {
  const doc = await PDFDocument.load(pdfBytes);
  for (const page of doc.getPages()) {
    const { width, height } = page.getSize();
    page.drawText(text, {
      x: width / 2 - 100,
      y: height / 2,
      size: 50,
      color: rgb(0.8, 0.8, 0.8),
      opacity: 0.3,
      rotate: degrees(-45),
    });
  }
  return doc.save();
}

Ghostscript compression (CLI)

# Balanced compression for email and web sharing (/ebook preset)
gs -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=compressed.pdf \
   input.pdf

# Quality presets:
# /screen   — smallest (72 dpi,  for screen only)
# /ebook    — balanced (150 dpi, for email)
# /printer  — print quality (300 dpi)
# /prepress — high quality + color profiles (300 dpi)
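
To call the same command from application code, the argument list can be built and validated in Python. The following is a sketch under the assumption that `gs` is on the PATH; `build_gs_command` and `compress_pdf` are hypothetical names, and the flags are exactly the ones shown above.

```python
import subprocess

PRESETS = {"screen", "ebook", "printer", "prepress"}


def build_gs_command(input_path: str, output_path: str, preset: str = "ebook") -> list[str]:
    """Build the Ghostscript argument list for the compression command above."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={output_path}",
        input_path,
    ]


def compress_pdf(input_path: str, output_path: str, preset: str = "ebook") -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    # Passing a list (no shell=True) avoids shell quoting problems with file names.
    subprocess.run(build_gs_command(input_path, output_path, preset), check=True)


print(build_gs_command("input.pdf", "compressed.pdf")[3])  # -dPDFSETTINGS=/ebook
```

Keeping command construction separate from execution makes the preset validation testable without Ghostscript installed.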

Choosing your approach

Prefer a library when:

Post-processing existing PDF files (merge, split, encrypt)
Offline or air-gapped environment
High-volume simple operations (e.g., bulk single-page splits)

Prefer an API when:

Generating from HTML and post-processing in one workflow
Minimizing server dependencies (slimmer Docker images)
Calling from multiple languages or microservices with a unified interface
Needing complex font and CSS control

For production error handling patterns, see the API Error Handling Guide. For security best practices, see the API Security Guide.

Summary

Operation	Recommended approach	Key technique
Merge	Design as single HTML from the start	`page-break-before: always`
Split	Generate one file per record	Batch API + parallel processing
Compress	Pre-optimize images	sharp / Pillow before API call
Watermark	CSS `position: fixed`	Semi-transparent + `rotate(-45deg)`
Page numbers	CSS `@page` rule	`counter(page)`
Header/footer	CSS `@page` rule	`@top-center`, `@bottom-center`

The core insight: designing your PDF generation to produce the desired output directly is almost always simpler and more reliable than post-processing. FUNBREW PDF's playground lets you validate CSS effects in your browser before integrating into your codebase.

Related articles

HTML to PDF Complete Guide — Generation fundamentals and approach comparison
PDF Batch Processing Guide — Efficiently generating large volumes
HTML to PDF CSS Tips — Print CSS techniques for production PDFs
Automate Invoice PDFs — Full invoice automation workflow
PDF Report Generation Guide — Designing and generating report PDFs
Webhook Integration Guide — Async processing and webhooks
API Error Handling Guide — Production-quality error handling
API Security Guide — Secure API key management and access control

Start experimenting with the free plan. Full option reference is in the documentation.
