Generating a PDF is rarely the end of the story. You often need to combine multiple PDFs into one, extract specific pages, reduce file size for email, or stamp a "DRAFT" watermark — all without manual intervention. This is PDF post-processing, and automating it well separates a basic implementation from a production-grade document workflow.
This guide covers six core PDF manipulation scenarios — merge, split, compress, watermark, page numbers, and headers/footers — with practical code examples and a comparison of library-based vs. API-based approaches.
For the basics of generating PDFs from HTML, see our HTML to PDF Complete Guide. For invoice automation, see Automate Invoice PDFs with API.
PDF Manipulation Overview
PDF post-processing needs fall into a few clear categories:
PDF Post-Processing
├── Structure
│   ├── Merge (combine multiple PDFs into one)
│   └── Split (break one PDF into multiple)
├── Optimization
│   └── Compress (reduce file size)
├── Content Addition
│   ├── Watermarks
│   ├── Page numbers
│   └── Headers / footers
└── Security
    ├── Password protection
    └── Permission control
There are two primary approaches: library-based (pypdf, formerly PyPDF2; pdf-lib; Ghostscript) and API-based (FUNBREW PDF). Libraries give fine-grained control but require you to manage dependencies, memory, and a separate implementation per language. An API moves rendering off your servers and exposes the same HTTP interface to every language, at the cost of an external dependency.
FUNBREW PDF handles both HTML-to-PDF generation and post-processing in one service. Try it instantly in the playground.
1. PDF Merging
Use Cases
- Combine invoice + terms of service + payment confirmation into a single PDF
- Merge separately generated monthly report chapters for distribution
- Bundle application document sets (application form, consent form, instructions)
Design "one PDF from the start" with HTML
Rather than generating separate PDFs and merging them afterward, design your HTML to produce the combined document in a single request using CSS page-break-before.
<!DOCTYPE html>
<html>
<head>
  <style>
    .page { page-break-before: always; }
  </style>
</head>
<body>
  <!-- Invoice (page 1) -->
  <div class="invoice-section">
    <h1>Invoice</h1>
    <p>Invoice #: INV-2026-001</p>
    <!-- ... -->
  </div>
  <!-- Terms of Service (starts on new page) -->
  <div class="page terms-section">
    <h1>Terms of Service</h1>
    <p>Article 1...</p>
  </div>
</body>
</html>
curl -X POST https://pdf.funbrew.cloud/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html>...(HTML above)...</html>",
    "options": { "format": "A4" }
  }' \
  -o combined-document.pdf
Python implementation
import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://pdf.funbrew.cloud/api/v1/generate"


def combine_sections(sections: list[str]) -> bytes:
    """Combine multiple HTML sections into one PDF."""
    pages = []
    for i, html in enumerate(sections):
        style = 'style="page-break-before: always;"' if i > 0 else ""
        pages.append(f'<div {style}>{html}</div>')

    full_html = f"""<!DOCTYPE html>
<html>
<head>
<style>
body {{ font-family: sans-serif; margin: 40px; }}
h1 {{ color: #333; }}
table {{ width: 100%; border-collapse: collapse; }}
th, td {{ border: 1px solid #ccc; padding: 8px; }}
</style>
</head>
<body>{''.join(pages)}</body>
</html>"""

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"html": full_html, "options": {"format": "A4"}},
    )
    response.raise_for_status()
    return response.content


# Usage
invoice_html = "<h1>Invoice</h1><p>Amount: $500</p>"
terms_html = "<h1>Terms of Service</h1><p>Article 1...</p>"
confirmation_html = "<h1>Payment Confirmed</h1><p>Thank you.</p>"

pdf = combine_sections([invoice_html, terms_html, confirmation_html])
with open("invoice-with-terms.pdf", "wb") as f:
    f.write(pdf)
Node.js implementation
const fs = require("fs");

async function combinePDFs(sections) {
  // Every section after the first starts on a new page
  const combined = sections
    .map((html, i) => {
      const style = i > 0 ? 'style="page-break-before: always;"' : "";
      return `<div ${style}>${html}</div>`;
    })
    .join("\n");

  const fullHtml = `<!DOCTYPE html>
<html>
<head><style>body { font-family: sans-serif; margin: 40px; }</style></head>
<body>${combined}</body>
</html>`;

  const response = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html: fullHtml, options: { format: "A4" } }),
  });
  // Fail loudly instead of saving an error body as a .pdf file
  if (!response.ok) {
    throw new Error(`PDF generation failed with HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

const sections = [
  "<h1>Invoice</h1><p>Amount: $500</p>",
  "<h1>Terms of Service</h1><p>Article 1...</p>",
];

combinePDFs(sections).then((pdf) => {
  fs.writeFileSync("combined.pdf", pdf);
  console.log("PDF created: combined.pdf");
});
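Both merge examples write whatever bytes come back straight to disk. A small sanity check before writing can catch the case where the body is an error payload rather than a PDF. The sketch below relies only on the fact that well-formed PDF files begin with the `%PDF-` marker; the helper names `looks_like_pdf` and `save_pdf` are hypothetical and not part of any FUNBREW client.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


def save_pdf(data: bytes, path: str) -> None:
    """Write the payload to disk only if it looks like a PDF."""
    if not looks_like_pdf(data):
        raise ValueError(f"Refusing to write {path}: response body is not a PDF")
    with open(path, "wb") as f:
        f.write(data)
```

In the Python example above, `save_pdf(pdf, "invoice-with-terms.pdf")` could stand in for the final `open(...)` block.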
2. PDF Splitting
Use Cases
- Extract a specific month's pages from a consolidated annual report
- Split a bulk-generated application set into individual PDFs for per-user delivery
- Deliver large PDFs in page-range segments for download
Generate page-per-document from the start
Instead of post-splitting, generate individual PDFs directly — one per record.
import requests
from concurrent.futures import ThreadPoolExecutor

API_KEY = "YOUR_API_KEY"
API_URL = "https://pdf.funbrew.cloud/api/v1/generate"


def generate_single_pdf(data: dict, template_fn) -> tuple[str, bytes]:
    html = template_fn(data)
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "html": html,
            "options": {
                "format": "A4",
                "margin": {"top": "20mm", "bottom": "20mm",
                           "left": "15mm", "right": "15mm"},
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    return data["invoice_no"], response.content


def invoice_template(data: dict) -> str:
    return f"""<!DOCTYPE html>
<html>
<body style="font-family: sans-serif; margin: 40px;">
<h1>Invoice</h1>
<p>To: {data['customer_name']}</p>
<p>Invoice #: {data['invoice_no']}</p>
<p>Amount: ${data['amount']:,.2f}</p>
</body>
</html>"""


customers = [
    {"customer_name": "Acme Corp", "invoice_no": "INV-001", "amount": 5000.00},
    {"customer_name": "Globex Inc", "invoice_no": "INV-002", "amount": 8000.00},
    {"customer_name": "Initech LLC", "invoice_no": "INV-003", "amount": 3000.00},
]

# Parallel generation (see pdf-api-batch-processing for advanced patterns)
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [
        executor.submit(generate_single_pdf, customer, invoice_template)
        for customer in customers
    ]
    for future in futures:
        invoice_no, pdf = future.result()
        with open(f"invoice_{invoice_no}.pdf", "wb") as f:
            f.write(pdf)
        print(f"Generated: invoice_{invoice_no}.pdf")
For large-scale bulk PDF generation, see the Batch Processing Guide.
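The invoice template above interpolates record fields straight into HTML. If a customer name can contain characters such as `<` or `&`, escaping it first keeps the markup intact. Here is a minimal sketch using Python's standard `html.escape`; the name `safe_invoice_template` is illustrative, and the field names simply mirror the example above.

```python
from html import escape


def safe_invoice_template(data: dict) -> str:
    """Build the invoice HTML with user-supplied fields escaped."""
    name = escape(str(data["customer_name"]))
    invoice_no = escape(str(data["invoice_no"]))
    amount = float(data["amount"])  # fails early on non-numeric input
    return f"""<!DOCTYPE html>
<html>
<body style="font-family: sans-serif; margin: 40px;">
<h1>Invoice</h1>
<p>To: {name}</p>
<p>Invoice #: {invoice_no}</p>
<p>Amount: ${amount:,.2f}</p>
</body>
</html>"""
```

It takes the same `data` dict as `invoice_template`, so it can be passed to `generate_single_pdf` in its place.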
3. PDF Compression
Why PDFs get large
| Cause | Solution |
|---|---|
| High-resolution embedded images | Pre-compress and resize images |
| Full font embedding | Subset fonts (embed only used characters) |
| Uncompressed content streams | Enable stream compression |
HTML-level optimization
Optimizing the HTML you pass to the FUNBREW PDF API directly reduces the size of the output PDF.
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Request only the font weights you use; each extra weight is another font to embed */
    @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap');
    body {
      font-family: 'Inter', sans-serif;
      background: white;
      color: black;
    }
    /* Prefer border over box-shadow (box-shadow increases PDF size) */
    .card { border: 1px solid #ccc; }
    /* Avoid gradients and complex backgrounds in print context */
  </style>
</head>
<body>
  <!-- Pre-compress images before embedding -->
  <!-- BAD: <img src="original-4k-photo.jpg"> -->
  <!-- GOOD: <img src="compressed-800px.jpg" width="400"> -->
  <img src="logo-compressed.png" width="200" height="60" alt="Logo">
</body>
</html>
Image pre-compression (Python)
from PIL import Image
import io
import base64


def compress_image_to_base64(
    image_path: str,
    max_width: int = 800,
    quality: int = 85,
) -> str:
    """Compress an image and return as a base64 data URI."""
    with Image.open(image_path) as img:
        # Downscale only; images narrower than max_width are left as-is
        if img.width > max_width:
            ratio = max_width / img.width
            new_size = (max_width, int(img.height * ratio))
            img = img.resize(new_size, Image.LANCZOS)
        buffer = io.BytesIO()
        # JPEG has no alpha channel: convert("RGB") drops any transparency
        img.convert("RGB").save(buffer, format="JPEG", quality=quality, optimize=True)
    buffer.seek(0)
    b64 = base64.b64encode(buffer.read()).decode()
    return f"data:image/jpeg;base64,{b64}"


logo_b64 = compress_image_to_base64("logo.png", max_width=400, quality=90)
html = f"""<html><body>
<img src="{logo_b64}" width="200" alt="Logo">
<h1>Report</h1>
</body></html>"""
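Choosing `max_width` is easier with a rule of thumb: a CSS pixel is 1/96 inch, so an image displayed at a given CSS width needs roughly `css_width * dpi / 96` source pixels to reach a target print resolution. The helper below is a sketch of that arithmetic; the name `pixels_for_print` and the 150 dpi default are assumptions for illustration, not API settings.

```python
import math

CSS_PX_PER_INCH = 96  # CSS defines 1in = 96px


def pixels_for_print(css_width_px: float, dpi: int = 150) -> int:
    """Source pixel width needed to render css_width_px at the given dpi."""
    # Multiply before dividing to avoid floating-point drift on exact results
    return math.ceil(css_width_px * dpi / CSS_PX_PER_INCH)


# A logo displayed at width="200"
print(pixels_for_print(200))       # 313
print(pixels_for_print(200, 300))  # 625
```

For the logo above, shown at 200 CSS pixels, `max_width=400` corresponds to about 192 dpi.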
Node.js image optimization
const sharp = require("sharp");

async function compressImageToBase64(imagePath, maxWidth = 800, quality = 85) {
  // withoutEnlargement keeps images narrower than maxWidth at their original size
  const buffer = await sharp(imagePath)
    .resize({ width: maxWidth, withoutEnlargement: true })
    .jpeg({ quality, progressive: true })
    .toBuffer();
  return `data:image/jpeg;base64,${buffer.toString("base64")}`;
}

async function generateCompactPDF(title, content) {
  const logoBase64 = await compressImageToBase64("logo.png", 400, 90);
  const html = `<!DOCTYPE html>
<html>
<body style="font-family: sans-serif; margin: 40px;">
<img src="${logoBase64}" width="200" alt="Logo">
<h1>${title}</h1>
<p>${content}</p>
</body>
</html>`;

  const res = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html, options: { format: "A4" } }),
  });
  // Fail loudly instead of returning an error body as PDF bytes
  if (!res.ok) {
    throw new Error(`PDF generation failed with HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
4. Watermarks
Watermarks are implemented entirely with CSS. The key is position: fixed combined with a high z-index — this ensures the watermark appears on every page.
Diagonal text watermark
<!DOCTYPE html>
<html>
<head>
  <style>
    .watermark {
      position: fixed;
      top: 50%;
      left: 50%;
      transform: translate(-50%, -50%) rotate(-45deg);
      font-size: 80px;
      font-weight: bold;
      color: rgba(200, 200, 200, 0.3);
      z-index: 1000;
      pointer-events: none;
      white-space: nowrap;
      user-select: none;
    }
    .content {
      position: relative;
      z-index: 1;
      margin: 40px;
    }
  </style>
</head>
<body>
  <div class="watermark">DRAFT</div>
  <div class="content">
    <h1>Contract</h1>
    <p>This document is a draft. Not for distribution.</p>
  </div>
</body>
</html>
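In practice the same template often has to produce both a draft and a final version. One way to handle that is to add the watermark element conditionally when assembling the HTML. The following is a sketch reusing the CSS above; the function name `wrap_with_watermark` and its `text=None` convention are assumptions made for this example.

```python
from html import escape
from typing import Optional

WATERMARK_CSS = """
.watermark {
  position: fixed;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%) rotate(-45deg);
  font-size: 80px;
  font-weight: bold;
  color: rgba(200, 200, 200, 0.3);
  z-index: 1000;
  pointer-events: none;
  white-space: nowrap;
}
.content { position: relative; z-index: 1; margin: 40px; }
"""


def wrap_with_watermark(content_html: str, text: Optional[str] = "DRAFT") -> str:
    """Wrap body HTML in a full document; omit the watermark when text is None."""
    mark = f'<div class="watermark">{escape(text)}</div>' if text else ""
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<style>{WATERMARK_CSS}</style>\n"
        "</head>\n<body>\n"
        f"{mark}\n"
        f'<div class="content">{content_html}</div>\n'
        "</body>\n</html>"
    )


draft_html = wrap_with_watermark("<h1>Contract</h1>")        # stamped "DRAFT"
final_html = wrap_with_watermark("<h1>Contract</h1>", None)  # no watermark
```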
Per-user confidential watermark (Python)
import requests
from datetime import datetime

API_KEY = "YOUR_API_KEY"


def generate_confidential_pdf(content_html: str, user_name: str, doc_id: str) -> bytes:
    """Generate a PDF with a repeating user-info watermark for audit trails."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    watermark_text = f"{user_name} • {timestamp} • {doc_id}"

    # Stack rotated text strips 180px apart so they cover the full page height
    diagonal_strips = "".join([
        f"""<div style="
            position: absolute;
            top: {i * 180}px;
            left: -100px;
            width: 160%;
            transform: rotate(-25deg);
            font-size: 12px;
            color: rgba(150, 150, 150, 0.18);
            white-space: nowrap;
            letter-spacing: 8px;
        ">{watermark_text}</div>"""
        for i in range(-2, 12)
    ])

    full_html = f"""<!DOCTYPE html>
<html>
<head>
<style>
body {{ font-family: sans-serif; margin: 40px; }}
</style>
</head>
<body>
<div style="position: fixed; top: 0; left: 0; width: 100%;
            height: 100%; z-index: 9999; pointer-events: none;
            overflow: hidden;">
{diagonal_strips}
</div>
<div style="position: relative; z-index: 1;">
{content_html}
</div>
</body>
</html>"""

    response = requests.post(
        "https://pdf.funbrew.cloud/api/v1/generate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"html": full_html, "options": {"format": "A4"}},
    )
    response.raise_for_status()
    return response.content


pdf = generate_confidential_pdf(
    content_html="<h1>Confidential Report</h1><p>Internal use only.</p>",
    user_name="John Smith",
    doc_id="DOC-2026-001",
)
with open("confidential.pdf", "wb") as f:
    f.write(pdf)
Logo watermark (bottom-right corner)
<style>
  .logo-watermark {
    position: fixed;
    bottom: 30px;
    right: 30px;
    opacity: 0.12;
    z-index: 1000;
  }
</style>
<img class="logo-watermark" src="data:image/png;base64,..." width="100" alt="">
5. Page Numbers & Headers/Footers
CSS @page rule
The CSS @page rule with margin boxes (@top-center, @bottom-center, etc.) is the standard way to add consistent headers and footers to every page.
<!DOCTYPE html>
<html>
<head>
  <style>
    @page {
      size: A4;
      margin: 25mm 20mm 30mm 20mm;
      /* Header */
      @top-center {
        content: "Monthly Report — April 2026";
        font-size: 10px;
        color: #666;
      }
      /* Footer: page number */
      @bottom-center {
        content: counter(page) " / " counter(pages);
        font-size: 10px;
        color: #666;
      }
      /* Footer left: company name */
      @bottom-left {
        content: "FUNBREW Inc.";
        font-size: 10px;
        color: #999;
      }
      /* Footer right: date */
      @bottom-right {
        content: "2026-04-06";
        font-size: 10px;
        color: #999;
      }
    }
    /* Override for the first page (cover page — no header) */
    @page :first {
      @top-center { content: ""; }
    }
    body {
      font-family: sans-serif;
      font-size: 11pt;
      line-height: 1.6;
    }
  </style>
</head>
<body>
  <h1>Monthly Report</h1>
  <p>Executive summary content...</p>
  <div style="page-break-before: always;">
    <h2>Chapter 2: Data Analysis</h2>
    <p>...</p>
  </div>
</body>
</html>
Dynamic header/footer with JavaScript
async function generateReport({ title, author, date, sections }) {
  // Every section after the first starts on a new page
  const sectionsHtml = sections
    .map(
      ({ title: sTitle, content }, i) => `
<div ${i > 0 ? 'style="page-break-before: always;"' : ""}>
  <h2>${sTitle}</h2>
  ${content}
</div>`
    )
    .join("");

  const html = `<!DOCTYPE html>
<html>
<head>
<style>
@page {
  size: A4;
  margin: 25mm 20mm 30mm 20mm;
  @top-left { content: "${title}"; font-size: 9px; color: #666; }
  @top-right { content: "${author}"; font-size: 9px; color: #666; }
  @bottom-center {
    content: "— " counter(page) " —";
    font-size: 9px; color: #999;
  }
  @bottom-right { content: "${date}"; font-size: 9px; color: #999; }
}
@page :first {
  @top-left { content: ""; }
  @top-right { content: ""; }
}
body { font-family: sans-serif; font-size: 11pt; line-height: 1.7; }
h1 { text-align: center; margin-bottom: 50px; }
h2 { color: #333; border-bottom: 2px solid #333; padding-bottom: 5px; }
</style>
</head>
<body>
<!-- Cover page -->
<div style="text-align: center; padding-top: 100px;">
  <h1>${title}</h1>
  <p style="color: #666;">${author} | ${date}</p>
</div>
${sectionsHtml}
</body>
</html>`;

  const response = await fetch("https://pdf.funbrew.cloud/api/v1/generate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.PDF_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ html, options: { format: "A4" } }),
  });
  // Fail loudly instead of returning an error body as PDF bytes
  if (!response.ok) {
    throw new Error(`PDF generation failed with HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

// Top-level await requires an ES module (.mjs or "type": "module")
const report = await generateReport({
  title: "Q1 2026 Business Report",
  author: "Business Planning Dept.",
  date: "April 6, 2026",
  sections: [
    { title: "Executive Summary", content: "<p>...</p>" },
    { title: "Revenue Performance", content: "<table>...</table>" },
    { title: "Next Quarter Plan", content: "<p>...</p>" },
  ],
});
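The report generator places `title`, `author`, and `date` inside CSS string literals in the `content:` declarations. A value containing a double quote or a line break would end the string early and break the `@page` rule. One way to guard against that is to escape values before interpolating them. The sketch below shows the idea in Python; `css_string` and `page_header_css` are hypothetical helper names, and the same replacements translate directly to JavaScript.

```python
def css_string(value: str) -> str:
    """Quote a Python string for use as a CSS string literal (e.g. in `content:`)."""
    escaped = (
        value.replace("\\", "\\\\")   # backslash first, so later escapes are not doubled
             .replace('"', '\\"')     # an unescaped quote would end the string early
             .replace("\n", "\\A ")   # CSS strings cannot contain raw newlines
             .replace("<", "\\3C ")   # keeps "</style>" from closing the style element
    )
    return f'"{escaped}"'


def page_header_css(title: str, author: str) -> str:
    """Build an @page rule whose header text is safely quoted."""
    return (
        "@page {\n"
        f"  @top-left {{ content: {css_string(title)}; font-size: 9px; color: #666; }}\n"
        f"  @top-right {{ content: {css_string(author)}; font-size: 9px; color: #666; }}\n"
        "}"
    )


print(page_header_css('Q1 "Final" Report', "Business Planning Dept."))
```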
6. Library Comparison: pypdf vs pdf-lib vs Ghostscript vs API
Feature overview
| Tool | Language | Strengths | Limitations |
|---|---|---|---|
| pypdf (formerly PyPDF2) | Python | Merge, split, metadata | No HTML→PDF |
| pdf-lib | JavaScript | Merge, text/image overlay, forms | Complex font embedding |
| Ghostscript | CLI | Compression, format conversion | Requires installation, license considerations |
| pdfkit | Python | HTML→PDF | Requires wkhtmltopdf |
| FUNBREW PDF API | Any | HTML→PDF + all post-processing | External API dependency |
pypdf (formerly PyPDF2) merge & split
from pypdf import PdfWriter, PdfReader


def merge_pdfs(pdf_paths: list[str], output_path: str) -> None:
    writer = PdfWriter()
    for path in pdf_paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)
    with open(output_path, "wb") as f:
        writer.write(f)


def split_pdf(input_path: str, output_dir: str) -> None:
    reader = PdfReader(input_path)
    for i, page in enumerate(reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        with open(f"{output_dir}/page_{i + 1}.pdf", "wb") as f:
            writer.write(f)
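`split_pdf` writes every page to its own file, whereas the use cases earlier call for extracting a specific range (for example, one month from an annual report). A small parser for human-style page specs makes that straightforward. The sketch below is pure Python with no PDF dependency; `parse_page_ranges` is a hypothetical helper, and the `"1-3,5"` spec format is an assumption chosen for illustration.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into sorted zero-based page indices."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_s, sep, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if sep else start  # "5" means the single page 5
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"Invalid page range {part!r} for a {total_pages}-page PDF")
        indices.update(range(start - 1, end))
    return sorted(indices)


print(parse_page_ranges("1-3,5", total_pages=10))  # [0, 1, 2, 4]
```

Each returned index can then be used as `reader.pages[i]` and passed to `writer.add_page(...)`, following the same pattern as `split_pdf`.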
pdf-lib watermark (JavaScript)
import { PDFDocument, rgb, degrees } from "pdf-lib";
async function addWatermark(pdfBytes, text) {
const doc = await PDFDocument.load(pdfBytes);
for (const page of doc.getPages()) {
const { width, height } = page.getSize();
page.drawText(text, {
x: width / 2 - 100,
y: height / 2,
size: 50,
color: rgb(0.8, 0.8, 0.8),
opacity: 0.3,
rotate: degrees(-45),
});
}
return doc.save();
}
Ghostscript compression (CLI)
# High compression for web sharing
gs -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPDFSETTINGS=/ebook \
  -dNOPAUSE -dQUIET -dBATCH \
  -sOutputFile=compressed.pdf \
  input.pdf
# Quality presets:
# /screen — smallest (72 dpi, for screen only)
# /ebook — balanced (150 dpi, for email)
# /printer — print quality (300 dpi)
# /prepress — high quality + color profiles (300 dpi)
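When the compression step runs inside a Python service, the same command can be assembled as an argument list and run with `subprocess`. This is a sketch that assumes the `gs` binary is installed and on `PATH`; the function names `build_gs_command` and `compress_pdf` are illustrative, and the flags are the ones from the command above.

```python
import subprocess

PRESETS = {"screen", "ebook", "printer", "prepress"}


def build_gs_command(input_path: str, output_path: str, preset: str = "ebook") -> list[str]:
    """Build the Ghostscript argument list for the given quality preset."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={output_path}",
        input_path,
    ]


def compress_pdf(input_path: str, output_path: str, preset: str = "ebook") -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with a non-zero status."""
    subprocess.run(build_gs_command(input_path, output_path, preset), check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.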
Choosing your approach
Prefer a library when:
- Post-processing existing PDF files (merge, split, encrypt)
- Offline or air-gapped environment
- High-volume simple operations (e.g., bulk single-page splits)
Prefer an API when:
- Generating from HTML and post-processing in one workflow
- Minimizing server dependencies (slimmer Docker images)
- Calling from multiple languages or microservices with a unified interface
- Needing complex font and CSS control
For production error handling patterns, see the API Error Handling Guide. For security best practices, see the API Security Guide.
Summary
| Operation | Recommended approach | Key technique |
|---|---|---|
| Merge | Design as single HTML from the start | page-break-before: always |
| Split | Generate one file per record | Batch API + parallel processing |
| Compress | Pre-optimize images | sharp / Pillow before API call |
| Watermark | CSS position: fixed | Semi-transparent + rotate(-45deg) |
| Page numbers | CSS @page rule | counter(page) |
| Header/footer | CSS @page rule | @top-center, @bottom-center |
The core insight: designing your PDF generation to produce the desired output directly is almost always simpler and more reliable than post-processing. FUNBREW PDF's playground lets you validate CSS effects in your browser before integrating into your codebase.
Related articles
- HTML to PDF Complete Guide — Generation fundamentals and approach comparison
- PDF Batch Processing Guide — Efficiently generating large volumes
- HTML to PDF CSS Tips — Print CSS techniques for production PDFs
- Automate Invoice PDFs — Full invoice automation workflow
- PDF Report Generation Guide — Designing and generating report PDFs
- Webhook Integration Guide — Async processing and webhooks
- API Error Handling Guide — Production-quality error handling
- API Security Guide — Secure API key management and access control
Start experimenting with the free plan. Full option reference is in the documentation.