Getting PDF API integration to work in development is one thing. Keeping it stable in production is another. APIs that behave perfectly in testing can start timing out under traffic spikes, rate limit violations can halt batch jobs mid-run, and unexpected costs can show up at the end of the month.
This article is a production checklist for teams deploying FUNBREW PDF or any PDF generation API to production. It integrates and extends the detailed guidance from three companion articles — error handling, security, and batch processing — into a single pre-launch reference.
If you are new to PDF APIs, start with the HTML to PDF complete guide or the quickstart by language.
## 1. API Key Management and Rotation
Your API key is the credential to your PDF generation service. A leaked key means unauthorized usage billed to your account and potential exposure of generated documents.
### Separate Keys per Environment

```bash
# Production
FUNBREW_PDF_API_KEY=sk-prod-xxxxxxxxxxxxxxxxxxxx

# Staging
FUNBREW_PDF_API_KEY=sk-stg-xxxxxxxxxxxxxxxxxxxx

# Local development
FUNBREW_PDF_API_KEY=sk-dev-xxxxxxxxxxxxxxxxxxxx
```
Sharing one key across environments means developer laptops consume production quota and mistakes in local testing can affect production state.
### Secret Management Tools
.env files work fine for small teams. As the team grows, adopt a dedicated secrets manager.
| Tool | Best for |
|---|---|
| AWS Secrets Manager | Applications hosted on AWS |
| HashiCorp Vault | Multi-cloud or on-premise environments |
| Doppler | Small-to-medium teams wanting centralized secrets |
| GitHub Actions Secrets | CI/CD pipelines only |
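Whichever tool holds the secret, resolve the key through a single code path at application startup. A minimal sketch, assuming an env-var-first fallback to AWS Secrets Manager (the fallback order and region are choices to adapt; the secret name matches the rotation example below):

```python
import os

def get_pdf_api_key(secret_id: str = "funbrew-pdf-api-key-prod") -> str:
    """Resolve the API key: environment variable first, then AWS Secrets Manager."""
    key = os.environ.get("FUNBREW_PDF_API_KEY")
    if key:
        return key
    # Fall back to Secrets Manager; boto3 is imported lazily so local
    # development without AWS credentials works off the env var alone.
    import boto3
    client = boto3.client("secretsmanager", region_name="us-east-1")
    return client.get_secret_value(SecretId=secret_id)["SecretString"]
```

Centralizing key lookup also means rotation only has to update one place.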
### Automating Key Rotation

```python
import boto3

def rotate_pdf_api_key():
    """Rotate the API key using AWS Secrets Manager."""
    client = boto3.client('secretsmanager', region_name='us-east-1')

    # Provision a new key from your dashboard API
    new_key = provision_new_api_key()

    # Update Secrets Manager
    client.put_secret_value(
        SecretId='funbrew-pdf-api-key-prod',
        SecretString=new_key,
    )

    # Invalidate the old key after an overlap window
    print("API key rotation complete")

# Run on a 90-day schedule via CloudWatch Events
rotate_pdf_api_key()
```
Rotate every 90 days. When rotating, keep the old key valid for a few hours to avoid dropping in-flight requests during deployment.
For the full security guide including IP restrictions and input validation, see PDF API Security Guide.
### Checklist

- Separate API keys for production, staging, and development
- `.env` excluded from version control via `.gitignore`
- No API keys in frontend JavaScript
- 90-day rotation schedule established
- Key revocation procedure documented
## 2. Rate Limits and Application-Side Throttling
PDF generation APIs enforce per-minute and per-day request limits depending on your plan. Exceeding them returns `429 Too Many Requests`, which halts your processing.
### Inspect Your Rate Limit Headers

```bash
# Check rate limit status from response headers
# (-D - dumps headers to stdout; -o /dev/null discards the PDF body)
curl -s -D - -o /dev/null -X POST "https://pdf.funbrew.cloud/api/v1/pdf/generate" \
  -H "X-API-Key: $FUNBREW_PDF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>test</h1>"}' | grep -i "x-rate"

# Example response headers:
# X-RateLimit-Limit: 100
# X-RateLimit-Remaining: 87
# X-RateLimit-Reset: 1743465600
```
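In application code, the same headers can drive a simple headroom check before each batch. A small sketch using the header names from the example above:

```python
def rate_limit_headroom(headers: dict) -> float:
    """Return the fraction of the per-minute quota still available (0.0 to 1.0)."""
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    if limit <= 0:
        return 0.0  # Headers missing or malformed: assume no headroom
    return remaining / limit
```

A headroom below 0.3 is a reasonable point to pause non-urgent batch work, matching the monitoring thresholds in section 4.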
### Implement a Token Bucket on Your Side
Rather than hitting the API limit and handling 429 responses reactively, throttle proactively from your application.
```javascript
// Token bucket rate limiter (Node.js)
class RateLimiter {
  constructor(requestsPerMinute) {
    this.tokens = requestsPerMinute;
    this.maxTokens = requestsPerMinute;
    this.refillRate = requestsPerMinute / 60; // tokens per second
    this.lastRefill = Date.now();
  }

  async acquire() {
    this._refill();
    if (this.tokens < 1) {
      const waitMs = ((1 - this.tokens) / this.refillRate) * 1000;
      await new Promise(resolve => setTimeout(resolve, waitMs));
      this._refill();
    }
    this.tokens -= 1;
  }

  _refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.maxTokens,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Target 80% of the plan limit to preserve headroom for spikes
const limiter = new RateLimiter(80);

async function generatePdf(html) {
  await limiter.acquire();
  // call PDF API
}
```
Operating at 80% of your plan limit keeps a buffer for traffic spikes without triggering rate limit errors.
For plan-by-plan pricing and rate limit comparisons, see PDF API Pricing Comparison.
### Checklist

- Rate limits (per-minute and per-day) for your current plan are documented
- Application-side throttling is implemented
- Peak request volume is within plan limits
- `X-RateLimit-Remaining` header is monitored
## 3. Error Handling and Retry Design
Production environments have transient failures. Network blips, API maintenance windows, and rendering timeouts happen. Your integration must handle them without losing data or crashing.
### Classify Errors Before Retrying

```python
RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}
NON_RETRYABLE_STATUS_CODES = {400, 401, 403, 404}

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS_CODES
```
| Status Code | Meaning | Retry? | Action |
|---|---|---|---|
| 400 | Bad request | No | Fix HTML or options |
| 401 / 403 | Auth error | No | Check / regenerate API key |
| 408 | Timeout | Yes | Exponential backoff |
| 429 | Rate limited | Yes | Wait for Retry-After header value |
| 500 / 502 / 503 | Server error | Yes | Exponential backoff |
### Exponential Backoff with Jitter

```typescript
interface RetryConfig {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 5,
  initialDelayMs: 1000,
  maxDelayMs: 60000,
  backoffMultiplier: 2,
};

async function generatePdfWithRetry(
  html: string,
  apiKey: string,
  config: RetryConfig = DEFAULT_RETRY_CONFIG
): Promise<Buffer> {
  let delay = config.initialDelayMs;

  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    const response = await fetch('https://pdf.funbrew.cloud/api/v1/pdf/generate', {
      method: 'POST',
      headers: {
        'X-API-Key': apiKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ html }),
      signal: AbortSignal.timeout(120_000),
    });

    if (response.ok) {
      return Buffer.from(await response.arrayBuffer());
    }

    const isRetryable = [408, 429, 500, 502, 503, 504].includes(response.status);
    if (!isRetryable || attempt === config.maxRetries) {
      throw new Error(`PDF generation failed: HTTP ${response.status}`);
    }

    // Respect the Retry-After header for 429 responses
    const retryAfter = response.headers.get('retry-after');
    const waitMs = retryAfter
      ? parseFloat(retryAfter) * 1000
      : Math.min(delay + Math.random() * 1000, config.maxDelayMs);

    console.warn(`Retry ${attempt + 1}/${config.maxRetries}: waiting ${(waitMs / 1000).toFixed(1)}s`);
    await new Promise(resolve => setTimeout(resolve, waitMs));
    delay = Math.min(delay * config.backoffMultiplier, config.maxDelayMs);
  }

  throw new Error('Max retries exceeded');
}
```
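To make the schedule concrete, the same backoff math can be sketched in a few lines of Python. With the default config (1s initial delay, multiplier 2, 60s cap), the base waits grow 1s, 2s, 4s, 8s, 16s before jitter is added:

```python
import random

def backoff_delays(max_retries=5, initial_ms=1000, multiplier=2,
                   max_ms=60000, jitter_ms=1000):
    """Yield the capped, jittered wait (in ms) before each retry attempt."""
    delay = initial_ms
    for _ in range(max_retries):
        # Jitter spreads retries out so clients don't stampede in sync
        yield min(delay + random.uniform(0, jitter_ms), max_ms)
        delay = min(delay * multiplier, max_ms)
```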
The full retry implementation in curl, Python, Node.js, and PHP is covered in the PDF API Error Handling Guide.
### Checklist
- Retryable vs. non-retryable errors are classified
- Exponential backoff with jitter is implemented
- Max retry count and max wait time are capped
- Alerts fire when max retries are exhausted
- Request IDs are logged for traceability
## 4. Monitoring and Alerting
The goal is to detect problems before users report them. This requires tracking the right metrics and setting actionable alert thresholds.
### Key Metrics to Track

| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| PDF generation failure rate | > 1% | > 5% |
| p50 response time | > 5s | > 15s |
| p99 response time | > 30s | > 60s |
| Retry rate (per minute) | > 10% | > 30% |
| `X-RateLimit-Remaining` headroom | < 30% | < 10% |
### Sending Metrics to Datadog

```python
import functools
import os
import time

import requests
from datadog import statsd

def track_pdf_generation(func):
    """Decorator that auto-collects PDF generation metrics."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        tags = ['service:pdf-generator', 'env:production']
        try:
            result = func(*args, **kwargs)
            duration_ms = (time.perf_counter() - start) * 1000
            statsd.histogram('pdf.generation.duration_ms', duration_ms, tags=tags)
            statsd.increment('pdf.generation.success', tags=tags)
            return result
        except Exception as e:
            duration_ms = (time.perf_counter() - start) * 1000
            error_tags = tags + [f'error_type:{type(e).__name__}']
            statsd.histogram('pdf.generation.duration_ms', duration_ms, tags=error_tags)
            statsd.increment('pdf.generation.failure', tags=error_tags)
            raise
    return wrapper

@track_pdf_generation
def generate_invoice_pdf(customer_data):
    response = requests.post(
        'https://pdf.funbrew.cloud/api/v1/pdf/generate',
        headers={'X-API-Key': os.environ['FUNBREW_PDF_API_KEY']},
        json={'html': build_invoice_html(customer_data)},
        timeout=120,
    )
    response.raise_for_status()
    return response.content
```
### Combining Monitoring with Webhooks
Webhook integration lets the API push completion and failure events to your server rather than polling. This simplifies async job tracking.
```json
{
  "event": "pdf.generation.failed",
  "job_id": "job_abc123",
  "timestamp": "2026-04-01T12:00:00Z",
  "error": {
    "code": "RENDER_TIMEOUT",
    "message": "Rendering exceeded 60 seconds",
    "html_size_bytes": 245120
  }
}
```
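On the receiving side, a thin handler that routes on the `event` field keeps the webhook endpoint trivial to test. A sketch under assumptions: only `pdf.generation.failed` is documented above, so the success event name (`pdf.generation.completed`) is hypothetical:

```python
def handle_pdf_webhook(payload: dict) -> str:
    """Route an incoming webhook event; returns the action taken (for logging)."""
    event = payload.get("event", "")
    if event == "pdf.generation.failed":
        error = payload.get("error", {})
        # Alert or re-enqueue here; RENDER_TIMEOUT often means the HTML is too heavy
        return f"alert:{error.get('code', 'UNKNOWN')}"
    if event == "pdf.generation.completed":  # assumed event name
        return "store_result"
    return "ignore"  # Unknown events: acknowledge but do nothing
```

Always return HTTP 200 quickly and do heavy work (storage, alerting) off the request path, so the API's webhook delivery doesn't retry unnecessarily.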
### Prometheus Alert Rules

```yaml
groups:
  - name: pdf-api
    rules:
      - alert: PdfGenerationHighFailureRate
        expr: |
          rate(pdf_generation_failure_total[5m]) /
          rate(pdf_generation_total[5m]) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "PDF generation failure rate exceeds 5%"
          description: "Failure rate over last 5m: {{ $value | humanizePercentage }}"

      - alert: PdfGenerationHighLatency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(pdf_generation_duration_ms_bucket[5m]))
          ) > 30000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PDF generation p99 latency exceeds 30 seconds"
```
### Checklist
- Generation failure rate tracked in real time
- Response time (p50 and p99) tracked
- Rate limit headroom alert configured
- On-call notification for critical failures
- Async job completion detected via webhook or polling
## 5. Cost Optimization
API costs scale linearly with request count. Eliminating redundant generation and batching requests together can meaningfully reduce spend.
### Strategy 1: Cache Identical PDFs
For PDFs generated from static content (terms of service, standard agreements), caching is highly effective.
```python
import hashlib
import os

import redis
import requests

class PdfCache:
    def __init__(self, redis_client, ttl_seconds=86400):
        self.redis = redis_client
        self.ttl = ttl_seconds  # Default: 24 hours

    def get_cache_key(self, html: str, options: dict) -> str:
        """Generate a deterministic cache key from HTML and options."""
        content = f"{html}{str(sorted(options.items()))}"
        return f"pdf_cache:{hashlib.sha256(content.encode()).hexdigest()}"

    def generate_with_cache(self, html: str, options: dict = None) -> bytes:
        options = options or {}
        key = self.get_cache_key(html, options)

        cached = self.redis.get(key)
        if cached:
            return cached

        response = requests.post(
            'https://pdf.funbrew.cloud/api/v1/pdf/generate',
            headers={'X-API-Key': os.environ['FUNBREW_PDF_API_KEY']},
            json={'html': html, 'options': options},
            timeout=120,
        )
        response.raise_for_status()

        pdf_bytes = response.content
        self.redis.setex(key, self.ttl, pdf_bytes)
        return pdf_bytes

cache = PdfCache(redis.Redis(host='localhost', port=6379))
pdf = cache.generate_with_cache(
    html='<h1>Terms of Service</h1><p>...</p>',
    options={'format': 'A4'}
)
```
### Strategy 2: Batch Multiple PDFs per Request
A single batch request generates multiple PDFs at once, reducing API call count. See the PDF Batch Processing Guide for the full implementation.
```bash
# One API call generates three PDFs
curl -X POST "https://pdf.funbrew.cloud/api/v1/pdf/generate" \
  -H "X-API-Key: $FUNBREW_PDF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "batch": [
      {
        "html": "<h1>Invoice #001</h1>",
        "filename": "invoice-001.pdf",
        "options": { "format": "A4" }
      },
      {
        "html": "<h1>Invoice #002</h1>",
        "filename": "invoice-002.pdf",
        "options": { "format": "A4" }
      },
      {
        "html": "<h1>Invoice #003</h1>",
        "filename": "invoice-003.pdf",
        "options": { "format": "A4" }
      }
    ]
  }'
```
### Strategy 3: Regenerate Only When Data Changes

```python
from datetime import datetime

class PdfGenerationRecord:
    """Track generation history and skip regeneration when data hasn't changed."""

    def generate_if_outdated(
        self,
        record_id: str,
        html: str,
        data_updated_at: datetime,
    ) -> bytes:
        last_generated = self._get_last_generated(record_id)
        if last_generated and last_generated >= data_updated_at:
            return self._get_cached_pdf(record_id)

        pdf_bytes = self._call_pdf_api(html)
        self._store(record_id, pdf_bytes, generated_at=datetime.utcnow())
        return pdf_bytes
```
### Estimated Savings
| Monthly PDF Volume | No Optimization | 30% Cache Hit | 50% Batch Reduction |
|---|---|---|---|
| 10,000 | Baseline | −3,000 requests | −5,000 requests |
| 100,000 | Baseline | −30,000 requests | −50,000 requests |
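The arithmetic behind the table is simple enough to sketch: cache hits cost nothing, and batching N PDFs per call divides the request count by N (so a 50% reduction corresponds to a batch factor of 2):

```python
def monthly_requests(volume: int, cache_hit_rate: float = 0.0,
                     batch_factor: float = 1.0) -> int:
    """Billable API requests after caching and batching.

    cache_hit_rate: fraction of PDFs served from cache (0.0 to 1.0)
    batch_factor:   average PDFs generated per API call
    """
    return round(volume * (1 - cache_hit_rate) / batch_factor)
```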
For plan pricing details, see PDF API Pricing Comparison.
### Checklist
- Caching implemented for identical or rarely-changing PDFs
- Multiple PDFs batched into single requests where possible
- Unnecessary re-generation prevented by checking data change timestamps
- Monthly request count reviewed to confirm plan is appropriate
## 6. Scaling with Queues and Async Processing
Synchronous PDF generation (request → wait → response) works for small volumes. Under heavy load or batch jobs, queue-based async processing is more resilient.
### When to Use Each Pattern
| Scenario | Recommended Pattern | Reason |
|---|---|---|
| User clicks "Download" | Synchronous (max 15s) | Immediate feedback required |
| Monthly invoice batch (1,000+) | Async + queue | Too slow to block a request |
| Scheduled report generation | Async + scheduler | Runs fully in background |
| Bulk certificate issuance | Async + batch | Minimizes API call count |
### Redis Queue Pattern with BullMQ (Node.js)

```javascript
import { Queue, Worker } from 'bullmq';
import { Redis } from 'ioredis';

const connection = new Redis({ host: 'localhost', port: 6379 });
const pdfQueue = new Queue('pdf-generation', { connection });

// Enqueue a job (called from your API endpoint)
export async function enqueuePdfGeneration(jobData) {
  const job = await pdfQueue.add('generate', jobData, {
    attempts: 5,
    backoff: {
      type: 'exponential',
      delay: 1000,
    },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 500 },
  });
  return { jobId: job.id };
}

// Worker (horizontally scalable)
const worker = new Worker(
  'pdf-generation',
  async (job) => {
    const { html, options, webhookUrl } = job.data;

    const response = await fetch('https://pdf.funbrew.cloud/api/v1/pdf/generate', {
      method: 'POST',
      headers: {
        'X-API-Key': process.env.FUNBREW_PDF_API_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ html, options }),
      signal: AbortSignal.timeout(120_000),
    });

    if (!response.ok) {
      throw new Error(`API error: HTTP ${response.status}`);
    }

    const pdfBuffer = Buffer.from(await response.arrayBuffer());
    const downloadUrl = await uploadToStorage(pdfBuffer);

    if (webhookUrl) {
      await notifyWebhook(webhookUrl, { downloadUrl, jobId: job.id });
    }

    return { downloadUrl };
  },
  {
    connection,
    concurrency: 10, // Tune so total workers × concurrency stays within rate limit
  }
);

worker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message);
  // Trigger PagerDuty / Slack alert
});
```
### Kubernetes Horizontal Pod Autoscaler

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pdf-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pdf-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: bullmq_queue_size
          selector:
            matchLabels:
              queue: pdf-generation
        target:
          type: AverageValue
          averageValue: "50" # Max 50 queued jobs per worker pod
```
When scaling out workers, remember the API rate limit stays fixed. Keep workers × concurrency within your plan's per-minute limit.
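A back-of-the-envelope sizing check: each concurrent worker slot issues roughly 60 / (seconds per PDF) requests per minute. A sketch, assuming an average generation time you measure yourself and the 80% headroom target from section 2:

```python
import math

def max_safe_concurrency(plan_limit_per_min: int,
                         avg_seconds_per_pdf: float,
                         headroom: float = 0.8) -> int:
    """Upper bound on total workers x concurrency for a given plan limit.

    Each concurrent slot completes ~60/avg_seconds_per_pdf requests per minute,
    so the product of all slots' throughput must stay under the throttled limit.
    """
    requests_per_slot_per_min = 60 / avg_seconds_per_pdf
    return math.floor(plan_limit_per_min * headroom / requests_per_slot_per_min)
```

For example, on a 100 requests/minute plan with 6-second PDFs, that caps total concurrency at 8 slots, e.g. 2 worker pods with `concurrency: 4` each.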
### Checklist
- Batch and bulk jobs processed via queue, not synchronous requests
- Worker concurrency tuned to stay within API rate limits
- Queue depth (backlog size) monitored
- Dead Letter Queue configured for failed jobs
- Worker auto-scaling (HPA or equivalent) verified
## 7. Security Checklist
Before going live, verify the following security controls. The full guide with code examples is in PDF API Security Guide.
### Input Validation

```javascript
// Escape all user input before embedding in HTML
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Enforce an HTML size limit (e.g., 1MB)
const MAX_HTML_SIZE_BYTES = 1_048_576;

function validateHtmlInput(html) {
  if (Buffer.byteLength(html, 'utf8') > MAX_HTML_SIZE_BYTES) {
    throw new Error('HTML is too large. Maximum size is 1MB.');
  }
}
```
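The same two checks in Python can lean on the standard library's `html.escape`, which handles all five characters above. A minimal sketch (the template-based `build_safe_html` helper is an illustration, not part of the API):

```python
import html

MAX_HTML_SIZE_BYTES = 1_048_576  # 1 MB, matching the limit above

def build_safe_html(template: str, **fields: str) -> str:
    """Escape every user-supplied field, then enforce the size limit."""
    rendered = template.format(**{k: html.escape(v) for k, v in fields.items()})
    if len(rendered.encode("utf-8")) > MAX_HTML_SIZE_BYTES:
        raise ValueError("HTML is too large. Maximum size is 1MB.")
    return rendered
```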
### Security Checklist
| Category | Check Item | Priority |
|---|---|---|
| Auth | API keys in environment variables, not source code | Required |
| Auth | Separate keys for production, staging, and development | Required |
| Transport | HTTPS (TLS 1.2+) only | Required |
| Input | User input escaped before embedding in HTML | Required |
| Input | HTML size validated before sending to API | Recommended |
| Access | IP allowlisting to production servers only | Recommended |
| Access | API never called directly from frontend JavaScript | Required |
| Data | Auto-deletion policy for generated files confirmed | Recommended |
| Audit | All API calls logged with timestamps and user IDs | Recommended |
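For the audit row, one structured log record per API call is usually enough. A sketch of the shape such a record might take (field names are a suggestion, not a requirement):

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("pdf_audit")

def log_pdf_call(user_id: str, request_id: str, status_code: int) -> dict:
    """Emit one structured audit record per PDF API call; returns it for inspection."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "request_id": request_id,
        "status_code": status_code,
    }
    audit_log.info(json.dumps(record))
    return record
```

Structured (JSON) records make it straightforward to answer "who generated what, when" during an incident.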
## 8. Pre-Launch Deployment Checklist
Use this as a PR template or release checklist before every production deployment.
### Setup

- Production API key provisioned from the dashboard
- API key stored in secrets manager or environment variable, not in code
- `.env` confirmed absent from Git history
- E2E tests passing on staging environment
### Error Handling
- Request timeout set to 120 seconds or more
- Exponential backoff retry logic implemented
- Non-retryable errors trigger immediate alerts (no silent failures)
- Error logs include request ID, status code, and attempt count
### Performance and Scaling
- Batch and bulk jobs use queue-based async processing
- Application-side throttling implemented
- Worker concurrency stays within API rate limits
- PDF caching implemented where appropriate
### Monitoring and Alerting
- Generation failure rate alert configured (threshold: 5%)
- Response time alert configured (p99 > 30s)
- Rate limit headroom alert configured
- Queue depth monitored
- Monthly usage visible in dashboard
### Security
- No API keys in frontend code
- User input HTML-escaped before PDF generation
- HTTPS (TLS 1.2+) enforced for all API calls
- IP allowlisting configured for production servers
### Cost Management
- Monthly request volume estimated and within plan limits
- Caching or change-detection prevents redundant generation
- Monthly cost review process established
## Conclusion
Moving from "it works" to "it works reliably in production" is the real work in PDF API integration. You do not need to implement everything at once. Start with the essentials — API key management, error handling, and basic monitoring — then layer in throttling, batching, caching, and queue-based scaling as traffic grows.
Each topic in this checklist has a dedicated deep-dive:
- Error handling: PDF API Error Handling Guide
- Security: PDF API Security Guide
- Batch processing: PDF Batch Processing Guide
- Webhook integration: PDF API Webhook Integration
- Pricing: PDF API Pricing Comparison
Try the API in the Playground, review the full API documentation, and explore real-world implementations in the use cases section.