Crawl Entire Websites with a Single API Call Using Browser Rendering

Oday Bakkour

You can now crawl an entire website with a single API call using Browser Rendering's new /crawl endpoint, available in open beta. Submit a starting URL, and pages are automatically discovered, rendered in a headless browser, and returned in multiple formats, including HTML, Markdown, and structured JSON.

Overview

The endpoint acts as a signed agent that respects robots.txt and AI Crawl Control by default, making it easy for developers to comply with website rules and harder for crawlers to ignore site-owner guidance. This makes it well suited to training models, building RAG pipelines, and researching or monitoring content across a site.

How It Works

Crawl jobs run asynchronously. You submit a URL, receive a job ID, and check back for results as pages are processed.

Initiating a Crawl

curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json' \
-d '{ "url": "https://blog.cloudflare.com/" }'

Checking Results

curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}' \
-H 'Authorization: Bearer <apiToken>' \
-H 'Content-Type: application/json'
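Because crawl jobs run asynchronously, a client typically polls the job endpoint until the job finishes. The sketch below shows one way to do that in Python; the `fetch_status` callable stands in for the GET request above, and the `"status"` field and its `"completed"`/`"failed"` values are assumptions about the response shape, not confirmed API details.

```python
import time
from typing import Callable

def wait_for_crawl(fetch_status: Callable[[], dict],
                   timeout: float = 300.0,
                   interval: float = 5.0) -> dict:
    """Poll a crawl job until it reports a terminal state or the timeout elapses.

    `fetch_status` is any callable returning the parsed JSON body of a GET to
    the /crawl/{job_id} endpoint. The "status" field name and its terminal
    values are hypothetical, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = fetch_status()
        if body.get("status") in ("completed", "failed"):
            return body          # job reached a terminal state
        time.sleep(interval)     # back off before polling again
    raise TimeoutError("crawl job did not finish within the timeout")
```

Injecting the fetch callable keeps the polling logic independent of any HTTP library, so it is easy to unit-test with a stub before wiring it to a real request.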

Key Features

  • Multiple output formats - Return crawled content as HTML, Markdown, and structured JSON (powered by Workers AI)
  • Crawl scope controls - Configure crawl depth, page limits, and wildcard patterns to include or exclude specific URL paths
  • Automatic page discovery - Discovers URLs from sitemaps, page links, or both
  • Incremental crawling - Use modifiedSince and maxAge to skip pages that haven't changed or were recently fetched, saving time and cost on repeated crawls
  • Static mode - Set render: false to fetch static HTML without spinning up a browser, for faster crawling of static sites
  • Well-behaved bot - Honors robots.txt directives, including crawl-delay

Availability

Available on both the Workers Free and Paid plans.

Important Notes

  • The /crawl endpoint cannot bypass Cloudflare bot detection or captchas, and self-identifies as a bot
  • If you want your own site to be crawled, review best practices for robots.txt and sitemaps

Getting Started

To get started, refer to the crawl endpoint documentation.
