Installation
npx skills add brightdata/skills --skill scrape
Bright Data — Scrape
Get clean content (markdown, HTML, JSON, screenshot) from one or more URLs via the Bright Data CLI. This skill owns the "fetch raw or lightly-structured content" job. For platform-specific structured data (Amazon, LinkedIn, TikTok, etc.), stop and use data-feeds instead — you'll get clean JSON without selector logic.
Setup gate (run first)
Before any scrape, verify the CLI is installed and authenticated:
if ! command -v bdata >/dev/null 2>&1; then
echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
echo "bdata not authenticated — run: bdata login (or: bdata login --device for SSH)"
fi
If either check fails, halt and route the user to skills/bright-data-best-practices/references/cli-setup.md. Do not attempt the legacy curl fallback silently; ask the user first.
Pick your path
| Situation | Action |
|---|---|
| Single URL | bdata scrape <url> -f markdown |
| Small list (≤ ~20 URLs) | shell loop, 1 at a time (see references/patterns.md) |
| Larger list (dozens+) | xargs -P 4 with parallelism cap (see references/patterns.md) |
| Paginated listing | scrape page 1 → extract next-page URL → append → repeat (see references/examples.md) |
| JS-heavy / login-gated / interaction-required | escalate to bdata browser (see brightdata-cli skill) |
| Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, … | stop — hand off to data-feeds |
| No URL yet, just a topic | hand off to search |
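The "larger list" row above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the recipe from references/patterns.md: the helper names `url_to_filename` and `scrape_batch`, the one-URL-per-line input file, and the output directory layout are hypothetical. Only `bdata scrape`, `-f markdown`, `-o`, and `xargs -P 4` come from this skill. Bash is assumed because of `export -f`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file layout are assumptions, not part of the skill.

# Hypothetical helper: turn a URL into a filesystem-safe file name.
url_to_filename() {
  printf '%s' "$1" | sed -e 's|^[A-Za-z]*://||' -e 's|[^A-Za-z0-9._-]|_|g'
}

# Hypothetical helper: scrape every URL in a list file, four at a time.
# $1 = file with one URL per line, $2 = output directory.
scrape_batch() {
  local list="$1" outdir="$2"
  mkdir -p "$outdir"
  export -f url_to_filename   # bash-only: make the helper visible to child shells
  # NUL-delimit the list so xargs does not reinterpret quotes or backslashes.
  # -P 4 caps concurrency at four workers; -n 1 hands each worker one URL.
  tr '\n' '\0' < "$list" |
    OUTDIR="$outdir" xargs -0 -P 4 -n 1 bash -c \
      'bdata scrape "$1" -f markdown -o "$OUTDIR/$(url_to_filename "$1").md"' _
}
```

A call such as `scrape_batch urls.txt out` would then write one markdown file per URL into `out/`. Errors from `bdata` stay visible on stderr, in line with the red flag about silencing them.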
Action
Core commands:
# Clean markdown (default)
bdata scrape "https://example.com/article" -f markdown -o article.md
# Raw HTML (when you need the DOM)
bdata scrape "https://example.com" -f html -o page.html
# Structured JSON (when the Unlocker returns parsed fields)
bdata scrape "https://example.com" -f json --pretty -o page.json
# Visual snapshot (saves PNG)
bdata scrape "https://example.com" -f screenshot -o page.png
# Geo-targeted (override the exit country)
bdata scrape "https://example.com" --country de -f markdown
Full flag reference: references/flags.md.
Verification gate (run before claiming success)
- Non-empty output: `test -s "$out_path"`, or, for stdout, at least 200 bytes of content.
- Not a block page: grep the output for any of these signatures (case-insensitive):
  - `Access Denied`
  - `Just a moment`
  - `Attention Required`
  - `Checking your browser`
  - `captcha`
  - `cf-browser-verification`
  - `cloudflare` (with < 2KB total body)
- Expected markers present for the task: e.g., a product page should contain a price pattern (`\$\d`); an article should contain at least one `<h1>` or `#` heading.
- On failure, escalation ladder:
  - Retry with a different `--country` (e.g., `--country de` if the origin site is US)
  - Escalate to `bdata browser` for full JS rendering (hand off to `brightdata-cli` skill)
Do not report success until all checks above pass.
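The checks in this gate could be bundled into one function, sketched below. The function name `verify_scrape`, its arguments, and its exit codes are hypothetical. The sketch also assumes the 2KB size limit applies only to the `cloudflare` signature, and it expects the marker as a `grep -E` pattern, so a price check is written `\$[0-9]` because `grep -E` does not support `\d`.

```shell
#!/usr/bin/env bash
# Sketch only: verify_scrape and its exit codes are assumptions, not part of the skill.
# Usage: verify_scrape <output-file> [expected-marker-regex]
# Returns 0 = pass, 1 = empty output, 2 = block page, 3 = expected marker missing.
verify_scrape() {
  local out_path="$1" marker="${2:-}"
  local size

  # Check 1: the output file exists and is non-empty.
  if [ ! -s "$out_path" ]; then
    echo "FAIL: empty or missing output: $out_path" >&2
    return 1
  fi

  # Check 2a: block-page signatures, matched case-insensitively.
  if grep -qiE 'Access Denied|Just a moment|Attention Required|Checking your browser|captcha|cf-browser-verification' "$out_path"; then
    echo "FAIL: block-page signature found in $out_path" >&2
    return 2
  fi

  # Check 2b: "cloudflare" only counts as a block page when the body is under 2 KB.
  size=$(( $(wc -c < "$out_path") ))
  if [ "$size" -lt 2048 ] && grep -qi 'cloudflare' "$out_path"; then
    echo "FAIL: short Cloudflare page ($size bytes) in $out_path" >&2
    return 2
  fi

  # Check 3: the task-specific marker, if one was given.
  if [ -n "$marker" ] && ! grep -qE "$marker" "$out_path"; then
    echo "FAIL: expected marker not found in $out_path" >&2
    return 3
  fi

  return 0
}
```

For example, `verify_scrape product.md '\$[0-9]'` checks a product page for a price, and `verify_scrape article.md '^#|<h1'` checks an article for a heading. A non-zero exit code is the cue to move to the escalation ladder.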
Red flags
- Claiming success without inspecting the output.
- Silencing errors with `2>/dev/null`: you'll miss auth failures and rate-limit errors.
- Running `bdata scrape` on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs: these are supported by `data-feeds` and return structured data directly. Scraping loses the structure.
- Scraping the same URL repeatedly in the same task: cache the first result.
- Looping `bdata scrape` sequentially for large lists instead of using `xargs -P 4` (or similar) with a parallelism cap.
- Using `curl` against `api.brightdata.com` directly: legacy path; only when the CLI isn't available.
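One way to avoid re-scraping the same URL within a task is a small on-disk cache, sketched below. The names `cached_scrape`, `cache_key`, and `SCRAPE_CACHE_DIR` are hypothetical and not part of the skill. The key is a CRC from `cksum`, which is adequate for a short-lived task cache but can collide in principle.

```shell
#!/usr/bin/env bash
# Sketch only: cached_scrape, cache_key and SCRAPE_CACHE_DIR are assumed names.
SCRAPE_CACHE_DIR="${SCRAPE_CACHE_DIR:-.scrape-cache}"

# Derive a short file-name key from the URL.
# cksum prints "<crc> <byte-count>"; join the two fields with a dash.
cache_key() {
  printf '%s' "$1" | cksum | tr ' ' '-'
}

# Print the markdown for a URL, calling bdata only on the first request.
cached_scrape() {
  local url="$1" path
  mkdir -p "$SCRAPE_CACHE_DIR"
  path="$SCRAPE_CACHE_DIR/$(cache_key "$url").md"
  if [ ! -s "$path" ]; then
    # stderr is deliberately left visible so auth and rate-limit errors surface.
    bdata scrape "$url" -f markdown -o "$path" || { rm -f "$path"; return 1; }
  fi
  cat "$path"
}
```

Calling `cached_scrape "https://example.com/article"` twice invokes `bdata scrape` once; the second call reads the saved file. A failed scrape removes the partial file so the next call retries instead of serving an empty result.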
References
- `references/flags.md`: every flag with when-to-use notes.
- `references/patterns.md`: shell-loop batching, `xargs` parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy `curl` fallback.
- `references/examples.md`: (1) single page → markdown, (2) batch a list of URLs with parallelism cap, (3) paginated listing, (4) block-page recovery.
How to use this skill
Install scrape by running npx skills add brightdata/skills --skill scrape in your project directory. The skill file will be downloaded from GitHub and placed in your project.
No configuration needed. Your AI agent (Claude Code, Cursor, Windsurf, etc.) automatically detects installed skills and uses them as context when generating code.
The skill enhances your agent's understanding of scrape, helping it follow established patterns, avoid common mistakes, and produce production-ready output.
What you get
Skills are plain-text instruction files — not executable code. They encode expert knowledge about frameworks, languages, or tools that your AI agent reads to improve its output. This means zero runtime overhead, no dependency conflicts, and full transparency: you can read and review every instruction before installing.
Compatibility
This skill works with any AI coding agent that supports the skills.sh format, including Claude Code (Anthropic), Cursor, Windsurf, Cline, Aider, and other tools that read project-level context files. Skills are framework-agnostic at the transport level — the content inside determines which language or framework it applies to.