#330

Global ranking · of 600 skills

scrape AI Agent Skill

View source: brightdata/skills

Medium

Installation

npx skills add brightdata/skills --skill scrape

7.3K

Installs

Bright Data — Scrape

Get clean content (markdown, HTML, JSON, screenshot) from one or more URLs via the Bright Data CLI. This skill owns the "fetch raw or lightly-structured content" job. For platform-specific structured data (Amazon, LinkedIn, TikTok, etc.), stop and use data-feeds instead — you'll get clean JSON without selector logic.

Setup gate (run first)

Before any scrape, verify the CLI is installed and authenticated:

if ! command -v bdata >/dev/null 2>&1; then
    echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
    echo "bdata not authenticated — run: bdata login  (or: bdata login --device for SSH)"
fi

If either check fails, halt and route the user to skills/bright-data-best-practices/references/cli-setup.md. Do not attempt the legacy curl fallback silently — ask the user first.

Pick your path

  • Single URL: bdata scrape <url> -f markdown
  • Small list (≤ ~20 URLs): shell loop, one URL at a time (see references/patterns.md)
  • Larger list (dozens+): xargs -P 4 with a parallelism cap (see references/patterns.md and the sketch after this list)
  • Paginated listing: scrape page 1 → extract the next-page URL → append → repeat (see references/examples.md and the sketch after the core commands)
  • JS-heavy / login-gated / interaction-required: escalate to bdata browser (see the brightdata-cli skill)
  • Amazon, LinkedIn, TikTok, Instagram, YouTube, Reddit, …: stop and hand off to data-feeds
  • No URL yet, just a topic: hand off to search
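
A minimal sketch of the parallel-batch path above, assuming the URLs sit one per line in urls.txt and that one markdown file per URL is the desired output; the out/ directory and the filename scheme are illustrative, not part of the skill:

mkdir -p out
# -P 4 caps parallelism at 4 concurrent scrapes, as recommended above
xargs -P 4 -I {} sh -c '
  # turn the URL into a safe filename (every non-alphanumeric char becomes "_")
  name=$(printf "%s" "$1" | tr -c "A-Za-z0-9" "_")
  bdata scrape "$1" -f markdown -o "out/${name}.md"
' _ {} < urls.txt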

Action

Core commands:

# Clean markdown (default)
bdata scrape "https://example.com/article" -f markdown -o article.md

# Raw HTML (when you need the DOM)
bdata scrape "https://example.com" -f html -o page.html

# Structured JSON (when the Unlocker returns parsed fields)
bdata scrape "https://example.com" -f json --pretty -o page.json

# Visual snapshot (saves PNG)
bdata scrape "https://example.com" -f screenshot -o page.png

# Geo-targeted (override the exit country)
bdata scrape "https://example.com" --country de -f markdown

Full flag reference: references/flags.md.
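
For the paginated-listing path in the table above, a minimal sketch, assuming the listing's HTML exposes a rel="next" link with rel appearing before href; the grep/sed extraction, the 20-page cap, and the filenames are assumptions (a relative next-page link would also need the site origin prepended):

url="https://example.com/listing?page=1"
page=1
while [ -n "$url" ] && [ "$page" -le 20 ]; do
  bdata scrape "$url" -f html -o "listing-${page}.html"
  # extract the next-page URL; an empty result ends the loop
  url=$(grep -oE '<a [^>]*rel="next"[^>]*href="[^"]+"' "listing-${page}.html" \
        | sed -E 's/.*href="([^"]+)".*/\1/' | head -n 1)
  page=$((page + 1))
done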

Verification gate (run before claiming success)

  1. Non-empty output: test -s "$out_path" — or, for stdout, at least 200 bytes of content.
  2. Not a block page — grep the output for any of these signatures (case-insensitive):
    • Access Denied
    • Just a moment
    • Attention Required
    • Checking your browser
    • captcha
    • cf-browser-verification
    • cloudflare (with < 2KB total body)
  3. Expected markers present for the task: e.g., a product page should contain a price pattern (\$\d); an article should contain at least one <h1> or # heading.
  4. On failure, escalation ladder:
    • Retry with a different --country (e.g., --country de if the origin site is US)
    • Escalate to bdata browser for full JS rendering (hand off to brightdata-cli skill)

Do not report success until all checks above pass.
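
A minimal sketch of this gate as a shell helper, assuming the scrape wrote to a file; the function name, the marker-regex argument, and the failure messages are illustrative, not part of the skill:

# usage: verify_scrape <output-file> [expected-marker-regex]
verify_scrape() {
  out_path="$1"
  marker_regex="${2:-.}"   # e.g. '\$[0-9]' for a product page, '^# |<h1' for an article

  # 1. Non-empty output, at least 200 bytes
  [ -s "$out_path" ] || { echo "FAIL: empty output"; return 1; }
  [ "$(wc -c < "$out_path")" -ge 200 ] || { echo "FAIL: under 200 bytes"; return 1; }

  # 2. No block-page signatures (case-insensitive)
  if grep -qiE 'access denied|just a moment|attention required|checking your browser|captcha|cf-browser-verification' "$out_path"; then
    echo "FAIL: block-page signature found"; return 1
  fi
  # "cloudflare" only counts as a block signature when the whole body is under 2 KB
  if [ "$(wc -c < "$out_path")" -lt 2048 ] && grep -qi 'cloudflare' "$out_path"; then
    echo "FAIL: small cloudflare interstitial"; return 1
  fi

  # 3. Expected marker for the task
  grep -qiE "$marker_regex" "$out_path" || { echo "FAIL: expected marker missing"; return 1; }

  echo "OK: $out_path passed verification"
}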

Red flags

  • Claiming success without inspecting the output.
  • Silencing errors with 2>/dev/null — you'll miss auth failures and rate-limit errors.
  • Running bdata scrape on Amazon/LinkedIn/TikTok/Instagram/YouTube/Reddit URLs — these are supported by data-feeds and return structured data directly. Scraping loses the structure.
  • Scraping the same URL repeatedly in the same task instead of caching the first result (see the caching sketch after this list).
  • Looping bdata scrape sequentially for large lists instead of using xargs -P 4 (or similar) with a parallelism cap.
  • Using curl against api.brightdata.com directly — legacy path; only when the CLI isn't available.
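
A minimal sketch of per-URL caching within a task, assuming a local .scrape-cache/ directory; the helper name and the sha1-based cache key are illustrative:

# fetch a URL once per task; later calls reuse the cached markdown
cached_scrape() {
  url="$1"
  mkdir -p .scrape-cache
  key=$(printf "%s" "$url" | shasum | cut -d " " -f 1)   # or sha1sum
  cache=".scrape-cache/${key}.md"
  if [ ! -s "$cache" ]; then
    bdata scrape "$url" -f markdown -o "$cache"
  fi
  cat "$cache"
}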

References

  • references/flags.md — every flag with when-to-use notes.
  • references/patterns.md — shell-loop batching, xargs parallelism, pagination recipe, retry/backoff, block-page recovery chain, legacy curl fallback.
  • references/examples.md — (1) single page → markdown, (2) batch a list of URLs with parallelism cap, (3) paginated listing, (4) block-page recovery.


Security check

  • ath: Safe
  • socket: Safe (warnings: 0, score: 90)
  • snyk: Medium
  • zeroleaks: Safe (score: 93)

How to use this skill

1. Install scrape by running npx skills add brightdata/skills --skill scrape in your project directory. The skill file is downloaded from GitHub and placed in your project.

2. No configuration is required. Your AI agent (Claude Code, Cursor, Windsurf, etc.) detects installed skills automatically and uses them as context when generating code.

3. The skill improves your agent's understanding of scrape and helps it follow established patterns, avoid common mistakes, and produce production-ready code.

What you get

Skills are plain-text instruction files, not executable code. They encode expert knowledge about frameworks, languages, or tools, which your AI agent reads to improve its output. That means zero runtime overhead, no dependency conflicts, and full transparency: you can read and review every instruction before installing.

Compatibility

This skill works with any AI coding agent that supports the skills.sh format, including Claude Code (Anthropic), Cursor, Windsurf, Cline, Aider, and other tools that read project-level context files. Skills are framework-agnostic at the transport level; the content determines which language or framework they apply to.

Data sourced from the skills.sh registry and GitHub. Install counts and security audits are updated regularly.
