#601

Globales Ranking · von 601 Skills

generate image AI Agent Skill

Quellcode ansehen: b-open-io/gemskills

Medium

Installation

npx skills add b-open-io/gemskills --skill generate image

Installationen

Generate Image

Generate images using Nano Banana Pro (gemini-3-pro-image-preview).

When to Use

Use this skill when the user asks to:

Generate an image from a text prompt
Create artwork, illustrations, or graphics
Generate variations of an existing image (img2img)
Create scenes with multiple reference images (character + location + other characters)

Style Selection

If the user hasn't specified a style, present a multi-choice question before proceeding:

How would you like to handle the art style?

Pick a style - Browse 167 styles visually and choose one

Let me choose - I'll suggest a style based on your prompt

What are styles? - Learn how the style system works

No style - Generate without a specific art style

Use the AskUserQuestion tool to present this choice.

If the user picks "Pick a style", launch the style picker (pick mode is the default):

STYLE_JSON=$(bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/preview_server.ts)

The picker opens a browser. The user clicks a style, views the detail modal, clicks "Select This Style". STYLE_JSON receives the selection as JSON and the server exits. Parse the id field and pass it via --style <id> to the generate command.

If the user picks "Let me choose", select the most fitting style based on their prompt and explain why. Use the style suggestions in the "Suggest Styles to Users" section below.

If the user picks "What are styles?", explain: there are 167 curated art styles across 10 categories (traditional, digital, illustration, photography, design, retro, technique, decorative, creative, cultural). Each style includes prompt hints and a visual reference tile that gets sent to Gemini for better adherence. Then re-ask the style question.

If the user already specified a style (e.g. "generate in watercolor style"), skip this step entirely and use --style directly.

Gather Design Direction First

Ask clarifying questions before rewriting prompts to understand the user's intent:

Purpose: What is the image for? (banner, logo, social media, product shot, art piece)
Color palette: Any brand colors? Dark/light theme? Specific mood?
Composition constraints: Aspect ratio needs? Text overlay space? Full-bleed vs bordered?
Key elements: What must be included? What should be avoided?

Simple requests like "make a cat image" can proceed with sensible defaults. Complex requests like "create a banner for my app" require clarification to avoid iteration waste.

Prompt Rewriting (Critical)

Before generating any image, always rewrite the user's prompt using the guide in references/prompt-guide.md.

The core principle: "Describe the scene, don't just list keywords."

Rewriting Checklist

Transform simple prompts by adding:

Subject details: Specific appearance, clothing, expression, materials
Environment/Setting: Location, time of day, weather, indoor/outdoor
Lighting: Natural/artificial, direction, quality, color temperature
Composition: Camera angle, distance, framing, focal length
Style/Aesthetic: Photorealistic, illustration style, art movement
Mood/Atmosphere: Emotional tone, color palette mood
Technical specs: Aspect ratio, level of detail, textures

Example Transformation

User says: "banner for my app"
Rewritten prompt: "A modern, full-bleed banner for a technology application. Dark gradient background transitioning from deep navy to black. Abstract geometric network visualization with glowing nodes connected by thin lines. Clean sans-serif typography positioned left of center. Professional, tech-forward aesthetic with no visible borders or edges - the design extends seamlessly to all edges. 16:9 aspect ratio."

User says: "a cat"
Rewritten prompt: "A fluffy orange tabby cat lounging on a sun-drenched windowsill, soft afternoon light creating a warm glow on its fur. The cat is in a relaxed pose with half-closed eyes, conveying contentment. Shot from a low angle with shallow depth of field, the background showing a blurred garden view. Photorealistic, warm color palette."

Recommended Workflow

Draft → Iterate → Final approach saves time and API costs:

Draft Phase (1K): Generate quickly at default resolution to test prompts
Iteration Phase: Refine prompts incrementally, creating new files each attempt
Final Phase (4K): Only produce high-res output after prompt is validated

Do not read the image back - the script outputs only the file path. This allows efficient iteration without Claude loading large image data.

Usage

bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "prompt" [options]

Options

--input <path> - Reference image (can specify multiple times, up to 14 images)
--style <id> - Apply style from the style library (see browsing-styles skill)
--size <1K|2K|4K> - Image size (default: 1K for fast drafts)
--aspect <ratio> - Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4
--negative <prompt> - Negative prompt (what to avoid)
--count <n> - Number of images (1-4, default: 1)
--guidance <n> - Guidance scale
--seed <n> - Random seed for reproducibility
--output <path> - Output path
--model <name> - gemini (default) or grok (Grok Imagine Image via Replicate)

Examples

# Simple generation
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "cyberpunk cityscape at night"

# With art style (100+ available, use short names or full IDs)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "mountain landscape" --style impressionism
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "portrait" --style ukiy
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "city street" --style noir

# High-res with specific aspect ratio
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "mountain landscape" --size 4K --aspect 16:9

# With negative prompt
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "portrait of a cat" --negative "low quality, blurry"

# Combine style with other options
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "cat sleeping" --style wtrc --size 4K --aspect 1:1

# Generate multiple variations
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "abstract art" --count 4

# Single reference image (img2img)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "make it look like a watercolor painting" --input photo.jpg

# Multiple reference images (character consistency, scene composition)
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "King on throne in dramatic lighting" \
  --input character.png \
  --input throne-room.png \
  --aspect 16:9 --size 2K

# Multiple characters in a scene
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "Two warriors facing each other in combat" \
  --input warrior1.png \
  --input warrior2.png \
  --input battlefield.png \
  --aspect 16:9

Multiple Reference Images

Gemini supports up to 14 reference images per request:

6 objects - Locations, items, environments
5 humans - Character consistency across generations
Use detailed prompts describing how reference images should be combined
Reference images help maintain consistency across scene generations

Common Formats

Format	Aspect Ratio	Recommended Size	Crop Target
OG / Social Share	16:9	2K	1200x630 via sips
Twitter Card	16:9	2K	1200x628 via sips
Hero Banner	16:9	4K	Full width
Profile Picture	1:1	1K	512x512 or 1024x1024
Story / Reel	9:16	2K	1080x1920
Team Lineup	21:9	2K	Full width

Social Share Image Workflow

All 4 steps are required. Do not skip optimization.

Generate at 16:9 with center-weighted prompt at --size 2K (retina-ready):

bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "CENTER all important elements, no content near edges. [your prompt]" --aspect 16:9 --size 2K --output social-share.png

Crop to exact platform dimensions:

sips -z 630 1200 -c 630 1200 social-share.png --out social-share-og.png   # OG
sips -z 628 1200 -c 628 1200 social-share.png --out social-share-tw.png   # Twitter

Optimize - convert to JPEG at 85% quality (social images never need transparency):

sips -s format jpeg -s formatOptions 85 social-share-og.png --out social-share-og.jpg

Verify dimensions and file size:

sips -g pixelWidth -g pixelHeight social-share-og.jpg
ls -la social-share-og.jpg  # Target: <500KB

Why optimize? Gemini outputs PNG (~1-2MB). JPEG at 85% reduces to ~200-400KB with no visible quality loss. Social platforms re-compress uploads, so starting with a well-optimized JPEG prevents double-compression artifacts.

Available Styles

100+ styles across 9 categories: traditional art, digital, illustration, photography, design/UI, retro, techniques, decorative, and creative/material-based.

When a style has a tile reference image (assets/tiles/<id>.png), it is automatically sent as a visual reference to Gemini alongside the prompt hints — this produces much better style adherence.

Suggest Styles to Users

When generating images, proactively suggest relevant styles:

For landscapes: impr, roma, wtrl, cine
For portraits: barq, prer, anim, fnoi
For UI/web: brut, nbrt, glas, flat
For retro/vintage: y2k, frug, psyc, koda
For whimsical/fun: ghbl, kawa, chld, brit
For dramatic: cybr, nnoi, surr, expr
For creative: sand, undr, orig, clay, lego

Browse all: bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/list_styles.ts --table
Search: bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/browsing-styles/scripts/list_styles.ts --search "watercolor"

Models

Gemini (default)

Uses gemini-3-pro-image-preview - Nano Banana Pro, Google's professional image generation model with thinking capabilities.

Grok Imagine Image (`--model grok`)

Uses xai/grok-imagine-image via Replicate. Requires REPLICATE_API_TOKEN (or REPLICATE_API_KEY). Text-to-image only (no reference images or style tiles). This is a last-resort fallback — Gemini produces better results including likeness. Only use when:

Content is blocked by Gemini's safety filters
The user specifically requests it

# Grok image generation
bun run --cwd ${CLAUDE_PLUGIN_ROOT} ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.ts "A medieval girl standing on a futuristic hoverpod" --model grok --output hoverpod.jpg

Last verified: March 2026. If a newer generation exists, STOP and suggest a PR to b-open-io/gemskills. See the ask-gemini skill's references/gemini-api.md for current models and Google's official gemini-api-dev skill for the canonical source.

Reference Files

For detailed prompting strategies and techniques:

references/prompt-guide.md - Comprehensive Gemini prompting guide with 7 strategies, example transformations, and best practices from Google's official documentation
ask-gemini skill's references/gemini-api.md - Current Gemini models, SDK info, and dynamic documentation via llms.txt

Installationen

Installationen 26

Globales Ranking #601 von 601

Sicherheitsprüfung

ath Medium

socket Safe

Warnungen: 0 Bewertung: 90

snyk Low

Quellcode ansehen

b-open-io/gemskills

Mehr aus dieser Quelle

EU-Hosted Inference API

Power your AI Agents with
the best open-source models.

Drop-in OpenAI-compatible API. No data leaves Europe.

Explore Inference API

GLM

GLM 5

$1.00 / $3.20

per M tokens

Kimi

Kimi K2.5

$0.60 / $2.80

per M tokens

MiniMax

MiniMax M2.5

$0.30 / $1.20

per M tokens

Qwen

Qwen3.5 122B

$0.40 / $3.00

per M tokens

So verwenden Sie diesen Skill

Install generate image by running npx skills add b-open-io/gemskills --skill generate image in your project directory. Führen Sie den obigen Installationsbefehl in Ihrem Projektverzeichnis aus. Die Skill-Datei wird von GitHub heruntergeladen und in Ihrem Projekt platziert.

Keine Konfiguration erforderlich. Ihr KI-Agent (Claude Code, Cursor, Windsurf usw.) erkennt installierte Skills automatisch und nutzt sie als Kontext bei der Code-Generierung.

Der Skill verbessert das Verständnis Ihres Agenten für generate image, und hilft ihm, etablierte Muster zu befolgen, häufige Fehler zu vermeiden und produktionsreifen Code zu erzeugen.

Was Sie erhalten

Skills sind Klartext-Anweisungsdateien — kein ausführbarer Code. Sie kodieren Expertenwissen über Frameworks, Sprachen oder Tools, das Ihr KI-Agent liest, um seine Ausgabe zu verbessern. Das bedeutet null Laufzeit-Overhead, keine Abhängigkeitskonflikte und volle Transparenz: Sie können jede Anweisung vor der Installation lesen und prüfen.

Kompatibilität

Dieser Skill funktioniert mit jedem KI-Coding-Agenten, der das skills.sh-Format unterstützt, einschließlich Claude Code (Anthropic), Cursor, Windsurf, Cline, Aider und anderen Tools, die projektbezogene Kontextdateien lesen. Skills sind auf Transportebene framework-agnostisch — der Inhalt bestimmt, für welche Sprache oder welches Framework er gilt.

Data sourced from the skills.sh registry and GitHub. Install counts and security audits are updated regularly.

Made in Europe

Chat with 100+ AI Models in one App.

Use Claude, ChatGPT, Gemini alongside with EU-Hosted Models like Deepseek, GLM-5, Kimi K2.5 and many more.

Start for free View pricing

App herunterladen:

generate image AI Agent Skill

Generate Image

When to Use

Style Selection

Gather Design Direction First

Prompt Rewriting (Critical)

Rewriting Checklist

Example Transformation

Recommended Workflow

Usage

Options

Examples

Multiple Reference Images

Common Formats

Social Share Image Workflow

Available Styles

Suggest Styles to Users

Models

Gemini (default)

Grok Imagine Image (--model grok)

Reference Files

Installationen

Sicherheitsprüfung

Quellcode ansehen

Power your AI Agents with the best open-source models.

So verwenden Sie diesen Skill

Was Sie erhalten

Kompatibilität

Chat with 100+ AI Models in one App.

Grok Imagine Image (`--model grok`)

Power your AI Agents with
the best open-source models.