#601 · Global ranking of 601 skills

rag-implementer AI Agent Skill

View source: oakoss/agent-skills


Installation

npx skills add oakoss/agent-skills --skill rag-implementer

34 installations

RAG Implementer

Build production-ready retrieval-augmented generation systems. RAG = Retrieval + Context Assembly + Generation. Use RAG when LLMs need access to fresh, domain-specific, or proprietary knowledge not in their training data. Do not use RAG when simpler alternatives (FAQ pages, keyword search, semantic search) suffice. For KB architecture selection and governance, use the knowledge-base-manager skill. For knowledge graph implementation, use the knowledge-graph-builder skill.

Overview

Before building RAG, validate the need: try FAQ pages, keyword search, concierge MVP, or simple semantic search first. Only proceed with RAG for 50k+ documents with validated user demand and $200-500/month budget. RAG systems range from Naive (prototype) through Advanced (production) to Modular (enterprise), each tier adding complexity and cost.

The RAG pipeline has three core stages. First, retrieval finds relevant documents using hybrid search (semantic + keyword). Second, context assembly ranks, deduplicates, and compresses retrieved chunks into an optimal prompt. Third, generation produces a grounded response with source attribution. Each stage has distinct failure modes: retrieval can miss relevant documents (low recall), context assembly can overwhelm the model (lost in the middle), and generation can hallucinate despite good context (low faithfulness).
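The three stages can be sketched end to end. This is a toy illustration, not a production implementation: the retriever scores by word overlap instead of embeddings, and the final LLM call is stubbed out. All function names here are hypothetical.

```python
def retrieve(query, corpus, top_k=3):
    """Stage 1 (toy): score documents by word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def assemble_context(chunks, max_chars=1000):
    """Stage 2: deduplicate and concatenate chunks into a bounded context."""
    seen, parts, total = set(), [], 0
    for chunk in chunks:
        if chunk in seen or total + len(chunk) > max_chars:
            continue
        seen.add(chunk)
        parts.append(chunk)
        total += len(chunk)
    return "\n---\n".join(parts)

def answer(query, corpus):
    """Stage 3: build a grounded prompt (the actual LLM call is stubbed)."""
    context = assemble_context(retrieve(query, corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in a real system, send this prompt to the LLM
```

A real pipeline swaps the overlap scorer for hybrid vector + BM25 retrieval and adds re-ranking between stages 1 and 2, but the data flow stays the same.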

Modern RAG extends beyond basic vector similarity. Hybrid search combining dense embeddings with sparse BM25 is now the baseline. Re-ranking with cross-encoders improves precision after initial retrieval. Contextual chunking and late chunking preserve document-level semantics that fixed-size chunking loses. GraphRAG enables multi-hop reasoning over entity relationships by building knowledge graphs from documents. Proposition chunking breaks documents into atomic facts for precise retrieval of individual claims.

Choose techniques based on your query complexity and document structure. Start with hybrid search and re-ranking as the foundation, then layer contextual chunking, GraphRAG, or query expansion as needed. Measure everything: Precision@K, Recall@K, faithfulness, and end-to-end latency. The difference between a good and bad chunking strategy alone can create a 9% gap in recall performance.
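Precision@K and Recall@K are simple to compute once you have labeled relevant documents per query. A minimal sketch (variable names are illustrative):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0
```

Track both: high precision with low recall means the retriever finds only easy matches; high recall with low precision means context assembly has to filter noise.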

Quick Reference

| Phase | Goal | Key Actions |
| --- | --- | --- |
| 1. Knowledge Base Design | Structured knowledge foundation | Map sources, define chunking, add metadata |
| 2. Embedding Strategy | Semantic understanding | Select model, benchmark on domain data |
| 3. Vector Store | Scalable storage | Choose DB, configure index, plan scaling |
| 4. Retrieval Pipeline | Beyond simple similarity | Hybrid retrieval, query enhancement, re-ranking |
| 5. Context Assembly | Optimal LLM context | Rank, synthesize, compress, mitigate "lost in the middle" |
| 6. Evaluation | Measure performance | Precision@K, Recall@K, faithfulness, latency |
| 7. Production Deploy | Enterprise reliability | Containerize, cache, graceful degradation, security |
| 8. Continuous Improvement | Ongoing enhancement | Auto-updates, fine-tuning, optimization |
Decision Options

| Decision | Options |
| --- | --- |
| Vector DB (managed) | Pinecone |
| Vector DB (self-hosted) | Weaviate, Qdrant |
| Vector DB (lightweight) | Chroma |
| Vector DB (existing Postgres) | pgvector |
| Vector DB (billion-scale) | Milvus / Zilliz |
| Embedding (general) | text-embedding-3-large (3072 dim) |
| Embedding (cost-optimized) | text-embedding-3-small (1536 dim) |
| Embedding (code) | Voyage Code 3 |
| Embedding (multilingual) | multilingual-e5-large, Cohere embed-v4 |
| Chunking (fixed) | 500-1000 tokens, 50-100 overlap |
| Chunking (semantic) | Paragraph/section/topic boundaries |
| Chunking (recursive) | Markdown headers, code blocks |
| Chunking (contextual) | LLM-generated summaries prepended to each chunk |
| Chunking (late) | Full-document embedding, then pool by chunk boundaries |
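Fixed-size chunking with overlap (500-1000 tokens, 50-100 overlap) is the simplest strategy to implement. A minimal sketch, operating on a pre-tokenized list; in practice you would tokenize with your embedding model's tokenizer:

```python
def chunk_fixed(tokens, size=500, overlap=50):
    """Split a token sequence into fixed-size chunks with overlap,
    so context spanning a boundary appears in two adjacent chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

Overlap trades storage for recall: boundary-spanning sentences survive, at the cost of some duplicated content in the index.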
| Cost Tier | Time | Monthly Cost | Scale |
| --- | --- | --- | --- |
| Naive RAG (prototype) | 1-2 weeks | $50-150 | <10k documents |
| Advanced RAG (production) | 3-4 weeks | $200-500 | 10k-1M documents |
| Modular RAG (enterprise) | 6-8 weeks | $500-2000+ | 1M+ documents |
| Advanced Technique | When to Use |
| --- | --- |
| Hybrid search | Always -- combine semantic + keyword (BM25) for better recall |
| Re-ranking | When initial retrieval returns noisy results |
| Contextual retrieval | Documents with ambiguous references or pronouns |
| Late chunking | Efficiency-focused pipelines with anaphoric references |
| GraphRAG | Multi-hop reasoning over structured knowledge relationships |
| Proposition chunking | Fact-dense documents requiring atomic retrieval units |
| Query expansion / HyDE | Queries that are short, ambiguous, or under-specified |
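One common way to merge semantic and keyword result lists for hybrid search is reciprocal rank fusion (RRF), which needs only the rank positions, not comparable scores. A toy sketch, assuming each retriever returns an ordered list of document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(doc) = sum over lists of 1 / (k + rank).
    k=60 is the conventional damping constant from the RRF literature."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear high in both lists rise to the top; documents found by only one retriever still survive, which is why fusion improves recall over either method alone.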

Common Mistakes

| Mistake | Correct Pattern |
| --- | --- |
| Building RAG before validating user need | Try simpler alternatives first (FAQ, keyword search, concierge MVP); only build RAG with validated demand |
| Using a single retrieval method (semantic only) | Implement hybrid retrieval combining semantic search with keyword (BM25) for better recall |
| Dumping all available data into the knowledge base | Curate data sources carefully; filter noise, select authoritative content, and maintain quality |
| Ignoring the "lost in the middle" problem | Place critical information at the start and end of context; compress mid-section |
| Skipping evaluation metrics before production | Establish baselines for Precision@K, Recall@K, faithfulness, and hallucination rate before deploying |
| Using text-embedding-3-large at full 3072 dimensions without benchmarking | Test at reduced dimensions (1024 or 1536) first -- often comparable accuracy at lower cost |
| Fixed-size chunking for all document types | Match chunking strategy to document structure; use semantic or recursive chunking for structured content |
| Ignoring metadata filtering | Attach rich metadata (source, date, category) and filter before or during vector search |
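The "lost in the middle" mitigation (critical information at the start and end of context) can be implemented as a simple reordering of rank-sorted chunks. A minimal sketch of one common interleaving scheme:

```python
def reorder_for_long_context(chunks_ranked):
    """Given chunks sorted best-first, place the strongest chunks at the
    start and end of the context, pushing weaker ones toward the middle,
    where LLM attention is weakest ("lost in the middle" mitigation)."""
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked):
        if i % 2 == 0:
            front.append(chunk)        # ranks 1, 3, 5, ... fill the start
        else:
            back.insert(0, chunk)      # ranks 2, 4, 6, ... fill the end
    return front + back
```

After reordering, the top-ranked chunk opens the context and the second-ranked chunk closes it, matching the U-shaped attention pattern observed in long-context LLMs.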

Embedding Model Notes

text-embedding-3-large (3072 dimensions) remains OpenAI's most capable embedding model. It supports Matryoshka dimensionality reduction via the dimensions API parameter -- 1024 dimensions often delivers near-full accuracy at one-third storage cost. text-embedding-3-small (1536 dimensions) is a cost-effective alternative at $0.02 per million tokens. For code search, Voyage Code 3 outperforms general-purpose models. For multilingual workloads, consider multilingual-e5-large or Cohere embed-v4. Always benchmark on your domain data; general benchmarks do not predict domain-specific performance.
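The `dimensions` parameter performs Matryoshka-style reduction: keep the leading components and re-normalize to unit length. A minimal sketch of that operation in plain Python, useful for shrinking already-stored embeddings without re-embedding:

```python
import math

def truncate_embedding(vec, dims):
    """Matryoshka-style reduction: keep the first `dims` components
    and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]
```

Because Matryoshka-trained models concentrate information in the leading dimensions, the truncated vector retains most retrieval accuracy; still, benchmark the reduced dimension on your own domain data before committing.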

Vector Store Notes

Pinecone for managed simplicity, Weaviate or Qdrant for self-hosted with hybrid search, Chroma for prototyping, pgvector for teams already on PostgreSQL (practical limit around 10-100M vectors), and Milvus/Zilliz for billion-scale deployments. Choose index type based on tradeoffs: HNSW for speed (higher memory), IVF for scale (requires training), flat for exact search on small datasets only.

Most vector databases now achieve 10-100ms query latency on 1-10M vector datasets. Start with the simplest option that fits your scale requirements and migrate only when you hit concrete performance limits.
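For small datasets, exact (flat) search is the simplest starting point and the correctness baseline for any approximate index. A brute-force cosine-similarity sketch (the dict-based index is illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flat_search(query_vec, index, top_k=2):
    """Exact nearest-neighbor search: compare against every stored vector.
    Fine for small collections; switch to HNSW/IVF indexes at scale."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]
```

Run flat search alongside your approximate index on a sample of queries; the overlap between the two result sets is the index's recall, the metric that tells you when HNSW/IVF tuning is losing relevant documents.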

Delegation

  • Discover data sources and assess knowledge base quality: Use Explore agent to catalog documents, evaluate data freshness, and identify authoritative content
  • Implement retrieval pipeline with hybrid search and re-ranking: Use Task agent to build embedding, indexing, retrieval, and evaluation components
  • Design RAG architecture and vector store topology: Use Plan agent to select embedding models, vector databases, chunking strategies, and deployment architecture

For KB architecture selection, curation workflows, and governance, use the knowledge-base-manager skill. For knowledge graph implementation (ontology, entity extraction, graph databases), use the knowledge-graph-builder skill.


Security Check

ath Safe
socket Safe
Warnings: 0 · Rating: 90
snyk Medium

How to Use This Skill

1. Install rag-implementer by running npx skills add oakoss/agent-skills --skill rag-implementer in your project directory. The skill file is downloaded from GitHub and placed in your project.

2. No configuration required. Your AI agent (Claude Code, Cursor, Windsurf, etc.) detects installed skills automatically and uses them as context during code generation.

3. The skill improves your agent's understanding of rag-implementer, helping it follow established patterns, avoid common mistakes, and produce production-ready code.

What You Get

Skills are plain-text instruction files, not executable code. They encode expert knowledge about frameworks, languages, or tools, which your AI agent reads to improve its output. That means zero runtime overhead, no dependency conflicts, and full transparency: you can read and review every instruction before installing.

Compatibility

This skill works with any AI coding agent that supports the skills.sh format, including Claude Code (Anthropic), Cursor, Windsurf, Cline, Aider, and other tools that read project-level context files. Skills are framework-agnostic at the transport level; the content determines which language or framework it applies to.

Data sourced from the skills.sh registry and GitHub. Install counts and security audits are updated regularly.
