universal-pdf-vision-parser OpenClaw Skill

Extract multilingual document content and language learning notes (French, German, Japanese, Spanish, etc.) from PDFs using multimodal vision (Qwen-VL-Max)....

v1.0.0 Updated 1 mo ago

Installation

clawhub install universal-pdf-vision-parse

Requires npm i -g clawhub

View on ClawHub Download .zip

333

Downloads

0

Stars

6

current installs

7 all-time

1

Versions

Universal PDF Vision Parser Skill

Version: 0.1

This skill is a high-end multilingual document digitizer. It uses multimodal vision to 'look' at each PDF page, making it perfect for language learning notes, bilingual documents, and complex layouts that standard OCR fails to capture.

Prerequisites

DashScope API Key: A valid key from Alibaba Cloud Bailian with qwen-vl-max access.
Environment:

pip install pymupdf dashscope

Usage

Basic Command

            python scripts/vision_parse.py --pdf <path_to_pdf> --out <path_to_output.md> --api-key <YOUR_API_KEY> --max-pages 2
          

--max-pages: (Optional) Max pages to process. Defaults to 2. Set to -1 for all pages.

Agentic Workflow

Visual Scanning: Converts PDF pages to 300 DPI PNGs.
Expert Transcription: Qwen-VL-Max identifies the language and transcribes terms, translations, and explanations.
Markdown Structuring: Automatically formats content with bold keywords, italicized meanings, and clean tables.

Examples

User: "Convert this German-Chinese note to markdown: notes.pdf"

Agent Action:

python scripts/vision_parse.py --pdf notes.pdf --out notes.md

Statistics

Downloads 333

Stars 0

Current installs 6

All-time installs 7

Versions 1

Comments 0

Created Mar 3, 2026

Updated Mar 3, 2026

Author

M Z

@mingensiie

Latest Changes

v1.0.0 · Mar 3, 2026

Universal PDF Vision Parser Skill 1.0.0 - Initial release of a high-end, multilingual PDF digitizer for language learning documents. - Uses multimodal vision (Qwen-VL-Max) to extract and structure content from complex layouts into Markdown. - Supports multiple languages including French, German, Japanese, and Spanish. - Converts PDF pages to high-resolution images for accurate text parsing and formatting. - Perfect for extracting language notes, bilingual documents, and hard-to-capture formats.

Quick Install

clawhub install universal-pdf-vision-parse

Related Skills

Other popular skills you might find useful.

Agent Browser

MaTriXy

Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection

69.1k 248 v0.1.0

Browser Automation

peytoncasper

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.

31.8k 46 v1.0.1

Code

Iván

Coding workflow with planning, implementation, verification, and testing for clean software development.

18.2k 35 v1.0.4

Agent Browser - Stagehand

peytoncasper

Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications.

6.3k 4 v1.0.0

Browse all skills →

Made in Europe

Chat with 100+ AI Models in one App.

Use Claude, ChatGPT, Gemini alongside with EU-Hosted Models like Deepseek, GLM-5, Kimi K2.5 and many more.

Start for free View pricing