📄 Dolphin: a new multimodal system from ByteDance for recognizing complex documents in images
Dolphin is an advanced model capable of analyzing scanned and photographed documents, recognizing not only text but also tables, formulas, and charts.
It is ideal for automating the reading and structuring of PDF files, scanned reports, and scientific publications.
How it works:
1️⃣ Initial page analysis: the model identifies the layout elements on the page in the order a person would read them, preserving the natural reading order.
2️⃣ Content processing: paragraphs, tables, mathematical formulas, and other components are parsed in parallel, using built-in prompts designed specifically for each type of element.
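The two stages above can be sketched in a few lines of Python. This is only an illustration of the analyze-then-parse idea, not Dolphin's actual code: the `Element` class, the prompt texts, and the functions `analyze_layout`, `parse_element`, and `parse_page` are all hypothetical stand-ins, and the layout is hard-coded where a real model would predict it from the image.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    kind: str    # e.g. "paragraph", "table", "formula"
    order: int   # position in the natural reading order
    bbox: tuple  # (x0, y0, x1, y1) region on the page


# Hypothetical prompt texts, one per element type (not the model's real prompts).
PROMPTS = {
    "paragraph": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Transcribe the formula in this region.",
}


def analyze_layout(page):
    """Stage 1 stand-in: a real model would predict these from the page image."""
    return [
        Element("table", 1, (0, 120, 800, 400)),
        Element("paragraph", 0, (0, 0, 800, 100)),
        Element("formula", 2, (0, 420, 800, 480)),
    ]


def parse_element(page, element):
    """Stage 2 stand-in: choose the prompt that matches the element type."""
    prompt = PROMPTS.get(element.kind, PROMPTS["paragraph"])
    return {"order": element.order, "kind": element.kind, "prompt": prompt}


def parse_page(page):
    elements = analyze_layout(page)
    # Elements do not depend on each other, so they can be parsed concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda e: parse_element(page, e), elements))
    # Put the parsed elements back into reading order.
    return sorted(results, key=lambda r: r["order"])
```

Calling `parse_page(None)` returns the three stub elements sorted by reading order, each paired with the prompt for its type. In a real system, stage 2 would send the cropped region and the prompt to the model instead of returning the prompt itself.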
System structure:
• Visual encoder: Swin Transformer
• Text decoder: MBart
• Control: via prompts
Key features:
• Page-by-page processing, which makes large documents manageable
• Accurate parsing of individual elements, such as tables or charts
• High accuracy and fast processing
• Open MIT license
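Page-by-page processing, the first feature above, might look roughly like this in Python. This is a sketch under assumptions: `parse_document` is a hypothetical helper, and the page source and per-page parser are placeholders for whatever image loader and model call are actually used.

```python
from typing import Callable, Iterable, Iterator


def parse_document(pages: Iterable, parse_page: Callable) -> Iterator[dict]:
    """Parse pages one at a time, yielding each result as soon as it is ready."""
    for number, page in enumerate(pages, start=1):
        yield {"page": number, "content": parse_page(page)}


# Stand-in page source and parser, for illustration only.
fake_pages = (f"image-{i}" for i in range(1, 4))
results = list(parse_document(fake_pages, lambda page: f"parsed {page}"))
```

Because both the page source and the function are lazy, only one page needs to be held in memory at a time, which is what keeps long documents tractable.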
To get started:
```bash
git clone https://github.com/ByteDance/Dolphin.git
cd Dolphin
```
Available on GitHub and Hugging Face (HF), with a demo version.
The model simplifies work with many types of documents by making automatic extraction and structuring of information easier and faster.
