Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye; Jing Zhang, Senior Member, IEEE; Juhua Liu, Member, IEEE; Chenyu Liu; Baocai Yin; Cong Liu; Bo Du, Senior Member, IEEE; Dacheng Tao, Fellow, IEEE

Abstract
The Segment Anything Model (SAM), a profound vision foundation model pretrained on a large-scale dataset, breaks the boundaries of general segmentation and sparks various downstream applications. This paper introduces Hi-SAM, a unified model leveraging SAM for hierarchical text segmentation. Hi-SAM excels in segmentation across four hierarchies, including pixel-level text, word, text-line, and paragraph, while realizing layout analysis as well. Specifically, we first turn SAM into a high-quality pixel-level text segmentation (TS) model through a parameter-efficient fine-tuning approach. We use this TS model to iteratively generate the pixel-level text labels in a semi-automatic manner, unifying labels across the four text hierarchies in the HierText dataset. Subsequently, with these complete labels, we launch the end-to-end trainable Hi-SAM based on the TS architecture with a customized hierarchical mask decoder. During inference, Hi-SAM offers both automatic mask generation (AMG) mode and promptable segmentation (PS) mode. In the AMG mode, Hi-SAM segments pixel-level text foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing. As for the PS mode, Hi-SAM provides word, text-line, and paragraph masks with a single point click. Experimental results show the state-of-the-art performance of our TS model: 84.86% fgIOU on Total-Text and 88.96% fgIOU on TextSeg for pixel-level text segmentation. Moreover, compared to the previous specialist for joint hierarchical detection and layout analysis on HierText, Hi-SAM achieves significant improvements: 4.73% PQ and 5.39% F1 on the text-line level, 5.49% PQ and 7.39% F1 on the paragraph level layout analysis, requiring 20× fewer training epochs. The code is available at https://github.com/ymy-k/Hi-SAM.
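
To make two ideas from the abstract concrete, the sketch below computes a foreground IoU between two binary text masks and picks point prompts from a predicted foreground mask, as the AMG mode is described as doing. This is only an illustration, not the authors' implementation: the function names `fg_iou` and `sample_foreground_points` are hypothetical, the uniform grid sampling is an assumed strategy, and the benchmark figures may aggregate fgIOU differently (for example over a whole dataset rather than per image).

```python
from typing import List, Sequence, Tuple

# 2-D binary mask: 1 = text pixel, 0 = background
Mask = Sequence[Sequence[int]]


def fg_iou(pred: Mask, target: Mask) -> float:
    """Foreground IoU: |pred AND target| / |pred OR target| over text pixels."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def sample_foreground_points(mask: Mask, stride: int) -> List[Tuple[int, int]]:
    """Return (x, y) grid points, `stride` pixels apart, that land on text pixels."""
    points = []
    for y in range(0, len(mask), stride):
        for x in range(0, len(mask[y]), stride):
            if mask[y][x]:
                points.append((x, y))
    return points
```

In a pipeline of this shape, each sampled point would then be passed as a prompt to the hierarchical mask decoder, which is also what a single user click does in the PS mode.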
Benchmarks
| Benchmark | Methodology | F-score (stroke) | F-score (word) | F-score (text-line) | F-score (para., layout) | F-score (average) |
|---|---|---|---|---|---|---|
| hierarchical-text-segmentation-on-hiertext | Hi-SAM | 83.36 | 82.86 | 85.30 | 75.97 | 81.87 |
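
The reported average F-score appears consistent with the unweighted mean of the four per-level F-scores. That is an observation from the numbers in the benchmark row, not a definition stated on this page; the short check below reproduces the arithmetic.

```python
# Per-level F-scores copied from the Hi-SAM benchmark row.
f_scores = {
    "stroke": 83.36,
    "word": 82.86,
    "text-line": 85.30,
    "para., layout": 75.97,
}

# Assumption: the "average" entry is a plain unweighted mean of the four levels.
mean_f = sum(f_scores.values()) / len(f_scores)
print(f"{mean_f:.2f}")
```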