SAM3: Visual Segmentation Model
1. Tutorial Introduction

SAM3 is an advanced computer vision model released by Meta AI in November 2025. It can detect, segment, and track objects in images and videos from text prompts, image exemplars, and visual prompts such as points and boxes. It accepts open-vocabulary phrases as input, supports cross-modal interaction, and lets users correct segmentation results interactively. According to the paper, SAM3 doubles the accuracy of existing systems on promptable concept segmentation in both images and videos, and it supports zero-shot use. The model extends to 3D reconstruction, supporting applications such as home previews, creative video editing, and scientific research. The related research paper is "SAM 3: Segment Anything with Concepts".
This tutorial runs on a single RTX 5090 graphics card by default; a single RTX 4090 is the minimum required to start it. Three examples are provided for testing: Image Segmentation, Video Text Prompting, and Video Point/Box Prompting. The model only supports English input.
2. Demo Results


3. Operation Steps
1. Start the container

2. Usage steps
If the page shows "Bad Gateway", the model is still initializing. Because the model is large, wait about 2-3 minutes and then refresh the page.
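Instead of refreshing by hand, the wait can be scripted. The following is a minimal sketch, assuming the demo page returns HTTP 502 (Bad Gateway) while the model loads and HTTP 200 once it is ready. The helper names `probe` and `wait_until_ready` and the default timings are illustrative and not part of the tutorial.

```python
import time
import urllib.error
import urllib.request


def probe(url):
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Raised for non-2xx answers, e.g. 502 while the model is still loading.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_until_ready(url, probe_fn=probe, timeout_s=300.0, interval_s=10.0,
                     sleep_fn=time.sleep):
    """Poll url until it answers 200. Return True when ready, False on timeout."""
    waited = 0.0
    while True:
        if probe_fn(url) == 200:
            return True
        if waited >= timeout_s:
            return False
        sleep_fn(interval_s)
        waited += interval_s
```

Call `wait_until_ready(...)` with the address of your running container. The probe and sleep functions are parameters so the loop can be exercised without a live server.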
1. Image Segmentation

Specific parameters:
- Text Prompt: Enter a short English phrase describing the objects to segment.
- Detection Threshold: The higher the threshold, the fewer targets are detected.
- Mask Threshold: The cutoff used to turn the predicted mask into a binary mask. The higher the threshold, the tighter and cleaner the generated mask boundaries.
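How the two thresholds typically act can be shown with a small sketch. It assumes the conventional behavior: each detection carries a confidence score, each mask is a grid of per-pixel probabilities, and both thresholds are simple cutoffs. The demo's internal implementation is not documented here, so the function names and data layout below are hypothetical.

```python
def filter_detections(detections, detection_threshold):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d["score"] >= detection_threshold]


def binarize_mask(mask_probs, mask_threshold):
    """Turn per-pixel probabilities into a 0/1 mask using the threshold."""
    return [[1 if p > mask_threshold else 0 for p in row] for row in mask_probs]


# Illustrative values, not real model output.
detections = [
    {"label": "dog", "score": 0.9},
    {"label": "dog", "score": 0.6},
    {"label": "dog", "score": 0.3},
]
mask_probs = [
    [0.2, 0.55, 0.9],
    [0.4, 0.7, 0.95],
]

kept_low = filter_detections(detections, 0.5)   # two detections survive
kept_high = filter_detections(detections, 0.8)  # only the most confident one
mask_low = binarize_mask(mask_probs, 0.5)       # larger foreground region
mask_high = binarize_mask(mask_probs, 0.8)      # smaller, more conservative region
```

Raising either value trades recall for precision: fewer objects pass the detection cutoff, and fewer uncertain edge pixels are included in each mask.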
2. Video Text Prompting

Specific parameters:
- Text Prompt(s): Enter the English phrase(s) describing the objects to segment and track.
- Propagate across video: Click this button to track the target through the video.
3. Video Point/Box Prompting

Specific parameters:
- Object ID: The ID of the target object that the prompt applies to.
- Point label:
  - positive: The clicked point belongs to the target object and should be included in the mask.
  - negative: The clicked point does not belong to the target object (it is background or another object) and should be excluded from the mask.
- Clear old inputs for this object: Whether to discard the prompts previously entered for this object before adding the new one.
- Prompt type:
  - Points: Click on the image to place point prompts.
  - Boxes: Draw a box around the target to select it.
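The parameters above can be pictured as a small per-object prompt record. The sketch below is a hypothetical illustration, not the demo's actual code: the class names are invented, and the label encoding (1 for positive, 0 for negative) is an assumption borrowed from common point-prompt conventions.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectPrompts:
    """Prompts collected for one object ID (hypothetical structure)."""
    object_id: int
    points: list = field(default_factory=list)  # (x, y, label); 1 = positive, 0 = negative
    boxes: list = field(default_factory=list)   # (x_min, y_min, x_max, y_max)


class PromptStore:
    """Keeps prompts per object and models 'Clear old inputs for this object'."""

    def __init__(self):
        self._objects = {}

    def _get(self, object_id, clear_old):
        # Start from an empty record when asked to clear or when the ID is new.
        if clear_old or object_id not in self._objects:
            self._objects[object_id] = ObjectPrompts(object_id)
        return self._objects[object_id]

    def add_point(self, object_id, x, y, positive=True, clear_old=False):
        prompts = self._get(object_id, clear_old)
        prompts.points.append((x, y, 1 if positive else 0))
        return prompts

    def add_box(self, object_id, x0, y0, x1, y1, clear_old=False):
        prompts = self._get(object_id, clear_old)
        # Normalize so the box is valid whichever corner was dragged first.
        prompts.boxes.append((min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)))
        return prompts


store = PromptStore()
store.add_point(1, 120, 80, positive=True)        # include this location
after_points = store.add_point(1, 30, 40, positive=False)  # exclude this location
points_snapshot = list(after_points.points)
after_box = store.add_box(1, 200, 150, 50, 60, clear_old=True)  # replaces earlier clicks
other = store.add_point(2, 5, 5)                  # a second object keeps its own prompts
```

Keeping prompts keyed by object ID is what lets several targets be refined independently before propagation.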

Citation Information
The citation information for this project is as follows:
@misc{carion2025sam3segmentconcepts,
title={SAM 3: Segment Anything with Concepts},
author={Nicolas Carion and Laura Gustafson and Yuan-Ting Hu and Shoubhik Debnath and Ronghang Hu and Didac Suris and Chaitanya Ryali and Kalyan Vasudev Alwala and Haitham Khedr and Andrew Huang and Jie Lei and Tengyu Ma and Baishan Guo and Arpit Kalla and Markus Marks and Joseph Greer and Meng Wang and Peize Sun and Roman Rädle and Triantafyllos Afouras and Effrosyni Mavroudi and Katherine Xu and Tsung-Han Wu and Yu Zhou and Liliane Momeni and Rishi Hazra and Shuangrui Ding and Sagar Vaze and Francois Porcher and Feng Li and Siyuan Li and Aishwarya Kamath and Ho Kei Cheng and Piotr Dollár and Nikhila Ravi and Kate Saenko and Pengchuan Zhang and Christoph Feichtenhofer},
year={2025},
eprint={2511.16719},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.16719},
}