Referring Expression Segmentation On A2D
Evaluation Metrics
AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
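The page does not define these metrics, so the following is only a minimal sketch of how the IoU-based ones are commonly computed for referring segmentation benchmarks. It assumes that IoU mean averages per-sample IoU, that IoU overall divides total intersection by total union across all samples, and that Precision@K is the fraction of samples whose IoU exceeds K. The function names, the nested-list mask format, and the handling of empty masks are hypothetical choices made for illustration. AP is omitted.

```python
from typing import Dict, Iterable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # hypothetical format: rows of 0/1 pixels


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersecting and union pixels of two equally sized binary masks."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def evaluate(
    pairs: Iterable[Tuple[Mask, Mask]],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, object]:
    """Compute IoU mean, IoU overall, and Precision@K over (pred, gt) pairs."""
    ious = []
    total_inter = total_union = 0
    for pred, gt in pairs:
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: a sample with empty prediction and empty ground truth scores 0.
        ious.append(inter / union if union else 0.0)
    if not ious:
        raise ValueError("no samples to evaluate")
    n = len(ious)
    return {
        "iou_mean": sum(ious) / n,
        "iou_overall": total_inter / total_union if total_union else 0.0,
        "precision": {t: sum(iou > t for iou in ious) / n for t in thresholds},
    }


if __name__ == "__main__":
    samples = [
        ([[1, 1], [0, 0]], [[1, 1], [0, 0]]),  # perfect match
        ([[1, 0], [0, 0]], [[1, 1], [1, 1]]),  # one of four pixels recovered
    ]
    print(evaluate(samples))
```

Because IoU overall pools pixels before dividing, large objects weigh more heavily in it than in IoU mean, which is one reason the two columns can rank models differently.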
Evaluation Results
Performance of each model on this benchmark
| Model | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Paper Title |
|---|---|---|---|---|---|---|---|---|---|
| SOC (Video-Swin-B) | 0.573 | 0.725 | 0.807 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| SgMg (Video-Swin-B) | 0.585 | 0.720 | 0.799 | 0.843 | 0.822 | 0.767 | 0.617 | 0.259 | Spectrum-guided Multi-granularity Referring Video Object Segmentation |
| ReferFormer (Video-Swin-B) | 0.550 | 0.703 | 0.786 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 | Language as Queries for Referring Video Object Segmentation |
| SOC (Video-Swin-T) | 0.504 | 0.669 | 0.747 | 0.79 | 0.756 | 0.687 | 0.535 | 0.195 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| ClawCraneNet | - | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation |
| MTTR (w=10) | 0.461 | 0.64 | 0.72 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| MANET | 0.471 | 0.632 | 0.726 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 | Multi-Attention Network for Compressed Video Referring Object Segmentation |
| MTTR (w=8) | 0.447 | 0.618 | 0.702 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| RefVOS | - | 0.599 | 0.599 | 0.495 | - | - | - | 0.064 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation |
| VLIDE | 0.469 | 0.598 | 0.714 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation |
| Locater | 0.465 | 0.597 | 0.69 | 0.709 | 0.64 | 0.525 | 0.351 | 0.101 | Local-Global Context Aware Transformer for Language-Guided Video Segmentation |
| CMPC-V (I3D) | 0.404 | 0.573 | 0.653 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 | Cross-Modal Progressive Comprehension for Referring Segmentation |
| Hui et al. | 0.399 | 0.561 | 0.662 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation |
| mmmmtbvs | 0.419 | 0.558 | 0.673 | 0.645 | 0.597 | 0.523 | 0.375 | 0.13 | Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation |
| AAMN | 0.396 | 0.552 | 0.617 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 | Actor and Action Modular Network for Text-based Video Segmentation |
| CMDy | 0.333 | 0.531 | 0.623 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries |
| PRPE | 0.388 | 0.529 | 0.661 | 0.634 | 0.579 | 0.483 | 0.322 | 0.083 | Polar Relative Positional Encoding for Video-Language Segmentation |
| HINet | - | 0.529 | 0.679 | 0.611 | 0.559 | 0.486 | 0.342 | 0.12 | Hierarchical interaction network for video object segmentation from referring expressions |
| CMPC-V (R2D) | 0.351 | 0.515 | 0.649 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 | Cross-Modal Progressive Comprehension for Referring Segmentation |
| RefVOS | - | 0.497 | 0.672 | 0.578 | 0.534 | 0.456 | 0.311 | 0.093 | Hierarchical interaction network for video object segmentation from referring expressions |
Referring Expression Segmentation On A2D | SOTA | HyperAI超神经