Referring Expression Segmentation on J-HMDB
Evaluation Metrics

The benchmark reports the following mask-level metrics; a computation sketch follows the list.
AP
IoU mean
IoU overall
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Evaluation Results

Performance of each model on this benchmark:
| Model | AP | IoU mean | IoU overall | [email protected] | [email protected] | [email protected] | [email protected] | [email protected] | Paper |
|---|---|---|---|---|---|---|---|---|---|
| SgMg (Video-Swin-B) | 0.450 | 0.725 | 0.737 | 0.972 | 0.917 | 0.714 | 0.225 | 0.003 | Spectrum-guided Multi-granularity Referring Video Object Segmentation |
| SOC (Video-Swin-B) | 0.446 | 0.723 | 0.736 | 0.969 | 0.914 | 0.711 | 0.213 | 0.001 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| SOC (Video-Swin-T) | 0.397 | 0.701 | 0.707 | 0.947 | 0.864 | 0.627 | 0.179 | 0.001 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| MTTR (w=10) | 0.392 | 0.698 | 0.701 | 0.939 | 0.852 | 0.616 | 0.166 | 0.001 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| MTTR (w=8) | 0.366 | 0.679 | 0.674 | 0.91 | 0.815 | 0.57 | 0.144 | 0.001 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| ClawCraneNet | - | 0.655 | 0.644 | 0.880 | 0.796 | 0.566 | 0.147 | 0.002 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation |
| VLIDE | 0.441 | 0.666 | 0.68 | 0.874 | 0.791 | 0.586 | 0.182 | 0.30 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation |
| HINet | - | 0.627 | 0.652 | 0.819 | 0.736 | 0.542 | 0.168 | 0.4 | Hierarchical interaction network for video object segmentation from referring expressions |
| CMPC-V | 0.342 | 0.617 | 0.616 | 0.813 | 0.657 | 0.371 | 0.07 | 0.000 | Cross-Modal Progressive Comprehension for Referring Segmentation |
| Hui et al. | 0.335 | 0.604 | 0.598 | 0.783 | 0.639 | 0.378 | 0.076 | 0.000 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation |
| AAMN | 0.321 | 0.576 | 0.583 | 0.773 | 0.627 | 0.360 | 0.044 | 0.000 | Actor and Action Modular Network for Text-based Video Segmentation |
| CMSA+CFSA | - | 0.581 | 0.628 | 0.764 | 0.625 | 0.389 | 0.09 | 0.001 | Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network |
| ACGA | 0.289 | 0.584 | 0.576 | 0.756 | 0.564 | 0.287 | 0.034 | 0.000 | Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query |
| CMDy | 0.301 | 0.576 | 0.554 | 0.742 | 0.587 | 0.316 | 0.047 | 0.000 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries |
| RefVOS | - | 0.568 | 0.606 | 0.731 | 0.62 | 0.392 | 0.088 | 0.0 | Hierarchical interaction network for video object segmentation from referring expressions |
| Gavrilyuk et al. (Optical flow) | 0.267 | 0.570 | 0.555 | 0.712 | 0.518 | 0.264 | 0.030 | 0.000 | Actor and Action Video Segmentation from a Sentence |
| Gavrilyuk et al. | 0.233 | 0.542 | 0.541 | 0.699 | 0.460 | 0.173 | 0.014 | 0.000 | Actor and Action Video Segmentation from a Sentence |
| VT-Capsule | 0.261 | 0.550 | 0.535 | 0.677 | 0.513 | 0.283 | 0.051 | 0.000 | Visual-Textual Capsule Routing for Text-Based Video Segmentation |
| Hu et al. | 0.178 | 0.528 | 0.546 | 0.633 | 0.350 | 0.085 | 0.002 | 0.000 | Segmentation from Natural Language Expressions |
| Li et al. | 0.173 | 0.491 | 0.529 | 0.578 | 0.335 | 0.103 | 0.060 | 0.000 | Tracking by Natural Language Specification |
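Because Precision@K counts the share of samples whose IoU exceeds K, each row's [email protected] through [email protected] sequence should be non-increasing. A quick sanity check on a few rows copied from the table (values as listed above, using that definition) is sketched below; it flags the VLIDE and HINet [email protected] entries, which exceed their [email protected] values and so may be transcription artifacts in the source table rather than genuine scores.

```python
# Spot-check rows copied from the table above: with Precision@K defined as the share
# of samples whose IoU exceeds K, the P@0.5 ... P@0.9 sequence must be non-increasing.
rows = {
    "SgMg (Video-Swin-B)": [0.972, 0.917, 0.714, 0.225, 0.003],
    "VLIDE":               [0.874, 0.791, 0.586, 0.182, 0.30],
    "HINet":               [0.819, 0.736, 0.542, 0.168, 0.4],
}

for model, p_at_k in rows.items():
    ok = all(a >= b for a, b in zip(p_at_k, p_at_k[1:]))
    print(f"{model}: P@K non-increasing? {ok}")
# SgMg passes; the VLIDE and HINet P@0.9 entries fail the check.
```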