Referring Expression Segmentation On A2D
Metrics
AP
IoU mean
IoU overall
Precision@0.5
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9
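The page lists these metrics without defining them. As a rough guide, the sketch below follows the definitions commonly used for A2D Sentences evaluation: IoU mean averages the per-sample IoU, IoU overall divides the total intersection by the total union across all samples, precision at a threshold is the fraction of samples whose IoU exceeds it, and AP averages precision over thresholds from 0.50 to 0.95 in steps of 0.05. The function names, the flattened-list mask representation, the strict `>` comparison, and the handling of empty unions are assumptions made for illustration; the papers listed here may differ in such details.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask, 1 = foreground (illustrative format)


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and in the union of two binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def precision_at(ious: List[float], threshold: float) -> float:
    """Fraction of samples whose IoU exceeds the threshold (strict > is an assumption)."""
    return sum(1 for iou in ious if iou > threshold) / len(ious)


def evaluate(preds: Sequence[Mask], gts: Sequence[Mask]) -> Dict[str, float]:
    """Compute the leaderboard metrics for paired predicted and ground-truth masks."""
    pairs = [intersection_union(p, g) for p, g in zip(preds, gts)]
    # Assumption: a sample with an empty union scores an IoU of 0.
    ious = [i / u if u else 0.0 for i, u in pairs]
    total_inter = sum(i for i, _ in pairs)
    total_union = sum(u for _, u in pairs)

    results = {
        "iou_mean": sum(ious) / len(ious),
        "iou_overall": total_inter / total_union if total_union else 0.0,
    }
    for t in (0.5, 0.6, 0.7, 0.8, 0.9):
        results[f"precision@{t}"] = precision_at(ious, t)
    # AP: mean precision over thresholds 0.50, 0.55, ..., 0.95.
    ap_thresholds = [0.5 + 0.05 * k for k in range(10)]
    results["ap"] = sum(precision_at(ious, t) for t in ap_thresholds) / len(ap_thresholds)
    return results


if __name__ == "__main__":
    preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
    gts = [[1, 1, 1, 0], [1, 0, 0, 0]]
    for name, value in evaluate(preds, gts).items():
        print(f"{name}: {value:.3f}")
```

Because IoU overall pools pixels across samples, large objects weigh more heavily in it than in IoU mean, which is one reason the two columns can rank models differently.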
Results
Performance results of various models on this benchmark
| Model Name | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Paper Title |
|---|---|---|---|---|---|---|---|---|---|
| SOC (Video-Swin-B) | 0.573 | 0.725 | 0.807 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| SgMg (Video-Swin-B) | 0.585 | 0.720 | 0.799 | 0.843 | 0.822 | 0.767 | 0.617 | 0.259 | Spectrum-guided Multi-granularity Referring Video Object Segmentation |
| ReferFormer (Video-Swin-B) | 0.550 | 0.703 | 0.786 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 | Language as Queries for Referring Video Object Segmentation |
| SOC (Video-Swin-T) | 0.504 | 0.669 | 0.747 | 0.79 | 0.756 | 0.687 | 0.535 | 0.195 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation |
| ClawCraneNet | - | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation |
| MTTR (w=10) | 0.461 | 0.64 | 0.72 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| MANET | 0.471 | 0.632 | 0.726 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 | Multi-Attention Network for Compressed Video Referring Object Segmentation |
| MTTR (w=8) | 0.447 | 0.618 | 0.702 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 | End-to-End Referring Video Object Segmentation with Multimodal Transformers |
| RefVOS | - | 0.599 | 0.599 | 0.495 | - | - | - | 0.064 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation |
| VLIDE | 0.469 | 0.598 | 0.714 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation |
| Locater | 0.465 | 0.597 | 0.69 | 0.709 | 0.64 | 0.525 | 0.351 | 0.101 | Local-Global Context Aware Transformer for Language-Guided Video Segmentation |
| CMPC-V (I3D) | 0.404 | 0.573 | 0.653 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 | Cross-Modal Progressive Comprehension for Referring Segmentation |
| Hui et al. | 0.399 | 0.561 | 0.662 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation |
| mmmmtbvs | 0.419 | 0.558 | 0.673 | 0.645 | 0.597 | 0.523 | 0.375 | 0.13 | Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation |
| AAMN | 0.396 | 0.552 | 0.617 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 | Actor and Action Modular Network for Text-based Video Segmentation |
| CMDy | 0.333 | 0.531 | 0.623 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries |
| PRPE | 0.388 | 0.529 | 0.661 | 0.634 | 0.579 | 0.483 | 0.322 | 0.083 | Polar Relative Positional Encoding for Video-Language Segmentation |
| HINet | - | 0.529 | 0.679 | 0.611 | 0.559 | 0.486 | 0.342 | 0.12 | Hierarchical interaction network for video object segmentation from referring expressions |
| CMPC-V (R2D) | 0.351 | 0.515 | 0.649 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 | Cross-Modal Progressive Comprehension for Referring Segmentation |
| RefVOS | - | 0.497 | 0.672 | 0.578 | 0.534 | 0.456 | 0.311 | 0.093 | Hierarchical interaction network for video object segmentation from referring expressions |