Junjie Ke Qifei Wang Yilin Wang Peyman Milanfar Feng Yang

Abstract
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this constraint, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this problem, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
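
The abstract does not spell out how the hash-based 2D spatial embedding works. The sketch below shows one plausible reading: each patch position in an image's native patch grid is mapped to a cell of a fixed-size table of embeddings, so images with different sizes and aspect ratios can share the same table. The function names, the `grid_size` parameter, and the integer scaling rule are assumptions made for illustration, not details confirmed by this page.

```python
def hashed_position(row, col, n_rows, n_cols, grid_size):
    """Map a patch at (row, col) in an n_rows x n_cols patch grid
    to a cell of a fixed grid_size x grid_size table.

    Assumption: positions are scaled proportionally, so the same
    relative location lands in the same cell for any image size.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("patch position outside the patch grid")
    t_row = row * grid_size // n_rows
    t_col = col * grid_size // n_cols
    return t_row, t_col


def embedding_index(row, col, n_rows, n_cols, grid_size):
    """Flatten the hashed cell into a single row index of the table."""
    t_row, t_col = hashed_position(row, col, n_rows, n_cols, grid_size)
    return t_row * grid_size + t_col


# A wide image (7 x 12 patches) and a small square one (3 x 3 patches)
# both index into the same hypothetical 10 x 10 table.
wide_corner = hashed_position(6, 11, 7, 12, 10)
small_corner = hashed_position(2, 2, 3, 3, 10)
```

Under this reading, a learnable table with `grid_size * grid_size` rows would be looked up with `embedding_index`, and a separate per-scale embedding would distinguish the native-resolution patches from those of each resized copy.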
Benchmarks
| Benchmark | Methodology | KLCC | PLCC | SROCC | Type |
|---|---|---|---|---|---|
| image-quality-assessment-on-msu-nr-vqa | MUSIQ | 0.7433 | 0.9068 | 0.9004 | |
| video-quality-assessment-on-msu-sr-qa-dataset | MUSIQ trained on KONIQ | 0.51897 | 0.59151 | 0.64589 | NR |
| video-quality-assessment-on-msu-sr-qa-dataset | MUSIQ trained on AVA | 0.44669 | 0.52404 | 0.56152 | NR |
| video-quality-assessment-on-msu-sr-qa-dataset | MUSIQ trained on SPAQ | 0.52673 | 0.60216 | 0.64927 | NR |
| video-quality-assessment-on-msu-sr-qa-dataset | MUSIQ trained on PaQ-2-PiQ | 0.55312 | 0.66531 | 0.67746 | NR |
| video-quality-assessment-on-msu-video-quality | MUSIQ | 0.7433 | 0.9068 | 0.9004 | NR |