
Abstract

Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e. MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves performance comparable to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
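The abstract states only that two learnable tokens (MAA and MAF) enable adaptive fusion over dynamic modality combinations; it does not describe their internals. As a rough illustration of why a learnable token suits a variable set of inputs, the following Python sketch uses a single hypothetical fusion query that attends over whichever modality features are present. The function names (`fuse_modalities`, `softmax`), the toy vectors, and the single-query attention-pooling design are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features: Dict[str, Vector], fusion_query: Vector) -> Vector:
    """Attention-pool any non-empty subset of per-modality feature vectors.

    Hypothetical sketch, NOT MMPedestron's MAA/MAF design: one learnable query
    vector (`fusion_query`) scores each available modality feature with scaled
    dot-product attention, and the fused feature is the weighted sum. A missing
    modality is simply absent from `features`, so RGB-only, RGB+IR,
    RGB+Depth, ... all go through the same code path.
    """
    if not features:
        raise ValueError("at least one modality feature is required")
    dim = len(fusion_query)
    for name, vec in features.items():
        if len(vec) != dim:
            raise ValueError(f"{name}: expected dim {dim}, got {len(vec)}")

    names = sorted(features)  # deterministic modality order
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * x for q, x in zip(fusion_query, features[n])) for n in names]
    weights = softmax(scores)  # one weight per available modality, summing to 1

    fused = [0.0] * dim
    for weight, name in zip(weights, names):
        for i, x in enumerate(features[name]):
            fused[i] += weight * x
    return fused


if __name__ == "__main__":
    query = [0.5, -0.2, 0.1, 0.3]  # toy values; would be learned in training
    rgb = [1.0, 0.0, 0.5, 0.2]
    ir = [0.2, 0.9, 0.1, 0.4]
    print(fuse_modalities({"rgb": rgb}, query))            # one modality: returns rgb unchanged
    print(fuse_modalities({"rgb": rgb, "ir": ir}, query))  # convex mix of rgb and ir
```

Because the query attends over however many modality features arrive, adding or dropping a sensor changes only the number of attention candidates, not the interface. The sketch pools one vector per modality for simplicity; a transformer-style encoder would typically apply the same idea to sequences of tokens.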

Benchmark	Methodology	Metrics
multispectral-object-detection-on-flir-1	MMPedestron	mAP50: 86.4%
object-detection-on-crowdhuman-full-body	MMPedestron	AP: 97.1; mMR: 30.8 (lower is better)
object-detection-on-eventped	MMPedestron	AP: 79.0
object-detection-on-inoutdoor	MMPedestron	AP: 65.7
object-detection-on-stcrowd	MMPedestron	AP: 74.9
pedestrian-detection-on-llvip	MMPedestron	AP: 72.6
pedestrian-detection-on-mmpd-dataset	MMPedestron	box mAP: 79.0

When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu