HyperAI

AI Paper Weekly Report | De Novo Protein Design / First Open-Source Agent Solution / HunyuanOCR / Olmo 3 Language Model... One-Click Overview

Multimodal large language models (MLLMs) hold great potential for human-like interaction, but their development faces a key challenge: the lack of a fine-grained evaluation framework for human-centered scenarios, one that measures both a model's ability to understand complex human intentions and its ability to provide empathetic, context-aware feedback.

To address this, a research team from Xi'an Jiaotong University, in collaboration with Ant Group, proposed HumanSense, a comprehensive benchmark for evaluating the human-centered perception and interaction capabilities of MLLMs, with a particular focus on deep understanding of extended multimodal contexts and the formulation of reasonable responses. The results show that MLLMs still have significant room for improvement in human-centered scenarios, especially in tasks oriented towards high-level interaction. The researchers also designed a multi-stage, modality-progressive reinforcement learning method to build HumanSense-Omni-Reasoning, which significantly improves performance on high-level understanding and interaction tasks.

Paper link: https://go.hyper.ai/xYM02

Latest AI Papers: https://go.hyper.ai/hzChC

To help more users keep up with the latest developments in artificial intelligence research, HyperAI's official website (hyper.ai) has launched a "Latest Papers" section, which is updated daily with cutting-edge AI research papers. Here are 5 popular AI papers we recommend; let's take a quick look at this week's cutting-edge AI achievements ⬇️

This week's paper recommendations

1. JAM-2

Title: JAM-2: Fully computational design of drug-like antibodies with high success rates

This paper introduces JAM-2, a general-purpose de novo protein design system that, for the first time, efficiently designs both VHH-Fc antibodies and full-length monoclonal antibodies (mAbs) with drug-like affinity and developability, reaching double-digit success rates across an unprecedented breadth of targets and epitopes. Across 16 unseen targets, JAM-2 obtained binders for every target, with average success rates of 39% (VHH-Fc) and 18% (mAb).

Paper link: https://go.hyper.ai/3Mfna

JAM-2 designs antibodies with drug-like affinity against previously unseen targets, achieving double-digit binding rates.

2. Olmo 3

This paper introduces Olmo 3, an industry-leading family of fully open-source language models at the 7B and 32B parameter scales. Olmo 3 models are built for long-context reasoning, function calling, coding, instruction following, general dialogue, and knowledge retrieval. The release includes the complete model flow, covering the entire lifecycle of the model family from build to deployment: every training stage, checkpoint, data point, and dependency.

Paper link: https://go.hyper.ai/HgvWV

Model Workflow Diagram

3. Lumine

Title: Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

This paper proposes Lumine, the first open-source recipe for building general-purpose agents that can carry out hours-long complex tasks in real time in complex 3D open-world environments. Lumine adopts a human-like interaction paradigm, unifying perception, reasoning, and action end to end within a vision-language model. It processes raw pixel input at 5 frames per second, generates precise keyboard and mouse actions at 30 frames per second, and adaptively invokes reasoning only when necessary.
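The two rates in this description fit together with simple arithmetic: if frames arrive 5 times per second and actions go out 30 times per second, each observed frame must cover 30 / 5 = 6 action slots. The Python sketch below illustrates such a loop under stated assumptions. It is a toy simulation, not Lumine's code: the `run_loop` function, the `Step` record, the scheme of emitting one 6-action chunk per frame, and the `should_reason` predicate (a stand-in for the model's own decision about when to reason) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

PERCEPTION_HZ = 5   # frames observed per second (figure from the report)
ACTION_HZ = 30      # keyboard/mouse actions per second (figure from the report)
ACTIONS_PER_FRAME = ACTION_HZ // PERCEPTION_HZ  # 30 / 5 = 6 action slots per frame


@dataclass
class Step:
    frame: int               # index of the observed frame
    time_s: float            # capture time of the frame, in seconds
    reasoned: bool           # whether the (hypothetical) reasoning path ran
    action_ticks: List[int]  # global indices of the 30 Hz action slots filled


def run_loop(num_frames: int, should_reason: Callable[[int], bool]) -> List[Step]:
    """Toy perceive / (maybe) reason / act loop. Hypothetical, NOT Lumine's code."""
    steps: List[Step] = []
    for frame in range(num_frames):
        # Assumption: an external predicate decides when to reason; in Lumine
        # this decision is made by the model itself.
        reasoned = should_reason(frame)
        # Assumption: each frame yields one chunk of ACTIONS_PER_FRAME actions,
        # which fills the 30 Hz action stream until the next frame arrives.
        start = frame * ACTIONS_PER_FRAME
        ticks = list(range(start, start + ACTIONS_PER_FRAME))
        steps.append(Step(frame, frame / PERCEPTION_HZ, reasoned, ticks))
    return steps
```

For example, `run_loop(10, lambda f: f % 5 == 0)` simulates two seconds of play: 10 frames and 60 action slots, with reasoning triggered only on frames 0 and 5. Decoupling the rates this way is one plausible way to keep control smooth while the more expensive perception and occasional reasoning run less often.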

Paper link: https://go.hyper.ai/6qg4A

Model Overview

4. HumanSense

Title: HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs

This paper proposes HumanSense, a comprehensive benchmark for evaluating the human-centered perception and interaction capabilities of MLLMs, with a particular focus on deep understanding of long-term multimodal contexts and the generation of reasonable responses. The evaluation shows that current leading MLLMs still have significant room for improvement in high-level interaction tasks. The paper also designs a multi-stage, modality-progressive reinforcement learning approach to build the HumanSense-Omni-Reasoning model, which significantly improves performance on high-level understanding and interaction tasks.

Paper link: https://go.hyper.ai/xYM02

HumanSense is organized as a hierarchical benchmark.

5. HunyuanOCR Technical Report

This paper proposes HunyuanOCR, a commercial-grade, open-source, lightweight (1B-parameter) vision-language model (VLM) for OCR tasks. Its architecture consists of a native Vision Transformer (ViT) and a lightweight large language model (LLM), connected via an MLP adapter. HunyuanOCR outperforms existing commercial APIs, traditional OCR pipelines, and larger models such as Qwen3-VL-4B.
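To make the "ViT + MLP adapter + LLM" layout concrete, here is a minimal pure-Python sketch of the adapter's job: project each visual token from the ViT's feature width into the LLM's embedding width, so that the LLM can consume visual and text tokens as a single sequence. Everything specific here is an assumption for illustration, not HunyuanOCR's actual design: the two-layer Linear-GELU-Linear shape, the tiny dimensions, the random weights, the choice to place visual tokens before text tokens, and the hypothetical `MLPAdapter` and `build_llm_input` names.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gelu(x: float) -> float:
    # Exact GELU, computed with the Gauss error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(tokens: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    # tokens: [n, d_in], weight: [d_in, d_out], bias: [d_out] -> [n, d_out]
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) + bias[j] for j in range(len(bias))]
        for t in tokens
    ]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random init; a trained model would use learned weights instead.
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class MLPAdapter:
    """Hypothetical two-layer MLP mapping ViT features into the LLM embedding space."""

    def __init__(self, d_vision: int, d_llm: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = random_matrix(d_vision, d_llm, rng)
        self.b1 = [0.0] * d_llm
        self.w2 = random_matrix(d_llm, d_llm, rng)
        self.b2 = [0.0] * d_llm

    def __call__(self, vision_tokens: Matrix) -> Matrix:
        hidden = [[gelu(v) for v in row] for row in linear(vision_tokens, self.w1, self.b1)]
        return linear(hidden, self.w2, self.b2)


def build_llm_input(visual_embeds: Matrix, text_embeds: Matrix) -> Matrix:
    # Assumed ordering: projected visual tokens first, then the text prompt tokens.
    return visual_embeds + text_embeds
```

For example, `MLPAdapter(d_vision=8, d_llm=4)` maps three 8-dimensional patch features to three 4-dimensional tokens, and `build_llm_input` appends two 4-dimensional text embeddings to give a 5-token sequence. The toy version only demonstrates the shape contract (`n` tokens of width `d_vision` in, `n` tokens of width `d_llm` out); a real model would use learned tensors of far larger width.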

Paper link: https://go.hyper.ai/KxstF

Model architecture diagram

That's all for this week's paper recommendations. For more cutting-edge AI research papers, visit the “Latest Papers” section of hyper.ai’s official website.

We also welcome research teams to submit high-quality results and papers to us. Interested teams can add NeuroStar on WeChat (WeChat ID: Hyperai01).

See you next week!