MiniCPM5-1B, Trained with RL+OPD, Achieves State-of-the-Art (SOTA) Performance on Multiple Complex Tasks; the CHI-Bench Dataset for Evaluating Medical Agents, Designed for Automating Complex Healthcare Processes, Has Been Released

MiniCPM5-1B is an open-source language model with 1 billion parameters, designed for edge deployment and resource-constrained scenarios. It is the first model in the MiniCPM5 series. Based on the standard Llama architecture, it introduces features including... a hybrid inference paradigm based on tags. The model also uses RL+OPD training to improve core performance while eliminating redundant output. It natively supports ultra-long contexts of up to 131K characters and achieves state-of-the-art (SOTA) results among 1B-scale open-source models on complex tasks such as agent invocation and code synthesis. Because it runs locally, it avoids the latency and privacy trade-offs of cloud-based inference, making it a good fit for building an efficient local AI platform.

The HyperAI website now features "MiniCPM5-1B: A High-Efficiency 1B LLM for Edge Applications." Give it a try!

Online use: https://go.hyper.ai/OBlhv

Visit our official website for more information:

https://hyper.ai

A quick overview of updates on the hyper.ai website from May 30th to June 5th:

* High-quality public datasets: 6

* High-quality tutorials: 5

* Community article interpretations: 1

* Popular encyclopedia entries: 5

Visit the official website: hyper.ai

Selected Public Datasets

1. chi-bench Medical Intelligent Agent Benchmark Evaluation Dataset

chi-bench is a healthcare agent evaluation dataset released by Actava AI in 2026. This dataset constructs a high-fidelity healthcare business simulation environment, integrating 20 healthcare application systems through the MCP (Model Context Protocol) open interface and providing a knowledge base containing 1,279 healthcare operation documents. The evaluation scenarios cover three major areas in the US healthcare system: pre-authorization management, citation management for health insurance/insurance providers, and population care management.

Online use: https://go.hyper.ai/j8pCr
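
To make the MCP integration mentioned in the chi-bench description more concrete, here is a minimal sketch of what a single tool call looks like on the wire. MCP messages follow JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `lookup_prior_authorization` and its `case_id` argument are hypothetical and invented for illustration; the actual tools exposed by the benchmark's 20 application systems are not described here.

```python
import json
from typing import Any, Dict


def build_mcp_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, not taken from the benchmark itself.
request = build_mcp_tool_call(
    request_id=1,
    tool_name="lookup_prior_authorization",
    arguments={"case_id": "EXAMPLE-001"},
)
print(request)
```

An agent under evaluation would send messages of this shape to each connected system and act on the structured results it receives back.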

2. SMOL Multilingual Translation Parallel Dataset

SMOL is a professional translation dataset released by Google in 2025. It contains professionally translated texts in 221 languages, including Amharic, Swahili, and Afar, along with other low-resource and regional languages for which annotated data is scarce. It covers a wide range of language pairs, combines professional translations with volunteer-contributed texts, and adds domain-specific data and factual annotations related to the medical field for some languages.

Online use: https://go.hyper.ai/84QS4

3. TACK Targeted Chimera Knowledge Base Dataset

TACK is a standardized knowledge base dataset and benchmark set released by the AI Laboratory for Molecular Engineering in 2026. It aims to address data scarcity, the lack of rigorous evaluation, and limited coverage in existing PROTAC machine learning benchmarks. It is applicable to PROTAC degradation activity prediction, targeted protein degradation (TPD) research, AI-assisted drug discovery (AIDD), computer-aided drug design (CADD), virtual drug screening, multi-task learning, molecular property prediction, graph neural network research, and machine learning benchmarking.

Online use: https://go.hyper.ai/7gDJu

4. EAVSD E-commerce Advertising Video Storyboard Dataset

EAVSD is an e-commerce advertising video storyboard dataset released by a team from Peking University in 2026. It supports subject-oriented multi-image generation and narrative planning tasks, with a core focus on e-commerce advertising video storyboard generation and research on controllable long-range visual consistency.

Online use: https://go.hyper.ai/hyzLx

5. DeepCrack Infrastructure Crack Detection Dataset

DeepCrack is a benchmark dataset for infrastructure crack detection provided by the Computer Vision and Remote Sensing Laboratory of Wuhan University. It aims to provide standardized and high-precision supervised learning data support for crack detection algorithm research. It can be directly used for training and evaluation of deep learning models such as U-Net, DeepLab, and SegNet, and is widely used in research directions such as structural health monitoring, road inspection, and building defect identification.

Online use: https://go.hyper.ai/88zlH

6. World Air Pollution and AQI Dataset

The World Air Pollution and AQI dataset is a global air quality dataset for research and data analysis. It contains monthly city-level observations from 2014 to 2025, totaling 331,920 records and covering 24 countries across 5 continents, including China, the United States, the United Kingdom, France, Germany, Japan, and South Korea. It includes 24 features spanning air pollutant concentrations, air quality index, meteorological variables, and social and environmental indicators.

Online use: https://go.hyper.ai/QL8VK
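
As a quick sanity check on the figures quoted for the World Air Pollution and AQI dataset, the sketch below divides the record count by the number of months in the stated range. It assumes that both endpoint years are fully covered and that every city has exactly one record per month. The description confirms neither assumption, so the resulting city count is only an implied estimate.

```python
# Figures quoted in the dataset description.
start_year, end_year = 2014, 2025
total_records = 331_920

# Assumption: both endpoint years are complete, one record per city per month.
months = (end_year - start_year + 1) * 12
implied_cities, remainder = divmod(total_records, months)

print(f"{months} months, {implied_cities} cities, remainder {remainder}")
# prints "144 months, 2305 cities, remainder 0"
```

The record count divides evenly, which is consistent with a complete monthly panel of roughly 2,300 cities, although the actual number of cities should be confirmed against the data itself.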

Selected Public Tutorials

1. MiniCPM5-1B: High-efficiency 1B LLM for edge-side applications

MiniCPM5-1B is the first model in the MiniCPM5 series released by the OpenBMB team. It is designed for edge deployment and resource-constrained scenarios. It adopts a dense Transformer architecture with 1B parameters and achieves state-of-the-art performance among open-source models of the same size. It is particularly strong at agentic tool calls, code generation, and challenging reasoning tasks.

Run online: https://go.hyper.ai/OBlhv

2. HiDream-O1-Image Image Generation System

HiDream-O1-Image is a native unified image generation foundation model, launched by the HiDream.ai team in 2026. The model is built on a pixel-level unified Transformer (UiT) architecture. Unlike traditional models, it does not rely on external VAEs or separate text encoders, but instead natively encodes pixels and text within a single, shared token space.

Run online: https://go.hyper.ai/XkyGK

3. X2SAM: A unified model for arbitrary image and video segmentation

X2SAM, released in April 2026 by a team from Sun Yat-sen University, Pengcheng Laboratory, and Meituan, is a multimodal large model for unified image and video segmentation. Its core feature is that it unifies text prompts, visual prompts, and image/video segmentation into a single interactive process.

Run online: https://go.hyper.ai/OAndb

4. LocateAnything-3B: A fast, high-quality visual language localization model

Released by NVIDIA in 2026, LocateAnything-3B is a 3B-parameter visual language localization model in the Eagle VLM series, designed for tasks such as open object detection, point expression localization, OCR text localization, GUI element localization, and pointing in images and videos. The core feature of this model is Parallel Box Decoding: it predicts complete bounding box coordinates as structured blocks in parallel, rather than generating coordinates through token-by-token autoregression, thereby improving localization throughput while maintaining geometric consistency.

Run online: https://go.hyper.ai/DxUFC
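
The throughput argument behind Parallel Box Decoding in LocateAnything-3B can be illustrated with a toy comparison of sequential decoding steps. This is a simplified sketch, not NVIDIA's implementation: the predictor functions are hypothetical stand-ins for a model, and the only thing being counted is how many sequential steps each strategy needs to produce the same boxes.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_autoregressive(
    predict_coord: Callable[[int, int], float], num_boxes: int
) -> Tuple[List[Box], int]:
    """Emit one coordinate per sequential step: 4 steps per box."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        coords = []
        for c in range(4):
            coords.append(predict_coord(b, c))
            steps += 1
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def decode_box_blocks(
    predict_box: Callable[[int], Box], num_boxes: int
) -> Tuple[List[Box], int]:
    """Emit all 4 coordinates of a box as one block: 1 step per box."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        boxes.append(predict_box(b))
        steps += 1
    return boxes, steps


# Hypothetical toy predictors standing in for a real model.
def toy_coord(box_index: int, coord_index: int) -> float:
    return float(10 * box_index + coord_index)


def toy_box(box_index: int) -> Box:
    return (
        toy_coord(box_index, 0),
        toy_coord(box_index, 1),
        toy_coord(box_index, 2),
        toy_coord(box_index, 3),
    )


ar_boxes, ar_steps = decode_autoregressive(toy_coord, num_boxes=3)
pb_boxes, pb_steps = decode_box_blocks(toy_box, num_boxes=3)
print(ar_steps, pb_steps)  # prints "12 3"
```

Both strategies return identical boxes in this toy setting; the block-wise variant simply needs a quarter of the sequential steps, which is the intuition behind the throughput gain described above.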

5. Granite 4.1 8B: Supports dialogue, coding, RAG, and tool calls

Granite 4.1 language models are a new generation of open-source foundation models launched by IBM in 2026, comprising dense decoder architectures at three scales: 3B, 8B, and 30B. Granite 4.1 8B, the high-performance version in this series, delivers the performance required for enterprise applications while keeping a lightweight parameter count. The model natively supports multilingual use, a wide range of coding tasks, Retrieval-Augmented Generation (RAG), tool usage, and structured JSON output.

Run online: https://go.hyper.ai/Fpzl7

Community Article Interpretation

1. The National University of Singapore proposes an AI-computational chemistry collaborative process to accelerate the repositioning of drugs for diabetic wound healing, reducing the R&D cycle by over 70%!

A research team at the National University of Singapore has proposed a collaborative computational nanomedicine research process that combines artificial intelligence and computational chemistry (AI-CC). This process deeply couples literature mining driven by large language models (qualitative insight) with multi-stage molecular simulation dominated by computational chemistry (quantitative verification), constructing a closed-loop research system for drug-protein nano-interactions and accelerating the repositioning and development of drugs for diabetic wound healing.

View the full report: https://go.hyper.ai/OXs3N

Popular Encyclopedia Articles

1. World Action Model (WAM)

2. Visual Language Action Model (VLA)

3. Human-in-the-loop

4. Learning While Deploying

5. Reciprocal Rank Fusion

We have compiled hundreds of AI-related terms to help you understand artificial intelligence:

https://go.hyper.ai/wiki

That's all for this week's editor's selection. If you have resources you would like to see included on the hyper.ai official website, feel free to leave a message or submit an article!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 2100+ public datasets

* Included 700+ classic and popular online tutorials

* Analyzed 300+ AI4Science paper cases

* Supported search for 700+ related terms

* Hosted the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey:

https://hyper.ai