Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

2 months ago

Existing zero-shot text-to-speech (TTS) models typically only support a limited number of languages, ignoring a large number of low-resource languages. To overcome this limitation,Xiaomi AI Labs' next-generation Kaldi team has launched OmniVoice—a large-scale, multilingual, zero-shot TTS model that supports over 600 languages.It abandons the cumbersome traditional two-stage cascaded architecture and adopts a streamlined single-stage discrete non-autoregressive (NAR) framework to directly map text to acoustic markers. Trained on 581,000 hours of pure open-source data, OmniVoice achieves the broadest language coverage to date.

Currently, the HyperAI website has launched [the relevant section/feature].OmniVoice: Supports high-quality TTS in 600+ languagesCome and try it!

Online use:https://go.hyper.ai/BvKri

Welcome to visit our official website for more information:

https://hyper.ai

A quick overview of hyper.ai's official website updates from April 11th to April 17th:

* High-quality public datasets: 11

* A selection of high-quality tutorials: 6

* Community article analysis: 2 articles

* Popular encyclopedia entries: 5

Top conferences with April deadlines: 2

Visit the official website:hyper.ai

Selected public datasets

1. Stroke Risk Dataset

Stroke Risk is a dataset for stroke risk analysis and prediction in the healthcare context. Built upon common clinical risk factors, this dataset includes demographic information, medical history, lifestyle factors, and key health indicators. It reflects the probability of stroke occurrence under different health and lifestyle conditions, aiming to support machine learning models in predicting and analyzing stroke risk, helping to identify key influencing factors, and thereby improving early screening and prevention capabilities.

Online use:https://go.hyper.ai/6CTH5

2. ToolACE Complex Tool Learning Dialogue Dataset

ToolACE is an automated agent pipeline dataset for tool learning tasks. This dataset contains multi-step conversation examples, calling 26,507 diverse APIs. The samples are generated through multi-agent interactions and undergo a two-layer quality assurance process of rule checking and model validation. Each dialogue represents a multi-step, multi-source information retrieval and analysis task, realistically simulating tool call scenarios and providing high-value training data for LLM (Low-Level Learning).

Online use:https://go.hyper.ai/o3E12

3.CHOCLO Latin American Cultural Benchmark Dataset

The CHOCLO dataset is a benchmark dataset specifically designed to evaluate the knowledge of Latin American culture in language models. It aims to assess the accuracy of language models in representing Latin American culture, and is designed to address real-world issues such as the underestimation, omissions, and biases of Latin American culture in language models.

Online use:https://go.hyper.ai/pjVQi

4. DRACO Cross-Disciplinary In-Depth Research Benchmark Dataset

The DRACO dataset, released by the Perplexity team, is a dataset designed for evaluating complex research tasks and aims to systematically assess the comprehensive capabilities of deep research systems in terms of accuracy, completeness, and objectivity.

Online use:https://go.hyper.ai/hIWgS

5. MDPBench Multilingual Document Parsing Benchmark Dataset

MDPBench is a benchmark dataset for parsing multilingual digital and photographic documents, designed to evaluate and improve models’ ability to parse multilingual documents in real-world, complex scenarios.

Online use:https://go.hyper.ai/1Mc9a

6. World Model Bench dataset

World Model Bench is the world's first benchmark for evaluating the cognitive capabilities of world models and embodied AI systems. It aims to go beyond traditional image and video quality assessments, focusing instead on the cognitive abilities of models. This dataset is built around evaluating world model capabilities, covering three core dimensions: perception, cognition, and embodiment. It is further subdivided into 10 task categories, including environmental understanding, entity recognition and classification, and prediction-based reasoning, and includes 100 diverse scenarios designed to systematically evaluate the cognitive and decision-making abilities of models in complex environments.

Online use:https://go.hyper.ai/hY0aP

7. Credit Card Fraud Detection Dataset

Credit Card Fraud is a dataset for detecting credit card fraud in financial transaction scenarios. It aims to support machine learning models in identifying and modeling abnormal transactions, focusing on solving the problem of extreme class imbalance in financial scenarios, thereby improving the detection capabilities of models in real business environments.

Online use:https://go.hyper.ai/3d8nS

8. Spam Email Detection Dataset

The Spam Email Detection dataset is a labeled email dataset for spam detection tasks. This dataset aims to support research related to classification modeling, natural language processing, and feature engineering, and improve the model's ability to identify spam.

Online use:https://go.hyper.ai/HkpX5

9. Simple Voice Questions dataset

Simple Voice Questions is a short audio dataset released by Google. This multilingual voice dataset contains short audio questions in 17 languages from 26 regions, with a total of approximately 700 speakers. Each speaker provides up to 250 voice samples, covering multiple languages such as Arabic, English, Japanese, Korean, and Hindi, and including diverse recording conditions such as quiet environments, background voices, and traffic noise.

Online use:https://go.hyper.ai/lrKpK

10. COCO-2017-Vietnamese Vietnamese Image Detection Dataset

COCO-2017-Vietnamese is a Vietnamese localization extension dataset built upon the Common Objects in Context 2017 dataset proposed by Microsoft, and compiled and released by the AI Enthusiasm community. This dataset introduces high-quality Vietnamese translations on top of the original English image descriptions, providing a comprehensive benchmark within a bilingual framework, suitable for tasks such as image captioning and multimodal learning.

Online use:https://go.hyper.ai/VM6gY

11. GPT-5.4-step-by-step-reasoning dataset

The GPT-5.4 step-by-step-reasoning dataset is a high-density synthetic reasoning dataset designed for long-chain reasoning (CoT) modeling and complex problem-solving tasks. This dataset contains approximately 1,500 elite-level samples covering high-complexity domains such as mathematics, programming, and medicine, with task difficulty uniformly set at the "Grandmaster" and "Beyond-PhD" levels.

Online use:https://go.hyper.ai/HjJlT

Selected Public Tutorials

1. OmniVoice: Supports high-quality TTS in 600+ languages.

OmniVoice is a multilingual text-to-speech (TTS) model developed by Xiaomi AI Lab's Next-gen Kaldi team, supporting high-quality speech synthesis in over 600 languages. Based on an iterative unmasked decoding architecture, the project implements three core functions: voice cloning, voice design, and automatic voice.

Run online:https://go.hyper.ai/BvKri

2. DeepTutor Personal Learning Assistant

DeepTutor, launched in March 2026 by the Data Intelligence Lab at the University of Hong Kong, is a comprehensive AI-driven teaching system and a personal learning assistant. The project integrates four core functional modules: massive document-based knowledge Q&A, interactive learning visualization, knowledge reinforcement and practice question generation, and in-depth research and creative generation, providing learners with a one-stop intelligent learning experience.

Run online:https://go.hyper.ai/8YnI3

3. VoxCPM2 Voice Reproduction: 30+ Languages, 9 Dialects

VoxCPM2 is a 2B parameter-scale tokenizer-free text-to-speech model released by OpenBMB in April 2026. It supports 30 languages, requires no additional language tags, and covers various use cases, including generating new timbres from scratch, controlled cloning based on reference audio, extreme cloning by combining reference audio with transcribed text, and automatically adjusting tone and expressiveness based on text content. The official specifications also emphasize 48 kHz output, compatibility with 16 kHz reference audio, and context-aware expression.

Run online:https://go.hyper.ai/RLgK9

4. One-click deployment of Nemotron-Cascade-2-30B-A3B

Released by NVIDIA in March 2026, Nemotron-Cascade-2-30B-A3B is an open-source large language model with 30B MoE and approximately 3B activated parameters, trained on Nemotron-3-Nano-30B-A3B-Base. The model's core focus is providing strong inference, dialogue, code-related, and agency capabilities, while simultaneously supporting both thinking mode and instruction mode.

Run online:https://go.hyper.ai/GoEaW

5. Netflix VOID: A revolutionary video object removal technology with physical awareness.

Netflix VOID is a video editing model jointly open-sourced by the Netflix team and Sofia University in April 2026. With 5 billion parameters, the Netflix VOID model is designed to solve the physical consistency problem in film post-production, aiming to overcome the limitations of traditional video completion techniques in handling the causal logic of complex object interactions.

Run online:https://go.hyper.ai/uZoMl

6. Fun-CineForge: A unified model for zero-sample dubbing in diverse film and television scenarios

Fun-CineForge is a zero-shot film dubbing project jointly launched by the Tongyi Labs Speech Team and the University of Science and Technology of China in January 2026. The project includes an end-to-end dataset pipeline for producing large-scale dubbing datasets and a dubbing model based on a Large Multimodal Model (LMM), designed for diverse film scenarios.

Run online:https://go.hyper.ai/DyQKk

Community article interpretation

1. AI-driven de novo design of diverse small-molecule binding proteins: A South Korean team discovered a protein that can selectively recognize stress hormones.

A research team from the Department of Biological Sciences at the Korea Advanced Institute of Science and Technology (KAIST) has used deep learning-driven protein structure generation and sequence design methods to de novo design diverse small-molecule binding proteins using the NTF2-like fold as a core "universal backbone," and further transformed them into sensors similar to chemically induced dimerization (CID). The researchers successfully designed a protein capable of selectively recognizing the stress hormone cortisol and developed an artificial intelligence biosensor based on this.

View the full report:https://go.hyper.ai/FpAXm

2. A French team successfully predicted 2.39 million antiphage proteins and used a deep learning model to map bacterial antiviral immunity.

Researchers at the Pasteur Institute in France have developed and fine-tuned three complementary deep learning models for large-scale prediction of phage resistance. The ALBERT_DF model relies solely on local genomic context for inference; ESM_DF uses a protein language model to parse amino acid sequences; and GeneCLR_DF integrates sequence information with genomic context.

View the full report:https://go.hyper.ai/J5Oz3

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

2 months ago

Information

AI for Science

Artificial Intelligence

Dataset

Machine Learning

Deep Learning

Currently, the HyperAI website has launched [the relevant section/feature].OmniVoice: Supports high-quality TTS in 600+ languagesCome and try it!

Online use:https://go.hyper.ai/BvKri

Welcome to visit our official website for more information:

https://hyper.ai

A quick overview of hyper.ai's official website updates from April 11th to April 17th:

* High-quality public datasets: 11

* A selection of high-quality tutorials: 6

* Community article analysis: 2 articles

* Popular encyclopedia entries: 5

Top conferences with April deadlines: 2

Visit the official website:hyper.ai

Selected public datasets

1. Stroke Risk Dataset

Online use:https://go.hyper.ai/6CTH5

2. ToolACE Complex Tool Learning Dialogue Dataset

Online use:https://go.hyper.ai/o3E12

3.CHOCLO Latin American Cultural Benchmark Dataset

Online use:https://go.hyper.ai/pjVQi

4. DRACO Cross-Disciplinary In-Depth Research Benchmark Dataset

Online use:https://go.hyper.ai/hIWgS

5. MDPBench Multilingual Document Parsing Benchmark Dataset

Online use:https://go.hyper.ai/1Mc9a

6. World Model Bench dataset

Online use:https://go.hyper.ai/hY0aP

7. Credit Card Fraud Detection Dataset

Online use:https://go.hyper.ai/3d8nS

8. Spam Email Detection Dataset

Online use:https://go.hyper.ai/HkpX5

9. Simple Voice Questions dataset

Online use:https://go.hyper.ai/lrKpK

10. COCO-2017-Vietnamese Vietnamese Image Detection Dataset

Online use:https://go.hyper.ai/VM6gY

11. GPT-5.4-step-by-step-reasoning dataset

Online use:https://go.hyper.ai/HjJlT

Selected Public Tutorials

1. OmniVoice: Supports high-quality TTS in 600+ languages.

Run online:https://go.hyper.ai/BvKri

2. DeepTutor Personal Learning Assistant

Run online:https://go.hyper.ai/8YnI3

3. VoxCPM2 Voice Reproduction: 30+ Languages, 9 Dialects

Run online:https://go.hyper.ai/RLgK9

4. One-click deployment of Nemotron-Cascade-2-30B-A3B

Run online:https://go.hyper.ai/GoEaW

5. Netflix VOID: A revolutionary video object removal technology with physical awareness.

Run online:https://go.hyper.ai/uZoMl

6. Fun-CineForge: A unified model for zero-sample dubbing in diverse film and television scenarios

Run online:https://go.hyper.ai/DyQKk

Community article interpretation

1. AI-driven de novo design of diverse small-molecule binding proteins: A South Korean team discovered a protein that can selectively recognize stress hormones.

View the full report:https://go.hyper.ai/FpAXm

2. A French team successfully predicted 2.39 million antiphage proteins and used a deep learning model to map bacterial antiviral immunity.

View the full report:https://go.hyper.ai/J5Oz3

Command Palette

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Selected public datasets

Selected Public Tutorials

Community article interpretation

Popular Encyclopedia Articles

Command Palette

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Selected public datasets

Selected Public Tutorials

Community article interpretation

Popular Encyclopedia Articles

Related News

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

Online Tutorial | Supports 600+ Languages, Xiaomi Open Sources OmniVoice: Achieve Voice Cloning With Just 3-10 Seconds of Reference Audio

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

ByteDance open-sources Lance, a 3B Model Encompassing Understanding, Generation, and Editing; the National University of Singapore Proposes the ViMU Dataset: Covering 588 Videos and non-verbal Question answering.

Online Tutorial | Deploy OpenClaw Using Free CPU and Easily Integrate With Social Software Such As Lark/Discord

Command Palette

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Selected public datasets

Selected Public Tutorials

Community article interpretation

Popular Encyclopedia Articles

Related News

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

Online Tutorial | Supports 600+ Languages, Xiaomi Open Sources OmniVoice: Achieve Voice Cloning With Just 3-10 Seconds of Reference Audio

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

ByteDance open-sources Lance, a 3B Model Encompassing Understanding, Generation, and Editing; the National University of Singapore Proposes the ViMU Dataset: Covering 588 Videos and non-verbal Question answering.

Online Tutorial | Deploy OpenClaw Using Free CPU and Easily Integrate With Social Software Such As Lark/Discord

Related News

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

Online Tutorial | Supports 600+ Languages, Xiaomi Open Sources OmniVoice: Achieve Voice Cloning With Just 3-10 Seconds of Reference Audio

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

ByteDance open-sources Lance, a 3B Model Encompassing Understanding, Generation, and Editing; the National University of Singapore Proposes the ViMU Dataset: Covering 588 Videos and non-verbal Question answering.

Online Tutorial | Deploy OpenClaw Using Free CPU and Easily Integrate With Social Software Such As Lark/Discord

Related News

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

Online Tutorial | Supports 600+ Languages, Xiaomi Open Sources OmniVoice: Achieve Voice Cloning With Just 3-10 Seconds of Reference Audio

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

ByteDance open-sources Lance, a 3B Model Encompassing Understanding, Generation, and Editing; the National University of Singapore Proposes the ViMU Dataset: Covering 588 Videos and non-verbal Question answering.

Online Tutorial | Deploy OpenClaw Using Free CPU and Easily Integrate With Social Software Such As Lark/Discord