Event Preview | Shanghai Innovation Academy, TileAI, Huawei, and the Advanced Compiler Lab Gather in Shanghai; TVM, TileRT, PyPTO, and Triton Showcase Their Unique Strengths

As AI models continue to grow in size, developers and engineering teams are placing increasingly stringent demands on computing performance, resource utilization, and execution efficiency. For this reason, AI compilers are becoming a crucial hub between hardware and applications, providing efficient execution and intelligent computing power scheduling for training and inference.
Driven by this trend, the industry's demand for cutting-edge technology exchange and best practice sharing has also increased. More and more teams hope to explore new methods for computing power optimization, verify implementation paths, and learn from real-world scenarios through in-depth face-to-face discussions.
The Meet AI Compiler technical salon, hosted by HyperAI, has consistently brought together experts, scholars, and frontline engineers from research institutions and enterprises, providing a platform for exchange on everything from technological innovation to application challenges. This July, the 7th Meet AI Compiler technical salon concluded successfully in Beijing, sparking lively discussions and valuable technical insights!
On December 27th, the 8th Meet AI Compiler will be held as scheduled. For this event, we have invited experts from the Shanghai Innovation Academy, the TileAI community, Huawei HiSilicon, the Advanced Compiler Lab, and other institutions to share their insights across the entire technology chain, from software stack design and operator development to performance optimization. The talks cover TVM's cross-ecosystem interoperability, PyPTO's fusion operator optimization, TileRT's low-latency systems, and Triton's multi-architecture acceleration, presenting a complete technical path from theory to implementation.
Registration is now open, seats are limited! Come join us for valuable insights – we'll be waiting for you in Shanghai! 🫶
Event Details
⏰ Time: December 27th (Saturday) 13:30-17:30
📍 Location: Shanghai Innovation Academy, No. 3, Lane 699, Huafa Road, Xuhui District, Shanghai
👬 Number of participants: 150 (seating is limited, so please register as soon as possible)
🙌🏻 Registration link: https://hdxu.cn/1CupU
Guests and Agenda
Speakers
13:40-17:20

Feng Siyuan
Assistant Professor at Shanghai Innovation Academy, Apache TVM PMC
Topic: TVM FFI: Open ABI and FFI for Machine Learning Systems
Contents: TVM FFI aims to solve the fragmentation and interoperability problems of machine learning system ecosystems. By defining open ABI and FFI standards, the project uses the stable C ABI together with DLPack to achieve zero-copy data transfer, bridging frameworks like PyTorch and the underlying compilers. It supports efficient cross-language calls, significantly reducing the engineering cost of multi-platform adaptation.
In this session, you will learn:
1. The TVM FFI universal standard, which significantly reduces the development and maintenance costs of cross-language ML systems.
2. How to understand and build a future-compatible, modular ML ecosystem.
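As a small illustration of the zero-copy exchange that DLPack enables, here is a hedged sketch using only NumPy's public DLPack support (it does not use tvm-ffi itself; any framework implementing the protocol interoperates the same way):

```python
import numpy as np

# DLPack lets two libraries hand a tensor across an ABI boundary
# without copying the underlying buffer. NumPy (>= 1.23) can act as
# both producer and consumer of the protocol.
a = np.arange(6, dtype=np.float32)
b = np.from_dlpack(a)  # consumes a.__dlpack__(); wraps the same memory

# Zero-copy: both arrays view the same buffer.
print(np.shares_memory(a, b))  # True
```

The same handshake works between different libraries (e.g., `torch.from_dlpack` on a NumPy array), which is the interoperability gap the talk describes bridging.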

Xue Jilong
Tile-AI Community Founding Member
Topic: TileRT: A Software and Hardware Exploration for Low-Latency Large Model Inference
Contents: As large models reach trillions of parameters and process sequences exceeding millions of tokens, their capabilities keep breaking records, yet the pursuit of ultimate computational speed has never ceased. On one hand, many low-latency scenarios, such as real-time decision-making and game theory, require responses within seconds or even milliseconds. On the other hand, with the advent of the Agent era in large model training, the rollout time for extremely long sequences has become a major bottleneck.
This talk introduces the TileRT project, exploring how to build an extremely low-latency software stack for large-scale model computation from the perspectives of AI compilers, runtimes, and architecture design.
In this session, you will learn:
1. The background, importance, and future prospects of low-latency inference scenarios for large models.
2. TileRT's technical challenges and practical experience.

Wang Chao
Huawei HiSilicon Software Engineer
Topic: PyPTO: A framework for developing fusion operators based on white-box compilation
Contents: This talk focuses on Huawei's newly launched fusion operator development framework, PyPTO. Based on the Tensor/Tile programming paradigm, it balances high performance and ease of use through technologies such as on-chip SRAM management, the cross-platform PTO instruction set, and an MPMD runtime, combined with Human-in-the-Loop tuning and white-box compilation.
In this session, you will learn:
1. Master the design philosophy and core architecture of PyPTO, a fusion operator development framework natively designed for SIMD architecture.
2. Master PyPTO's white-box compilation philosophy, which centers on leveraging users' expert experience, and the essence of Human-in-the-Loop optimization.
3. Master the complete process of quickly developing high-performance fusion operators on the Ascend platform using the visualization tools provided by PyPTO.

Li Jianan
Advanced Compiler Lab Researcher
Topic: Compilation optimization practices for the Triton compiler
Contents: This talk focuses on optimization practices for the Triton compiler, systematically introducing Triton's language and compiler structure, ecosystem evolution, and operator library development methods. It also delves into key optimization techniques across multiple architectures, including CPU, NPU, and GPU, demonstrating a complete path to building a high-performance unified operator system.
In this session, you will learn:
1. The latest developments in the Triton ecosystem.
2. Key optimization techniques of the Triton compiler across multiple architectures (CPU/NPU/GPU).

Mystery guest: stay tuned!
Organizers and Partners

HyperAI (hyper.ai) is a leading international artificial intelligence and high-performance computing community. It aims to help developers and enthusiasts in the global data science and AI industry learn, understand, and practice through services such as industry news and reports, accelerated dataset downloads, online tutorial demonstrations, popular model performance evaluations, cutting-edge paper recommendations, interpretations of high-value results, and an integrated calendar of top conferences, building the future of artificial intelligence together with the community.
Visit the official website: https://hyper.ai/

OpenBayes Bayesian Computing is a leading high-performance computing service provider in China. By grafting classic software ecosystems and machine learning models onto new-generation heterogeneous chips, it provides industrial enterprises and university research groups with faster, easier-to-use data science computing products. Its products have been adopted in dozens of large-scale industrial scenarios and by leading research institutes.
Visit the official website: https://openbayes.com/

Shanghai Innovation Academy is a new type of talent training institution jointly built by top universities, leading enterprises, and research institutions. Adhering to the training philosophy of "student-centered and cutting-edge research," the academy explores a uniquely Chinese AI leadership talent training program through exceptional faculty, extraordinary training measures, and outstanding support conditions. It is committed to cultivating leading AI talents in China and building a world-class innovation hub for artificial intelligence.

The MLC.AI community was established in June 2022. Chen Tianqi, the principal inventor of Apache TVM and a well-known young scholar in the field of machine learning, led the team in launching the MLC online course, which systematically introduces the key elements and core concepts of machine learning compilation.
In November 2022, through the joint efforts of MLC.AI community volunteers, the first complete TVM Chinese documentation was launched and hosted on the HyperAI official website, providing domestic developers interested in machine learning compilation with the documentation they need to access and learn this new technology.
MLC Online Courses: https://mlc.ai/
TVM Chinese Documentation: https://tvm.hyper.ai/
Event Support

Given the limited space at the venue, we have opened only 150 seats. We recommend registering as soon as possible to secure your place.
See you there on December 27th from 13:30 to 17:30!