
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

Abstract

Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated experience. To address these challenges, we propose QuantaAlpha, an evolutionary alpha mining framework that treats each end-to-end mining run as a trajectory and improves factors through trajectory-level mutation and crossover operations. QuantaAlpha localizes suboptimal steps in each trajectory for targeted revision and recombines complementary high-reward segments to reuse effective patterns, enabling structured exploration and refinement across mining iterations. During factor generation, QuantaAlpha enforces semantic consistency across the hypothesis, factor expression, and executable code, while constraining the complexity and redundancy of the generated factor to mitigate crowding. Extensive experiments on the China Securities Index 300 (CSI 300) demonstrate consistent gains over strong baseline models and prior agentic systems. When utilizing GPT-5.2, QuantaAlpha achieves an Information Coefficient (IC) of 0.1501, with an Annualized Rate of Return (ARR) of 27.75% and a Maximum Drawdown (MDD) of 7.98%. Moreover, factors mined on CSI 300 transfer effectively to the China Securities Index 500 (CSI 500) and the Standard & Poor's 500 Index (S&P 500), delivering 160% and 137% cumulative excess return over four years, respectively, which indicates strong robustness of QuantaAlpha under market distribution shifts.

One-sentence Summary

Researchers from SUFE, QuantaAlpha, Stanford, PKU, SYSU, and SEU propose QuantaAlpha, an evolutionary framework that refines financial alpha factors via trajectory-level mutation and crossover, ensuring semantic consistency and reducing redundancy, achieving strong out-of-sample performance on CSI 300, CSI 500, and S&P 500.

Key Contributions

  • QuantaAlpha introduces an evolutionary framework for alpha mining that treats each mining run as a trajectory, enabling targeted refinement via mutation and crossover to overcome noise sensitivity and improve controllability in non-stationary markets.
  • The system enforces semantic consistency and complexity constraints during factor generation, while reusing high-reward trajectory segments to mitigate crowding and support reliable, auditable knowledge transfer across iterations.
  • Evaluated on CSI 300, QuantaAlpha achieves an IC of 0.1501 and 27.75% ARR with 7.98% MDD, and demonstrates strong out-of-distribution robustness by delivering 160% and 137% cumulative excess returns on CSI 500 and S&P 500 over four years.

Introduction

The authors leverage large language models to automate alpha factor discovery in financial markets, where noise and non-stationarity make traditional methods brittle and prone to overfitting. Prior agentic frameworks improve automation but suffer from fragile controllability due to noisy feedback, limited reuse of validated insights, and narrow exploration that leads to factor crowding. QuantaAlpha addresses this by treating each mining run as an evolvable trajectory, applying mutation to fix suboptimal steps and crossover to recombine high-performing segments—enabling structured, traceable refinement. It also enforces semantic consistency and complexity constraints during generation to prevent drift and redundancy. Evaluated on CSI 300, it outperforms baselines with strong transferability to CSI 500 and S&P 500, demonstrating robustness under market shifts.

Dataset

  • The authors use the CSI 300 dataset, covering 300 large-cap A-share stocks in China, with a chronological split: training (2016–2020), validation (2021), and testing (2022–2025).
  • Backtesting extends to CSI 500 and S&P 500 indices using the Qlib framework, with data splits detailed in Table 5.
  • Factor construction relies on six basic price and volume features (open, high, low, close, volume, vwap) to predict next-day returns, calculated as y_t = P_{t+2}^close / P_{t+1}^close - 1.
  • Preprocessing includes forward-filling missing values, replacing infinities, dropping samples with missing labels, and applying cross-sectional rank normalization (CSRankNorm) to features and labels.
  • Model evaluation uses two sets of metrics: factor predictive power (IC, ICIR, Rank IC, Rank ICIR) and strategy performance (ARR, IR, MDD, CR).
  • Baselines include traditional ML, deep learning time-series models, classical factor libraries, and LLM-based agents like RD-Agent and AlphaAgent.
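The label definition and cross-sectional rank normalization above can be sketched in pandas. This is a minimal illustration, not the paper's code; the 3.46 scaling constant follows Qlib's CSRankNorm convention (center percentile ranks at 0.5, then rescale toward unit variance) and should be treated as an assumption.

```python
import numpy as np
import pandas as pd

def next_day_return_labels(close: pd.DataFrame) -> pd.DataFrame:
    """Label y_t = P_{t+2}^close / P_{t+1}^close - 1, as defined in the paper.

    `close` is a (date x stock) price table; rows are trading days.
    """
    return close.shift(-2) / close.shift(-1) - 1.0

def cs_rank_norm(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional rank normalization: percentile ranks within each date,
    centered and rescaled (the 3.46 factor mirrors Qlib's CSRankNorm)."""
    ranks = df.rank(axis=1, pct=True)  # percentile rank within each date
    return (ranks - 0.5) * 3.46
```

Because ranking is done per date (axis=1), each day's cross-section is on a comparable scale regardless of market-wide price levels.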

Method

The authors leverage a multi-agent, hypothesis-driven framework called QuantaAlpha to systematically construct and evolve alpha factors for quantitative trading. Rather than treating alpha mining as a static, one-shot model fitting task, they frame it as an iterative, agentic research workflow that generates and refines mining trajectories—ordered sequences of states and actions—from initial context to final evaluated factor. The core architecture is structured around four components: diversified planning initialization, factor realization with constraint gating, self-evolution via mutation and crossover, and a final factor pool that consolidates validated outputs.

Refer to the framework diagram, which contrasts QuantaAlpha with traditional machine learning and agent-based baselines. The system begins with a seed factor pool, from which an initialization agent generates a diversified set of market hypotheses. These hypotheses are then instantiated into executable factors through a symbolic intermediate representation, ensuring semantic fidelity and structural control. Each factor undergoes backtesting and is evaluated for predictive performance and regularization penalties. The resulting trajectories are then subjected to evolutionary operators—mutation and crossover—that iteratively refine the search space by revising suboptimal decisions or recombining high-performing segments from parent trajectories.

The factor realization module is central to maintaining controllability and interpretability. Given a hypothesis $h$, the factor agent maps it to a structured semantic description $d$, which formalizes the intended mechanism using a standardized operator library $\mathcal{O}$. This description is then assembled into a symbolic expression $f$, parsed into an Abstract Syntax Tree (AST) $T(f)$, and compiled into executable code $c$. Leaf nodes in the AST bind to raw features (e.g., high, volume), while internal nodes correspond to operators such as TS_MIN, SMA, or RANK, making the computational graph transparent. To ensure fidelity, an LLM-based verifier checks alignment between the hypothesis, semantic description, and symbolic expression, as well as between the symbolic form and generated code. If inconsistencies are detected, the system regenerates or repairs the offending component.
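The expression-to-AST step can be illustrated with Python's standard `ast` module. This is a sketch, not the paper's parser: the operator names are taken from the paper's examples, and the uppercase-operator / lowercase-feature convention is an assumption made here for illustration.

```python
import ast

def factor_ast(expr: str):
    """Parse a symbolic factor expression into an AST and separate leaf
    features from operator nodes (a sketch of T(f)).

    Convention assumed here: uppercase identifiers are operators from the
    operator library, lowercase identifiers bind to raw features.
    """
    tree = ast.parse(expr, mode="eval")
    features, operators = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (operators if node.id.isupper() else features).add(node.id)
    return tree, features, operators

tree, feats, ops = factor_ast("RANK(TS_MIN(low, 20) / SMA(close, 10))")
```

Making the computational graph explicit this way is what lets the verifier compare the symbolic form against both the semantic description and the generated code.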

To promote parsimony and novelty, the authors impose explicit structural constraints. Complexity is quantified as $C(f) = \alpha_1 \cdot SL(f) + \alpha_2 \cdot PC(f) + \alpha_3 \cdot \log(1 + |F_f|)$, where $SL(f)$ is symbolic length, $PC(f)$ counts free parameters, and $F_f$ is the set of raw features used. Redundancy is measured via AST isomorphism: for a candidate factor $f$ and an existing alpha zoo $\mathcal{Z}$, the maximum structural similarity is computed as $S(f) = \max_{\phi \in \mathcal{Z}} s(f, \phi)$, where $s(f, \phi)$ is the size of the largest common isomorphic subtree. Factors violating complexity or redundancy thresholds are rejected and rewritten.
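The complexity score $C(f)$ can be sketched over the same AST representation. Everything concrete here is an assumption: the paper does not publish its $\alpha$ weights, and symbolic length is taken as the AST node count and free parameters as numeric constants purely for illustration.

```python
import ast
import math

def complexity(expr: str, a1: float = 1.0, a2: float = 1.0, a3: float = 1.0) -> float:
    """C(f) = a1*SL(f) + a2*PC(f) + a3*log(1 + |F_f|).

    Illustrative choices (not from the paper): SL = AST node count,
    PC = number of numeric constants, F_f = lowercase identifiers
    treated as raw features. The alpha weights are placeholders.
    """
    nodes = list(ast.walk(ast.parse(expr, mode="eval")))
    sl = len(nodes)
    pc = sum(isinstance(n, ast.Constant) for n in nodes)
    feats = {n.id for n in nodes if isinstance(n, ast.Name) and n.id.islower()}
    return a1 * sl + a2 * pc + a3 * math.log(1 + len(feats))
```

The redundancy check $S(f)$ would operate on the same trees, scoring each candidate against the alpha zoo by its largest common isomorphic subtree and rejecting near-duplicates.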

The self-evolution phase drives iterative improvement. Mutation targets a suboptimal decision node $k$ in a trajectory $\tau$ and rewrites only the localized action $a_k$, preserving the prefix up to $s_k$ and regenerating subsequent steps to maintain coherence. This allows for mechanism-level refinements such as altering time scales or adding regime conditions. Crossover synthesizes a new child trajectory by combining high-performing segments from multiple parent trajectories, explicitly inheriting validated decisions. For example, one parent may contribute a hypothesis template for retail-driven momentum, while another contributes a structural pattern for institutional validation; the crossover operator merges these into a unified, regime-aware dual-source factor.
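The two operators can be sketched over a minimal trajectory type. This is a structural illustration under stated assumptions: steps are plain strings, and `rewrite`/`regenerate` stand in for the LLM calls that revise an action and regenerate downstream steps.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """A mining run as an ordered list of decision steps (the paper's
    trajectory abstraction); steps are strings here for illustration."""
    steps: list
    reward: float = 0.0


def mutate(traj, k, rewrite, regenerate):
    """Rewrite only step k, keep the prefix intact, and regenerate the
    suffix so downstream steps stay coherent with the revision."""
    prefix = traj.steps[:k]
    new_k = rewrite(traj.steps[k])
    suffix = regenerate(prefix + [new_k])
    return Trajectory(prefix + [new_k] + suffix)


def crossover(p1, p2, cut1, cut2, regenerate):
    """Inherit a validated prefix from one parent and a high-reward
    segment from the other, then regenerate any remaining steps."""
    child = p1.steps[:cut1] + p2.steps[cut2:]
    return Trajectory(child + regenerate(child))
```

The key property in both operators is locality: validated decisions are carried over verbatim, and only the revised step and its downstream consequences are regenerated.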

The evolutionary process is demonstrated in a case study where a factor named Institutional_Momentum_Score_20D emerges from a crossover operation combining insights from two parent trajectories: one focused on fragile retail momentum and the other on sustainable institutional momentum. The synthesized hypothesis introduces dynamic weighting by market volatility, amplifying institutional signals in stable regimes and retail reversal signals in turbulent ones. The resulting factor expression, $\text{IMS}_{20D} = \text{RANK}\left( \rho_{20}\left( \frac{\Delta P}{P}, \frac{\Delta V}{V} \right) \times \overline{\left( \frac{C - O}{C} \right)}_5 \right)$, captures institutional-driven momentum through price-volume correlation and intraday return patterns, with cross-sectional ranking ensuring comparability.

The lineage of this factor is traceable: it originates from Parent 1, which identified unsustainable retail momentum, and Parent 2, which validated institutional structural trends. The crossover operation explicitly recombines these validated segments, producing an offspring with improved Rank IC (0.0311) over both parents (0.0216 and 0.0246). This demonstrates how the framework enables not just performance improvement but also conceptual synthesis, preserving the core market hypotheses while enhancing predictive power through structured evolution.

Experiment

  • QuantaAlpha outperforms all baselines in predictive power and strategy performance on CSI 300, demonstrating robustness across market regimes and real-world viability under standard risk controls.
  • Evolutionary components—diversified initialization, mutation, and crossover—collectively enhance exploration, repair, and reuse of high-performing factor trajectories, with mutation being critical for escaping local optima.
  • Semantic consistency, complexity control, and redundancy filtering during factor generation are essential for stable, generalizable factor discovery; removing any degrades performance, especially at the strategy level.
  • QuantaAlpha exhibits strong out-of-distribution generalization, sustaining performance on CSI 500 and S&P 500 without retraining, unlike baselines that fail under market regime shifts.
  • During the 2023 market transition to small-cap and thematic stocks, QuantaAlpha maintains predictive power by discovering structural factors tied to overnight gaps, volatility clustering, and trend quality—aligning with evolving microstructure.
  • Factor diversity through semantic mutation allows QuantaAlpha to adapt to regime changes, avoiding concentration on outdated market hypotheses and mitigating alpha decay.
  • Iterative evolution improves factor quality efficiently, with performance stabilizing around 11–12 iterations; beyond this, diminishing returns and redundancy degrade risk-adjusted performance.
  • Crossover operations enhance predictive accuracy but may increase drawdown, indicating a trade-off that requires regime-adaptive weighting for optimal risk-return balance.

The authors use an ablation study to isolate the contributions of planning, mutation, and crossover in their evolutionary factor mining framework. Results show that removing mutation causes the largest drop in predictive power and strategy returns, while removing planning primarily degrades risk-adjusted performance, and removing crossover leads to moderate but consistent declines. This confirms that all three components are essential, with mutation driving exploration, planning stabilizing search, and crossover enabling efficient reuse of successful patterns.

The authors use a factor evaluation agent to assess predictive power and strategy performance, revealing that QuantaAlpha maintains higher coverage and a greater proportion of factors with positive and statistically meaningful Rank IC compared to AlphaAgent. Results show QuantaAlpha’s factors exhibit stronger overall predictive consistency and a heavier right tail in performance distribution, indicating more robust and diverse signal generation under market shifts. This suggests the system’s evolutionary design and semantic controls help sustain factor quality and generalizability beyond specific market regimes.

The authors use a structured evaluation to compare factor performance across different semantic categories, revealing that QuantaAlpha’s factors excel in capturing overnight market dynamics, trend quality, and liquidity signals, while underperforming factors often rely on rigid or noise-sensitive mechanisms. Results show that strong performers align with persistent microstructure effects like volatility clustering and auction-driven price discovery, whereas weak ones degrade under regime shifts due to overfitting or lack of adaptive conditioning. This pattern confirms that robust factor design requires semantic alignment with market structure and diversity across information channels, not just statistical fit.

The authors use a crossover operation to combine factor trajectories, resulting in an offspring factor that improves predictive power and annualized excess return over the baseline. However, this gain comes with increased maximum drawdown, indicating higher risk exposure during volatile market conditions. The results suggest that while combining signals enhances returns, it requires additional regime-adaptive controls to maintain risk-adjusted performance.

The authors use QuantaAlpha to generate and evolve trading factors through a trajectory-based evolutionary framework, achieving superior predictive power and strategy performance across multiple large language models. Results show that QuantaAlpha consistently outperforms both traditional machine learning models and prior LLM-based agents, particularly in maintaining high returns with controlled drawdowns under real-world trading constraints. The system’s gains stem from structured factor generation, semantic consistency controls, and evolutionary mechanisms that enhance exploration and reuse of successful patterns.

