Machine Learning Predicts Stock Returns Using ESG Scores and Market Sentiment

Forecasting financial markets has always been a challenge. Prices swing unpredictably, influenced by everything from macroeconomic shifts to investor psychology. Traditional models struggle to keep pace, particularly during periods of stress when volatility spikes and historical patterns break down. But a new hybrid forecasting framework suggests that combining environmental, social, and governance (ESG) data with real-time sentiment analysis can improve predictions and help investors navigate turbulent markets more effectively.

The research introduces a system that blends a deep learning architecture called a Temporal Fusion Transformer with a lightweight machine learning corrector. The model explicitly learns when to prioritize sustainability signals versus mood-driven market moves, and it quantifies how these two information streams interact. Tested on US technology stocks, global indices, and cryptocurrencies between 2020 and 2024, the approach delivered stronger predictive accuracy and better risk-adjusted returns than competing methods, including large language models designed for finance.

The Problem with Price Alone

Classical finance theory, including the Efficient Market Hypothesis, holds that asset prices reflect all available information. If that were strictly true, past prices would offer little help in predicting future returns. Yet decades of empirical work have shown that markets are not perfectly efficient. Investors overreact to news, herding behavior amplifies volatility, and structural breaks like the COVID-19 crash reveal how quickly regimes can shift.

Modern machine learning and deep learning models can capture nonlinear patterns that traditional econometric approaches miss. Recurrent neural networks, transformers, and ensemble methods have all been applied to financial forecasting with varying degrees of success. But most models treat different data sources as independent inputs, appending ESG scores or sentiment features without modeling how they interact or when each matters most.

Two specific information streams have gained prominence. ESG metrics, which measure a firm's environmental footprint, social practices, and governance quality, are increasingly seen as proxies for resilience and long-term performance. Meanwhile, sentiment extracted from news and social media captures short-term investor psychology and can signal turning points, especially during stress. Meme-stock rallies and ESG-related repricing episodes illustrate how sustainability narratives and collective sentiment can rapidly influence prices.

The question is whether combining these signals in a principled way, with an explicit mechanism to reweight them as conditions change, can produce forecasts that remain stable across market cycles.

A Gated Fusion of ESG and Sentiment

The new framework centers on a gated late fusion mechanism. ESG features and aspect-based sentiment scores are first passed through separate embedding layers, then combined using a learned scalar gate. This gate, a value between zero and one, adjusts how much weight the model places on ESG versus sentiment at each time step. When the gate is high, the system emphasizes sustainability signals; when it drops, sentiment takes precedence.
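To make the gate concrete, here is a minimal NumPy sketch of a single forward pass. It assumes tanh embedding layers and a gate computed as a sigmoid of both embeddings; the function name `gated_fusion`, the weight shapes, and the random weights are illustrative assumptions rather than the authors' implementation, and in the real model these parameters would be learned jointly with the Temporal Fusion Transformer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(esg_feats, sent_feats, W_esg, W_sent, w_gate, b_gate):
    """Late fusion of ESG and sentiment embeddings through one scalar gate per time step.

    esg_feats:  (T, d_esg)  ESG features, one row per time step
    sent_feats: (T, d_sent) aspect-based sentiment scores, one row per time step
    W_esg, W_sent: embedding weights mapping each stream to a shared width h
    w_gate (2h,), b_gate (scalar): gate parameters (learned in practice, random here)
    """
    e_esg = np.tanh(esg_feats @ W_esg)     # (T, h) ESG embedding
    e_sent = np.tanh(sent_feats @ W_sent)  # (T, h) sentiment embedding
    # Scalar gate in (0, 1) per time step, conditioned on both embeddings
    g = sigmoid(np.concatenate([e_esg, e_sent], axis=1) @ w_gate + b_gate)  # (T,)
    # High g leans on ESG, low g leans on sentiment
    fused = g[:, None] * e_esg + (1.0 - g[:, None]) * e_sent
    return fused, g


rng = np.random.default_rng(0)
T, d_esg, d_sent, h = 5, 3, 4, 8
esg = rng.normal(size=(T, d_esg))
sent = rng.normal(size=(T, d_sent))
W_esg = rng.normal(size=(d_esg, h))
W_sent = rng.normal(size=(d_sent, h))
fused, g = gated_fusion(esg, sent, W_esg, W_sent, rng.normal(size=2 * h), 0.0)
print(fused.shape, g.shape)  # (5, 8) (5,)
```

Pushing the gate bias strongly positive collapses the fused vector onto the ESG embedding, and pushing it strongly negative collapses it onto the sentiment embedding, which mirrors the high-gate and low-gate behavior described above.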

This design is intentional. During calm markets, ESG scores, which reflect longer-term structural quality, may be more informative. During turbulence, when fear and euphoria dominate, short-term sentiment becomes paramount. By letting the model learn this trade-off from data, the system adapts to regime shifts without requiring manual intervention.

The sentiment component uses FinBERT, a transformer model fine-tuned on financial text, to perform aspect-based sentiment analysis. Rather than scoring an article as simply positive or negative overall, the system extracts sentiment on specific dimensions such as company performance, macroeconomic outlook, market trends, and industry news. This granularity helps the model distinguish between, say, optimism about a firm's earnings and pessimism about broader economic conditions.
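A rough sketch of how sentence-level output might be rolled up into daily aspect features. It assumes each sentence has already been tagged with one of four hypothetical aspect labels and scored by a FinBERT-style classifier that returns positive and negative probabilities; the net-score definition (positive minus negative probability, averaged per aspect) and the name `aspect_sentiment_features` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical aspect labels mirroring the four dimensions named in the article
ASPECTS = ("company_performance", "macro_outlook", "market_trends", "industry_news")


def aspect_sentiment_features(records):
    """Turn sentence-level classifier output into one net score per aspect.

    records: iterable of (aspect, p_positive, p_negative) tuples, e.g. from a
    FinBERT-style classifier run on sentences already tagged with an aspect.
    Returns {aspect: mean(p_positive - p_negative)}; aspects with no mentions
    get 0.0 and labels outside ASPECTS are ignored.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for aspect, p_pos, p_neg in records:
        sums[aspect] += p_pos - p_neg
        counts[aspect] += 1
    return {a: (sums[a] / counts[a] if counts[a] else 0.0) for a in ASPECTS}


day = [
    ("company_performance", 0.90, 0.05),  # upbeat earnings sentence
    ("macro_outlook", 0.10, 0.80),        # gloomy macro sentence
    ("macro_outlook", 0.20, 0.60),
]
features = aspect_sentiment_features(day)
```

The resulting dictionary forms one row of sentiment inputs for that day: here, positive on company performance and negative on the macroeconomic outlook, exactly the kind of split an overall score would blur.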

Technical indicators like moving averages, relative strength index, and Bollinger Bands provide the market microstructure layer. Macroeconomic variables, including interest rates and unemployment, add context. All features are carefully aligned to a strict as-of timestamp, ensuring no future information leaks into the training process.
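The indicators themselves are standard; what matters for leakage is that every value uses only past observations. A minimal NumPy sketch follows. The window lengths (20 and 14 days) and the simple-average RSI variant are common defaults assumed here, not settings reported by the study.

```python
import numpy as np


def sma(x, n):
    """Trailing simple moving average; NaN until n observations exist (no look-ahead)."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    c = np.cumsum(np.insert(x, 0, 0.0))
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out


def rsi(prices, n=14):
    """Relative strength index from simple averages of the last n gains and losses.

    This is the simple-average (Cutler) variant; Wilder's original uses smoothed averages.
    """
    delta = np.diff(np.asarray(prices, dtype=float))
    avg_gain = sma(np.clip(delta, 0.0, None), n)
    avg_loss = sma(np.clip(-delta, 0.0, None), n)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = 100.0 * avg_gain / (avg_gain + avg_loss)  # same as 100 - 100 / (1 + RS)
    out[(avg_gain == 0) & (avg_loss == 0)] = 50.0         # flat window: neutral by convention
    return np.concatenate([[np.nan], out])                # realign with the price series


def bollinger(x, n=20, k=2.0):
    """Trailing Bollinger Bands: SMA plus/minus k rolling (population) standard deviations."""
    x = np.asarray(x, dtype=float)
    mid = sma(x, n)
    sd = np.sqrt(np.maximum(sma(x * x, n) - mid ** 2, 0.0))
    return mid - k * sd, mid, mid + k * sd


prices = 100.0 + np.cumsum(np.random.default_rng(1).normal(0.0, 1.0, 250))
ma20 = sma(prices, 20)
rsi14 = rsi(prices, 14)
lower, mid, upper = bollinger(prices, 20, 2.0)
```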

Quantifying Interaction

Beyond predictive accuracy, the study aims to answer a subtler question: do ESG and sentiment actually interact, or do they just contribute independently?

To measure this, the researchers computed SHAP interaction values, a method from explainable AI that quantifies how much the effect of one feature depends on another. They also used Friedman's H statistic, a variance-based measure of interaction strength. Across the assets studied, the median ESG–sentiment interaction was statistically significant and regime dependent. The interaction was strongest during high volatility periods, when sentiment's influence surged, and weakened during stable regimes, when ESG's role grew.
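Friedman's H can be computed directly from partial dependence functions: H squared is the share of the variance of the joint two-feature partial dependence that the two single-feature partial dependences cannot explain additively. The sketch below is the brute-force estimator (quadratic in the number of rows), applied to two toy functions rather than the study's model; `f` stands for any prediction function, and treating columns 0 and 1 as sentiment and ESG is purely illustrative.

```python
import numpy as np


def centered_pd_at_data(f, X, cols):
    """Partial dependence of f on the features in `cols`, evaluated at each row's own
    values and centered to mean zero (the form used by Friedman's H statistic)."""
    pd = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        Xi = X.copy()
        Xi[:, cols] = X[i, cols]  # pin the chosen features to row i's values
        pd[i] = f(Xi).mean()      # average over the empirical distribution of the rest
    return pd - pd.mean()


def friedman_h(f, X, j, k):
    """Friedman's H for the pairwise interaction between features j and k."""
    pd_jk = centered_pd_at_data(f, X, [j, k])
    pd_j = centered_pd_at_data(f, X, [j])
    pd_k = centered_pd_at_data(f, X, [k])
    return float(np.sqrt(np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)))


# Toy "models": columns 0 and 1 stand in for a sentiment and an ESG feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
h_additive = friedman_h(lambda Z: Z[:, 0] + Z[:, 1], X, 0, 1)  # no interaction, H near 0
h_product = friedman_h(lambda Z: Z[:, 0] * Z[:, 1], X, 0, 1)   # pure interaction, H near 1
```

A model that simply adds ESG and sentiment effects scores close to zero; one whose output depends on their product scores close to one. A significant, regime-dependent H therefore indicates that the two signals are not acting independently.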

Two-dimensional accumulated local effects plots revealed that negative sentiment combined with low ESG scores produced the largest downward price adjustments, while high ESG scores partially offset the drag from bad news. This pattern aligns with the intuition that sustainable, well-governed companies weather negative headlines better than firms with weaker fundamentals.

Testing Across Market Regimes

Forecasting models often look impressive on average but fail spectacularly during crises. To address this, the study evaluated performance not just over the full 2020 to 2024 period, but also within specific stress windows: the COVID-19 crash in early 2020, the 2022 monetary tightening cycle, and the March 2023 banking stress episode. It also split the data into volatility regimes based on realized volatility terciles, creating low, mid, and high volatility buckets.
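Tercile regimes are straightforward to reproduce. This sketch assumes realized volatility is the rolling standard deviation of daily log returns over 21 trading days; that window length and the function names are assumptions for illustration.

```python
import numpy as np


def realized_vol(returns, window=21):
    """Trailing realized volatility: rolling sample std of daily log returns."""
    r = np.asarray(returns, dtype=float)
    out = np.full(r.shape, np.nan)
    for t in range(window - 1, len(r)):
        out[t] = r[t - window + 1 : t + 1].std(ddof=1)
    return out


def volatility_regimes(returns, window=21):
    """Label each day 0 (low), 1 (mid) or 2 (high) by realized-volatility tercile.

    Days without a full window get -1. Tercile cut points come from the whole
    sample, which suits an ex post evaluation split but would leak future
    information if the labels were fed to a model as inputs.
    """
    vol = realized_vol(returns, window)
    valid = ~np.isnan(vol)
    q1, q2 = np.quantile(vol[valid], [1 / 3, 2 / 3])
    labels = np.full(vol.shape, -1)
    labels[valid] = np.digitize(vol[valid], [q1, q2])
    return labels


# Synthetic example: a calm stretch followed by a turbulent one
rng = np.random.default_rng(2)
rets = np.concatenate([rng.normal(0.0, 0.005, 300), rng.normal(0.0, 0.03, 300)])
labels = volatility_regimes(rets)
```

Each bucket holds roughly a third of the days, so errors and strategy metrics can be compared across regimes on similar sample sizes.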

The hybrid model outperformed a text-only baseline across all regimes. During the COVID crash, when realized volatility spiked, the model's mean absolute error on next-day log returns was 40 percent lower than the naive persistence benchmark and significantly better than the language model baseline. During the 2022 tightening cycle, the advantage persisted, with directional accuracy above 93 percent.
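The two headline metrics take only a few lines of code. This sketch assumes `forecasts[t]` is the model's prediction of day t's return made with information up to day t-1, and that naive persistence forecasts each day's return as the previous day's return; the helper names are hypothetical.

```python
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))


def directional_accuracy(y_true, y_pred):
    """Share of days on which forecast and realized return have the same sign."""
    return float(np.mean(np.sign(np.asarray(y_true)) == np.sign(np.asarray(y_pred))))


def mae_gain_vs_persistence(returns, forecasts):
    """Relative MAE improvement over naive persistence (tomorrow's return = today's).

    returns[t] is the realized log return on day t; forecasts[t] is the model's
    prediction of it. Day 0 is dropped because persistence needs a prior return.
    """
    r = np.asarray(returns, dtype=float)
    f = np.asarray(forecasts, dtype=float)
    return 1.0 - mae(r[1:], f[1:]) / mae(r[1:], r[:-1])  # 0.40 means 40 percent lower MAE


rets = [0.01, -0.02, 0.03, -0.01]
gain_perfect = mae_gain_vs_persistence(rets, rets)
gain_zero = mae_gain_vs_persistence(rets, [0.0] * 4)
```

Note that a forecast of exactly zero has sign 0, so it counts as a miss in `directional_accuracy` unless the realized return is also zero.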

Within volatility regimes, the pattern was clear. In low volatility periods, ESG features gained importance and forecasting errors were smallest. In high volatility periods, sentiment dominated and errors increased for all models, but the hybrid system maintained its relative edge. The learned gate reflected this shift: during stress, the gate value dropped, indicating the model was relying more heavily on sentiment.

Residual Correction Stabilizes Errors

A second innovation is a support vector regression (SVR) corrector applied to the residuals from the transformer. After the main model produces a forecast, the SVR takes the predicted return, ESG embedding, and sentiment embedding as inputs and adjusts the forecast to correct systematic errors.
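A minimal version of the correction step can be built with scikit-learn's `SVR`. The RBF kernel, the feature scaling, the values of `C` and `epsilon`, the helper names, and the synthetic data are assumptions for illustration. The corrector should be fit on residuals of out-of-sample transformer forecasts, since in-sample residuals would understate the errors it has to fix.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def _corrector_inputs(pred, esg_emb, sent_emb):
    # One row per day: [forecast, ESG embedding..., sentiment embedding...]
    return np.column_stack([pred, esg_emb, sent_emb])


def fit_residual_corrector(pred, esg_emb, sent_emb, y, C=1.0, epsilon=0.001):
    """Fit an RBF-kernel SVR that predicts the main model's residual y - pred."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, epsilon=epsilon))
    model.fit(_corrector_inputs(pred, esg_emb, sent_emb), y - pred)
    return model


def corrected_forecast(model, pred, esg_emb, sent_emb):
    """Final forecast = main-model forecast + SVR estimate of its residual."""
    return pred + model.predict(_corrector_inputs(pred, esg_emb, sent_emb))


# Synthetic check: the "transformer" misses a sentiment-driven component of returns
rng = np.random.default_rng(3)
n = 400
pred = rng.normal(0.0, 0.01, n)
esg_emb = rng.normal(size=(n, 2))
sent_emb = rng.normal(size=(n, 2))
y = pred + 0.01 * sent_emb[:, 0] + rng.normal(0.0, 0.001, n)

train, test = slice(0, 300), slice(300, n)
svr = fit_residual_corrector(pred[train], esg_emb[train], sent_emb[train], y[train])
y_hat = corrected_forecast(svr, pred[test], esg_emb[test], sent_emb[test])
mae_before = np.mean(np.abs(y[test] - pred[test]))
mae_after = np.mean(np.abs(y[test] - y_hat))
```

Because the SVR only models the residual, a corrector that finds no structure leaves the main forecast essentially unchanged.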

This corrector proved most valuable during high volatility regimes. In the COVID window, the SVR reduced residual variance by 40 percent and cut skewness in half, indicating fewer large unexpected errors and more symmetric forecast mistakes. Similar stabilization occurred during the 2022 and 2023 stress periods. In low volatility regimes, the corrector's effect was smaller but still positive.
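The two diagnostics quoted above, residual variance reduction and skewness, are easy to compute for any pair of forecasts. This sketch uses the plain moment-based skewness estimator, one of several conventions and an assumption here; halving skewness corresponds to `skew_corrected` being about half of `skew_raw` in absolute value.

```python
import numpy as np


def skewness(x):
    """Moment-based sample skewness: mean((x - m)^3) / mean((x - m)^2)^1.5."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def residual_diagnostics(y, pred_raw, pred_corrected):
    """Compare forecast residuals before and after a correction step."""
    y = np.asarray(y, dtype=float)
    e_raw = y - np.asarray(pred_raw, dtype=float)
    e_cor = y - np.asarray(pred_corrected, dtype=float)
    return {
        "variance_reduction": 1.0 - e_cor.var() / e_raw.var(),  # 0.40 means 40 percent lower
        "skew_raw": skewness(e_raw),
        "skew_corrected": skewness(e_cor),
    }
```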

The implication is that even when the main model captures the broad return dynamics, predictable structure remains in the residuals during regime shifts. The SVR exploits that structure, acting as a statistical safety net when conditions deviate from historical norms.

Economic Value Beyond Point Forecasts

Statistical accuracy is one measure of success. But investors care about risk-adjusted returns, drawdowns, and turnover. To assess economic value, the researchers simulated a simple long-only trading strategy: take a position when the predicted next-day return exceeds the 70th percentile of in-sample forecasts, otherwise hold cash. Positions are rebalanced daily.
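The trading rule translates almost line for line into code. This sketch assumes that cash earns zero, that costs are charged on every change in position, and that `forecasts[t]` and `realized[t]` refer to the same trading day, with `realized` holding simple returns; the function and parameter names are hypothetical.

```python
import numpy as np


def threshold_strategy(forecasts, realized, forecasts_in_sample, q=70.0, cost_bps=5.0):
    """Long-only rule: hold the asset on day t if its forecast exceeds the q-th
    percentile of in-sample forecasts, otherwise hold cash (assumed to earn 0).

    Returns (net daily returns, positions, turnover). Costs of cost_bps basis
    points are charged on every unit change in position, starting from flat.
    """
    threshold = np.percentile(forecasts_in_sample, q)
    pos = (np.asarray(forecasts, dtype=float) > threshold).astype(float)  # 1 long, 0 cash
    trades = np.abs(np.diff(pos, prepend=0.0))   # entries and exits
    gross = pos * np.asarray(realized, dtype=float)
    net = gross - trades * cost_bps / 1e4
    return net, pos, float(trades.mean())


net, pos, turnover = threshold_strategy(
    forecasts=[7.0, 1.0, 8.0, 9.0],
    realized=[0.01, 0.02, -0.01, 0.03],
    forecasts_in_sample=np.arange(10.0),  # 70th percentile = 6.3
)
```

Running the same call with `cost_bps=10.0` gives the higher-cost scenario, and the returned turnover allows the comparison of trading frequency discussed below.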

Under this rule, the hybrid model delivered higher Sharpe and Sortino ratios than the text-only baseline, with smaller maximum drawdowns. Aggregated across assets, the Sharpe ratio was 1.58 versus 1.42 for the baseline, and the maximum drawdown was 14.3 percent versus 18.7 percent. These improvements held even after accounting for transaction costs of 5 and 10 basis points per trade.
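For reference, the three risk measures can be computed as follows. The sketch assumes a zero risk-free rate, 252 trading days for annualization, and downside deviation measured against a zero target; conventions differ across studies, so values from this code may not match the reported figures exactly.

```python
import numpy as np

TRADING_DAYS = 252  # annualization convention (assumed)


def sharpe(r, periods=TRADING_DAYS):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    r = np.asarray(r, dtype=float)
    return float(np.sqrt(periods) * r.mean() / r.std(ddof=1))


def sortino(r, periods=TRADING_DAYS, target=0.0):
    """Annualized Sortino ratio: excess mean over downside deviation below `target`."""
    r = np.asarray(r, dtype=float)
    downside_dev = np.sqrt(np.mean(np.minimum(r - target, 0.0) ** 2))
    return float(np.sqrt(periods) * (r.mean() - target) / downside_dev)


def max_drawdown(r):
    """Largest peak-to-trough fall of the compounded equity curve, as a positive fraction."""
    equity = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(r, dtype=float))])
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))
```

Sortino penalizes only losses, which is why a model that mainly trims bad days can improve it more than it improves Sharpe.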

Regime-split strategy metrics showed the same pattern. During the COVID crash, the hybrid model's Sharpe ratio was 0.78, compared to 0.48 for the baseline. During low volatility periods, the advantage widened further. Turnover remained comparable, indicating the gains came from better signal quality, not higher trading frequency.

Interpretability and Deployment

Explainability is often sacrificed in the pursuit of accuracy. Here, the design prioritizes transparency. SHAP values reveal which features drive individual predictions. The gate dynamics show when the model shifts between ESG and sentiment. Accumulated local effects plots illustrate nonlinear relationships without assuming a parametric form.

During the COVID crash and the 2022 tightening cycle, temporal SHAP analysis showed sentiment importance spiking as volatility rose. ESG scores gained weight during recovery phases and stable periods. Macroeconomic variables like unemployment and interest rates showed elevated importance around policy shifts. Technical indicators remained consistently informative but secondary to the ESG-sentiment channel.

For practical deployment, inference speed matters. The full architecture, which included auxiliary bidirectional LSTM layers, was too slow for some real-time applications. A streamlined variant removed those layers, keeping only the transformer and SVR. This reduced inference time by about 45 percent while retaining over 90 percent of the accuracy gains. The simplified model is compact enough to run on standard hardware and fast enough to update forecasts intraday.

Limitations and Future Work

The study comes with caveats. First, ESG data is released quarterly and forward-filled to daily frequency, which may dilute signals. Future work could replace this with asynchronous fusion techniques that handle mixed-frequency data more naturally.
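Forward filling under a strict as-of rule amounts to looking up, for each day, the latest score that had already been published. A minimal sketch, assuming integer day indices and an optional publication lag; `asof_forward_fill` and its `lag` parameter are hypothetical.

```python
import numpy as np


def asof_forward_fill(days, release_days, values, lag=0):
    """Spread quarterly ESG scores onto a daily grid without look-ahead.

    days:         sorted integer day indices of the daily calendar
    release_days: sorted integer day indices on which each score was published
    values:       the score published on each release day
    lag:          extra days before a release counts as usable (hypothetical knob)

    Each day gets the latest score whose release_day + lag is on or before it;
    days before the first usable release get NaN.
    """
    usable_from = np.asarray(release_days) + lag
    idx = np.searchsorted(usable_from, days, side="right") - 1
    out = np.full(len(days), np.nan)
    has_value = idx >= 0
    out[has_value] = np.asarray(values, dtype=float)[idx[has_value]]
    return out


daily = np.arange(10)
esg_daily = asof_forward_fill(daily, release_days=[2, 6], values=[50.0, 60.0], lag=1)
```

Every day between releases repeats the same value, which is the signal dilution the authors flag: the model sees a flat ESG series for weeks at a time.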

Second, regime shifts are detected implicitly by the model rather than explicitly flagged. Adding a lightweight regime detector could allow the system to adjust forecast horizons or recalibrate parameters on the fly.

Third, the economic simulations assume uniform transaction costs and no slippage. Real-world execution constraints, market impact, and time-varying liquidity would all affect net returns.

Fourth, the results are predictive, not causal. The ESG-sentiment interaction is statistically robust, but the framework does not claim to identify why ESG moderates sentiment or vice versa. Causal inference would require stronger assumptions and different study designs.

Finally, the evaluation is retrospective. Prospective live testing over multiple future market cycles would provide the strongest evidence of real-world value.

Why This Matters

Financial forecasting is notoriously difficult, and small improvements in accuracy or risk control can translate into large economic gains at scale. This work demonstrates that integrating ESG and sentiment in a principled, interpretable way improves forecasts not just on average, but specifically during the stress periods when accurate predictions matter most.

The findings also have broader implications. They suggest that sustainability metrics and investor psychology are not independent signals but interact in regime-dependent ways. Recognizing and modeling that interaction can help portfolio managers, risk officers, and policymakers better understand how markets respond to shocks.

As alternative data becomes more widely available and machine learning tools more sophisticated, the challenge is no longer just building models that fit historical data well. It is building systems that remain robust across regimes, explain their reasoning, and deploy efficiently in production environments. This study offers a template for how to meet all three goals simultaneously.

Credit & Disclaimer: This article is a popular science summary written to make peer-reviewed research accessible to a broad audience. All scientific facts, findings, and conclusions presented here are drawn directly and accurately from the original research paper. Readers are strongly encouraged to consult the full research article for complete data, methodologies, and scientific detail. The article can be accessed through https://doi.org/10.1038/s41598-026-41985-3

About the Author

Intelligent Systems and Computing Desk