Multimodal LLM Forecasting: Aligning News Semantics with Price Dynamics in Commodity Markets
Keywords:
PSO-SVR hybrid model, Machine learning, Uncertainty sentiment, Empirical asset pricing
Abstract
The objective of this study is to construct a large-language-model (LLM)-driven, text-enhanced time series forecasting framework in which unstructured news is transformed into informative exogenous variables for futures price prediction. Unlike conventional pipelines that rely on bag-of-words statistics and shallow topic or sentiment mining, we leverage the contextual semantic understanding and reasoning capabilities of LLMs to extract thematic and sentiment signals from a large corpus of futures-related news headlines. Specifically, a pretrained LLM encodes each headline into a high-dimensional semantic embedding, from which fine-grained topic intensity and directional sentiment (bullish, bearish, or neutral, with strength) are derived and fed into the predictor as exogenous features. This paper addresses two design questions: why headlines rather than full articles, and why futures news rather than crude-oil-only news. First, news headlines are highly condensed summaries that encapsulate the most decision-relevant information; for an LLM, headlines also reduce context-length cost while preserving the core semantic and emotional cues, consistent with the headline-based topic-and-sentiment extraction adopted by Li et al. [1–5]. Second, we select the broader futures-news domain rather than crude-oil news alone because crude-oil-specific headlines are scarce, and because gold, natural gas, and crude-oil futures exhibit well-established cross-commodity dependencies.
To exploit these dependencies, we inject them into the LLM as domain priors via a retrieval-augmented generation (RAG) module. The model dynamically retrieves established findings so that the extracted textual features implicitly carry cross-commodity transmission knowledge. Examples include Sujit & Kumar (2011), who show that gold-price fluctuations affect the WTI index and that countries' crude-oil dependence influences exchange rates and thus the purchasing power of gold, and Villar & Joutz (2006), who report that a 20% temporary shock to WTI exerts a 5% contemporaneous impact on natural-gas prices [6–9]. This RAG-based prior injection both compensates for the scarcity of single-commodity news and enriches the information density of the resulting exogenous variables.
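To make the retrieval step concrete, the following is a minimal sketch of how a headline might be matched to a stored prior before being passed to the LLM. It is an illustration under stated assumptions, not the implementation used in this study: the prior store `PRIORS`, the helpers `retrieve` and `build_prompt`, and the prompt wording are all hypothetical, and bag-of-words cosine similarity stands in for the embedding-based similarity a real RAG module would use.

```python
import math
import re
from collections import Counter

# Hypothetical prior store: one-line paraphrases of findings cited in the text.
PRIORS = [
    "Sujit & Kumar (2011): gold price fluctuations affect the WTI crude oil index.",
    "Sujit & Kumar (2011): crude oil dependence influences exchange rates "
    "and the purchasing power of gold.",
    "Villar & Joutz (2006): a 20% temporary shock to WTI has a 5% "
    "contemporaneous impact on natural gas prices.",
]


def bag_of_words(text):
    """Lowercase token counts; a toy stand-in for an LLM embedding."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(headline, priors=PRIORS, k=1):
    """Return the k priors most similar to the headline."""
    query = bag_of_words(headline)
    ranked = sorted(priors, key=lambda p: cosine(query, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(headline, k=1):
    """Assemble a prompt that places retrieved priors before the headline.

    The wording is an assumption made for illustration only.
    """
    context = "\n".join(f"- {p}" for p in retrieve(headline, k=k))
    return (
        "Known cross-commodity findings:\n"
        f"{context}\n\n"
        f"Headline: {headline}\n"
        "Rate the sentiment for crude-oil futures as bullish, bearish, or neutral, "
        "with a strength between 0 and 1."
    )


if __name__ == "__main__":
    print(build_prompt("Natural gas prices jump after WTI shock"))
```

In this toy setting, a headline about natural gas and WTI retrieves the Villar & Joutz finding, so the sentiment query is conditioned on a cross-commodity relationship even though the headline is not a crude-oil-specific story.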
