Large language models (LLMs) are advancing rapidly across many domains, yet financial reasoning remains a difficult frontier for them. Even as successive generations of general-purpose LLMs edge closer to artificial general intelligence (AGI), applying them to specialized fields such as finance raises distinct obstacles. Fin-R1, a 7-billion-parameter model tailored for financial reasoning, was built to address critical issues such as fragmented financial data and weak generalization. It uses a two-stage training pipeline, supervised fine-tuning (SFT) followed by reinforcement learning (RL), to deliver strong performance on financial benchmarks.
Despite this progress, general-purpose LLMs often fall short of the nuanced demands of financial decision-making. Obstacles include inconsistent reasoning logic, limited scalability, and the transparency that regulators require. Fin-R1 is designed to overcome these barriers: it achieves state-of-the-art results on tasks such as FinQA and ConvFinQA while remaining interpretable and cost-efficient, illustrating how domain-specific models can serve complex, high-stakes industries.
Fin-R1 is a compact model built specifically for financial reasoning. Its 7-billion-parameter architecture balances computational efficiency with strong task performance. The model is trained on Fin-R1-Data, a curated dataset of 60,091 chain-of-thought examples drawn from authoritative financial sources. The two-stage training recipe, SFT followed by RL, aims for high accuracy while keeping the reasoning process interpretable, a key requirement in financial applications.
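The section does not spell out the schema of a Fin-R1-Data entry, but a chain-of-thought example in such a dataset typically pairs a financial question with a distilled reasoning trace and a final answer. The Python sketch below is an illustrative assumption of that shape; the field names, the <think>/<answer> wrapping, and the example values are hypothetical, not the published format.

```python
# Hypothetical sketch of a single chain-of-thought record, assuming a
# question / reasoning-trace / answer layout. Field names and tag wrapping
# are illustrative assumptions, not the actual Fin-R1-Data schema.
example_record = {
    "question": (
        "A company reports revenue of 120.0M and operating costs of 95.5M. "
        "What is its operating margin?"
    ),
    "chain_of_thought": (
        "<think>Operating income = 120.0 - 95.5 = 24.5M. "
        "Operating margin = 24.5 / 120.0 ≈ 0.204.</think>"
    ),
    "answer": "<answer>20.4%</answer>",
    "source": "financial QA corpus (reasoning distilled with DeepSeek-R1)",
}
```

Keeping the reasoning trace and the final answer in separate, tagged fields makes it straightforward to supervise both during SFT and to score them separately later during RL.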
Building Fin-R1 involved two stages: data construction and model training. In the first stage, the researchers created Fin-R1-Data by distilling reasoning traces with DeepSeek-R1 and then filtering them through an LLM-as-judge procedure, keeping the dataset high quality and relevant to real-world financial scenarios. In the second stage, Fin-R1 was trained from Qwen2.5-7B-Instruct with SFT, followed by Group Relative Policy Optimization (GRPO) to strengthen its reasoning capabilities and output consistency. Structured prompts and reward mechanisms further refined the model's ability to handle complex financial tasks accurately and in a consistent format.
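The paragraph above mentions structured prompts and reward mechanisms guiding the GRPO stage. A common way to implement such rewards is to combine a format check (does the output follow the expected reasoning-then-answer template?) with an accuracy check (does the extracted answer match the reference?). The sketch below is a minimal illustration under that assumption; the tag names, matching rules, and equal weighting are not taken from the Fin-R1 implementation.

```python
import re

# Minimal sketch of rule-based rewards for GRPO-style training, assuming a
# <think>...</think><answer>...</answer> output template. Tags, tolerances,
# and weights are illustrative assumptions, not the exact Fin-R1 rewards.

FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer matches the reference (numeric or exact string)."""
    match = ANSWER_PATTERN.search(completion)
    if not match:
        return 0.0
    prediction = match.group(1).strip()
    try:
        # Tolerate small formatting differences for numeric answers.
        return 1.0 if abs(float(prediction) - float(reference)) < 1e-4 else 0.0
    except ValueError:
        return 1.0 if prediction == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Combine format and accuracy terms; equal weights are an assumption."""
    return 0.5 * format_reward(completion) + 0.5 * accuracy_reward(completion, reference)


if __name__ == "__main__":
    sample = "<think>Net margin = 12 / 100 = 0.12</think>\n<answer>0.12</answer>"
    print(total_reward(sample, "0.12"))  # -> 1.0
```

During GRPO, scores like these are computed for each completion in a sampled group and compared within the group, so the policy is pushed toward outputs that are both well-formatted and correct without needing a learned reward model.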
Fin-R1 performs strongly against other leading models on financial reasoning benchmarks. Despite its modest parameter count, it reached an average score of 75.2, placing second overall and surpassing larger models, including DeepSeek-R1-Distill-Llama-70B, by 8.7 points. Its scores of 76.0 on FinQA and 85.0 on ConvFinQA underscore its precision in financial reasoning and its ability to generalize across tasks.
A broader evaluation against several state-of-the-art models showed that Fin-R1 adapts well to diverse financial scenarios, with strong results on benchmarks such as Ant_Finance, TFNS, and Finance-Instruct-500K. These results indicate that the two-stage training framework improves both reasoning accuracy and output standardization. Combined with its compact size, this makes Fin-R1 a cost-efficient choice for practical deployment, supporting efficient, intelligent financial decision-making and further fintech innovation.