The Limits of Mathematical Reasoning in Large Language Models

Side-by-side code editors display JSON-like syntax with colorful highlighting. The left window is labeled "GitHub" and the right window is labeled "JSON Symbolic Translator". Sample code is visible in both, demonstrating the data translation process.

Illustration of the GSM-Symbolic template creation process. This dataset serves as a tool to investigate the presumed reasoning capabilities of LLMs, enabling the design of controllable mathematical reasoning evaluations with more reliable metrics. Our results reveal that all state-of-the-art LLMs exhibit significant performance variations, suggesting that their reasoning is fragile or absent.
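
To make the idea of a symbolic template concrete, the following is a minimal sketch, not the authors' actual pipeline or file format. It assumes a template is a question string with named placeholders, a sampling range for each variable, and a condition that keeps the answer valid. The template text, the names in `NAMES`, the numeric ranges, and the helpers `sample_instance` and `make_variants` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical template: the wording, names, and ranges are illustrative only.
TEMPLATE = (
    "{name} has {x} apples and buys {y} more. "
    "{name} then gives away {z} apples. "
    "How many apples does {name} have left?"
)
NAMES = ["Sophie", "Liam", "Mia"]


def sample_instance(rng):
    """Draw one (question, answer) pair from the template."""
    while True:
        x = rng.randint(5, 50)
        y = rng.randint(1, 20)
        z = rng.randint(1, 30)
        # Assumed condition: the final answer must stay positive.
        if x + y - z > 0:
            break
    name = rng.choice(NAMES)
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    answer = x + y - z
    return question, answer


def make_variants(n, seed=0):
    """Generate n variants of the same underlying problem."""
    rng = random.Random(seed)
    return [sample_instance(rng) for _ in range(n)]
```

Because every variant shares the same reasoning steps and differs only in names and numbers, scoring a model across many such variants gives a distribution of accuracies rather than a single number, which is the kind of performance variation the caption refers to.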