Vol. 06 (01), December, 2025, pp. 18-25

Q-Learning-Driven Policy Optimization for Grammar Correction Using Transformer-Based Language Models

Rima Dutta1, Santu Kundu2, Sarada Mallik3, Abhishek Pal4, Saikat Chatterjee5, Arnob Dutta6

Abstract

This paper proposes a grammatical error correction (GEC) framework that combines the strengths of reinforcement learning with those of transformer-based language models. The novelty of this work lies in using a policy-level Q-learning mechanism to adaptively re-rank candidate corrections produced by a generative model, going beyond traditional approaches that rely solely on a T5 model for text error correction, or on simple re-ranking with a model such as GPT-2. The proposed scheme uses a fine-tuned T5 model as a generator that produces a list of candidate corrections, and a separate GPT-2 model that acts as a judge of grammatical fluency, supplying an implicit reward signal. A Q-learning agent then learns an optimal policy by filling a Q-table that associates error states with the best correction actions. As a preliminary indication of feasibility, and given the inherently limited scope of this study, a qualitative analysis on a targeted data set is presented.
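To make the described pipeline concrete, the Q-learning re-ranking loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `generate_candidates` and `fluency_reward` functions are hypothetical stubs standing in for the fine-tuned T5 generator and the GPT-2 fluency judge, and the coarse error-state label and hyperparameter values are assumptions for demonstration only.

```python
import random

def generate_candidates(sentence):
    # Stub for the fine-tuned T5 generator: in the described system this
    # would return the top-k decoded corrections for the input sentence.
    return [sentence,
            sentence.replace("go", "goes"),
            sentence.replace("go", "went")]

def fluency_reward(candidate):
    # Stub for the GPT-2 judge: a real system would derive the reward from
    # language-model fluency (e.g. negative perplexity); here we hard-code
    # a higher reward for the fluent candidate purely for illustration.
    return 1.0 if "goes" in candidate else 0.1

def q_learning_rerank(sentence, episodes=200, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    state = "subject_verb_agreement"  # assumed coarse error-state label
    candidates = generate_candidates(sentence)
    # Q-table: (error state, candidate index) -> estimated value
    q = {(state, a): 0.0 for a in range(len(candidates))}
    for _ in range(episodes):
        # Epsilon-greedy selection over candidate indices (the actions)
        if rng.random() < epsilon:
            a = rng.randrange(len(candidates))
        else:
            a = max(range(len(candidates)), key=lambda i: q[(state, i)])
        r = fluency_reward(candidates[a])
        # One-step episodic update: the episode ends after a single
        # correction is chosen, so no discounted future term is needed.
        q[(state, a)] += alpha * (r - q[(state, a)])
    best = max(range(len(candidates)), key=lambda i: q[(state, i)])
    return candidates[best]

print(q_learning_rerank("She go to school."))  # -> "She goes to school."
```

With the stub reward, the Q-value of the fluent candidate converges toward 1.0 and the greedy policy selects it; swapping the stubs for real T5 and GPT-2 calls would preserve the same control flow.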

Keywords

Grammatical Error Correction (GEC), Q-learning, T5 (Text-To-Text Transfer Transformer), Reinforcement Learning (RL), Natural Language Processing (NLP)