Enhancing AI Agents with LADDER and TTRL: A Self-Improving Approach
Zain ul Abideen
April 14, 2025
Explore how LADDER and TTRL frameworks can enable AI agents to improve without human supervision through self-directed learning and test-time reinforcement.
As AI agents grow increasingly capable at tool use and autonomous reasoning, methods that let them improve without human supervision become more critical. In this article, I explore the integration of LADDER (Learning by Auto-Decomposing and Deriving Easier Representations) and TTRL (Test-Time Reinforcement Learning) into a data analysis agent.
This research builds upon the concepts introduced in the Evaluating AI Agents course by DeepLearning.AI.

Overview of LADDER
LADDER is a novel self-supervised learning framework introduced in this paper. It allows large language models to decompose complex problems into simpler variants, solve them, and use the feedback to incrementally improve their performance.
The goal is to simulate a curriculum-like progression, enabling the model to solve increasingly difficult tasks by practicing on easier subproblems it generates itself.
Example Variant Generation
While this implementation serves as a prototype, it illustrates the core mechanism of generating simpler prompts from a complex one.
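A minimal sketch of what such variant generation can look like. In a full implementation the variants would themselves be produced by an LLM; here a fixed set of decomposition templates stands in for that step, and the template wording, function name, and difficulty levels are illustrative assumptions, not the article's original code.

```python
def generate_variants(prompt: str, num_levels: int = 3) -> list[dict]:
    """Return simpler variants of `prompt`, ordered easiest-first.

    Template-based stand-in for LLM-driven decomposition: each template
    strips away one layer of the original task's complexity.
    """
    templates = [
        "List only the data fields needed to answer: {p}",
        "Write the single SQL query required for: {p}",
        "Answer without any visualization step: {p}",
    ]
    variants = []
    for level, template in enumerate(templates[:num_levels]):
        variants.append({
            "difficulty": level,  # 0 = easiest
            "prompt": template.format(p=prompt),
        })
    return variants

variants = generate_variants(
    "Plot monthly sales trends for store 1320 and forecast next quarter"
)
for v in variants:
    print(v["difficulty"], v["prompt"])
```

The agent can then practice on the easiest variants first, working back up to the original prompt as each level is solved.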
Test-Time Reinforcement Learning (TTRL)
TTRL complements LADDER by refining the agent at inference time. If an initial result is suboptimal or fails verification, TTRL engages by selecting variants, evaluating their output, and updating the agent accordingly.
Simplified TTRL Workflow
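The refinement loop can be sketched as follows. The function names (`run_agent`, `verify`), the confidence threshold, and the iteration budget are illustrative assumptions standing in for the agent's actual tools and hyperparameters; the policy-update step is deliberately left as a comment, since its form depends on the underlying model.

```python
def ttrl_refine(task, run_agent, verify, max_iters=5, confidence_threshold=0.8):
    """Simplified test-time refinement loop.

    run_agent(task) -> (output, confidence in [0, 1])
    verify(output)  -> reward in [0, 1]
    Both callables are hypothetical stand-ins for the agent's real tools.
    """
    best_output, best_reward = None, -1.0
    for _ in range(max_iters):
        output, confidence = run_agent(task)
        reward = verify(output)
        if reward > best_reward:
            best_output, best_reward = output, reward
        # Stop once the result passes verification with sufficient confidence.
        if reward >= 1.0 and confidence >= confidence_threshold:
            break
        # A full implementation would update the agent here, e.g. a
        # reward-weighted gradient step, before retrying on a task variant.
    return best_output, best_reward
```

On a failed first attempt, the loop retries (optionally on easier variants) and keeps the highest-reward output seen so far, so the agent never returns something worse than its initial answer.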
Research Context and Setup
The agent developed for this study was capable of executing three types of tasks:
- Database Lookups: Generating SQL queries from natural language
- Data Analysis: Extracting insights from tabular data
- Visualization: Producing Python code for data visualization
I designed the agent with modular support for tool invocation, verification checks, and feedback-based self-improvement. LADDER and TTRL were incorporated into the pipeline to handle training and refinement respectively. LADDER was used in a proactive capacity, generating task variants to improve model accuracy via self-supervised practice. TTRL, on the other hand, was invoked reactively — when a result failed verification or fell below the confidence threshold.
Each interaction was logged with timestamped accuracy tracking using standard Python logging. Logs captured details such as variant prompt execution, verification results, reward scores, and changes in agent accuracy. This enabled detailed observation of how LADDER and TTRL influenced performance on each task.
Prompt Example
In this example, the agent uses three chained tool invocations: SQL lookup, data analysis, and chart generation. LADDER attempts training on decomposed variants of this multi-step query. If any part fails during execution, TTRL engages to iterate and improve outputs in real-time.
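The three-step chain can be sketched like this. The tool functions below are hypothetical stubs with canned return values, standing in for the agent's real SQL, analysis, and plotting tools; only the chaining structure reflects the workflow described above.

```python
# Hypothetical placeholders for the agent's three tools; a real tool would
# call the model / database rather than return canned values.
def sql_lookup(question: str) -> list[dict]:
    return [{"date": "2021-11-01", "store": 1320, "sales": None}]  # stub row

def analyze(rows: list[dict]) -> str:
    return f"{len(rows)} rows retrieved; trend analysis goes here"

def generate_chart(rows: list[dict]) -> str:
    return "import matplotlib.pyplot as plt  # plotting code goes here"

def handle_prompt(prompt: str) -> dict:
    rows = sql_lookup(prompt)          # 1. database lookup (NL -> SQL)
    insights = analyze(rows)           # 2. data analysis on the result
    chart_code = generate_chart(rows)  # 3. visualization code generation
    return {"insights": insights, "chart_code": chart_code}

result = handle_prompt(
    "Get daily sales for store 1320 in November 2021 and plot the trend"
)
```

A failure at any of the three steps (an invalid query, an empty result, plotting code that raises) is what triggers the TTRL loop described earlier.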
Observations
- LADDER performs best on complex tasks where decomposition into simpler sub-tasks is meaningful (e.g., forecasting, trend analysis).
- Limited effectiveness on simple prompts: When the task is direct (e.g., "Get sales for store 1320 on Nov 1st, 2021"), the generated variants add little instructional value.
- Mock training can dilute feedback: Without executing actual outputs during training, the learning signal may become noisy.
- TTRL often salvages underperforming results by conducting targeted, reward-driven refinement in real-time.
Logging Sample
Conclusion
Integrating LADDER and TTRL into tool-using agents offers a promising approach for enabling self-improvement without external supervision. However, to fully realize their potential, variant generation must be intelligent, recursive, and context-aware.
Future work will focus on:
- Incorporating GPT-based recursive decomposition
- Learning prompt complexity heuristics
- Improving training feedback loops
The exploration of these self-improving strategies reveals an exciting direction for AI agents: learning to learn, autonomously.