Teaching Small LLMs to Think — Part 2: Reinforcement Learning with OpenPipe ART
Explore how we used Group Relative Policy Optimization (GRPO) via the OpenPipe ART framework to achieve faster and more stable reinforcement learning training than PPO, reaching a more than 2x improvement in accuracy in under 40 minutes.
Our research focuses on improving AI agent performance using reinforcement learning (RL). In earlier work, we fine-tuned Qwen2.5-1.5B on HotpotQA using Agent-R1 with Proximal Policy Optimization (PPO). The results were promising: we reached over 80% accuracy through careful hyperparameter tuning.
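To make the GRPO-vs-PPO contrast concrete: GRPO drops PPO's learned value-function baseline and instead standardizes each rollout's reward against the other rollouts sampled for the same prompt. A minimal sketch of that group-relative advantage computation (illustrative only, not the ART framework's internal code; the function name and reward values are assumptions):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled rollouts.

    Instead of a critic network (as in PPO), each rollout's advantage
    is its reward standardized against the group mean and std.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical example: four rollouts for one question, where only the
# first two produced a correct answer (reward 1.0).
advantages = grpo_advantages([1.0, 1.0, 0.0, 0.0])
print(advantages)
```

Because the baseline comes from sibling rollouts rather than a separately trained critic, there is no value network to tune, which is one reason GRPO training tends to be simpler and more stable than PPO.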