Queryloop

Latest from Queryloop

Stay updated with our latest research findings, product developments, and insights into AI optimization

Zain ul Abideen
August 5, 2025
6 min read
Research

Teaching Small LLMs to Think — Part 2: Reinforcement Learning with OpenPipe ART

Explore how we used Group Relative Policy Optimization (GRPO) via the OpenPipe ART framework to achieve faster and more stable reinforcement learning training than with PPO, reaching more than a 2x improvement in accuracy in under 40 minutes.

Our research focuses on improving AI agent performance using reinforcement learning (RL). In earlier work, we fine-tuned Qwen2.5-1.5B on HotpotQA using Agent-R1 with Proximal Policy Optimization (PPO). The results were promising: we reached over 80% accuracy through careful hyperparameter tuning.

AI
Machine Learning
Reinforcement Learning
GRPO
OpenPipe ART
Small LLMs
HotpotQA
Model Training
PPO
Queryloop
Zain ul Abideen
August 4, 2025
7 min read
Research

Teaching Small LLMs to Think: Reinforcement Learning with Agent-R1

Learn how we successfully trained Qwen2.5-1.5B, a small LLM, to tackle complex question-answering tasks using reinforcement learning with the Agent-R1 framework, achieving performance comparable to larger models.

The world of Large Language Models (LLMs) is often dominated by talk of ever-increasing model sizes. But what if we could achieve impressive results with their smaller, more nimble counterparts? This post dives into how we successfully trained Qwen2.5-1.5B, a relatively small LLM, to tackle complex question-answering tasks using reinforcement learning (RL) with the Agent-R1 framework, yielding performance comparable to that of larger models.

AI
Machine Learning
Reinforcement Learning
Small LLMs
Agent-R1
PPO
HotpotQA
Model Optimization
Queryloop
Zain ul Abideen
April 14, 2025
7 min read
Research

Enhancing AI Agents with LADDER and TTRL: A Self-Improving Approach

Explore how LADDER and TTRL frameworks can enable AI agents to improve without human supervision through self-directed learning and test-time reinforcement.

AI
LADDER
TTRL
Reinforcement Learning
Machine Learning
Self-improvement
AI Agents
Queryloop