Here’s a question nobody’s asking: What if you could run your own deep research AI locally, without paying a dime in API costs?
I’m talking Perplexity-level investigation. ChatGPT Deep Research-style multi-step browsing. But on YOUR data. Behind YOUR firewall. For $0 per query.
Tencent just made this real with SearchAgent-8B.
This isn’t another chatbot wrapped in marketing speak. It’s an open-source, 8-billion parameter model specifically trained to do something most LLMs struggle with: actual research. Not “here’s what I remember from training.” Real investigation. 10 to 15 separate queries. Evidence gathering. Cross-verification. Synthesis.
And they’ve open-sourced everything. The model weights. The training methodology. The data synthesis pipeline. All of it sitting on Hugging Face right now.
What Makes SearchAgent-8B Different

Most LLMs answer questions. SearchAgent-8B investigates them.
That’s not marketing copy. It’s literally how the model works.
Rather than training a model to respond in one shot, Tencent trained this agent to behave like a detective. It breaks down complex questions, executes multiple search queries against a local database, gathers evidence, verifies findings across sources, and then synthesizes everything into a final answer.
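To make that concrete, here's roughly what that loop looks like in code. This is my own minimal sketch, not Tencent's implementation: the index and llm helpers are hypothetical stand-ins for a local retriever and the model wrapper.

```python
# Minimal sketch of the investigate-then-synthesize loop (my own illustration,
# not Tencent's code). `index` is any local retriever, `llm` is any wrapper
# that can propose the next search query and write a final answer.

def deep_research(question, index, llm, max_turns=15):
    evidence = []
    for _ in range(max_turns):
        # Look at the question plus everything found so far, then decide
        # what to search next (or signal that enough evidence is in hand).
        query = llm.propose_query(question, evidence)
        if query is None:
            break
        evidence.extend(index.search(query, k=5))   # query the local corpus
    # Cross-check the gathered passages and write the final answer.
    return llm.synthesize(question, evidence)
```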
The base model is Qwen3-8B – already a strong performer in China’s open-weight AI race – but the magic is in the training. They used GRPO-based reinforcement learning (Group Relative Policy Optimization) combined with something called an “outlier suppression training strategy” that stabilizes the RL process.
The outlier suppression part is legitimately clever. It’s a technique that aggressively terminates abnormal training trajectories – things like the model generating bursts of parallel tool calls, repeating the same search queries, or producing malformed tool syntax. Instead of applying partial penalties or ignoring these errors, the strategy immediately stops generation and assigns a zero reward. This prevents high-variance samples from polluting the gradient updates and significantly reduces training noise. Expect to see this technique show up in more agent training pipelines next year.
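Here's a rough sketch of the idea as I read it from that description. The specific checks and data structures are my guesses for illustration, not Tencent's published training code.

```python
# Sketch of outlier suppression during RL rollouts (illustrative checks only).
from dataclasses import dataclass

@dataclass
class Step:
    kind: str                 # "search", "tool_error", ...
    query: str = ""
    parallel_calls: int = 1

def is_outlier(steps: list[Step]) -> bool:
    """Flag trajectories with parallel tool-call bursts, repeated queries,
    or malformed tool syntax (illustrative rules, not Tencent's exact ones)."""
    queries = [s.query for s in steps if s.kind == "search"]
    return (
        any(s.parallel_calls > 1 for s in steps)       # burst of parallel calls
        or len(queries) != len(set(queries))           # repeated search query
        or any(s.kind == "tool_error" for s in steps)  # malformed tool syntax
    )

def rollout_reward(steps: list[Step], reward_fn) -> float:
    # Outlier trajectories are cut off and get a flat zero reward, so they
    # never contribute high-variance samples to the GRPO gradient update.
    return 0.0 if is_outlier(steps) else reward_fn(steps)
```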
Here’s what struck me about the architecture: the training happens entirely on offline corpora. Wikipedia. BrowseComp-Plus. Synthetic multi-turn data from their ASearcher framework. Zero API costs during training. That’s not a typo.
Let me break down the math. The training involved 300 iterations, roughly 1,000 practice runs per iteration, and 10 search calls per run. That’s over 3 million search queries total. If they’d used a paid web search API at around $0.50 per thousand queries, they’d have racked up $1,500+ just in search fees. Instead? Zero. They built a local retrieval server using free offline data.
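Spelled out, the arithmetic looks like this. The 1,024 rollouts-per-iteration figure is my assumption; it's what makes the ~$1,536 number in the spec table below land exactly.

```python
# Back-of-the-envelope search cost for training, had a paid API been used.
iterations = 300
rollouts_per_iteration = 1024     # "roughly 1,000 practice runs" (assumed exact value)
searches_per_rollout = 10
price_per_1k_queries = 0.50       # ~$0.50 per thousand queries

total_queries = iterations * rollouts_per_iteration * searches_per_rollout
hypothetical_cost = total_queries / 1000 * price_per_1k_queries
print(total_queries)      # 3,072,000 searches
print(hypothetical_cost)  # ~$1,536, versus $0 against a local retrieval server
```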
For context, most commercial deep research tools burn through search API credits like water. Perplexity, ChatGPT’s Deep Research, Gemini’s research mode – they’re all hammering paid search endpoints constantly. Tencent’s approach flips the economics entirely.
Technical Specifications
| Specification | Details |
|---|---|
| Base Model | Qwen3-8B |
| Parameter Count | 8 billion |
| Tensor Type | BF16 |
| Training Framework | rLLM |
| Training Method | GRPO Reinforcement Learning |
| Max Retrieval Turns | 10-15+ per query |
| Training Data | Wikipedia, BrowseComp-Plus, ASearcher synthetic data (~14k filtered samples) |
| Training Hardware | 16 GPUs (train) + 8 GPUs (retrieval) + 8 GPUs (refine) |
| Training Time | ~5 days for 300 iterations |
| API Costs During Training | $0 (would’ve been ~$1,536 with paid APIs) |
The ASearcher Framework Behind It

SearchAgent-8B is actually part of a larger initiative called ASearcher – an open-source framework Tencent built for training search agents at scale.
The goal? What they call “Search Intelligence” – the ability to perform long-horizon search tasks that might require over 100 tool calls during training iterations. That’s ambitious. Most current agent frameworks struggle past 5-10 tool calls before context degradation kicks in.
ASearcher does something clever with data synthesis. It uses a prompt-based LLM agent to autonomously generate high-quality, challenging question-answer pairs. These aren’t simple retrieval questions – they’re specifically designed to require multi-step investigation. The kind of questions where a single search won’t cut it.
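Here's a rough sketch of what prompt-based QA synthesis looks like in practice. The prompt wording and parsing are mine, not ASearcher's actual pipeline.

```python
# Illustrative prompt-based multi-hop QA synthesis (not ASearcher's real prompts).
SYNTHESIS_PROMPT = """Here is a seed passage from the corpus:

{passage}

Write ONE question about this topic that cannot be answered from a single
search. It should require chaining at least three separate lookups.
Then give the short final answer.

Question: ...
Answer: ..."""

def synthesize_pair(passage: str, generate) -> dict:
    """`generate` is any text-in/text-out LLM callable (hypothetical)."""
    text = generate(SYNTHESIS_PROMPT.format(passage=passage))
    question = text.split("Question:")[-1].split("Answer:")[0].strip()
    answer = text.split("Answer:")[-1].strip()
    return {"question": question, "answer": answer}
```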
The framework also addresses one of the biggest pain points in agent training: cost. By hosting data locally and using synthetic multi-turn data, they’ve essentially eliminated the dependency on expensive live web search APIs. For anyone who’s tried to train agentic systems at scale, that’s a massive deal.
The team behind this – Jiahao Wu, Zhongwen Xu, Qiang Fu, and Wei Yang from Tencent’s TEG AIPD division – also released a 30B variant called SearchAgent-A3B based on Qwen3-30B-A3B (a mixture-of-experts architecture). That one trained for 100 iterations on 32+8+8 GPUs.
How It Compares to Perplexity and ChatGPT Deep Research

Let’s be direct about what SearchAgent-8B is and isn’t.
Perplexity’s Deep Research typically completes tasks in 2-4 minutes, references dozens of live sources, and reports 93.9% accuracy on SimpleQA. ChatGPT’s o3-powered Deep Research takes 5-30 minutes but produces exhaustively detailed, multi-page reports from live web data.
SearchAgent-8B operates differently. It runs entirely locally, queries your own data sources (or whatever you configure), and is constrained to 8 billion parameters. You won’t get GPT-5-level reasoning here.
But here’s the thing: that’s not the point.
| Feature | SearchAgent-8B | Perplexity Deep Research | ChatGPT Deep Research |
|---|---|---|---|
| Infrastructure | Fully local | Cloud-based | Cloud-based |
| Cost per query | $0 (after setup) | Subscription/credits | Subscription tiers |
| Live web access | No (local corpus) | Yes | Yes |
| Model | 8B parameters (Qwen3-8B base) | Proprietary Sonar | o3/o4-mini |
| Retrieval depth | 10-15+ search turns | Dynamic | Multi-step sessions (5-30 min) |
| BrowseComp-Plus EM | 30% | N/A | N/A |
| Open source | Yes (Hugging Face) | No | No |
The value proposition is clear: if you need deep research capabilities on private data, behind firewalls, or at scale without per-query costs – SearchAgent-8B is the first serious open-source option.
Official Benchmark Results
The Tencent team published results on multiple QA benchmarks:
| Model | Dataset | Exact Match | F1 Score | Avg Steps/Correct |
|---|---|---|---|---|
| SearchAgent-8B | BrowseComp-Plus | 30% | 36% | 13.1 |
| SearchAgent-A3B (30B) | BrowseComp-Plus | 29% | 35% | 18.9 |
| SearchAgent-A3B (30B) | HotpotQA | 49.5% | 63% | 4.7 |
| SearchAgent-A3B (30B) | Bamboogle | 63.2% | 72.4% | 3.5 |
Those HotpotQA and Bamboogle numbers are competitive for an open-source model running entirely locally.
Real Limitations You Should Know
I’ve been tracking this model since its release, and there are real constraints to acknowledge.
First, 8B parameters is still 8B parameters. The broader pattern we’ve seen with smaller models is that they struggle with complex instruction following and are more prone to hallucination than their larger counterparts. Fine-tuning helps, but it doesn’t transform the underlying capability ceiling.
Second, the “zero API cost” claim only holds if you’re querying local data. Want to run this against live web sources? You’ll need to set up your own retrieval infrastructure and deal with all the associated complexity.
Third, there’s the context and memory management issue. When you’re doing 10+ retrieval turns, you’re accumulating a lot of context. Models can get confused. The outlier suppression training helps, but it doesn’t eliminate the problem entirely.
And finally, this is still early-stage research with limited validation across domains. The benchmarks look solid on QA datasets, but real-world performance on your specific corpus may vary.
What This Means For Developers
If you’re building AI-powered research tools, internal knowledge bases, or any system that needs multi-step information retrieval, pay attention.
The immediate use cases are obvious:
- Enterprise knowledge bases: Run deep research queries against internal documentation without data leaving your infrastructure
- Legal and compliance research: Multi-turn investigation across document repositories
- Academic research assistants: Query local paper collections with follow-up questions
- Customer support intelligence: Deep-dive into support tickets and knowledge bases
The model is available in three variants on Hugging Face:
- aidenjhwu/SearchAgent-8B-hq – High-quality version with outlier suppression (recommended for production)
- aidenjhwu/SearchAgent-8B – Standard version without outlier suppression
- aidenjhwu/SearchAgent-A3B – 30B MoE version based on Qwen3-30B-A3B
The HQ variant reduced repeated search queries from ~20% to under 2% in testing – that’s a significant quality improvement.
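If you just want to poke at the weights, loading the HQ variant is the standard Hugging Face routine. A minimal sketch, assuming the stock transformers API (this skips the search-tool loop and just does plain generation against the raw model):

```python
# Minimal sketch: load the HQ variant with transformers (standard HF API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aidenjhwu/SearchAgent-8B-hq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 weights, roughly 16GB of VRAM
    device_map="auto",
)

prompt = "Which risks did the company highlight in its 2023 annual report?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```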
Here’s where this gets interesting: the agentic AI movement has been largely dominated by proprietary systems. Devin, Manus, Perplexity – closed-source agents with closed-source models. SearchAgent-8B represents a different approach. Open weights, open training methodology, open data synthesis pipeline.
The Bottom Line
Tencent’s SearchAgent-8B isn’t going to replace Perplexity or ChatGPT’s Deep Research for most users. Those tools have live web access, bigger models, and polished UX.
But that’s not the play here.
SearchAgent-8B is the first genuinely open-source deep research agent that can run entirely locally, train without burning through API credits, and perform the kind of multi-turn investigation that makes AI research tools actually useful.
For enterprises with private data? This is huge. For researchers with local corpora? Game-changer. For anyone building custom research tools? Start here.
The model is up on Hugging Face. The training framework is open-source. The data synthesis pipeline is documented.
What you do with it is up to you.
FAQ
What hardware do I need to run SearchAgent-8B locally?
With 8 billion parameters in BF16 format, you’ll need approximately 15-16GB of VRAM for inference. In practice, running through vLLM on an RTX 6000 uses around 15.2GB. A consumer GPU like an RTX 4090 (24GB) works well with headroom to spare. For production deployments, quantization to 4-bit can reduce requirements to around 6GB VRAM with some accuracy tradeoff.
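If you want to reproduce that vLLM footprint, here's a minimal sketch using vLLM's offline Python API. The context length and sampling values are my assumptions, not tuned settings.

```python
# Minimal sketch: run SearchAgent-8B-hq through vLLM's offline Python API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="aidenjhwu/SearchAgent-8B-hq",
    dtype="bfloat16",       # BF16 weights, roughly 15-16GB of VRAM
    max_model_len=32768,    # assumed context budget for 10-15 retrieval turns
)
params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(["Summarize the key risks in the attached filing."], params)
print(outputs[0].outputs[0].text)
```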
Can SearchAgent-8B access live web data like Perplexity?
Not out of the box. SearchAgent-8B is designed to query local data sources. You can configure it to use web search APIs if you set up the retrieval infrastructure yourself, but that defeats the “zero API cost” advantage.
How does SearchAgent-8B compare to regular RAG systems?
Standard RAG performs a single retrieval step before generation. SearchAgent-8B performs 10-15+ retrieval turns, refining its search based on what it finds. It’s more like a research agent than a simple retrieval-augmented generator.
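In code terms, the contrast looks roughly like this (hypothetical helpers again):

```python
# Standard RAG: retrieve once, then generate.
def rag_answer(question, index, llm):
    docs = index.search(question, k=5)      # a single retrieval step
    return llm.answer(question, context=docs)

# SearchAgent-style behavior replaces that single .search() call with the
# multi-turn loop sketched earlier in the article: propose a query, read the
# hits, decide whether to search again, and only then synthesize an answer.
```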
Has anyone tested this on real documents?
Yes. In practical demos, it successfully extracted specific figures from multi-page PDF reports – finding a “$67 million cybersecurity investment” mention buried in risk management sections. Classic needle-in-haystack retrieval with proper citation.
