Thesis.io Context Engineering: Affordable AI-Native Research for DeFi at Scale
Thesis.io serves thousands of users daily with unlimited web search, and its premium plan costs just $20/month for 40 deep research sessions. This document explores the technical architecture and optimization techniques that make this affordable scaling possible.
Service Model and Pricing Strategy
User Base and Offerings
Thesis.io operates at significant scale while maintaining affordability:
- Thousands of daily active users across the platform
- Unlimited web search and follow-ups for all users
- Premium plan: $20/month, including:
  - 40 deep research sessions
  - Unlimited follow-ups in private spaces
  - Advanced DeFi analysis capabilities
The Scaling Challenge
The fundamental challenge lies in providing high-quality AI-powered DeFi research while keeping costs manageable. State-of-the-art (SOTA) language models are expensive:

| Model | Provider | API Cost (USD per 1M output tokens) | Subscription |
|---|---|---|---|
| Claude 4 Sonnet | Anthropic | $15 | Pro: $20/mo |
| OpenAI o3 | OpenAI | $40 | Pro: $200/mo (est.) |
| GPT-5 | OpenAI | $10 | Pro: $200/mo |
| Gemini 2.5 | Google | $10 | Advanced: ~$20/mo |
Technical Architecture: Context Engineering
What is Context Engineering?
Context engineering is the core technique for optimizing Large Language Model (LLM) costs without compromising quality. It involves managing the LLM's context window so that it produces the desired answer for each user request. Key principle: fill the context with the right information, in the right format, so the LLM can complete the task effectively.
Context Window Structure
A typical Thesis.io context window contains structured elements that flow through the system. The context window acts as a structured array of strings that gives the LLM all the information it needs to generate accurate, contextual responses. Each element is categorized and flows through the Thesis LLM processing pipeline; a minimal sketch follows.
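The sketch below shows how such a window might be assembled in Python. The element categories, their priority order, and the character budget are illustrative assumptions, not Thesis.io's actual schema.

```python
from dataclasses import dataclass

# Hypothetical context element; the category names below are illustrative,
# not Thesis.io's internal taxonomy.
@dataclass
class ContextElement:
    category: str   # e.g. "system", "user_profile", "retrieved_data", "history"
    content: str

def build_context_window(elements: list[ContextElement], max_chars: int = 120_000) -> list[str]:
    """Assemble a structured array of strings for the LLM in a fixed order,
    dropping lower-priority elements if the budget is exceeded."""
    priority = {"system": 0, "user_profile": 1, "retrieved_data": 2, "history": 3}
    ordered = sorted(elements, key=lambda e: priority.get(e.category, 99))
    window, used = [], 0
    for el in ordered:
        block = f"[{el.category}]\n{el.content}"
        if used + len(block) > max_chars:
            continue  # skip elements that do not fit the budget
        window.append(block)
        used += len(block)
    return window

context = build_context_window([
    ContextElement("system", "You are a DeFi research assistant."),
    ContextElement("retrieved_data", "TVL, token prices, protocol docs ..."),
    ContextElement("history", "Previous follow-up questions ..."),
])
```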
Three Key Optimization Techniques
1. Prompt KV Caching (5x Cost Reduction)
Prompt KV (Key-Value) Caching is Anthropic's mechanism for saving compute by reusing the cached key-value state of a repeated prompt prefix, reducing our operational costs by 5x for long-running tasks. Below are examples of Anthropic's prompt caching that we leverage to reduce costs.
How Anthropic's Prompt Caching Works
- Basic implementation: mark a long, stable prefix (such as the system prompt) as cacheable (see the first sketch below)
- Advanced caching: add further cache breakpoints on tool definitions and the growing conversation history (see the second sketch below)
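A minimal sketch of the basic case using the Anthropic Python SDK; the model ID and prompt contents are placeholders, not Thesis.io's production configuration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "You are a DeFi research assistant. <protocol docs, style guide, ...>"

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to and including this block is cached and reused
            # across requests that share the same prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What is the current state of ETH liquid staking?"}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on later calls that share the prefix.
print(response.usage)
```

The advanced variant adds further cache breakpoints, for example on tool definitions and on the latest turn of a growing conversation, so each follow-up pays full price only for the new suffix. Again a sketch; the tool and message contents are illustrative.

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    tools=[
        {
            "name": "get_protocol_tvl",  # hypothetical tool for illustration
            "description": "Fetch total value locked for a DeFi protocol.",
            "input_schema": {
                "type": "object",
                "properties": {"protocol": {"type": "string"}},
                "required": ["protocol"],
            },
            # Breakpoint 1: cache the (stable) tool definitions.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Breakpoint 2: cache the system prompt.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Compare Aave and Compound lending rates."},
        {"role": "assistant", "content": "<previous answer>"},
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Now include Morpho in the comparison.",
                    # Breakpoint 3: cache the conversation up to this turn,
                    # so the next follow-up reuses it.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
    ],
)
```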
Visual Cache Performance
| Cache Status | Computation | Cost Impact | Visual Indicator |
|---|---|---|---|
| Cache Hit ✅ | Most tokens reused | 80-90% savings | 🟢🟢🟢🟠 |
| Cache Miss ❌ | All tokens computed | Full cost | 🟠🟠🟠🟠 |
| Partial Hit ⚡ | Mixed reuse | 40-60% savings | 🟢🟢🟠🟠 |
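The savings bands above can be estimated from Anthropic's published cache pricing for the default 5-minute cache: writes are billed at 1.25x the base input rate, reads at 0.1x. The token counts and base price in the sketch below are illustrative.

```python
# Illustrative estimate of savings from prompt caching. Assumes Anthropic's
# multipliers for the default 5-minute cache: writes at 1.25x the base input
# price, reads at 0.1x. Token counts below are made up.
BASE_INPUT_PRICE = 3.00 / 1_000_000   # e.g. $3 per 1M input tokens
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10

def request_cost(cached_tokens: int, new_tokens: int, first_call: bool) -> float:
    cached_rate = CACHE_WRITE_MULT if first_call else CACHE_READ_MULT
    return cached_tokens * BASE_INPUT_PRICE * cached_rate + new_tokens * BASE_INPUT_PRICE

cached, new = 50_000, 2_000           # long shared prefix, short new question
baseline = (cached + new) * BASE_INPUT_PRICE
hit = request_cost(cached, new, first_call=False)
print(f"cache hit saves {1 - hit / baseline:.0%}")  # ~87%, within the 80-90% band above
```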
2. Divide-and-Conquer with Sub-Agents
This approach breaks complex DeFi research into specialized tasks, distributing work across dedicated agents that operate in parallel without maintaining expensive long contexts; a minimal orchestration sketch follows the benefits list below.
AI Research Engine Architecture
Benefits
- Reduced Context Length: Sub-agents operate with focused, task-specific contexts
- Parallel Processing: Multiple agents work simultaneously on different aspects
- Specialized Expertise: Each agent optimized for specific DeFi domains
- Cost Efficiency: The main agent keeps sub-agents' intermediate processing out of its own context
- Scalability: Easy to add new specialized agents without affecting existing ones
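The orchestration sketch below shows the pattern: focused per-agent contexts, parallel fan-out, and only short summaries returned to the main agent. The `run_llm` helper, agent roles, and prompts are hypothetical stand-ins, not Thesis.io's actual engine.

```python
import asyncio

# Hypothetical sub-agent definitions; roles and prompts are illustrative only.
SUB_AGENTS = {
    "onchain_metrics": "You analyze on-chain metrics (TVL, volumes, fees).",
    "tokenomics": "You analyze token supply, emissions, and incentives.",
    "governance": "You summarize recent governance proposals and risks.",
}

async def run_llm(system: str, prompt: str) -> str:
    """Placeholder for an async LLM call (API or self-hosted)."""
    await asyncio.sleep(0)  # stand-in for network latency
    return f"[{system[:24]}...] summary for: {prompt[:40]}"

async def run_sub_agent(role: str, system: str, task: str) -> tuple[str, str]:
    # Each sub-agent receives only its own focused context, not the full history.
    return role, await run_llm(system, task)

async def research(question: str) -> str:
    # Fan out: all sub-agents work on the question in parallel.
    results = await asyncio.gather(
        *(run_sub_agent(role, system, question) for role, system in SUB_AGENTS.items())
    )
    # Fan in: the main agent sees only the short summaries, keeping its
    # own context (and cost) small.
    digest = "\n".join(f"{role}: {summary}" for role, summary in results)
    return await run_llm("You are the lead DeFi analyst.", f"{question}\n\nFindings:\n{digest}")

print(asyncio.run(research("Assess the risk profile of restaking protocols.")))
```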
3. Self-Hosted Models (2-3x Cheaper and Faster)
Self-hosting smaller, open-source models to handle specialized DeFi tasks dramatically reduces dependency on expensive SOTA LLMs while improving domain-specific performance; a routing sketch is shown below.
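A minimal routing sketch under the assumption that the self-hosted model sits behind an OpenAI-compatible server (e.g. vLLM). The endpoint URL, model names, and task routing table are placeholders, not Thesis.io's deployment.

```python
from openai import OpenAI

# Self-hosted model behind an OpenAI-compatible server (e.g. vLLM);
# URL and model name are placeholders.
local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
LOCAL_MODEL = "Qwen/Qwen2.5-14B-Instruct"   # example open-source model

# SOTA API model reserved for final synthesis / hard reasoning
# (reads OPENAI_API_KEY from the environment).
sota = OpenAI()
SOTA_MODEL = "gpt-5"

# Illustrative routing table: cheap, specialized tasks go to the local model.
LOCAL_TASKS = {"entity_extraction", "query_rewriting", "source_ranking", "summarization"}

def complete(task: str, prompt: str) -> str:
    client, model = (local, LOCAL_MODEL) if task in LOCAL_TASKS else (sota, SOTA_MODEL)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# High-volume preprocessing stays on the cheaper self-hosted model ...
summary = complete("summarization", "Summarize this governance forum thread: ...")
# ... while the final research answer uses the SOTA model.
answer = complete("final_answer", f"Write the research brief using: {summary}")
```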
Future Outlook for AI in DeFi
Next Generation LLMs
The upcoming generation of language models promises to:
- Further reduce costs through improved efficiency architectures
- Enable mass adoption of AI in DeFi applications
- Provide better performance at significantly lower price points
- Democratize access to sophisticated DeFi analysis tools
Ready to build with Thesis.io?
Explore our API Reference to start building cost-effective AI-powered DeFi applications using these proven optimization techniques.