How InstantRecall Works
A deep dive into the architecture, data flow, and setup process
System Architecture
InstantRecall.ai acts as a memory broker between your application and vector storage. We handle the complexity of embeddings, storage, and retrieval so you can focus on building great AI experiences.
Your Data Stays Yours
Vectors are stored in your own Pinecone account. We never see your data.
Lightning Fast
Sub-second retrieval with optimized semantic search and caching.
Smart Filtering
Automatic relevance scoring ensures only useful context is returned.
Usage Tracking
Real-time metering and billing based on queries, not storage.
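Conceptually, each query passes through four stages: embed the incoming message, search your Pinecone index, filter matches by relevance score, and return the surviving context. The sketch below illustrates that flow; it is not InstantRecall's actual source code, and the embed helper is a stand-in for whatever embedding step the service performs.

// Illustrative sketch of the broker's query flow (not actual source code)
async function handleQuery({ message, index }) {
  const vector = await embed(message);            // hypothetical embedding step
  const results = await index.query({             // similarity search in your Pinecone index
    vector,
    topK: 5,
    includeMetadata: true
  });
  const relevant = results.matches
    .filter(m => m.score >= 0.75);                // relevance threshold drops weak matches
  return relevant.map(m => m.metadata.text).join('\n'); // assembled context for your prompt
}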
Setup Guide
Follow these steps to integrate InstantRecall into your application. Setup takes less than 5 minutes.
Create Your Account
Sign up for a free InstantRecall account. No credit card required for the free tier.
Free Tier Includes:
- 100 queries per month
- All LLM providers supported
- Unlimited vector storage (in your Pinecone)
- Full API access
Set Up Pinecone
Create a Pinecone account and index if you don't have one already.
Pinecone Configuration:
- Dimension: 1536 (for OpenAI ada-002)
- Metric: cosine
- Cloud: Any (AWS, GCP, Azure)
Copy your Pinecone API key and index name. You'll add these to your InstantRecall dashboard.
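If you'd rather create the index from code, a minimal sketch using Pinecone's official Node SDK (@pinecone-database/pinecone) might look like this; the index name and serverless region are placeholders:

// Example: create a matching index with Pinecone's Node SDK
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

await pc.createIndex({
  name: 'my-memory-index',    // placeholder name
  dimension: 1536,            // matches OpenAI ada-002 embeddings
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
});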
Add API Keys to Dashboard
Navigate to your InstantRecall dashboard and add your keys.
Required:
- Pinecone API Key - For vector storage
Optional (for summarization):
- OpenAI API Key - For GPT summarization
- Anthropic API Key - For Claude summarization
- xAI API Key - For Grok summarization
All keys are encrypted with AES-256-GCM before storage.
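For reference, this is roughly what AES-256-GCM encryption looks like with Node's built-in crypto module; it illustrates the scheme, not InstantRecall's actual storage code:

// Illustrative AES-256-GCM encryption (Node.js built-in crypto)
import crypto from 'node:crypto';

function encryptApiKey(plaintext, key) {          // key: 32-byte Buffer
  const iv = crypto.randomBytes(12);              // fresh nonce for every encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();            // verified on decrypt; detects tampering
  return { iv, ciphertext, authTag };
}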
Integrate the API
Add a single API call to your chatbot or LLM application.
// Example: Node.js / JavaScript
// (assumes the official 'openai' npm package for the chat call below)
import OpenAI from 'openai';

const response = await fetch('https://instantrecall.ai/api/memory/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    sessionId: 'user-123',
    message: 'What did we discuss about the project timeline?',
    pineconeKey: process.env.PINECONE_API_KEY,
    pineconeIndex: 'my-memory-index',
    llmApiKey: process.env.OPENAI_API_KEY // Optional: enables summarization
  })
});

// context: retrieved memory; summary is populated when an LLM key is provided
const { context, summary } = await response.json();

// Use the retrieved context as the system prompt for your LLM call
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const chatResponse = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: context },
    { role: 'user', content: userMessage } // the user's latest input
  ]
});

That's it! Your chatbot now has persistent memory across sessions.
Monitor & Scale
Track your usage in real-time and upgrade when you need more queries.
Your dashboard shows:
- Monthly query count
- Remaining quota
- Memory settings and customization
- API key management
Best Practices
Use Meaningful Session IDs
Use unique, consistent session IDs (e.g., user IDs, conversation IDs) to properly segment memories across different users or conversations.
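For example, a session ID built from stable identifiers (the field names here are placeholders):

// Stable, per-conversation session ID (field names are placeholders)
const sessionId = `user-${user.id}-conv-${conversation.id}`;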
Tune Retrieval Settings
Adjust "Top K Results" and "Relevance Threshold" in your dashboard to control how much context is retrieved and how strict the relevance filter is.
Choose the Right Model
Use cheaper models (GPT-3.5, Haiku) for general summarization. Reserve expensive models (GPT-4, Opus) for complex reasoning tasks.
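One simple way to encode this is a small helper that picks the model by task type; the model names are just examples:

// Route tasks to models by complexity (model names are examples)
function modelFor(task) {
  return task === 'complex-reasoning' ? 'gpt-4' : 'gpt-3.5-turbo';
}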
Monitor Your Usage
Keep an eye on your monthly query count to avoid unexpected overages. Upgrade your plan proactively as your usage grows.
Rotate Keys Regularly
For security, rotate your API keys periodically. You can update them anytime in the dashboard without code changes.
Test with Small Batches
Start with a small subset of users or conversations to validate the integration before rolling out to production at scale.
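One lightweight way to do this is a deterministic percentage rollout keyed on the user ID, as in this sketch:

// Deterministic pilot cohort: enable memory for ~5% of users
import crypto from 'node:crypto';

function inPilot(userId, percent = 5) {
  const hash = crypto.createHash('sha256').update(userId).digest();
  return hash.readUInt32BE(0) % 100 < percent;    // stable per user across sessions
}

Users outside the cohort simply skip the memory query and fall back to a stateless prompt, so the rollout requires no changes to your existing chat path.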