How InstantRecall Works

A deep dive into the architecture, data flow, and setup process

System Architecture

InstantRecall.ai acts as a memory broker between your application and vector storage. We handle the complexity of embeddings, storage, and retrieval so you can focus on building great AI experiences.

💻 Your Application (chatbot, API, frontend)
        ↓
🧠 InstantRecall API (memory broker)
        ↓
🔢 Embedding Generation: convert messages to vectors using OpenAI embeddings
💾 Vector Storage: store in your Pinecone index with metadata
🔍 Semantic Search: retrieve the most relevant context based on similarity
        ↓
✨ Optional Summarization: use OpenAI, Claude, or Grok to summarize context
        ↓
📦 Context Response: formatted context ready for your LLM
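The Semantic Search step ranks stored vectors by their cosine similarity to the query embedding. As a rough illustration of that scoring (a sketch of the general technique, not InstantRecall's internal implementation):

```javascript
// Cosine similarity between two embedding vectors of equal length.
// 1 = same direction (very similar text), 0 = unrelated, -1 = opposite.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];      // accumulate the dot product
    normA += a[i] * a[i];    // squared magnitude of a
    normB += b[i] * b[i];    // squared magnitude of b
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Memories whose score falls below the relevance threshold are filtered out before the context is returned, which is why cosine is the recommended metric for the Pinecone index below.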

🔒 Your Data Stays Yours

Vectors are stored in your own Pinecone account. We never see your data.

⚡ Lightning Fast

Sub-second retrieval with optimized semantic search and caching.

🎯 Smart Filtering

Automatic relevance scoring ensures only useful context is returned.

📊 Usage Tracking

Real-time metering and billing based on queries, not storage.

Setup Guide

Follow these steps to integrate InstantRecall into your application. Setup takes less than 5 minutes.

1. Create Your Account

Sign up for a free InstantRecall account. No credit card required for the free tier.

Free Tier Includes:

  • 100 queries per month
  • All LLM providers supported
  • Unlimited vector storage (in your Pinecone)
  • Full API access

2. Set Up Pinecone

Create a Pinecone account and index if you don't have one already.

Pinecone Configuration:

  • Dimension: 1536 (for OpenAI ada-002)
  • Metric: cosine
  • Cloud: Any (AWS, GCP, Azure)

Copy your Pinecone API key and index name. You'll add these to your InstantRecall dashboard.
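To catch misconfigured indexes early, a quick sanity check against the settings above can help. This is a minimal sketch; the function name and shape are illustrative, not part of any SDK:

```javascript
// Verify an index description matches InstantRecall's requirements:
// dimension 1536 (OpenAI ada-002 embeddings) and the cosine metric.
function validateIndexConfig({ dimension, metric }) {
  const problems = [];
  if (dimension !== 1536) problems.push(`dimension must be 1536, got ${dimension}`);
  if (metric !== 'cosine') problems.push(`metric must be "cosine", got "${metric}"`);
  return problems; // empty array means the index is compatible
}
```

A mismatched dimension is the most common setup error: Pinecone will reject vectors whose length differs from the index dimension.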

3. Add API Keys to Dashboard

Navigate to your InstantRecall dashboard and add your keys.

Required:

  • Pinecone API Key - For vector storage

Optional (for summarization):

  • OpenAI API Key - For GPT summarization
  • Anthropic API Key - For Claude summarization
  • xAI API Key - For Grok summarization

All keys are encrypted with AES-256-GCM before storage.

4. Integrate the API

Add a single API call to your chatbot or LLM application.

// Example: Node.js / JavaScript
// Assumes Node 18+ (built-in fetch) and the official `openai` package.
import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await fetch('https://instantrecall.ai/api/memory/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    sessionId: 'user-123',
    message: 'What did we discuss about the project timeline?',
    pineconeKey: process.env.PINECONE_API_KEY,
    pineconeIndex: 'my-memory-index',
    llmApiKey: process.env.OPENAI_API_KEY // Optional, enables summarization
  })
});

if (!response.ok) throw new Error(`InstantRecall query failed: ${response.status}`);

// `summary` is populated only when an llmApiKey was provided
const { context, summary } = await response.json();

// Inject the retrieved context into your LLM prompt
const chatResponse = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: context },
    { role: 'user', content: userMessage }
  ]
});

That's it! Your chatbot now has persistent memory across sessions.

5. Monitor & Scale

Track your usage in real-time and upgrade when you need more queries.

Your dashboard shows:

  • Monthly query count
  • Remaining quota
  • Memory settings and customization
  • API key management

Best Practices

🎯 Use Meaningful Session IDs

Use unique, consistent session IDs (e.g., user IDs, conversation IDs) to properly segment memories across different users or conversations.
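For example, a conversation-scoped session ID might be derived like this (the naming scheme is just an illustration; any stable, unique convention works):

```javascript
// Build a session ID that is unique per user + conversation and
// deterministic, so the same conversation always maps to the same memories.
function sessionIdFor(userId, conversationId) {
  return `user-${userId}:conv-${conversationId}`;
}
```

The key property is determinism: if the same conversation sometimes produces different session IDs, its memories fragment across namespaces and retrieval quality drops.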

๐Ÿ”Tune Retrieval Settings

Adjust "Top K Results" and "Relevance Threshold" in your dashboard to control how much context is retrieved and how strict the relevance filter is.
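Conceptually, those two settings combine like this (a client-side sketch with illustrative names, not the server's actual logic):

```javascript
// Apply a relevance threshold, then keep only the K best matches.
// Each match is assumed to carry a similarity `score` between 0 and 1.
function selectContext(matches, { topK = 5, threshold = 0.75 } = {}) {
  return matches
    .filter(m => m.score >= threshold)  // drop low-relevance memories
    .sort((a, b) => b.score - a.score)  // highest similarity first
    .slice(0, topK);                    // cap the amount of context
}
```

Raising the threshold trades recall for precision; raising Top K sends more context to your LLM at the cost of a longer prompt.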

💰 Choose the Right Model

Use cheaper models (GPT-3.5, Haiku) for general summarization. Reserve expensive models (GPT-4, Opus) for complex reasoning tasks.

📊 Monitor Your Usage

Keep an eye on your monthly query count to avoid unexpected overages. Upgrade your plan proactively as your usage grows.

๐Ÿ”Rotate Keys Regularly

For security, rotate your API keys periodically. You can update them anytime in the dashboard without code changes.

🧪 Test with Small Batches

Start with a small subset of users or conversations to validate the integration before rolling out to production at scale.

Ready to Get Started?

Create your account and add memory to your AI application in under 5 minutes.