How InstantRecall Works
A deep dive into the architecture, data flow, and setup process
System Architecture
InstantRecall.ai acts as a memory broker between your application and vector storage. We handle the complexity of embeddings, storage, and retrieval so you can focus on building great AI experiences.
Your Data Stays Yours
Vectors are stored in your own Pinecone account. We never see your data.
Lightning Fast
Sub-second retrieval with optimized semantic search and caching.
Smart Filtering
Automatic relevance scoring ensures only useful context is returned.
Usage Tracking
Real-time metering and billing based on queries, not storage.
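Conceptually, each query passes through four stages: embed the incoming message, search your Pinecone index, filter matches by relevance score, and return the surviving context. The sketch below illustrates that flow; it is not InstantRecall's actual source code, and the embed helper is a stand-in for whatever embedding step the service performs.

// Illustrative sketch of the broker's query flow (not actual source code)
async function handleQuery({ message, index }) {
  const vector = await embed(message);            // hypothetical embedding step
  const results = await index.query({             // similarity search in your Pinecone index
    vector,
    topK: 5,
    includeMetadata: true
  });
  const relevant = results.matches
    .filter(m => m.score >= 0.75);                // relevance threshold drops weak matches
  return relevant.map(m => m.metadata.text).join('\n'); // assembled context for your prompt
}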
Setup Guide
Follow these steps to integrate InstantRecall into your application. Setup takes less than 5 minutes.
Create Your Account
Sign up for a free InstantRecall account. No credit card required for the free tier.
Free Tier Includes:
- 100 queries per month
- All LLM providers supported
- Unlimited vector storage (in your Pinecone)
- Full API access
Set Up Pinecone
Create a Pinecone account and index if you don't have one already.
Pinecone Configuration:
- Dimension: 1536 (for OpenAI ada-002)
- Metric: cosine
- Cloud: Any (AWS, GCP, Azure)
Copy your Pinecone API key and index name. You'll add these to your InstantRecall dashboard.
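If you'd rather create the index from code, a minimal sketch using Pinecone's official Node SDK (@pinecone-database/pinecone) might look like this; the index name and serverless region are placeholders:

// Example: create a matching index with Pinecone's Node SDK
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

await pc.createIndex({
  name: 'my-memory-index',    // placeholder name
  dimension: 1536,            // matches OpenAI ada-002 embeddings
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
});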
Add API Keys to Dashboard
Navigate to your InstantRecall dashboard and add your keys.
Required:
- Pinecone API Key - For vector storage
Optional (for summarization):
- OpenAI API Key - For GPT summarization
- Anthropic API Key - For Claude summarization
- xAI API Key - For Grok summarization
All keys are encrypted with AES-256-GCM before storage.
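For reference, this is roughly what AES-256-GCM encryption looks like with Node's built-in crypto module; it illustrates the scheme, not InstantRecall's actual storage code:

// Illustrative AES-256-GCM encryption (Node.js built-in crypto)
import crypto from 'node:crypto';

function encryptApiKey(plaintext, key) {          // key: 32-byte Buffer
  const iv = crypto.randomBytes(12);              // fresh nonce for every encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();            // verified on decrypt; detects tampering
  return { iv, ciphertext, authTag };
}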
Integrate the API
Add a single API call to your chatbot or LLM application.
// Example: Node.js / JavaScript
// (assumes the official 'openai' npm package for the chat call below)
import OpenAI from 'openai';

const response = await fetch('https://instantrecall.ai/api/memory/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    sessionId: 'user-123',
    message: 'What did we discuss about the project timeline?',
    pineconeKey: process.env.PINECONE_API_KEY,
    pineconeIndex: 'my-memory-index',
    llmApiKey: process.env.OPENAI_API_KEY // Optional: enables summarization
  })
});

// context: retrieved memory; summary is populated when an LLM key is provided
const { context, summary } = await response.json();

// Use the retrieved context as the system prompt for your LLM call
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const chatResponse = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: context },
    { role: 'user', content: userMessage } // the user's latest input
  ]
});

That's it! Your chatbot now has persistent memory across sessions.
Monitor & Scale
Track your usage in real-time and upgrade when you need more queries.
Your dashboard shows:
- Monthly query count
- Remaining quota
- Memory settings and customization
- API key management
Best Practices
Use Meaningful Session IDs
Use unique, consistent session IDs (e.g., user IDs, conversation IDs) to properly segment memories across different users or conversations.
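For example, a session ID built from stable identifiers (the field names here are placeholders):

// Stable, per-conversation session ID (field names are placeholders)
const sessionId = `user-${user.id}-conv-${conversation.id}`;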
Tune Retrieval Settings
Adjust "Top K Results" and "Relevance Threshold" in your dashboard to control how much context is retrieved and how strict the relevance filter is.
Choose the Right Model
Use cheaper models (GPT-3.5, Haiku) for general summarization. Reserve expensive models (GPT-4, Opus) for complex reasoning tasks.
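One simple way to encode this is a small helper that picks the model by task type; the model names are just examples:

// Route tasks to models by complexity (model names are examples)
function modelFor(task) {
  return task === 'complex-reasoning' ? 'gpt-4' : 'gpt-3.5-turbo';
}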
Monitor Your Usage
Keep an eye on your monthly query count to avoid unexpected overages. Upgrade your plan proactively as your usage grows.
Rotate Keys Regularly
For security, rotate your API keys periodically. You can update them anytime in the dashboard without code changes.
Test with Small Batches
Start with a small subset of users or conversations to validate the integration before rolling out to production at scale.
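One lightweight way to do this is a deterministic percentage rollout keyed on the user ID, as in this sketch:

// Deterministic pilot cohort: enable memory for ~5% of users
import crypto from 'node:crypto';

function inPilot(userId, percent = 5) {
  const hash = crypto.createHash('sha256').update(userId).digest();
  return hash.readUInt32BE(0) % 100 < percent;    // stable per user across sessions
}

Users outside the cohort simply skip the memory query and fall back to a stateless prompt, so the rollout requires no changes to your existing chat path.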