OpusVoice AI doesn't just generate generic responses. It answers based on your actual content — product docs, FAQs, policies, whatever you feed it. Here's how the knowledge pipeline works.
Ingestion
You can add content in four ways: paste a URL (we'll crawl and extract text), upload a PDF, upload a DOCX file, or upload plain text. Each source is tracked with its own status: pending, processing, completed, or failed.
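To make the source-tracking idea concrete, here is a minimal sketch of how a source record and its lifecycle might be modeled. The class and field names are illustrative assumptions, not the actual OpusVoice schema:

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    URL = "url"
    PDF = "pdf"
    DOCX = "docx"
    TEXT = "text"


class SourceStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class KnowledgeSource:
    # Hypothetical record: one row per ingested source.
    source_type: SourceType
    location: str  # URL or uploaded filename
    status: SourceStatus = SourceStatus.PENDING


# Every new source starts in "pending", then advances as the pipeline runs.
doc = KnowledgeSource(SourceType.PDF, "pricing-guide.pdf")
doc.status = SourceStatus.PROCESSING
```

Modeling the status as an enum keeps the four pipeline states explicit, so a failed crawl or parse is visible per source rather than failing silently.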
Chunking
Raw text is split into overlapping chunks of approximately 1,000 characters with a 200-character overlap. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, so context isn't lost at chunk edges. Each chunk becomes a searchable unit.
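The sliding-window split can be sketched in a few lines. This assumes plain character-based splitting; a production splitter may additionally respect sentence or paragraph boundaries:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks of ~`size` characters."""
    step = size - overlap  # each new chunk starts 800 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks


# Varied sample text so the overlap is visible in the output.
text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(text)
```

With a 2,500-character input this yields three chunks, and the last 200 characters of each chunk repeat as the first 200 of the next, which is exactly the boundary context the overlap is meant to preserve.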
Embedding
Every chunk is converted into a 768-dimensional vector using our embedding model. These vectors capture the semantic meaning of the text — not just keywords, but concepts and relationships.
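The shape of the embedding step can be illustrated with a placeholder. The function below is a stand-in, not the real model: it produces a deterministic 768-dimensional unit vector, whereas an actual embedding model maps semantically similar texts to nearby vectors:

```python
import hashlib
import math


def fake_embed(text: str, dim: int = 768) -> list[float]:
    """Placeholder for a real embedding model (illustrative only).

    Produces a deterministic pseudo-random vector of the right shape,
    then L2-normalizes it so dot products equal cosine similarity.
    """
    values = []
    for i in range(dim):
        digest = hashlib.sha256(f"{text}:{i}".encode()).digest()
        values.append(int.from_bytes(digest[:4], "big") / 2**32 - 0.5)
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


vec = fake_embed("How do I reset my password?")
```

Normalizing to unit length is a common convention: it makes cosine similarity between two embeddings a plain dot product, which simplifies the retrieval math later.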
Storage
Vectors are stored in PostgreSQL using the pgvector extension. This gives us the transactional reliability of Postgres combined with fast vector similarity search, without introducing a separate vector database.
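The pgvector setup can be sketched in SQL; below, the statements are held as Python strings. Table and column names are illustrative assumptions, not the actual schema:

```python
# Hypothetical DDL: vector(768) matches the embedding model's output size.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    source_id bigint NOT NULL,
    content   text NOT NULL,
    embedding vector(768)
);
"""

# <=> is pgvector's cosine-distance operator (smaller = more similar),
# so 1 - distance recovers cosine similarity for ranking.
QUERY = """
SELECT content, 1 - (embedding <=> %(query_vec)s) AS cosine_similarity
FROM chunks
ORDER BY embedding <=> %(query_vec)s
LIMIT 5;
"""
```

Because the embeddings live in an ordinary Postgres table, they participate in transactions, backups, and joins against the rest of the application data.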
Retrieval
When a visitor asks a question, we embed their query and find the most semantically similar chunks using cosine similarity. The top matches are injected into the AI's context window alongside the conversation history.
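The ranking step boils down to a few lines of math. Here is a minimal sketch using toy 3-dimensional vectors in place of real 768-dimensional embeddings; in production the database performs this comparison via pgvector rather than in application code:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Rank chunks by similarity to the query; return the best k indices."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


query = [1.0, 0.0, 0.0]
chunks = [
    [0.9, 0.1, 0.0],  # points almost the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal: unrelated content
    [0.7, 0.7, 0.0],  # partially aligned
]
best = top_k(query, chunks, k=2)  # → [0, 2]
```

The indices of the top-ranked chunks map back to their text, which is what gets injected into the model's context window alongside the conversation history.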
The Result
The AI responds with information grounded in your actual content. It cites relevant details, uses your terminology, and stays on-brand. If it doesn't have enough context, it says so rather than hallucinating.