Vector Databases
Vector databases enable semantic search: instead of matching exact words, the system compares the meaning of texts and returns content that is semantically similar to the query.
How Does It Work?
- Embedding: each text is converted by an AI model to a numerical vector whose length depends on the model (e.g., 1536 dimensions for OpenAI text-embedding-3-small)
- Storage — vectors are stored in the database along with the original text and metadata
- Search — the user’s question is converted to a vector and compared with stored vectors (cosine distance)
- Results — the system returns the best matching entries
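
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the product's implementation: the toy 3-dimensional vectors stand in for real embeddings, and the names `cosine_distance` and `search` are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query_vector, entries, top_k=2):
    """Rank stored entries by cosine distance to the query, closest first."""
    ranked = sorted(entries, key=lambda e: cosine_distance(query_vector, e["vector"]))
    return ranked[:top_k]

# Toy vectors; a real embedding model returns hundreds or thousands of dimensions.
entries = [
    {"text": "How to reset a password", "vector": [0.9, 0.1, 0.0]},
    {"text": "Invoice payment terms", "vector": [0.0, 0.2, 0.9]},
    {"text": "Changing account login details", "vector": [0.8, 0.3, 0.1]},
]

query = [1.0, 0.2, 0.0]  # pretend embedding of "I forgot my password"
results = search(query, entries)
for entry in results:
    print(entry["text"])
```

A production vector database typically replaces the full sort with an approximate nearest-neighbor index, but the ranking idea is the same.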
Use Cases
- RAG (Retrieval-Augmented Generation) — enriching AI responses with context from the knowledge base
- Semantic search — finding similar documents, articles, FAQ
- Chat with knowledge base — ask a question, AI answers based on your documents
Requirements
A vector database requires an AI connector (OpenAI, Gemini, or Claude) with embedding support for generating vectors.
Entries
Each entry in the vector database contains:
- Text content
- Embedding vector
- Metadata (e.g., source URL)
- Source association (type + ID, e.g., Kb::Entry #35)
- Chunk number (when text was split)
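
As a rough mental model, an entry could be represented like the record below. The field names are assumptions chosen to mirror the list above, not the actual database schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VectorEntry:
    """Hypothetical shape of one vector database entry."""
    content: str                                  # text content
    embedding: list[float]                        # embedding vector
    metadata: dict = field(default_factory=dict)  # e.g. source URL
    source_type: Optional[str] = None             # e.g. "Kb::Entry"
    source_id: Optional[int] = None               # e.g. 35
    chunk_number: Optional[int] = None            # set only when the text was split

entry = VectorEntry(
    content="Items can be returned within 30 days.",
    embedding=[0.12, -0.03, 0.57],  # shortened for readability
    metadata={"source_url": "https://example.com/returns"},
    source_type="Kb::Entry",
    source_id=35,
    chunk_number=2,
)
```

The source type and ID pair is what lets all chunks of one document be found and replaced together when the source changes.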
Text Chunking
Long texts are automatically split into smaller fragments (chunks) before generating embeddings. Each chunk is a separate entry in the vector database — but all chunks from one source (e.g., a KB entry) are linked together.
“Chunking enabled” Option
In the vector database settings, you can enable the chunking option. This changes behavior:
| Setting | Chunk size | Effect |
|---|---|---|
| Chunking off | model’s max tokens (e.g., 8191 for OpenAI) | Text split only when exceeding model limit. Larger fragments, fewer entries |
| Chunking on | ~500 tokens (~1-2 paragraphs) | Text always split into small fragments. More precise search |
When to Enable Chunking
- Enable when the source has long documents (articles, regulations, documentation) and you need search precision — a small chunk matches a specific question better
- Leave off when entries are short (FAQ, single questions/answers) — splitting short texts doesn’t make sense
How Splitting Works
- The system recognizes text structure: Markdown headings (## Section), paragraphs, HTML lists
- A new section (heading) is a natural chunk boundary
- Each chunk gets a prefix with the section heading it belongs to — so it doesn’t lose context
- If text is HTML — the system converts it to structured text preserving headings and paragraphs
- Tokens are counted exactly with tiktoken (the OpenAI tokenizer), not estimated from character counts
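
The splitting rules above can be illustrated with a simplified sketch. It is not the system's actual splitter: `count_tokens` is a whitespace word count standing in for tiktoken, the function names are hypothetical, and a single paragraph longer than the budget is kept whole here rather than split further.

```python
import re

def count_tokens(text):
    # Stand-in for a real tokenizer; the described system counts with tiktoken.
    return len(text.split())

def split_into_chunks(markdown, max_tokens):
    """Split Markdown on headings and paragraphs, prefixing each chunk with its heading."""
    chunks = []
    heading = ""
    buffer = []

    def flush():
        # Emit the buffered paragraphs as one chunk under the current heading.
        if buffer:
            body = "\n\n".join(buffer)
            chunks.append(f"{heading}\n\n{body}" if heading else body)
            buffer.clear()

    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A heading always closes the current chunk and starts a new section.
            first_line, _, rest = block.partition("\n")
            flush()
            heading = first_line.strip()
            block = rest.strip()
            if not block:
                continue
        # Start a new chunk when adding this paragraph would exceed the budget.
        if buffer and count_tokens("\n\n".join(buffer + [block])) > max_tokens:
            flush()
        buffer.append(block)
    flush()
    return chunks

doc = """## Returns

Items can be returned within 30 days.

Refunds are issued to the original payment method.

## Shipping

Orders ship within two business days."""

chunks = split_into_chunks(doc, max_tokens=8)
for chunk in chunks:
    print(repr(chunk))
```

With the small budget used here, the "Returns" section yields two chunks and both carry the `## Returns` prefix, which is how a fragment keeps its context after splitting.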
Per-model Limits
Each embedding model has a different token limit per call. The system automatically retrieves the limit from the connector:
| Model | Max tokens | Effect with chunking OFF | Effect with chunking ON |
|---|---|---|---|
| OpenAI text-embedding-3-small | 8,191 | chunks up to ~7,800 tokens | chunks up to 500 tokens |
| Cohere embed-v4 (Bedrock) | 128,000 | practically no splitting | chunks up to 500 tokens |
| Gemini embedding | 2,048 | chunks up to ~1,900 tokens | chunks up to 500 tokens |
When switching connectors (e.g., from OpenAI to Cohere), limits adjust automatically — no need to change anything in the database settings.
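
The relationship between the model limit, the chunking option, and the resulting chunk size can be sketched as follows. The roughly 5% headroom is an assumption inferred from the table (8,191 vs. ~7,800 and 2,048 vs. ~1,900), and the model keys and the `chunk_budget` function are hypothetical; the real system reads the limit from the connector.

```python
CHUNKING_ON_TOKENS = 500   # chunk size when chunking is enabled (from the settings table)
HEADROOM_PERCENT = 95      # assumption: keep about 5% below the model limit

# Limits copied from the table above; the keys are illustrative labels only.
MODEL_MAX_TOKENS = {
    "text-embedding-3-small": 8191,
    "cohere-embed-v4": 128000,
    "gemini-embedding": 2048,
}

def chunk_budget(model, chunking_enabled):
    """Return the maximum chunk size in tokens for a model and chunking setting."""
    limit = MODEL_MAX_TOKENS[model]
    if chunking_enabled:
        # Never exceed the model limit, even if it were below 500 tokens.
        return min(CHUNKING_ON_TOKENS, limit)
    # Integer arithmetic avoids floating-point rounding surprises.
    return limit * HEADROOM_PERCENT // 100

for model in MODEL_MAX_TOKENS:
    print(model, chunk_budget(model, False), chunk_budget(model, True))
```

Because the budget is derived from the model limit at call time, switching the connector changes the result without any stored setting having to change.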