# [Vector Databases](https://intum.com/help/noe/vector-databases.md)

Vector databases enable semantic search: instead of matching exact words, the system compares the meaning of texts and finds content similar to the query.

## How Does It Work?

1. **Embedding**: each text is converted to a numerical vector by an AI model (for example, 1536 dimensions with OpenAI text-embedding-3-small)
2. **Storage**: vectors are stored in the database along with the original text and metadata
3. **Search**: the user's question is converted to a vector and compared with the stored vectors using cosine distance
4. **Results**: the system returns the closest-matching entries
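
The search step can be illustrated with a minimal Python sketch. The three-dimensional vectors, the `entries` list, and the `search` helper are made up for illustration; real embeddings come from the AI connector and have far more dimensions, and this is not the system's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, entries, top_k=2):
    """Score every stored entry against the query and return the top_k best matches."""
    scored = [(cosine_similarity(query_vector, e["vector"]), e["text"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: hand-written vectors standing in for real embeddings.
entries = [
    {"text": "How to reset a password", "vector": [0.9, 0.1, 0.0]},
    {"text": "Invoice payment terms", "vector": [0.0, 0.2, 0.9]},
    {"text": "Changing login credentials", "vector": [0.8, 0.3, 0.1]},
]

query_vector = [1.0, 0.2, 0.0]  # pretend embedding of "I forgot my password"
results = search(query_vector, entries)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Cosine distance is simply `1 - cosine_similarity`, so sorting by highest similarity is equivalent to sorting by lowest distance.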

## Use Cases

- **RAG (Retrieval-Augmented Generation)** — enriching AI responses with context from the knowledge base
- **Semantic search** — finding similar documents, articles, FAQ
- **Chat with knowledge base** — ask a question, AI answers based on your documents

## Requirements

A vector database requires an AI connector (OpenAI, Gemini, or Claude) with embedding support for generating vectors.

## Entries

Each entry in the vector database contains:
- Text content
- Embedding vector
- Metadata (e.g., source URL)
- Source association (type + ID, e.g., Kb::Entry #35)
- Chunk number (when text was split)
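
As a rough mental model, an entry could be represented like this in Python. The class name `VectorEntry`, its field names, and the example URL are hypothetical and chosen only to mirror the list above; the actual storage schema is not documented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VectorEntry:
    """Illustrative shape of one vector database entry (field names are assumptions)."""
    text: str                                   # text content
    vector: List[float]                         # embedding vector
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g., source URL
    source_type: Optional[str] = None           # source association: type
    source_id: Optional[int] = None             # source association: ID
    chunk_number: Optional[int] = None          # set when the text was split


entry = VectorEntry(
    text="Items can be returned within 30 days.",
    vector=[0.12, -0.03, 0.44],                 # shortened for readability
    metadata={"source_url": "https://example.com/kb/35"},
    source_type="Kb::Entry",
    source_id=35,
    chunk_number=0,
)
```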

## Text Chunking

Long texts are automatically split into smaller fragments (chunks) before generating embeddings. Each chunk is a separate entry in the vector database — but all chunks from one source (e.g., a KB entry) are linked together.

### "Chunking enabled" Option

In the vector database settings, you can enable the chunking option. It changes how texts are split:

| Setting | Chunk size | Effect |
|---|---|---|
| **Chunking off** | Model's max tokens (e.g., 8,191 for OpenAI) | Text is split only when it exceeds the model limit. Larger fragments, fewer entries |
| **Chunking on** | ~500 tokens (~1-2 paragraphs) | Text is always split into small fragments. More precise search |

### When to Enable Chunking

- **Enable** when the source has long documents (articles, regulations, documentation) and you need search precision — a small chunk matches a specific question better
- **Leave off** when entries are short (FAQ, single questions/answers) — splitting short texts doesn't make sense

### How Splitting Works

1. The system recognizes text structure — Markdown headings (`## Section`), paragraphs, HTML lists
2. A new section (heading) is a natural chunk boundary
3. Each chunk gets a prefix with the section heading it belongs to — so it doesn't lose context
4. If the text is HTML, the system converts it to structured text, preserving headings and paragraphs
5. Tokens are counted exactly with tiktoken (the OpenAI tokenizer), not estimated from character counts
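
The splitting rules above can be sketched in Python roughly as follows. This is a simplified illustration, not the system's code: `split_into_chunks` and `count_tokens` are hypothetical names, the token counter is a whitespace word count standing in for tiktoken, HTML conversion is omitted, and a single paragraph longer than the limit is not split further.

```python
import re


def count_tokens(text):
    """Stand-in token counter (whitespace words). The real system uses tiktoken."""
    return len(text.split())


def split_into_chunks(markdown, max_tokens):
    """Split Markdown into chunks, starting a new chunk at every heading and
    prefixing each chunk with the heading of the section it belongs to."""
    chunks = []
    heading = ""
    current = []  # paragraphs collected for the chunk being built

    def flush():
        if current:
            body = "\n\n".join(current)
            chunks.append(f"{heading}\n\n{body}" if heading else body)
            current.clear()

    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if re.match(r"#{1,6}\s", block):
            flush()  # a heading is a natural chunk boundary
            lines = block.splitlines()
            heading = lines[0]
            block = "\n".join(lines[1:]).strip()
            if not block:
                continue
        # Close the current chunk if adding this paragraph would exceed the limit.
        if current and count_tokens("\n\n".join(current + [block])) > max_tokens:
            flush()
        current.append(block)

    flush()
    return chunks


doc = """## Returns

Items can be returned within 30 days.

Refunds are issued to the original payment method.

## Shipping

Orders ship within two business days."""

small_chunks = split_into_chunks(doc, max_tokens=8)    # forces a split inside "Returns"
large_chunks = split_into_chunks(doc, max_tokens=100)  # one chunk per section
```

With the small limit, both chunks from the "Returns" section start with `## Returns`, which is how a fragment keeps its context after splitting.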

### Per-model Limits

Each embedding model has a different token limit per call. The system automatically retrieves the limit from the connector:

| Model | Max tokens | Effect with chunking OFF | Effect with chunking ON |
|---|---|---|---|
| OpenAI text-embedding-3-small | 8,191 | chunks up to ~7,800 tokens | chunks up to 500 tokens |
| Cohere embed-v4 (Bedrock) | 128,000 | practically no splitting | chunks up to 500 tokens |
| Gemini embedding | 2,048 | chunks up to ~1,900 tokens | chunks up to 500 tokens |

When switching connectors (e.g., from OpenAI to Cohere), limits adjust automatically — no need to change anything in the database settings.
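
One way to read the table is as a simple rule for the effective chunk size. The sketch below is an interpretation, not the documented algorithm: the function name `effective_chunk_size` is hypothetical, and the 5% safety margin is an assumption inferred from the approximate figures in the table (for example, ~7,800 against a limit of 8,191).

```python
CHUNKING_ON_TARGET = 500   # chunk size with chunking ON, taken from the table
SAFETY_MARGIN_PERCENT = 5  # assumed headroom below the model limit (a guess)


def effective_chunk_size(model_max_tokens, chunking_enabled):
    """Return the maximum chunk size in tokens for a given embedding model limit."""
    # Integer arithmetic keeps the result exact for any token limit.
    model_cap = model_max_tokens * (100 - SAFETY_MARGIN_PERCENT) // 100
    if chunking_enabled:
        return min(CHUNKING_ON_TARGET, model_cap)
    return model_cap


# Model limits as listed in the table above.
limits = {
    "OpenAI text-embedding-3-small": 8191,
    "Cohere embed-v4 (Bedrock)": 128000,
    "Gemini embedding": 2048,
}

for model, limit in limits.items():
    off = effective_chunk_size(limit, chunking_enabled=False)
    on = effective_chunk_size(limit, chunking_enabled=True)
    print(f"{model}: OFF={off}, ON={on}")
```

Because the limit is an input rather than a stored setting, switching connectors only changes the value passed in, which matches the automatic adjustment described above.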