
Overview

This service is optional: document-api and its OpenSearch dependency are only needed for knowledge base / RAG features, and voice assistants work fully without it. Use `make up-all-with-knowledge` (Docker) to start with knowledge base support.
The document-api is the knowledge backend for the Rapida platform. It processes documents from PDF, Word, CSV, and other formats into searchable vector embeddings and full-text search indices. At call time, assistant-api queries this service to inject relevant knowledge context into the LLM prompt.

Port

9010 — HTTP (FastAPI / uvicorn)

Language

Python 3.11+, FastAPI + Celery

Storage

PostgreSQL (assistant_db) · Redis (Celery broker) · OpenSearch (vectors + text)
Document processing is asynchronous. When a document is uploaded, the API immediately returns a document_id with status: processing. Text extraction, chunking, and embedding generation are handled by Celery workers in the background.
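The asynchronous contract can be sketched as a poll-until-done loop. This is an illustration only: `get_status` stands in for a status lookup against the API (the status endpoint itself is not shown on this page), and the status strings other than `processing` are assumed:

```python
import time

def wait_until_processed(get_status, document_id, timeout=300.0, interval=2.0):
    """Poll a status callable until the document leaves the 'processing' state.

    `get_status(document_id)` stands in for a GET against the document-api
    status endpoint; it should return a string such as "processing",
    "processed", or "failed".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(document_id)
        if status != "processing":
            return status
        time.sleep(interval)
    raise TimeoutError(f"document {document_id} still processing after {timeout}s")

# Example with a stubbed status function: the first two polls report
# "processing", the third reports "processed".
responses = iter(["processing", "processing", "processed"])
print(wait_until_processed(lambda _id: next(responses), "doc_456", interval=0.0))
```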

Components

The processing pipeline runs as a Celery task after each upload. The stages are sequential and the document status is updated at each step.
| Stage | Library | Configurable |
| --- | --- | --- |
| Text extraction | format-specific (see table below) | No |
| Chunking | custom splitter | `CHUNK_SIZE`, `CHUNK_OVERLAP` |
| Embeddings | sentence-transformers | `EMBEDDINGS_MODEL` |
| Full-text index | OpenSearch | — |
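The chunking stage's two knobs can be illustrated with a character-based sliding window. This is a sketch of the general technique, not the platform's custom splitter:

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into chunks of `chunk_size` characters, each overlapping
    the previous chunk by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # window advances by size minus overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With the defaults, a 2500-character document yields chunks starting at
# offsets 0, 900, and 1800.
print(len(chunk_text("x" * 2500)))  # → 3
```

Larger overlap reduces the chance that a fact is split across a chunk boundary, at the cost of more chunks to embed and store.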
| Format | Library | What is extracted |
| --- | --- | --- |
| PDF | PyPDF2, pdfplumber | Text content + metadata |
| Word (.docx) | python-docx | Text + paragraph structure |
| Excel (.xlsx) | openpyxl, pandas | Cell values as text |
| CSV | pandas | Row data as text |
| Markdown (.md) | built-in | Text preserving structure |
| HTML | BeautifulSoup | Cleaned text from HTML |
| Plain text (.txt) | built-in | Direct read |
| Images | pytesseract (OCR) | OCR-extracted text |
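Format-specific extraction amounts to dispatching on the file extension. The mapping below mirrors the table above; it is illustrative only, and the service's actual dispatch logic may differ:

```python
from pathlib import Path

# Extension → extraction library, per the format table above (illustrative).
EXTRACTORS = {
    ".pdf": "PyPDF2/pdfplumber",
    ".docx": "python-docx",
    ".xlsx": "openpyxl/pandas",
    ".csv": "pandas",
    ".md": "built-in",
    ".html": "BeautifulSoup",
    ".txt": "built-in",
    ".png": "pytesseract (OCR)",
    ".jpg": "pytesseract (OCR)",
}

def pick_extractor(filename):
    """Return the extraction library for a filename, case-insensitively."""
    ext = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"unsupported format: {ext}") from None

print(pick_extractor("report.PDF"))  # → PyPDF2/pdfplumber
```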
Embeddings are generated using sentence-transformers. The model is configurable:
| Model | Dimensions | Speed | Quality | Notes |
| --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 384 | Fast | Good | Default, ~80 MB |
| all-mpnet-base-v2 | 768 | Medium | High | Larger model |
| all-MiniLM-L12-v2 | 384 | Medium | Good | 12-layer variant, slower but slightly more accurate than L6 |
| multilingual-e5-base | 768 | Medium | Good | 100+ languages |

Set via `EMBEDDINGS_MODEL` in the config.
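Retrieval ranks chunks by cosine similarity between the query embedding and each chunk embedding. In production this comparison happens inside OpenSearch, but the math is easy to show in plain Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-dimensional "embeddings": chunk 1 points almost the same way as the
# query, chunk 0 is orthogonal, chunk 2 points the opposite way.
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]], k=2))  # → [1, 0]
```

The model's dimension (384 or 768 above) is the length of these vectors, which is why `EMBEDDINGS_DIMENSION` must match the chosen model.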
The document-api includes RNNoise, a recurrent neural network noise suppressor, for processing audio documents. When enabled, noise reduction is applied before transcription.

| Setting | Variable | Values |
| --- | --- | --- |
| Enable/disable | `RNNOISE_ENABLED` | `true` · `false` |
| Suppression level | `RNNOISE_LEVEL` | 0.0 (off) to 1.0 (maximum) |

At call time, assistant-api queries document-api with a text query. The service performs vector similarity search and returns the top-k most relevant chunks.

Search request:
```shell
curl -X POST http://localhost:9010/api/v1/document/search \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "customer billing issue",
    "knowledge_base_id": "kb_123",
    "top_k": 5,
    "threshold": 0.5
  }'
```
Response:

```json
{
  "results": [
    {
      "chunk_id": "chunk_123",
      "document_id": "doc_456",
      "content": "Billing errors are handled by submitting a refund request...",
      "similarity_score": 0.87,
      "metadata": {
        "page_no": 5,
        "section": "Billing Policy"
      }
    }
  ]
}
```
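The same call can be made from Python with only the standard library. This sketch builds the request shown above; the endpoint path, header names, and payload fields come straight from the curl example, while the helper name itself is ours:

```python
import json
import urllib.request

def build_search_request(query, knowledge_base_id, top_k=5, threshold=0.5,
                         base_url="http://localhost:9010", jwt="<jwt>"):
    """Build the POST request for the document search call shown above."""
    payload = {
        "query": query,
        "knowledge_base_id": knowledge_base_id,
        "top_k": top_k,
        "threshold": threshold,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/document/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_search_request("customer billing issue", "kb_123")
# With the service running, send it with:
#   results = json.load(urllib.request.urlopen(req))["results"]
print(req.full_url)                    # → http://localhost:9010/api/v1/document/search
print(json.loads(req.data)["top_k"])   # → 5
```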

Configuration

The document-api uses a YAML config file at `docker/document-api/config.yaml` combined with environment variables.

Required settings

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `postgres.host` | ✅ Yes | localhost | PostgreSQL host |
| `postgres.db` | ✅ Yes | assistant_db | Database name |
| `postgres.auth.user` | ✅ Yes | rapida_user | Database user |
| `postgres.auth.password` | ✅ Yes | — | Database password |
| `elastic_search.host` | ✅ Yes | localhost | OpenSearch host |
| `celery.broker` | ✅ Yes | redis://localhost:6379/0 | Celery broker URL |
| `celery.backend` | ✅ Yes | redis://localhost:6379/0 | Celery result backend URL |

Tuning settings

| Setting | Default | Description |
| --- | --- | --- |
| `CHUNK_SIZE` | 1000 | Characters per document chunk |
| `CHUNK_OVERLAP` | 100 | Character overlap between adjacent chunks |
| `MAX_FILE_SIZE` | 52428800 | Maximum upload size in bytes (50 MB) |
| `EMBEDDINGS_MODEL` | all-MiniLM-L6-v2 | Sentence-transformers model name |
| `EMBEDDINGS_DIMENSION` | 384 | Embedding vector dimension |
| `CELERY_WORKERS` | 4 | Number of Celery worker processes |
| `RNNOISE_ENABLED` | true | Enable audio noise reduction |
| `RNNOISE_LEVEL` | 0.5 | Noise reduction level (0.0–1.0) |

Full config file (docker/document-api/config.yaml)

```yaml
service_name: "Document API"
host: "0.0.0.0"
port: 9010

authentication_config:
  strict: false
  type: "jwt"
  config:
    secret_key: "rpd_pks"   # Must match SECRET in other services

elastic_search:
  host: "opensearch"        # Use "localhost" for local dev
  port: 9200
  scheme: "http"
  max_connection: 5

postgres:
  host: "postgres"          # Use "localhost" for local dev
  port: 5432
  auth:
    password: "rapida_db_password"
    user: "rapida_user"
  db: "assistant_db"
  max_connection: 10
  ideal_connection: 5

internal_service:
  web_host: "web-api:9001"
  integration_host: "integration-api:9004"
  endpoint_host: "endpoint-api:9005"
  assistant_host: "assistant-api:9007"

storage:
  storage_type: "local"
  storage_path_prefix: /app/rapida-data/assets/workflow

celery:
  broker: "redis://redis:6379/0"
  backend: "redis://redis:6379/0"

knowledge_extractor_config:
  chunking_technique:
    chunker: "app.core.chunkers.statistical_chunker.StatisticalChunker"
    options:
      encoder: "app.core.encoders.openai_encoder.OpenaiEncoder"
      options:
        model_name: "text-embedding-3-large"
        api_key: "your_openai_api_key"
```

Running

document-api is part of the knowledge Docker Compose profile and is not started by default.
```shell
# Start document-api together with all other services and opensearch
make up-all-with-knowledge

# Or start document-api individually (opensearch must already be running)
make up-document

# View logs
make logs-document

# Rebuild
make rebuild-document
```

Health & Observability

| Endpoint | Purpose |
| --- | --- |
| `GET /readiness/` | Reports whether the service is ready |
| `GET /healthz/` | Liveness probe |

```shell
curl http://localhost:9010/readiness/
```

Troubleshooting

Documents stuck in `status: processing`? The Celery worker is likely not running. Check:

```shell
# Docker
make logs-document

# Local — confirm Celery worker is running
PYTHONPATH=api/document-api celery -A app.worker inspect active
```
Reduce the embedding batch size to lower memory pressure, or increase it for throughput on capable hardware:

```shell
EMBEDDINGS_BATCH_SIZE=8     # Low memory
EMBEDDINGS_BATCH_SIZE=64    # High throughput (GPU recommended)
```
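The batch size trade-off can be illustrated with a plain batching loop: each batch is embedded in one forward pass, so peak memory scales with batch size while total work stays the same. A sketch, not the worker's actual code:

```python
def batched(items, batch_size):
    """Yield successive batches of up to `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# 100 chunks with EMBEDDINGS_BATCH_SIZE=8 → 13 batches (12 full + 1 of 4),
# versus 2 larger batches at batch size 64.
chunks = [f"chunk {i}" for i in range(100)]
print(sum(1 for _ in batched(chunks, 8)))   # → 13
print(sum(1 for _ in batched(chunks, 64)))  # → 2
```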
```shell
# List existing indices
curl http://localhost:9200/_cat/indices

# Delete a stale index and allow re-indexing
curl -X DELETE http://localhost:9200/documents-<index-name>
```

```shell
# Reduce Celery worker concurrency
CELERY_CONCURRENCY=2

# Monitor per-container usage
docker stats document-api
```

Next Steps

- **Assistant API**: How assistants use knowledge bases during calls.
- **Architecture**: Full system topology and data flow diagrams.
- **Installation Guide**: Deploy the full platform with Docker Compose.
- **Configuration Reference**: Full environment variable reference.