NanoVDR: Visual Document Retrieval Demo

How it works: Type a text query below. A small 69M-parameter DistilBERT encoder, running on CPU, maps your query into the same embedding space as a 2B-parameter VLM teacher (Qwen3-VL-Embedding-2B). The document page embeddings were pre-computed offline by the teacher, so retrieval reduces to a dot product between the query embedding and each page embedding; no vision model runs at query time.
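
The retrieval step can be sketched in a few lines of NumPy. This is a minimal illustration, not the demo's actual code: the function name `retrieve_top_k`, the 64-dimensional embeddings, and the random vectors standing in for the teacher's page embeddings and the student's query embedding are all assumptions made for the example.

```python
import numpy as np


def retrieve_top_k(query_emb, page_embs, k=5):
    """Return (indices, scores) of the k pages with the highest dot product."""
    scores = page_embs @ query_emb        # one score per page, shape (num_pages,)
    k = min(k, scores.shape[0])
    top = np.argsort(-scores)[:k]         # indices sorted by descending score
    return top, scores[top]


rng = np.random.default_rng(0)

# Stand-in for the pre-computed page embeddings (hypothetical dimension of 64).
num_pages, dim = 1360, 64
page_embs = rng.standard_normal((num_pages, dim)).astype(np.float32)
page_embs /= np.linalg.norm(page_embs, axis=1, keepdims=True)

# Stand-in for the query encoder's output: a slightly noisy copy of page 42,
# mimicking a query that lands close to its target page in the shared space.
query_emb = page_embs[42] + 0.05 * rng.standard_normal(dim).astype(np.float32)
query_emb /= np.linalg.norm(query_emb)

indices, scores = retrieve_top_k(query_emb, page_embs, k=5)
print(indices, scores)
```

Because the page matrix is fixed, the only per-query work is one text-encoder forward pass and one matrix-vector product, which is why a CPU suffices at query time.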

Corpus: 1,360 pages from ViDoRe v3 Computer Science (academic papers, slides, diagrams).

Retrieved Document Pages

Example Queries
