Racine.ai

Open VLM Retrieval Leaderboard

This leaderboard presents the performance of visual embedding models across business sectors and query languages. The evaluation is based on retrieval accuracy on visual document search tasks.

Structure

  • Sectors: Each column represents a different business sector (e.g., Energy, Education) with documents in either English (_EN) or French (_FR)
  • Models: Each row shows a different model's performance
  • Scores: Values range from 0 to 1, where higher is better (1.000 being perfect retrieval)
  • Average: Overall mean performance across all sectors for each model (see the sketch after this list)
  • Colors: Blue backgrounds indicate EU models, red backgrounds indicate Chinese models
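As a worked example of how the Average column relates to the sector columns, the Python sketch below computes an unweighted mean over one model's per-sector scores. The values mirror the jina-embeddings-v4 row in the table further down; treating the Average column as a plain unweighted mean is an assumption, not something stated on this page.

    # Per-sector scores for one model, shaped like a leaderboard row
    # (values mirror the jinaai/jina-embeddings-v4 row below).
    sector_scores = {
        "ENERGY_EN": 0.912,
        "ENERGY_FR": 0.904,
    }

    # Assumption: the Average column is a plain unweighted mean over all sector columns.
    average = sum(sector_scores.values()) / len(sector_scores)
    print(f"Average: {average:.3f}")  # -> Average: 0.908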

The leaderboard was created in collaboration with the Intelligence Lab of the ECE (École centrale d'électronique).

How to Read the Results

  • Select a language tab to see how models perform with queries in that language
  • All scores are normalized retrieval accuracy metrics on a 0 to 1 scale (one way such a metric can be computed is sketched after this list)
  • Background colors indicate model origins (Blue = EU, Red = Chinese)
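The page does not specify the exact metric behind these scores, so the sketch below shows one common way a normalized retrieval accuracy can be computed: top-1 accuracy over cosine similarity between query embeddings and page embeddings. All names and shapes here are illustrative, and the benchmark may use a different ranking metric (for example NDCG@k).

    import numpy as np

    def top1_retrieval_accuracy(query_embs: np.ndarray, page_embs: np.ndarray) -> float:
        """Fraction of queries whose most similar page (by cosine) is the correct one.

        Assumes query i's relevant page is page i, a common evaluation setup;
        the actual benchmark metric may differ.
        """
        # L2-normalize so dot products become cosine similarities.
        q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
        p = page_embs / np.linalg.norm(page_embs, axis=1, keepdims=True)
        sims = q @ p.T                    # (num_queries, num_pages)
        predicted = sims.argmax(axis=1)   # best-matching page per query
        return float((predicted == np.arange(len(q))).mean())

    # Toy usage with random vectors; real embeddings would come from a VLM encoder.
    rng = np.random.default_rng(0)
    queries, pages = rng.normal(size=(8, 128)), rng.normal(size=(8, 128))
    print(f"{top1_retrieval_accuracy(queries, pages):.3f}")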

Average Performance Across Languages

This table shows the average performance of each model in each sector, averaged across all query languages (a sketch of how this aggregation can be reproduced follows the table).

| Model | License | Average | ENERGY_EN | ENERGY_FR |
| --- | --- | --- | --- | --- |
| jinaai/jina-embeddings-v4 | Qwen Research License (NC) | 0.908 | 0.912 | 0.904 |
| racineai/QwenAmann-4B-dse | Apache 2.0 | 0.903 | 0.896 | 0.909 |
| llamaindex/vdr-2b-multi-v1 (1536 dim) (960 max pixels) | Apache 2.0 | 0.866 | 0.889 | 0.843 |
| llamaindex/vdr-2b-multi-v1 (1536 dim) (768 max pixels) | Apache 2.0 | 0.863 | 0.885 | 0.841 |
| vidore/colqwen2-v1.0 | Apache 2.0 | 0.860 | 0.902 | 0.818 |
| marco/mcdse-2b-v1 (1536 dim) (960 max pixels) | Apache 2.0 | 0.845 | 0.865 | 0.825 |
| llamaindex/vdr-2b-multi-v1 (768 dim) (960 max pixels) | Apache 2.0 | 0.842 | 0.869 | 0.815 |
| marco/mcdse-2b-v1 (768 dim) (960 max pixels) | Apache 2.0 | 0.835 | 0.857 | 0.814 |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | Apache 2.0 | 0.821 | 0.832 | 0.809 |
| MrLight/dse-qwen2-2b-mrl-v1 (1024 max pixels) | Apache 2.0 | 0.785 | 0.824 | 0.746 |
| racineai/Flantier-SmolVLM-2B-dse | Apache 2.0 | 0.767 | 0.794 | 0.740 |
| racineai/Flantier-SmolVLM-500M-dse | Apache 2.0 | 0.536 | 0.600 | 0.473 |
| HuggingFaceTB/SmolVLM-Instruct (base model) | Apache 2.0 | 0.193 | 0.195 | 0.191 |
| HuggingFaceTB/SmolVLM-500M-Instruct (base model) | Apache 2.0 | 0.182 | 0.200 | 0.164 |
Model Origin: blue = European Union, red = China
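
In principle, the cross-language table above can be reproduced by averaging the per-language tabs column by column and then taking a row-wise mean for the Average column. The pandas sketch below assumes hypothetical CSV exports of the English-query and French-query tabs (placeholder file names; one row per model, numeric sector columns only), which this page does not itself provide.

    import pandas as pd

    # Hypothetical exports of the per-language query tabs: one row per model,
    # one numeric column per sector (e.g. ENERGY_EN, ENERGY_FR).
    tabs = [
        pd.read_csv(path, index_col="Model")
        for path in ("queries_en.csv", "queries_fr.csv")  # placeholder file names
    ]

    # Element-wise mean across query languages, then a row-wise mean for the Average column.
    averaged = sum(tabs) / len(tabs)
    averaged.insert(0, "Average", averaged.mean(axis=1))
    print(averaged.round(3).sort_values("Average", ascending=False))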
If you use these benchmarks in your research, please cite:

    @article{visual_embeddings_benchmark_2025,
        title={Cross-lingual Visual Embeddings Benchmark},
        author={racine.ai},
        year={2025}
    }