Open VLM Retrieval Leaderboard

This leaderboard presents the performance of various visual embedding models across different business sectors and languages. The evaluation is based on retrieval accuracy for visual search tasks.

Structure

  • Sectors: Each column represents a different business sector (e.g., Energy, Education) with documents in either English (_EN) or French (_FR)
  • Models: Each row shows a different model's performance
  • Scores: Values range from 0 to 1, where higher is better (1.000 being perfect retrieval)
  • Average: Overall mean performance across all sectors for each model (a computation sketch follows this list)
  • Colors: Blue backgrounds indicate EU models, red backgrounds indicate Chinese models
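
As a concrete example of how the Average column is computed, the sketch below takes one model's per-sector scores and returns their unweighted mean. The dictionary layout is assumed for illustration; the sector names and values are taken from the table further down.

```python
# Minimal sketch of the Average column: an unweighted mean over a model's
# sector scores. The storage format here is an assumption, not the
# leaderboard's actual data structure.
sector_scores = {
    "llamaindex/vdr-2b-multi-v1 (1536 dim) (960 max pixels)": {
        "ENERGY_EN": 0.889,
        "ENERGY_FR": 0.843,
    },
}

def average_score(per_sector: dict) -> float:
    """Unweighted mean over all evaluated sector columns."""
    return sum(per_sector.values()) / len(per_sector)

for model, scores in sector_scores.items():
    print(f"{model}: {average_score(scores):.3f}")  # 0.866 for this model
```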

The leaderboard was created in collaboration with the Intelligence Lab of the ECE - Ecole centrale d'électronique.

How to Read the Results

  • Select a language tab to see how models perform with queries in that language
  • All scores are normalized retrieval accuracy metrics
  • Background colors indicate model origins (Blue = EU, Red = Chinese)

Average Performance Across Languages

This table shows the average performance of each model for each sector, averaged across all query languages (a sketch of this averaging follows the table).

| Model | Average | ENERGY_EN | ENERGY_FR |
|---|---|---|---|
| racineai/AMPERE-1 (1536 dim) (768 max pixels) | Coming Soon | | |
| llamaindex/vdr-2b-multi-v1 (1536 dim) (960 max pixels) | 0.866 | 0.889 | 0.843 |
| llamaindex/vdr-2b-multi-v1 (1536 dim) (768 max pixels) | 0.863 | 0.885 | 0.841 |
| vidore/colqwen2-v1.0 | 0.860 | 0.902 | 0.818 |
| marco/mcdse-2b-v1 (1536 dim) (960 max pixels) | 0.845 | 0.865 | 0.825 |
| llamaindex/vdr-2b-multi-v1 (768 dim) (960 max pixels) | 0.842 | 0.869 | 0.815 |
| marco/mcdse-2b-v1 (768 dim) (960 max pixels) | 0.835 | 0.857 | 0.814 |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | 0.821 | 0.832 | 0.809 |
| MrLight/dse-qwen2-2b-mrl-v1 (1024 max pixels) | 0.785 | 0.824 | 0.746 |
| racineai/Flantier-SmolVLM-2B-dse | 0.767 | 0.794 | 0.740 |
| racineai/Flantier-SmolVLM-500M-dse | 0.536 | 0.600 | 0.473 |
| HuggingFaceTB/SmolVLM-Instruct (base model) | 0.193 | 0.195 | 0.191 |
| HuggingFaceTB/SmolVLM-500M-Instruct (base model) | 0.182 | 0.200 | 0.164 |
Model origin: blue = European Union, red = China
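
The averaging referenced above can be reproduced roughly as sketched below: each model's sector score from every per-language query tab is collected and averaged. The tab names and the example scores are illustrative assumptions, not the leaderboard's published per-language values.

```python
# Hypothetical sketch: deriving the cross-language table from per-language
# leaderboard tabs. Tab names and the per-language scores are illustrative.
per_language_scores = {
    "en_queries": {"vidore/colqwen2-v1.0": {"ENERGY_EN": 0.910, "ENERGY_FR": 0.830}},
    "fr_queries": {"vidore/colqwen2-v1.0": {"ENERGY_EN": 0.894, "ENERGY_FR": 0.806}},
}

def average_across_languages(tables: dict) -> dict:
    """Average each model's sector scores over all query-language tabs."""
    merged: dict[str, dict[str, list[float]]] = {}
    for table in tables.values():
        for model, sectors in table.items():
            for sector, score in sectors.items():
                merged.setdefault(model, {}).setdefault(sector, []).append(score)
    return {
        model: {sector: sum(v) / len(v) for sector, v in sectors.items()}
        for model, sectors in merged.items()
    }

print(average_across_languages(per_language_scores))
# approximately {'vidore/colqwen2-v1.0': {'ENERGY_EN': 0.902, 'ENERGY_FR': 0.818}}
```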

Additional Information

  • Scores are updated regularly as new models are evaluated
  • All evaluations use the same test set for fair comparison
  • Models are evaluated on both English and French datasets to assess cross-lingual capabilities (a retrieval-accuracy sketch follows this list)
  • Color coding indicates model origin (Blue = EU, Red = Chinese)
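
The leaderboard scores are normalized retrieval accuracy metrics in [0, 1]. The exact metric is not spelled out above, so the sketch below shows one common variant, top-1 accuracy over cosine similarities between query and document embeddings, purely as an illustration; the function name and the toy data are hypothetical.

```python
import numpy as np

def top1_retrieval_accuracy(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Fraction of queries whose highest-cosine-similarity document is the
    correct one, assuming query i should retrieve document i. Returns a value
    in [0, 1]; the leaderboard's actual metric may differ."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                     # cosine similarity matrix
    predictions = sims.argmax(axis=1)  # best-matching document per query
    return float((predictions == np.arange(len(q))).mean())

# Toy example with random embeddings (1536-dim, as used by several models above).
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 1536))
queries = docs + 0.1 * rng.normal(size=docs.shape)  # queries near their documents
print(top1_retrieval_accuracy(queries, docs))
```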

Citation

If you use these benchmarks in your research, please cite:

@article{visual_embeddings_benchmark_2025,
    title={Cross-lingual Visual Embeddings Benchmark},
    author={racine.ai},
    year={2025}
}