Microsoft Open-Sources Three Sizes of Harrier Text Embedding Models, 27B Version Tops Multilingual MTEB v2

According to monitoring by 1M AI News, Microsoft has open-sourced the harrier-oss-v1 family of multilingual text embedding models on Hugging Face in three sizes: 270M, 0.6B, and 27B. According to the model card, the series uses a decoder-only architecture with last-token pooling and L2 normalization, and supports inputs of up to 32,768 tokens. The models can be used for retrieval, clustering, semantic similarity, classification, bitext mining, and reranking. Multilingual MTEB v2 is a widely used industry benchmark for multilingual text embeddings that mainly tests retrieval, classification, clustering, and semantic similarity. According to Microsoft’s model card, the three versions score 66.5, 69.0, and 74.3 respectively on this benchmark, and the 27B version ranked first on the day of its release. The 270M and 0.6B versions were additionally trained with knowledge distillation from larger embedding models, and all three models are released under the MIT license.
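
For readers unfamiliar with the two techniques the model card names, last-token pooling and L2 normalization, here is a minimal sketch in plain Python on toy data. It is not Microsoft's implementation: the function names are hypothetical, the hidden states are made-up two-dimensional vectors, and right padding is assumed. The idea is that the embedding of a text is the hidden state of its last real (non-padding) token, rescaled to unit length so that a dot product between two embeddings equals their cosine similarity.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token of each sequence.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of 0/1 lists (1 = real token, 0 = padding).
    Assumption: right padding and at least one real token per sequence.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # index of the last real token
        pooled.append(list(states[last_index]))
    return pooled


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product; for unit-length vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy batch: two sequences padded to length 3, hidden size 2 (made-up values).
hidden_states = [
    [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[0.0, 2.0], [5.0, 5.0], [7.0, 7.0]],  # second and third are padding
]
attention_mask = [
    [1, 1, 0],
    [1, 0, 0],
]

pooled = last_token_pool(hidden_states, attention_mask)
embeddings = [l2_normalize(v) for v in pooled]
similarity = dot(embeddings[0], embeddings[1])
print(pooled)
print(embeddings)
print(similarity)
```

In a real pipeline the hidden states would come from the model's final layer, and the padding side depends on the tokenizer configuration, so the index calculation above would need to be adapted for left-padded batches.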
