
Feature request: multilingual text embeddings model #945

@ErionTp


Summary

Currently all shipped text embedding models (ALL_MINILM_L6_V2, ALL_MPNET_BASE_V2, MULTI_QA_MINILM_L6_COS_V1, MULTI_QA_MPNET_BASE_DOT_V1) are English-only.

For apps that support multiple languages, this limits the usefulness of semantic search and similarity features. In our case, we use useTextEmbeddings to match user-typed label names to icons, but matching only works well when the user types in English.
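The icon-matching use case above amounts to a nearest-neighbour search over embedding vectors. Below is a minimal sketch, assuming each icon's descriptive text has already been embedded once and stored alongside it. `IconEntry`, `cosineSimilarity`, `bestIcon`, and the 0.3 threshold are illustrative names and values, not part of react-native-executorch. The search itself does not depend on the input language, so it would work unchanged with a multilingual model; only the quality of the vectors differs.

```typescript
// Hypothetical icon catalogue entry: an icon name plus a precomputed
// embedding of its descriptive text (assumed to come from the embeddings model).
interface IconEntry {
  name: string;
  embedding: ArrayLike<number>;
}

interface IconMatch {
  name: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the icon closest to the query embedding, or null when no icon
// reaches the (assumed, tunable) similarity threshold.
function bestIcon(
  query: ArrayLike<number>,
  icons: IconEntry[],
  threshold = 0.3,
): IconMatch | null {
  let best: IconMatch | null = null;
  for (const icon of icons) {
    const score = cosineSimilarity(query, icon.embedding);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { name: icon.name, score };
    }
  }
  return best;
}

// Toy 3-dimensional vectors stand in for real model output here.
const demoIcons: IconEntry[] = [
  { name: 'home', embedding: [1, 0, 0] },
  { name: 'cart', embedding: [0, 1, 0] },
];
console.log(bestIcon([0.9, 0.1, 0], demoIcons)?.name); // prints: home
```

Real embeddings have hundreds of dimensions, and a sensible threshold depends on the model, so it is worth tuning against a few known label/icon pairs.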

Request

Ship a pre-exported multilingual text embeddings model, such as:

This would allow useTextEmbeddings to work with non-English input out of the box, similar to how the English models work today:

```tsx
// MULTILINGUAL_MINILM_L12_V2 is the proposed constant; it is not shipped yet.
import { useTextEmbeddings, MULTILINGUAL_MINILM_L12_V2 } from 'react-native-executorch';

const embeddings = useTextEmbeddings({ model: MULTILINGUAL_MINILM_L12_V2 });
```

Context

  • The current MULTI_QA_MINILM_L6_COS_V1 works great for English
  • Non-English queries produce poor embeddings since the model was only trained on English data
  • Many React Native apps are multilingual by nature (we support English, Italian, and Albanian)
  • The useTextEmbeddings API wouldn't need to change; only a new model constant would be added
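The claim that non-English queries produce poor embeddings can be checked with numbers before and after a multilingual model is added. A minimal sketch, assuming an async `embed(text)` function is available; in an app it could wrap the `forward` method returned by useTextEmbeddings, but check the library docs for that method's exact signature. `EmbedFn`, `cosine`, and `crossLingualGap` are hypothetical helpers. The score compares each phrase with its own translation against a baseline of the *next* pair's translation, so the pairs must describe unrelated concepts.

```typescript
// Text in, vector out. Left abstract here (assumption); wire it to the model in the app.
type EmbedFn = (text: string) => Promise<ArrayLike<number>>;

// Cosine similarity between two vectors of equal length.
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Mean similarity of translation pairs minus mean similarity of mismatched
// pairs. A larger gap means the model places translations closer together
// than unrelated phrases, i.e. it aligns the two languages better.
async function crossLingualGap(
  embed: EmbedFn,
  pairs: Array<[string, string]>,
): Promise<number> {
  if (pairs.length < 2) throw new Error('need at least two pairs for a baseline');
  const left = await Promise.all(pairs.map(([a]) => embed(a)));
  const right = await Promise.all(pairs.map(([, b]) => embed(b)));
  let matched = 0;
  let mismatched = 0;
  for (let i = 0; i < pairs.length; i++) {
    matched += cosine(left[i], right[i]);
    // Baseline: pair phrase i with the translation from pair i + 1 (wrapping around).
    mismatched += cosine(left[i], right[(i + 1) % pairs.length]);
  }
  return (matched - mismatched) / pairs.length;
}
```

Running it on English/Italian pairs such as `['home', 'casa']`, `['shopping cart', 'carrello della spesa']`, and `['settings', 'impostazioni']`, once with the current English model and once with a multilingual candidate, would show whether the candidate actually closes the gap for the languages an app supports.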

Thanks for the great library!


Labels: feature request, model (Issues related to exporting, improving, fixing ML models)
