Open
Labels: feature request; model (Issues related to exporting, improving, fixing ML models)
Summary
Currently all shipped text embedding models (ALL_MINILM_L6_V2, ALL_MPNET_BASE_V2, MULTI_QA_MINILM_L6_COS_V1, MULTI_QA_MPNET_BASE_DOT_V1) are English-only.
For apps that support multiple languages, this limits the usefulness of semantic search and similarity features. In our case, we use useTextEmbeddings to match user-typed label names to icons — but it only works well when the user types in English.
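For readers unfamiliar with this kind of matching, here is a minimal sketch of how a typed label's embedding could be compared against precomputed icon embeddings using cosine similarity. The `LabeledEmbedding` type and the `cosineSimilarity` and `bestMatch` helpers are hypothetical names for illustration; they are not part of react-native-executorch, and the vectors are assumed to have been produced elsewhere (for example by the embeddings hook).

```typescript
// Hypothetical helpers for illustration; not part of react-native-executorch.
type LabeledEmbedding = { name: string; vector: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the icon whose embedding is closest to the query embedding,
// or undefined when the icon list is empty.
function bestMatch(
  query: number[],
  icons: LabeledEmbedding[]
): LabeledEmbedding | undefined {
  let best: LabeledEmbedding | undefined;
  let bestScore = -Infinity;
  for (const icon of icons) {
    const score = cosineSimilarity(query, icon.vector);
    if (score > bestScore) {
      bestScore = score;
      best = icon;
    }
  }
  return best;
}
```

With an English-only model, the query vector for a non-English label tends to land far from the matching icon's vector, so a ranking like this degrades even though the comparison logic is unchanged.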
Request
Ship a pre-exported multilingual text embeddings model, such as:
- paraphrase-multilingual-MiniLM-L12-v2: 50+ languages, 384 dimensions, ~470MB (could be quantized)
- distiluse-base-multilingual-cased-v2: 50+ languages, 512 dimensions
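As a rough back-of-the-envelope sketch of what quantization could buy, the snippet below scales the ~470MB figure by bytes per weight. It assumes the ~470MB figure refers to 32-bit float weights and that every weight is converted to the target precision; real exports often keep some layers at higher precision and add metadata, so actual sizes would differ. `estimateSizeMB` is a hypothetical helper, not a library function.

```typescript
// Rough size estimate under stated assumptions; not a measurement of any exported model.
const BYTES_PER_WEIGHT = { fp32: 4, fp16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_WEIGHT;

// Scales an fp32 model size to the target precision, assuming all weights are converted.
function estimateSizeMB(fp32SizeMB: number, target: Precision): number {
  return (fp32SizeMB * BYTES_PER_WEIGHT[target]) / BYTES_PER_WEIGHT.fp32;
}

console.log(estimateSizeMB(470, "fp16")); // 235
console.log(estimateSizeMB(470, "int8")); // 117.5
```

Under these assumptions an 8-bit export would land somewhere around a quarter of the original size, which is closer to what is practical to bundle with or download into a mobile app.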
This would allow useTextEmbeddings to work with non-English input out of the box, similar to how the English models work today:
// MULTILINGUAL_MINILM_L12_V2 is the proposed constant; it does not exist in the library yet.
import { useTextEmbeddings, MULTILINGUAL_MINILM_L12_V2 } from 'react-native-executorch';
const embeddings = useTextEmbeddings({ model: MULTILINGUAL_MINILM_L12_V2 });

Context
- The current MULTI_QA_MINILM_L6_COS_V1 works great for English
- Non-English queries produce poor embeddings since the model was only trained on English data
- Many React Native apps are multilingual by nature (we support English, Italian, and Albanian)
- The useTextEmbeddings API wouldn't need to change; it would only need a new model constant
Thanks for the great library!