Skip to content

Fix/update cropus name embeddings model lang#16

Open
lpi-tn wants to merge 3 commits intomainfrom
Fix/update-cropus-name-embeddings-model-lang
Open

Fix/update cropus name embeddings model lang#16
lpi-tn wants to merge 3 commits intomainfrom
Fix/update-cropus-name-embeddings-model-lang

Conversation

@lpi-tn
Copy link
Copy Markdown
Collaborator

@lpi-tn lpi-tn commented Mar 31, 2026

This pull request updates the materialized view for corpus embedding models to include additional metadata and ensures the corresponding SQLAlchemy model and versioning are in sync. The core change is a migration that drops and recreates the corpus_name_embedding_model_lang materialized view with more fields and improved logic, and updates the data model accordingly.

Database migration and schema changes:

  • Added a new Alembic migration (b049924f7067_modify_corpus_name_embedding_model_lang_) that drops and recreates the corpus_related.corpus_name_embedding_model_lang materialized view. The new view now includes corpus_id, embedding_model_id, used_since, and category_id, and ensures only the latest embedding model per corpus and language is kept using a window function.
  • Updated the CorpusNameEmbeddingModelLang SQLAlchemy model in corpus_related.py to add new fields: corpus_id, embedding_model_id, used_since, and category_id, matching the new view schema.

Versioning:

  • Bumped the package version in pyproject.toml from 1.4.0 to 1.4.2 to reflect the schema and model changes.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the corpus_related.corpus_name_embedding_model_lang materialized view to expose additional metadata (corpus/model IDs, used_since, category_id) and keep only the latest embedding model per (corpus, language), then aligns the SQLAlchemy read-only model and bumps the package version.

Changes:

  • Recreates the corpus_name_embedding_model_lang materialized view with extra columns and a ROW_NUMBER()-based “latest per corpus/lang” selection.
  • Extends the CorpusNameEmbeddingModelLang SQLAlchemy model to match the new view schema.
  • Bumps project version from 1.4.0 to 1.4.2.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
welearn_database/data/models/corpus_related.py Adds new fields to the ORM model representing the updated materialized view.
welearn_database/alembic/versions/b049924f7067_modify_corpus_name_embedding_model_lang_.py Drops/recreates the materialized view with updated projection and “latest per corpus/lang” logic.
pyproject.toml Version bump to reflect the schema/model change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 109 to 113
source_name: Mapped[str] = mapped_column(primary_key=True)
corpus_id: Mapped[UUID]
embedding_model_id: Mapped[UUID]
title: Mapped[str]
lang: Mapped[str]
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_name is declared as the sole primary key, but the materialized view returns one row per (corpus_id, lang) (latest used_since), which can produce multiple rows for the same source_name across different languages. With source_name alone as the ORM PK, SQLAlchemy’s identity map can collapse/overwrite rows and return incomplete/incorrect results. Consider using a composite primary key that matches the view’s uniqueness (e.g., include lang and/or corpus_id).

Suggested change
source_name: Mapped[str] = mapped_column(primary_key=True)
corpus_id: Mapped[UUID]
embedding_model_id: Mapped[UUID]
title: Mapped[str]
lang: Mapped[str]
source_name: Mapped[str] = mapped_column()
corpus_id: Mapped[UUID] = mapped_column(primary_key=True)
embedding_model_id: Mapped[UUID]
title: Mapped[str]
lang: Mapped[str] = mapped_column(primary_key=True)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants