Pull request overview
This PR updates the corpus_related.corpus_name_embedding_model_lang materialized view to expose additional metadata (corpus/model IDs, used_since, category_id) and keep only the latest embedding model per (corpus, language), then aligns the SQLAlchemy read-only model and bumps the package version.
Changes:
- Recreates the `corpus_name_embedding_model_lang` materialized view with extra columns and a `ROW_NUMBER()`-based “latest per corpus/lang” selection.
- Extends the `CorpusNameEmbeddingModelLang` SQLAlchemy model to match the new view schema.
- Bumps project version from `1.4.0` to `1.4.2`.
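The “latest per corpus/lang” selection can be sketched as follows. This is a minimal illustration, not the migration's actual SQL: it uses an in-memory SQLite database in place of PostgreSQL, and the table name `corpus_embedding_model`, its columns, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source table; the real schema is not shown in this PR page.
    CREATE TABLE corpus_embedding_model (
        corpus_id TEXT,
        lang TEXT,
        embedding_model_id TEXT,
        used_since TEXT
    );
    INSERT INTO corpus_embedding_model VALUES
        ('c1', 'en', 'm-old', '2024-01-01'),
        ('c1', 'en', 'm-new', '2025-01-01'),
        ('c1', 'fr', 'm-fr',  '2024-06-01'),
        ('c2', 'en', 'm-c2',  '2024-03-01');
    """
)

# Rank rows within each (corpus_id, lang) group, newest used_since first,
# then keep only the top-ranked row of every group.
LATEST_PER_CORPUS_LANG = """
SELECT corpus_id, lang, embedding_model_id, used_since
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY corpus_id, lang
               ORDER BY used_since DESC
           ) AS rn
    FROM corpus_embedding_model
) AS ranked
WHERE rn = 1
ORDER BY corpus_id, lang
"""

latest_rows = conn.execute(LATEST_PER_CORPUS_LANG).fetchall()
for row in latest_rows:
    print(row)
```

In this sample data the older `m-old` row for `('c1', 'en')` is filtered out, leaving one row per (corpus, language) pair.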
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| welearn_database/data/models/corpus_related.py | Adds new fields to the ORM model representing the updated materialized view. |
| welearn_database/alembic/versions/b049924f7067_modify_corpus_name_embedding_model_lang_.py | Drops/recreates the materialized view with updated projection and “latest per corpus/lang” logic. |
| pyproject.toml | Version bump to reflect the schema/model change. |
source_name: Mapped[str] = mapped_column(primary_key=True)
corpus_id: Mapped[UUID]
embedding_model_id: Mapped[UUID]
title: Mapped[str]
lang: Mapped[str]
source_name is declared as the sole primary key, but the materialized view returns one row per (corpus_id, lang) (latest used_since), which can produce multiple rows for the same source_name across different languages. With source_name alone as the ORM PK, SQLAlchemy’s identity map can collapse/overwrite rows and return incomplete/incorrect results. Consider using a composite primary key that matches the view’s uniqueness (e.g., include lang and/or corpus_id).
Suggested change:
- source_name: Mapped[str] = mapped_column(primary_key=True)
- corpus_id: Mapped[UUID]
- embedding_model_id: Mapped[UUID]
- title: Mapped[str]
- lang: Mapped[str]
+ source_name: Mapped[str] = mapped_column()
+ corpus_id: Mapped[UUID] = mapped_column(primary_key=True)
+ embedding_model_id: Mapped[UUID]
+ title: Mapped[str]
+ lang: Mapped[str] = mapped_column(primary_key=True)
This pull request updates the materialized view for corpus embedding models to include additional metadata and keeps the corresponding SQLAlchemy model and package version in sync. The core change is a migration that drops and recreates the `corpus_name_embedding_model_lang` materialized view with more fields and a latest-per-corpus-and-language selection; the data model is updated accordingly.

Database migration and schema changes:
- Adds a migration (`b049924f7067_modify_corpus_name_embedding_model_lang_`) that drops and recreates the `corpus_related.corpus_name_embedding_model_lang` materialized view. The new view includes `corpus_id`, `embedding_model_id`, `used_since`, and `category_id`, and uses a window function to keep only the latest embedding model per corpus and language.
- Updates the `CorpusNameEmbeddingModelLang` SQLAlchemy model in `corpus_related.py` to add the new fields `corpus_id`, `embedding_model_id`, `used_since`, and `category_id`, matching the new view schema.

Versioning:
- Bumps the version in `pyproject.toml` from `1.4.0` to `1.4.2` to reflect the schema and model changes.