Fix/update cropus name embeddings model lang by lpi-tn · Pull Request #16 · CyberCRI/welearn-database

lpi-tn · 2026-03-31T14:43:42Z

This pull request updates the materialized view for corpus embedding models to include additional metadata and ensures the corresponding SQLAlchemy model and versioning are in sync. The core change is a migration that drops and recreates the corpus_name_embedding_model_lang materialized view with more fields and improved logic, and updates the data model accordingly.

Database migration and schema changes:

Added a new Alembic migration (b049924f7067_modify_corpus_name_embedding_model_lang_) that drops and recreates the corpus_related.corpus_name_embedding_model_lang materialized view. The new view now includes corpus_id, embedding_model_id, used_since, and category_id, and ensures only the latest embedding model per corpus and language is kept using a window function.
Updated the CorpusNameEmbeddingModelLang SQLAlchemy model in corpus_related.py to add new fields: corpus_id, embedding_model_id, used_since, and category_id, matching the new view schema.
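The "latest embedding model per corpus and language" selection described above can be illustrated with a small, self-contained sketch. It is not the migration's actual SQL: the table `corpus_embedding_usage`, its columns, and the sample rows are hypothetical, and it uses an in-memory SQLite database (which needs SQLite 3.25 or newer for window functions) instead of the PostgreSQL materialized view. Only the `ROW_NUMBER()` pattern is the point.

```python
import sqlite3

# Hypothetical, simplified source table: one row per embedding model that a
# corpus has used for a given language. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE corpus_embedding_usage (
        corpus_id TEXT,
        lang TEXT,
        embedding_model_id TEXT,
        used_since TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO corpus_embedding_usage VALUES (?, ?, ?, ?)",
    [
        ("c1", "en", "m-old", "2025-01-01"),
        ("c1", "en", "m-new", "2026-01-01"),
        ("c1", "fr", "m-fr", "2025-06-01"),
        ("c2", "en", "m-old", "2025-03-01"),
    ],
)

# Rank rows inside each (corpus_id, lang) partition by used_since, newest
# first, then keep only the top-ranked row of each partition.
LATEST_PER_CORPUS_LANG = """
SELECT corpus_id, lang, embedding_model_id, used_since
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY corpus_id, lang
            ORDER BY used_since DESC
        ) AS rn
    FROM corpus_embedding_usage
) AS ranked
WHERE rn = 1
ORDER BY corpus_id, lang
"""

latest = conn.execute(LATEST_PER_CORPUS_LANG).fetchall()
```

In this toy data, `c1`/`en` has two candidate models and only the one with the later `used_since` survives, so the result holds exactly one row per (corpus, language) pair.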

Versioning:

Bumped the package version in pyproject.toml from 1.4.0 to 1.4.2 to reflect the schema and model changes.

Copilot

Pull request overview

This PR updates the corpus_related.corpus_name_embedding_model_lang materialized view to expose additional metadata (corpus/model IDs, used_since, category_id) and keep only the latest embedding model per (corpus, language), then aligns the SQLAlchemy read-only model and bumps the package version.

Changes:

Recreates the corpus_name_embedding_model_lang materialized view with extra columns and a ROW_NUMBER()-based “latest per corpus/lang” selection.
Extends the CorpusNameEmbeddingModelLang SQLAlchemy model to match the new view schema.
Bumps project version from 1.4.0 to 1.4.2.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
welearn_database/data/models/corpus_related.py	Adds new fields to the ORM model representing the updated materialized view.
welearn_database/alembic/versions/b049924f7067_modify_corpus_name_embedding_model_lang_.py	Drops/recreates the materialized view with updated projection and “latest per corpus/lang” logic.
pyproject.toml	Version bump to reflect the schema/model change.

Copilot · 2026-03-31T14:48:46Z

welearn_database/data/models/corpus_related.py

    source_name: Mapped[str] = mapped_column(primary_key=True)
+    corpus_id: Mapped[UUID]
+    embedding_model_id: Mapped[UUID]
    title: Mapped[str]
    lang: Mapped[str]


source_name is declared as the sole primary key, but the materialized view returns one row per (corpus_id, lang) (latest used_since), which can produce multiple rows for the same source_name across different languages. With source_name alone as the ORM PK, SQLAlchemy’s identity map can collapse/overwrite rows and return incomplete/incorrect results. Consider using a composite primary key that matches the view’s uniqueness (e.g., include lang and/or corpus_id).

Suggested change

-    source_name: Mapped[str] = mapped_column(primary_key=True)
-    corpus_id: Mapped[UUID]
-    embedding_model_id: Mapped[UUID]
-    title: Mapped[str]
-    lang: Mapped[str]
+    source_name: Mapped[str] = mapped_column()
+    corpus_id: Mapped[UUID] = mapped_column(primary_key=True)
+    embedding_model_id: Mapped[UUID]
+    title: Mapped[str]
+    lang: Mapped[str] = mapped_column(primary_key=True)
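
The concern raised in this review comment can be shown with a toy model of an identity map. This is a deliberately simplified sketch in plain Python, not SQLAlchemy's implementation: the `load` helper, the `Row` tuple, and the sample values (`wikipedia`, `plos`, `c1`, and so on) are all hypothetical. It assumes only that the first object loaded for a given primary-key value is reused for every later row with the same key.

```python
from collections import namedtuple

# Hypothetical rows as the view might return them: one per (corpus_id, lang).
Row = namedtuple("Row", ["source_name", "corpus_id", "lang", "embedding_model_id"])

rows = [
    Row("wikipedia", "c1", "en", "m-en"),
    Row("wikipedia", "c1", "fr", "m-fr"),
    Row("plos", "c2", "en", "m-en"),
]


def load(rows, key_fields):
    """Toy identity map: rows sharing a key resolve to the first one loaded."""
    identity_map = {}
    loaded = []
    for row in rows:
        key = tuple(getattr(row, field) for field in key_fields)
        # setdefault keeps the existing entry when the key was already seen.
        loaded.append(identity_map.setdefault(key, row))
    return loaded


# Keyed on source_name alone, the French row is shadowed by the English one.
by_source = load(rows, ["source_name"])
# Keyed on (corpus_id, lang), every row keeps its own identity.
by_corpus_lang = load(rows, ["corpus_id", "lang"])

langs_single_key = {r.lang for r in by_source if r.source_name == "wikipedia"}
langs_composite_key = {r.lang for r in by_corpus_lang if r.source_name == "wikipedia"}
```

With the single-column key the `fr` row for `wikipedia` never surfaces, while the composite key preserves both languages, which is the behavior the suggested change aims for.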

lpi-tn added 3 commits March 31, 2026 15:32

Bump version to 1.4.2 in pyproject.toml

ad2dd5f

Add new fields to Corpus model for embeddings and categorization

a2ded4b

Modify corpus_name_embedding_model_lang view to enhance data retrieval logic

d116bec

lpi-tn requested review from Copilot and sandragjacinto March 31, 2026 14:43

Copilot started reviewing on behalf of lpi-tn March 31, 2026 14:44

sandragjacinto approved these changes Mar 31, 2026


Copilot AI reviewed Mar 31, 2026

lpi-tn wants to merge 3 commits into main from Fix/update-cropus-name-embeddings-model-lang
