
feat(comments): Add runner for comments migration separately #380

Open
sakshamarora1 wants to merge 12 commits into CERNDocumentServer:master from sakshamarora1:feature/comments_migration

Contributor

@sakshamarora1 sakshamarora1 commented Feb 2, 2026

closes: #286
closes: #381

Steps

  1. Update the collection queries for the target collection, then retrieve all comments for the matching records and create a JSON metadata file:
ipython ./scripts/dump_comments_to_migrate.py

Output file: comments_metadata.json
The script above also runs part of this script to generate the missing_users.json file.

  2. Ensure that people.csv and missing_users.json / valid_users.json are in the /eos/media/cds/cds-rdm/<env>/migration/users/ directory, and that comments_metadata.json is in /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/

IMPORTANT: We are not running ./scripts/copy_comments_attached_files.py yet (we could, but thesis and IT have no attached files anyway). [See the child issue of the main issue attached]

  3. Create those users (using the people.csv containing person_id, already placed in /eos/media/cds/cds-rdm/<env>/migration/users/):
invenio migration comments commenters-run --filepath /eos/media/cds/cds-rdm/<env>/migration/users/missing_users.json --missing-users-dir /eos/media/cds/cds-rdm/<env>/migration/users/ --dry-run

(Remove --dry-run for the actual run.)

After running, make sure that the users are indexed and not just created in the DB.

  4. Finally, migrate the comments:
invenio migration comments comments-run --filepath /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/comments_metadata.json --collection <COLLECTION> --dirpath /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/ --dry-run

(Remove --dry-run for the actual run.)
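Before running the two commands, it can help to confirm that the input files from step 2 are where the runner expects them. The following is only a minimal sketch: the function name `missing_migration_inputs` is hypothetical, `base_dir` stands for `/eos/media/cds/cds-rdm/<env>/migration`, and treating either `missing_users.json` or `valid_users.json` as sufficient is an assumption, not something this PR states.

```python
from pathlib import Path


def missing_migration_inputs(base_dir, collection):
    """Return the expected input files that are absent under base_dir.

    Hypothetical helper, not part of cds-migrator-kit.
    """
    base = Path(base_dir)
    users_dir = base / "users"
    comments_dir = base / collection / "comments"

    missing = []
    # Files that the steps above name explicitly.
    for required in (
        users_dir / "people.csv",
        comments_dir / "comments_metadata.json",
    ):
        if not required.is_file():
            missing.append(str(required))

    # Assumption: one of the two users files is enough.
    user_files = [
        users_dir / "missing_users.json",
        users_dir / "valid_users.json",
    ]
    if not any(path.is_file() for path in user_files):
        missing.append(" or ".join(str(path) for path in user_files))

    return missing
```

An empty list means all expected inputs were found; anything else lists the paths to fix before dropping --dry-run.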


The runner can be re-run

  • We store the legacy recid in the number field to keep track of migrated requests
  • When migrating, we check whether the legacy recid already exists in the number field; if not, we migrate
  • If some comments fail, we roll back the whole request creation (so things can be fixed/updated later and the runner re-run)
  • After fixing/updating, we can just run the migration runner again: the failed ones will get created and the already created ones will get skipped
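The re-run behaviour described above can be sketched with an in-memory stand-in. This is not the runner's actual code: `migrate_all`, `existing_numbers`, and `create_comment` are hypothetical names, the set plays the role of the number field lookup, and the rollback is modeled simply by discarding the partial result and not recording the recid.

```python
def migrate_all(records, existing_numbers, create_comment):
    """Migrate comments per legacy recid, skipping already migrated ones.

    records: dict mapping legacy recid -> list of comments.
    existing_numbers: set of legacy recids already stored in the
        number field (hypothetical stand-in for the real lookup).
    create_comment: callable(recid, comment) that raises on failure.
    """
    summary = {"migrated": [], "skipped": [], "failed": []}
    for recid, comments in records.items():
        if recid in existing_numbers:
            # Already migrated in a previous run.
            summary["skipped"].append(recid)
            continue

        created = []
        try:
            for comment in comments:
                created.append(create_comment(recid, comment))
        except Exception:
            # Model of the rollback: drop everything created for this
            # request and leave the recid unrecorded so it is retried.
            created.clear()
            summary["failed"].append(recid)
            continue

        existing_numbers.add(recid)
        summary["migrated"].append(recid)
    return summary
```

Running it twice over the same input, with the failing comment fixed in between, migrates only the previously failed recid on the second pass.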

@sakshamarora1 sakshamarora1 marked this pull request as ready for review February 4, 2026 16:33
Contributor

@kpsherva kpsherva left a comment

Can we store the migrated comment IDs at the request level, so that we have a retry strategy if a migration run fails? That way we would know which comments to skip on the second, third, etc. run.

@sakshamarora1 sakshamarora1 force-pushed the feature/comments_migration branch from f420de2 to 7f6c6ed Compare February 16, 2026 20:16
Contributor

@kpsherva kpsherva left a comment

LGTM, I have a couple of comments left.
Can you add the new commands to the recipe here:
https://gitlab.cern.ch/cds-team/cds-rdm-openshift/-/issues?show=eyJpaWQiOiIxMiIsImZ1bGxfcGF0aCI6ImNkcy10ZWFtL2Nkcy1yZG0tb3BlbnNoaWZ0IiwiaWQiOjMzODkzNH0%3D
I will later convert it to a template

@sakshamarora1 sakshamarora1 force-pushed the feature/comments_migration branch from 7b5bb89 to db05067 Compare March 23, 2026 16:35
@sakshamarora1 sakshamarora1 force-pushed the feature/comments_migration branch from db05067 to d42c37a Compare March 23, 2026 22:16
@sakshamarora1 sakshamarora1 force-pushed the feature/comments_migration branch from 4ed4497 to 610b68d Compare March 24, 2026 09:31
from cds_migrator_kit.users.load import CDSSubmitterLoad


class CDSCommentsLoad(Load):
Contributor

Please also open a ticket to add /comment URL redirections.

else:
    new_comment_deeplink = None

data = {
Contributor

LGTM, can you show what the logs will look like?

Contributor Author

@sakshamarora1 sakshamarora1 Mar 27, 2026

2026-03-26 13:44:29 INFO     Processing legacy comments for recid: 12345
2026-03-26 13:44:29 INFO     Created accepted community submission request<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> for record<w2d7f-ah177>.
2026-03-26 13:44:29 INFO     Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<1>
2026-03-26 13:44:29 INFO     Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<2> parent comment ID<4f02a138-bb6c-42f3-bea5-33c3591d4d50>
2026-03-26 13:44:29 INFO     Found parent event<4f02a138-bb6c-42f3-bea5-33c3591d4d50> for reply event. Setting parent_id.
2026-03-26 13:44:29 INFO     Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<3> parent comment ID<4f02a138-bb6c-42f3-bea5-33c3591d4d50>
2026-03-26 13:44:29 INFO     Found parent event<4f02a138-bb6c-42f3-bea5-33c3591d4d50> for reply event. Setting parent_id.
2026-03-26 13:44:29 INFO     Successfully migrated 3 comments for request: 0dc09e5f-d1dd-4c7a-927b-f4bac5111c32 from recid: 12345
2026-03-26 13:44:29 INFO     Processing legacy comments for recid: 23456
2026-03-26 13:44:29 INFO     Created accepted community submission request<cb4be0d3-8659-46e3-92e0-78607f6e2cc0> for record<a5dgg-16819>.
2026-03-26 13:44:29 INFO     Creating event for legacy recid ID<23456> request ID<cb4be0d3-8659-46e3-92e0-78607f6e2cc0> comment ID<4>
2026-03-26 13:44:29 INFO     Adding file<4_content.pdf> to the event.
2026-03-26 13:44:29 INFO     Successfully migrated 1 comments for request: cb4be0d3-8659-46e3-92e0-78607f6e2cc0 from recid: 23456
2026-03-26 13:44:29 INFO     Processing legacy comments for recid: 34567
2026-03-26 13:44:30 INFO     Created accepted community submission request<8fa57b89-6e2c-4fd3-a67c-282b3c0c81e5> for record<6hg5n-txn66>.
2026-03-26 13:44:30 INFO     Creating event for legacy recid ID<34567> request ID<8fa57b89-6e2c-4fd3-a67c-282b3c0c81e5> comment ID<5>
2026-03-26 13:44:30 ERROR    Error: User not found. | Field: user_email | Value: unknown@example.com | Recid: 34567 | Comment ID: 5
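
To decide what to fix before a re-run, the ERROR lines in a log like the one above could be collected by recid. This is a rough sketch that assumes every error line follows the same `key: value | key: value` layout as the single sample shown; `parse_error_line` and `failed_recids` are hypothetical helpers, not part of the runner.

```python
import re

# Timestamp, then the ERROR level, then the pipe-separated message body.
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+ERROR\s+(?P<body>.*)$"
)


def parse_error_line(line):
    """Parse one ERROR log line into a dict, or return None otherwise."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    fields = {"timestamp": match.group("timestamp")}
    # Assumed layout: "Error: ... | Field: ... | Value: ... | Recid: ..."
    for part in match.group("body").split(" | "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def failed_recids(lines):
    """Collect the legacy recids mentioned in ERROR lines."""
    recids = set()
    for line in lines:
        fields = parse_error_line(line)
        if fields and "Recid" in fields:
            recids.add(fields["Recid"])
    return recids
```

INFO lines are ignored, so the whole log file can be passed in line by line.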

Successfully merging this pull request may close these issues.

Comments Migration: Attach files for comments with files
Comments migration
