feat(comments): Add runner for comments migration separately #380
sakshamarora1 wants to merge 12 commits into CERNDocumentServer:master
Conversation
kpsherva
left a comment
Can we store the migrated comment IDs at the request level, so that we have a retry strategy if a migration run fails? That way we know which comments to skip on the second, third, etc. run.
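The retry idea raised in this comment can be pictured with a minimal sketch: keep the set of already-migrated legacy comment IDs next to each request and consult it before creating an event. The names `RequestState`, `migrate_comments` and `create_event` are hypothetical and are not taken from this PR; a real runner would presumably persist the state on the request itself rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Hypothetical per-request bookkeeping of migrated legacy comment IDs."""

    request_id: str
    migrated_comment_ids: set = field(default_factory=set)


def migrate_comments(state, comments, create_event):
    """Create events only for comments that were not migrated in an earlier run.

    `create_event` is an assumed callable (request_id, comment) that may raise.
    Returns the list of legacy comment IDs migrated during this call.
    """
    migrated_now = []
    for comment in comments:
        comment_id = comment["id"]
        if comment_id in state.migrated_comment_ids:
            continue  # already migrated, skip on re-run
        # If this raises, the ID is not recorded, so the next run retries it.
        create_event(state.request_id, comment)
        state.migrated_comment_ids.add(comment_id)
        migrated_now.append(comment_id)
    return migrated_now
```

With this shape, a run that fails midway leaves the recorded IDs intact, and the next run only touches the comments that are still missing.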
f420de2 to 7f6c6ed
kpsherva
left a comment
LGTM, I had a couple of comments left.
Can you add the new commands to the recipe here:
https://gitlab.cern.ch/cds-team/cds-rdm-openshift/-/issues?show=eyJpaWQiOiIxMiIsImZ1bGxfcGF0aCI6ImNkcy10ZWFtL2Nkcy1yZG0tb3BlbnNoaWZ0IiwiaWQiOjMzODkzNH0%3D
I will later convert it to a template
7b5bb89 to db05067
db05067 to d42c37a
4ed4497 to 610b68d
from cds_migrator_kit.users.load import CDSSubmitterLoad


class CDSCommentsLoad(Load):
Please also open a ticket to add the /comment URL redirections.
else:
    new_comment_deeplink = None

data = {
LGTM, can you show what the logs will look like?
2026-03-26 13:44:29 INFO Processing legacy comments for recid: 12345
2026-03-26 13:44:29 INFO Created accepted community submission request<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> for record<w2d7f-ah177>.
2026-03-26 13:44:29 INFO Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<1>
2026-03-26 13:44:29 INFO Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<2> parent comment ID<4f02a138-bb6c-42f3-bea5-33c3591d4d50>
2026-03-26 13:44:29 INFO Found parent event<4f02a138-bb6c-42f3-bea5-33c3591d4d50> for reply event. Setting parent_id.
2026-03-26 13:44:29 INFO Creating event for legacy recid ID<12345> request ID<0dc09e5f-d1dd-4c7a-927b-f4bac5111c32> comment ID<3> parent comment ID<4f02a138-bb6c-42f3-bea5-33c3591d4d50>
2026-03-26 13:44:29 INFO Found parent event<4f02a138-bb6c-42f3-bea5-33c3591d4d50> for reply event. Setting parent_id.
2026-03-26 13:44:29 INFO Successfully migrated 3 comments for request: 0dc09e5f-d1dd-4c7a-927b-f4bac5111c32 from recid: 12345
2026-03-26 13:44:29 INFO Processing legacy comments for recid: 23456
2026-03-26 13:44:29 INFO Created accepted community submission request<cb4be0d3-8659-46e3-92e0-78607f6e2cc0> for record<a5dgg-16819>.
2026-03-26 13:44:29 INFO Creating event for legacy recid ID<23456> request ID<cb4be0d3-8659-46e3-92e0-78607f6e2cc0> comment ID<4>
2026-03-26 13:44:29 INFO Adding file<4_content.pdf> to the event.
2026-03-26 13:44:29 INFO Successfully migrated 1 comments for request: cb4be0d3-8659-46e3-92e0-78607f6e2cc0 from recid: 23456
2026-03-26 13:44:29 INFO Processing legacy comments for recid: 34567
2026-03-26 13:44:30 INFO Created accepted community submission request<8fa57b89-6e2c-4fd3-a67c-282b3c0c81e5> for record<6hg5n-txn66>.
2026-03-26 13:44:30 INFO Creating event for legacy recid ID<34567> request ID<8fa57b89-6e2c-4fd3-a67c-282b3c0c81e5> comment ID<5>
2026-03-26 13:44:30 ERROR Error: User not found. | Field: user_email | Value: unknown@example.com | Recid: 34567 | Comment ID: 5
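The "Found parent event ... Setting parent_id" lines in the log suggest that replies are linked by looking up the new event ID of their legacy parent. A minimal sketch of that lookup, under assumptions: the `reply_to` field, the `create_event` callable and the parents-before-replies ordering are all hypothetical and not confirmed by this PR.

```python
def migrate_thread(comments, create_event):
    """Migrate one record's comments, linking replies to their parent events.

    `comments` is assumed to be ordered so that a parent precedes its replies.
    Each comment is a dict with a legacy "id" and an optional "reply_to"
    (the legacy ID of the parent). `create_event(comment, parent_event_id)`
    is an assumed callable returning the ID of the newly created event.
    """
    legacy_to_event = {}  # legacy comment ID -> new event ID
    for comment in comments:
        parent_legacy_id = comment.get("reply_to")
        # None for top-level comments, or when the parent was never migrated.
        parent_event_id = legacy_to_event.get(parent_legacy_id)
        event_id = create_event(comment, parent_event_id)
        legacy_to_event[comment["id"]] = event_id
    return legacy_to_event
```

Keeping the mapping per record mirrors the log above, where comments 2 and 3 both resolve to the same parent event.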
closes: #286
closes: #381
Steps
Output file:
comments_metadata.json
The above script also runs this script partly to generate the missing_users.json file
people.csv, missing_users.json/valid_users.json in /eos/media/cds/cds-rdm/<env>/migration/users/ directory and comments_metadata.json file in /eos/media/cds/cds-rdm/<env>/migration/<collection>/comments/
IMPORTANT: We are not running the ./scripts/copy_comments_attached_files.py yet (or we can, for thesis and IT there are none anyways) [See child issue in the main issue attached]
/eos/media/cds/cds-rdm/<env>/migration/users/):
(Remove --dry-run)
After running, make sure that the users are indexed and not just created in the DB.
(Remove --dry-run)
The runner can be re-run