
inspire: get exact match for accelerators and experiments #747

Open
jrcastro2 wants to merge 1 commit into CERNDocumentServer:master from jrcastro2:add-exact-match-vocab

Contributor

jrcastro2 commented Mar 23, 2026

"LHCb": "LHCB",
"AMS": "AMS-RE1",
"NA-62": "NA62",
"NA-062": "NA62",
Contributor Author

I just added a few; this list should be expanded as new values are found (unless there is an easy way to get all of them?).

}
"""INSPIRE to CDS accelerator vocabulary mappings."""

CDS_INSPIRE_EXPERIMENT_MAPPINGS = {
Contributor

kpsherva commented Mar 25, 2026

Could we instead just normalise the term on the fly? From what I see here, we are uppercasing and removing hyphens... I find it troublesome to have to maintain this list when we can normalise the name before performing a search (same comment for the config above).
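A minimal sketch of the on-the-fly normalisation suggested here, assuming it consists only of the two transformations named in the comment (uppercasing and removing hyphens); `normalize_term` is a hypothetical helper name, not code from the PR:

```python
def normalize_term(term: str) -> str:
    """Uppercase a vocabulary term and drop hyphens before searching.

    Hypothetical helper: it applies only the two transformations
    observed in the mapping table, nothing else.
    """
    return term.upper().replace("-", "")


print(normalize_term("LHCb"))   # LHCB
print(normalize_term("NA-62"))  # NA62
```

With this rule alone, the `"LHCb"` and `"NA-62"` entries of the table would no longer need to be maintained by hand.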

Contributor Author

Hmmm, I understand that we want to do exact matches; I only added the list for the cases where the value does not match exactly. To remove the mapping, I guess we could first search for the term as it is and, if that fails, search again with the normalized value (so that it covers the more generic values). Does this work?

Contributor

Looking at the list, I don't think we can normalize AMS to AMS-RE1 or NA-062 to NA62.

I was going to suggest using the props field, but it only supports a single string value :(
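This objection can be checked mechanically. The sketch below assumes the same uppercase-and-strip-hyphens rule (hypothetical `normalize_term`), runs the four mapping entries shown in the diff through it, and reports which ones the rule alone cannot reproduce; `SAMPLE_MAPPINGS` and `entries_not_covered` are illustrative names only:

```python
def normalize_term(term: str) -> str:
    # Hypothetical normalisation rule: uppercase and drop hyphens.
    return term.upper().replace("-", "")


# The four entries shown in the diff: INSPIRE value -> CDS vocabulary id.
SAMPLE_MAPPINGS = {
    "LHCb": "LHCB",
    "AMS": "AMS-RE1",
    "NA-62": "NA62",
    "NA-062": "NA62",
}


def entries_not_covered(mappings: dict[str, str]) -> list[str]:
    """Return the raw values whose normalised form differs from the target id."""
    return [raw for raw, target in mappings.items() if normalize_term(raw) != target]


print(entries_not_covered(SAMPLE_MAPPINGS))  # ['AMS', 'NA-062']
```

`NA-062` normalises to `NA062`, not `NA62`, so stripping zero-padding would need yet another rule, and `AMS` to `AMS-RE1` is not a formatting difference at all.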

Contributor Author

Discussed IRL and agreed to keep it simple: search for the value; if it is not found, normalize it and search again; if it is still not found, simply log an error message with the failed value. The mapping will be removed. If the value is not found, it means it is wrong and should be updated in the source.
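A sketch of the agreed lookup order, assuming a `search(vocab_type, value)` callable that returns a vocabulary id or `None`. `resolve_vocabulary_id`, `search`, `fake_search`, and the `"experiments"` vocabulary type are hypothetical stand-ins, not the harvester's actual API:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical signature of the real search: (vocab_type, value) -> id or None.
SearchFn = Callable[[str, str], Optional[str]]


def normalize_term(term: str) -> str:
    # Hypothetical rule: uppercase and drop hyphens.
    return term.upper().replace("-", "")


def resolve_vocabulary_id(
    term: str, vocab_type: str, search: SearchFn
) -> Optional[str]:
    # 1. Exact match on the value as received from INSPIRE.
    match = search(vocab_type, term)
    if match is not None:
        return match

    # 2. One retry with the normalised value, skipped if nothing changed.
    normalized = normalize_term(term)
    if normalized != term:
        match = search(vocab_type, normalized)
        if match is not None:
            return match

    # 3. No mapping-table fallback: log the value so it is fixed at the source.
    logger.error("No '%s' vocabulary entry found for '%s'.", vocab_type, term)
    return None


# In-memory stand-in for the vocabulary index, for illustration only.
VOCAB = {"experiments": {"LHCB", "NA62", "AMS-RE1"}}


def fake_search(vocab_type: str, value: str) -> Optional[str]:
    return value if value in VOCAB.get(vocab_type, set()) else None


print(resolve_vocabulary_id("NA-62", "experiments", fake_search))  # NA62
print(resolve_vocabulary_id("AMS", "experiments", fake_search))    # None
```

The `normalized != term` check avoids a redundant second query for values that are already in canonical form, such as `AMS`, which then goes straight to the error log.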

Contributor

kpsherva left a comment

I have suggested a simplification; please check. In addition, multiple tests are failing.

jrcastro2 force-pushed the add-exact-match-vocab branch from 794a631 to 5f60844 on March 25, 2026 13:33
f"Failed vocabulary search for '{original_term}' in '{vocab_type}'. Error: {e}."
)
return None
except Exception as e:
Contributor

minor nit: do we really need to catch separately? 😅
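If both `except` branches do what the visible one does (log the failure and return `None`), they can collapse into a single handler. A minimal sketch, assuming a hypothetical `safe_search` wrapper around the same kind of `search(vocab_type, value)` callable; only the log message text comes from the diff:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_search(
    search: Callable[[str, str], Optional[str]],
    vocab_type: str,
    original_term: str,
) -> Optional[str]:
    try:
        return search(vocab_type, original_term)
    except Exception as e:
        # A single handler: every failure is logged the same way and the
        # caller always gets None, so separate branches add nothing.
        logger.error(
            "Failed vocabulary search for '%s' in '%s'. Error: %s.",
            original_term,
            vocab_type,
            e,
        )
        return None


def broken_search(vocab_type: str, value: str) -> Optional[str]:
    # Simulated backend failure for the example.
    raise TimeoutError("search backend unavailable")


print(safe_search(broken_search, "experiments", "LHCb"))  # None
```

Separate handlers only pay off if one of them reacts differently, for example by re-raising or logging at another level.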


Development

Successfully merging this pull request may close these issues.

[INSPIRE Harvester] Ensure vocabularies exact match

3 participants