Modify title suppression script to take a list of identifiers (PP-3754)#3086
Conversation
Force-pushed ae489ec to bdfaede
Force-pushed bdfaede to 33bd62e
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #3086      +/-   ##
==========================================
+ Coverage   93.20%   93.22%   +0.01%
==========================================
  Files         491      491
  Lines       45259    45336      +77
  Branches     6225     6239      +14
==========================================
+ Hits        42184    42264      +80
+ Misses       1987     1985       -2
+ Partials     1088     1087       -1

View full report in Codecov by Sentry.
Force-pushed 33bd62e to 095e259
jonathangreen left a comment:
Looks good, added a couple minor comments for consideration.
    if has_type_column and row["identifier_type"].strip():
        id_type = row["identifier_type"].strip()
Minor: When the CSV includes an identifier_type header but a row omits that column value (e.g. line is 12345 instead of 12345,), csv.DictReader sets row["identifier_type"] to None and calling .strip() raises AttributeError.
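A minimal reproduction of that DictReader behavior, together with the usual `or ""` guard (a standalone sketch, not code from this PR):

```python
import csv
import io

# The header declares identifier_type, but the second data row omits the
# value entirely (no trailing comma), so DictReader fills that key with None.
data = "identifier,identifier_type\n9781234567890,ISBN\n12345\n"
rows = list(csv.DictReader(io.StringIO(data)))

assert rows[0]["identifier_type"] == "ISBN"
assert rows[1]["identifier_type"] is None  # .strip() here raises AttributeError

# Guard: coerce a missing key or a None value to "" before stripping.
id_type = (rows[1].get("identifier_type") or "").strip()
assert id_type == ""
```

Note that a row written as `12345,` (with the trailing comma) yields `""` rather than `None`, which is why the bug only surfaces on fully omitted columns.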
    results: dict[tuple[str, str], SuppressResult] = {}
    try:
        for id_type, id_value in pairs:
            try:
                identifier = self.load_identifier(id_type, id_value)
                result = self.suppress_work(library, identifier, dry_run=dry_run)
            except PalaceValueError:
                result = SuppressResult.NOT_FOUND
            results[(id_type, id_value)] = result
Minor: It's an edge case, but it's possible the CSV file contains duplicated identifiers. In that case, later rows overwrite earlier outcomes in the results dict: when the same identifier appears multiple times, the summary/detail output undercounts processed rows and can report the wrong final distribution (for example, a first row that is newly suppressed followed by a second that is already suppressed shows up as only one ALREADY SUPPRESSED).
Maybe de-duplicate the list before processing and warn about duplicates?
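One way to do that while preserving first-occurrence order; `deduplicate_pairs` is an illustrative name, not an identifier from this PR:

```python
import logging

log = logging.getLogger(__name__)


def deduplicate_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop repeated (identifier_type, identifier) pairs, keeping the first
    occurrence, so each distinct identifier maps to exactly one result."""
    seen: set[tuple[str, str]] = set()
    unique: list[tuple[str, str]] = []
    for pair in pairs:
        if pair in seen:
            # Warn so the operator knows why row counts differ from the CSV.
            log.warning("Duplicate identifier in CSV, skipping: %s %s", *pair)
            continue
        seen.add(pair)
        unique.append(pair)
    return unique
```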
    prefix = "[DRY RUN] " if dry_run else ""
    suppress_label = "Would suppress" if dry_run else "Newly suppressed"

    col = 20
Minor: col = 20 is a magic number. If any label text changes, the output silently mis-aligns. Maybe add a comment calling this out, or compute the width dynamically?
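A sketch of the dynamic alternative; the label strings below are placeholders for whatever the script actually prints:

```python
def column_width(labels: list[str], padding: int = 2) -> int:
    # Derive the width from the longest label instead of hard-coding 20,
    # so renaming a label cannot silently mis-align the summary output.
    return max(len(label) for label in labels) + padding


labels = ["[DRY RUN] Would suppress", "Already suppressed", "Not found"]
col = column_width(labels)
for label, count in zip(labels, [3, 1, 0]):
    print(f"{label + ':':<{col}} {count}")
```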
…of identifiers (and types). (PP-3754)
…-3754)
- Replace legacy _db.query(Library) with _db.scalars(select(Library)) in arg_parser
- Use `is None` instead of falsy checks on ORM objects in load_library and load_identifier
- Wrap open() in load_identifiers_from_file with FileNotFoundError -> PalaceValueError
- Replace row.get("identifier_type", "") with direct row["identifier_type"] access
- Remove redundant default=False from --dry-run argument (implied by action="store_true")
- Fix summary output alignment with fixed-width column formatting
- Replace dead-code status_map.get(result, "UNKNOWN") with status_map[result]
- Add test: dry-run mode does not call commit()
- Add test: missing CSV file path raises PalaceValueError with clear message
- Add test: duplicate CSV rows are passed through as separate pairs by the loader
- Print library name and short name, start timestamp (UTC), and duration at the top of every suppression results summary
- Add datetime import; thread started_at and duration_seconds through do_run -> _print_results
- Update test_print_results_normal and test_print_results_dry_run to pass the new required parameters and assert on the new header fields
1. AttributeError when identifier_type is None (line 134)
When the CSV has an identifier_type header but a row omits the value (e.g. 12345 instead of 12345,), csv.DictReader sets row["identifier_type"] to None, and calling .strip() raises AttributeError.
Fix: Use row.get("identifier_type") or "" before calling .strip()
2. Duplicate identifiers in CSV (line 178)
Duplicate rows overwrote earlier results in the dict, so the summary could undercount and misreport outcomes.
Fix: Deduplicate (identifier_type, identifier) pairs before processing and log a warning
3. Magic number col = 20 (line 218)
Fix: Compute the column width from the labels
Force-pushed 095e259 to c672c5c
Description
Extends SuppressWorkForLibraryScript to support batch suppression from a CSV file, a dry-run mode, structured console output, and all-or-nothing transactional safety.

Changes
suppress.py
- SuppressResult enum: tracks per-identifier outcome (NEWLY_SUPPRESSED, ALREADY_SUPPRESSED, or NOT_FOUND)
- -f FILE_PATH argument: accepts a CSV file with an identifier column and an optional identifier_type column; rows with a missing/empty type fall back to the --identifier-type CLI default (ISBN)
- -i and -f are mutually exclusive, enforced by argparse; exactly one must be provided
- --dry-run flag: processes and reports the full batch without writing to the database
- load_identifiers_from_file(): parses the CSV, validates the required identifier column, and returns (identifier_type, identifier) pairs
- suppress_work(): returns a SuppressResult; detects already-suppressed works; no longer calls commit() directly
- do_run(): all mutations are flushed in one commit() at the end; any unexpected exception triggers a full rollback() and re-raise, guaranteeing all-or-nothing persistence; a per-row PalaceValueError (identifier not found, no associated work) is still handled gracefully as NOT_FOUND and does not abort the batch
- _print_results(): prints a summary (counts of newly suppressed / already suppressed / not found) and a per-identifier detail line to stdout; prefixed with [DRY RUN] when applicable
- Replaced the %s substitution in the --library help text with an f-string

test_suppress.py
31 tests, all passing. New coverage includes:
- -f flag, --dry-run flag, mutual exclusivity of -i/-f
- missing identifier column
- suppress_work no longer calls commit()
- do_run integration: one commit() for the entire batch
- rollback() called on commit failure; rollback() called on unexpected mid-loop error

Motivation and Context
https://ebce-lyrasis.atlassian.net/browse/PP-3754
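The all-or-nothing do_run flow described in the changes above follows a standard batch-transaction shape; a hedged sketch with stand-in names (run_batch, process_row, and SuppressionError are illustrative, not this PR's identifiers):

```python
class SuppressionError(Exception):
    """Stand-in for the per-row PalaceValueError (identifier not found)."""


def run_batch(session, pairs, process_row):
    results = {}
    try:
        for pair in pairs:
            try:
                results[pair] = process_row(pair)
            except SuppressionError:
                # Expected per-row failure: record it and keep going.
                results[pair] = "NOT_FOUND"
        session.commit()  # one commit persists the whole batch
    except Exception:
        session.rollback()  # any unexpected error leaves nothing partial
        raise
    return results
```

The key property is that per-row "not found" errors never abort the loop, while anything unexpected rolls back every pending mutation before re-raising.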
How Has This Been Tested?
Unit test coverage added. I also did some manual testing.
Checklist