Skip to content

Modify title suppression script to take a list of identifiers (PP-3754)#3086

Merged
dbernstein merged 5 commits intomainfrom
feature/PP-3754-suppress-titles-for-library-by-csv
Mar 4, 2026
Merged

Modify title suppression script to take a list of identifiers (PP-3754)#3086
dbernstein merged 5 commits intomainfrom
feature/PP-3754-suppress-titles-for-library-by-csv

Conversation

@dbernstein
Copy link
Contributor

@dbernstein dbernstein commented Feb 26, 2026

Description

Extends SuppressWorkForLibraryScript to support batch suppression from a CSV file, a dry-run mode, structured console output, and all-or-nothing transactional safety.

Changes

suppress.py

  • New SuppressResult enum — tracks per-identifier outcome: NEWLY_SUPPRESSED, ALREADY_SUPPRESSED, or NOT_FOUND
  • New -f FILE_PATH argument — accepts a CSV file with an identifier column and an optional identifier_type column; rows with a missing/empty type fall back to the --identifier-type CLI default (ISBN)
  • -i and -f are mutually exclusive — enforced by argparse; exactly one must be provided
  • New --dry-run flag — processes and reports the full batch without writing to the database
  • New load_identifiers_from_file() — parses the CSV, validates the required identifier column, and returns (identifier_type, identifier) pairs
  • Updated suppress_work() — returns a SuppressResult; detects already-suppressed works; no longer calls commit() directly
  • Single transactional commit in do_run() — all mutations are flushed in one commit() at the end; any unexpected exception triggers a full rollback() and re-raise, guaranteeing all-or-nothing persistence; per-row PalaceValueError (identifier not found, no associated work) is still handled gracefully as NOT_FOUND and does not abort the batch
  • New _print_results() — prints a summary (counts of newly suppressed / already suppressed / not found) and a per-identifier detail line to stdout; prefixed with [DRY RUN] when applicable
  • f-string cleanup — replaced the last %s substitution in the --library help text with an f-string

test_suppress.py

31 tests, all passing. New coverage includes:

Area Tests added
Arg parsing -f flag, --dry-run flag, mutual exclusivity of -i/-f
CSV loading With type column, without type column, empty type falls back to default, missing identifier column
suppress_work Already suppressed, no work for identifier, dry-run (no mutation), dry-run already suppressed, does not call commit()
do_run integration Dry-run, batch via file, not-found identifier, single commit() for entire batch
Transactional safety rollback() called on commit failure, rollback() called on unexpected mid-loop error
Output formatting Normal mode labels and counts, dry-run mode labels and counts

Motivation and Context

https://ebce-lyrasis.atlassian.net/browse/PP-3754

How Has This Been Tested?

Unit test coverage added. I also did some manual testing.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@dbernstein dbernstein force-pushed the feature/PP-3754-suppress-titles-for-library-by-csv branch from ae489ec to bdfaede Compare February 26, 2026 19:33
@dbernstein dbernstein marked this pull request as ready for review February 26, 2026 19:33
@dbernstein dbernstein requested a review from a team February 26, 2026 19:34
@dbernstein dbernstein force-pushed the feature/PP-3754-suppress-titles-for-library-by-csv branch from bdfaede to 33bd62e Compare February 26, 2026 21:30
@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.22%. Comparing base (a1e72d1) to head (c672c5c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3086      +/-   ##
==========================================
+ Coverage   93.20%   93.22%   +0.01%     
==========================================
  Files         491      491              
  Lines       45259    45336      +77     
  Branches     6225     6239      +14     
==========================================
+ Hits        42184    42264      +80     
+ Misses       1987     1985       -2     
+ Partials     1088     1087       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dbernstein dbernstein force-pushed the feature/PP-3754-suppress-titles-for-library-by-csv branch from 33bd62e to 095e259 Compare March 2, 2026 16:36
Copy link
Member

@jonathangreen jonathangreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added a couple minor comments for consideration.

Comment on lines +134 to +135
if has_type_column and row["identifier_type"].strip():
id_type = row["identifier_type"].strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: When the CSV includes an identifier_type header but a row omits that column value (e.g. line is 12345 instead of 12345,), csv.DictReader sets row["identifier_type"] to None and calling .strip() raises AttributeError.

Comment on lines +178 to +186
results: dict[tuple[str, str], SuppressResult] = {}
try:
for id_type, id_value in pairs:
try:
identifier = self.load_identifier(id_type, id_value)
result = self.suppress_work(library, identifier, dry_run=dry_run)
except PalaceValueError:
result = SuppressResult.NOT_FOUND
results[(id_type, id_value)] = result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It am edge case, but its possible that we will have duplicated identifiers in the CSV file. If that is the case duplicate rows overwrite earlier outcomes. If the same identifier appears multiple times, the summary/detail output undercounts processed rows and can report the wrong final distribution (for example, first row newly suppressed, second already suppressed becomes only one ALREADY SUPPRESSED).

Maybe de-duplicate the list before processing and warn about duplicates?

prefix = "[DRY RUN] " if dry_run else ""
suppress_label = "Would suppress" if dry_run else "Newly suppressed"

col = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: col = 20 is a magic number. If any label text changes, this silently mis-aligns. Maybe make a comment calling this out, or dynamically compute this?

…-3754)

- Replace legacy _db.query(Library) with _db.scalars(select(Library)) in arg_parser
- Use `is None` instead of falsy checks on ORM objects in load_library and load_identifier
- Wrap open() in load_identifiers_from_file with FileNotFoundError -> PalaceValueError
- Replace row.get("identifier_type", "") with direct row["identifier_type"] access
- Remove redundant default=False from --dry-run argument (implied by action="store_true")
- Fix summary output alignment with fixed-width column formatting
- Replace dead-code status_map.get(result, "UNKNOWN") with status_map[result]
- Add test: dry-run mode does not call commit()
- Add test: missing CSV file path raises PalaceValueError with clear message
- Add test: duplicate CSV rows are passed through as separate pairs by the loader
- Print library name and short name, start timestamp (UTC), and duration
  at the top of every suppression results summary
- Add datetime import; thread started_at and duration_seconds through
  do_run -> _print_results
- Update test_print_results_normal and test_print_results_dry_run to
  pass the new required parameters and assert on the new header fields
1. AttributeError when identifier_type is None (line 134)
When the CSV has an identifier_type header but a row omits the value (e.g. 12345 instead of 12345,), csv.DictReader sets row["identifier_type"] to None, and calling .strip() raises AttributeError.
Fix: Use row.get("identifier_type") or "" before calling .strip()

2. Duplicate identifiers in CSV (line 178)
Duplicate rows overwrote earlier results in the dict, so the summary could undercount and misreport outcomes.
Fix: Deduplicate (identifier_type, identifier) pairs before processing and log a warning

3. Magic number col = 20 (line 218)
Fix: Compute the column width from the labels
@dbernstein dbernstein force-pushed the feature/PP-3754-suppress-titles-for-library-by-csv branch from 095e259 to c672c5c Compare March 3, 2026 22:14
@dbernstein dbernstein merged commit 09088a3 into main Mar 4, 2026
19 checks passed
@dbernstein dbernstein deleted the feature/PP-3754-suppress-titles-for-library-by-csv branch March 4, 2026 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants