feat: add humanize library to commit0 benchmark #115

Open
07Kaustubh wants to merge 1 commit into commit-0:main from 07Kaustubh:add-humanize

Conversation

@07Kaustubh

Summary

Add python-humanize/humanize (v4.15.0) as a new benchmark entry for the commit0 dataset.

Library Details

Field              Value
Original repo      python-humanize/humanize
Purpose            Human-friendly formatting of numbers, dates, file sizes, and time deltas
Python version     3.12
Tests              737 collected, 737 pass at reference commit
Functions stubbed  34 across 5 source modules
Source modules     filesize.py, i18n.py, lists.py, number.py, time.py

Commits

Commit            SHA                                       Description
reference_commit  2ddb5903cdc1c7e6eb6b083f4f99f73db50aecd9  Tag 4.15.0, full working implementation
base_commit       7a405e3f49b907db112ff08d9096720a37f60447  Stubs: 34 functions replaced with pass

Dataset Entry

{
  "instance_id": "commit-0/humanize",
  "repo": "commit-0/humanize",
  "original_repo": "python-humanize/humanize",
  "base_commit": "7a405e3f49b907db112ff08d9096720a37f60447",
  "reference_commit": "2ddb5903cdc1c7e6eb6b083f4f99f73db50aecd9",
  "setup": {
    "install": "pip install -e .",
    "packages": null,
    "pip_packages": ["freezegun", "pytest", "pytest-cov"],
    "pre_install": [
      "export SETUPTOOLS_SCM_PRETEND_VERSION=4.15.0",
      "apt-get update && apt-get install -y gettext",
      "bash scripts/generate-translation-binaries.sh"
    ],
    "python": "3.12",
    "specification": "https://humanize.readthedocs.io/"
  },
  "test": {
    "test_cmd": "pytest",
    "test_dir": "tests/"
  },
  "src_dir": "src/humanize/"
}
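As a sanity check, the entry above parses as valid JSON and its commit fields are well-formed SHAs. A minimal validation sketch (the checks are illustrative, not part of the harness; the entry is abbreviated to the fields being tested):

```python
import json

# Abbreviated copy of the dataset entry above.
entry = json.loads("""
{
  "instance_id": "commit-0/humanize",
  "repo": "commit-0/humanize",
  "original_repo": "python-humanize/humanize",
  "base_commit": "7a405e3f49b907db112ff08d9096720a37f60447",
  "reference_commit": "2ddb5903cdc1c7e6eb6b083f4f99f73db50aecd9",
  "setup": {"python": "3.12"},
  "test": {"test_cmd": "pytest", "test_dir": "tests/"},
  "src_dir": "src/humanize/"
}
""")

# Required keys present, and both commits are 40-char hex SHAs.
for key in ("instance_id", "repo", "base_commit", "reference_commit",
            "setup", "test", "src_dir"):
    assert key in entry, f"missing field: {key}"
for sha in (entry["base_commit"], entry["reference_commit"]):
    assert len(sha) == 40 and all(c in "0123456789abcdef" for c in sha)
```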

Verification

  • ✅ import humanize succeeds at base_commit (stubs are syntactically valid)
  • ✅ 737 tests collected at base_commit
  • ✅ Tests fail (not error) at base_commit: stubs return None, producing assertion failures
  • ✅ 737/737 tests pass at reference_commit (including all i18n tests with compiled .mo files)
  • ✅ Only 5 files in src/humanize/ modified between reference and base; zero test file changes
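The "fail, not error" property follows from how the stubs behave at runtime: a body of pass returns None, so test assertions fail cleanly instead of raising at import or collection time. A minimal sketch (the stub below is illustrative; naturalsize's real signature in humanize may differ):

```python
# Illustrative stub, as in base_commit: the body is replaced with `pass`.
def naturalsize(value, binary=False):
    pass

# Calling the stub succeeds (no ImportError/NameError) but yields None...
assert naturalsize(1_000_000) is None

# ...so a typical test assertion raises AssertionError rather than erroring:
try:
    assert naturalsize(1_000_000) == "1.0 MB"
except AssertionError:
    outcome = "fail"  # what the harness expects at base_commit
```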

Changes in This PR

  1. commit0/harness/constants.py — Add SPLIT_HUMANIZE, append "humanize" to SPLIT_ALL, add entry to SPLIT dict
  2. commit0/data/test_ids/humanize.bz2 — 737 pytest node IDs (bz2-compressed)
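The constants.py change likely takes the shape below. Only the names mentioned in this PR (SPLIT_HUMANIZE, SPLIT_ALL, SPLIT) come from the diff description; the pre-existing split contents are hypothetical placeholders, not the verbatim file:

```python
# Sketch of the registration pattern described above (not the verbatim diff).
SPLIT_HUMANIZE = ["humanize"]

# Existing benchmark repos elided; "humanize" is appended to the combined list.
SPLIT_ALL = ["babel", "jinja", "marshmallow"] + SPLIT_HUMANIZE  # hypothetical existing entries

SPLIT = {
    "all": SPLIT_ALL,
    "humanize": SPLIT_HUMANIZE,  # new entry added by this PR
}
```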

Additional Integration Required

For full integration, maintainers will also need to:

  1. Fork python-humanize/humanize to the commit-0 GitHub org (stubs branch available at 07Kaustubh/humanize@commit0-stubs)
  2. Add dataset row to HuggingFace datasets (commit0/commit0 and wentingzhao/commit0_combined)
  3. Build & push Docker image as wentingzhao/humanize:v0

Design Notes

  • i18n dependency chain: All i18n utility functions are stubbed (consistent with babel, jinja, marshmallow precedent). Agents must implement i18n.py before number.py/time.py/filesize.py will function.
  • _ngettext_noop preserved: Called at module level in number.py to build the human_powers tuple; stubbing it would cause an ImportError.
  • pre_install rationale: gettext and generate-translation-binaries.sh compile the .po catalogs into the .mo binaries needed by 59 i18n tests. SETUPTOOLS_SCM_PRETEND_VERSION is needed because the shallow Docker clone lacks git tags.
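The module-level dependency behind the _ngettext_noop note can be illustrated as follows; the pass-through two-tuple return is an assumption about humanize's internals, not a verbatim copy of number.py:

```python
# _ngettext_noop must stay implemented: it runs at import time in number.py.
# Assumed shape: return the (singular, plural) pair unchanged, deferring
# actual pluralization to runtime translation lookup.
def _ngettext_noop(singular, plural):
    return (singular, plural)

# Module-level call sites like this are why a `pass` stub (returning None)
# would break number.py at import, per the PR's design note.
human_powers = (
    _ngettext_noop("thousand", "thousands"),
    _ngettext_noop("million", "millions"),
    _ngettext_noop("billion", "billions"),
)
```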

Add python-humanize/humanize (v4.15.0) as a new benchmark entry:
- 34 functions stubbed across 5 modules (filesize, i18n, lists, number, time)
- 737 tests, all passing at reference commit
- Python 3.12, requires gettext for i18n translation compilation

Adds SPLIT_HUMANIZE constant, updates SPLIT_ALL and SPLIT dict,
and includes humanize.bz2 test IDs file (737 pytest node IDs).
Copilot AI review requested due to automatic review settings March 30, 2026 12:01

Copilot AI left a comment


Pull request overview

Adds the humanize repository as a new selectable Commit0 benchmark target by registering it in the harness repo splits and providing its precomputed pytest node IDs.

Changes:

  • Added humanize to SPLIT_ALL and introduced SPLIT_HUMANIZE, wiring it into the SPLIT mapping.
  • Added commit0/data/test_ids/humanize.bz2 containing the (bz2-compressed) pytest node IDs for the benchmark.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

File                                Description
commit0/harness/constants.py        Registers the new humanize repo split and includes it in the “all” set.
commit0/data/test_ids/humanize.bz2  Adds compressed pytest node IDs used by the harness to run the benchmark’s tests.


