@codeflash-ai codeflash-ai bot commented Feb 4, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 52% (0.52x) speedup for FunctionOptimizer._get_java_sources_root in codeflash/optimization/function_optimizer.py

⏱️ Runtime : 202 microseconds → 133 microseconds (best of 25 runs)

📝 Explanation and details

The optimized code achieves a 52% speedup (202μs → 133μs) by removing redundant work in the _get_java_sources_root method through three changes:

1. Single-pass loop structure
The original code used two separate loops over the same parts tuple. The optimized version merges them into a single loop, so each component is visited at most once. Since parts can have hundreds of components (the large path tests use 200 to 500), this avoids scanning those components a second time.
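The merge of the two loops can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: the function names and the STANDARD_PREFIXES set are hypothetical, and the behavior is inferred from the description and tests here (a standard prefix at index i > 0 wins, a 'java' directory is kept in the result, otherwise tests_root is returned).

```python
from pathlib import Path

# Hypothetical prefix set; the real list in function_optimizer.py is not shown here.
STANDARD_PREFIXES = frozenset({"com", "org", "net", "io"})


def sources_root_two_pass(tests_root: Path) -> Path:
    """Two loops over the same tuple: package prefixes first, then 'java'."""
    parts = tests_root.parts
    for i, part in enumerate(parts):
        if i > 0 and part in STANDARD_PREFIXES:
            return Path(*parts[:i])
    for i, part in enumerate(parts):
        if part == "java":
            return Path(*parts[: i + 1])
    return tests_root


def sources_root_single_pass(tests_root: Path) -> Path:
    """One loop: return early on a prefix, remember the first 'java' index."""
    parts = tests_root.parts
    java_index = -1
    for i, part in enumerate(parts):
        if i > 0 and part in STANDARD_PREFIXES:
            return Path(*parts[:i])
        if java_index < 0 and part == "java":
            java_index = i
    # No prefix was found anywhere, so the 'java' fallback (if any) applies.
    if java_index >= 0:
        return Path(*parts[: java_index + 1])
    return tests_root
```

Because the single-pass version only uses the remembered 'java' index after the whole tuple has been scanned without a prefix match, it keeps the same priority order as the two-loop version.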

2. Deferred Path construction
The optimized code only creates the Path(*parts[:i]) object when actually needed (when a standard prefix is found at i > 0). Line profiler shows this Path construction takes 40% of total time in the optimized version, so avoiding unnecessary constructions in fallback paths saves significant cycles.
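One way to picture deferred construction is to scan for an index first and build a Path only when that index exists. The sketch below assumes a hypothetical STANDARD_PREFIXES set and hypothetical helper names; how the original code built its paths is not shown in this description.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical prefix set, for illustration only.
STANDARD_PREFIXES = frozenset({"com", "org", "net", "io"})


def prefix_boundary(parts: Sequence[str]) -> Optional[int]:
    """Return the index of the first standard prefix at i > 0, or None."""
    for i, part in enumerate(parts):
        if i > 0 and part in STANDARD_PREFIXES:
            return i
    return None


def sources_root(tests_root: Path) -> Path:
    """Build a new Path only when a boundary was actually found."""
    parts = tests_root.parts
    boundary = prefix_boundary(parts)
    if boundary is None:
        # Fallback: hand back the existing object, no Path is constructed.
        return tests_root
    return Path(*parts[:boundary])
```

In the fallback case the function returns the very same object it was given, which is the cheap path the paragraph above refers to.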

3. Conditional debug logging
By wrapping debug statements with if logger.isEnabledFor(logging.DEBUG):, the code avoids constructing expensive f-strings when debug logging is disabled (the common case in production). The original code always built these strings even when they wouldn't be logged.
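The effect of the guard can be shown with the standard logging module alone. In this sketch the logger name and the describe helper are made up for the demonstration; the counter simply records how often the message text is built.

```python
import logging

logger = logging.getLogger("java_sources_root_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG is disabled, as is typical in production

calls = {"describe": 0}


def describe(parts):
    """Stand-in for an expensive formatting step; counts its invocations."""
    calls["describe"] += 1
    return "/".join(parts)


def log_unguarded(parts):
    # The f-string is evaluated before logger.debug() can discard the record.
    logger.debug(f"Scanning {describe(parts)}")


def log_guarded(parts):
    # The f-string is only evaluated when DEBUG records would be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(f"Scanning {describe(parts)}")
```

With the level set to INFO, calling log_unguarded still runs describe once, while log_guarded never does.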

Test Results Analysis:

  • Small paths (1-6 components): 70-153% faster, the largest relative gain, attributed to eliminating the second loop
  • Medium path (7 components with multiple prefixes): 85% faster, reflecting the return at the first prefix match
  • Large paths (200-500 components): 16-31% faster, still a gain from the single pass despite more iterations
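The percentages can be reproduced from the reported runtimes, assuming they are computed as the original runtime relative to the optimized one, minus one. That formula is an assumption; it matches the figures in this report to within the rounding of the displayed microsecond values.

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Relative speedup in percent: how much faster the optimized run is."""
    return (original_us / optimized_us - 1.0) * 100.0


# Overall runtime from this report: 202 microseconds -> 133 microseconds
overall = round(speedup_percent(202, 133))  # 52
```

Under this reading, the "0.52x" in the title is the same quantity expressed as a ratio rather than a percentage.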

The optimization maintains exact functional behavior: it returns the same path boundaries for standard package prefixes (prioritizing them over 'java' directories) and falls back to tests_root identically to the original. This is particularly important since the function helps locate Java source roots in Maven/Gradle project structures, where correctness is critical for test file resolution.

Impact on workloads: Since this method is called during function optimizer initialization for Java projects, the speedup reduces setup overhead. In the benchmarks above, the relative gain is largest for short paths (70-153%) and smaller for very deep ones (16-31%), while the absolute saving stays at roughly 5 to 14 microseconds per call.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 81.2% |
🌀 Generated Regression Tests
from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.optimization.function_optimizer import FunctionOptimizer
from codeflash.verification.verification_utils import TestConfig

# Helper to create a FunctionOptimizer instance without running its __init__
# We only need an object of the real FunctionOptimizer class whose .test_cfg attribute
# is set to a TestConfig with the tests_root we want to exercise. Using object.__new__
# ensures we create a real instance of the class (no stubs) but avoid running heavy __init__.
def _make_optimizer_with_tests_root(tests_root: Path) -> FunctionOptimizer:
    # Create a TestConfig using the real constructor (3 required positional args).
    # project_root_path and tests_project_rootdir are not used by _get_java_sources_root,
    # but are required to construct TestConfig properly.
    test_cfg = TestConfig(tests_root, Path("project_root"), Path("tests_project_rootdir"))
    # Create a bare instance of FunctionOptimizer without invoking __init__
    optimizer = object.__new__(FunctionOptimizer)
    # Attach only the attribute needed by _get_java_sources_root
    optimizer.test_cfg = test_cfg
    return optimizer

def test_detects_java_sources_root_before_standard_package_prefix():
    # Setup: tests_root contains a standard Java package prefix ("com")
    tests_root = Path("project") / "test" / "src" / "com" / "aerospike" / "tests"
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 21.7μs -> 9.99μs (118% faster)

def test_returns_tests_root_when_prefix_is_at_root():
    # Setup: first path component is a standard prefix -> should NOT strip (i == 0)
    tests_root = Path("com") / "example" / "tests"
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 10.5μs -> 4.15μs (153% faster)

def test_detects_maven_style_java_directory_and_includes_java():
    # Setup: Maven-style path with a 'java' directory earlier in the path
    tests_root = Path("project") / "src" / "test" / "java" / "com" / "example"
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 15.3μs -> 8.99μs (70.6% faster)

def test_returns_tests_root_if_no_standard_prefix_and_no_java_dir():
    # Setup: a path that doesn't include standard package prefixes nor 'java'
    tests_root = Path("some") / "random" / "tests" / "unit"
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 9.84μs -> 4.13μs (138% faster)

def test_company_and_comfy_do_not_trigger_prefix_detection():
    # Setup: components include words that contain 'com' as substring but are not equal -> should not match
    tests_root = Path("project") / "company" / "comfy" / "module"
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 9.59μs -> 3.93μs (144% faster)

def test_multiple_standard_prefixes_chooses_first_occurrence():
    # Setup: include two different standard package prefixes; the function must pick the first one encountered
    parts = ["root", "level1", "org", "subpackage", "com", "example", "tests"]
    tests_root = Path(*parts)
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 13.7μs -> 7.37μs (85.6% faster)

def test_root_path_returns_same_without_error():
    # Setup: absolute root path (platform-dependent, Path('/') on Unix-like), no prefixes
    tests_root = Path("/")  # safe cross-platform representation of root for Unix-style systems
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test - should not raise and should return the same Path
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 8.98μs -> 3.68μs (144% faster)

def test_large_path_with_prefix_in_middle_is_handled_correctly():
    # Create a long path (well under 1000 components) and place a standard prefix in the middle.
    # This assesses that the method efficiently scans parts and returns the prefix boundary properly.
    n_components = 200  # well below the 1000 element guideline
    components = [f"dir{i}" for i in range(n_components)]
    # Place 'com' at index 100
    prefix_index = 100
    components[prefix_index] = "com"
    # Add a few more package-like components after 'com'
    components.extend(["example", "subpackage", "tests"])
    tests_root = Path(*components)
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 54.8μs -> 47.0μs (16.6% faster)

    # Expectation: return Path built from components before index 100
    expected = Path(*components[:prefix_index])
    assert result == expected

def test_large_path_without_any_prefix_returns_same_quickly():
    # Create a large path with many components but no standard prefixes nor 'java'
    n_components = 500  # still under 1000
    components = [f"node{i}" for i in range(n_components)]
    tests_root = Path(*components)
    optimizer = _make_optimizer_with_tests_root(tests_root)

    # Call the method under test - should complete deterministically and return original
    codeflash_output = optimizer._get_java_sources_root(); result = codeflash_output # 57.8μs -> 44.0μs (31.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
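The helper in these tests relies on object.__new__ to get a real instance without running __init__. A minimal sketch of that technique, using a hypothetical ExpensiveService class rather than FunctionOptimizer itself:

```python
class ExpensiveService:
    """Hypothetical class whose constructor is too heavy for a unit test."""

    def __init__(self) -> None:
        raise RuntimeError("heavy setup that the test wants to avoid")

    def greet(self) -> str:
        return f"hello from {self.name}"


# Allocate an instance of the real class without calling __init__.
service = object.__new__(ExpensiveService)
# Attach only the attribute the method under test actually reads.
service.name = "bare-instance"
```

The trade-off is that any attribute __init__ would normally set is missing, so this only suits methods whose dependencies are known and small, as with test_cfg here.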

To edit these changes, run git checkout codeflash/optimize-pr1199-2026-02-04T08.13.40 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 4, 2026
@codeflash-ai codeflash-ai bot mentioned this pull request Feb 4, 2026