Skip to content

Add a kw-based function to create a builder#315

Merged
javiermtorres merged 5 commits intomainfrom
314-python-config-explicit
Mar 11, 2026
Merged

Add a kw-based function to create a builder#315
javiermtorres merged 5 commits intomainfrom
314-python-config-explicit

Conversation

@javiermtorres
Copy link
Contributor

A new function using kwargs has been added, called from_config. The old function has been renamed from_configpath instead. This approach has been favored over creating yet another data structure in python. The validation stays in Rust, as checked in the pytests.

Closes #314

@javiermtorres
Copy link
Contributor Author

Tokenizer information is still missing. The current approach will be to reuse serde information via https://docs.rs/serde_plain/latest/serde_plain/index.html maybe. Generating the python type stub information at compile time would be really helpful.

Copy link
Member

@besaleli besaleli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass

# not yet supported
# tokenizer: Optional[str],
validate_transform: bool = True,
target: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support a StrEnum for this, and then have the signature be Optional[Union[EncoderfileTarget, str]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, sorry, I forgot that one 🙈 I'd just use the strenum in the type annotation, why give the choice 😄

Copy link
Contributor Author

@javiermtorres javiermtorres Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, now I see the issue. I'll accept the target as a string, but possibly implement getters in a targetspec class that doesn't have a new method (unlike other members there). Open to discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have a format attr like in OpenAPI, frankly. For the moment I'm accepting any string and validating in the rust side (see the test_encoderfilebuilder_wrong_arch test). Anything more detailed feels overkill, but some detailed types for the target components would be welcome, true.

if s.eq_ignore_ascii_case("batch_longest") {
Ok(TokenizerPadStrategy::BatchLongest)
} else {
s.parse::<usize>()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@besaleli This would be the other, more manual approach.

@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 1.75439% with 168 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
encoderfile-py/src/builder.rs 0.00% 123 Missing ⚠️
encoderfile/src/builder/config.rs 9.67% 28 Missing ⚠️
encoderfile/src/common/model_type.rs 0.00% 11 Missing ⚠️
encoderfile/src/builder/base_binary/target_spec.rs 0.00% 6 Missing ⚠️
Files with missing lines Coverage Δ
encoderfile/src/builder/builder.rs 89.18% <ø> (ø)
encoderfile/src/builder/base_binary/target_spec.rs 39.77% <0.00%> (-2.92%) ⬇️
encoderfile/src/common/model_type.rs 21.42% <0.00%> (-78.58%) ⬇️
encoderfile/src/builder/config.rs 77.13% <9.67%> (-9.39%) ⬇️
encoderfile-py/src/builder.rs 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@besaleli besaleli self-requested a review March 10, 2026 02:30
@besaleli besaleli self-requested a review March 10, 2026 02:31
Copy link
Member

@besaleli besaleli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good so far!!

@javiermtorres
Copy link
Contributor Author

The encoderfile-py crate is not test-covered because we have the tests in python anyway... Would you like to some more coverage maybe in the de/ser parts for the new data types???? Again I think this will come up in the pytests, but if you insist I can add some checks to parse things back and forth.

I'll also add py visibility for the target spec.

@javiermtorres javiermtorres merged commit 8c432de into main Mar 11, 2026
3 checks passed
@javiermtorres javiermtorres deleted the 314-python-config-explicit branch March 11, 2026 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parse EF builder config from Python data structures

3 participants