Improve protein parsing #73

Merged
swooster merged 5 commits into main from swooster/improve-protein-parsing on Mar 10, 2026

swooster (Contributor) commented Mar 9, 2026

@Ravna ran into an issue where invalid data wasn't caught early, resulting in much head-scratching: two FASTA files had been concatenated without a newline between them, but FastaParser::<ProteinSequence> didn't catch that. This PR fixes that. It also fixes protein parsing not stripping out spaces, even though the tests imply that it should.
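
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code in this PR: `parse_protein_line` and `ResidueError` are hypothetical names, and the real FastaParser internals may look quite different. The idea is that whitespace inside a sequence line is dropped, while any other non-letter byte (such as the `>` that starts the second record when two FASTA files are joined without a newline) is reported as an error instead of being accepted silently.

```rust
/// Hypothetical error type for this sketch; not the crate's real error type.
#[derive(Debug, PartialEq)]
pub enum ResidueError {
    /// A byte that is neither ASCII whitespace nor an ASCII letter.
    InvalidByte { byte: u8, offset: usize },
}

/// Parses one line of protein sequence data.
///
/// ASCII whitespace is stripped; any other byte that is not an ASCII letter
/// is rejected along with its offset in the line.
pub fn parse_protein_line(line: &[u8]) -> Result<Vec<u8>, ResidueError> {
    let mut residues = Vec::with_capacity(line.len());
    for (offset, &byte) in line.iter().enumerate() {
        if byte.is_ascii_whitespace() {
            continue; // spaces inside a sequence line are dropped
        }
        if !byte.is_ascii_alphabetic() {
            // e.g. the '>' of a second header glued onto the sequence
            return Err(ResidueError::InvalidByte { byte, offset });
        }
        residues.push(byte);
    }
    Ok(residues)
}
```

With this sketch, a line such as `MKT AYI` yields the residues `MKTAYI`, while `MKTAYI>sp|second` fails at the `>` byte rather than being parsed as one long sequence.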

swooster requested a review from a team on March 9, 2026 21:26

swooster (Contributor, Author) commented Mar 9, 2026

Important note: I wasn't sure whether to accept "-" as a valid protein character (Wikipedia says it's commonly used to mean a gap of indeterminate length). I've opted to omit it for now.

hwchen (Contributor) commented Mar 10, 2026

Since we're examining AA parsing, should we consider using an include-list of valid amino acids instead of just checking for ASCII alphabetic characters?

I feel like there was a comment somewhere about just using u8 for now instead of having a proper AA type, but I don't think that needs to preclude us from parsing more strictly.

swooster (Contributor, Author) commented

I didn't include an AA list because every single letter is some kind of AA code. Some are ambiguous and some are uncommon, but all letters are potentially valid.
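
As a rough illustration of this point (a sketch, not code from this PR), an include-list built from the one-letter codes ends up accepting exactly the ASCII letters. The constant and function names below are hypothetical. The extended codes are B (Asx), J (Leu or Ile), O (pyrrolysine), U (selenocysteine), X (unknown) and Z (Glx); the sketch also assumes lowercase input should be accepted.

```rust
/// The 20 standard one-letter amino acid codes.
const STANDARD_CODES: &[u8] = b"ACDEFGHIKLMNPQRSTVWY";

/// Ambiguity codes and less common residues:
/// B = Asx, J = Leu or Ile, O = pyrrolysine,
/// U = selenocysteine, X = unknown, Z = Glx.
const EXTENDED_CODES: &[u8] = b"BJOUXZ";

/// Include-list check (hypothetical helper). Lowercase letters are folded
/// to uppercase first, which is an assumption of this sketch.
pub fn is_amino_acid_code(byte: u8) -> bool {
    let upper = byte.to_ascii_uppercase();
    STANDARD_CODES.contains(&upper) || EXTENDED_CODES.contains(&upper)
}
```

Together the two lists cover all 26 letters, so `is_amino_acid_code` agrees with `u8::is_ascii_alphabetic` on every byte. An include-list would only become stricter if some of the ambiguous or uncommon codes were left out.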

(also refactor two dereferences into one reference destructuring)
swooster merged commit ad2129e into main on Mar 10, 2026
8 checks passed
swooster deleted the swooster/improve-protein-parsing branch on March 10, 2026 03:44