Improve protein parsing by swooster · Pull Request #73 · SecureDNA/quickdna

swooster · 2026-03-09T21:26:56Z

@Ravna ran into an issue where invalid data wasn't caught early, resulting in much head-scratching: two FASTA files had been concatenated without a newline between them, but FastaParser::<ProteinSequence> didn't catch that. This PR fixes that, and also fixes protein parsing not stripping out spaces, even though the tests imply that it should.
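To make the failure mode concrete: when two FASTA files are joined without a newline, the second file's header ends up on the last sequence line of the first (for example `MKVLA>seq2 ...`). The following is a minimal sketch of the kind of check that catches this, not the code from this PR. The names `BadChar` and `clean_protein_line` are hypothetical, and the `is_seq_char` shown here is an assumed stand-in that accepts only ASCII letters; the actual predicate in src/rust_api.rs may differ.

```rust
/// Hypothetical error type for this sketch: the offending byte and its position.
#[derive(Debug, PartialEq)]
pub struct BadChar {
    pub byte: u8,
    pub index: usize,
}

/// Assumed predicate: only ASCII letters count as protein sequence characters.
fn is_seq_char(b: u8) -> bool {
    b.is_ascii_alphabetic()
}

/// Strips ASCII whitespace from one sequence line and rejects any other
/// byte that is not a sequence character.
pub fn clean_protein_line(line: &str) -> Result<Vec<u8>, BadChar> {
    let mut out = Vec::with_capacity(line.len());
    for (index, byte) in line.bytes().enumerate() {
        if byte.is_ascii_whitespace() {
            continue; // spaces are dropped, not passed through
        }
        if !is_seq_char(byte) {
            return Err(BadChar { byte, index });
        }
        out.push(byte);
    }
    Ok(out)
}
```

Under these assumptions, `clean_protein_line("MKV LA")` yields the bytes of `MKVLA`, while `clean_protein_line("MKVLA>seq2 second record")` fails at the `>` instead of silently absorbing the second record's header into the sequence.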

swooster · 2026-03-09T21:29:13Z

Important note: I wasn't sure whether to accept "-" as a valid protein character (Wikipedia says it's commonly used to mean a gap of indeterminate length). I've opted to omit it for now.

hwchen · 2026-03-10T01:22:50Z

Since we're examining AA parsing, should we consider using an include-list of valid amino acids, instead of just checking for ASCII alphabetic characters?

I feel like there was a comment somewhere about just using u8 for now instead of having a proper AA type, but I don't think that needs to preclude us from parsing more strictly.

src/rust_api.rs

swooster · 2026-03-10T02:25:17Z

I didn't include an AA list because every single letter is some kind of AA code. Some are ambiguous and some are uncommon, but all letters are potentially valid.
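For reference, here is a small sketch of how the 26 letters break down under the IUPAC one-letter amino acid codes. It is illustrative only: `AaKind` and `aa_code_kind` are hypothetical names, not part of quickdna, and the grouping into "standard", "ambiguous", and "uncommon" is my reading of the comment above.

```rust
/// Hypothetical classification of one-letter amino acid codes.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum AaKind {
    /// One of the 20 standard amino acids.
    Standard,
    /// Ambiguity codes: B (Asp or Asn), J (Leu or Ile), Z (Glu or Gln), X (any).
    Ambiguous,
    /// Rarely encoded residues: O (pyrrolysine), U (selenocysteine).
    Uncommon,
}

/// Classifies an ASCII letter (either case); returns None for anything else.
pub fn aa_code_kind(b: u8) -> Option<AaKind> {
    match b.to_ascii_uppercase() {
        b'B' | b'J' | b'Z' | b'X' => Some(AaKind::Ambiguous),
        b'O' | b'U' => Some(AaKind::Uncommon),
        b'A'..=b'Z' => Some(AaKind::Standard),
        _ => None,
    }
}
```

Because every letter falls into one of the three groups, an include-list built this way accepts exactly the same bytes as an ASCII-alphabetic check; it would only become stricter if some group were deliberately excluded.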


swooster added 4 commits March 9, 2026 14:13

Test that sequences actually strip spaces, not allow them through.

df0aaf3

Fix proteins not stripping spaces.

429b88e

Add test for parsing invalid proteins.

39edec5

Fix proteins accepting invalid amino acids.

3cb5306

swooster requested a review from a team March 9, 2026 21:26

hwchen reviewed Mar 10, 2026


src/rust_api.rs (outdated)

hwchen approved these changes Mar 10, 2026


Refactor is_bad_aa to is_seq_char.

bafd80f

(also refactor two dereferences into one reference destructuring)
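The commit itself isn't reproduced in this thread, so the following is only a guess at the shape of the "two dereferences into one reference destructuring" change. Both functions are hypothetical and behave identically; `is_seq_char` is again an assumed stand-in that accepts ASCII letters only.

```rust
/// Assumed predicate with positive polarity ("is this a sequence character?"),
/// as opposed to a negative one such as `is_bad_aa`.
fn is_seq_char(b: u8) -> bool {
    b.is_ascii_alphabetic()
}

/// Before: the loop variable is a `&u8`, so it is dereferenced at each use.
pub fn first_bad_deref(seq: &[u8]) -> Option<u8> {
    for b in seq {
        if !is_seq_char(*b) {
            return Some(*b);
        }
    }
    None
}

/// After: the `&b` pattern destructures the reference once, binding `b` as a `u8`.
pub fn first_bad_destructured(seq: &[u8]) -> Option<u8> {
    for &b in seq {
        if !is_seq_char(b) {
            return Some(b);
        }
    }
    None
}
```

The destructured form works because `u8` is `Copy`, so binding by value out of the reference is free.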

swooster merged commit ad2129e into main Mar 10, 2026
8 checks passed

swooster deleted the swooster/improve-protein-parsing branch March 10, 2026 03:44

swooster merged 5 commits into main from swooster/improve-protein-parsing