Refactor deploy_component to streaming payload using multipart request #197

Open
cb1kenobi wants to merge 3 commits into main from deploy-multipart
Contributor

@cb1kenobi cb1kenobi commented Mar 5, 2026

Update the deploy_component operation to support multipart requests, which makes it possible to stream the payload.

On the client side, the request consists of two parts: the CBOR-encoded JSON data minus the payload, followed by the stream of tarball bytes.
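As a rough illustration of the two-part layout described above (not the code in this branch), the body can be produced by an async generator so the tarball is forwarded chunk by chunk. The helper name `multipartBody`, the part names `operation` and `payload`, and the filename are hypothetical, and `JSON.stringify` stands in for the CBOR encoder to keep the sketch dependency-free.

```javascript
const { Readable } = require('node:stream');
const { randomBytes } = require('node:crypto');

// Hypothetical helper: yields a two-part multipart/form-data body chunk by
// chunk, so the tarball is never held in memory as a whole.
async function* multipartBody(boundary, operation, tarball) {
  // Part 1: the operation fields without the payload.
  // Assumption: JSON is used here in place of the CBOR encoding.
  yield Buffer.from(
    `--${boundary}\r\n` +
      'Content-Disposition: form-data; name="operation"\r\n' +
      'Content-Type: application/json\r\n\r\n'
  );
  yield Buffer.from(JSON.stringify(operation));

  // Part 2: the tarball bytes, forwarded as they arrive from the source stream.
  yield Buffer.from(
    `\r\n--${boundary}\r\n` +
      'Content-Disposition: form-data; name="payload"; filename="package.tar"\r\n' +
      'Content-Type: application/octet-stream\r\n\r\n'
  );
  for await (const chunk of tarball) yield chunk;

  // Closing delimiter.
  yield Buffer.from(`\r\n--${boundary}--\r\n`);
}

// Hypothetical helper: a random boundary that is unlikely to occur in the data.
function createBoundary() {
  return `deploy-${randomBytes(12).toString('hex')}`;
}
```

Wrapping the generator with `Readable.from(multipartBody(...))` gives a stream that can be piped into an HTTP request whose `Content-Type` header is `multipart/form-data; boundary=<boundary>`.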

On the server side, the request is parsed and the first part is decoded into JSON containing everything except the payload; the second part is the tarball bytes. The stream for that second part is then set as the payload, as a Readable. Luckily, a Readable is exactly what extractApplication() wants and we're good to go!

The server-side changes are backwards compatible with payload being encoded in the body.
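One way to picture the backwards-compatible path is a small normalizing step that always hands a Readable to the extraction code. This is only a sketch under assumptions: the helper name `toPayloadStream` is hypothetical, and it assumes the legacy in-body payload arrives either as a Buffer or as a base64 string.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: accept either form of payload and return a Readable.
function toPayloadStream(payload) {
  // Multipart path: the payload is already a stream, pass it through untouched.
  if (payload instanceof Readable) return payload;

  // Legacy path (assumption): the payload was encoded in the body as a
  // Buffer or as a base64 string.
  const bytes = Buffer.isBuffer(payload) ? payload : Buffer.from(payload, 'base64');

  // Wrap in an array so the Buffer is emitted as a single chunk.
  return Readable.from([bytes]);
}
```

With a step like this, the code that consumes the payload only ever deals with a stream, regardless of how the request was sent.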

I did some before-and-after tests to see the memory difference. I tested on a clean install (i.e. nuked the ~/harper directory). Upon startup, Harper sits at a cool 300MB. Before, the client side would consume around 2GB of memory packaging an 875MB website, then the server side would jump to around 3GB of memory. After these changes, the client side consumes around 100MB and the server side consumes 1.2GB of memory. After garbage collection kicks in, the memory drops back down to 300MB.

I encourage the reviewers to scrutinize every line and pull the branch locally to test.

Fixes #187.

@cb1kenobi cb1kenobi requested a review from a team as a code owner March 5, 2026 00:14
@socket-security
Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff: Added
Package: @fastify/busboy@3.2.0
Supply Chain Security: 100
Vulnerability: 100
Quality: 100
Maintenance: 87
License: 100

View full report

Member

@kriszyp kriszyp left a comment

This looks really cool, nice work. However, the major issue here is that I don't think this will replicate. When replicateOperation serializes this request, it will serialize Readable as an empty object (there isn't really a valid synchronous serialization of a Readable, and even if there was, we would be back to putting the whole file in a big buffer).

And replication is actually the bigger issue with large component deploys right now. I believe our current CBOR encoding supports up to 2GB payloads (I don't know of anyone trying to go bigger than that), but replication is currently set to a max payload of 100MB. I was just working on a PR to address this: https://github.com/HarperFast/harperdb/pull/3079/changes

Will this work though? Are there timeouts that we might encounter if we try to send a 300MB message over WebSockets? I am not sure. And I do think that ultimately a streaming mechanism (as you have started) is the right approach. But we will probably need to lean on the mechanism used for streaming blobs through replication, or something like that, to stream component tarfiles.

Development

Successfully merging this pull request may close these issues.

Large deployments exceed V8's string length