Refactor deploy_component to stream the payload using a multipart request #197
This looks really cool, nice work. However, the major issue here is that I don't think this will replicate. When replicateOperation serializes this request, it will serialize the Readable as an empty object (there isn't really a valid synchronous serialization of a Readable, and even if there were, we would be back to putting the whole file in one big buffer).
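To make the buffering concern concrete, here is a minimal sketch of what any synchronous serializer would first have to do with the Readable: consume it completely into memory. The helper name drainToBuffer is hypothetical and not part of Harper; it only illustrates why "serializing the stream" collapses back into one big buffer.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper (not Harper code): a synchronous encoder such as JSON
// or CBOR cannot encode a live stream, so the stream has to be drained first.
// Every byte of the tarball ends up held in memory at once.
async function drainToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of stream) {
    // Object-mode streams may yield strings; normalize everything to Buffer.
    const buf: Buffer = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    chunks.push(buf);
    total += buf.length;
  }
  // While concat runs, both the chunks and the result are alive, so peak
  // memory is briefly about twice the payload size.
  return Buffer.concat(chunks, total);
}

drainToBuffer(Readable.from([Buffer.from("abc"), Buffer.from("def")])).then((buf) =>
  console.log(buf.length), // prints 6
);
```

For an 875MB tarball this means at least 875MB resident, which is exactly the cost the streaming change was meant to avoid.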
And replication is actually the bigger issue with large component deploys right now. I believe our current CBOR encoding supports payloads up to 2GB (I don't know of anyone trying to go bigger than that), but replication currently has a maximum payload size of 100MB. I was just working on a PR to address this: https://github.com/HarperFast/harperdb/pull/3079/changes
Will that work, though? Are there timeouts we might hit if we try to send a 300MB message over WebSockets? I am not sure. I do think that ultimately a streaming mechanism (as you have started) is the right approach, but we will probably need to lean on the mechanism used for streaming blobs through replication, or something like it, to stream component tarballs.
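One way to picture "streaming instead of one 300MB message" is to re-slice the tarball into frames that each stay under a payload cap such as the 100MB replication limit mentioned above. The sketch below assumes a hypothetical boundedFrames helper; it is NOT Harper's blob-replication mechanism, only an illustration of bounded framing over an arbitrary byte stream.

```typescript
import { Readable } from "node:stream";

// Hypothetical sketch (not Harper's blob replication): re-slice a byte stream
// into frames of at most `maxFrameBytes`, so no single message sent over a
// socket exceeds a payload cap.
async function* boundedFrames(
  source: AsyncIterable<Buffer | string>,
  maxFrameBytes: number,
): AsyncGenerator<Buffer> {
  if (!Number.isInteger(maxFrameBytes) || maxFrameBytes <= 0) {
    throw new RangeError("maxFrameBytes must be a positive integer");
  }
  let pending: Buffer = Buffer.alloc(0);
  for await (const chunk of source) {
    const buf = typeof chunk === "string" ? Buffer.from(chunk) : chunk;
    pending = pending.length === 0 ? buf : Buffer.concat([pending, buf]);
    // Emit full frames as soon as enough bytes have accumulated.
    while (pending.length >= maxFrameBytes) {
      yield pending.subarray(0, maxFrameBytes);
      pending = pending.subarray(maxFrameBytes);
    }
  }
  // Flush the remainder as a final, shorter frame.
  if (pending.length > 0) yield pending;
}

(async () => {
  const source = Readable.from([Buffer.from("abcde"), Buffer.from("fgh")]);
  for await (const frame of boundedFrames(source, 3)) {
    console.log(frame.toString()); // abc, then def, then gh
  }
})();
```

Memory stays bounded by roughly one frame plus one incoming chunk, regardless of the total tarball size. The receiving side would still need its own reassembly and ordering logic, which is where an existing blob-streaming path would come in.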
Update the deploy_component operation to support multipart requests, enabling the payload to be streamed.
On the client side, the request consists of two parts: the CBOR-encoded JSON data minus the payload, followed by a stream of the tarball bytes.
On the server side, the request is parsed: the first part is decoded into JSON containing everything except the payload, and the second part is the tarball bytes. That stream is set as the payload, as a Readable. Luckily, a Readable is exactly what extractApplication() wants, so we're good to go!
The server-side changes are backwards compatible with the payload being encoded in the body.
I did some before-and-after tests to measure the memory difference, each on a clean install (i.e. after nuking the ~/harper directory). Upon startup, Harper sits at a cool 300MB. Before, the client side would consume around 2GB of memory packaging an 875MB website, and the server side would then jump to around 3GB. After these changes, the client side consumes around 100MB and the server side around 1.2GB. After garbage collection kicks in, the memory drops back down to 300MB.
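For the client side described above, here is a minimal sketch of assembling the two-part body as a stream. Several details are assumptions rather than taken from the PR: the hypothetical function multipartDeployBody, the part names operation and payload, the multipart/form-data content type, the placeholder filename, and an injected encode function standing in for the CBOR encoder (any function returning bytes works; a library such as cbor-x could be plugged in).

```typescript
import { randomBytes } from "node:crypto";
import { Readable } from "node:stream";

// Hypothetical sketch of the client side: part 1 is the operation without
// `payload`, encoded by `encode` (CBOR in the PR); part 2 is the raw tarball
// stream. Part names, content types, and the filename are assumptions.
function multipartDeployBody(
  operation: Record<string, unknown>,
  tarball: AsyncIterable<Uint8Array>,
  encode: (value: unknown) => Uint8Array,
): { contentType: string; body: Readable } {
  // A random boundary makes a collision with the binary parts negligible.
  const boundary = `----deploy-${randomBytes(12).toString("hex")}`;
  // Only the small metadata is encoded up front; the payload is streamed.
  const metadata: Record<string, unknown> = { ...operation };
  delete metadata.payload;

  async function* parts(): AsyncGenerator<Buffer> {
    yield Buffer.from(
      `--${boundary}\r\n` +
        `Content-Disposition: form-data; name="operation"\r\n` +
        `Content-Type: application/cbor\r\n\r\n`,
    );
    yield Buffer.from(encode(metadata));
    yield Buffer.from(
      `\r\n--${boundary}\r\n` +
        `Content-Disposition: form-data; name="payload"; filename="component.tar"\r\n` +
        `Content-Type: application/octet-stream\r\n\r\n`,
    );
    // Tarball chunks are forwarded as they arrive instead of being collected.
    for await (const chunk of tarball) yield Buffer.from(chunk);
    yield Buffer.from(`\r\n--${boundary}--\r\n`);
  }

  // Readable.from pulls from the generator only as the consumer reads,
  // so backpressure from the socket propagates back to the file stream.
  return {
    contentType: `multipart/form-data; boundary=${boundary}`,
    body: Readable.from(parts(), { objectMode: false }),
  };
}
```

The returned body could then be piped into a Node http.request whose Content-Type header is set to contentType, for example with a file stream from fs.createReadStream as the tarball source. Client memory then stays near the stream's high-water mark instead of the tarball size, which is consistent with the drop to roughly 100MB reported above.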
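The backwards-compatibility claim on the server can be pictured as a small normalization step. This is a sketch under assumptions: the helper toPayloadStream is hypothetical, and the legacy in-body payload is assumed to arrive either as raw bytes or as a base64 string. Whichever path the request took, downstream code receives a Readable, the shape extractApplication() is described as wanting.

```typescript
import { Readable } from "node:stream";

// Hypothetical sketch: give downstream code a single payload shape
// regardless of how the client sent the tarball.
function toPayloadStream(payload: unknown): Readable {
  // New multipart path: the second part is already a stream.
  if (payload instanceof Readable) return payload;
  // Legacy path, raw bytes in the body (Buffer is a Uint8Array subclass).
  if (payload instanceof Uint8Array) {
    return Readable.from([Buffer.from(payload)], { objectMode: false });
  }
  // Legacy path, assumed here to be a base64-encoded string.
  if (typeof payload === "string") {
    return Readable.from([Buffer.from(payload, "base64")], { objectMode: false });
  }
  throw new TypeError("payload must be a Readable, bytes, or a base64 string");
}
```

Note that the legacy paths still hold the whole tarball in memory, since it arrived inside the body; only the multipart path avoids that.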
I encourage the reviewers to scrutinize every line and pull the branch locally to test.
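Reviewers who want to reproduce the before-and-after memory figures locally could sample resident set size while a deploy runs. The sketch below is a hypothetical helper, peakRss, built on Node's process.memoryUsage(). It only measures the process it runs in, so it would have to run inside the client or inside the Harper process, and interval sampling can miss very short spikes.

```typescript
// Hypothetical measurement helper: samples process.memoryUsage().rss while
// `task` runs and reports baseline and peak RSS in MB.
async function peakRss<T>(
  task: () => Promise<T>,
  intervalMs = 50,
): Promise<{ result: T; baselineMb: number; peakMb: number }> {
  const toMb = (bytes: number) => bytes / (1024 * 1024);
  const baseline = process.memoryUsage().rss;
  let peak = baseline;
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss);
  }, intervalMs);
  try {
    const result = await task();
    // Take one final sample in case the task finished between ticks.
    peak = Math.max(peak, process.memoryUsage().rss);
    return { result, baselineMb: toMb(baseline), peakMb: toMb(peak) };
  } finally {
    clearInterval(timer);
  }
}

peakRss(async () => Buffer.alloc(64 * 1024 * 1024, 1).length).then(({ baselineMb, peakMb }) =>
  console.log(`baseline ${baselineMb.toFixed(0)}MB, peak ${peakMb.toFixed(0)}MB`),
);
```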
Fixes #187.