Skip to content

Spawn Wisely#210

Merged
kriszyp merged 9 commits intomainfrom
spawn-wisely
Mar 13, 2026
Merged

Spawn Wisely#210
kriszyp merged 9 commits intomainfrom
spawn-wisely

Conversation

@kriszyp
Copy link
Member

@kriszyp kriszyp commented Mar 11, 2026

Override child_process module to ensure we only spawn trusted processes and that we only spawn one and only one process for a given name, across threads

kriszyp added 2 commits March 11, 2026 08:12
…es and that we only spawn one and only one process for a given name, across threads
@kriszyp kriszyp marked this pull request as ready for review March 11, 2026 15:46
@kriszyp kriszyp requested a review from a team as a code owner March 11, 2026 15:46
Copy link
Member

@Ethan-Arrowood Ethan-Arrowood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wont explicitly approve or request changes in case you want to complete my recommendations in follow up work. I'm fine with these changes landing as is and we can iterate. I feel most strongly about making things configurable and being open by default. harper-pro / fabric can lock things down by default, but the harper core should remain as open as possible.

Comment on lines +481 to +517
class ExistingProcessWrapper extends EventEmitter {
pid: number;
private checkInterval: NodeJS.Timeout;

constructor(pid: number) {
super();
this.pid = pid;

// Monitor process and emit exit event when it terminates
this.checkInterval = setInterval(() => {
try {
// Signal 0 checks if process exists without actually killing it
process.kill(pid, 0);
} catch {
// Process no longer exists
clearInterval(this.checkInterval);
this.emit('exit', null, null);
}
}, 1000);
}

// Kill the process
kill(signal?: NodeJS.Signals | number) {
try {
process.kill(this.pid, signal);
return true;
} catch {
return false;
}
}

// Clean up interval when wrapper is no longer needed
unref() {
clearInterval(this.checkInterval);
return this;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class ExistingProcessWrapper extends EventEmitter {
pid: number;
private checkInterval: NodeJS.Timeout;
constructor(pid: number) {
super();
this.pid = pid;
// Monitor process and emit exit event when it terminates
this.checkInterval = setInterval(() => {
try {
// Signal 0 checks if process exists without actually killing it
process.kill(pid, 0);
} catch {
// Process no longer exists
clearInterval(this.checkInterval);
this.emit('exit', null, null);
}
}, 1000);
}
// Kill the process
kill(signal?: NodeJS.Signals | number) {
try {
process.kill(this.pid, signal);
return true;
} catch {
return false;
}
}
// Clean up interval when wrapper is no longer needed
unref() {
clearInterval(this.checkInterval);
return this;
}
}
class ExistingProcessWrapper extends EventEmitter {
#pid: number;
#checkInterval: NodeJS.Timeout;
constructor(pid: number) {
super();
this.#pid = pid;
// Monitor process and emit exit event when it terminates
this.#checkInterval = setInterval(() => {
try {
// Signal 0 checks if process exists without actually killing it
process.kill(pid, 0);
} catch {
// Process no longer exists
clearInterval(this.#checkInterval);
this.emit('exit', null, null);
}
}, 1000);
}
get pid() {
return this.#pid;
}
// Kill the process
kill(signal?: NodeJS.Signals | number) {
try {
process.kill(this.#pid, signal);
return true;
} catch {
return false;
}
}
// Clean up interval when wrapper is no longer needed
unref() {
clearInterval(this.#checkInterval);
return this;
}
}

Replace private with # so it is actually private. The private keyword is compile time only and has no runtime enforcement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sure, but pid wasn't really intended to be private. In trying match node, it is just a plain ol' public property. But yeah, #checkInterval sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we wouldn't want the user to be able to set the pid though? Even accidentally? Thats what I guarding for by switching to a getter pattern

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds nice. I think you meant to file a bug with node.js, not comment on a PR ;) (I am just following what node has defined).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yes, I assumed Node.js had this property locked down so users cannot set it. But maybe that is an invalid assumption.

'fs',
]);
function checkAllowedModulePath(moduleUrl: string, containingFolder: string): boolean {
const ALLOWED_COMMANDS = new Set(['next', 'npm', 'node']);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable and expandable somehow. Maybe plugins should have a way to register commands? And the security contract is that if a user trusts a plugin they trust whatever commands that plugin may register as well?

From the perspective of an open core, we need to ensure maximum configurability so that we don't unnecessarily restrict open usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could make this configurable 👍
FWIW, the plugins were running outside the VM, but was changing that with this PR #209
Primarily because it is somewhat confusing that we can provide a consistent behavior of import from 'harper' or 'harperdb' for application (via the VM), but can't do so with the plugin. But yeah, we could also give plugins a way to register commands.


function createSpawn(spawnFunction: (...args: any) => child_process.ChildProcess, alwaysAllow?: boolean) {
const basePath = env.getHdbBasePath();
return function (command: string, args: any, options: any) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to change the function signature of exec() or execFile().

Ideally, the REPLACED_BUILTIN_MODULES would have each function's signature in tact.

I have an idea... bear with me...

// pull in some child_process types:
import * as child_process, { type ChildProcess, type ExecException, type ExecOptions } from 'node:child_process';
import { mkdirSync, writeFileSync, rmSync } from 'node:fs';
import { join } from 'node:path';

// cache this in a global since it doesn't change
const basePath = env.getHdbBasePath();

function checkCommand(fnName: string, command: string, name?: string, alwaysAllow: boolean = false): { existingPid: number | null, pidFilePath: string } {
	if (!ALLOWED_COMMANDS.has(command) && !alwaysAllow) {
		throw new Error(`Command ${command} is not allowed`);
	}
	const processName = name;
		if (!processName)
			throw new Error(
				`Calling ${fnName} in Harper must have a process "name" in the options to ensure that a single process is started and reused`
			);

	// Ensure PID directory exists
	const pidDir = join(basePath, 'pids');
	mkdirSync(pidDir, { recursive: true });

	const pidFilePath = join(pidDir, `${processName}.pid`);

	// Try to acquire lock - returns null if we got the lock, or PID if process exists
	const existingPid = acquirePidFileLock(pidFilePath);
	return { existingPid, pidFilePath };
}

function createPidFile(child: ChildProcess, pidFilePath: string): ChildProcess {
	// Write PID to the file we just created
	const { pid } = child; // TIL: `pid` can be undefined!
	if (pid === undefined) {
		// child failed, let the caller handle `.on('error')`
		return child;
	}
	try {
		writeFileSync(pidFilePath, pid.toString(), 'utf-8');
	} catch (err) {
		// Failed to write PID, clean up
		child.kill();
		rmSync(pidFilePath, { force: true });
		throw err;
	}

	// Clean up PID file when process exits
	child.on('exit', () => rmSync(pidFilePath, { force: true }));
	return child;
}

// you could shove this in a types file
type ChildProcessWrapper = {
	exec(command: string): ChildProcess;
	exec(
		command: string,
		options: ({ encoding: 'buffer' | null } & ExecOptions) | null
	): ChildProcess;
	exec(
		command: string,
		callback: (error: ExecException | null, stdout: string, stderr: string) => void
	): ChildProcess;
	exec(
		command: string,
		options: ({ encoding: 'buffer' | null } & ExecOptions) | ExecOptions | null,
		callback: (error: ExecException | null, stdout: string | Buffer, stderr: string | Buffer) => void
	): ChildProcess;
	/* execFile, fork, spawn */
};

const REPLACED_BUILTIN_MODULES = {
	child_process: {
		exec(command: string, options: ExecOptions & { name?: string }, callback?: (error: ExecException | null, stdout: string, stderr: string) => void): ChildProcess {
			const { existingPid, pidFilePath } = checkCommand('exec', command.split(' ')[0], options?.name);
			if (typeof existingPid === 'number') {
				return new ExistingProcessWrapper(existingPid);
			}
			return createPidFile(child_process.exec(command, options, callback), pidFilePath);
		},
		/* execFile, fork, spawn */
	} as ChildProcessWrapper
};

It's a bit more verbose, but it will preserve the function signature types and it will help eliminate a few any types. Whatcha think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I am following. We are not directly exposing these functions to users, so I am not sure what it means to "preserve the function signature types". Users will be invoking these functions with import { spawn } from 'node:fs'. And AFAIK, the type system will always resolve those types to node's provided types, nothing in this code effects that at all does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, TypeScript should pick up the types from @types/node. Ignore 99% my comment. The function signature for execFile is missing the 4th parameter: child_process.execFile(file[, args][, options][, callback]).

Copy link
Member

@Ethan-Arrowood Ethan-Arrowood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking great!

lockdown: freeze
containment: vm
dependencyContainment: false
allowedShellCommands:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allowedShellCommands:
allowedSpawnCommands:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, just fixed this, this would have saved me a few minutes if I had looked at the PR first :).

get() {}, // make it look like a resource
testFork() {
// Fork should work (allowed command)
const child = fork('next', ['--version'], { name: 'test-next-process' });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fail now, no?

return true;
},
};
const ALLOWED_COMMANDS = new Set(env.get(CONFIG_PARAMS.APPLICATIONS_ALLOWEDSPAWNCOMMANDS) ?? []);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice solution here. So by default, all built-in modules are allowed, but no commands. Should we allow all commands by default too? Or is that too much of a security risk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, allowing all things except a block list isn't a very good security posture. Part of my intent here is to try lock down things more with this major version upgrade. And then we can easily expand and allow more after v5.

Copy link
Contributor

@cb1kenobi cb1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

@kriszyp kriszyp merged commit a19101e into main Mar 13, 2026
22 checks passed
@kriszyp kriszyp deleted the spawn-wisely branch March 13, 2026 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants