
fix: fix crash due data race on globalMarkdownWorkletRuntime #752

Open

linhvovan29546 wants to merge 3 commits into Expensify:main from linhvovan29546:fix/crash-due-data-race

Conversation

@linhvovan29546

linhvovan29546 commented Feb 15, 2026

Details

This PR fixes the Sentry crash (std::__ndk1::default_delete&lt;T&gt;::operator()[abi:ne180000]) reported here. The globalMarkdownWorkletRuntime could be accessed from both the JS and UI threads, causing a data race; this PR adds a mutex to protect the shared_ptr.
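A minimal sketch of the mutex approach described above (illustrative only; `WorkletRuntime`, `setGlobalRuntime`, and `getGlobalRuntime` are hypothetical stand-ins, not the library's actual types or functions):

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the worklet runtime type; the real fix guards
// globalMarkdownWorkletRuntime inside the library's native code.
struct WorkletRuntime {
  int id = 0;
};

namespace {
std::mutex runtimeMutex;
// Shared between the JS and UI threads.
std::shared_ptr<WorkletRuntime> globalRuntime;
}  // namespace

// Every read and write of the shared_ptr itself goes through the mutex, so
// one thread can no longer observe the pointer mid-update by another thread.
void setGlobalRuntime(std::shared_ptr<WorkletRuntime> rt) {
  std::lock_guard<std::mutex> lock(runtimeMutex);
  globalRuntime = std::move(rt);
}

std::shared_ptr<WorkletRuntime> getGlobalRuntime() {
  std::lock_guard<std::mutex> lock(runtimeMutex);
  return globalRuntime;  // the returned copy keeps the runtime alive for the caller
}
```

Returning a copy of the shared_ptr (rather than a reference) matters: the caller's copy keeps the runtime alive even if another thread resets the global pointer immediately afterwards.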

Related Issues

Expensify/App#82146

Manual Tests

This issue is hard to reproduce, so we need to verify that the input still works as before: typing, copy/pasting, and other basic interactions.

Linked PRs

@github-actions

github-actions bot commented Feb 15, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@linhvovan29546
Author

I have read the CLA Document and I hereby sign the CLA

tomekzaw previously approved these changes Feb 15, 2026
@war-in
Collaborator

war-in commented Feb 19, 2026

Hi! Thank you for your contribution! Do we have an E/App PR to test the change in the app? Let's make sure it doesn't break anything there.

@linhvovan29546
Author

Do we have some E/App PR to test the change in app?

No, I only applied a manual patch to E/App to test it locally.

@war-in
Collaborator

war-in commented Feb 20, 2026

@linhvovan29546 so how about preparing an E/App PR with

"@expensify/react-native-live-markdown": "git+https://github.com/Expensify/react-native-live-markdown.git#19ad58c01e0e22b36c4d889a44a47fc4431303c8",

in package.json, and generating adhoc builds to make sure we don't break anything?

@linhvovan29546
Author

I can still reproduce this on the adhoc build. See the Sentry log: Sentry issue. So this PR does not resolve the problem.

@linhvovan29546
Author

linhvovan29546 commented Feb 24, 2026

Hi @war-in @tomekzaw
After investigating further with Sentry symbolication and logcat tracing on a hybrid release build, I found that the actual cause of the crash is not a data race on globalMarkdownWorkletRuntime.

The crash is a SIGSEGV (signal 11, SEGV_ACCERR) deep in the Hermes GC:

SerializableRemoteFunction::~SerializableRemoteFunction()
  → cleanupIfRuntimeExists(runtime_, function_)
    → ~jsi::Value()
      → Hermes write barrier → accesses freed heap memory → SIGSEGV

RCA:

When the LiveMarkdown worklet runtime initializes, WorkletRuntime::init() calls memoryManager_->loadAllCustomSerializables(), which loads Reanimated's registered custom serializables into the LiveMarkdown runtime.
Because those worklets originated in Reanimated's runtime, SerializableRemoteFunction::toJSValue(liveMarkdownRt) detects a cross-runtime call (&rt != runtime_) and creates a SerializableJSRef NativeState object inside LiveMarkdown's Hermes heap, holding a SerializableRemoteFunction whose runtime_ pointer points to Reanimated's LockableRuntime.

Solution:

In react-native-worklets/Common/cpp/worklets/SharedItems/Serializable.h, cleanupIfRuntimeExists should always call value.release() instead of only when the runtime is dead:

// Before
inline void cleanupIfRuntimeExists(jsi::Runtime *rt, std::unique_ptr<jsi::Value> &value) {
  if (rt != nullptr && !WorkletRuntimeRegistry::isRuntimeAlive(rt)) {
    value.release();
  }
  // else: ~jsi::Value() fires → crash when rt is a *different* alive runtime
}

// After
inline void cleanupIfRuntimeExists(jsi::Runtime * /*rt*/, std::unique_ptr<jsi::Value> &value) {
  // Always release — never call ~jsi::Value() during teardown.
  // The underlying HermesValue cell is managed by the VM and collected
  // by the owning runtime's GC. Same tradeoff as the original dead-runtime case.
  value.release(); // NOLINT
}
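The behavioral difference the change relies on can be shown in isolation: `unique_ptr::release()` detaches the managed object without running its destructor. This is a minimal sketch, with a hypothetical `Payload` type standing in for `jsi::Value` (the flag lets us observe whether the destructor ever ran):

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for jsi::Value: flips a flag when its destructor
// runs, so we can observe whether teardown ever touched it.
struct Payload {
  bool *destroyed = nullptr;
  ~Payload() {
    if (destroyed != nullptr) {
      *destroyed = true;
    }
  }
};

// Mirrors the shape of the "after" version of cleanupIfRuntimeExists:
// release() drops ownership without invoking the destructor, so (in the
// real code) ~jsi::Value() and the Hermes write barrier never run during
// teardown.
void cleanupAlwaysRelease(std::unique_ptr<Payload> &value) {
  value.release();  // intentional: the owning runtime's GC reclaims the cell
}
```

In the real library the "leaked" object is a Hermes heap cell that the owning runtime's GC collects anyway; in plain C++ like this sketch, release() genuinely leaks, which is part of why the leak concern was raised below.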

@linhvovan29546
Author

I'm not very familiar with worklets. The RCA and proposed solution were provided by Claude. I validated the fix and it resolves the issue. While it may not be the ideal fix, it's a good starting point we can iterate on.

@tjzel

tjzel commented Feb 24, 2026

Hey, Worklets maintainer here. Do you have a good reproducer for that?

I think the solution Claude proposed could result in massive memory leaks. Also registerCustomSerializable has really nothing to do with SerializableRemoteFunction.

It could be that the runtime died just after it was checked that it's alive but before the destructor was called. I'll try to make a patch for it so we could verify that.
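The check-then-act window described above (runtime checked alive, then dies before the destructor runs) closes if the aliveness check and the cleanup happen under one registry lock. A hypothetical sketch, not the actual WorkletRuntimeRegistry API:

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical registry: isRuntimeAlive() and the destructor call are two
// separate steps in the original code, so the runtime can be unregistered in
// between. checkAndCleanup holds the lock across both steps, closing the
// window (the cleanup callback must not reenter the registry, or it would
// deadlock on the non-recursive mutex).
class Registry {
 public:
  void add(void *rt) {
    std::lock_guard<std::mutex> l(m_);
    alive_.insert(rt);
  }
  void remove(void *rt) {
    std::lock_guard<std::mutex> l(m_);
    alive_.erase(rt);
  }

  // Check-and-act atomically: cleanup runs only if rt is still registered,
  // and rt cannot be unregistered concurrently while cleanup runs.
  template <typename F>
  bool checkAndCleanup(void *rt, F cleanup) {
    std::lock_guard<std::mutex> l(m_);
    if (alive_.count(rt) == 0) {
      return false;
    }
    cleanup();
    return true;
  }

 private:
  std::mutex m_;
  std::set<void *> alive_;
};
```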

@linhvovan29546
Author

@tjzel Thanks, I don’t have a reliable reproducer. I reproduced it in the Expensify app by opening the app, navigating to the search page with the markdown input, and the app crashed. If it doesn’t crash on the first attempt, I repeat the steps until it does.

@linhvovan29546
Author

Please let me know if I can help, as I’m able to reproduce the issue.

@tjzel

tjzel commented Feb 24, 2026

After further investigation, it seems that the root cause is SerializableRemoteFunction outliving the RN Runtime. Unfortunately, the RN Runtime is not a Worklet Runtime, and we don't have an easy way to check whether it's still alive.

Could you tell me which exact version of Worklets you need the patch for?

@linhvovan29546
Author

Could you tell me which exact version of Worklets you need the patch for?

Thank you! I need it for version 0.7.2.

@tjzel

tjzel commented Feb 24, 2026

Here's a band-aid patch; you'll need to recompile the app, since it changes the native code. Let me know if it works, then I'll know for sure what the culprit is here.

diff.patch

@linhvovan29546
Author

Let me know if it works, then I'll know for sure what is the culprit here.

The patch worked for me.

@tjzel

tjzel commented Feb 24, 2026

Good. It might slightly increase the memory footprint, as the callbacks from SerializableRemoteFunction are kept in a global map, but that should still be far better than a crash.

I'll have to think about an actual good solution to this problem.
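The band-aid's shape can be sketched as follows. This is a hypothetical illustration of "park the value in a global map instead of destroying it", with a made-up `Value` type; it is not the actual patch:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the callback payload held by a
// SerializableRemoteFunction.
struct Value {
  int payload = 0;
};

namespace {
std::mutex g_mapMutex;
// Process-wide map that keeps the values alive forever: their destructors
// never run, so a dead runtime's heap is never touched during teardown.
// The cost is a leak bounded by how many remote functions are created.
std::map<const void *, std::unique_ptr<Value>> g_parkedValues;
}  // namespace

void parkInsteadOfDestroy(const void *key, std::unique_ptr<Value> value) {
  std::lock_guard<std::mutex> lock(g_mapMutex);
  g_parkedValues[key] = std::move(value);  // destruction deferred indefinitely
}

std::size_t parkedCount() {
  std::lock_guard<std::mutex> lock(g_mapMutex);
  return g_parkedValues.size();
}
```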
