Skip to content

Conversation

@LucaMarconato
Copy link
Member

@LucaMarconato LucaMarconato commented Feb 11, 2026

After doing some profiling, I identified 2 bottlenecks:

  1. the zipstore was opened multiple time and not reused.
  2. the cell_id_str_from_prefix_suffix_uint32() had some long for loops that could be removed.

This PR:

  1. caches the opening of the cells zip store
  2. vectorizes the function cell_id_str_from_prefix_suffix_uint32()

The second part is very interesting: Claude Opus 4.6 suggested a clever bit-shift trick which halves execution time (illustrated below, again with the help of Claude).

Further optimization could be explored for the string concatenation, which is now driving the runtime of the function (10% of total, see flame graph), and is among the top 3 drivers of runtime.

Flame graph:
image

Explanation of the bit shift trick:
image

@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 62.78%. Comparing base (7eb2e61) to head (722fa33).

Files with missing lines Patch % Lines
src/spatialdata_io/readers/xenium.py 96.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #370      +/-   ##
==========================================
+ Coverage   62.70%   62.78%   +0.08%     
==========================================
  Files          27       27              
  Lines        3140     3147       +7     
==========================================
+ Hits         1969     1976       +7     
  Misses       1171     1171              
Files with missing lines Coverage Δ
src/spatialdata_io/readers/xenium.py 72.06% <96.87%> (+0.55%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Feb 11, 2026

  • The latest commit removes some unnecessary conversion to string for some dataframe indices.

Flamegraph below shows the current status. The string operations in cell_id_str_from_prefix_suffix_uint32() definitely consume quite some time.

image

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Feb 11, 2026

  • cell_id_str_from_prefix_suffix_uint32() was being called twice, now this is avoided. <--- EDIT: I need to check this better, it seems that we need both calls, at least in some Xenium versions.
  • removed the anndata's ImplicitModificationWarning which was calling the slow _astype_nansafe() when casting the integer indices to string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants