perf: improve OdfTableRow cell count performance #436

Open
driesva wants to merge 1 commit into tdf:master from driesva:improve-table-row-cell-count-performance

driesva commented Feb 17, 2026

  • This issue was mentioned in "Parsing ODF files generated by Excel" (#25).
  • Do not iterate over every cell; that is slow for Excel-generated spreadsheets, which have 16384 columns by default.
  • Rely on the number-columns-repeated attribute to count the real cells.
  • ATTN: for cells merged across multiple rows, this gives a different result than the previous implementation. However, the new count seems more in line with the actual content XML.
    My test spreadsheet in LibreOffice (and Excel) looks like this:
    [image: test spreadsheet]

The XML for the last 2 rows:

       <table:table-row table:style-name="ro1">
          <table:table-cell office:value-type="float" office:value="1" calcext:value-type="float" table:number-columns-spanned="2" table:number-rows-spanned="2">
            <text:p>1</text:p>
          </table:table-cell>
          <table:covered-table-cell/>
          <table:table-cell office:value-type="float" office:value="2" calcext:value-type="float">
            <text:p>2</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row table:style-name="ro1">
          <table:covered-table-cell table:number-columns-repeated="2"/>
          <table:table-cell office:value-type="float" office:value="2" calcext:value-type="float">
            <text:p>2</text:p>
          </table:table-cell>
        </table:table-row>

Based on that, I would say the first of these rows has 2 real cells and the last row has 1 real cell.
The original implementation, however, returned 3 real cells for both rows ❓
👉 If the count in this PR is wrong, please provide more info!
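
To make the counting rule concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the Java code in this PR: `count_real_cells` is a hypothetical helper, and it assumes that a "real" cell is a `table:table-cell` element (covered cells excluded) weighted by its `number-columns-repeated` attribute, which is what the counts above imply. The two rows are a trimmed copy of the XML shown earlier.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
CELL = f"{{{TABLE_NS}}}table-cell"
REPEATED = f"{{{TABLE_NS}}}number-columns-repeated"


def count_real_cells(row):
    """Hypothetical helper: sum the repeat counts of the row's table:table-cell
    children. table:covered-table-cell elements are skipped (assumption)."""
    return sum(int(cell.get(REPEATED, "1")) for cell in row.findall(CELL))


# Trimmed version of the two rows from the PR description.
ROWS_XML = f"""
<table:table xmlns:table="{TABLE_NS}">
  <table:table-row>
    <table:table-cell table:number-columns-spanned="2" table:number-rows-spanned="2"/>
    <table:covered-table-cell/>
    <table:table-cell/>
  </table:table-row>
  <table:table-row>
    <table:covered-table-cell table:number-columns-repeated="2"/>
    <table:table-cell/>
  </table:table-row>
</table:table>
"""

rows = ET.fromstring(ROWS_XML)
counts = [count_real_cells(row) for row in rows]
print(counts)  # [2, 1]
```

The work is proportional to the number of cell elements in the XML rather than to the number of logical columns, so a single element with a repeat count of 16384 is handled in one step.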

This PR also doesn't fix the problem in #25, because Excel adds rows and cells like this:

        <table:table-row table:number-rows-repeated="1048571" table:style-name="ro1">
          <table:table-cell table:number-columns-repeated="16384"/>
        </table:table-row>

With filler like this, there is no reliable way to know the exact number of used cells. Determining it probably requires application logic (e.g. stop parsing at one or more empty cells / rows).
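
One possible shape for that application logic, sketched in Python with the standard library only. Everything here is an assumption made for illustration: `used_column_count` and `is_empty` are hypothetical names, and a cell is treated as empty when it has no `office:value-type` attribute and no child elements. The heuristic trims the trailing run of empty cells, so covered cells at the end of a row are trimmed as well.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
CELL_TAGS = {f"{{{TABLE_NS}}}table-cell", f"{{{TABLE_NS}}}covered-table-cell"}
REPEATED = f"{{{TABLE_NS}}}number-columns-repeated"
VALUE_TYPE = f"{{{OFFICE_NS}}}value-type"


def is_empty(cell):
    # Assumed definition of "empty": no typed value and no child elements.
    return cell.get(VALUE_TYPE) is None and len(cell) == 0


def used_column_count(row):
    """Hypothetical helper: number of columns up to and including the last
    non-empty cell, without expanding repeated cells."""
    column = 0
    last_used = 0
    for cell in row:
        if cell.tag not in CELL_TAGS:
            continue
        column += int(cell.get(REPEATED, "1"))
        if not is_empty(cell):
            last_used = column
    return last_used


NS_DECLS = f'xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}"'

data_row = ET.fromstring(f"""
<table:table-row {NS_DECLS}>
  <table:table-cell office:value-type="float" office:value="1"/>
  <table:table-cell/>
  <table:table-cell office:value-type="float" office:value="2"/>
  <table:table-cell table:number-columns-repeated="16381"/>
</table:table-row>
""")

filler_row = ET.fromstring(f"""
<table:table-row {NS_DECLS}>
  <table:table-cell table:number-columns-repeated="16384"/>
</table:table-row>
""")

print(used_column_count(data_row))    # 3
print(used_column_count(filler_row))  # 0
```

A caller could apply the same idea to rows, for example by stopping once a row reports zero used columns, but whether a gap of empty rows means "end of data" is a decision for the application.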
