Remove nondeterminism in container generation#3213
Remove nondeterminism in container generation#3213weswigham wants to merge 1 commit intomicrosoft:mainfrom
Conversation
Could we just cache it and access on a method (sync.Once esque), so we only do it once anyway? But, this code already iterates over many tables, so knowing we're going over a specific table is sort of a pain anyway, so maybe this PR is the best we can do without coming up with a better solution in general. |
There was a problem hiding this comment.
Pull request overview
Removes nondeterminism in symbol container generation within the checker by ensuring scope symbol-table iteration results are ordered deterministically. This supports the TypeScript-go compiler’s need for stable behavior despite Go map iteration randomness, especially when container selection impacts symbol accessibility and downstream printing/emit.
Changes:
- Collect all matching in-scope “variable namespace” candidates instead of taking the first match from a randomly-iterated symbol table.
- Sort the collected candidate symbols to make container selection stable across runs.
| var variableMatches []*ast.Symbol | ||
| if (meaning == ast.SymbolFlagsValue && | ||
| container.Flags&leftMeaning == 0) && | ||
| container.Flags&ast.SymbolFlagsType != 0 && | ||
| c.getDeclaredTypeOfSymbol(container).flags&TypeFlagsObject != 0 { | ||
| c.someSymbolTableInScope(enclosingDeclaration, func(t ast.SymbolTable, _ symbolTableID, _ bool, _ bool, _ *ast.Node) bool { | ||
| found := false | ||
| for _, s := range t { | ||
| if s.Flags&leftMeaning != 0 && c.getTypeOfSymbol(s) == c.getDeclaredTypeOfSymbol(container) { | ||
| firstVariableMatch = s | ||
| return true | ||
| variableMatches = append(variableMatches, s) | ||
| found = true | ||
| } | ||
| } | ||
| return false | ||
| return found | ||
| }) | ||
| c.sortSymbols(variableMatches) | ||
| } |
There was a problem hiding this comment.
This change intentionally alters container selection to be deterministic (collect + sort matches), but this PR doesn't add a targeted regression test for it. Please add a minimal compiler test that creates multiple in-scope variables matching the same container type and asserts a stable/expected printback or symbol chain selection, to guard against future nondeterminism regressions.
| if (meaning == ast.SymbolFlagsValue && | ||
| container.Flags&leftMeaning == 0) && | ||
| container.Flags&ast.SymbolFlagsType != 0 && | ||
| c.getDeclaredTypeOfSymbol(container).flags&TypeFlagsObject != 0 { | ||
| c.someSymbolTableInScope(enclosingDeclaration, func(t ast.SymbolTable, _ symbolTableID, _ bool, _ bool, _ *ast.Node) bool { | ||
| found := false | ||
| for _, s := range t { | ||
| if s.Flags&leftMeaning != 0 && c.getTypeOfSymbol(s) == c.getDeclaredTypeOfSymbol(container) { |
There was a problem hiding this comment.
Inside the symbol-table scan, c.getDeclaredTypeOfSymbol(container) is invoked repeatedly (both in the outer condition and again for every entry in the loop). Since this callback may end up traversing large tables (including globals), please hoist the container's declared type into a local variable once and reuse it within the closure to avoid unnecessary work.
| if (meaning == ast.SymbolFlagsValue && | |
| container.Flags&leftMeaning == 0) && | |
| container.Flags&ast.SymbolFlagsType != 0 && | |
| c.getDeclaredTypeOfSymbol(container).flags&TypeFlagsObject != 0 { | |
| c.someSymbolTableInScope(enclosingDeclaration, func(t ast.SymbolTable, _ symbolTableID, _ bool, _ bool, _ *ast.Node) bool { | |
| found := false | |
| for _, s := range t { | |
| if s.Flags&leftMeaning != 0 && c.getTypeOfSymbol(s) == c.getDeclaredTypeOfSymbol(container) { | |
| containerDeclaredType := c.getDeclaredTypeOfSymbol(container) | |
| if (meaning == ast.SymbolFlagsValue && | |
| container.Flags&leftMeaning == 0) && | |
| container.Flags&ast.SymbolFlagsType != 0 && | |
| containerDeclaredType.flags&TypeFlagsObject != 0 { | |
| c.someSymbolTableInScope(enclosingDeclaration, func(t ast.SymbolTable, _ symbolTableID, _ bool, _ bool, _ *ast.Node) bool { | |
| found := false | |
| for _, s := range t { | |
| if s.Flags&leftMeaning != 0 && c.getTypeOfSymbol(s) == containerDeclaredType { |
|
Yeah, we'd need to do some weird special-casing for the globals table if we wanted that-one-only to have an on-demand sorted table and special sorted-table-assuming algorithms to handle. Personally, I think if we wanted to be true to strada, we'd just always use insertion-order sorted symbol tables from the binder forward and wouldn't need these conceptually costly departures from strada (we're parsing and binding code serially - insertion order is reliable and stable); there's at least 3 other places in the node builder/symbol accessibility that have comments saying they lost their early bails in the port (just look for usages of Separately, this case specifically, ideally we'd obsolete all this logic and retain symbol reverse mappings in the symbols themselves (or a side table), but doing that requires setting up a lot of hooks throughout the checker (and binder) anywhere we build symbol tables, and, having tried that in the past, it's pretty error-prone to try and set up, since "symbol containing this symbol" has like 3 different layers of nuance depending on the context (direct export? via alias? instance member?) that all require difference processing to track (which is why the existing |
We've discussed before and concluded that it is too costly. We have a lot of symbol tables, most of them quite small. Adding the overhead of a slice and a dynamically allocated array would negatively impact performance and memory consumption noticeably. |
|
How often do we end up in that bottom part of |
|
Aye, it's definitely from a time where that was "free" because declaration emit always happened after a full typecheck, so a Anyways, there's... 61 test baselines affected if you outright remove this style container lookup. All baselines going from something like |
Per this comment
Unfortunately, it makes the worst case, exhaustively traversing the global table, the guaranteed case (if we end up looking at the globals - we won't if there's a more local match, and local symbol tables are generally small, so exhaustive isn't bad), since we need to find all possible references to sort them.
We also mentioned having a deferred-sorted globals list to potentially look at to keep the early-bail behavior in a stable way, but that itself is gonna be costly to make, and, because it's unclear how many symbols fall back to globals here, I'm not even sure that's worth it. Certainly, there's nothing in our test suite this affects the printback of. This is the kinda thing I'd prefer only to start tweaking in a big way with perf numbers to back it up, so I know I'm not introducing complexity for negative gain.
I double checked the other
someSymbolTableInScopecalls - they all either only do[]accesses, or collect all possible refs and sort them (seetrySymbolTable), so this one seems like the odd duck out right now in terms of handling unsorted maps well.