(cleanup) remove Python 2 remaining items #727

Draft
mykaul wants to merge 18 commits into scylladb:master from mykaul:python_2_no_more

mykaul commented Mar 4, 2026

Pre-review checklist

This is 100% OpenCode's work, so take it with a grain of salt; I still need to go over it. I asked it to split the changes into independent items as much as possible, so I can also cherry-pick each commit separately.

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

mykaul added 10 commits March 4, 2026 18:58
In Python 3, iterator objects do not have a .next() method; the
built-in next() function must be used instead. The call at line 355
of test_class_construction.py used the Python 2 pattern
iter(...).next(), which would raise AttributeError if ever reached
at runtime.

Currently the test passes only because CQLEngineException is raised
before .next() is called, but this is fragile: if the exception
timing changes, the test would fail with AttributeError instead of
the expected CQLEngineException.

Replace with next(iter(...)) for correct Python 3 usage.
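
As a minimal sketch of the iterator change described in the commit message above (the sample list is hypothetical and not taken from test_class_construction.py), the Python 2 method call fails on Python 3 while the built-in works:

```python
# Hypothetical sample data; the real test iterates over a query result.
items = ["first", "second"]

# Python 2 style: iterator objects have no .next() method on Python 3,
# so this raises AttributeError as soon as the call is reached.
try:
    iter(items).next()
    legacy_call_worked = True
except AttributeError:
    legacy_call_worked = False

# Python 3 style: the built-in next() advances the iterator.
first = next(iter(items))
```
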
In Python 3, calling str.encode() can only raise UnicodeEncodeError,
never UnicodeDecodeError. The except UnicodeDecodeError branches in
AsciiType.serialize and UTF8Type.serialize were leftover from Python 2,
where str.encode() could trigger an implicit decode of a byte string.

These dead except branches silently masked the intended behavior. In
Python 3, if the input is already bytes there is no .encode() to call,
so the original code would raise AttributeError rather than returning
the value as-is.

Replace the try/except pattern with explicit isinstance(var, bytes)
checks, which correctly handles both str and bytes inputs on Python 3.
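
The explicit type check described above can be sketched roughly as follows. The function name serialize_utf8_sketch is a hypothetical stand-in and is not the driver's actual AsciiType.serialize or UTF8Type.serialize code:

```python
def serialize_utf8_sketch(var):
    """Hypothetical stand-in for a serialize() method; not the driver's code."""
    # bytes input is passed through unchanged; str input is encoded.
    if isinstance(var, bytes):
        return var
    return var.encode("utf-8")


# On Python 3, bytes objects have .decode() but no .encode(), which is why
# the old try/except UnicodeDecodeError pattern could never catch anything.
bytes_has_encode = hasattr(b"", "encode")

encoded_text = serialize_utf8_sketch("caf\xe9")
passed_through = serialize_utf8_sketch(b"raw")
```
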
In Python 2, filter() returned a list. In Python 3, it returns a lazy
iterator that can only be consumed once. The column_aliases variable
assigned from filter() at metadata.py:2273 may be used multiple times
downstream (e.g., for length checks and enumeration): len() and
indexing raise TypeError on a filter object, and a second iteration
would silently produce empty results.

Wrap the filter() call in list() to ensure the result is a concrete
list that supports repeated iteration, indexing, and len().
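
A short illustration of the filter() behaviour this commit message relies on, using made-up values rather than the real column_aliases data:

```python
# Hypothetical input; None entries stand in for values to be filtered out.
values = ["a", None, "b"]

lazy = filter(None, values)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # the iterator is already exhausted

# A filter object supports neither len() nor indexing.
try:
    len(filter(None, values))
    len_supported = True
except TypeError:
    len_supported = False

# Wrapping in list() gives a concrete, reusable sequence.
concrete = list(filter(None, values))
```
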
The module-level "".encode("utf8") call was a workaround for CPython
bug #10923, where importing the utf8 codec for the first time in a
background thread could cause a deadlock due to the import lock.

This bug was fixed in CPython 3.3 (2012), and the driver now requires
Python 3.9+. The workaround is dead code that serves no purpose and
confuses readers.
Python 3 uses __str__ for string representation; __unicode__ was the
Python 2 equivalent for unicode strings. The UnicodeMixin base class
currently bridges the two by wiring __str__ to call __unicode__(), but
this indirection is unnecessary on Python 3.

Rename all 18 __unicode__ method definitions in statements.py directly
to __str__, and update the one direct __unicode__() call in __repr__
to __str__(). The __str__ definitions on the subclasses now take
precedence over the inherited UnicodeMixin.__str__ lambda, so behavior
is unchanged.
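
The precedence argument above can be checked with a small sketch. UnicodeMixinSketch is reconstructed from the description in these commit messages (a __str__ lambda that calls __unicode__()), and the two statement classes are hypothetical, not classes from statements.py:

```python
class UnicodeMixinSketch:
    # Assumed shape of the shim: __str__ simply delegates to __unicode__().
    __str__ = lambda self: self.__unicode__()


class StatementBefore(UnicodeMixinSketch):
    # Old style: only __unicode__ is defined, __str__ comes from the mixin.
    def __unicode__(self):
        return "SELECT * FROM t"


class StatementAfter(UnicodeMixinSketch):
    # New style: __str__ on the subclass shadows the inherited lambda.
    def __str__(self):
        return "SELECT * FROM t"


before_text = str(StatementBefore())
after_text = str(StatementAfter())
```

Because method lookup finds the subclass attribute first, both classes render the same string, which is why the rename can land before the mixin itself is deleted.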
Rename the 2 __unicode__ methods in AbstractQueryableColumn and
ModelQuerySet to __str__. Remove the redundant __str__ wrapper in
ModelQuerySet that existed solely to bridge __str__ -> __unicode__
for Python 2 compatibility.

In Python 3, __str__ is the canonical string representation method;
__unicode__ served that role in Python 2. The indirection through
UnicodeMixin is no longer needed for these classes.
…rs, named

Rename the remaining 5 __unicode__ method definitions across
cqlengine/models.py (ColumnQueryEvaluator), cqlengine/functions.py
(QueryValue, Token), cqlengine/operators.py (BaseQueryOperator), and
cqlengine/named.py (NamedColumn) to __str__.

This is part of the systematic removal of the Python 2 UnicodeMixin
pattern. The __str__ definitions on each class now take precedence
over the inherited UnicodeMixin.__str__ lambda, so behavior is
unchanged.
UnicodeMixin was a Python 2/3 compatibility shim that wired __str__
to call __unicode__(). Now that all __unicode__ methods have been
renamed to __str__ in prior commits, UnicodeMixin serves no purpose.

- Delete the UnicodeMixin class from cassandra/cqlengine/__init__.py
- Remove UnicodeMixin from the inheritance list of 6 classes:
  ValueQuoter, BaseClause, BaseCQLStatement, AbstractQueryableColumn,
  QueryValue, BaseQueryOperator
- Remove all 'from cassandra.cqlengine import UnicodeMixin' imports

The classes now define __str__ directly, which is the standard Python 3
approach for string representation.
absolute_import became the default behavior in Python 3.0. These
imports were needed in Python 2 to prevent relative imports from
shadowing stdlib modules (e.g., 'import io' resolving to a local
module instead of the stdlib). Since the driver requires Python 3.9+,
these are dead code.

Removed from: cassandra/protocol.py, cassandra/cqltypes.py,
cassandra/connection.py, cassandra/cluster.py, and
tests/integration/cqlengine/query/test_queryset.py.
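
One way to confirm that these imports are redundant is to ask the standard __future__ module when the feature became mandatory. This is only a sanity-check sketch, not part of the PR:

```python
import sys
import __future__

# Each feature records the release in which it became the default behavior.
mandatory = __future__.absolute_import.getMandatoryRelease()

# True on any interpreter at or past that release, so the
# 'from __future__ import absolute_import' line changes nothing there.
already_default = sys.version_info >= mandatory
```
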
WeakSet has been available in the weakref module since Python 2.7 and
in every Python 3 version the driver supports. The try/except
ImportError fallback to cassandra.util.WeakSet was unreachable dead
code on Python 3.

- Replace try/except with direct 'from weakref import WeakSet' in
  cluster.py, pool.py, and io/asyncorereactor.py
- Delete the ~210-line custom WeakSet class and its _IterationGuard
  helper from cassandra/util.py
- Remove the now-unused 'from _weakref import ref' import
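
For readers unfamiliar with the standard-library class that replaces the custom one, here is a minimal sketch of weakref.WeakSet. The Connection class is a hypothetical placeholder, not the driver's connection type:

```python
import gc
from weakref import WeakSet


class Connection:
    """Hypothetical placeholder for an object tracked by the set."""


live = WeakSet()
conn = Connection()
live.add(conn)
count_while_alive = len(live)

# Dropping the last strong reference lets the entry disappear from the set.
del conn
gc.collect()  # not needed on CPython, but harmless and helps other runtimes
count_after_release = len(live)
```
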
@mykaul mykaul marked this pull request as draft March 4, 2026 19:57
@mykaul mykaul requested a review from Copilot March 4, 2026 19:57

Copilot AI left a comment

Pull request overview

This PR continues the Python 2 cleanup by removing remaining unicode/UnicodeMixin compatibility shims and updating tests/docs/code paths to assume Python 3-only semantics (the project now requires Python >=3.9).

Changes:

  • Removes Python 2-era unicode patterns (u'', __unicode__, UnicodeMixin) and normalizes string handling across driver and cqlengine.
  • Simplifies Python-version conditionals/fallback imports (e.g., WeakSet imports) and applies formatting-only refactors in several modules/tests.
  • Updates unit/integration tests and Sphinx config to reflect Python 3-only behavior and representations.

Reviewed changes

Copilot reviewed 34 out of 37 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/unit/test_types.py | Replaces u'' literals with plain str in type read/write tests. |
| tests/unit/test_row_factories.py | Removes Python 3.0–3.6 conditional expectations for namedtuple creation. |
| tests/unit/test_orderedmap.py | Updates unicode-key tests to Python 3 str keys. |
| tests/unit/test_metadata.py | Replaces u'' literals with str in metadata CQL export tests. |
| tests/unit/test_marshalling.py | Updates UTF8/unicode expectations to Python 3 str and cleans up ordered map inserts. |
| tests/unit/advanced/test_insights.py | Removes Python-version-specific namespace logic and reformats expected dicts. |
| tests/integration/standard/test_query.py | Updates unicode query strings/column names to Python 3 str. |
| tests/integration/standard/test_cluster.py | Updates expected row tuples to Python 3 str. |
| tests/integration/cqlengine/model/test_udts.py | Updates unicode literals to Python 3 str. |
| tests/integration/cqlengine/model/test_model_io.py | Updates unicode literals to Python 3 str in model IO assertions. |
| tests/integration/cqlengine/model/test_class_construction.py | Mostly formatting + Python 3 iterator usage (next(iter(...))) and string literal normalization. |
| tests/integration/cqlengine/columns/test_validation.py | Removes old Python-version branches and normalizes string usage/formatting in validation tests. |
| setup.py | Removes Python 2-era subprocess gating and refactors extension/doc build setup logic. |
| docs/conf.py | Normalizes string literals and formatting in Sphinx configuration. |
| cassandra/query.py | Python 3 string/formatting cleanup; keeps namedtuple fallback paths but modernizes literals/layout. |
| cassandra/pool.py | Removes legacy WeakSet fallback and reformats/shard-aware related code blocks. |
| cassandra/io/asyncorereactor.py | Removes legacy WeakSet fallback and modernizes literals/formatting. |
| cassandra/encoder.py | Deprecates Python 2 “unicode” semantics and standardizes encoding/quoting behavior for Python 3. |
| cassandra/datastax/graph/query.py | Python 3 string/formatting cleanup and minor readability refactors. |
| cassandra/datastax/graph/graphson.py | Python 3 cleanup + formatting; updates docs/comments describing supported Python types. |
| cassandra/datastax/graph/fluent/_query.py | Python 3 string/formatting cleanup and improves readability of traversal query generation. |
| cassandra/cqlengine/statements.py | Removes UnicodeMixin usage and converts __unicode__ implementations to __str__. |
| cassandra/cqlengine/operators.py | Removes UnicodeMixin usage and converts operator stringification to __str__. |
| cassandra/cqlengine/named.py | Converts __unicode__ to __str__ and normalizes string literals. |
| cassandra/cqlengine/models.py | Removes UnicodeMixin usage and normalizes string literals/formatting across model machinery. |
| cassandra/cqlengine/functions.py | Removes UnicodeMixin usage and converts __unicode__ to __str__. |
| cassandra/cqlengine/__init__.py | Removes UnicodeMixin definition entirely. |

mykaul added 8 commits March 5, 2026 18:34
The subprocess module has been part of Python's standard library since
Python 2.4. The try/except ImportError guard and has_subprocess flag
were unreachable dead code that added unnecessary indentation and
complexity to the doc-building logic.

Replace with a direct 'import subprocess' and remove the conditional
guard around the documentation build steps.
Python 3 uses __bool__ for truth-value testing; __nonzero__ was the
Python 2 equivalent. The code previously defined __nonzero__ and
aliased __bool__ = __nonzero__ for cross-compatibility.

Since Python 3 never calls __nonzero__, rename the method directly
to __bool__ and remove the alias.
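
The claim that Python 3 never calls __nonzero__ is easy to demonstrate. Both classes below are hypothetical illustrations and do not come from the driver:

```python
class LegacyResult:
    """Defines only the Python 2 hook, which Python 3 ignores."""

    def __init__(self, rows):
        self.rows = rows

    def __nonzero__(self):
        return bool(self.rows)


class ModernResult:
    """Defines the Python 3 hook directly."""

    def __init__(self, rows):
        self.rows = rows

    def __bool__(self):
        return bool(self.rows)


# With no __bool__ or __len__, an object is always truthy on Python 3.
legacy_empty_is_truthy = bool(LegacyResult([]))
modern_empty_is_truthy = bool(ModernResult([]))
```
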
This integration test was permanently dead code: it contained a guard
'if sys.version_info[0:2] != (2, 7): raise SkipTest(...)' which
means it was always skipped on Python 3. The skip reason stated that
the test compares static strings from dict items whose ordering is
not deterministic on Python 3.

Since the driver no longer supports Python 2, and fixing the test to
use order-independent comparison is a separate concern, remove the
permanently-skipped test entirely.
Two code blocks in test_validation.py were guarded by
'if sys.version_info < (3, 1):', making them unreachable on every
supported Python version (3.9+).
The blocks used unichr() (a Python 2 builtin that does not exist in
Python 3) and u'' string prefixes for unicode validation tests.

In Python 3, chr() already returns a unicode character and all strings
are unicode, so the adjacent chr(233) tests already cover the same
functionality. Remove the dead blocks entirely.
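
A quick sketch of why the removed blocks were redundant on Python 3 (the sample strings are illustrative, not copied from test_validation.py):

```python
import builtins

# chr() already returns a unicode str on Python 3.
accented = chr(233)

# unichr() no longer exists as a builtin.
has_unichr = hasattr(builtins, "unichr")

# The u'' prefix is accepted but produces an ordinary str.
prefixed_equals_plain = u"caf\xe9" == "caf\xe9"
prefixed_type = type(u"x")
```
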
…3.9+

Three version guards were left over from Python 2/3.x compatibility:

1. cluster.py: 'if sys.version_info[0] >= 3 and sys.version_info[1] >= 7'
   guarded the Eventlet/futurist ThreadPoolExecutor workaround. Since the
   driver requires 3.9+, this is always True. Removed the guard, dedented
   the body, and updated the docstring and error message to drop the
   'Python 3.7+' qualifier (the issue is inherent to Eventlet, not a
   version-specific regression).

2. test_row_factories.py: NAMEDTUPLE_CREATION_BUG was defined as
   'sys.version_info >= (3,) and sys.version_info < (3, 7)', which is
   always False on 3.9+. The test's dead branch tested a warning path
   that can never trigger. Removed the constant, the dead branch, the
   unused 'sys' import, and simplified the test to just verify long
   column lists work.

3. test_insights.py: 'if sys.version_info > (3,)' guarded a namespace
   suffix that is always needed on Python 3. Removed the guard and the
   now-unused 'sys' import.

All 608 unit tests pass.
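
The three guard expressions quoted above can be evaluated directly. On any interpreter the driver supports (3.9+, per the commit message) they are constants, which is what makes the guarded branches removable:

```python
import sys

# 1. cluster.py guard: always True on 3.9+.
eventlet_guard = sys.version_info[0] >= 3 and sys.version_info[1] >= 7

# 2. test_row_factories.py constant: always False on 3.9+.
namedtuple_creation_bug = sys.version_info >= (3,) and sys.version_info < (3, 7)

# 3. test_insights.py guard: always True on Python 3.
insights_guard = sys.version_info > (3,)

# Side note (not from the PR): comparing tuples is the more robust way to
# express the first check, because it does not test major and minor
# version numbers independently.
tuple_style_guard = sys.version_info >= (3, 7)
```
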
On Python 3, the u'' prefix is a no-op since all strings are already
unicode. These prefixes were left over from Python 2 compatibility and
add visual noise without any semantic effect.

Removed u'' prefixes from:
- cassandra/query.py: __str__ methods for SimpleStatement,
  PreparedStatement, BoundStatement, BatchStatement, and a docstring
  example showing OrderedMapSerializedKey output
- cassandra/datastax/graph/query.py: GraphStatement.__str__
- cassandra/datastax/graph/fluent/_query.py: TraversalBatch.__str__
  and as_graph_statement query construction
- docs/conf.py: project name and copyright strings

All 608 unit tests pass.
On Python 3, the u'' prefix is a no-op since all strings are already
unicode. These prefixes were left over from Python 2 compatibility.

Removed 67 u-prefix occurrences across 9 test files:
- tests/unit/test_types.py (3)
- tests/unit/test_orderedmap.py (3)
- tests/unit/test_marshalling.py (6)
- tests/unit/test_metadata.py (27)
- tests/integration/standard/test_types.py (4)
- tests/integration/standard/test_query.py (13)
- tests/integration/standard/test_cluster.py (8)
- tests/integration/cqlengine/model/test_udts.py (1)
- tests/integration/cqlengine/model/test_model_io.py (2)

All 608 unit tests pass.
Several comments and docstrings still referenced Python 2 concepts that
no longer apply now that the driver requires Python 3.9+:

- encoder.py: Updated cql_encode_unicode() docstring to note it is
  unused since Python 2 removal (str is always unicode on Python 3).
  Also fixed the method body: it was calling val.encode('utf-8') which
  on Python 3 converts str to bytes, producing wrong output. Now it
  passes val directly to cql_quote.

- metadata.py: Changed 'will always be a unicode' to 'will always be
  a str' (line 2155). Updated unhexlify comment to say 'str input'
  instead of 'unicode input' and fixed typo 'everythin' (line 2350).

- graphson.py: Removed '(PY2)'/'(PY3)' qualifiers from the type
  mapping table. Updated 'long' to 'int' for varint, 'str (unicode)'
  to 'str' for inet, removed 'buffer (PY2)' from blob entries.

- util.py: Updated comment on _positional_rename_invalid_identifiers
  to remove stale 'Python 2.6' reference.

- asyncorereactor.py: Removed stale 'TODO: Remove when Python 2
  support is removed' since Python 2 support has been removed. The
  guard itself is still needed for interpreter shutdown scenarios.

All 608 unit tests pass.
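
To illustrate the encoder.py item in the list above, here is a rough sketch of why encoding before quoting goes wrong on Python 3. quote_sketch is a hypothetical helper written for this example; it is an assumption about how a quoting function might behave and is not the driver's cql_quote:

```python
def quote_sketch(term):
    """Hypothetical quoting helper: quote str values, stringify the rest."""
    if isinstance(term, str):
        # Wrap in single quotes and double any embedded single quotes.
        return "'%s'" % term.replace("'", "''")
    return str(term)


val = "caf\xe9"

# Old behaviour: encoding first hands bytes to the quoting helper, so the
# output is the bytes repr rather than a quoted string literal.
wrong = quote_sketch(val.encode("utf-8"))

# New behaviour: the str is passed through directly.
right = quote_sketch(val)
```
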
@mykaul mykaul force-pushed the python_2_no_more branch from 172606e to 59b6813 Compare March 5, 2026 16:43

mykaul commented Mar 5, 2026

Fixed all comments.

mykaul commented Mar 5, 2026

@copilot code review[agent] - please re-review
