Skip to content

Add a stable consent hash#1137

Open
Badlapje wants to merge 15 commits intomainfrom
feature/stabilize-consent-hash
Open

Add a stable consent hash#1137
Badlapje wants to merge 15 commits intomainfrom
feature/stabilize-consent-hash

Conversation

@Badlapje
Copy link

@Badlapje Badlapje commented May 19, 2021

Prior to this commit, there was no stable hashing algorithm used for consent attributes. Casing, order of attributes, ... could all change the hash used to store consent attributes. As such, it couldn't be relied upon.

This commit adds a stable hashing algorithm. It stores all new consent attributes with the new hash, but when retrieving does so checking first the old hash and then the new one.

Edit 4-5-2022 MKodde
After @Badlapje s work was finished on the project, I inherited his work and have finished the work remaining on this ticket.

  1. The consent stabilization feature was implemented in the consent workflows of EB
  2. A database migration was added to allow storing both the unstable and the stable hash.
  3. Tests where added and updated

The QA tests fail on a security check on the JS dependencies. This will be fixed on another PR.

Pivotal ticket: https://www.pivotaltracker.com/story/show/176513931

@Badlapje Badlapje requested a review from MKodde May 19, 2021 15:51
@Badlapje Badlapje changed the title Add a consentHashService Add a stable consent hash May 19, 2021
Copy link
Member

@MKodde MKodde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first review looks very promising! See some improvement suggestions below.

@Badlapje Badlapje force-pushed the feature/stabilize-consent-hash branch 6 times, most recently from 4e3a85e to dff4b53 Compare May 20, 2021 16:46
@Badlapje Badlapje requested a review from MKodde June 1, 2021 17:07
@Badlapje Badlapje force-pushed the feature/stabilize-consent-hash branch from dff4b53 to 595c767 Compare June 2, 2021 12:38
@MKodde MKodde force-pushed the feature/stabilize-consent-hash branch 8 times, most recently from c716bf8 to a5dd163 Compare May 4, 2022 10:09
@MKodde MKodde requested review from VadimSchmitz and thijskh May 4, 2022 10:28
return !$this->_consentEnabled ||
$this->_hasStoredConsent($serviceProvider, ConsentType::TYPE_EXPLICIT);
$consent = $this->_hasStoredConsent($serviceProvider, ConsentType::TYPE_EXPLICIT);
return !$this->_consentEnabled || $consent->given();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means now queries (and hash calculations) will be done even if the consent feature toggle is off. This also goes for the other public methods that use this construct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very good point! Will change this to only search for previous consent when consent is enabled.

public function storeConsentHash(array $parameters): bool
{
$query = "INSERT INTO consent (hashed_user_id, service_id, attribute_stable, consent_type, consent_date, deleted_at)
VALUES (?, ?, ?, ?, NOW(), '0000-00-00 00:00:00')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's stored as being deleted? Because a non-null value is being set for deleted_at?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has to do with the following.

The deleted_at column became part of the primary key, because we can now have multiple rows for a given user for a given service. Where one of those rows is an active consent row, and the rest are soft deleted versions of previously given consent.

Having a dattime as part of a key requires it to be not nullable. But we do not want the column to be empty. Thats why a null-datetime is set for those values. DBAL will translate this to 'null'. But for the DBMS is satisfied. At least that's my understanding.

$consentType,
];

$hasUnstableConsentHash = $this->_hashService->retrieveConsentHash($parameters);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about the approach. This currently queries first for the new hash. If not found then for the old hash. And if also not found will present consent screen.

Why is this done over the following approach

  • oldHash = _getAttributesHash()
  • newHash = _getStableAttributesHash()
  • SELECT * FROM consent where attributes_stable = :newHash OR attributes = :oldHash
    so you always know in one query whether consent was given?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a good point! Having this in one query is much better. Will update this.

// this up() migration is auto-generated, please modify it to your needs
$this->abortIf($this->connection->getDatabasePlatform()->getName() !== 'mysql', 'Migration can only be executed safely on \'mysql\'.');

$this->addSql('ALTER TABLE consent ADD attribute_stable VARCHAR(80) NOT NULL, CHANGE attribute attribute VARCHAR(80) DEFAULT NULL');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet decided on the pros and cons of adding a new hash field next to the old field.

I believe the problem is solvable with just one database field. Not sure yet if that is preferable though.

Copy link
Member

@MKodde MKodde May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open for other suggestions! Having it in one single column will work too in my opinion. The 'OR' based query solution would work just as well on a single column. But would make it harder to decide when we can stop supporting the old style attribute hash method.

@MKodde MKodde force-pushed the feature/stabilize-consent-hash branch from 26e8ac1 to 145ebd9 Compare May 5, 2022 11:24
@MKodde MKodde added 6.8 and removed 6.7 labels May 5, 2022
@thijskh thijskh removed the 6.8 label Oct 25, 2023
@MKodde MKodde removed the request for review from VadimSchmitz December 30, 2025 09:11
@baszoetekouw
Copy link
Member

@MKodde @johanib I seems this issue (unstable consent hashes, causing too many consent screen because the orider of the attributes values received form the IdP changes between authentications) might become relevant again.

What would be the best approach to revive this? Try to bring this PR up to date, or start fresh implementing this using the ideas from this PR?

@baszoetekouw baszoetekouw linked an issue Jan 30, 2026 that may be closed by this pull request
AND
service_id = ?
AND
(attribute = ? OR attribute_stable = ?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OR affects index use?

Koen Cornelis and others added 12 commits March 6, 2026 11:35
Prior to this commit, there was no service specifically for hashing
attribute arrays for consent.

This commit adds such a service with both the old hashing algorithm and
a stable hashing algorithm.

Pivotal ticket: https://www.pivotaltracker.com/story/show/176513931
The consent queries added to the ConsentRepository.php file (deleted)
should have been added to the existing DbalConsentRepository. While at
it, naming conventions have been applied to the query names. And the
service config was updated.
Extracting this interface from the existing service is allowing us to
more easily test the service. As mocking a final class is not possible.
This might prove usefull down the road
This represents the consent type at hand (implicit or explicit consent).
A third option is that no consent has been given.
The requirements are stated in the story:

https://www.pivotaltracker.com/story/show/176513931 and specifically
these points:

A challenge is that we do not want to invalidate all given consents with the
current algorithm. So we implement it as follows:

*  We do not touch the existing consent hashing method at all
*  We create a new hashing method that is more stable.
*  We cover this new method with an abundance of unit tests to verify the
   stability given all sorts of inputs.
*  We change the consent query from (pseudocode):
     SELECT *
     FROM consent
     WHERE user = me
     AND consenthash = hashfromoldmethod
     OR consenthash = hashfromnewmethod
*  Newly given consent will be stored with the new hash.
*  When old consent matched, still generate new consent hash (without showing
   consent screen
The NameID objects where causing issues when creating a stabilized
consent hash. The objects can not be serialized/unserialized. By
normalizing them before starting to perform other normalization tasks on
the attribute array, this issue can be prevented.
Only get the stored consent when consent is enabled. The other methods
need no adjusting as the short-circuiting by the first condition
prevents those expressions from being executed.

Consider:

`(true || $this->expensiveMethodCall())`

The expensiveMethodCall is never called as the first condition already
decided the rest of the condition.

https://www.php.net/manual/en/language.operators.logical.php
Previously when testing if previous consent was given was based on
testing for old-style attribute hash calculation. And if that one did
not exist, the new (stable) attribute hash was tested.

Having that in two queries made little sense as it can easily be
performed in one action.

The interface of the repo and service changed slightly, as now a value
object is returned instead of a boolean value. The value object reflects
the version (type of attribute hash) of consent that was given. That can
either be: unstable, stable or not given at all.
@kayjoosten kayjoosten force-pushed the feature/stabilize-consent-hash branch from 145ebd9 to d05aad4 Compare March 6, 2026 10:39
@kayjoosten kayjoosten self-assigned this Mar 6, 2026
@kayjoosten kayjoosten force-pushed the feature/stabilize-consent-hash branch 5 times, most recently from d79f3ba to d30ca90 Compare March 6, 2026 17:30
kayjoosten added a commit that referenced this pull request Mar 9, 2026
Based on requirements from issue #1721, PR #1137 review threads, and
newly added code paths:

ConsentIntegrationTest:
- test_upgrade_not_applied_when_no_consent_given: upgradeAttributeHashFor
  does nothing when hasConsentHash returns notGiven()
- test_upgrade_continues_gracefully_when_attributes_changed: when
  updateConsentHash returns false (0 rows, attributes changed since
  consent was given), upgradeAttributeHashFor must not throw

ConsentTest (eb4):
- testNullNameIdReturnsNoConsentWithoutCallingRepository: when
  getNameIdValue() returns null, all three private guards (_storeConsent,
  _hasStoredConsent, _updateConsent) must return early without touching
  the repository

ConsentVersionTest (new):
- isStable/isUnstable/given behaviour for all three states
- invalid value throws InvalidArgumentException

ConsentHashServiceTest:
- test_stable_attribute_hash_attribute_name_casing_normalized: issue #1721
  explicitly requires 'Case normalize all attribute names'; verifies keys
  like 'displayName' and 'DISPLAYNAME' produce the same hash
- test_stable_attribute_hash_attribute_name_ordering_normalized: issue
  #1721 requires 'Sort all attribute names'; verifies key order is stable
@kayjoosten kayjoosten force-pushed the feature/stabilize-consent-hash branch from d30ca90 to b183015 Compare March 9, 2026 09:55
- Guard upgradeAttributeHashFor against consent-disabled state: returns
  early without hitting the DB or recalculating hashes when consent is off
- Fix DIP violation in Factory.php: type-hint ConsentHashServiceInterface
  instead of concrete ConsentHashService
- Fix countAllFor memory issue: delegate to COUNT(*) query instead of
  loading all consent rows into memory
- Fix null-guards in _hasStoredConsent() and _updateConsent(): return
  early when getConsentUid() returns null, consistent with _storeConsent()
- Fix risky PHPUnit tests in ConsentTest: add assertions, use shouldNotReceive()
- findAllFor: select attribute_stable alongside attribute, prefer it for
  new-format rows (fixes null attribute_hash in /api/consent response)
- updateConsentHash: check rowCount(), log warning and return false on 0 rows
- hasConsentHash: SELECT attribute_stable instead of SELECT *
- ProcessConsent/ProvideConsent: skip upgradeAttributeHashFor on fresh consent
- ConsentVersion: add isStable() counterpart to isUnstable()
- Restore display_errors=0 in phpunit.xml to prevent PHP deprecation noise
- Move Version20220503092351 to migrations/ so Doctrine picks it up;
  add preUp skip guard to avoid duplicate column on existing installs;
  remove deprecated abortIf() calls
- Make 'attribute' column nullable in baseline migration and Doctrine
  entity (storeConsentHash sets attribute_stable only, not attribute)
- Add 'attribute_stable' column to baseline CREATE TABLE statement
- Fix integration test class name to match filename (PHPUnit warning)
- Make consentTypeProvider() static (PHPUnit deprecation)
Based on requirements from issue #1721, PR #1137 review threads, and
newly added code paths:

ConsentIntegrationTest:
- test_upgrade_not_applied_when_no_consent_given: upgradeAttributeHashFor
  does nothing when hasConsentHash returns notGiven()
- test_upgrade_continues_gracefully_when_attributes_changed: when
  updateConsentHash returns false (0 rows, attributes changed since
  consent was given), upgradeAttributeHashFor must not throw

ConsentTest (eb4):
- testNullNameIdReturnsNoConsentWithoutCallingRepository: when
  getNameIdValue() returns null, all three private guards (_storeConsent,
  _hasStoredConsent, _updateConsent) must return early without touching
  the repository

ConsentVersionTest (new):
- isStable/isUnstable/given behaviour for all three states
- invalid value throws InvalidArgumentException

ConsentHashServiceTest:
- test_stable_attribute_hash_attribute_name_casing_normalized: issue #1721
  explicitly requires 'Case normalize all attribute names'; verifies keys
  like 'displayName' and 'DISPLAYNAME' produce the same hash
- test_stable_attribute_hash_attribute_name_ordering_normalized: issue
  #1721 requires 'Sort all attribute names'; verifies key order is stable
@kayjoosten kayjoosten force-pushed the feature/stabilize-consent-hash branch from b183015 to c3faaec Compare March 9, 2026 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consent hash must be stable

7 participants