Replicated softmax#137

Open
Fede-Rausa wants to merge 13 commits into master from replicated_softmax

Conversation

@Fede-Rausa
Collaborator

This PR adds the Replicated Softmax (a restricted Boltzmann machine for topic modeling) and the Over-Replicated Softmax (a deep version of the RSM) to the OCTIS list of topic models.
Both can currently be evaluated only with classification, diversity, and coherence metrics, but they also support a fast local computation of the perplexity upper bound (which OCTIS already supports for LDA through the Gensim implementation).

@silviatti
Collaborator

Hi Federico, regarding the Python 3.8 error: it seems that some libraries have dropped support for Python 3.8, so I think it's time to deprecate it. Could you please remove Python 3.8 from python-publish.yml and any other related references?

Also, please add tests to ensure everything works as expected. I don't currently have time to verify the implementation manually, but adding tests will give us more confidence and serve as guardrails. You can follow the same patterns as in: https://github.com/MIND-Lab/OCTIS/blob/master/tests/test_octis.py and https://github.com/MIND-Lab/OCTIS/blob/master/tests/test_optimization.py.

Lastly, could you run a formatter (e.g., ruff) on the files you added?

Thanks,

Silvia

@silviatti self-requested a review November 24, 2025 00:20
Collaborator Author

@Fede-Rausa left a comment


Following your advice, the old tests now pass, and the build works for Python 3.9 and 3.10. I will soon supply the test functions for the RSM and oRSM, plus better formatting. Thank you for the support.

Best regards,

Federico

Collaborator

@silviatti left a comment


Please review the comments in the code. I recommend running a linter to check for unused functions, duplicated code blocks, and other potential issues. I also asked ChatGPT for a quick review and it identified several points worth examining.

I am not an expert in RSM, so I can only validate those findings to a certain extent. I suggest doing a similar review yourself to further assess and verify potential issues in the implementation.

Comment on lines +361 to +377
def _get_topic_word_matrix(self):
"""
Return the topic representation of the words
"""
w_vh, w_v, w_h = self.W
topic_word_matrix = w_vh.T
normalized = []
for words_w in topic_word_matrix:
minimum = min(words_w)
words = words_w - minimum
normalized.append([float(i) / sum(words) for i in words])
topic_word_matrix = np.array(normalized)
return topic_word_matrix

def _get_topic_word_matrix0(self):
"""
Return the topic representation of the words
Collaborator


these functions are duplicated

visible_sample = self.sample_softmax(visible_probs, D)
return visible_sample

def sample_h2(self, h1):
Collaborator


is this function used? if not, remove

mu1 = self.v_and_h2_to_h1(v, h2)
mu2 = self.h1_to_softmax(mu1)

if (old_mu2 - mu2).sum() < self.epsilon:
Collaborator


are you sure about this? I asked ChatGPT and it flagged this check as incorrect because:

  • positive and negative differences may cancel out
  • it does not use absolute values

It should be:

if np.linalg.norm(old_mu2 - mu2) < self.epsilon:

or

if np.sum(np.abs(old_mu2 - mu2)) < self.epsilon:
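The cancellation issue is easy to demonstrate with a minimal sketch (values made up for illustration): the mean-field update changes substantially, yet the signed sum still reports convergence, while an L1 check does not.

```python
import numpy as np

# made-up values: a large mean-field update whose positive and
# negative differences cancel out exactly
old_mu2 = np.array([0.5, 0.5])
mu2 = np.array([0.9, 0.1])

signed = (old_mu2 - mu2).sum()        # ~0.0: wrongly signals convergence
l1 = np.sum(np.abs(old_mu2 - mu2))    # 0.8: the update was actually large

epsilon = 1e-3
assert signed < epsilon        # the original check would stop here
assert not (l1 < epsilon)      # the L1 check correctly keeps iterating
```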

Collaborator Author


I agree, that's a big point. The error is also present in the original version proposed by dongwookim. I may have to tune the default value of epsilon, making it larger to allow faster convergence.

Collaborator Author


I have run the tests and convergence is reached with a reasonably small epsilon, so I have applied the changes.

else:
self.dtm = dtm
if doval:
self.val_dtm = np.log(1 + val_dtm)
Collaborator


is it correct that you log-transform the validation set?

Collaborator Author


No, it's indeed wrong, I will correct it. The log transform should be applied only when the user requests it, for both training and validation (it should be treated as a preprocessing step).

w_vh, w_v, w_h = self.W
D = v.sum(axis=1)
energy = np.outer((D + self.M), w_h) + (v @ w_vh) * np.reshape(
(1 + self.M / D), (-1, 1)
Collaborator


can D=0 lead to a division by zero error?

Collaborator Author


Yes, absolutely. But it isn't supposed to happen: every document observed in the dataset should have at least one word, and if that word, or all the words in the document, weren't in the vocabulary, it would raise an error. However, this cannot happen in either the training or the validation set, since the softmax layer is initialized from the vocabulary and the vocabulary is built from the entire corpus (validation included). So this is a problem only when using the model on documents with no words, or on documents preprocessed differently from OCTIS.
To ensure that each element of D is positive, there are two ways:

way 1: throw an error like this:

assert not np.any(D == 0), 'D vector contains 0. All documents should have at least one word in the vocabulary'

This can slow down training, since every batch iteration would have to evaluate the condition D == 0. It may make more sense to run the check once at the start of training, but the problem would persist for new documents in validation or out of corpus.

way 2: replace (1 + self.M / D) with (1 + self.M / (D+1))

This easily solves the problem, but it introduces a small bias in the strength of the prior (M is the fixed length of the prior version of the document: when D > M the observed document is stronger than its prior version and has more impact on the topic estimates, and vice versa when M > D).
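A minimal sketch of way 1, run once at setup time rather than per batch (the function name is hypothetical, not part of the PR):

```python
import numpy as np

def check_positive_doc_lengths(dtm):
    """Fail fast if any document has zero in-vocabulary words.

    Intended to run once when the document-term matrix is set up,
    avoiding a per-batch D == 0 check during training.
    """
    D = np.asarray(dtm).sum(axis=1)
    if np.any(D == 0):
        raise ValueError(
            "All documents should have at least one word in the vocabulary"
        )
    return D
```

As noted above, this catches empty documents in the corpus at setup, but documents seen only at prediction time would still need the same guard.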

Collaborator Author


I have added:

D = self.dtm.sum(axis=1)
assert not np.any(D == 0), 'all the documents should have positive length'

inside set_structure_from_dtm to prevent this from happening (v is a batch of the dtm).

else:
self.dtm = dtm
if doval:
self.val_dtm = np.log(1 + val_dtm)
Collaborator


see similar comment in oRSM

"""
mfh = self.visible2hidden(dtm)
vprob = self.hidden2visible(mfh)
lpub = np.exp(-np.nansum(np.log(vprob) * dtm) / np.sum(dtm))
Collaborator


is this log perplexity or simply perplexity?

Collaborator Author


Simply perplexity; I have to either correct the name or remove the np.exp.
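For reference, a minimal sketch of the quantity computed above (the helper name is hypothetical), with clipping to guard against log(0); dropping the final np.exp would yield the log-perplexity instead:

```python
import numpy as np

def perplexity_from_probs(vprob, dtm, eps=1e-12):
    """Perplexity from reconstructed word probabilities and word counts.

    vprob: per-document word probabilities (rows sum to 1)
    dtm:   document-term counts of the same shape
    Clipping vprob replaces the np.nansum workaround for zero probabilities.
    """
    vprob = np.clip(vprob, eps, None)
    log_ppl = -np.sum(dtm * np.log(vprob)) / np.sum(dtm)
    return np.exp(log_ppl)
```

Sanity check: with uniform probabilities over a vocabulary of V words, the perplexity equals V regardless of the counts.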

given a document term matrix
"""
mfh = self.visible2hidden(dtm)
vprob = self.hidden2visible(mfh)
Collaborator


possibly clip vprob (e.g. vprob = np.clip(vprob, 1e-12, None)) to avoid vprob = 0 and NaN values later?

Collaborator


also, can np.sum(dtm) = 0?

Collaborator Author


For the first problem, your clip solution is more elegant and efficient than my sad np.nansum operation, so I will use it.
For the second, the answer is similar to the D = 0 problem above: it shouldn't be possible for dtm to be a matrix of zeros. The dtm has as many columns as there are words in the vocabulary initialized from the preprocessed corpus. To handle the case where the vocabulary is given and the corpus contains empty documents, an assert check can be added.

Maybe an error can be thrown this way?

assert np.sum(dtm) > 0, 'All the documents in the corpus seem to be empty for the given vocabulary'

DTM[i, id] = count
return DTM

class oRSM_model(object):
Collaborator


I would recommend implementing a shared superclass which implements the shared functions, while you override the model-specific functions. This prevents you from having a lot of duplicated code e.g. _get_topic_word_matrix, get_topics, softmax, multinomial_sample, set_train_hyper, etc.
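A rough sketch of what such a superclass could look like (all class and method bodies here are hypothetical placeholders; only the structure matters):

```python
import numpy as np

class ReplicatedSoftmaxBase:
    """Shared machinery for the RSM and oRSM models (illustrative sketch)."""

    @staticmethod
    def softmax(x):
        # subtract the row max for numerical stability before exponentiating
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def get_topics(self, vocab, topk=10):
        # rank the top-k words per topic from the topic-word matrix
        twm = self._get_topic_word_matrix()
        return [[vocab[i] for i in np.argsort(row)[::-1][:topk]]
                for row in twm]

class RSM_model(ReplicatedSoftmaxBase):
    def _get_topic_word_matrix(self):
        raise NotImplementedError  # model-specific shallow weights go here

class oRSM_model(ReplicatedSoftmaxBase):
    def _get_topic_word_matrix(self):
        raise NotImplementedError  # deep variant overrides this
```

Each subclass overrides only the model-specific pieces, so helpers like softmax or get_topics live in exactly one place.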

Collaborator Author


Ok, I will add a third file, RS_utils.py, together with RSM.py and oRSM.py in the models folder.

Collaborator Author


I have called the file RS_class.py; the shared superclass is implemented there. I hope I've done it the way you meant. Many functions (like set_train_hyper) are quite different between the two classes, despite sharing names.

setup.py Outdated
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
#'Programming Language :: Python :: 3.7',
Collaborator


just remove

@Fede-Rausa
Collaborator Author

Hi Silvia, thanks for the review, it surfaced important problems.
I will try to solve everything and answer each question as soon as possible.

@Fede-Rausa
Collaborator Author

Hi Silvia, I have moved the duplicated functions into RS_class.py and removed them from the other scripts. Let me know if I have done everything correctly, or if I need to fix anything else. Thank you for your time.

Federico

