Description
Problem description
I'm using LdaMulticore from gensim 3.8.3. I train the model on my training corpus and can evaluate that corpus with gensim's CoherenceModel to compute the 'c_v' score. However, when I try to compute 'c_v' over my test set, the following warnings are raised:
/Users/xxx/env/lib/python3.7/site-packages/gensim/topic_coherence/direct_confirmation_measure.py:204: RuntimeWarning: divide by zero encountered in double_scalars
m_lr_i = np.log(numerator / denominator)
/Users/xxx/lib/python3.7/site-packages/gensim/topic_coherence/indirect_confirmation_measure.py:323: RuntimeWarning: invalid value encountered in double_scalars
return cv1.T.dot(cv2)[0, 0] / (_magnitude(cv1) * _magnitude(cv2))
Furthermore, the coherence values returned by the CoherenceModel are 'nan' for some of the topics, so I'm not able to evaluate my model on a held-out test set.
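For context, here is how I read the failure (a sketch based on the two quoted lines, with assumed numbers; the exact placement of EPSILON is my assumption rather than a confirmed fact): if a topic word never occurs in the reference texts, the denominator of the log-ratio measure becomes 0.0, the division overflows to inf, and the cosine in the indirect confirmation step then evaluates inf / inf, i.e. nan.

import numpy as np

# Assumed reading of the failure mode behind the two warnings: EPSILON only pads
# the co-occurrence probability, so a topic word with zero document frequency in
# the reference (test) texts leaves the denominator at 0.0.
EPSILON = 1e-12                                   # smoothing constant (assumed value)
co_occur_prob = np.float64(0.0)                   # word pair never co-occurs in the test texts
p_w1, p_w2 = np.float64(0.0), np.float64(0.02)    # first word is absent from the test texts

numerator = co_occur_prob + EPSILON
denominator = p_w1 * p_w2                         # 0.0
m_lr = np.log(numerator / denominator)            # RuntimeWarning: divide by zero -> inf

# The inf entries propagate into the context vectors, and the cosine similarity
# in indirect_confirmation_measure then evaluates inf / inf -> nan.
cv1 = np.array([[m_lr], [1.0]])
cv2 = np.array([[m_lr], [2.0]])
cosine = cv1.T.dot(cv2)[0, 0] / (np.linalg.norm(cv1) * np.linalg.norm(cv2))
print(m_lr, cosine)                               # inf nan (RuntimeWarning: invalid value)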
Steps/code/corpus to reproduce
I run the following code:
from gensim import models

coherence_model_lda = models.CoherenceModel(model=lda_model,
                                            topics=topic_list,
                                            corpus=corpus,
                                            texts=texts,
                                            dictionary=train_dictionary,
                                            coherence='c_v',
                                            topn=20)
coherence_model_lda.get_coherence()
# output (aggregated c_v value): nan
coherence_model_lda.get_coherence_per_topic()
# output (c_v value per topic): [0.4855137269180713, 0.3718866594914528, nan, nan, nan, 0.6782845928414825, 0.21638660621444444, 0.22337594485796397, 0.5975773184175942, 0.721341268732559, 0.5299883104816663, 0.5057903454344682, 0.5818051100304473, nan, nan, 0.30613393712342557, nan, 0.4104488627000527, nan, nan, 0.46028708148750963, nan, 0.394606654755219, 0.520685457293826, 0.5918440959767729, nan, nan, 0.4842068862650447, 0.9350644411891258, nan, nan, 0.7471151926054456, nan, nan, 0.5084926961568169, nan, nan, 0.4322957454944861, nan, nan, nan, 0.6460815758337844, 0.5810936860540964, 0.6636319471764807, nan, 0.6129884526648472, 0.48915614063099017, 0.4746167359622748, nan, 0.6826979166639224]
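A possible stopgap (just a sketch, and it silently ignores the affected topics rather than fixing them) is to aggregate only the finite per-topic scores:

import numpy as np

# Average only the finite per-topic c_v values and report how many topics scored nan.
per_topic = coherence_model_lda.get_coherence_per_topic()
finite_scores = [s for s in per_topic if np.isfinite(s)]
print(f"{len(per_topic) - len(finite_scores)} of {len(per_topic)} topics scored nan")
print("mean c_v over finite topics:", np.mean(finite_scores))  # same as np.nanmean(per_topic)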
I've tried increasing the EPSILON value in gensim.topic_coherence.direct_confirmation_measure, but this has no effect.
I've also tried changing the input arguments (e.g. excluding the dictionary argument), again without effect. I think the error has something to do with the fact that quite a large portion of the words in the test set is not present in the train set; however, I would expect the EPSILON value to handle this.
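To illustrate the vocabulary mismatch I suspect, here is a small diagnostic sketch; it assumes topic_list holds each topic's top words as strings and texts holds the tokenized test documents, as in the reproduction code above:

# Count how many of each topic's top words never occur in the tokenized test
# documents; those topics are the ones I would expect to score nan.
test_vocab = {word for doc in texts for word in doc}

for topic_id, top_words in enumerate(topic_list):
    missing = [w for w in top_words if w not in test_vocab]
    if missing:
        print(f"topic {topic_id}: {len(missing)}/{len(top_words)} top words "
              f"missing from the test texts, e.g. {missing[:3]}")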
Versions
Python 3.7.2 (default, Dec 2 2020, 09:47:26)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(platform.platform())
Darwin-18.7.0-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.7.2 (default, Dec 2 2020, 09:47:26)
[Clang 9.0.0 (clang-900.0.39.2)]
>>> import struct; print("Bits", 8 * struct.calcsize("P"))
Bits 64
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.18.5
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.5.2
>>> import gensim; print("gensim", gensim.__version__)
gensim 3.8.3
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)