-
Notifications
You must be signed in to change notification settings - Fork 28
Closed
Labels
BugSomething isn't workingSomething isn't workingHelp wantedExtra attention is neededExtra attention is needed
Description
#[test]
fn test_zero_observations() {
let a = Array2::<f32>::zeros((2, 0));
let pearson = a.pearson_correlation();
assert_eq!(pearson.shape(), &[2, 2]);
let all_nan_flag = pearson.iter().map(|x| x.is_nan()).fold(true, |acc, flag| acc & flag);
assert_eq!(all_nan_flag, true);
}
This test fails on Travis, for #5, while it succeeds on my local machine.
It's weird - any idea? @jturner314
I am trying to reproduce the error, but I am failing.
Metadata
Metadata
Assignees
Labels
BugSomething isn't workingSomething isn't workingHelp wantedExtra attention is neededExtra attention is needed
Type
Projects
Milestone
Relationships
Development
Select code repository
Activity
LukeMathWalker commentedon Sep 29, 2018
On Travis CI the
pearson
matrix is:If I look into the covariance matrix obtained using the
cov
method I get:This is due to the
dot
method: with zero observationsdenoised
is an array of shape[2, 0]
, and multiplying it with its own transpose gives us a matrix filled with zeroes. This seems to be coherent withdot
implementation whenblas
is not active:Applying the same line of reasoning,
std_matrix
inpearson_correlation
is a matrix filled with zeroes. What should I expect from the component-wise division of two zero-filled matrices? NaN? Random infs?LukeMathWalker commentedon Sep 29, 2018
I have added a test to check that the covariance matrix with zero observations is indeed a matrix filled with zeroes and once again I get different results on Travis CI vs local machine:
jturner314 commentedon Sep 29, 2018
You've discovered a couple of bugs:
It appears that
.dot()
of a 2 × 0 matrix and a 0 × 2 matrix returns a 2 × 2 matrix of uninitialized memory, when it should return a 2 × 2 matrix of zeros.The
test_covariance_zero_observations
test I added in Document and test .cov() for empty arrays #4 is broken in a couple of ways:cov.mapv(|x| x.is_nan())
doesn't do anything; it should have beencov.mapv(|x| assert!(x.is_nan()))
. But, NaN values aren't correct in this case (compare to NumPy, which gives NaNs forddof = 0
and zeros forddof = -1
); it really should have beencov.mapv(|x| assert_eq!(x, 0.))
, as you discovered.I don't have time right now to fix these things, but I'll take a look later.
jturner314 commentedon Sep 30, 2018
I think #7 should fix the nondeterministic behavior you are observing.
You can see the behavior of floating point division by zero with this program:
The result is:
In other words, zero divided by zero is NaN, while non-zero divided by zero is ±inf (sign determined by both operands).
For the
shape = (2, 0)
case, the correlation coefficient should be a 2×2 array of NaNs. You can confirm this with NumPy:LukeMathWalker commentedon Oct 1, 2018
Great - thanks for the quick fix!
It turns out I was looking at the wrong dot function 😛
I'll merge master back into my branch to finalize the PR then!