spaCy: decaying() Returns Unexpected Values

Created on 20 Mar 2019 · 5 Comments · Source: explosion/spaCy

How to reproduce the behavior

Add the following code to any project that includes spaCy:

from spacy import util

# Same arguments as the docs example for util.decaying
sizes = util.decaying(1., 10., 0.001)

size = next(sizes)
print(size)
assert size == 1.  # fails here on spaCy 2.0.18
size = next(sizes)
print(size)
assert size == 1. - 0.001
size = next(sizes)
print(size)
assert size == 1. - 0.001 - 0.001

This is a direct test of the example provided in the spaCy docs for util.decaying. It will fail on the first assertion.

Additionally, the example shows an impossible sequence: this is a decaying series, yet the start value (1) is less than the end value (10), so the series can never decay from 1 to 10. If you invert the start and end values, you do get a sequence which never decays below the end.

Looking at the actual series, you can see that it does not decay at a rate of 0.001 but at some nearby rate, apparently lost to floating-point math. Eventually this makes it possible to get nearly duplicate values in the series.

These two values are adjacent in the series produced by decaying(1., 10., 0.001):

0.8257638315441783
0.8250825082508251

There is also a problem with how the decay factor is applied. If you use a larger factor, the results are completely nonsensical:

dropout = decaying(10., 1., 0.45)
6.8965517241379315
5.2631578947368425
4.25531914893617
3.5714285714285716
3.076923076923077
2.7027027027027026
2.4096385542168672
2.173913043478261
1.9801980198019802
1.8181818181818181
1.680672268907563
1.5625
1.4598540145985401
1.36986301369863
1.2903225806451613
1.2195121951219514
1.1560693641618496
1.098901098901099
1.0471204188481675
1.0
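The values listed above appear to match an inverse-time schedule of the form start / (1 + decay * n), clipped at the end value. That would explain why the step size shrinks from one value to the next. The sketch below is inferred only from the numbers in this issue and is not quoted from spaCy's source; the name `inverse_time_decay` is hypothetical.

```python
from itertools import islice


def inverse_time_decay(start, stop, decay):
    """Hypothetical stand-in, inferred from the printed values:
    yields start / (1 + decay * n) for n = 1, 2, ..., clipped at stop."""
    n = 1
    while True:
        value = start / (1.0 + decay * n)
        # Clip toward `stop` from whichever side `start` lies on.
        yield max(value, stop) if start > stop else min(value, stop)
        n += 1


# First 20 values for the large decay factor used above.
large = list(islice(inverse_time_decay(10.0, 1.0, 0.45), 20))
print(large[:3])

# The two adjacent values quoted earlier correspond to n = 211 and n = 212.
small = list(islice(inverse_time_decay(1.0, 10.0, 0.001), 212))
print(small[210], small[211])
```

If this reading is right, the gap between neighbouring values narrows as n grows because the denominator keeps increasing, which is a property of the formula rather than of rounding.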

I raised this issue on Twitter in this thread. I am mainly opening this issue so that I can make the PR per the contribution guidelines.

Your Environment

  • spaCy version: 2.0.18
  • Platform: Darwin-18.2.0-x86_64-i386-64bit (macOS Mojave 10.14.3)
  • Python version: 3.7.1
  • Models: en
bug

All 5 comments

Thanks for this!

I was sure I replied to that comment on Twitter, but I don't see it there. I guess I must not have. Maybe I lost connection after typing the tweet. Sorry!

Anyway, I would've encouraged you to open an issue, and noted that this type of thing is one of my weaker points.

@honnibal Here it is btw – Twitter just makes it difficult to find nested replies: https://twitter.com/honnibal/status/1100848503759216640

I have pushed a branch and opened a PR (slk/issue#3447); you can see the change I've made to util.py. The change brings the model for calculating the linear series more in line with how the other methods (i.e. compounding, stepping) calculate and return a series, and simplifies the logic so that it is a truly linear series.
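For readers who have not opened the PR, a truly linear schedule can be sketched roughly as follows. This is an illustration of the idea described above, not a copy of the change in util.py; the name `linear_decay` is hypothetical, and the sketch assumes the start value is at or above the stop value.

```python
from itertools import islice


def linear_decay(start, stop, decay):
    """Illustrative linear schedule: yield the start value first, then
    subtract `decay` each step, never dropping below `stop`.
    Assumes start >= stop."""
    current = float(start)
    while True:
        yield max(current, stop)
        current -= decay


# A step of 0.25 is exactly representable in binary, so these values are exact.
values = list(islice(linear_decay(1.0, 0.2, 0.25), 6))
print(values)  # [1.0, 0.75, 0.5, 0.25, 0.2, 0.2]
```

Yielding before subtracting means the first value is the start value itself, which is what the docs example asserts.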

I think this does address the core problem with how the method was originally defined, but it doesn't address the error I mentioned above about floating-point inaccuracy. Here are the outputs for my version of decaying (new) and the current version of decaying (old).

decaying(10., 1., .001)

| 'old' | 'new' |
|---|---|
| 9.990009990009991 | 10.0 |
| 9.980039920159681 | 9.999 |
| 9.970089730807578 | 9.998000000000001 |
| 9.9601593625498 | 9.997000000000002 |
| 9.950248756218906 | 9.996000000000002 |
| 9.940357852882704 | 9.995000000000003 |
| 9.9304865938431 | 9.994000000000003 |
| 9.920634920634921 | 9.993000000000004 |
| 9.910802775024777 | 9.992000000000004 |
| 9.900990099009901 | 9.991000000000005 |
| 9.891196834817015 | 9.990000000000006 |
| 9.881422924901186 | 9.989000000000006 |
| 9.87166831194472 | 9.988000000000007 |
| 9.861932938856016 | 9.987000000000007 |
| 9.852216748768473 | 9.986000000000008 |
| 9.84251968503937 | 9.985000000000008 |
| 9.832841691248772 | 9.984000000000009 |
| 9.823182711198427 | 9.98300000000001 |
| 9.813542688910697 | 9.98200000000001 |
| 9.803921568627452 | 9.98100000000001 |
| 9.794319294809013 | 9.980000000000011 |
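The growing tail digits in the 'new' column are what you would expect when a value like 0.001, which has no exact binary representation, is subtracted repeatedly. One possible way to limit that drift, sketched here purely as an illustration and not as part of the PR, is to compute each value from the step index and round it to a fixed number of decimal places. The function name and the `ndigits` parameter are hypothetical.

```python
from itertools import islice


def linear_decay_indexed(start, stop, decay, ndigits=6):
    """Illustrative variant: derive each value from the step index instead of
    accumulating subtractions, then round to `ndigits` decimal places.
    Assumes start >= stop."""
    n = 0
    while True:
        # One multiplication and one subtraction per value, so rounding
        # error does not build up from step to step.
        value = round(start - n * decay, ndigits)
        yield max(value, stop)
        n += 1


values = list(islice(linear_decay_indexed(10.0, 1.0, 0.001), 4))
print(values)  # [10.0, 9.999, 9.998, 9.997]
```

Whether the rounding is worth it depends on how the values are consumed; for something like a dropout rate, an error in the fifteenth decimal place is unlikely to matter.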

Merged, thanks!

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
