Scikit-learn: Fix documentation of default values in all classes

Created on 2 Dec 2019 · 118 comments · Source: scikit-learn/scikit-learn

Description

The documentation of default values in many classes is missing, inconsistently written, or out of date. I would like to gather a few people to work on the default-value documentation for every class, as there are a ton of classes where these issues exist. I have been told that default values should be documented as "default=<'value'>", so I am creating this issue under that assumption.

Solution

Here are a few things that I have seen for the parameters which should be changed:

  • Parameters that do not mention a default at all should be checked against the code, as a few are missing this entirely
  • "optional" should be changed to "default=<'value'>"
  • Make sure default values are documented consistently within each class, i.e. change everything to the format "default=<'value'>"
  • Modify a single file per PR

If a few people work on a few classes each, then this should be done in no time! These should all be fairly simple fixes.

#### Examples
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
The link above is an example where some parameters say "optional" without indicating the default, and the parameters that do indicate default values document them inconsistently.
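To make the target format concrete, here is a hypothetical numpydoc-style docstring in the requested convention. The class and parameter names below are made up for illustration; they are not an actual scikit-learn class.

```python
# Old style to avoid:  n_clusters : int, optional (default=2)
# New style:           n_clusters : int, default=2
class ToyEstimator:
    """A made-up estimator illustrating the default-value convention.

    Parameters
    ----------
    n_clusters : int, default=2
        The number of clusters to find.
    linkage : {'ward', 'complete', 'average', 'single'}, default='ward'
        Which linkage criterion to use.
    connectivity : array-like or callable, default=None
        Connectivity constraints; None means no structure is imposed.
    """

    def __init__(self, n_clusters=2, linkage='ward', connectivity=None):
        self.n_clusters = n_clusters
        self.linkage = linkage
        self.connectivity = connectivity
```

The key point is that the documented default matches the `__init__` signature exactly, and the word "optional" no longer appears.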

Sprint good first issue


All 118 comments

Hello @cgsavard, I would like to work on this. Can I start looking at the AgglomerativeClustering class?

@vachanda Go for it! We can continue posting here which ones we work on so that others know.

Thanks for coordinating this @cgsavard

Note to contributors: please follow the guidelines under: https://scikit-learn.org/stable/developers/contributing.html#guidelines-for-writing-documentation

@cgsavard, Is there a list of classes that have discrepancies or do we have to go through each of them and update them?

@vachanda I do not have a list, unfortunately. I have just been going through the files and seeing what needs to be updated.

I am working on AffinityPropagation, SpectralCoclustering, SpectralBiclustering, and Birch.

I am working on FeatureAgglomeration, KMeans, and MiniBatchKMeans.

Logically speaking, if a param is optional, shouldn't the default be None always? Having a parameter with a default value other than None suggests it should be required.

If there is a default, this usually means that the literature has found this to be a sensible default value which also suggests that this parameter has an impact on performance and thus it shouldn't be optional, but should just mention what the default is. Those seem closer to required parameters by definition, we just happened to make a sensible choice for the user so they can change it or not.

Or more practically speaking, are there currently any optional parameters that we've found which have numeric default values, but for which specifying None will raise an exception? That would also suggest that the parameter is actually required, but that a sensible default has been chosen based on literature/research.

Or maybe I've been confusing the meaning of required and optional all these years? Lol. Would definitely love to help on this either way!

@jmwoloso We were really inconsistent regarding the usage of optional and therefore we recently decided to remove it.

I want to contribute as well. Can I go ahead with this?

@glemaitre ok, that definitely makes sense. So then we're removing the optional verbiage altogether, right, while also noting default values in the docstrings?

Should each of these that we find be opened as a separate issue, or how are we staging all of this work, since multiple people are working on multiple things related to this single issue?

@cyrus303 @jmwoloso You can take a class (a module at most) and correct it. The idea is to remove the optional and add a default when there is one (there usually is one). Since we are touching the documentation, we should make sure that the style of each line follows our new style guide: https://scikit-learn.org/dev/developers/contributing.html#guidelines-for-writing-documentation

You can mention which class/module you are working on and link to a PR, to avoid duplicate effort :). Looking forward to reviewing it.

Hey! I will work on tree classes (tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor).

I will also fix this issue for the neighbors module.

I'll take the ensemble module.

@glemaitre any preference on bool vs. boolean? seeing a mix of both in ensemble, even in the same class. might as well get those in shape while i'm doing defaults.

EDIT:

ditto for int vs integer. I'm assuming int on that one, but wanted to confirm.

EDIT (again):

also seeing docstrings with inconsistent values relative to the __init__ signature for that class, e.g.:

min_impurity_split for RandomForestClassifier

the __init__ signature has min_impurity_split=None while the docstring says min_impurity_split : float, (default=0). I would assume we update the docstring to match the signature, since we'd want to keep the behavior of the class consistent (i.e. we want the same defaults passed in upon instantiation)?

@jmwoloso Could you refer to https://scikit-learn.org/stable/developers/contributing.html#guidelines-for-writing-documentation. Basically, you should default to the Python type name (bool, str, int, float).

the __init__ signature has min_impurity_split=None while the docstring says min_impurity_split : float, (default=0). I would assume we update the docstring to match the signature, since we'd want to keep the behavior of the class consistent (i.e. we want the same defaults passed in upon instantiation)?

We should match the parameter in the function signature. This parameter's default value has changed and the docstring was not updated.
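One way to spot such signature/docstring mismatches is to compare each parameter's default in the `__init__` signature against the `default=` annotation parsed out of the docstring. This is a rough stdlib sketch, not scikit-learn tooling: the helper names and the regex are my own, and the `Example` class below is a toy with a deliberate mismatch.

```python
import inspect
import re


def signature_defaults(cls):
    """Map each __init__ parameter to its actual default value."""
    sig = inspect.signature(cls.__init__)
    return {name: p.default
            for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}


def docstring_defaults(cls):
    """Map each documented parameter to the default claimed in its docstring."""
    pattern = re.compile(r"^\s*(\w+)\s*:\s*.*default=([^\s,)]+)", re.MULTILINE)
    return dict(pattern.findall(cls.__doc__ or ""))


class Example:
    """Toy class with one deliberate mismatch.

    Parameters
    ----------
    alpha : float, default=0.5
        Regularization strength.
    beta : float, default=0
        Documented default disagrees with the signature below.
    """

    def __init__(self, alpha=0.5, beta=None):
        self.alpha, self.beta = alpha, beta


# Report parameters whose documented default disagrees with the signature.
for name, actual in signature_defaults(Example).items():
    claimed = docstring_defaults(Example).get(name)
    if claimed is not None and claimed not in (repr(actual), str(actual)):
        print(f"{name}: docstring says default={claimed}, signature has {actual!r}")
# prints: beta: docstring says default=0, signature has None
```

The string comparison is deliberately loose (it compares against both `repr` and `str` of the actual default), so it only flags obvious disagreements; a real audit would still need a human to confirm each hit.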

Hi @cgsavard , I'd like to contribute but this is going to be my first time so need some hand holding. I'm quite familiar with python, somewhat handy with text editors and recently went through the fork -> clone -> edit -> PR workflow tutorial here. Please advise next step... Thank you!

Hi @cgsavard ,
Can I please work on Imputer?

Hi @cgsavard , I want to work on linear_model class.

I am also working on Neural Network, Decomposition, Feature Extraction, Metrics and Preprocess classes.

Can someone please check my PR #15964 and see why codecov is failing? This is my first contribution. Please guide me.

Ignore codecov. This is a false positive since we don't touch code. I will review the PR soon.


I just made my first contribution #15988

I will take the naive_bayes module.

I just made my first contribution #16019

Hi All, working on sklearn/neighbors, thank you.

Contributed to sklearn/semi_supervised. Thanks!

Hi @cgsavard , I'd like to contribute too, i will take sklearn/svm module. Thanks

Contributed to sklearn/semi_supervised. Thanks!
Are there some further edits needed on PR #16042?

@glemaitre in #16105, I had to dig a little deep into constructs to fetch default values, docstrings seemed inaccurate and outdated at times.

Also, I tried to use a less ambiguous, concise, and mathematically rigorous way of defining parameter ranges. For example, I changed positive float to float in (0, inf], or 0 <= shrinkage <= 1 to float in (0, 1). Long story short, I did the best I could to be concise and accurate, but please pay 5% more attention when reviewing this PR. Thanks.

@cgsavard, this is a very nice issue for a sprint! If you are ok with that, I'm planning to add it to our sprint list. I have summarized the classes that have already been addressed by a PR, and their corresponding PRs, here.
Do you mind linking the gist in the issue description? This will make the information available from the beginning. May I also ask you to clarify in the description that each PR should address one file (at most one module) at a time, as explained here? This will really help contributors and reviewers! Thanks a lot!

For those interested in this issue, the command

git grep "optional.*default"

will output the files still affected by this problem (thanks @ogrisel! :) ).
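For those who prefer staying in Python, the same search can be done with the stdlib. This is just a sketch: the default root path and helper name are assumptions, and it searches whole file contents rather than docstrings specifically, exactly like the `git grep` above.

```python
import re
from pathlib import Path

# Same pattern as the git grep: lines still using the old "optional ... default" wording.
PATTERN = re.compile(r"optional.*default")


def affected_files(root="sklearn"):
    """Yield .py files under `root` that still match the old wording."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if PATTERN.search(text):
            yield path


# Usage (from a scikit-learn checkout):
#     for p in affected_files():
#         print(p)
```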

@cgsavard Hello, I would like to work on model_selection @WiMLDS

@lopusz and I want to work on random_projection.py

Have fun, everyone!

@adrinjalali @noatamir @WiMLDS

@ETay203 and I would like to work on mean_shift @WiMLDS_Berlin sprint.

@magda-zielinska and I want to work on pipeline.py

@adrinjalali @noatamir @WiMLDS

@lopusz and @magda-zielinska and I want to work on kernel_approximation.py

I'm going to tackle the _optics.py now

Reopening: was closed by "Fixes" keyword in #16216.

Reopening: was closed by "Fixes" keyword in #16207

I'm going to tackle the sklearn/linear_model/_coordinate_descent.py now

I cleaned base.py and submitted a PR

I cleaned discriminant_analysis.py and submitted a PR

I will look now at sklearn/gaussian_process/*.py

There is already a long PR for the GPs @lopusz :)

@lopusz my apologies, that PR was touching other issues of the GP module, you can go ahead and work on it if you don't mind :)

@adrinjalali Thank you for keeping an eye on it!

Indeed, I have not scanned the open PRs well enough, so the fact that GPs are not taken is more of an accident ;)

I will make sure to keep track of what is PRed.

And yes, a PR for the GPs is coming ;)

Is there anything else to be done here?

I am working on sklearn/decomposition/_dict_learning.py

What is left to do? I'm open to help...

Figuring out what's left is probably a good place to start helping :)

Hi, I've been looking through to see what's left, I think there are still some updates to make in some of the modules looked at previously.
I was going to work through these, starting with the cluster module and could raise a PR for each module as I go along?
This is my first contribution so please let me know if I'm not following the process correctly etc.
Thanks!

This is the list of functions, classes and modules left to fix:

  • [x] sklearn.feature_selection.SelectorMixin
  • [x] sklearn.config_context
  • [x] sklearn.set_config
  • [x] sklearn.calibration.CalibratedClassifierCV
  • [x] sklearn.cluster.OPTICS
  • [x] sklearn.cluster.SpectralClustering
  • [x] sklearn.cluster.affinity_propagation
  • [x] sklearn.cluster.cluster_optics_dbscan
  • [x] sklearn.cluster.cluster_optics_xi
  • [x] sklearn.cluster.compute_optics_graph
  • [x] sklearn.cluster.mean_shift
  • [x] sklearn.cluster.spectral_clustering
  • [x] sklearn.cluster.ward_tree
  • [x] sklearn.cross_decomposition.CCA
  • [x] sklearn.cross_decomposition.PLSCanonical
  • [x] sklearn.cross_decomposition.PLSRegression
  • [x] sklearn.cross_decomposition.PLSSVD
  • [x] sklearn.datasets
  • [x] sklearn.decomposition
  • [x] sklearn.dummy
  • [x] sklearn.ensemble.HistGradientBoostingClassifier (experimental)
  • [x] sklearn.ensemble.HistGradientBoostingRegressor (experimental)
  • [x] sklearn.feature_extraction.image.grid_to_graph
  • [x] sklearn.feature_extraction.image.img_to_graph
  • [x] sklearn.feature_extraction.text.CountVectorizer
  • [x] sklearn.feature_extraction.text.HashingVectorizer
  • [x] sklearn.feature_selection
  • [x] sklearn.impute
  • [x] sklearn.inspection.partial_dependence
  • [x] sklearn.inspection.permutation_importance
  • [x] sklearn.inspection.PartialDependenceDisplay
  • [x] sklearn.inspection.plot_partial_dependence
  • [x] sklearn.isotonic.IsotonicRegression
  • [x] sklearn.isotonic.check_increasing
  • [x] sklearn.isotonic.isotonic_regression
  • [x] sklearn.kernel_approximation
  • [x] sklearn.kernel_ridge
  • [x] sklearn.linear_model.PassiveAggressiveClassifier
  • [x] sklearn.linear_model.LassoLars
  • [x] sklearn.linear_model.OrthogonalMatchingPursuit
  • [x] sklearn.linear_model.HuberRegressor
  • [x] sklearn.linear_model.RANSACRegressor
  • [x] sklearn.linear_model.TheilSenRegressor
  • [x] sklearn.linear_model.PassiveAggressiveRegressor
  • [x] sklearn.linear_model.orthogonal_mp
  • [x] sklearn.linear_model.orthogonal_mp_gram
  • [x] sklearn.manifold
  • [x] sklearn.metrics (except sklearn.metrics.confusion_matrix, sklearn.metrics.roc_auc_score, sklearn.metrics.max_error sklearn.metrics.mean_poisson_deviance, sklearn.metrics.mean_gamma_deviance, sklearn.metrics.mean_tweedie_deviance, sklearn.metrics.plot_confusion_matrix, sklearn.metrics.plot_precision_recall_curve)
  • [x] sklearn.mixture
  • [x] sklearn.model_selection.GridSearchCV
  • [x] sklearn.model_selection.ParameterGrid
  • [x] sklearn.model_selection.ParameterSampler
  • [x] sklearn.model_selection.RandomizedSearchCV
  • [x] sklearn.model_selection.fit_grid_point
  • [x] sklearn.multiclass
  • [x] sklearn.multioutput
  • [x] sklearn.neural_network
  • [x] sklearn.preprocessing
  • [x] sklearn.random_projection
  • [x] sklearn.tree.export_graphviz
  • [x] sklearn.tree.export_text
  • [x] sklearn.tree.plot_tree
  • [x] sklearn.utils

Hope I am not missing anything.

Hi. I'll go try make a pass in the feature_selection documentation

We take the sklearn.mixture part

Taking cross_decomposition part

For the 2020 Scikit-Learn Sprint, @icoder18 and I are taking the sklearn.random_projection part

@adrinjalali we completed sklearn/mixture

Working on the sklearn.linear_model for the sprint with @genvalen

Take sklearn.calibration.CalibratedClassifierCV

Working on this for sklearn.utils.validation

Next we'll be tackling sklearn.utils.random

working on sklearn.impute

Working on sklearn.tree.plot_tree

Table 14 will take sklearn.neural_network

Take sklearn.kernel_approximation

Taking sklearn.inspection

Table 14 will take sklearn.preprocessing

Taking datasets

Taking sklearn.mixture #17509

List updated.

Thank you all!

Taking sklearn.metrics for sprint

Taking model_selection module

@glemaitre Can we update the description of this issue to note that it would be best to submit one file at a time?

Hello, I would like to contribute. It is my first time though, and it is not clear to me how I can know which modules still have work to be done. Thanks!

https://github.com/scikit-learn/scikit-learn/issues/15761#issuecomment-639461778 contains the list of modules left to fix.

Thanks. Take sklearn.decomposition then.

I am working on 'sklearn.isotonic.isotonic_regression'

I am working on 'sklearn.multiclass.py'

Hi, may I try to take the remaining work on sklearn.tree? This would be my first time contributing as well.

Thanks for checking in, great to have your help! Please proceed; I think all of our sprint updates have been closed out.


Hi, I would like to make my first contribution. Can I take sklearn.multioutput?

I'll continue with sklearn.utils, starting with _encode.py

I am working on sklearn/decomposition/_dict_learning.py

I'm working on sklearn.kernel_ridge in the sprint

Hi, I will start working on sklearn.feature_extraction.image.img_to_graph

I am working on sklearn.feature_extraction.text.CountVectorizer

I am working on sklearn.kernel_ridge

I am working on sklearn.ensemble.HistGradientBoostingRegressor

"I am working on this"

on this? @Hoda1394

"I am working on this"

on this? @Hoda1394

@TahiriNadia corrected.

@cgsavard Hey, Can I work on this? I'm a first-timer

I'll work on the files in sklearn.datasets.

Can I work on sklearn.linear_model._least_angle.py?

@glemaitre I'm working on sklearn.linear_model._least_angle.py and I found an inconsistency in the use of method='lar': sometimes it is written lars and sometimes lar. This inconsistency is also in the code (not only in the documentation). I can see that lars is the right one; could you confirm it, and I will make a PR.

working on 'sklearn/ensemble/_hist_gradient_boosting/binning.py'

Files that need changes:

sklearn/_config.py
sklearn/dummy.py
sklearn/multioutput.py
sklearn/linear_model/_huber.py
sklearn/linear_model/_theil_sen.py
sklearn/linear_model/_ridge.py
sklearn/linear_model/_omp.py
sklearn/linear_model/_sag.py
sklearn/externals/_lobpcg.py
sklearn/utils/extmath.py
sklearn/utils/__init__.py
sklearn/utils/graph.py
sklearn/utils/_mocking.py
sklearn/utils/sparsefuncs.py
sklearn/neighbors/_base.py
sklearn/gaussian_process/_gpc.py
sklearn/gaussian_process/kernels.py
sklearn/model_selection/_validation.py
~sklearn/decomposition/_fastica.py~
~sklearn/decomposition/_dict_learning.py~
~sklearn/decomposition/_factor_analysis.py~
~sklearn/decomposition/_incremental_pca.py~
~sklearn/decomposition/_lda.py~
~sklearn/decomposition/_pca.py~
~sklearn/decomposition/_truncated_svd.py~
~sklearn/decomposition/_sparse_pca.py~
~sklearn/decomposition/_nmf.py~
sklearn/manifold/_mds.py
sklearn/manifold/_spectral_embedding.py
sklearn/manifold/_t_sne.py
sklearn/ensemble/_hist_gradient_boosting/grower.py
sklearn/ensemble/_hist_gradient_boosting/binning.py
sklearn/metrics/_ranking.py
sklearn/tree/_classes.py
sklearn/preprocessing/_discretization.py
sklearn/preprocessing/_encoders.py line 620
sklearn/neural_network/_multilayer_perceptron.py line 1054
sklearn/covariance/_robust_covariance.py

Please check whether someone is already working on, or has already worked on, the file you chose.

@sadakmed, for all the "decomposition files", there is an ongoing pull request #17739.

working on "gaussian_process.GaussianProcessRegressor" and "neighbors._base.py"

Hi, I am new, and I would like to start contributing. Do you still need some help on this issue? is there any file you still need help with?

Hey @boricles!

Have a look to https://github.com/scikit-learn/scikit-learn/issues/15761#issuecomment-639461778 for a list with the modules still to be fixed.

@alfaro96 thanks. I did a quick look just now. I will select a module tonight, and work on it.

I am working on sklearn/config_context

Hey, thought I'd see if I could help with the docs.

@alfaro96 I'd like to work on sklearn.feature_extraction.text.CountVectorizer, if it hasn't already been taken, especially because I've personally encountered some pitfalls when working with Vectorizers in the past.

Also, I noticed that although sklearn.model_selection.learning_curve was updated, there's an out-of-date tutorial using the old documentation, should I leave it be? Or is it worth updating?

Hi @alfaro96,

after edits:
I see sklearn.config_context and sklearn.set_config from sklearn/_config.py were fixed, so they can be checked off the task list.

I would like to work on sklearn.utils. I saw only one instance of parameter documentation where 'optional' is used. That means I need to fix only that instance, correct? It is in sklearn.utils._mocking.py

Hey, thought I'd see if I could help with the docs.

Hey @madprogramer,

@alfaro96 I'd like to work on sklearn.feature_extraction.text.CountVectorizer, if it hasn't already been taken, especially because I've personally encountered some pitfalls when working with Vectorizers in the past.

~I have had a look at the checklist and the sklearn.feature_extraction.text.CountVectorizer reference, and it does not seem to be fixed. A PR would be welcome.~

Edit: The sklearn.feature_extraction.text.CountVectorizer is already fixed.

Also, I noticed that although sklearn.model_selection.learning_curve was updated, there's an out-of-date tutorial using the old documentation, should I leave it be? Or is it worth updating?

It is worth updating, although this should be done in a separate PR.

Thank you!

Hi @alfaro96,

Hey @haiatn,

after edits:
I see sklearn.config_context and sklearn.set_config from sklearn/_config.py were fixed, so they can be checked off the task list.

I have updated the checklist.

I would like to work on sklearn.utils. I saw only one instance of parameter documentation where 'optional' is used. That means I need to fix only that instance, correct? It is in sklearn.utils._mocking.py

That is the idea, although the classes in the sklearn.utils._mocking.py file are not part of the public API, so I do not think it is worth updating them.

Nevertheless, it would be nice if you could work on any of the other functions, classes, and modules that are still pending.

Thank you!

I looked at the checklist. From what I saw, the following can be checked off:

  • sklearn.feature_extraction.image.img_to_graph
  • sklearn.isotonic.IsotonicRegression
  • sklearn.isotonic.check_increasing
  • I did not find the file sklearn.ensemble.HistGradientBoostingRegressor but all of sklearn.ensemble is OK

Can I work on sklearn.manifold._spectral_embedding and sklearn.feature_extraction.text.HashingVectorizer? I will do them in separate PRs. I think they are the only files left that need fixing (assuming sklearn.feature_extraction.text.CountVectorizer is taken).

I looked at the checklist. From what I saw the following can be checked from the checklist:

  • sklearn.feature_extraction.image.img_to_graph
  • sklearn.isotonic.IsotonicRegression
  • sklearn.isotonic.check_increasing

Thank you @haiatn, I have updated the checklist.

  • I did not find the file sklearn.ensemble.HistGradientBoostingRegressor but all of sklearn.ensemble is OK

The sklearn.ensemble.HistGradientBoostingClassifier and sklearn.ensemble.HistGradientBoostingRegressor are in this file: scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py. However, they have been already fixed.

Can I work on sklearn.manifold._spectral_embedding and sklearn.feature_extraction.text.HashingVectorizer? I will do them in separate PRs. I think they are the only files left that need fixing (assuming sklearn.feature_extraction.text.CountVectorizer is taken).

I have had a look at the sklearn.manifold module and sklearn.feature_extraction.text.HashingVectorizer, and they have already been fixed (I have updated the checklist accordingly).

Nevertheless, there are several functions in the sklearn.utils module that should be still fixed.

Thank you @haiatn, we really appreciate your help!

I will now work on sklearn.utils._estimator_html_repr, sklearn.utils.deprecation and sklearn.utils._testing

I will finish sklearn.utils. There are only 3 files I found that need fixing.

hey @alfaro96 ,
could you review my open pull requests? I think they are the last ones.

#18360 #18385 #18386

Hey @haiatn!

I have already had a look at your open PRs.

Thank you!

Now that we merged what's left of sklearn.utils, and it was the last item on the checklist, are we finished?

There is one last open pull request, #18025; then this issue can eventually be closed.

Hello,
I want to start contributing. Is there any class whose default-value documentation still needs fixing? If so, I can take it up.

Hey, I'm new to open source and looking forward to fixing docs. Is anything left that still needs to be fixed?
