Commit 36322ce

[dask][docs] initial setup for Dask docs (#3822)
* initial Dask docs
* fix MRO
* address review comments
1 parent 98a85a8 commit 36322ce

12 files changed: 50 additions, 16 deletions

README.md

Lines changed: 0 additions & 2 deletions
@@ -87,8 +87,6 @@ ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning
 
 LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net
 
-Dask-LightGBM (distributed and parallel Python-package): https://github.com/dask/dask-lightgbm
-
 Ruby gem: https://github.com/ankane/lightgbm
 
 LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j

docs/FAQ.rst

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ You may also ping a member of the core team according to the relevant area of ex
 - `@chivee <https://github.com/chivee>`__ **Qiwei Ye** (C++ code / Python-package)
 - `@btrotta <https://github.com/btrotta>`__ **Belinda Trotta** (C++ code)
 - `@Laurae2 <https://github.com/Laurae2>`__ **Damien Soukhavong** (R-package)
-- `@jameslamb <https://github.com/jameslamb>`__ **James Lamb** (R-package)
+- `@jameslamb <https://github.com/jameslamb>`__ **James Lamb** (R-package / Dask-package)
 - `@wxchan <https://github.com/wxchan>`__ **Wenxuan Chen** (Python-package)
 - `@henry0312 <https://github.com/henry0312>`__ **Tsukasa Omoto** (Python-package)
 - `@StrikerRUS <https://github.com/StrikerRUS>`__ **Nikita Titov** (Python-package)

docs/Parallel-Learning-Guide.rst

Lines changed: 1 addition & 3 deletions
@@ -7,7 +7,7 @@ Follow the `Quick Start <./Quick-Start.rst>`__ to know how to use LightGBM first
 
 **List of external libraries in which LightGBM can be used in a distributed fashion**
 
-- `Dask-LightGBM`_ allows to create ML workflow on Dask distributed data structures.
+- `Dask API of LightGBM <./Python-API.rst#dask-api>`__ (formerly it was a separate package) allows to create ML workflow on Dask distributed data structures.
 
 - `MMLSpark`_ integrates LightGBM into Apache Spark ecosystem.
   `The following example`_ demonstrates how easy it's possible to utilize the great power of Spark.
@@ -134,8 +134,6 @@ Example
 
 - `A simple parallel example`_
 
-.. _Dask-LightGBM: https://github.com/dask/dask-lightgbm
-
 .. _MMLSpark: https://aka.ms/spark
 
 .. _The following example: https://github.com/Azure/mmlspark/blob/master/notebooks/samples/LightGBM%20-%20Quantile%20Regression%20for%20Drug%20Discovery.ipynb

docs/Python-API.rst

Lines changed: 10 additions & 0 deletions
@@ -33,6 +33,16 @@ Scikit-learn API
     LGBMRegressor
     LGBMRanker
 
+Dask API
+--------
+
+.. autosummary::
+    :toctree: pythonapi/
+
+    DaskLGBMClassifier
+    DaskLGBMRegressor
+    DaskLGBMRanker
+
 Callbacks
 ---------
 

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@
 
 # -- mock out modules
 MOCK_MODULES = ['numpy', 'scipy', 'scipy.sparse',
-                'sklearn', 'matplotlib', 'pandas', 'graphviz']
+                'sklearn', 'matplotlib', 'pandas', 'graphviz', 'dask', 'dask.distributed']
 for mod_name in MOCK_MODULES:
     sys.modules[mod_name] = Mock()
 
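
The `docs/conf.py` change relies on a common Sphinx trick: placing a mock object in `sys.modules` so that `import` statements succeed during the docs build even when the real package is absent. The sketch below illustrates the mechanism with hypothetical module names (`fake_heavy_dep` is not a real package). It also suggests why `dask.distributed` is presumably listed separately from `dask`: a `Mock` has no `__path__`, so a dotted submodule import only resolves if the submodule has its own `sys.modules` entry.

```python
import sys
from unittest.mock import Mock

# Hypothetical module names, chosen so they cannot clash with real packages.
MOCK_MODULES = ['fake_heavy_dep', 'fake_heavy_dep.distributed']
for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = Mock()

# Both imports succeed although nothing named fake_heavy_dep is installed,
# because the import system finds the entries in sys.modules first.
import fake_heavy_dep
from fake_heavy_dep.distributed import Client

# Attribute access on a Mock returns another Mock, so `Client` resolves.
print(isinstance(Client, Mock))
```

With this in place, autodoc can import a module whose top-level imports would otherwise fail on the documentation builder.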

python-package/README.rst

Lines changed: 11 additions & 1 deletion
@@ -183,12 +183,22 @@ Run ``python setup.py install --bit32``, if you want to use 32-bit version. All
 
 If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__) and then just run ``python setup.py install --precompile``.
 
-
 Build Wheel File
 ****************
 
 You can use ``python setup.py bdist_wheel`` instead of ``python setup.py install`` to build wheel file and use it for installation later. This might be useful for systems with restricted or completely without network access.
 
+Install Dask-package
+''''''''''''''''''''
+
+To install all additional dependencies required for Dask-package, you can append ``[dask]`` to LightGBM package name:
+
+.. code:: sh
+
+    pip install lightgbm[dask]
+
+Or replace ``python setup.py install`` with ``pip install -e .[dask]`` if you are installing the package from source files.
+
 Troubleshooting
 ---------------
 

python-package/lightgbm/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -19,6 +19,10 @@
                            plot_tree, create_tree_digraph)
 except ImportError:
     pass
+try:
+    from .dask import DaskLGBMRegressor, DaskLGBMClassifier, DaskLGBMRanker
+except ImportError:
+    pass
 
 
 dir_path = os.path.dirname(os.path.realpath(__file__))
@@ -31,5 +35,6 @@
            'register_logger',
            'train', 'cv',
            'LGBMModel', 'LGBMRegressor', 'LGBMClassifier', 'LGBMRanker',
+           'DaskLGBMRegressor', 'DaskLGBMClassifier', 'DaskLGBMRanker',
            'print_evaluation', 'record_evaluation', 'reset_parameter', 'early_stopping',
            'plot_importance', 'plot_split_value_histogram', 'plot_metric', 'plot_tree', 'create_tree_digraph']
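
One consequence of combining a swallowed `ImportError` with an unconditional `__all__` entry is worth keeping in mind: if the optional import fails, the name is listed but not defined. The sketch below reproduces that situation on a toy module built in memory (`toy_pkg` and its attribute names are hypothetical, and this is not a test of lightgbm itself). It shows that a star-import then raises `AttributeError`, while ordinary attribute checks still work.

```python
import sys
import types

# Build a throwaway module whose __all__ promises a name it does not define,
# mimicking an optional import that failed silently.
toy = types.ModuleType('toy_pkg')  # hypothetical package name
toy.present = 1
toy.__all__ = ['present', 'missing_optional']
sys.modules['toy_pkg'] = toy

namespace = {}
try:
    exec('from toy_pkg import *', namespace)
    star_import_error = None
except AttributeError as err:
    star_import_error = err

print(type(star_import_error).__name__)

# Callers can probe for the optional name instead of star-importing.
print(hasattr(toy, 'present'), hasattr(toy, 'missing_optional'))
```

Explicit imports such as `from toy_pkg import present` are unaffected, so the issue only surfaces for code that relies on `import *`.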

python-package/lightgbm/compat.py

Lines changed: 9 additions & 0 deletions
@@ -105,3 +105,12 @@ def _check_sample_weight(sample_weight, X, dtype=None):
     _LGBMAssertAllFinite = None
     _LGBMCheckClassificationTargets = None
     _LGBMComputeSampleWeight = None
+
+"""dask"""
+try:
+    from dask import array
+    from dask import dataframe
+    from dask.distributed import Client
+    DASK_INSTALLED = True
+except ImportError:
+    DASK_INSTALLED = False
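
The `compat.py` addition follows the optional-dependency flag pattern: attempt the import once, record the outcome in a module-level boolean, and let consumers check the flag before using the dependency. A minimal sketch of the pattern, using the standard-library `json` module as the "installed" case and a hypothetical module name as the "missing" case (the class and flag names here are illustrative, not LightGBM's):

```python
# Flag for a dependency that is available (json ships with Python).
try:
    import json
    JSON_INSTALLED = True
except ImportError:
    JSON_INSTALLED = False

# Flag for a dependency that is absent (hypothetical module name).
try:
    import surely_not_an_installed_module_xyz
    EXTRA_INSTALLED = True
except ImportError:
    EXTRA_INSTALLED = False


class NeedsBoth:
    """Illustrative class that refuses to construct without its dependencies."""

    def __init__(self):
        # all(...) is False as soon as any single flag is False.
        if not all((JSON_INSTALLED, EXTRA_INSTALLED)):
            raise RuntimeError('json and the extra module are required')


try:
    NeedsBoth()
    constructed = True
except RuntimeError as err:
    constructed = False
    message = str(err)

print(JSON_INSTALLED, EXTRA_INSTALLED, constructed)
```

Deferring the failure to construction time keeps `import` of the package cheap and lets the error message name exactly what is missing.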

python-package/lightgbm/dask.py

Lines changed: 8 additions & 4 deletions
@@ -21,7 +21,8 @@
 from dask import delayed
 from dask.distributed import Client, default_client, get_worker, wait
 
-from .basic import _ConfigAliases, _LIB, _log_warning, _safe_call
+from .basic import _ConfigAliases, _LIB, _log_warning, _safe_call, LightGBMError
+from .compat import DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED
 from .sklearn import LGBMClassifier, LGBMRegressor, LGBMRanker
 
 
@@ -393,6 +394,9 @@ def _predict(model, data, raw_score=False, pred_proba=False, pred_leaf=False, pr
 
 
 class _LGBMModel:
+    def __init__(self):
+        if not all((DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED)):
+            raise LightGBMError('dask, pandas and scikit-learn are required for lightgbm.dask')
 
     def _fit(self, model_factory, X, y=None, sample_weight=None, group=None, client=None, **kwargs):
         """Docstring is inherited from the LGBMModel."""
@@ -431,7 +435,7 @@ def _copy_extra_params(source, dest):
             setattr(dest, name, attributes[name])
 
 
-class DaskLGBMClassifier(_LGBMModel, LGBMClassifier):
+class DaskLGBMClassifier(LGBMClassifier, _LGBMModel):
     """Distributed version of lightgbm.LGBMClassifier."""
 
     def fit(self, X, y=None, sample_weight=None, client=None, **kwargs):
@@ -479,7 +483,7 @@ def to_local(self):
         return self._to_local(LGBMClassifier)
 
 
-class DaskLGBMRegressor(_LGBMModel, LGBMRegressor):
+class DaskLGBMRegressor(LGBMRegressor, _LGBMModel):
     """Docstring is inherited from the lightgbm.LGBMRegressor."""
 
     def fit(self, X, y=None, sample_weight=None, client=None, **kwargs):
@@ -515,7 +519,7 @@ def to_local(self):
         return self._to_local(LGBMRegressor)
 
 
-class DaskLGBMRanker(_LGBMModel, LGBMRanker):
+class DaskLGBMRanker(LGBMRanker, _LGBMModel):
     """Docstring is inherited from the lightgbm.LGBMRanker."""
 
     def fit(self, X, y=None, sample_weight=None, init_score=None, group=None, client=None, **kwargs):
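
The "fix MRO" part of this commit swaps the order of the base classes. In Python, that order determines the method resolution order (MRO), and therefore which `__init__` is found first. The toy classes below (stand-ins, not the LightGBM classes) sketch the effect. Whether the guard in `_LGBMModel.__init__` is actually reached in the real Dask estimators depends on whether the scikit-learn wrappers' `__init__` chain calls `super().__init__()`, which this diff does not show.

```python
class Estimator:
    """Stand-in for a scikit-learn style estimator."""

    def __init__(self):
        self.initialised_by = 'Estimator'


class GuardMixin:
    """Stand-in for a mixin that defines its own __init__."""

    def __init__(self):
        self.initialised_by = 'GuardMixin'


class MixinFirst(GuardMixin, Estimator):
    pass


class EstimatorFirst(Estimator, GuardMixin):
    pass


# The leftmost base wins when both define the same method.
print([cls.__name__ for cls in MixinFirst.__mro__])
print([cls.__name__ for cls in EstimatorFirst.__mro__])
print(MixinFirst().initialised_by, EstimatorFirst().initialised_by)
```

Neither toy `__init__` calls `super().__init__()`, so only the first one in the MRO runs; cooperative `super()` calls would be needed for both to execute.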

python-package/lightgbm/engine.py

Lines changed: 2 additions & 2 deletions
@@ -334,15 +334,15 @@ def _make_n_folds(full_data, folds, nfold, params, seed, fpreproc=None, stratifi
                                              "xe_ndcg", "xe_ndcg_mart", "xendcg_mart"}
                for obj_alias in _ConfigAliases.get("objective")):
             if not SKLEARN_INSTALLED:
-                raise LightGBMError('Scikit-learn is required for ranking cv.')
+                raise LightGBMError('scikit-learn is required for ranking cv')
             # ranking task, split according to groups
             group_info = np.array(full_data.get_group(), dtype=np.int32, copy=False)
             flatted_group = np.repeat(range(len(group_info)), repeats=group_info)
             group_kfold = _LGBMGroupKFold(n_splits=nfold)
             folds = group_kfold.split(X=np.zeros(num_data), groups=flatted_group)
         elif stratified:
             if not SKLEARN_INSTALLED:
-                raise LightGBMError('Scikit-learn is required for stratified cv.')
+                raise LightGBMError('scikit-learn is required for stratified cv')
             skf = _LGBMStratifiedKFold(n_splits=nfold, shuffle=shuffle, random_state=seed)
             folds = skf.split(X=np.zeros(num_data), y=full_data.get_label())
         else:

python-package/lightgbm/sklearn.py

Lines changed: 1 addition & 1 deletion
@@ -289,7 +289,7 @@ def __init__(self, boosting_type='gbdt', num_leaves=31, max_depth=-1,
         and you should group grad and hess in this way as well.
         """
         if not SKLEARN_INSTALLED:
-            raise LightGBMError('Scikit-learn is required for this module')
+            raise LightGBMError('scikit-learn is required for lightgbm.sklearn')
 
         self.boosting_type = boosting_type
         self.objective = objective

python-package/setup.py

Lines changed: 1 addition & 1 deletion
@@ -344,7 +344,7 @@ def run(self):
           extras_require={
               'dask': [
                   'dask[array]>=2.0.0',
-                  'dask[dataframe]>=2.0.0'
+                  'dask[dataframe]>=2.0.0',
                   'dask[distributed]>=2.0.0',
                   'pandas',
               ],
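
The one-character change in `setup.py` matters because Python concatenates adjacent string literals. Without the comma, the two requirement strings silently merge into one list element instead of raising a syntax error. A small sketch of the before and after (the variable names are illustrative):

```python
# Before the fix: the missing comma joins two literals into one string.
without_comma = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0'
    'dask[distributed]>=2.0.0',
    'pandas',
]

# After the fix: four separate requirement strings.
with_comma = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0',
    'dask[distributed]>=2.0.0',
    'pandas',
]

print(len(without_comma), len(with_comma))
print(without_comma[1])
```

The merged string is not a meaningful requirement specifier, so the `dask` extra would not have declared its dependencies as intended.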
