Decision tree vs. Naive Bayes classifier [closed]

Decision Trees are very flexible, easy to understand, and easy to debug. They work with both classification problems and regression problems. So whether you are trying to predict a categorical value like (red, green, up, down) or a continuous value like 2.9 or 3.4, Decision Trees will handle both … Read more
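As a quick illustration of that flexibility, here is a minimal sketch (my own, not from the answer) showing the same estimator family in scikit-learn handling both problem types; the toy data is made up:

```python
# Sketch: one tree family, two problem types; toy data is invented.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 1], [2, 2], [3, 3]]

# Classification: predict a categorical label such as "red"/"green".
clf = DecisionTreeClassifier().fit(X, ["red", "red", "green", "green"])
print(clf.predict([[2.5, 2.5]]))  # likely ['green'] on this toy split

# Regression: predict a continuous value such as 2.9 or 3.4.
reg = DecisionTreeRegressor().fit(X, [0.5, 1.4, 2.9, 3.4])
print(reg.predict([[2.5, 2.5]]))
```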

Plot Interactive Decision Tree in Jupyter Notebook

Updated Answer with collapsible graph using d3js in Jupyter Notebook. Start of 1st cell in notebook: %%html <div id="d3-example"></div> <style> .node circle { cursor: pointer; stroke: #3182bd; stroke-width: 1.5px; } .node text { font: 10px sans-serif; pointer-events: none; text-anchor: middle; } line.link { fill: none; stroke: #9ecae1; stroke-width: 1.5px; } </style> End of 1st cell … Read more

Different decision tree algorithms with comparison of complexity or performance

Decision Tree implementations differ primarily along these axes:
- the splitting criterion (i.e., how "variance" is calculated)
- whether it builds models for regression (continuous variables, e.g., a score) as well as classification (discrete variables, e.g., a class label)
- the technique used to eliminate/reduce over-fitting
- whether it can handle incomplete data

The major Decision Tree implementations are: ID3, or … Read more
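To make the first axis concrete, here is a minimal sketch (mine, not from the answer) comparing the two splitting criteria scikit-learn's CART implementation exposes; the iris dataset stands in for any classification task:

```python
# Sketch: compare "gini" vs. "entropy" splitting criteria in sklearn's CART.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(criterion, scores.mean())
```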

How do I find which attributes my tree splits on, when using scikit-learn?

Directly from the documentation ( http://scikit-learn.org/0.12/modules/tree.html ): from io import StringIO out = StringIO() out = tree.export_graphviz(clf, out_file=out) The standalone StringIO module was removed in Python 3; import StringIO from the io module instead, as shown above. There is also the tree_ attribute on your decision tree object, which allows direct access to the whole structure. And you can simply read … Read more
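A minimal sketch (not from the answer) of reading split attributes straight from that tree_ structure, assuming a fitted classifier named clf:

```python
# Sketch: walk tree_ to list the feature and threshold at each split.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

t = clf.tree_
for node in range(t.node_count):
    if t.children_left[node] != t.children_right[node]:  # internal node
        print(f"node {node}: split on feature {t.feature[node]} "
              f"at threshold {t.threshold[node]:.3f}")
    else:
        print(f"node {node}: leaf")
```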

Visualizing decision tree in scikit-learn

Here is a one-liner for those who are using Jupyter and sklearn (0.18.2+). You don’t even need matplotlib for that. The only requirement is graphviz: pip install graphviz then run (according to the code in the question, X is a pandas DataFrame) from graphviz import Source from sklearn import tree Source( tree.export_graphviz(dtreg, out_file=None, feature_names=X.columns)) This will display it in … Read more
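For context, a self-contained sketch of the same one-liner; dtreg and the DataFrame are made-up stand-ins for the question's objects, and rendering also requires the system Graphviz binary, not just the pip package:

```python
# Sketch: fit a toy regressor and render it inline in a notebook.
import pandas as pd
from graphviz import Source
from sklearn import tree
from sklearn.tree import DecisionTreeRegressor

X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
y = [0.5, 1.4, 2.9, 3.4]
dtreg = DecisionTreeRegressor(max_depth=2).fit(X, y)

graph = Source(tree.export_graphviz(dtreg, out_file=None,
                                    feature_names=X.columns))
graph  # last expression in a cell renders inline; graph.render("tree") saves a file
```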

How do I solve overfitting in random forest of Python sklearn?

I would agree with @Falcon w.r.t. the dataset size. It’s likely that the main problem is the small size of the dataset. If possible, the best thing you can do is get more data: the more data you have, the less likely (generally) the model is to overfit, as random patterns that appear predictive start to get drowned … Read more
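When more data isn’t an option, constraining the trees is the usual lever. A minimal sketch (my own, not from the answer) of the sklearn hyperparameters most often tuned against overfitting:

```python
# Sketch: rein in a random forest with the usual regularizing knobs.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=200,      # more trees stabilizes the ensemble average
    max_depth=8,           # cap tree depth so leaves stay general
    min_samples_leaf=5,    # forbid leaves fitted to a handful of points
    max_features="sqrt",   # decorrelate trees by subsampling features
    random_state=0,
)
# clf.fit(X_train, y_train); then compare train vs. validation scores to
# check whether the gap (the overfitting symptom) has narrowed.
```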

What does `sample_weight` do to the way a `DecisionTreeClassifier` works in sklearn?

Some quick preliminaries: Let’s say we have a classification problem with K classes. In a region of feature space represented by a node of a decision tree, recall that the “impurity” of the region is measured by quantifying its inhomogeneity, using the probability of each class in that region. Normally, we estimate: Pr(Class=k) = #(examples … Read more
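A minimal sketch (mine, not the answer’s) of the weighted estimate the answer is building toward: with sample_weight, class counts become weight sums, so Pr(Class=k) = sum(w_i for i in class k) / sum(all w_i), and impurity is computed from those weighted probabilities:

```python
# Sketch: Gini impurity from weighted class probabilities.
import numpy as np

y = np.array([0, 0, 0, 1])          # three of class 0, one of class 1
w = np.array([1.0, 1.0, 1.0, 3.0])  # upweight the lone class-1 example

def gini(labels, weights):
    total = weights.sum()
    probs = np.array([weights[labels == k].sum() / total
                      for k in np.unique(labels)])
    return 1.0 - (probs ** 2).sum()

print(gini(y, np.ones_like(w)))  # unweighted: p = (0.75, 0.25) -> 0.375
print(gini(y, w))                # weighted:   p = (0.5, 0.5)   -> 0.5
```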

Passing categorical data to Sklearn Decision Tree

(This is just a reformat of my comment above from 2016… it still holds true.) The accepted answer for this question is misleading. As it stands, sklearn decision trees do not handle categorical data – see issue #5442. The recommended approach of using Label Encoding converts the categories to integers, which the DecisionTreeClassifier() will treat as numeric. If … Read more
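The usual workaround is one-hot encoding, so the tree sees indicator columns rather than fake ordinal integers. A minimal sketch (not part of the answer) with pandas.get_dummies and made-up data:

```python
# Sketch: one-hot encode a categorical column before fitting the tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({"color": ["red", "green", "red", "blue"],
                   "size":  [1, 3, 2, 3]})
y = [0, 1, 0, 1]

X = pd.get_dummies(df, columns=["color"])  # color_blue, color_green, color_red
clf = DecisionTreeClassifier().fit(X, y)
print(list(X.columns))
```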

How to extract the decision rules from scikit-learn decision-tree?

I believe that this answer is more correct than the other answers here: from sklearn.tree import _tree def tree_to_code(tree, feature_names): tree_ = tree.tree_ feature_name = [ feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!" for i in tree_.feature ] print("def tree({}):".format(", ".join(feature_names))) def recurse(node, depth): indent = " " * depth if tree_.feature[node] != _tree.TREE_UNDEFINED: … Read more
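Since scikit-learn 0.21 there is also a built-in sklearn.tree.export_text that covers the common case of dumping the rules as plain text; a minimal sketch (mine, not the answer’s):

```python
# Sketch: scikit-learn's built-in rule dump (export_text, >= 0.21) as a
# lighter alternative to the hand-rolled recursion above.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))
```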