You must work on this homework by yourself. This assignment is neither a paired lab nor a group project.
Please download and unpack this tar file:
There are three files in this tar file:
Please run the decision tree algorithm we discussed in class to build a tree that classifies the blue and red points in each of the two pictures below:
Blue points are labeled as C1 in the csv file and -1 in the code. Red points are labeled as C2 in the csv file and 1 in the code.
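As a small illustration of the labeling convention above, the mapping could be written as follows. This is only a sketch: the names LABEL_TO_CODE and encode_labels are hypothetical and are not part of the provided files.

```python
# Hypothetical mapping from the class names in the csv file
# to the numeric labels used in the code.
LABEL_TO_CODE = {"C1": -1, "C2": 1}   # C1 = blue, C2 = red


def encode_labels(raw_labels):
    """Convert an iterable of 'C1'/'C2' strings into a list of -1/1 labels."""
    return [LABEL_TO_CODE[label] for label in raw_labels]
```

For example, encode_labels(["C1", "C2"]) gives one blue label followed by one red label.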
First complete step 1 of assignment 7, then copy get_lines() and calculate_line() into assignment 8 with minor modifications. For example, in calculate_line():
(1) use X and Y instead of self.X and self.Y
(2) remove self.rule and use only the Gini index to calculate impurity
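For reference, the Gini index of a set of labels is 1 minus the sum of the squared class proportions, and a candidate line can be scored by the size-weighted average of the two sides. The sketch below assumes labels are plain Python sequences; the function names gini and weighted_gini are hypothetical and do not reflect the signature of calculate_line() in the provided code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left_labels, right_labels):
    """Impurity of a candidate split: size-weighted average of both sides."""
    n = len(left_labels) + len(right_labels)
    if n == 0:
        return 0.0
    return (len(left_labels) * gini(left_labels)
            + len(right_labels) * gini(right_labels)) / n
```

A node whose points all share one color has impurity 0, and an even mix of the two colors has impurity 0.5, which is the maximum for two classes.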
In the train() method, the code calls self.split() to build the tree recursively from the root. Your main job in this assignment is to complete self.split().
Here is the pseudocode for split(node):
Find the best line for this node. # You should copy this part of the code from Bestline.py
Set the split_point of the current node to the best line.
Create a mask to split the data of the node.
Use mask and ~mask to initialize two Nodes (from X, Y, and depth+1) as the left child and the right child.
Call self.split() on the left or right child only if all of the following hold for that child:
(1) get_number_of_points() of this child is larger than self.min_samples_split
(2) depth of this child is less than self.max_depth
(3) get_impurity() of this child is not 0 (not all points in this child have the same color)
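The pseudocode above can be sketched as a standalone function. This is a minimal illustration under several assumptions, not the expected solution: the Node class here is hypothetical, find_best_line() is a simple stand-in for the search in Bestline.py, split_point is stored as an (axis, threshold) pair, and the points above the threshold are arbitrarily assigned to the left child.

```python
import numpy as np


def gini(Y):
    """Gini impurity of a label array with values -1 (blue) and 1 (red)."""
    if len(Y) == 0:
        return 0.0
    p_red = np.mean(Y == 1)
    return 1.0 - p_red ** 2 - (1.0 - p_red) ** 2


class Node:
    """Hypothetical node; the Node class in the assignment may differ."""

    def __init__(self, X, Y, depth):
        self.X, self.Y, self.depth = X, Y, depth
        self.split_point = None          # (axis, threshold) once split
        self.left = self.right = None

    def get_number_of_points(self):
        return len(self.Y)

    def get_impurity(self):
        return gini(self.Y)


def find_best_line(X, Y):
    """Stand-in for the Bestline.py search: try the midpoints between
    sorted coordinates on each axis and keep the lowest weighted Gini."""
    best, best_score = None, np.inf
    for axis in range(X.shape[1]):
        values = np.unique(X[:, axis])
        for threshold in (values[:-1] + values[1:]) / 2:
            mask = X[:, axis] > threshold
            score = (mask.sum() * gini(Y[mask])
                     + (~mask).sum() * gini(Y[~mask])) / len(Y)
            if score < best_score:
                best, best_score = (axis, threshold), score
    return best


def split(node, min_samples_split=2, max_depth=5):
    """Recursively split a node, following the three stopping conditions."""
    line = find_best_line(node.X, node.Y)
    if line is None:                     # all points share the same coordinates
        return
    node.split_point = line
    axis, threshold = line
    mask = node.X[:, axis] > threshold
    node.left = Node(node.X[mask], node.Y[mask], node.depth + 1)
    node.right = Node(node.X[~mask], node.Y[~mask], node.depth + 1)
    for child in (node.left, node.right):
        if (child.get_number_of_points() > min_samples_split
                and child.depth < max_depth
                and child.get_impurity() != 0):
            split(child, min_samples_split, max_depth)
```

The default values of min_samples_split and max_depth in this sketch are placeholders. In the assignment they are read from self.min_samples_split and self.max_depth.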
Three parameters are passed on the command line to control your code:
data
sample
depth
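A minimal sketch of how these three parameters might be parsed with argparse is shown below. The provided DecisionTree.py may already handle this differently. The default values and the mapping of sample to min_samples_split and depth to max_depth are assumptions, not taken from the provided code.

```python
import argparse


def parse_args(argv=None):
    """Parse the -data, -sample and -depth options (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="Decision tree on 2-D points")
    parser.add_argument("-data", type=int, default=1,
                        help="which data set to load")
    parser.add_argument("-sample", type=int, default=2,
                        help="assumed to set min_samples_split")
    parser.add_argument("-depth", type=int, default=5,
                        help="assumed to set max_depth")
    return parser.parse_args(argv)
```

Calling parse_args(["-data", "1", "-sample", "3", "-depth", "2"]) returns a namespace with data=1, sample=3 and depth=2.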
You are expected to generate the following pictures and text output with the corresponding commands:
python DecisionTree.py -data 1
x > 0.240
| x > 0.374
| | x > 0.521
| | | y > 0.534 -> Class: red
| | | y < 0.534 -> Class: red
| | x < 0.521 -> Class: blue
| x < 0.374 -> Class: red
x < 0.240 -> Class: blue
python DecisionTree.py -data 1 -sample 3 -depth 2
x > 0.240
| x > 0.374 -> Class: red
| x < 0.374 -> Class: red
x < 0.240 -> Class: blue
python DecisionTree.py -data 2
y > 0.605 -> Class: blue
y < 0.605
| y > 0.249
| | x > 0.689 -> Class: red
| | x < 0.689
| | | x > 0.221
| | | | x > 0.558
| | | | | y > 0.477 -> Class: red
| | | | | y < 0.477 -> Class: red
| | | | x < 0.558
| | | | | x > 0.264 -> Class: blue
| | | | | x < 0.264
| | | | | | x > 0.243 -> Class: blue
| | | | | | x < 0.243 -> Class: blue
| | | x < 0.221 -> Class: blue
| y < 0.249
| | y > 0.210
| | | y > 0.215 -> Class: red
| | | y < 0.215 -> Class: red
| | y < 0.210 -> Class: red
python DecisionTree.py -data 2 -sample 20 -depth 10
y > 0.605 -> Class: blue
y < 0.605
| y > 0.249
| | x > 0.689 -> Class: red
| | x < 0.689
| | | x > 0.221 -> Class: blue
| | | x < 0.221 -> Class: blue
| y < 0.249 -> Class: red
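The indented text format in the outputs above could be produced by a short recursive function. The sketch below uses a hypothetical, simplified Node with a label field for leaves. The provided code may already include its own printing routine, so this is only meant to show how the format is structured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical printable node; not the Node class from the assignment."""
    split_point: Optional[Tuple[str, float]] = None  # e.g. ("x", 0.240)
    left: Optional["Node"] = None     # points above the threshold
    right: Optional["Node"] = None    # points below the threshold
    label: Optional[str] = None       # color reported for a leaf


def format_tree(node, indent=""):
    """Return the tree as a list of text lines, one branch per line."""
    lines = []
    axis, threshold = node.split_point
    for op, child in ((">", node.left), ("<", node.right)):
        line = f"{indent}{axis} {op} {threshold:.3f}"
        if child.split_point is None:             # leaf: print its class
            lines.append(f"{line} -> Class: {child.label}")
        else:                                     # internal node: recurse
            lines.append(line)
            lines.extend(format_tree(child, indent + "| "))
    return lines
```

Each internal node contributes one line per branch, and every extra level of depth adds one "| " prefix.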
You need to submit these files to the dropbox on Canvas:
Please DO NOT change the names of the files you downloaded.