CMPT 311 Machine Learning

Spring 2022


Assignment 8 - Decision Tree

100 points

Due Apr 11th, 2022, 10 AM

How Homework Is Managed and Graded

You must complete this homework on your own. This assignment is neither a paired lab nor a group project.

Code

Please download and unpack this tar file:

hw8.tar

There are three files in this tar file:

Instructions

Use the decision tree algorithm discussed in class to build a tree that classifies the blue and red points in each of the two pictures below:

Blue points are labeled C1 in the CSV file and -1 in the code. Red points are labeled C2 in the CSV file and 1 in the code.
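As an illustration of this label encoding, here is a minimal sketch that reads points and maps C1 to -1 and C2 to 1. It assumes each CSV row is "x,y,label" with no header; the function name load_points and the sample rows are hypothetical, and the provided files may be laid out differently.

```python
import csv
import io

import numpy as np

# Hypothetical CSV layout: x, y, label (no header). The real hw8 files may differ.
CSV_TEXT = """0.10,0.20,C1
0.55,0.40,C2
0.30,0.80,C1
"""

# Encoding stated in the assignment: blue C1 -> -1, red C2 -> 1.
LABELS = {"C1": -1, "C2": 1}


def load_points(fileobj):
    """Return (X, Y): X is an (n, 2) float array, Y holds -1 (blue) or 1 (red)."""
    rows = [row for row in csv.reader(fileobj) if row]
    X = np.array([[float(r[0]), float(r[1])] for r in rows])
    # An unexpected label raises KeyError instead of being silently mislabeled.
    Y = np.array([LABELS[r[2].strip()] for r in rows])
    return X, Y


X, Y = load_points(io.StringIO(CSV_TEXT))
print(Y.tolist())  # [-1, 1, -1]
```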

First complete step 1 of Assignment 7, then copy get_lines() and calculate_line() into Assignment 8 with small modifications. For example, in calculate_line():

(1) use X and Y instead of self.X and self.Y
(2) remove self.rule and use only the Gini index to calculate impurity
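For reference, the Gini impurity of a set of points is 1 minus the sum of the squared class proportions, so a pure set scores 0 and a 50/50 blue/red mix scores 0.5. The sketch below computes it for labels -1 and 1 and scores a candidate line by the size-weighted Gini of its two sides. The names gini and split_impurity are hypothetical, and weighting by side size is a common convention; check it against the version of calculate_line() from class.

```python
import numpy as np


def gini(Y):
    """Gini impurity 1 - (p_red^2 + p_blue^2) of labels in {-1 (blue), 1 (red)}."""
    n = len(Y)
    if n == 0:
        return 0.0
    p_red = np.count_nonzero(Y == 1) / n
    p_blue = 1.0 - p_red
    return 1.0 - (p_red ** 2 + p_blue ** 2)


def split_impurity(Y, mask):
    """Weighted Gini of splitting Y into Y[mask] and Y[~mask] (assumed scoring rule)."""
    n = len(Y)
    left, right = Y[mask], Y[~mask]
    return len(left) / n * gini(left) + len(right) / n * gini(right)


Y = np.array([-1, -1, 1, 1])
print(gini(Y))                                                  # 0.5
print(split_impurity(Y, np.array([True, True, False, False])))  # 0.0
```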

In the train() method, the code calls self.split() to build the tree recursively, starting from the root. Your main job in this assignment is only to complete self.split().

Here is the pseudocode for split(node):

Find the best line for this node.  # Copy this part of the code from Bestline.py
Update the split_point of the current node to the best line.

Create a mask to split the data of the node.
Use mask and ~mask to select the X and Y of two new Nodes with depth + 1; these become the left child and the right child.
Call self.split() on the left or right child only if all of the following hold:
  (1) get_number_of_points() of this child is larger than self.min_samples_split
  (2) the depth of this child is less than self.max_depth
  (3) get_impurity() of this child is not 0 (not all points in this child have the same color)
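The pseudocode above can be sketched in Python as follows. This is a self-contained illustration under stated assumptions, not the provided skeleton: the Node fields, the DecisionTree constructor, and find_best_line are hypothetical stand-ins. In particular, find_best_line simply tries axis-aligned thresholds at midpoints between consecutive distinct coordinates, which may differ from the line search in Bestline.py, and it assumes the mask selects points with feature > threshold.

```python
import numpy as np


def gini(Y):
    """Gini impurity of labels in {-1, 1}."""
    if len(Y) == 0:
        return 0.0
    p = np.count_nonzero(Y == 1) / len(Y)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


class Node:
    """Hypothetical node holding its data, depth, split_point and children."""

    def __init__(self, X, Y, depth):
        self.X, self.Y, self.depth = X, Y, depth
        self.split_point = None  # (feature index, threshold) once the node is split
        self.left = self.right = None

    def get_number_of_points(self):
        return len(self.Y)

    def get_impurity(self):
        return gini(self.Y)


def find_best_line(X, Y):
    """Hypothetical stand-in for Bestline.py: lowest weighted-Gini axis-aligned threshold."""
    best, best_score = None, np.inf
    n = len(Y)
    for feature in range(X.shape[1]):
        values = np.unique(X[:, feature])  # sorted distinct coordinates
        for threshold in (values[:-1] + values[1:]) / 2:
            mask = X[:, feature] > threshold
            score = (mask.sum() * gini(Y[mask]) + (~mask).sum() * gini(Y[~mask])) / n
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


class DecisionTree:
    def __init__(self, min_samples_split=1, max_depth=np.inf):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth

    def train(self, X, Y):
        self.root = Node(X, Y, depth=0)
        self.split(self.root)
        return self.root

    def split(self, node):
        line = find_best_line(node.X, node.Y)
        if line is None:  # every point has identical coordinates: no line separates them
            return
        node.split_point = line
        feature, threshold = line
        mask = node.X[:, feature] > threshold
        node.left = Node(node.X[mask], node.Y[mask], node.depth + 1)
        node.right = Node(node.X[~mask], node.Y[~mask], node.depth + 1)
        for child in (node.left, node.right):
            # Recurse only when all three stopping conditions from the pseudocode allow it.
            if (child.get_number_of_points() > self.min_samples_split
                    and child.depth < self.max_depth
                    and child.get_impurity() != 0):
                self.split(child)
```

Because every threshold lies strictly between two distinct coordinates, both children are non-empty and smaller than their parent, so the recursion always terminates.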

Three command-line parameters control your code:

  1. data

    • required
    • self.index in the code
    • the index of the dataset
  2. sample

    • optional, default value 1
    • self.min_samples_split in the code
    • minimal number of samples for splitting: a node is split further only if it has more than this many points
  3. depth

    • optional, default value np.inf
    • self.max_depth in the code
    • maximal depth for splitting: a node is split further only if its depth is less than this value
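To show how the three parameters map onto the attributes above, here is a minimal sketch using Python's argparse, which accepts single-dash option names such as -data. Whether the provided DecisionTree.py parses its arguments this way is an assumption, and parse_args is a hypothetical name.

```python
import argparse

import numpy as np


def parse_args(argv=None):
    """Parse -data, -sample and -depth as described in the assignment."""
    parser = argparse.ArgumentParser(description="Decision tree on a 2-D dataset")
    parser.add_argument("-data", type=int, required=True,
                        help="index of the dataset (self.index)")
    parser.add_argument("-sample", type=int, default=1,
                        help="split a node only if it has more points than this "
                             "(self.min_samples_split)")
    # argparse does not apply type=int to a non-string default, so np.inf stays a float.
    parser.add_argument("-depth", type=int, default=np.inf,
                        help="split a node only if its depth is below this "
                             "(self.max_depth)")
    return parser.parse_args(argv)


args = parse_args(["-data", "1", "-sample", "3", "-depth", "2"])
print(args.data, args.sample, args.depth)  # 1 3 2
```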

Tasks

Generate the pictures and text output with the corresponding commands, as follows:

python DecisionTree.py -data 1

x > 0.240
| x > 0.374
| | x > 0.521
| | | y > 0.534 -> Class: red
| | | y < 0.534 -> Class: red
| | x < 0.521 -> Class: blue
| x < 0.374 -> Class: red
x < 0.240 -> Class: blue

python DecisionTree.py -data 1 -sample 3 -depth 2

x > 0.240
| x > 0.374 -> Class: red
| x < 0.374 -> Class: red
x < 0.240 -> Class: blue

python DecisionTree.py -data 2

y > 0.605 -> Class: blue
y < 0.605
| y > 0.249
| | x > 0.689 -> Class: red
| | x < 0.689
| | | x > 0.221
| | | | x > 0.558
| | | | | y > 0.477 -> Class: red
| | | | | y < 0.477 -> Class: red
| | | | x < 0.558
| | | | | x > 0.264 -> Class: blue
| | | | | x < 0.264
| | | | | | x > 0.243 -> Class: blue
| | | | | | x < 0.243 -> Class: blue
| | | x < 0.221 -> Class: blue
| y < 0.249
| | y > 0.210
| | | y > 0.215 -> Class: red
| | | y < 0.215 -> Class: red
| | y < 0.210 -> Class: red

python DecisionTree.py -data 2 -sample 20 -depth 10

y > 0.605 -> Class: blue
y < 0.605
| y > 0.249
| | x > 0.689 -> Class: red
| | x < 0.689
| | | x > 0.221 -> Class: blue
| | | x < 0.221 -> Class: blue
| y < 0.249 -> Class: red
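The text format in these listings can be read as follows: each line is one branch, prefixed by one "| " per level of depth, and a branch that ends in a leaf carries " -> Class: ...". The sketch below prints a hand-built tree in that format, inferred only from the examples above. The TreeNode class and tree_lines function are hypothetical; the provided code probably already includes its own printer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    label: Optional[str] = None          # set on leaves: "red" or "blue"
    feature: Optional[str] = None        # "x" or "y" on internal nodes
    threshold: float = 0.0
    greater: Optional["TreeNode"] = None  # branch for feature > threshold
    less: Optional["TreeNode"] = None     # branch for feature < threshold


def tree_lines(node, depth=0):
    """Yield one line per branch, with the class appended when the branch is a leaf."""
    for op, child in ((">", node.greater), ("<", node.less)):
        line = "| " * depth + f"{node.feature} {op} {node.threshold:.3f}"
        if child.label is not None:
            yield line + f" -> Class: {child.label}"
        else:
            yield line
            yield from tree_lines(child, depth + 1)


# The "-data 1 -sample 3 -depth 2" tree from the listing above, rebuilt by hand.
root = TreeNode(feature="x", threshold=0.240,
                greater=TreeNode(feature="x", threshold=0.374,
                                 greater=TreeNode(label="red"),
                                 less=TreeNode(label="red")),
                less=TreeNode(label="blue"))
print("\n".join(tree_lines(root)))
```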

What to Submit

You need to submit these files to the dropbox on Canvas:

Please DO NOT change the names of the files you downloaded.