Written Assignment 3 (Total Points: 40)
Requirements:
1. The written assignment will be graded based on correctness, accuracy, and clarity.
2. Please prepare your answers in a Word document and submit the final assignment as a PDF file through the assignment link.
3. Please explicitly indicate the IDs of the questions you are answering for ease of grading.
4. Even if your answer is not correct, you may still receive partial marks based on your calculation/analysis process. Please present the necessary calculation process, if any.
5. Any required assignment submitted after its due date is subject to a grade reduction. In particular, late homework/assignments will be penalized by 10% (of the maximum points) for each day late and will not be accepted after 48 hours. For example, if a submission is 2 days late, the maximum obtainable score is 80 points (out of 100).
Problem 1. (20 points)
Consider the following set of points. Please use SVM to train a classifier and then classify the testing data points.
Training data (ai = 1 means the point is a support vector):
ID | ai | X1 | X2 | y |
1 | 1 | 1 | 2 | 1 |
2 | 1 | 2 | 1 | -1 |
3 | 1 | 0 | 1 | 1 |
4 | 0 | 1 | -2 | -1 |
5 | 0 | 5 | 9 | 1 |
6 | 0 | 6 | 2 | -1 |
7 | 0 | 3 | 9 | 1 |
Testing data:
ID | X1 | X2 | y |
8 | 5 | 9 | |
9 | 1 | -2 | |
- (10 points) Find the decision boundary. Show the detailed calculation process.
- (10 points) Use the decision boundary you found to classify the testing data. Show the detailed calculation process, including the intermediate results and the formulas you used.
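One way to sanity-check the hand calculation for Problem 1 is the sketch below. It assumes a linear, hard-margin SVM in which every point marked ai = 1 lies exactly on the margin, so that w . x_i + b = y_i holds for each support vector. The helper names fit_from_support_vectors and classify are illustrative and do not come from any library. It is not a substitute for the written derivation.

```python
import numpy as np

# Support vectors (training rows with ai = 1) and their labels.
support_X = np.array([[1.0, 2.0], [2.0, 1.0], [0.0, 1.0]])
support_y = np.array([1.0, -1.0, 1.0])


def fit_from_support_vectors(X, y):
    """Solve w . x_i + b = y_i for the support vectors.

    Assumption: linear, hard-margin SVM with all listed points on the margin.
    Works here because 3 support vectors give 3 equations in (w1, w2, b).
    """
    A = np.hstack([X, np.ones((len(X), 1))])  # columns: x1, x2, constant term
    solution = np.linalg.solve(A, y)
    return solution[:-1], solution[-1]


def classify(points, w, b):
    """Return sign(w . x + b) per point (0 means the point is on the boundary)."""
    return np.sign(points @ w + b).astype(int)


w, b = fit_from_support_vectors(support_X, support_y)

# Testing data: IDs 8 and 9.
test_X = np.array([[5.0, 9.0], [1.0, -2.0]])
predictions = classify(test_X, w, b)
print("w =", w, "b =", b, "predictions =", predictions)
```

Solving the margin equations directly only works when the number of support vectors matches the number of unknowns; in general the dual problem has to be solved for the multipliers first.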
Problem 2. (20 points)
Consider the following data set for a binary class problem.
A | B | Class label |
T | F | + |
T | T | + |
T | T | + |
T | F | – |
T | T | + |
F | F | – |
F | F | – |
F | F | – |
T | T | – |
T | F | – |
- (10 points) Calculate the information gain when splitting on A and B. Which attribute would the decision tree induction algorithm choose?
- (10 points) Calculate the gain in the Gini index when splitting on A and B. Which attribute would the decision tree induction algorithm choose?
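The two impurity measures in Problem 2 can be checked numerically with a short sketch like the one below. It assumes the standard definitions: entropy is the negative sum of p * log2(p) over the classes, the Gini index is 1 minus the sum of p^2, and the gain of a split is the parent impurity minus the size-weighted impurity of the children. The function names entropy, gini, and split_gain are hypothetical helpers written for this illustration.

```python
from collections import Counter
from math import log2

# Rows of the Problem 2 table as (A, B, class label); "-" is the negative class.
records = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, attribute_index, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent_labels = [row[-1] for row in rows]
    children = {}
    for row in rows:
        children.setdefault(row[attribute_index], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * impurity(g) for g in children.values())
    return impurity(parent_labels) - weighted


for name, index in (("A", 0), ("B", 1)):
    print(name,
          "information gain:", round(split_gain(records, index, entropy), 4),
          "Gini gain:", round(split_gain(records, index, gini), 4))
```

Passing the impurity function as an argument keeps the split logic identical for both measures, so any difference in the chosen attribute comes from the measure itself.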