HW #2: Decision Trees

  1. Using the iris dataset (explanation, pictures), do the following:

    • write a MATLAB program that does the following for each attribute, using the training data:
      • calculate the entropy measure for all possible split points between the attribute's minimum and maximum values
      • plot entropy versus split value
    • decide which attribute, and at which split value, should be selected for the root node
  2. Download and install the WEKA machine learning software. Using the training and test data formatted for WEKA, train a C4.5 decision tree (named J48 in WEKA) on the training set.

    • compare the split calculated by WEKA at the root node with the split you found in the first part
    • report the constructed decision tree
    • report the classification result on the test set
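
For reference, one common form of the entropy measure in part 1 is sketched below. This is an assumption, not a definition taken from this handout: it takes the measure to be the weighted average class entropy of the two subsets produced by a split, and the symbols S, S_L, S_R, a, t, p_k and K are introduced here for illustration only. If the lecture notes define the measure differently, follow the lecture notes.

```latex
% S    : training set;  K : number of classes (K = 3 for iris)
% p_k  : fraction of the samples in a set that belong to class k
% Convention: 0 \log_2 0 = 0
H(S) = -\sum_{k=1}^{K} p_k \log_2 p_k

% Splitting attribute a at value t gives
%   S_L = \{ x \in S : x_a \le t \},  S_R = \{ x \in S : x_a > t \}
E(a, t) = \frac{|S_L|}{|S|}\, H(S_L) + \frac{|S_R|}{|S|}\, H(S_R)

% Under this assumption, the root node uses the pair with the lowest entropy
(a^{*}, t^{*}) = \arg\min_{a,\, t} E(a, t)
```

With this form, E(a, t) is 0 when both sides of the split contain a single class, and it equals H(S) when the split leaves the class proportions unchanged on both sides.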

You should submit:

  • all the program code that does the calculation and plotting
  • the graphs
  • a report that gives your results

Submit hard copies (to the homework box in ETA 11 PCLAB); do not submit any part of your homework by email, floppy, or CD. This homework is due Friday, March 16, at 14:00.

General policy: If you submit your homework late but by the following Monday, you receive 50% of your grade. Later submissions receive no grade, but ALL homework assignments must be completed to pass the course.