7. Decision Trees and Decision Rules

Intelligent Database Systems Lab

國立雲林科技大學

National Yunlin University of Science and Technology

GMDH-based feature ranking and selection for improved classification of medical data

Advisor : Dr. Hsu

Presenter : Yu-San Hsieh

Author : R.E. Abdel-Aal

2005, BI, pp. 456-468

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Outline

Motivation

Objective

Method

Material

Results

Conclusions


Motivation

Accuracy is critical for classifiers used in medical applications.


Objective

To improve classification performance on medical data.


Method

First stage – feature ranking

─GMDH (Group Method of Data Handling) algorithm

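1. Representation: every pair of the m input features is combined into a candidate model, giving $z_1, z_2, \dots, z_{m(m-1)/2}$ at each layer (iteration).

2. Selection and stopping: each candidate is scored on the selection data by its squared error $r_1^2, r_2^2, \dots, r_{m(m-1)/2}^2$; the best candidate of each iteration has the minimum error $r_{\min}^2$.

[Figure: squared error vs. iteration, tracking $r_{\min}$ across layers]

An increasing $r_{\min}$ means the model is becoming too complex and is

1. overfitting the estimation data, and

2. performing poorly on the new selection data.

Iteration therefore stops once $r_{\min}$ starts to increase.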
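The GMDH representation and stopping rule above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the paper's implementation: each input pair is fitted with the classic Ivakhnenko quadratic (1, a, b, a², b², ab) by least squares on the estimation data, the number of survivors passed to the next layer (`keep`) is a hypothetical parameter, and all function names are illustrative.

```python
from itertools import combinations


def quad_terms(a, b):
    # Classic Ivakhnenko partial description (assumed form): 1, a, b, a^2, b^2, ab
    return [1.0, a, b, a * a, b * b, a * b]


def lstsq(rows, y, ridge=1e-8):
    """Least-squares weights from the normal equations (X^T X + ridge*I) w = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting. The tiny ridge term
    keeps the system solvable when candidate inputs are nearly collinear."""
    p = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(p):
            if k != c:
                f = aug[k][c] / aug[c][c]
                aug[k] = [u - f * v for u, v in zip(aug[k], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def evaluate(w, xa, xb):
    # Output z of a fitted pair model for each sample
    return [sum(c * t for c, t in zip(w, quad_terms(a, b))) for a, b in zip(xa, xb)]


def gmdh_layer(est_cols, est_y, sel_cols, sel_y, keep):
    """Fit one candidate z_j per input pair (m(m-1)/2 candidates) on the estimation
    data, score each by its squared error r_j^2 on the selection data, keep the best."""
    scored = []
    for i, j in combinations(range(len(est_cols)), 2):
        w = lstsq([quad_terms(a, b) for a, b in zip(est_cols[i], est_cols[j])], est_y)
        z_sel = evaluate(w, sel_cols[i], sel_cols[j])
        r2 = sum((z - t) ** 2 for z, t in zip(z_sel, sel_y))
        scored.append((r2, evaluate(w, est_cols[i], est_cols[j]), z_sel))
    scored.sort(key=lambda s: s[0])
    best = scored[:keep]
    # The survivors' outputs become the inputs of the next layer
    return best[0][0], [s[1] for s in best], [s[2] for s in best]


def gmdh(est_cols, est_y, sel_cols, sel_y, keep=4, max_layers=10):
    """Grow layers while r_min^2 on the selection data decreases; stop once it does not.
    Returns the r_min^2 of each accepted layer."""
    history = []
    for _ in range(max_layers):
        if len(est_cols) < 2:
            break
        r2_min, next_est, next_sel = gmdh_layer(est_cols, est_y, sel_cols, sel_y, keep)
        if history and r2_min >= history[-1]:
            break  # r_min^2 stopped decreasing: further layers would overfit
        history.append(r2_min)
        est_cols, sel_cols = next_est, next_sel
    return history
```

With four inputs and a target that is exactly quadratic in the first two, the first layer already drives $r_{\min}^2$ close to zero, and the loop stops as soon as an extra layer fails to lower it.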


Method

First stage – feature ranking

─AIM (Abductory Induction Mechanism) abductive network

1. Representation

2. Selection and stopping

Avoid overfitting

Using CPM (complexity penalty multiplier) control

1. CPM > 1: simpler models that are less accurate but generalize better.

2. CPM < 1: more complex models that overfit the training data and reduce actual prediction performance.
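The effect of CPM can be illustrated with the predicted squared error (PSE) criterion that AIM is generally described as using, $\mathrm{PSE} = \mathrm{FSE} + \mathrm{CPM} \cdot 2\sigma_p^2 K / N$, where FSE is the fitting squared error on the training data, K the number of model coefficients, N the number of training samples, and $\sigma_p^2$ a prior estimate of the error variance. The sketch below assumes this form; the candidate models and numbers are hypothetical.

```python
def pse(fse, k, n, sigma_p2, cpm=1.0):
    """Predicted squared error, assumed form: PSE = FSE + CPM * 2 * sigma_p^2 * K / N.
    fse: fitting squared error on the training data, k: number of model coefficients,
    n: number of training samples, sigma_p2: prior estimate of the error variance."""
    return fse + cpm * 2.0 * sigma_p2 * k / n


def pick_model(models, n, sigma_p2, cpm=1.0):
    """Return the name of the candidate with the lowest PSE; models are (name, fse, k)."""
    return min(models, key=lambda m: pse(m[1], m[2], n, sigma_p2, cpm))[0]


# Hypothetical candidates: a small model and a larger, better-fitting one
models = [("simple", 0.30, 3), ("complex", 0.20, 12)]
print(pick_model(models, n=100, sigma_p2=0.5, cpm=1.0))  # complex
print(pick_model(models, n=100, sigma_p2=0.5, cpm=2.0))  # simple
```

With these numbers the larger model wins at CPM = 1 (PSE 0.32 vs. 0.33), while CPM = 2 penalizes its extra coefficients enough that the simpler model is selected (0.36 vs. 0.44).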


Method

Second stage – feature selection

─As the number of selected features k increases, performance on an evaluation dataset first improves and then starts to deteriorate as the model overfits the training data.

─A compact m-feature subset is obtained by taking the first m features from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the 6-feature subset is {2,6,7,8,1,5}.
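Building the compact prefix subsets and choosing k can be sketched as follows. The evaluation function `score` is a hypothetical caller-supplied stand-in (for example, classification accuracy on the evaluation dataset); the toy score used here simply peaks at k = 4 to mimic the improve-then-deteriorate behavior of the evaluation performance.

```python
def top_m(ranking, m):
    """Compact m-feature subset: the first m features of the ranking list."""
    return ranking[:m]


def best_prefix(ranking, score):
    """Evaluate every prefix size k = 1..len(ranking) and return the prefix with the
    highest score. `score` is a hypothetical evaluation function supplied by the caller."""
    prefixes = [top_m(ranking, k) for k in range(1, len(ranking) + 1)]
    return max(prefixes, key=score)


ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]
print(top_m(ranking, 6))  # [2, 6, 7, 8, 1, 5]

# Toy score that improves up to k = 4 and then deteriorates (illustration only)
print(best_prefix(ranking, lambda s: -(len(s) - 4) ** 2))  # [2, 6, 7, 8]
```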

─The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list, and selecting the best one. Ex: for the ranking list {2,6,7,8,1,5,3,4,9} and k = 6, the best subset is chosen from {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
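The search in this example can be sketched as below. It assumes, as the example suggests, that each candidate is a run of k consecutive features shifted one position down the ranking list; `score` is again a hypothetical evaluation function, and the toy score shown only illustrates the mechanics.

```python
def windows(ranking, k):
    """All runs of k consecutive features, sliding one position at a time from the top."""
    return [ranking[i:i + k] for i in range(len(ranking) - k + 1)]


def best_window(ranking, k, score):
    """Return the k-feature window with the highest score (`score` is caller-supplied)."""
    return max(windows(ranking, k), key=score)


ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]
for w in windows(ranking, 6):
    print(w)
# [2, 6, 7, 8, 1, 5]
# [6, 7, 8, 1, 5, 3]
# [7, 8, 1, 5, 3, 4]
# [8, 1, 5, 3, 4, 9]

# Toy score (illustration only): prefer windows that contain feature 9
print(best_window(ranking, 6, lambda s: 9 in s))  # [8, 1, 5, 3, 4, 9]
```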