SVM

Support Vector Machines in Marketing

Georgi Nalbantov

MICC, Maastricht University


Contents

Purpose

Linear Support Vector Machines

Nonlinear Support Vector Machines

(Theoretical justifications of SVM)

Marketing Examples

Conclusion and Q & A

(some extensions)


Purpose

Task to be solved (The Classification Task): Classify cases (customers) into “type 1” or “type 2” on the basis of some known attributes (characteristics)

Chosen tool to solve this task: Support Vector Machines

The Classification Task

Given data on explanatory and explained variables, where the explained variable can take two values $\{\pm 1\}$, find a function that gives the “best” separation between the “-1” cases and the “+1” cases:

Given: $(x_1, y_1), \dots, (x_m, y_m) \in \mathbb{R}^n \times \{\pm 1\}$

Find: $f : \mathbb{R}^n \to \{\pm 1\}$

“best function” = the expected error on unseen data $(x_{m+1}, y_{m+1}), \dots, (x_{m+k}, y_{m+k})$ is minimal

Existing techniques to solve the classification task:

Linear and Quadratic Discriminant Analysis

Logit choice models (Logistic Regression)

Decision trees, Neural Networks, Least Squares SVM


Support Vector Machines: Definition

Support Vector Machines are a non-parametric tool for classification/regression

Support Vector Machines are used for prediction rather than description purposes

Support Vector Machines have been developed by Vapnik and co-workers

Linear Support Vector Machines

A direct marketing company wants to sell a new book: “The Art History of Florence” (Nissan Levin and Jacob Zahavi in Lattin, Carroll and Green (2003)).

Problem: How to identify buyers and non-buyers using the two variables:

Months since last purchase

Number of art books purchased

[Figure: scatter plot of buyers (∆) and non-buyers (●); axes: months since last purchase and number of art books purchased]

Linear SVM: Separable Case

Main idea of SVM: separate the groups by a line.

However: there are infinitely many lines that have zero training error…

… which line shall we choose?

[Figure: buyers (∆) and non-buyers (●) with candidate separating lines; axes: months since last purchase and number of art books purchased]

SVM use the idea of a margin around the separating line.

The thinner the margin, the more complex the model.

The best line is the one with the largest margin.

[Figure: buyers (∆) and non-buyers (●) with a separating line and its margin; axes: months since last purchase and number of art books purchased]

The line having the largest margin is: $w_1 x_1 + w_2 x_2 + b = 0$

where

$x_1$ = months since last purchase

$x_2$ = number of art books purchased

Note:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ ∆

$w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ ●

[Figure: buyers (∆) and non-buyers (●) with the separating line $w_1 x_1 + w_2 x_2 + b = 0$ and the margin boundaries $w_1 x_1 + w_2 x_2 + b = 1$ and $w_1 x_1 + w_2 x_2 + b = -1$; axes: months since last purchase and number of art books purchased]
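A minimal Python sketch of how such a line is used once it is known. The weights `w`, the offset `b` and the two customers are hypothetical values chosen for illustration, not estimates from the book-club data.

```python
def decision_value(w, b, x):
    """Value of w1*x1 + w2*x2 + b for one customer x = (x1, x2)."""
    return w[0] * x[0] + w[1] * x[1] + b


def classify(w, b, x):
    """Customers on the non-negative side of the line are labelled buyers."""
    return "buyer" if decision_value(w, b, x) >= 0 else "non-buyer"


# Hypothetical line: fewer months since the last purchase and more art
# books purchased both push a customer towards the buyer side.
w = (-0.5, 1.0)
b = 0.5

recent_art_lover = (2, 3)   # x1 = 2 months, x2 = 3 art books
lapsed_customer = (10, 0)   # x1 = 10 months, x2 = 0 art books

for customer in (recent_art_lover, lapsed_customer):
    print(customer, decision_value(w, b, customer), classify(w, b, customer))
```

Customers with a decision value of at least +1 or at most -1 lie on or outside the margin boundaries; values strictly between -1 and +1 fall inside the margin.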

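The width of the margin is given by: $\dfrac{2}{\|w\|} = \dfrac{2}{\sqrt{w_1^2 + w_2^2}}$

Note: maximizing the margin is equivalent to minimizing $\tfrac{1}{2}(w_1^2 + w_2^2)$

[Figure: buyers (∆) and non-buyers (●) with the lines $w_1 x_1 + w_2 x_2 + b = 1$, $w_1 x_1 + w_2 x_2 + b = 0$ and $w_1 x_1 + w_2 x_2 + b = -1$ and the margin between them; axes: months since last purchase and number of art books purchased]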
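The margin width can be checked numerically: it is the distance between the two parallel lines $w_1 x_1 + w_2 x_2 + b = 1$ and $w_1 x_1 + w_2 x_2 + b = -1$, which equals $2 / \|w\|$. The sketch below uses hypothetical values for `w` and `b`.

```python
import math


def margin_width(w):
    """Distance between the lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.hypot(w[0], w[1])


def distance_to_line(w, b, level, x):
    """Distance from point x to the line w.x + b = level."""
    return abs(w[0] * x[0] + w[1] * x[1] + b - level) / math.hypot(w[0], w[1])


# Hypothetical line, chosen so that the arithmetic is easy to follow.
w = (0.5, 0.5)
b = -1.0

point_on_upper_boundary = (2.0, 2.0)  # satisfies w.x + b = +1
width = margin_width(w)
check = distance_to_line(w, b, -1.0, point_on_upper_boundary)
print(width, check)
```

A smaller $\|w\|$ gives a wider margin, which is why maximizing the margin and minimizing $w_1^2 + w_2^2$ lead to the same line.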

The optimization problem for SVM is:

minimize $\tfrac{1}{2}(w_1^2 + w_2^2)$ (that is, maximize the margin)

subject to:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1$ for $i \in$ ∆

$w_1 x_{j1} + w_2 x_{j2} + b \le -1$ for $j \in$ ●

[Figure: buyers (∆) and non-buyers (●) with the maximum-margin line and its margin]

“Support vectors” are those points that lie on the boundaries of the margin.

The decision surface (line) is determined only by the support vectors. All other points are irrelevant.

[Figure: buyers (∆) and non-buyers (●) with the support vectors highlighted on the margin boundaries]
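A small sketch of how support vectors can be picked out once a separating line is known: they are the training points for which $y\,(w_1 x_1 + w_2 x_2 + b)$ equals exactly 1. The four points, the labels (+1 for ∆, -1 for ●) and the line below are a hypothetical toy set, not the book-club data.

```python
def functional_margin(w, b, x, y):
    """y * (w.x + b); equals 1 on a margin boundary, more than 1 beyond it."""
    return y * (w[0] * x[0] + w[1] * x[1] + b)


def support_vectors(w, b, data, tol=1e-9):
    """Training points lying on the boundaries of the margin."""
    return [x for x, y in data if abs(functional_margin(w, b, x, y) - 1.0) <= tol]


# Hypothetical toy data: two "buyers" (+1) and two "non-buyers" (-1).
data = [((2.0, 2.0), +1), ((3.0, 3.0), +1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Largest-margin line for this toy set, worked out by hand.
w = (0.5, 0.5)
b = -1.0

sv = support_vectors(w, b, data)
print(sv)
```

Removing the two points that are not support vectors and solving again would give the same line, which is the sense in which the other points are irrelevant.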

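Linear SVM: Nonseparable Case

Non-separable case: there is no line that separates the two groups without error.

Here, SVM minimize $L(w, C)$:

$L(w, C) = \text{Complexity} + \text{Errors} = \tfrac{1}{2}(w_1^2 + w_2^2) + C \sum_i \xi_i$

The complexity term maximizes the margin; the error term minimizes the training errors. The sum runs over all training cases.

subject to:

$w_1 x_{i1} + w_2 x_{i2} + b \ge +1 - \xi_i$ for $i \in$ ∆

$w_1 x_{j1} + w_2 x_{j2} + b \le -1 + \xi_j$ for $j \in$ ●

$\xi_i, \xi_j \ge 0$

Training set: 1000 targeted customers

[Figure: overlapping buyers (∆) and non-buyers (●) with the line $w_1 x_1 + w_2 x_2 + b = 1$ marked]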
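A sketch of how the two parts of $L(w, C)$ can be evaluated for a given line, assuming the usual soft-margin form in which the error of a case is its slack $\xi = \max(0,\, 1 - y\,(w_1 x_1 + w_2 x_2 + b))$. The data, `w` and `b` are hypothetical.

```python
def slack(w, b, x, y):
    """How far a point falls short of its margin boundary (0 if it does not)."""
    return max(0.0, 1.0 - y * (w[0] * x[0] + w[1] * x[1] + b))


def objective(w, b, data, C):
    """L(w, C) = complexity + C * total slack, for a fixed line (w, b)."""
    complexity = 0.5 * (w[0] ** 2 + w[1] ** 2)
    errors = sum(slack(w, b, x, y) for x, y in data)
    return complexity + C * errors


# Hypothetical toy data; the last non-buyer sits on the buyers' side.
data = [((2.0, 2.0), +1), ((3.0, 3.0), +1), ((0.0, 0.0), -1), ((1.5, 1.5), -1)]
w = (0.5, 0.5)
b = -1.0

slacks = [slack(w, b, x, y) for x, y in data]
loss_c1 = objective(w, b, data, C=1.0)
loss_c5 = objective(w, b, data, C=5.0)
print(slacks, loss_c1, loss_c5)
```

For the same line, a larger C makes the misplaced point more expensive, so the optimizer is pushed towards lines with fewer training errors even if the margin becomes thinner.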

Linear SVM: The Role of C

Bigger C: thinner margin, smaller number of errors (better fit on the data), increased complexity

Smaller C: wider margin, bigger number of errors (worse fit on the data), decreased complexity

Vary both complexity and empirical error via C, by affecting the optimal w and the optimal number of training errors

[Figure: two panels showing the fitted line and margin for C = 5 and C = 1]

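Nonlinear SVM: Nonseparable Case

Mapping into a higher-dimensional space

Optimization task: minimize $L(w, C)$

subject to the margin constraints, now stated for the mapped data $\phi(x)$, where $\phi$ denotes the mapping into the higher-dimensional space:

$w^\top \phi(x_i) + b \ge +1 - \xi_i$ for $i \in$ ∆

$w^\top \phi(x_j) + b \le -1 + \xi_j$ for $j \in$ ●

[Figure: buyers (∆) and non-buyers (●) that cannot be separated by a line]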

Map the data into a higher-dimensional space: $\mathbb{R}^2 \to \mathbb{R}^3$

[Figure: ∆ and ● cases at the points (1,-1), (1,1), (-1,1) and (-1,-1) in the original space]

Find the optimal hyperplane in the transformed space

[Figure: the points (1,-1), (1,1), (-1,1) and (-1,-1), ∆ and ● cases, with the separating hyperplane]

Observe the decision surface in the original space (optional)

[Figure: the ∆ and ● cases with the resulting decision surface in the original space]
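The three steps can be sketched on the four corner points (1,1), (1,-1), (-1,1), (-1,-1). The assumptions are loud ones: the slides do not state which corners are ∆ and which are ●, nor which mapping is used, so the XOR-style labelling and the degree-2 map `phi` below are illustrative choices only.

```python
import math


def phi(x1, x2):
    """Hypothetical map from R^2 to R^3: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Assumed XOR-style labels: opposite corners share a class, so no single
# line in the original two-dimensional space separates the classes.
points = {(1, 1): +1, (-1, -1): +1, (1, -1): -1, (-1, 1): -1}

# Step 1: map the data into the higher-dimensional space.
mapped = {p: phi(*p) for p in points}

# Step 2: a separating hyperplane w.z + b = 0 in the transformed space.
# Only the second coordinate differs between the classes here.
w = (0.0, 1.0 / math.sqrt(2.0), 0.0)
b = 0.0


def decision_value(z):
    return sum(wi * zi for wi, zi in zip(w, z)) + b


predictions = {p: (+1 if decision_value(z) > 0 else -1) for p, z in mapped.items()}
print(predictions)

# Step 3: back in the original space the decision value is x1 * x2, so the
# decision surface is the pair of coordinate axes (x1 = 0 or x2 = 0).
```

The hyperplane is linear in the three mapped coordinates but nonlinear in the original two, which is the point of the mapping.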

Dual formulation of the (primal) SVM minimization problem

[Equations: the primal problem; the dual problem and its constraints; the dual problem written with a kernel function]
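For reference, the standard textbook form of the soft-margin primal and dual problems is sketched below. The notation is an assumption made here, not taken from the slides: $y_i \in \{\pm 1\}$ are the class labels, $\alpha_i$ the dual variables, $\phi$ the mapping and $k$ the kernel function.

```latex
% Primal (soft margin), over w, b and the slacks \xi_i:
\min_{w,\,b,\,\xi} \;\; \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{subject to} \quad
y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .

% Dual, over the multipliers \alpha_i:
\max_{\alpha} \;\; \sum_{i=1}^{m} \alpha_i
 - \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
   \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0 .

% Kernel function: the inner product in the transformed space.
k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)
```

In the dual, the data enter only through $k(x_i, x_j)$, so the mapping $\phi$ never has to be computed explicitly.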

Strengths and Weaknesses of SVM

Strengths of SVM:

Training is relatively easy

No local minima

It scales relatively well to high dimensional data

Trade-off between classifier complexity and error can be controlled explicitly via C

Robustness of the results

The “curse of dimensionality” is avoided

Weaknesses of SVM:

What is the best trade-off parameter C ?

Need a good transformation of the original space

The Ketchup Marketing Problem

Two types of ketchup: Heinz and Hunts

Seven Attributes

Feature Heinz

Feature Hunts

Display Heinz

Display Hunts

Feature&Display Heinz

Feature&Display Hunts

Log price difference between Heinz and Hunts

Training Data: 2498 cases (Heinz is chosen in 89.11% of cases)

Test Data: 300 cases (Heinz is chosen in 88.33% of cases)

Choose a kernel mapping:

Linear kernel

Polynomial kernel

RBF kernel

Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ)

[Figure: cross-validation mean squared errors, SVM with RBF kernel; scale from min to max]
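The three kernel choices can be written as plain functions. This is a sketch under common textbook parameterizations, which are assumptions here: the polynomial kernel is taken as $(x^\top z + c)^d$ and the RBF kernel as $\exp(-\|x - z\|^2 / (2\sigma^2))$; the parameter names `degree`, `coef0` and `sigma` are illustrative.

```python
import math


def linear_kernel(x, z):
    """Ordinary inner product x.z."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x.z + coef0) ** degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 * sigma^2)); sigma sets the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(kernel, points):
    """Matrix of kernel values between all pairs of points."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(1.0, 2.0), (3.0, 4.0), (0.0, -1.0)]
K = gram_matrix(rbf_kernel, points)
print(linear_kernel(points[0], points[1]))
print(polynomial_kernel(points[0], points[1]))
print(K[0][0])
```

In a cross-validation search, each candidate pair (C, σ) would be scored on the held-out folds and the pair with the lowest error kept.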

The Ketchup Marketing Problem – Training Set

Model: Linear Discriminant Analysis (hit rate: 89.51%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                  68               204       272
        Heinz                  58              2168      2226
%       Hunts              25.00%            75.00%   100.00%
        Heinz               2.61%            97.39%   100.00%
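The hit rate is the share of cases on the diagonal of such a table. As a sketch, the calculation below uses the Linear Discriminant Analysis training-set counts (204 of the 272 Hunts cases and 2168 of the 2226 Heinz cases predicted as Heinz); the variable names are illustrative.

```python
# Counts from the Linear Discriminant Analysis training-set table.
hunts_total = 272
hunts_predicted_heinz = 204
heinz_total = 2226
heinz_predicted_heinz = 2168

# Correctly classified Hunts cases follow from the row total.
hunts_predicted_hunts = hunts_total - hunts_predicted_heinz

total_cases = hunts_total + heinz_total
lda_hit_rate = (hunts_predicted_hunts + heinz_predicted_heinz) / total_cases

# Baseline: always predict the majority brand (Heinz).
majority_hit_rate = heinz_total / total_cases

print(f"LDA hit rate:      {100 * lda_hit_rate:.2f}%")
print(f"Majority baseline: {100 * majority_hit_rate:.2f}%")
```

Because Heinz is chosen in roughly 89% of the cases, a hit rate is informative only relative to the majority baseline.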

Model: Logit Choice Model, training set (hit rate: 77.79%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                 214                58       272
        Heinz                 497              1729      2226
%       Hunts              78.68%            21.32%   100.00%
        Heinz              22.33%            77.67%   100.00%

Model: Support Vector Machines, training set (hit rate: 99.08%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                 255                17       272
        Heinz                   6              2220      2226
%       Hunts              93.75%             6.25%   100.00%
        Heinz               0.27%            99.73%   100.00%

Model: Majority Voting, training set (hit rate: 89.11%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                   0               272       272
        Heinz                   0              2226      2226
%       Hunts                  0%              100%   100.00%
        Heinz                  0%              100%   100.00%

The Ketchup Marketing Problem – Test Set

Model: Linear Discriminant Analysis (hit rate: 88.33%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                   3                32        35
        Heinz                   3               262       265
%       Hunts               8.57%            91.43%   100.00%
        Heinz               1.13%            98.87%   100.00%

Model: Logit Choice Model, test set (hit rate: 77%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                  29                 6        35
        Heinz                  63               202       265
%       Hunts              82.86%            17.14%   100.00%
        Heinz              23.77%            76.23%   100.00%

Model: Support Vector Machines, test set (hit rate: 95.67%)

Original          Predicted Hunts   Predicted Heinz     Total
Count   Hunts                  25                10        35
        Heinz                   3               262       265
%       Hunts              71.43%            28.57%   100.00%
        Heinz               1.13%            98.87%   100.00%

Conclusion

Support Vector Machines (SVM) can be applied to binary and multi-class classification problems

SVM behave robustly in multivariate problems

Further research in various Marketing areas is needed to justifyor refute the applicability of SVM

Support Vector Regressions (SVR) can also be applied

http://www.kernel-machines.org

Email: nalbantov@few.eur.nl