Semantic History Embedding in Online Generative Topic Models

Pu Wang (presenter)

Authors:

Loulwah AlSumait (lalsumai@gmu.edu)

Daniel Barbará (dbarbara@gmu.edu)

Carlotta Domeniconi (carlotta@cs.gmu.edu)

Department of Computer Science

George Mason University

SDM 2009

Outline

Introduction and related work

Online LDA (OLDA)

Parameter Generation

Sliding history window

Contribution weights

Experiments

Conclusion and future work

Introduction

When a topic is observed at a certain time, it is more likely to appear in the future

Previously discovered topics hold important information about the underlying structure of the data

Incorporating such information in future knowledge discovery can enhance the inferred topics

Related Work

Q. Sun, R. Li et al. ACL 2008.

LDA-based Fisher kernel to measure the semantic similarity between blocks of LDA documents

X. Wang et al. ICDM 2007.

Topical N-Gram model that automatically identifies feasible N-grams based on the context that surrounds them

X. Phan et al. IW3C2 2008.

A classifier trained on a small set of labeled documents together with an LDA topic model estimated from Wikipedia

Tracking Topics

[Figure: plate diagrams of the topic model at times t and t+1 (time between t and t+1 = ε), with document counts M^t and M^{t+1}, topic assignments z_i, words w_i, and the incoming stream S^{t+1}. Labeled components: Topic Evolution Tracking, Priors Construction, Emerging Topic Detection, Emerging Topic List.]

Online LDA (OLDA)

Inference Process

[Diagram labels: current stream, historic observations]

Parameter Generation

Simple inference problem

Gibbs Sampling
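The Gibbs sampling step can be sketched as a standard collapsed Gibbs sampler for LDA in which every topic has its own prior vector over words (`beta` below), which is where priors generated from history would plug in. This is a minimal sketch under stated assumptions, not the toolbox's implementation: the function name `gibbs_lda`, the symmetric document-topic prior `alpha`, the iteration count, and the toy corpus are all chosen for illustration only.

```python
import numpy as np

def gibbs_lda(docs, n_topics, beta, alpha=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a per-topic word prior.

    docs : list of documents, each a list of word ids in [0, V)
    beta : array of shape (n_topics, V) with strictly positive entries
    Returns (phi, theta): topic-word and document-topic estimates.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    n_words = beta.shape[1]
    doc_topic = np.zeros((len(docs), n_topics))   # tokens of doc d in topic k
    topic_word = np.zeros((n_topics, n_words))    # tokens of word w in topic k
    topic_total = np.zeros(n_topics)              # tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    beta_sum = beta.sum(axis=1)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            z = assignments[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta[:, w])
                p = p / (topic_total + beta_sum)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1

    phi = (topic_word + beta) / (topic_total + beta_sum)[:, None]
    theta = (doc_topic + alpha) / (
        doc_topic.sum(axis=1, keepdims=True) + n_topics * alpha
    )
    return phi, theta

# Toy corpus over a 6-word dictionary (made-up data).
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4]]
beta = np.full((2, 6), 0.01)
phi, theta = gibbs_lda(docs, n_topics=2, beta=beta)
```

Passing a different `beta` row for each topic is the only change from the usual symmetric-prior sampler; the rest of the update is unchanged.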

Topic Evolution Tracking

Topic alignment over time

Handles changes in lexicon, topic drift

Aligned topics over time (each topic is shown with P(topic), each word with P(word|topic))

Time t

Topic 1 (0.65): Bank (0.44), money (0.35), loan (0.21)

Topic 2 (0.35): Factory (0.53), production (0.34), labor (0.13)

Time t+1

Topic 1 (0.43): Bank (0.5), credit (0.32), money (0.18)

Topic 2 (0.57): Factory (0.48), cost (0.32), manufacturing (0.2)
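As an illustration of tracking an aligned topic across streams, the sketch below compares Topic 1 at time t and at time t+1 from the example above over the union of their vocabularies, so a word that enters the lexicon (credit) or leaves it (loan) simply gets zero mass on one side. Jensen-Shannon divergence is used only as one convenient symmetric drift measure; it is an assumption of this sketch, not a measure named on the slides, and the helper names are hypothetical.

```python
import numpy as np

def to_vector(topic, vocab):
    """Place a {word: probability} topic on a shared vocabulary, zero-filling gaps."""
    return np.array([topic.get(word, 0.0) for word in vocab])

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions, at most 1."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Topic 1 at time t and at time t+1, copied from the slide's example.
topic1_t = {"bank": 0.44, "money": 0.35, "loan": 0.21}
topic1_next = {"bank": 0.50, "credit": 0.32, "money": 0.18}

vocab = sorted(set(topic1_t) | set(topic1_next))
p = to_vector(topic1_t, vocab)
q = to_vector(topic1_next, vocab)
drift = js_divergence(p, q)
```

Because topic k at time t+1 is generated from topic k's own history, the comparison needs no matching step between topics; only the vocabularies have to be reconciled.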

Sliding History Window

Consider all topic-word distributions within a “sliding history window” (δ)

Alternatives for keeping track of history at time t

Full memory, δ = t

Short memory, δ = 1

Intermediate memory, δ = c

Evolution Matrix

[Figure: evolution matrix of a topic, with axes labeled "Dictionary" and "Topic distribution over time"]
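One way to realize the sliding history window is a fixed-length queue per topic that keeps the most recent δ topic-word distributions and stacks them into a matrix. The sketch assumes that the rows of the evolution matrix are indexed by the dictionary and that its columns are the topic's word distributions at successive time steps; the class name `TopicHistory` and its methods are hypothetical.

```python
from collections import deque

import numpy as np

class TopicHistory:
    """Keeps the last `window` word distributions of a single topic."""

    def __init__(self, window=None):
        # window=None never drops anything (full memory, delta = t);
        # window=1 keeps only the latest model (short memory).
        self.columns = deque(maxlen=window)

    def add(self, phi_k):
        """Append the topic's word distribution from the newest stream."""
        self.columns.append(np.asarray(phi_k, dtype=float))

    def matrix(self):
        """Return a (dictionary size) x (stored steps) matrix, oldest column first."""
        return np.column_stack(list(self.columns))

# Intermediate memory with delta = 2 over a 3-word dictionary (made-up numbers).
history = TopicHistory(window=2)
history.add([0.5, 0.3, 0.2])
history.add([0.4, 0.4, 0.2])
history.add([0.3, 0.5, 0.2])  # the oldest distribution is dropped here
B = history.matrix()
```

A bounded `deque` discards the oldest column automatically, so the three memory alternatives differ only in the value passed as `window`.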

Contribution Control

Evolution Tuning Parameters ω

Individual weights of models

Decaying history: ω1 < ω2 < … < ωδ

Equal contributions: ω1 = ω2 = … = ωδ

Total weight of history (vs. weight of new observations)

Balanced weights (sum = 1)

Biased toward the past (sum > 1)

Biased toward the future (sum < 1)
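The two choices above (how weight is spread across the window, and how much total weight history receives) can be separated in code. In this minimal sketch the decaying scheme uses a linear ramp 1, 2, …, δ, which is only one arbitrary way to satisfy ω1 < ω2 < … < ωδ; the function name `history_weights` and its parameters are hypothetical.

```python
import numpy as np

def history_weights(window, total=1.0, scheme="equal"):
    """Return weights (oldest first) for `window` past models, summing to `total`."""
    if scheme == "equal":
        raw = np.ones(window)
    elif scheme == "decaying":
        # Older models get smaller weights; a linear ramp is an assumption.
        raw = np.arange(1, window + 1, dtype=float)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return total * raw / raw.sum()

balanced = history_weights(4, total=1.0, scheme="equal")       # sum = 1
past_heavy = history_weights(3, total=2.0, scheme="decaying")  # sum > 1
new_heavy = history_weights(3, total=0.6, scheme="decaying")   # sum < 1
```

Keeping `total` as a separate argument makes it possible to vary the overall influence of history without changing the relative ordering of the individual weights.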

Parameter Generation

Priors of Topic distribution over words at time t+1

Generate topic distribution
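A minimal sketch of these two steps, assuming that the prior for topic k at time t+1 is the weighted combination of the columns of its evolution matrix, β_k^{t+1} = B_k ω, and that the topic's word distribution is then drawn as φ_k^{t+1} ~ Dirichlet(β_k^{t+1}). Both formulas are a reading of the sliding window and weights described earlier rather than a quotation; the function names and the small `floor` that keeps the Dirichlet parameters strictly positive are assumptions of this sketch.

```python
import numpy as np

def generate_prior(B, omega, floor=1e-6):
    """Combine a topic's history into a prior over words for the next stream.

    B     : (dictionary size) x delta matrix, one column per past model
    omega : length-delta weight vector, oldest first
    """
    B = np.asarray(B, dtype=float)
    omega = np.asarray(omega, dtype=float)
    if B.shape[1] != omega.shape[0]:
        raise ValueError("need one weight per column of B")
    return B @ omega + floor

def sample_topic(beta, rng):
    """Draw a topic-word distribution from Dirichlet(beta)."""
    return rng.dirichlet(beta)

# Window of two past models over a 3-word dictionary (made-up numbers).
B = np.array([[0.4, 0.3],
              [0.4, 0.5],
              [0.2, 0.2]])
omega = np.array([0.4, 0.6])  # decaying history, balanced total weight
beta = generate_prior(B, omega)
phi = sample_topic(beta, np.random.default_rng(0))
```

With these numbers the prior is approximately (0.34, 0.46, 0.20): a word's prior mass is the weighted average of its probability under the topic's past models.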

Experimental Design

“Matlab Topic Modeling Toolbox”, by Mark Steyvers and Tom Griffiths

Datasets:

NIPS

Proceedings from 1988-2000

1,740 papers, 13,649 unique words, 2,301,375 word tokens

13 streams, from 90 to 250 documents per stream

Reuters-21578

News from 26-FEB-1987 to 19-OCT-1987

10,337 documents; 12,112 unique words; 793,936 word tokens

30 streams (29 streams of 340 documents, 1 stream of 517 documents)

Baselines:

OLDA_fixed: no memory

OLDA (ω(1)): short memory

Performance Evaluation

Measure: perplexity

Test set: documents of next year or stream
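Perplexity on held-out documents is the exponential of the negative average log-likelihood per token, so lower values mean better prediction. The sketch below assumes that document-topic proportions `theta` for the test documents are already available (how they are estimated for unseen documents is left open here) and that counts are given as a dense document-by-word matrix; all names are illustrative.

```python
import numpy as np

def perplexity(counts, theta, phi):
    """Perplexity of test documents under a topic model.

    counts : (documents x dictionary) matrix of token counts
    theta  : (documents x topics) document-topic proportions
    phi    : (topics x dictionary) topic-word distributions
    """
    counts = np.asarray(counts, dtype=float)
    word_probs = np.asarray(theta) @ np.asarray(phi)  # p(word | document)
    seen = counts > 0  # skip words that do not occur, avoiding log(0) * 0
    log_likelihood = np.sum(counts[seen] * np.log(word_probs[seen]))
    return float(np.exp(-log_likelihood / counts.sum()))

# Sanity check: a model that is uniform over a 4-word dictionary
# has perplexity 4 (up to floating-point error), whatever the counts are.
counts = np.array([[2, 1, 0, 1]])
theta = np.array([[0.5, 0.5]])
phi = np.full((2, 4), 0.25)
uniform_ppl = perplexity(counts, theta, phi)
```

The uniform case is a useful reference point: any model that has learned something about the test stream should score below the dictionary size.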

Reuters: OLDA with fixed β vs. OLDA with semantic β

[Plot annotation: no memory]

Reuters: OLDA with different window size and weights

• Increasing window size enhanced prediction

• Incremental history information (δ > 1, sum > 1) did not improve topic estimation at all

[Plot annotations: increase window size; short memory; equal contribution; incremental history information]

NIPS: OLDA with Different Window

• Increasing window size enhanced prediction w.r.t. short memory

• Window size greater than 3 enhanced prediction

• Effect of total weight

[Plot annotations: no memory; short memory]

NIPS: OLDA with Different Total Weight

Models with lower total weight resulted in better prediction

[Plot annotations: no memory; sum of weights = 1; decrease sum of weights]

NIPS & Reuters: OLDA with Different Total Weight

• Variable sum(ω)

• δ = 2

[Plot annotations: decrease total sum of weights; increase total sum of weights]

NIPS: OLDA with Equal vs Decaying History Contribution

Conclusions

Studied the effect of embedding semantic information in LDA topic modeling of text streams

Parameter generation based on topical structures inferred in the past

Semantic embedding enhances OLDA prediction

Effect of

Total influence of history,

History window size, and

Equal or decaying contributions

Future work

Use of prior knowledge

Effect of embedded historic semantics on detecting emerging and/or periodic topics