
CSC2515 Lecture 10 Part 2: Making time-series models with RBMs

Time series models

•Inference is difficult in directed models of time series if we use non-linear distributed representations in the hidden units.

–It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion capture data).

•So people tend to avoid distributed representations and use much weaker methods (e.g. HMMs).

Time series models

•If we really need distributed representations (which we nearly always do), we can make inference much simpler by using three tricks:

–Use an RBM for the interactions between hidden and visible variables. This ensures that the main source of information wants the posterior to be factorial.

–Model short-range temporal information by allowing several previous frames to provide input to the hidden units and to the visible units.

•This leads to a temporal module that can be stacked

–So we can use greedy learning to learn deep models of temporal structure.

The conditional RBM model (Sutskever & Hinton 2007)

•Given the data and the previous hidden state, the hidden units at time t are conditionally independent.

–So online inference is very easy

•Learning can be done by using contrastive divergence.

–Reconstruct the data at time t from the inferred states of the hidden units and the earlier states of the visibles.

–The temporal connections can be learned as if they were additional biases.
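
The inference and learning steps above can be sketched in plain Python. This is a minimal sketch, not the authors' implementation. It assumes real-valued visibles reconstructed by the mean of a unit-variance Gaussian, a history vector formed by concatenating the earlier visible frames, and hypothetical parameter names `W`, `A`, `B`, `b_v`, `b_h`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(row, vec):
    return sum(r * x for r, x in zip(row, vec))


def hidden_probs(v_t, history, W, B, b_h):
    """p(h_j = 1 | v_t, history) for every hidden unit j.

    W is n_hid x n_vis; B is n_hid x n_hist (history-to-hidden weights).
    The history only shifts each unit's bias, so the posterior stays factorial.
    """
    return [sigmoid(b + dot(w_row, v_t) + dot(b_row, history))
            for b, w_row, b_row in zip(b_h, W, B)]


def reconstruct(h, history, W, A, b_v):
    """Mean of the visibles at time t given hidden states and history.

    A is n_vis x n_hist (autoregressive, history-to-visible weights).
    """
    return [b_v[i] + dot(A[i], history)
            + sum(W[j][i] * h[j] for j in range(len(h)))
            for i in range(len(b_v))]


def cd1_update(v_t, history, W, A, B, b_v, b_h, lr=0.01, rng=random):
    """One CD-1 step on a single frame; updates the parameters in place."""
    h0 = hidden_probs(v_t, history, W, B, b_h)        # positive phase
    h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
    v1 = reconstruct(h0_sample, history, W, A, b_v)   # reconstruction at time t
    h1 = hidden_probs(v1, history, W, B, b_h)         # negative phase
    for j in range(len(b_h)):
        dh = h0[j] - h1[j]
        for i in range(len(b_v)):
            W[j][i] += lr * (h0[j] * v_t[i] - h1[j] * v1[i])
        for k in range(len(history)):
            B[j][k] += lr * dh * history[k]  # bias-style rule, scaled by the input
        b_h[j] += lr * dh
    for i in range(len(b_v)):
        dv = v_t[i] - v1[i]
        for k in range(len(history)):
            A[i][k] += lr * dv * history[k]  # bias-style rule, scaled by the input
        b_v[i] += lr * dv
    return v1
```

The history is held fixed throughout the update, which is why the temporal weights `A` and `B` receive the same kind of update as an ordinary bias, multiplied by the history value that feeds them.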

[Figure: model diagram with frames labelled t-2, t-1, t]

Why the autoregressive connections do not cause problems

•The autoregressive connections do not mess up contrastive divergence learning because:

–We know the initial state of the visible units, so we know the initial effect of the autoregressive connections.

–It is not necessary for the reconstructions to be at equilibrium with the hidden units.

–The important thing for contrastive divergence is to ensure the hiddens are in equilibrium with the visibles whenever statistics are measured.

Generating from a learned model

•The inputs from the earlier states of the visible units create dynamic biases for the hidden and current visible units.

•Perform alternating Gibbs sampling for a few iterations between the hidden units and the current visible units.

–This picks new hidden and visible states that are compatible with each other and with the recent history.
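
The generation procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: real-valued visibles set to their conditional mean, a Gibbs chain started at the most recent frame, and hypothetical parameter names (`W` for hidden-visible weights, `A` and `B` for history-to-visible and history-to-hidden weights).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(row, vec):
    return sum(r * x for r, x in zip(row, vec))


def generate_frame(prev_frames, W, A, B, b_v, b_h, n_gibbs=5, rng=random):
    """Sample the frame at time t given the earlier frames (oldest first)."""
    history = [x for frame in prev_frames for x in frame]
    # Dynamic biases: computed once from the history, then held fixed.
    bias_h = [b + dot(row, history) for b, row in zip(b_h, B)]
    bias_v = [b + dot(row, history) for b, row in zip(b_v, A)]
    v = list(prev_frames[-1])  # assumption: start the chain at the latest frame
    for _ in range(n_gibbs):
        p_h = [sigmoid(bh + dot(w_row, v)) for bh, w_row in zip(bias_h, W)]
        h = [1.0 if rng.random() < p else 0.0 for p in p_h]
        v = [bias_v[i] + sum(W[j][i] * h[j] for j in range(len(h)))
             for i in range(len(bias_v))]
    return v


def generate_sequence(seed_frames, n_new, W, A, B, b_v, b_h,
                      n_gibbs=5, rng=random):
    """Roll the model forward, feeding each new frame back in as history."""
    order = len(seed_frames)
    frames = [list(f) for f in seed_frames]
    for _ in range(n_new):
        frames.append(
            generate_frame(frames[-order:], W, A, B, b_v, b_h, n_gibbs, rng))
    return frames[order:]
```

Only the hidden units and the current frame are resampled inside the loop; the earlier frames enter solely through `bias_h` and `bias_v`.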

[Figure: model diagram with frames labelled t-2, t-1, t]

Stacking temporal RBMs

•Treat the hidden activities of the first-level TRBM as the data for the second-level TRBM.

–So when we learn the second level, we get connections across time in the first hidden layer.

•After greedy learning, we can generate from the composite model:

–First, generate from the top-level model by using alternating Gibbs sampling between the current hiddens and visibles of the top-level model, using the dynamic biases created by the previous top-level visibles.

–Then do a single top-down pass through the lower layers, but using the autoregressive inputs coming from earlier states of each layer.
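
The first step of the greedy procedure, turning first-level hidden activities into second-level training data, might look like the sketch below. The function and parameter names are hypothetical, and the choice to pass hidden probabilities (rather than sampled binary states) upward is an assumption, not something the slides specify.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(row, vec):
    return sum(r * x for r, x in zip(row, vec))


def hidden_sequence(frames, order, W, B, b_h):
    """Hidden probabilities for every frame that has `order` predecessors."""
    out = []
    for t in range(order, len(frames)):
        history = [x for frame in frames[t - order:t] for x in frame]
        out.append([sigmoid(b + dot(w_row, frames[t]) + dot(b_row, history))
                    for b, w_row, b_row in zip(b_h, W, B)])
    return out


def training_windows(sequence, order):
    """(history, current) pairs that a temporal module of this order trains on."""
    return [([x for frame in sequence[t - order:t] for x in frame], sequence[t])
            for t in range(order, len(sequence))]
```

Calling `training_windows(hidden_sequence(frames, order, W, B, b_h), order)` yields pairs in which both the history and the current vector are first-level hidden activities, so the second-level temporal weights connect the first hidden layer across time.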

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)

•Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.

•Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.

–We only represent changes in yaw because physics doesn’t care about its value and we want to avoid circular variables.
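
Representing yaw by its frame-to-frame change can be illustrated with a short sketch. The function names are hypothetical and angles are assumed to be in radians; the slides do not give the exact preprocessing used.

```python
import math


def wrap_angle(a):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def yaw_to_deltas(yaws):
    """Per-frame change in yaw, wrapped so a crossing of +/-pi is a small step."""
    return [wrap_angle(b - a) for a, b in zip(yaws, yaws[1:])]


def deltas_to_yaw(start, deltas):
    """Recover absolute (wrapped) yaw by accumulating the deltas."""
    yaws = [start]
    for d in deltas:
        yaws.append(wrap_angle(yaws[-1] + d))
    return yaws
```

A turn from 3.1 rad to -3.1 rad becomes a delta of about 0.083 rad instead of a jump of -6.2 rad, so the model sees a small, non-circular quantity.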

Modeling multiple types of motion

•We can easily learn to model many styles of walking in a single model.

–This means we can share a lot of knowledge.

–It should also make it much easier to learn nice transitions between styles.

•Because we can do online inference (slightly incorrectly), we can fill in missing markers in real time.
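
One plausible way to fill in missing markers online is sketched below. The slides do not say how the filling-in is done, so this is an assumption: observed dimensions are clamped, missing ones start at their previous-frame value, and only the missing ones are overwritten by the reconstruction. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(row, vec):
    return sum(r * x for r, x in zip(row, vec))


def fill_missing(frame, missing, prev_frames, W, A, B, b_v, b_h,
                 n_iters=5, rng=random):
    """Fill entries of `frame` whose index is in `missing`; others stay clamped."""
    history = [x for f in prev_frames for x in f]
    bias_h = [b + dot(row, history) for b, row in zip(b_h, B)]
    bias_v = [b + dot(row, history) for b, row in zip(b_v, A)]
    v = list(frame)
    for i in missing:
        v[i] = prev_frames[-1][i]  # initial guess: value in the previous frame
    for _ in range(n_iters):
        p_h = [sigmoid(bh + dot(w_row, v)) for bh, w_row in zip(bias_h, W)]
        h = [1.0 if rng.random() < p else 0.0 for p in p_h]
        for i in missing:  # observed dimensions are never overwritten
            v[i] = bias_v[i] + sum(W[j][i] * h[j] for j in range(len(h)))
    return v
```

Each call uses only frames that have already arrived, which is what makes the procedure usable in real time.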

[Figure: model diagram with frames labelled t-2, t-1, t and a style label]

Show Graham Taylor’s movies, available at www.cs.toronto/~gwtaylor