Multimedia Interfaces
What is a multimedia interface?
Almost anything where users do not just interact with text
E.g., audio, speech, images, faces, video, sensor data, …
Working with Multimedia
Symbolic vs. non-symbolic content
How can users search and browse for the content they need?
What is represented and what is not?
Important that interface design be appropriate to the particular content processing techniques
Static vs. dynamic content
How can users locate particular states within a piece of content?
Need visualizations that enable state/segment-based indexing and visualization
General Audio
Mapping audio cues to events
Recognizing sounds related to particular events (e.g., gunshot, falling, scream)
Mapping events to audio cues
Audio debugger to speed up stepping through code
Spatialized audio
Provides additional geographic/navigational channel
Example: Michael Joyce’s Interactive Central Park
Spatialized Audio
Spatialized audio is easier when assuming headphones because of control
Head-related transfer function (HRTF)
Differences in timing and signal strength determine how we identify the position of a sound
Beamforming
Timing for constructive interference to create a stronger signal at a desired location
Crosstalk Cancellation
Destructive interference to remove parts of the signal at a desired location
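The timing and level cues above can be illustrated without a full HRTF. The sketch below is a hypothetical simplification (function name, head-radius constant, and the Woodworth-style delay approximation are assumptions, not the method from the slides) that spatializes a mono signal using only interaural time and level differences:

```python
import numpy as np

def spatialize_itd_ild(mono, sample_rate, azimuth_deg,
                       head_radius=0.0875, speed_of_sound=343.0):
    """Crude stereo spatialization using only interaural time and level
    differences; a real HRTF also models per-ear spectral filtering."""
    azimuth = np.radians(azimuth_deg)
    # Woodworth-style ITD approximation: extra path length around the head.
    itd = head_radius / speed_of_sound * (abs(azimuth) + np.sin(abs(azimuth)))
    delay_samples = int(round(itd * sample_rate))
    # Simple level difference: attenuate the far ear by up to ~6 dB.
    far_gain = 10 ** (-abs(np.sin(azimuth)) * 6 / 20)
    delayed = np.concatenate([np.zeros(delay_samples), mono])[:len(mono)]
    near, far = mono, far_gain * delayed
    # Positive azimuth = source to the right, so the right ear is "near".
    return (far, near) if azimuth_deg > 0 else (near, far)
```

With headphones each ear receives exactly one channel, which is why the control assumption in the slide makes this straightforward; loudspeakers would additionally require crosstalk cancellation.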
Audio Signal Analysis
Fast Fourier Transform (FFT) and Discrete Wavelet Transform (DWT)
Transforms commonly used on audio signals
Allow for analysis of frequency features across time (e.g., power contained in a frequency interval)
FFTs have equal-sized windows where wavelets can vary based on frequency
Mel-frequency cepstral coefficients (MFCC)
Based on FFTs
Maps results into bands approximating the human auditory system
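The FFT-to-mel-band pipeline can be sketched for a single frame. This is a minimal illustration, not a production MFCC implementation (frame length, filter count, and the HTK-style mel formula are assumptions):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other variants exist).
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc_sketch(signal, sr, n_fft=512, n_mels=20, n_coeffs=13):
    """Single-frame MFCC: FFT power -> mel filterbank -> log -> DCT-II."""
    frame = signal[:n_fft] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, len(power)))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(fbank @ power + 1e-10)
    # DCT-II decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2 * n_mels)))
    return dct @ log_mel
```

The mel spacing concentrates filters at low frequencies, mirroring the ear's resolution, which is the "bands approximating the human auditory system" point above.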
Speech
Speaker segmentation
Identify when a change in speaker occurs
Useful for basic indexing or summarization of speech content
Speaker identification
Identify who is speaking during a segment
Enables search (and other features) based on speaker
Speech recognition
Identify the content of speech
Speech Recognition
Start by segmenting utterances and characterizing phonemes
Use gaps to segment
Group segments into words
Limited vocabulary of commands
Classifiers for limited vocabulary (HMMs)
Continuous speech
Language models for disambiguation
Speaker dependent or not
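The "use gaps to segment" step can be sketched with a simple energy threshold. This is a hypothetical illustration (thresholds and frame sizes are assumptions; real recognizers use far more robust voice-activity detection):

```python
import numpy as np

def segment_utterances(signal, sr, frame_ms=25, energy_thresh=0.01,
                       min_gap_frames=8):
    """Gap-based utterance segmentation: frames whose RMS energy stays
    below a threshold for long enough are treated as pauses."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    voiced = np.sqrt((frames ** 2).mean(axis=1)) > energy_thresh
    segments, start, silence = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap_frames:
                segments.append((start * frame_len, (i - silence + 1) * frame_len))
                start, silence = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments  # list of (start_sample, end_sample) pairs
```

Each resulting segment would then be grouped into words and passed to per-word classifiers (e.g., HMMs) in a limited-vocabulary system.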
Music
Music processing can support a variety of activities
Composition
From traditional to interactive
Selection
Example: iTunes, Pandora
Use for shared spaces
Playback
Interactive playback, social playback
Management & Summarization
Example: MusicWiz
Games
Guitar Hero, Rockband, etc.
MobiLenin
Enable interaction with music in a public space
Not karaoke
Voting like in many pub/bar games
Audience can affect which version of music and video is shown
MusicWiz
[Architecture diagram: the MusicWiz Interface exchanges workspace status, related song titles, songs, metadata, and lyrics with the Inference Engine and the Music Collection. The Inference Engine combines Metadata, Audio Signal, Lyrics, Workspace Expression, and Artist modules through a Relatedness Assessment into a Relatedness Table; the Artist module draws statistics of artist similarity from the Internet.]
Music management environment that combines:
explicit information
implicit information
non-verbal expression of personal interpretation
Two basic components:
interface for interacting with the music collection
inference engine for assessing music relatedness
Image Processing: Color
Color histograms – how much of each color is in the image
Probability of a pixel in the image being a particular color
Color correlograms – how close colors are to each other in the image
Probability of finding a pixel of a particular color at a specific distance from a pixel of a known color
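Both representations can be sketched in a few lines. This is an illustrative simplification (bin counts are assumptions, and the correlogram is restricted to a single channel and horizontal offsets, whereas the full definition averages over all directions):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized color histogram: probability of a pixel having each
    quantized (R, G, B) color. `image` is an HxWx3 array of 0-255 values."""
    quantized = (image // (256 // bins)).reshape(-1, 3)
    idx = quantized[:, 0] * bins * bins + quantized[:, 1] * bins + quantized[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def color_autocorrelogram(quantized, n_colors, distance):
    """Autocorrelogram sketch for a quantized single-channel image:
    P(pixel at horizontal offset `distance` has the same color)."""
    a, b = quantized[:, :-distance], quantized[:, distance:]
    corr = np.zeros(n_colors)
    for c in range(n_colors):
        mask = a == c
        corr[c] = (b[mask] == c).mean() if mask.any() else 0.0
    return corr
```

The histogram discards all spatial information; the correlogram keeps just enough of it to distinguish, say, a scattered color from a solid patch of the same color.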
Image/Video Processing: Subdividing
Region subdivision
Sometimes we subdivide images into regions
Spread observed features at edges for a more continuous model
Temporal subdivision
Video is subdivided into segments
Spread features into neighboring segments
Image Processing: Foreground/Background Separation
Background Modeling
Convert to greyscale
Dynamic model (to cope with changes in signer body position and lighting)
BP_t = 0.96 * BP_(t-1) + 0.04 * P
Foreground object detection
Pixels that differ from the background model by more than a threshold are foreground pixels
Spatial filter removes regions of foreground pixels smaller than a minimum threshold
Face location to determine position of foreground relative to the face
Videos without a single main face are not considered as potential SL videos
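The running-average update and thresholding steps above can be sketched directly. The size filter here is a deliberately crude stand-in (an all-or-nothing check); a real system would remove small connected components individually:

```python
import numpy as np

def update_background(bg, frame, alpha=0.04):
    """Running-average background model from the slides, per greyscale
    pixel: BP_t = (1 - alpha) * BP_(t-1) + alpha * P."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, diff_thresh=25, min_region=50):
    """Pixels far from the background model are foreground. The spatial
    filter here is simplified: it drops the whole mask if the total
    foreground is too small, instead of removing each small region."""
    mask = np.abs(frame.astype(float) - bg) > diff_thresh
    return mask if mask.sum() >= min_region else np.zeros_like(mask)
```

With alpha = 0.04, the model absorbs a stationary change in roughly a few dozen frames, slow enough to keep moving hands in the foreground but fast enough to track gradual lighting shifts.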
Image showing different stages of video processing. The first shows the frame from the video with a box indicating the results of face detection. The second shows the background model generated for the video at this point. The third shows the intermediate foreground, which is computed as the difference between the current frame and the background model. The fourth frame shows the final foreground which has removed portions of the foreground too small to be hands and/or arms.
Image Processing: Other Features
Edge detection
Sobel filter
Object and Face detection
Skintone models
Face recognition
Open Source Computer Vision (OpenCV)
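The Sobel filter can be sketched from scratch; this naive loop version is for illustration only (OpenCV's optimized routines would be used in practice, and image borders are simply cropped here):

```python
import numpy as np

def sobel_edges(gray):
    """Sobel edge magnitude for a 2-D greyscale array; output is cropped
    by one pixel on each side (no border handling)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                    # vertical gradient
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (kx * patch).sum()
            gy[i, j] = (ky * patch).sum()
    return np.hypot(gx, gy)  # gradient magnitude
```

The two kernels approximate the image gradient in x and y; large magnitude marks an intensity discontinuity, i.e., an edge.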
MediaGLOW: Interpreting User Action
Evolving Notion of Similarity via User Expression
Photos presented in a graph-based workspace with “springs” between each pair of photos
Lengths of springs are initially based on a default distance metric derived from their time, location, tags, or visual features
Users can pin photos in place and create piles of photos
Distance metrics to piles change as new members are added, resulting in the dynamic layout of unpinned photos in the workspace
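A spring-based layout of this kind can be sketched as iterative relaxation. This is a generic force-directed sketch under assumed names, not MediaGLOW's actual algorithm:

```python
import numpy as np

def spring_layout_step(pos, rest_len, pinned, step=0.05):
    """One relaxation step of a spring layout: each pair of photos is
    pulled/pushed toward its rest length; pinned photos stay put.
    `rest_len[i, j]` is the dissimilarity-derived target distance."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = pos[j] - pos[i]
            dist = np.linalg.norm(delta) + 1e-9
            # Hooke's law: attractive when stretched past rest length,
            # repulsive when compressed below it.
            forces[i] += (dist - rest_len[i, j]) * delta / dist
    forces[pinned] = 0.0  # pinned photos do not move
    return pos + step * forces
```

When a user adds a photo to a pile, the rest lengths to that pile shrink for similar photos, and repeated relaxation steps pull those photos toward it, which is how the layout reflects the evolving notion of similarity.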
DOTS: Supporting Use of Surveillance Video
The problem
Number and size of surveillance systems are increasing but human attention is the limiting factor
Approach
Provide summaries of action
Build interfaces knowing the limits of automation
DOTS: The Main Interface
Components
Rotating camera bank with activity graphs
Mixed-initiative main viewer
Map with tracking data
Timeline with automatic events
DOTS: Tracking Layout
The difficulty in tracking is that camera views are often similar
Tracking layout places cameras around the main viewer to aid tracking
Study showed significant improvement in tracking success over the traditional viewer
In either layout, the map can be used to find activity near a location and time
HyperHitchcock: Interactive Video
Issue
Vision: Seamlessly interact with characters in the show
Reality: Difficult to author even simple interactive videos
Today, video is included within pages of content but links between playing videos are not common
Support for Hypervideo Authoring
Links in video can lead to other video segments
Short main video with branches providing additional detail
Hyperlinks to branches just like in Web pages
Making of a scene in a movie, biography of an actor, different camera angle
General hypervideo difficult to author
Simple hypervideo format with only a single active link
Novel approach: use automatic video analysis, create an easy-to-use interface, and support a simple hypervideo format
Hierarchical Video with Links
Video sequences are represented as a containment hierarchy of video elements
Elements are video clips or composites grouping other video elements
Elements are played in sequence
Each element can be link anchor or link destination
Anchor for innermost element is available while the element is playing
After the link destination video is played, playback continues at the link anchor
Detail-on-demand Links
Any video clip or composite can be link anchor or link destination
Optional link offsets into destination
Links have labels
Link return behaviors control the purpose of the link
Play from where the viewer left the video
Play from the end of the source anchor sequence
Play from beginning of thesource anchor sequence
Stop playback
Different behaviors fordestination completion oraborted playback
Hyper-Hitchcock Editor
Hyper-Hitchcock evolved from Hitchcock video editor
Video clips grouped in piles by similarity (e.g., recording time)
Workspace to arrange clips
Resize keyframes to trim clips
Clips ordered as horizontal or vertical lists
Place links between clips
Group clips into composites
Tree view to visualize the containment hierarchy of composites
Trimming Clips in the Workspace
Best five seconds of clip selected by default
Resizing keyframe changes length of clip
Picks the best portion around initial five-second portion
Start and end can jump to sentence boundaries or silences
Clip start and/or end can be locked in timeline
Locked ends can be dragged
Audio energy visualized in the timeline to spot words and sentences
Attaching Links to Clips and Composites
Link anchors and destinations can be clips, composites, or elements inside composites
Color-coding and position indicate link attachment in the workspace
Links in and out of composite
Blue: attached to composite
Red: attached to element
Dashed: between composite and element
Hypervideo Player
Video player with controls for following and returning from links
Several improvements based on user feedback
First version indicated links in the timeline and showed the label for the active link
Next version showed labels in the timeline
Current version includes keyframes for the active link and for link history
User study suggests further improvements
Today’s Topics
General Audio
Audio cues, spatialized audio
Speech
Segmentation, speaker id, recognition
Music
Interactive music, summarization, organization
Image and video processing
Color-oriented representations
Region and temporal segmentation
Foreground-background separation
Edge and face detection
Image and video applications
MediaGLOW – image selection
DOTS – surveillance
HyperHitchcock – interactive video