Multimedia Interfaces
What is a multimedia interface?
Almost anything where users do not just interact with text
E.g., audio, speech, images, faces, video, sensor data, …
Working with Multimedia
Symbolic vs. non-symbolic content
How can users search and browse for the content they need?
What is represented and what is not?
Important that interface design be appropriate to the particular content processing techniques
Static vs. dynamic content
How can users locate particular states within a piece of content?
Need visualizations that enable state/segment-based indexing and visualization
General Audio
Mapping audio cues to events
Recognizing sounds related to particular events (e.g., gunshot, falling, scream)
Mapping events to audio cues
Audio debugger to speed up stepping through code
Spatialized audio
Provides additional geographic/navigational channel
Example: Michael Joyce’s Interactive Central Park
Spatialized Audio
Spatialized audio is easier when assuming headphones because of control
Head-related transfer function (HRTF)
Differences in timing and signal strength determine how we identify the position of a sound
Beamforming
Timing for constructive interference to create a stronger signal at a desired location
Crosstalk Cancellation
Destructive interference to remove parts of the signal at a desired location
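The timing and level cues above can be illustrated without a full HRTF. The sketch below is a hypothetical simplification (function name, head-radius constant, and the Woodworth-style delay approximation are assumptions, not the method from the slides) that spatializes a mono signal using only interaural time and level differences:

```python
import numpy as np

def spatialize_itd_ild(mono, sample_rate, azimuth_deg,
                       head_radius=0.0875, speed_of_sound=343.0):
    """Crude stereo spatialization using only interaural time and level
    differences; a real HRTF also models per-ear spectral filtering."""
    azimuth = np.radians(azimuth_deg)
    # Woodworth-style ITD approximation: extra path length around the head.
    itd = head_radius / speed_of_sound * (abs(azimuth) + np.sin(abs(azimuth)))
    delay_samples = int(round(itd * sample_rate))
    # Simple level difference: attenuate the far ear by up to ~6 dB.
    far_gain = 10 ** (-abs(np.sin(azimuth)) * 6 / 20)
    delayed = np.concatenate([np.zeros(delay_samples), mono])[:len(mono)]
    near, far = mono, far_gain * delayed
    # Positive azimuth = source to the right, so the right ear is "near".
    return (far, near) if azimuth_deg > 0 else (near, far)
```

With headphones each ear receives exactly one channel, which is why the control assumption in the slide makes this straightforward; loudspeakers would additionally require crosstalk cancellation.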
Audio Signal Analysis
Fast Fourier Transform (FFT) and Discrete Wavelet Transform (DWT)
Transforms commonly used on audio signals
Allow for analysis of frequency features across time (e.g., power contained in a frequency interval)
FFTs have equal-sized windows where wavelets can vary based on frequency
Mel-frequency cepstral coefficients (MFCC)
Based on FFTs
Maps results into bands approximating the human auditory system
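The FFT-to-mel-band pipeline can be sketched for a single frame. This is a minimal illustration, not a production MFCC implementation (frame length, filter count, and the HTK-style mel formula are assumptions):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other variants exist).
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc_sketch(signal, sr, n_fft=512, n_mels=20, n_coeffs=13):
    """Single-frame MFCC: FFT power -> mel filterbank -> log -> DCT-II."""
    frame = signal[:n_fft] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, len(power)))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(fbank @ power + 1e-10)
    # DCT-II decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2 * n_mels)))
    return dct @ log_mel
```

The mel spacing concentrates filters at low frequencies, mirroring the ear's resolution, which is the "bands approximating the human auditory system" point above.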
Speech
Speaker segmentation
Identify when a change in speaker occurs
Useful for basic indexing or summarization of speech content
Speaker identification
Identify who is speaking during a segment
Enables search (and other features) based on speaker
Speech recognition
Identify the content of speech
Speech Recognition
Start by segmenting utterances and characterizing phonemes
Use gaps to segment
Group segments into words
Limited vocabulary of commands
Classifiers for limited vocabulary (HMMs)
Continuous speech
Language models for disambiguation
Speaker dependent or not
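The "use gaps to segment" step can be sketched with a simple energy threshold. This is a hypothetical illustration (thresholds and frame sizes are assumptions; real recognizers use far more robust voice-activity detection):

```python
import numpy as np

def segment_utterances(signal, sr, frame_ms=25, energy_thresh=0.01,
                       min_gap_frames=8):
    """Gap-based utterance segmentation: frames whose RMS energy stays
    below a threshold for long enough are treated as pauses."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    voiced = np.sqrt((frames ** 2).mean(axis=1)) > energy_thresh
    segments, start, silence = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap_frames:
                segments.append((start * frame_len, (i - silence + 1) * frame_len))
                start, silence = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments  # list of (start_sample, end_sample) pairs
```

Each resulting segment would then be grouped into words and passed to per-word classifiers (e.g., HMMs) in a limited-vocabulary system.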
Music
Music processing can support a variety of activities
Composition
From traditional to interactive
Selection
Example: iTunes, Pandora
Use for shared spaces
Playback
Interactive playback, social playback
Management & Summarization
Example: MusicWiz
Games
Guitar Hero, Rockband, etc.
MobiLenin
Enable interaction with music in a public space
Not karaoke
Voting like in many pub/bar games
Audience can affect which version of music and video is shown
MusicWiz
[Architecture diagram: the MusicWiz Interface exchanges workspace status, related song titles, songs, metadata, and lyrics with the Inference Engine and the Music Collection. The Inference Engine combines Metadata, Audio Signal, Lyrics, Workspace Expression, and Artist modules through a Relatedness Assessment into a Relatedness Table; the Artist module draws statistics of artist similarity from the Internet.]
Music management environment that combines:
explicit information
implicit information
non-verbal expression of personal interpretation
Two basic components:
interface for interacting with the music collection
inference engine for assessing music relatedness
Image Processing: Color
Color histograms – how much of each color is in the image
Probability of a pixel in the image being a particular color
Color correlograms – how close colors are to each other in the image
Probability of finding a pixel of a particular color at a specific distance from a pixel of a known color
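Both representations can be sketched in a few lines. This is an illustrative simplification (bin counts are assumptions, and the correlogram is restricted to a single channel and horizontal offsets, whereas the full definition averages over all directions):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized color histogram: probability of a pixel having each
    quantized (R, G, B) color. `image` is an HxWx3 array of 0-255 values."""
    quantized = (image // (256 // bins)).reshape(-1, 3)
    idx = quantized[:, 0] * bins * bins + quantized[:, 1] * bins + quantized[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def color_autocorrelogram(quantized, n_colors, distance):
    """Autocorrelogram sketch for a quantized single-channel image:
    P(pixel at horizontal offset `distance` has the same color)."""
    a, b = quantized[:, :-distance], quantized[:, distance:]
    corr = np.zeros(n_colors)
    for c in range(n_colors):
        mask = a == c
        corr[c] = (b[mask] == c).mean() if mask.any() else 0.0
    return corr
```

The histogram discards all spatial information; the correlogram keeps just enough of it to distinguish, say, a scattered color from a solid patch of the same color.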
Image/Video Processing: Subdividing
Region subdivision
Sometimes we subdivide images into regions
Spread observed features at edges for a more continuous model
Temporal subdivision
Video is subdivided into segments
Spread features into neighboring segments
Image Processing: Foreground/Background Separation
Background Modeling
Convert to greyscale
Dynamic model (to cope with changes in signer body position and lighting)
BP_t = 0.96 * BP_(t-1) + 0.04 * P
Foreground object detection
Pixels that differ from the background model by more than a threshold are foreground pixels
Spatial filter removes regions of foreground pixels smaller than a minimum threshold
Face location to determine position of foreground relative to the face
Videos without a single main face are not considered as potential SL videos
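The running-average update and thresholding steps above can be sketched directly. The size filter here is a deliberately crude stand-in (an all-or-nothing check); a real system would remove small connected components individually:

```python
import numpy as np

def update_background(bg, frame, alpha=0.04):
    """Running-average background model from the slides, per greyscale
    pixel: BP_t = (1 - alpha) * BP_(t-1) + alpha * P."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, diff_thresh=25, min_region=50):
    """Pixels far from the background model are foreground. The spatial
    filter here is simplified: it drops the whole mask if the total
    foreground is too small, instead of removing each small region."""
    mask = np.abs(frame.astype(float) - bg) > diff_thresh
    return mask if mask.sum() >= min_region else np.zeros_like(mask)
```

With alpha = 0.04, the model absorbs a stationary change in roughly a few dozen frames, slow enough to keep moving hands in the foreground but fast enough to track gradual lighting shifts.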
Image showing different stages of video processing. The first shows the frame from the video with a box indicating the results of face detection. The second shows the background model generated for the video at this point. The third shows the intermediate foreground, which is computed as the difference between the current frame and the background model. The fourth frame shows the final foreground which has removed portions of the foreground too small to be hands and/or arms.
Image Processing: Other Features
Edge detection
Sobel filter
Object and Face detection
Skintone models
Face recognition
Open Source Computer Vision (OpenCV)
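The Sobel filter can be sketched from scratch; this naive loop version is for illustration only (OpenCV's optimized routines would be used in practice, and image borders are simply cropped here):

```python
import numpy as np

def sobel_edges(gray):
    """Sobel edge magnitude for a 2-D greyscale array; output is cropped
    by one pixel on each side (no border handling)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                    # vertical gradient
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (kx * patch).sum()
            gy[i, j] = (ky * patch).sum()
    return np.hypot(gx, gy)  # gradient magnitude
```

The two kernels approximate the image gradient in x and y; large magnitude marks an intensity discontinuity, i.e., an edge.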
MediaGLOW: Interpreting User Action
Evolving Notion of Similarity via User Expression
Photos presented in a graph-based workspace with “springs” between each pair of photos
Lengths of springs are initially based on a default distance metric derived from their time, location, tags, or visual features
Users can pin photos in place and create piles of photos
Distance metrics to piles change as new members are added, resulting in the dynamic layout of unpinned photos in the workspace
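A spring-based layout of this kind can be sketched as iterative relaxation. This is a generic force-directed sketch under assumed names, not MediaGLOW's actual algorithm:

```python
import numpy as np

def spring_layout_step(pos, rest_len, pinned, step=0.05):
    """One relaxation step of a spring layout: each pair of photos is
    pulled/pushed toward its rest length; pinned photos stay put.
    `rest_len[i, j]` is the dissimilarity-derived target distance."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = pos[j] - pos[i]
            dist = np.linalg.norm(delta) + 1e-9
            # Hooke's law: attractive when stretched past rest length,
            # repulsive when compressed below it.
            forces[i] += (dist - rest_len[i, j]) * delta / dist
    forces[pinned] = 0.0  # pinned photos do not move
    return pos + step * forces
```

When a user adds a photo to a pile, the rest lengths to that pile shrink for similar photos, and repeated relaxation steps pull those photos toward it, which is how the layout reflects the evolving notion of similarity.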
DOTS: Supporting Use of Surveillance Video
The problem
Number and size of surveillance systems are increasing but human attention is the limiting factor
Approach
Provide summaries of action
Build interfaces knowing the limits of automation
DOTS: The Main Interface
Components
Rotating camera bank with activity graphs
Mixed-initiative main viewer
Map with tracking data
Timeline with automatic events
DOTS: Tracking Layout
The difficulty in tracking is that camera views are often similar
Tracking layout places cameras around the main viewer to aid tracking
Study showed significant improvement in tracking success over the traditional viewer
In either layout, the map can be used to find activity near a location and time
HyperHitchcock: Interactive Video
Issue
Vision: Seamlessly interact with characters in the show
Reality: Difficult to author even simple interactive videos
Today, video is included within pages of content but links between playing videos are not common
Support for Hypervideo Authoring
Links in video can lead to other video segments
Short main video with branches providing additional detail
Hyperlinks to branches just like in Web pages
Making of a scene in a movie, biography of an actor, different camera angle
General hypervideo difficult to author
Simple hypervideo format with only a single active link
Novel approach: use automatic video analysis, create an easy-to-use interface, and support a simple hypervideo format
Hierarchical Video with Links
Video sequences are represented as a containment hierarchy of video elements
Elements are video clips or composites grouping other video elements
Elements are played in sequence
Each element can be link anchor or link destination
Anchor for innermost element is available while the element is playing
After the link destination video is played, playback continues at the link anchor
Detail-on-demand Links
Any video clip or composite can be link anchor or link destination
Optional link offsets into destination
Links have labels
Link return behaviors control the purpose of the link
Play from where the viewer left the video
Play from the end of the source anchor sequence
Play from beginning of thesource anchor sequence
Stop playback
Different behaviors fordestination completion oraborted playback
Hyper-Hitchcock Editor
Hyper-Hitchcock evolved from Hitchcock video editor
Video clips grouped in piles by similarity (e.g., recording time)
Workspace to arrange clips
Resize keyframes to trim clips
Clips ordered as horizontal or vertical lists
Place links between clips
Group clips into composites
Tree view to visualize the containment hierarchy of composites
Trimming Clips in the Workspace
Best five seconds of clip selected by default
Resizing keyframe changes length of clip
Picks the best portion around initial five-second portion
Start and end can jump to sentence boundaries or silences
Clip start and/or end can be locked in timeline
Locked ends can be dragged
Audio energy visualized in the timeline to spot words and sentences
Attaching Links to Clips and Composites
Link anchors and destinations can be clips, composites, or elements inside composites
Color-coding and position indicate link attachment in the workspace
Links in and out of composite
Blue: attached to composite
Red: attached to element
Dashed: between composite and element
Hypervideo Player
Video player with controls for following and returning from links
Several improvements based on user feedback
First version indicated links in the timeline and showed the label for the active link
Next version showed labels in the timeline
Current version includes keyframes for the active link and for link history
User study suggests further improvements
Today’s Topics
General Audio
Audio cues, spatialized audio
Speech
Segmentation, speaker id, recognition
Music
Interactive music, summarization, organization
Image and video processing
Color-oriented representations
Region and temporal segmentation
Foreground-background separation
Edge and face detection
Image and video applications
MediaGLOW – image selection
DOTS – surveillance
HyperHitchcock – interactive video