Real-time and Retrospective Analysis of Video Streams and Still Image Collections using MPEG-7

Ganesh Gopalan,

College of Oceanic and Atmospheric Sciences, Oregon State University

Introduction

•HD video streams have the potential to improve understanding of deep-sea ecosystems

•However, the volume and complexity associated with HD streams and formats can be overwhelming

•Our approach: Use industry standards to transform video into a data type rather than treating it as viewing material

MPEG-7 Overview

•Multimedia content description interface

•Consists of low-level descriptors and high-level description schemes

•Low-level descriptors provide statistical information about the pixel values in content

•Description Schemes are used to represent semantic information

Low Level Descriptors

•Structures that describe content in terms of the distribution of edges, colors, textures, shapes and motion

•Descriptors are extracted using the MPEG-7 eXperimentation Model (XM) software

•The input is a still image or a frame from video

•The output is an XML description of the statistical information
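
To make the image-in, XML-out flow concrete, here is a minimal sketch. It does not use the XM software or the MPEG-7 schema: the intensity histogram is a stand-in for a real descriptor, and the element and attribute names (`Descriptor`, `Values`, `frame`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def intensity_histogram(pixels, bins=4, max_value=256):
    """Count how many pixel values fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = max_value / bins
    for value in pixels:
        # Clamp to the last bin so value == max_value - 1 never overflows.
        counts[min(int(value // width), bins - 1)] += 1
    return counts


def descriptor_to_xml(frame_id, name, values):
    """Serialize descriptor values as XML (hypothetical layout, not MPEG-7)."""
    root = ET.Element("Descriptor", {"name": name, "frame": frame_id})
    ET.SubElement(root, "Values").text = " ".join(str(v) for v in values)
    return ET.tostring(root, encoding="unicode")


# A tiny "frame" of eight grayscale pixel values.
pixels = [0, 10, 70, 130, 200, 255, 255, 64]
hist = intensity_histogram(pixels)
xml_text = descriptor_to_xml("frame_00001", "IntensityHistogram", hist)
```

The real descriptors carry far more structure, but the shape of the pipeline is the same: statistics computed from pixel values, written out as XML for later comparison.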

Examples of Low Level Descriptors

•Edge Histogram

•Homogeneous Texture

•Color Layout

•Color Structure

•Motion Activity

•Descriptors are rotation and scaling invariant

Descriptor Extraction and Search

•Phase 1: descriptor XML for a collection of frames/still images is generated and cached

•Phase 2: the difference between the query image descriptor and the values cached in phase 1 is computed

•The cache can be augmented with the descriptors from a new video or still image collection
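
The two phases can be sketched as a small cache that is filled once, extended later, and compared against a query. This is an illustration under assumptions: descriptors are reduced to plain numeric vectors, the L1 distance stands in for whatever descriptor-specific matching is actually used, and `DescriptorCache` and the frame identifiers are hypothetical names.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class DescriptorCache:
    """Holds descriptor vectors keyed by frame or image identifier."""

    def __init__(self):
        self._by_frame = {}

    def add_collection(self, descriptors):
        # Phase 1, and any later augmentation with a new collection.
        self._by_frame.update(descriptors)

    def differences(self, query):
        # Phase 2: difference between the query and every cached descriptor.
        return {fid: l1_distance(query, vec) for fid, vec in self._by_frame.items()}


cache = DescriptorCache()
cache.add_collection({"dive1/f0001": [2, 2, 1, 3], "dive1/f0002": [0, 4, 4, 0]})
cache.add_collection({"sat/img_17": [2, 1, 2, 3]})  # new collection added later
diffs = cache.differences([2, 2, 2, 2])
```

Because extraction is the expensive step, caching it means a query only pays for the comparison.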

Description Schemes

•Description Schemes attempt to model the reality behind the content

•Low level descriptors can be used to tag objects of interest; the tags are then used to construct a high level description

•A search can then be performed against the higher level description schemes
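
One way to picture the step from low-level descriptors to a searchable high-level description is shown below. It is only a sketch: the labels (`vent`, `fish`), the exemplar vectors, the threshold, and the L1 distance are all assumptions, and a dictionary of labels is far simpler than an actual MPEG-7 description scheme.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tag_frames(frame_descriptors, exemplars, threshold):
    """Attach a label to each frame whose descriptor is close to that label's exemplar."""
    tags = {}
    for frame_id, vec in frame_descriptors.items():
        labels = {
            label
            for label, reference in exemplars.items()
            if l1_distance(vec, reference) <= threshold
        }
        if labels:
            tags[frame_id] = labels
    return tags


def search_by_label(tags, label):
    """Search the high-level tags rather than the raw descriptors."""
    return sorted(fid for fid, labels in tags.items() if label in labels)


frames = {"f1": [9, 1, 0], "f2": [1, 8, 1], "f3": [8, 2, 0]}
exemplars = {"vent": [9, 1, 0], "fish": [0, 9, 1]}  # hypothetical objects of interest
tags = tag_frames(frames, exemplars, threshold=2)
vent_frames = search_by_label(tags, "vent")
```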

High Definition Video Search Engine

•Applied MPEG-7 to the development of an HD search engine

•Extracted descriptors for approximately 10,000 frames from 2.5 hours of high definition content

•Content provided by the University of Washington from the “Visions 05 Cruise”

•Also applied to searching for eddies in satellite image collections and super-cells in radar images

Application Architecture

•.NET Windows Forms front end with an embedded Windows Media Player

•SQL Server back-end

•Common Language Runtime (CLR) integration for development of stored procedures to manage MPEG-7 XML

•Procedures can be written in .NET languages rather than SQL

Creating a CLR Stored Procedure

-- Register the CLR method as a T-SQL function
CREATE FUNCTION FindUsingVisualDescriptor
(
    @uid int,
    @token uniqueidentifier,
    @queryImage varbinary(MAX),
    @descriptorName nvarchar(256)
)
RETURNS nvarchar(MAX)
-- EXTERNAL NAME takes the form assembly.class.method
AS EXTERNAL NAME MPEG7Document.StoredProcedures.FindUsingVisualDescriptor;

Creating an HTTP Endpoint

CREATE ENDPOINT MPEG7
    STATE = STARTED
AS HTTP
(
    SITE = 'XXX.XXX.XX.XXX',
    PATH = '/MPEG7Endpoint',
    AUTHENTICATION = (BASIC),
    PORTS = (SSL),
    SSL_PORT = 444
)
FOR SOAP
(
    WEBMETHOD 'FindUsingVisualDescriptor'
    (
        NAME = 'looking.dbo.FindUsingVisualDescriptor',
        FORMAT = ALL_RESULTS
    ), …
)

User Interface

•UI allows conversion of video into frames using ffmpeg

•Descriptors of choice are then generated for all frames

•Descriptors are persisted to the server
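
The frame conversion step might look like the sketch below, which builds an ffmpeg command using its `fps` video filter. The file names and the helper functions are hypothetical, and the sampling rate is only an inference: it assumes the roughly 10,000 frames were sampled uniformly from the 2.5 hours of footage, which the slides do not state.

```python
import subprocess


def build_ffmpeg_command(video_path, out_pattern, frames_per_second):
    """Build an ffmpeg call that writes sampled frames as numbered images."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={frames_per_second}",  # resample to the requested frame rate
        out_pattern,
    ]


def extract_frames(video_path, out_pattern, frames_per_second):
    """Run ffmpeg; requires ffmpeg on PATH and an existing output directory."""
    subprocess.run(
        build_ffmpeg_command(video_path, out_pattern, frames_per_second),
        check=True,
    )


# About 10,000 frames over 2.5 hours works out to a little over 1 frame/second.
rate = round(10_000 / (2.5 * 3600), 2)
cmd = build_ffmpeg_command("dive.mpg", "frames/frame_%05d.png", rate)
```

Calling `extract_frames("dive.mpg", "frames/frame_%05d.png", rate)` would then produce the still images that the descriptor extraction consumes.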

Retrospective Search

•A query image initiates the search

•The descriptor value for the given image is compared with those cached from the video frames or still images

•The top 100 frames that are closest to the query image are returned
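
Returning the closest frames is a top-k selection over the cached descriptors. The sketch below assumes descriptors are plain numeric vectors compared with an L1 distance, which may differ from the matching actually used; `top_matches` is a hypothetical helper.

```python
import heapq


def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_matches(query, cached, k=100):
    """Return up to k (distance, frame_id) pairs, closest first."""
    scored = ((l1_distance(query, vec), fid) for fid, vec in cached.items())
    # nsmallest avoids sorting the whole collection when k is small.
    return heapq.nsmallest(k, scored)


cached = {"f1": [2, 2, 1, 3], "f2": [0, 4, 4, 0], "f3": [2, 2, 2, 2]}
best = top_matches([2, 2, 2, 2], cached, k=2)
```

With the default `k=100` this mirrors the search above; ties in distance are broken by frame identifier.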

Retrospective Search Example

Real-time Event Detection

•In this case, we have a set of known images that have objects of interest

•Descriptors of frames from a real-time stream are compared on a continuous basis with those in the “event library”

•When the difference in descriptor values is below a threshold, an event has been detected
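
The detection loop can be sketched as a generator that compares each incoming frame descriptor against the event library. The descriptor vectors, the L1 distance, the threshold value, and the event name `vent_plume` are all assumptions made for illustration.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_events(stream, event_library, threshold):
    """Yield (frame_index, event_name, distance) whenever a frame matches."""
    for index, vec in enumerate(stream):
        for name, reference in event_library.items():
            distance = l1_distance(vec, reference)
            if distance < threshold:  # below threshold: event detected
                yield index, name, distance


event_library = {"vent_plume": [9, 1, 0]}  # hypothetical reference event
stream = [[0, 5, 5], [8, 2, 0], [9, 1, 0], [1, 1, 8]]
events = list(detect_events(stream, event_library, threshold=3))
```

Because the function is a generator, it can consume an unbounded stream and report events as frames arrive.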

Example of an Event

Reference Event

Use of Multi-Core Systems

•The descriptor extraction process can be made faster by taking advantage of multiple processors or cores

•The total number of frames can be divided up amongst the available processors

•Threads extract the descriptors concurrently to generate chunks of XML

•The threads then signal each other to combine the chunks into a single file with the descriptor XML
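
The divide, extract, and combine pattern can be sketched with a thread pool. Several things here are assumptions: the extraction is a placeholder, the XML element names are hypothetical, and the pool's ordered `map` replaces explicit signalling between threads. Note also that in CPython, threads do not speed up pure-Python CPU-bound work, so a process pool may be the better fit there.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def split_evenly(items, parts):
    """Split items into `parts` contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def extract_chunk(frame_ids):
    """Placeholder extraction: build one XML chunk for a slice of frames."""
    chunk = ET.Element("Chunk")
    for fid in frame_ids:
        ET.SubElement(chunk, "Descriptor", {"frame": fid}).text = "placeholder"
    return chunk


def extract_all(frame_ids, workers):
    """Extract chunks concurrently, then merge them in frame order."""
    root = ET.Element("Descriptors")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so frames stay ordered.
        for chunk in pool.map(extract_chunk, split_evenly(frame_ids, workers)):
            root.extend(list(chunk))
    return root


frames = [f"frame_{i:05d}" for i in range(10)]
merged = extract_all(frames, workers=4)
```

`ET.ElementTree(merged).write("descriptors.xml")` would then produce the single combined file.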

Challenges

•Shadows and other lighting issues can create false positives

•May be necessary to use multiple descriptors for classification

•Processing high definition video at 30 fps is computationally intensive

•Scaling to a large number of images such as on the web presents a challenge
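
A rough sense of the 30 fps load comes from simple arithmetic. The sketch below assumes 1920×1080 frames, a resolution the slides do not specify, and treats the per-frame time budget as scaling linearly with the number of workers, which ignores coordination overhead.

```python
def pixels_per_second(width, height, fps):
    """Raw pixel throughput the pipeline must keep up with."""
    return width * height * fps


def per_frame_budget_ms(fps, workers=1):
    """Time each worker can spend on a frame without falling behind."""
    return 1000.0 * workers / fps


throughput = pixels_per_second(1920, 1080, 30)   # assumed HD resolution
single_core_budget = per_frame_budget_ms(30)
eight_core_budget = per_frame_budget_ms(30, workers=8)
```

Under these assumptions a single worker has about 33 ms per frame to keep pace with the stream.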

Conclusion

•MPEG-7 supports a rich framework for content-based searches through its low level descriptors

•Detected content can be tagged effectively using the high level description schemes, which can then be used to locate, search through and distribute content

Future Directions

•Need to explore ways to speed up descriptor extraction using GPUs or hybrid GPGPUs.

•Explore Cloud Services to implement video services – transcoding video on the fly for different devices, descriptor extraction using HPC clusters, streaming services

•Explore the Surface Computer as a UI

Acknowledgements

•We are thankful to Professor John Delaney from the University of Washington for providing the HD footage

•We are also thankful to the NSF-funded LOOKING team for supporting this effort