PLENARY LECTURES

 Machine Learning for Problem Solving
 Facial Behaviour Understanding
 Bayesian SSVEP/NIRS-based Brain Signal Decoding with Monte Carlo Implementations
 Research for Wonderful Future Embedded Applications
 Large-scale Audio and Video Content Analysis and Identification


Machine Learning for Problem Solving

Prof. Biing-Hwang (Fred) Juang
Georgia Institute of Technology, USA


Professor Juang received his Ph.D. from the University of California, Santa Barbara in 1981. He worked at the Speech Communications Research Laboratory (SCRL) and Signal Technology, Inc. (STI) on a number of government-sponsored research projects. Notable accomplishments during that period include the development of vector quantization for voice applications, voice coders at extremely low bit rates (800 bps and around 300 bps), and robust vocoders for use in satellite communications. He subsequently joined the Acoustics Research Department of Bell Laboratories, working in the areas of speech enhancement, coding, and recognition. Prof. Juang became Director of Acoustics and Speech Research at Bell Labs in 1996, and Director of Multimedia Technologies Research at Avaya Labs (a spin-off of Bell Labs) in 2001. His group continued the long heritage of Bell Labs in speech communication research, including, most notably, the invention of the electret microphone, the network echo canceller, a series of speech codecs, and key algorithms for signal modeling and automatic speech recognition. In recent years, he and his group developed a speech server for applications such as AT&T's advanced 800 calls and Moviefone, the Perceptual Audio Coder (PAC) for digital audio broadcasting in North America (in both terrestrial and satellite systems), and the world's first real-time full-duplex hands-free stereo teleconferencing system. Prof. Juang has published extensively, including the book “Fundamentals of Speech Recognition”, co-authored with L.R. Rabiner, and holds about twenty patents. He joined Georgia Tech in 2002.

Abstract

Machine learning is a diffuse subject that carries many names. Some consider it part of artificial intelligence, while many equate it with modeling statistical distributions from data, or simply with curve fitting. In this talk, we take the perspective of designing machines and algorithms to help us solve problems. Problem solving, in parallel with problem finding, is an important intellectual process. We ask if and how machine learning should be formulated to solve problems rather than to perform statistical modeling. We use two age-old problems, namely optimal quantization of data and pattern recognition, to illustrate many key aspects of this formulation. We further introduce new results in performance-based pattern recognition that have benefited from the problem-solving perspective.
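
As background for the optimal-quantization example mentioned above, the following is a minimal sketch of the classical Lloyd (k-means) algorithm for designing a codebook that minimizes mean squared distortion. The talk's own formulation is not given in this abstract, so the data, codebook size, and update scheme below are purely illustrative assumptions:

# Minimal illustrative sketch (not the talk's formulation): optimal
# quantization of data via Lloyd's algorithm, i.e. k-means codebook
# design minimizing mean squared distortion.
import numpy as np

def lloyd(data, n_codewords=4, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the codebook with randomly chosen data points.
    codebook = data[rng.choice(len(data), n_codewords, replace=False)]
    for _ in range(n_iters):
        # Assignment step: map each point to its nearest codeword.
        dists = np.linalg.norm(data[:, None] - codebook[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each codeword to the centroid of its cell.
        for k in range(n_codewords):
            if np.any(labels == k):
                codebook[k] = data[labels == k].mean(axis=0)
    return codebook, labels

# Toy usage: quantize 500 two-dimensional points with 4 codewords.
data = np.random.default_rng(42).normal(size=(500, 2))
codebook, labels = lloyd(data)
print(codebook)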




Facial Behaviour Understanding

Prof. Maja Pantic
Imperial College London, UK


Prof. Maja Pantic received her MSc and PhD degrees in computer science from Delft University of Technology, the Netherlands, in 1997 and 2001, respectively. From 2001 to 2005, she was an Assistant and then an Associate Professor at the faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) of Delft University of Technology. In 2006, she joined the Department of Computing at Imperial College London, UK, where she is a full-time Professor of Affective & Behavioural Computing and head of the Intelligent Behaviour Understanding Group (iBUG), working on machine analysis of human non-verbal behaviour and its applications to HCI. Since November 2006, she has also held an appointment as a part-time Professor of Affective & Behavioural Computing at EEMCS of the University of Twente, the Netherlands. She was a Visiting Professor at the Robotics Institute, Carnegie Mellon University, in 2005.

In 2002, for her research on Facial Information for Advanced Interface (FIFAI), Prof. Pantic received the Dutch Research Council Junior Fellowship (NWO Veni), awarded annually to the 7 best young scientists in the exact sciences in the Netherlands. In 2008, for her research on Machine Analysis of Human Naturalistic Behavior (MAHNOB), she received a European Research Council Starting Grant, awarded annually to the top 2% of young scientists in any research field in Europe. In 2011, Prof. Pantic received the BCS Roger Needham Award, awarded annually to a UK-based researcher for a distinguished research contribution in computer science within ten years of their PhD.

Prof. Pantic currently serves as Editor-in-Chief of the Image and Vision Computing Journal (IVCJ) and as an Associate Editor for both the IEEE Transactions on Systems, Man, and Cybernetics Part B (TSMC-B) and the IEEE Transactions on Multimedia (TMM). She was General Chair of the IEEE Int’l Conf. on Automatic Face and Gesture Recognition 2008, the Belgium-Netherlands Conf. on Artificial Intelligence 2008, and the IEEE Int’l Conf. on Affective Computing and Intelligent Interaction 2009. She currently serves as Program Chair of the IEEE Int’l Conf. on Social Computing 2011. Prof. Pantic was also the initiator and co-organiser of both the CVPR Workshop on Human Communicative Behaviour Analysis (CVPR4HB 2008-2011) and the Social Signal Processing Workshop (SSPW 2009-2010).

Prof. Pantic is one of the world’s leading experts in research on machine understanding of human behaviour, including vision-based detection, tracking, and analysis of human behavioural cues like facial expressions and body gestures, and multimodal analysis of human behaviours like laughter, social signals, and affective states. She is also one of the pioneers in the design and development of fully automatic, affect-sensitive, human-centred anticipatory interfaces, built for humans and based on human models. She has published more than 100 technical papers in the areas of machine analysis of facial expressions and emotions, machine analysis of human body gestures, and human-computer interaction. She is a Senior Member of the IEEE, and has served as a keynote speaker and as an organization/program committee member at numerous conferences.

Abstract

A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. To realize this prediction, next-generation computing should develop anticipatory user interfaces that are human-centred, built for humans, and based on naturally occurring multimodal human behaviour such as affective and social signalling.
Facial behaviour is our preeminent means of communicating affective and social signals. This talk discusses a number of components of human facial behaviour, how they can be automatically sensed and analysed by computers, the past research in the field conducted by the iBUG group at Imperial College London, and how far we are from enabling computers to understand human facial behaviour.


Bayesian SSVEP/NIRS-based Brain Signal Decoding with Monte Carlo Implementations

Prof. Takashi Matsumoto
Waseda University, Japan


Takashi Matsumoto received his BS in Electrical Engineering from Waseda University, Tokyo, Japan, his MS in Applied Mathematics from Harvard University, Cambridge, Massachusetts, and his Ph.D. in Electrical Engineering from Waseda University. He is currently a professor at the Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan.

His interests are in Bayesian learning algorithms with Monte Carlo implementations for real-world problems. Target applications include time-series prediction, EEG- and NIRS-based brain signal decoding, hyperspectral-imaging-based individual authentication, and generic object recognition, among others. He is also working to construct generalized EM algorithms with nonparametric prior distributions for Bayesian learning and its applications.

Dr. Matsumoto was a recipient of the 1994 Best Paper Award of the Japanese Neural Network Society. He is a Fellow of the IEEE. He has held visiting positions at Cambridge University, UK, and UC Berkeley.

Abstract

Bayesian learning algorithms with Monte Carlo implementations are proposed and tested for decoding brain signals from EEG and NIRS data. The topics include:

Each time trial data arrives, the algorithm computes the predictive distribution over the pattern classes, from which one computes the Sequential Error Rate for Steady-State Visually Evoked Potential (SSVEP) classification problems. The Sequential Error Rate is the average classification error rate windowed over a short trial period. The algorithm is tested on two-class and four-class sequential classification problems; a minimal sketch of these computations follows the topic descriptions.

In the Covert SSVEP Selective Attention problem, a subject is asked to pay attention to a target covertly, which makes the learning problem harder than overt attention problems for the subject, and hence for the machine. To help the subject, the machine provides an online visual feedback signal showing the degree of achievement.

An attempt is made to predict an anxiety index (the STAI index) from NIRS data, consisting of oxyhemoglobin and deoxyhemoglobin measurements over the prefrontal cortex, recorded while a subject is at rest.
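
As forward-referenced in the first topic, here is a minimal sketch of the two computations named there: a Monte Carlo estimate of the predictive class distribution, p(c|x, D) ≈ (1/S) Σ_s p(c|x, θ_s) averaged over posterior draws θ_s, and the windowed Sequential Error Rate. The sigmoid classifier, the Gaussian stand-in for the posterior draws, and the toy trial stream are all illustrative assumptions, not the lecture's algorithm:

# Minimal illustrative sketch: Monte Carlo predictive distribution for a
# two-class decoder, plus the windowed Sequential Error Rate.  The
# classifier, the "posterior draws", and the trial stream are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def predictive_prob(x, theta_samples):
    """Monte Carlo estimate of p(c=1|x, D): average p(c=1|x, theta_s) over draws."""
    logits = theta_samples @ x                     # one score per posterior draw
    return np.mean(1.0 / (1.0 + np.exp(-logits)))  # average sigmoid over draws

def sequential_error_rate(errors, window=10):
    """Average classification error over the most recent `window` trials."""
    recent = errors[-window:]
    return sum(recent) / len(recent)

theta_samples = rng.normal(1.0, 0.3, size=(500, 4))  # stand-in for MCMC draws
errors = []
for t in range(50):                                  # stream of trials
    y_true = t % 2
    x = rng.normal(2.0 * y_true - 1.0, 1.0, size=4)  # class-dependent feature
    y_hat = int(predictive_prob(x, theta_samples) > 0.5)
    errors.append(int(y_hat != y_true))
    print(f"trial {t:2d}: sequential error rate = {sequential_error_rate(errors):.2f}")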




Research for Wonderful Future Embedded Applications

Dr. Yimin Zhang
Intel Labs China, China


Yimin Zhang is currently the Director of the Embedded Application Lab at Intel Labs China. He joined Intel in 2000, and has held various positions there, including research scientist and research manager of a team developing emerging statistical computing workloads. His research interests include computer vision/image search, visual computing, speech/language processing, and workload scalability analysis for low-power, high-performance architectures. The research work he led on innovative applications and benchmarking has contributed to various R&D efforts at Intel, e.g., multi-core and tera-scale computing. He has published more than 60 papers in international conferences and journals. He is a senior member of the China Computer Federation and a member of the IEEE, has served on the Academic Working Committee of the China Computer Federation, and has served as session/workshop chair at PCM, ISCA, etc. He received his B.A. degree from Fudan University in 1993, his M.S. degree from Shanghai Maritime University in 1996, and his Ph.D. degree from Shanghai Jiao Tong University in 1999, all in Computer Science.

Abstract

Smartphones, tablets, smart TVs, and similar devices have recently become very popular in the market, and the applications running on these embedded devices are the key to attracting users and enriching people’s lives.

In this talk, the current status of embedded applications will be briefly reviewed, and we will then present our thoughts on future application trends and research methodology. Based on these, we will discuss some of our recent work on embedded applications, particularly visual analytics and computer vision applications and algorithms, as well as some future research directions we see as important. In particular, we will illustrate the possible implications of these research directions for machine learning, signal processing, and related fields.




Large-scale Audio and Video Content Analysis and Identification

Dr. Kunio Kashino
NTT Communication Science Laboratories, Japan


Kunio Kashino holds the title of Distinguished Technical Member and is the Leader of the Media Recognition Research Group at NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation. He is also a visiting professor at the National Institute of Informatics, Japan. His team has been working on audio and video analysis, search, retrieval, and recognition algorithms and their implementation. He received his PhD from the University of Tokyo in 1995 for his pioneering work on music scene analysis. He is a Senior Member of the IEEE.

Abstract

Huge volumes of media content, such as movies and music, are being produced, stored, and distributed over broadband networks and multimedia clouds, and the search function has obviously become a very important key to accessing, managing, and utilizing such content. In recent years, while manually added metadata and collaborative information have been used to this end, there has been an ever-increasing demand to analyze, search, and identify the media content itself. Since one of the most fundamental functions for such media analysis is to distinguish the "same" content from others, this talk will first review same-content-detection techniques. As an example, the RMS (Robust Media Search) project at NTT Communication Science Laboratories will be introduced in particular. The talk will then focus on another important function called "media scene learning". It involves extracting pieces of information from unlabelled audio and video signals, without depending on knowledge specific to individual types of objects or events. Finally, the impact of "scale" on future technical challenges will also be discussed.
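
To make the notion of same-content detection concrete, the following is a minimal sketch of a generic binary audio fingerprint compared by Hamming similarity. It is a textbook-style illustration under simplified assumptions (frame size, band energies, a median threshold), not NTT's RMS algorithm, whose details this abstract does not give:

# Minimal illustrative sketch of same-content detection via a binary
# audio fingerprint.  Frame size, band split, and the median threshold
# are simplifying assumptions; this is not the RMS algorithm.
import numpy as np

def fingerprint(signal, frame=1024, bands=16):
    """One bit per frequency band and frame: 1 where the band's spectral
    energy exceeds the frame's median band energy."""
    bits = []
    for i in range(len(signal) // frame):
        spec = np.abs(np.fft.rfft(signal[i * frame:(i + 1) * frame]))
        per_band = spec[: bands * (len(spec) // bands)]
        energy = per_band.reshape(bands, -1).sum(axis=1)
        bits.append(energy > np.median(energy))
    return np.array(bits)

def similarity(fp_a, fp_b):
    """Fraction of matching bits over the overlapping frames,
    i.e. one minus the normalized Hamming distance."""
    n = min(len(fp_a), len(fp_b))
    return np.mean(fp_a[:n] == fp_b[:n])

# Toy check: a clip matches its noisy copy far better than unrelated audio.
rng = np.random.default_rng(1)
clip = rng.standard_normal(16 * 1024)
noisy_copy = clip + 0.3 * rng.standard_normal(clip.shape)
unrelated = rng.standard_normal(clip.shape)
print(similarity(fingerprint(clip), fingerprint(noisy_copy)))  # close to 1
print(similarity(fingerprint(clip), fingerprint(unrelated)))   # around 0.5

In practice, systems of this kind also need robustness to distortions such as encoding artifacts and noise, and an index over the fingerprints so that matching scales to very large archives, which connects to the discussion of "scale" at the end of the abstract.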