Saturday, October 12, 2013

Somebody is watching where you look...

Yet another technical but interesting talk at GHC13 today was about integrating gaze direction into augmented reality applications. The presenter was Ann McNamara from Texas A&M University, and she communicated very well with her full-room audience.

As for the term "Augmented Reality" itself, it is said to have been first used by engineers at Boeing in the 1990s, for projecting virtual content onto real-world objects. But it wasn't until much later that AR apps started to appear on smartphones and become popular.

As smartphones and hand-held devices grew in popularity, it became the clever move to exploit them for AR: the camera and screen of a smartphone combine the real and virtual worlds by overlaying virtual elements on the live view. Even better, with GPS information and a bit of image recognition, the real world gets integrated more tightly with the virtual one... (or is it vice versa? :) )
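Just to make the geolocation part concrete (this is my own toy sketch, not anything shown in the talk): a location-based overlay basically boils down to computing the bearing from the device to a point of interest, comparing it with the compass heading, and mapping the difference onto the screen through the camera's field of view. Something like this:

```python
import math

def poi_screen_x(device_lat, device_lon, heading_deg, poi_lat, poi_lon,
                 screen_width=1080, fov_deg=60):
    """Roughly map a GPS point of interest to a horizontal screen position for a
    location-based AR overlay: compute the bearing from the device to the POI,
    compare it with the compass heading, and scale the difference by the camera
    field of view. A simplified approximation, just to illustrate the idea."""
    # Bearing from device to POI (degrees clockwise from north).
    d_lon = math.radians(poi_lon - device_lon)
    lat1, lat2 = math.radians(device_lat), math.radians(poi_lat)
    y = math.sin(d_lon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
    bearing = math.degrees(math.atan2(y, x)) % 360
    # Angle between where the camera points and where the POI actually is.
    delta = (bearing - heading_deg + 540) % 360 - 180   # wrap to [-180, 180)
    if abs(delta) > fov_deg / 2:
        return None                                      # POI is off-screen
    return screen_width / 2 + (delta / fov_deg) * screen_width

print(poi_screen_x(30.6, -96.3, heading_deg=45, poi_lat=30.61, poi_lon=-96.29))
```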

OK, now the real task: you want to overlay elements that relate to the real world, but they have to be placed in the vicinity of the real-world objects they describe, so placement really matters for how believable the application feels, and that is quite challenging.

As an example, one of the first AR applications of this kind is the Volkswagen XL1 app by Metaio, which runs on a phone or tablet. The user looks at the car through the device and sees additional information overlaid on it: how to carry out maintenance, how to place certain parts, the age of a part, and so on, which is quite useful for maintenance and engineering. Another example is a panorama of an unfamiliar environment with all the virtual info overlaid on the view, i.e. you see the comments about the restaurant on your right, the timetable for the nearest bus stop, or the weather in the new neighborhood you just moved into, etc.

But here is the problem: how can you look around and see the important items, all of their properties, and ratings and comments and... wow, that is too much for the size of your mobile device, right? If everything were shown at once on that little screen, it would be hopelessly overpopulated! And when you bring the device closer to an object, the object should appear bigger, but sorry, the amount of overlaid visualization doesn't get any smaller... Also, in a real scene many objects occlude each other. How do we overcome all of this?

By view management!

View management ensures that labels don't overlap or obscure one another, and it aids scene navigation. So how do you manage a view? One of the cleverest answers: the view is where we look! Nowhere else. So where are we looking? Or rather, the better question: do we actually see where we look?
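To give a feeling for what view management does under the hood, here is a tiny sketch of the simplest possible version: greedily place each label near its anchor and skip candidate positions that would overlap an already-placed label. The rectangle sizes and offsets are my own made-up illustration, not the algorithm from the talk:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Rect") -> bool:
        # Axis-aligned rectangle intersection test.
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x or
                    self.y + self.h <= other.y or other.y + other.h <= self.y)

def place_labels(anchors, label_w=80, label_h=20):
    """anchors: (x, y) screen points that the labels should stay close to."""
    placed = []
    offsets = [(5, 5), (5, -25), (-85, 5), (-85, -25)]  # a few spots around the anchor
    for ax, ay in anchors:
        for dx, dy in offsets:
            candidate = Rect(ax + dx, ay + dy, label_w, label_h)
            if not any(p is not None and candidate.overlaps(p) for p in placed):
                placed.append(candidate)
                break
        else:
            placed.append(None)  # no clutter-free spot: hide this label for now
    return placed

print(place_labels([(100, 100), (110, 105), (300, 200)]))
```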


The whole clever purpose of this project is to direct and/or benefit from visual attention on mobile devices. The cool part is incorporating eye tracking into AR and presenting labels or visual information wherever visual attention is detected, i.e. where the user is looking. It could change HCI so that it focuses not just on gestures, hands, or the body; the eyes would be of equal or even greater importance. Does anyone remember the movie Minority Report? I know, most technological demos lean on the same old cliché example, but wouldn't it be cool if your actually-not-so-smart phone could interpret you better just by looking into your eyes? Not on a mobile device, but I have tried a couple of eye-tracking applications at previous SIGGRAPHs, and I can see that it is a promising technology if it is well exploited.

The way they exploit this approach is by estimating visual clutter from optical flow (pixel flow) and by using eye tracking to decide where to place visual information. Speaking of which, eye tracking works with infrared lights and image processing: the pupil is detected in the camera image to calculate the eye position and gaze direction. Then a heat map of gazes is extracted: imagine we are all Cyclops from the X-Men, but without the glasses! The heat map shows our accumulated "burns" on an image :) So when you are placing important information, labels, or anything that needs to be seen, you wouldn't want to put it in the cold spots.
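To make the heat-map idea a bit more concrete, here is a tiny sketch of how gaze samples could be accumulated into such a map by splatting a Gaussian "burn" around each fixation point. This is just my own illustration of the general idea, not the pipeline used in the research:

```python
import numpy as np

def gaze_heatmap(fixations, width, height, sigma=30.0):
    """Accumulate (x, y) gaze fixations into a heat map by adding a Gaussian
    "burn" around each fixation point. Hot pixels = frequently looked at."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float32)
    for fx, fy in fixations:
        heat += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()  # normalize to [0, 1]

# Cold spots (low values) are candidate places for labels; hot spots are where
# the viewer's attention already is.
heat = gaze_heatmap([(120, 80), (125, 90), (300, 220)], width=640, height=480)
print(heat.shape, heat.max(), heat.min())
```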

There are also other questions being researched, such as how to associate labels with the locations that are examined the most, how to change labels without making the transition apparent, and how to fit all of this technology into mobile devices. One answer is the Structure Sensor: a device you attach to your mobile device so you can scan your surroundings! Another direction is directing attention itself: instead of detecting attention and bringing content to those important places, you direct the user's attention to the places that should be looked at.

The researchers chose a very well-suited and also enjoyable input for evaluating attention direction: episodic paintings! Yes, I am not an art major either, and this talk was the first time I had heard of such paintings. Episodic paintings are essentially old videos squeezed into one frame :) Artists of the past couldn't record video, so they merged the story with art and told a whole timeline in a single painting. The episodes are not divided into separate frames; everything is painted on one panel, so you have to observe the painting and read its episodes out of that single frame! Of course, for art historians it is easy to read (or watch?) these paintings and see the whole timeline. For me, and probably for most of you, it just looks like one painting with some odd repetitions :)

The reason we went that deep into episodic paintings is that the user study in this research helps out-of-domain people read those paintings through subtle gaze direction. Remember directing attention? Subtle gaze direction is unconscious gaze redirection, in other words, making people look at some specific point without making them aware that their gaze is being steered! Creepy but... cool. So obviously, drawing a red arrow pointing at some spot in the picture is not what they are trying to accomplish :)

The human visual system (HVS) is a wonderland; hopefully I can blog someday about all of its working mechanisms: the Mach band effect, illusions, differencing logic, cones and rods, etc. Ann also takes advantage of this system and uses motion to make somebody look somewhere, because the peripheral vision of the HVS is especially responsive to motion. They add a tiny motion effect to some part of the painting and, wow, the whole heat map changes: that spot is on fire :) They applied gaze direction to episodic paintings for a study presented at a conference in 2012, and the group with gaze direction was far more successful at reading the paintings in the correct order! There are also other techniques to direct gaze, like dodging a spot, blurring, or changing contrast, but those are mainly used in videos rather than static images, which makes sense because video already contains motion and a still image does not.
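For the curious, here is a rough sketch of how such a motion-based cue could be driven by the eye tracker: keep a faint modulation alive at the target only while it sits in the viewer's periphery, and kill it the moment the gaze starts heading there, so it never reaches sharp central vision. The loop structure, thresholds, and helper functions below are placeholders of my own, not the actual implementation from the talk:

```python
import math
import time

def subtle_gaze_direction(target, get_gaze, modulate, stop_modulation,
                          trigger_radius=100, frame_time=1 / 60):
    """Simplified subtle-gaze-direction loop: keep a faint modulation (e.g. a tiny
    luminance flicker) alive at `target` while the viewer's gaze is far away, and
    stop it as soon as the gaze approaches, so the cue stays peripheral and is
    never inspected by sharp central vision. `get_gaze`, `modulate`, and
    `stop_modulation` are placeholder callbacks for the eye tracker and renderer;
    the radius and timing values are illustrative only."""
    while True:
        gx, gy = get_gaze()                       # current gaze point on screen
        if math.hypot(target[0] - gx, target[1] - gy) < trigger_radius:
            stop_modulation()                     # gaze is headed to the target
            return
        modulate(target)                          # cue is still only peripheral
        time.sleep(frame_time)
```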

She also showed some preliminary heat-map results from before and after attention is directed, and they suggest that we see before we look, but we also look where we see :) Which is exactly as complicated as you would expect from our complex nervous system, right?

I hope we will see real-world applications grow out of this early-stage project, with our mobile devices watching where we are looking to ease our communication with the digital world.
