Scene-Independent Group Profiling in Crowd
By Jing Shao
Groups are the primary entities that make up a crowd. Understanding group-level dynamics and properties is thus scientifically important and practically useful in a wide range of applications, especially for crowd understanding.
Socio-psychologists and biologists have extensively studied group dynamics as the primary processes that influence crowd behaviors. Group dynamics have both intra-group and inter-group aspects. For example, bacterial colonies were found to exhibit collective behavior to achieve a common goal, such as the spreading of disease, while conflict often arises from competition for resources or goal incompatibility, whether in fish schools or ant swarms.
In the recent work of Shao et al. (CVPR 2014), a universal and fundamental set of group properties and corresponding scene-independent visual descriptors is proposed. This is made possible by learning a novel Collective Transition prior, which leads to a robust approach for group segregation in public spaces. From the prior, a set of visual descriptors is devised, as shown below.
Understanding such properties provides a critical mid-level representation for crowd motion analysis, and could facilitate other high-level semantic analysis such as crowd scene understanding, crowd video classification, and crowd event retrieval. These applications are scene-independent.
Reference:
Jing Shao, Chen Change Loy, Xiaogang Wang. “Scene-Independent Group Profiling in Crowd.” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014) [PDF] [Abstract] [Bibtex] [Project page]
Stationary Groups in Crowd Situations
by Shuai Yi
With steady population growth and worldwide urbanization, more and more people gather in big cities, and crowd situations arise more and more often. Crowd analysis in video surveillance attracts a lot of attention and has plenty of applications. Existing work focuses on detecting motion patterns of crowds and analyzing interactions among pedestrians during movement. Stationary crowd group analysis, on the other hand, has never been sufficiently studied, although these groups can provide surprisingly rich information.
Stationary crowd groups play an important role in crowd analysis. They are one of the most common and basic patterns in crowd situations. Groups that stay for a period of time are often worth attention, as the most interesting activities tend to involve people who stay in the scene for a relatively long time rather than those who pass through quickly.
First of all, we can detect different types of group activities and discover valuable information from them. Figure 1 shows the four activities to be detected: group gathering, group stopping by, group relocating, and group deformation. From different group activities, we may infer the underlying social relationships of group members. In some groups, members are familiar with each other (e.g. friends waiting for each other, or a group of people having a discussion), while in others they are strangers sharing the same goal (e.g. buying tickets together or waiting for the same train). Moreover, the emergence, dispersal, stationary duration, and status of stationary groups can be of great security interest and need to be discovered.

Figure 1. Four major types of stationary group activities to be detected in our work. (a) People join a group from different directions at different times. When all people have arrived, the whole group moves toward the same destination. (b) A group of people enter the view together, stay for a period of time, and leave together. (c) After staying at a place for a while, people move to another location and become stationary again. (d) People in a group have their own activities, taking photos for example.
Secondly, stationary groups change traffic flow and decrease traffic efficiency. Previous works mainly model the global motion pattern based on scene structures (e.g. entrances, exits, walls, and roads) and the interactions among individual moving pedestrians. However, a previous study shows that stationary groups have a greater impact on traffic patterns than mobile pedestrians in some situations. When pedestrians move around, they adjust speed but not direction to avoid collisions. Such self-organized behaviors keep traffic flow smooth. However, if pedestrians form stationary groups, they force others to change direction, and transportation efficiency drops considerably. As shown in Figure 2, the emergence and dispersal of stationary groups cause dynamic variation of crowd traffic patterns. It is thus of great interest to incorporate stationary groups into dynamic modeling of crowd systems. Moreover, stationary groups lower efficiency because pedestrians need to walk a longer way to bypass them, so special attention should be paid to this area.

Figure 2. The emergence and dispersal of stationary crowd groups will cause the dynamic variations of traffic patterns. Stationary groups are marked in red and main traffic patterns are marked in blue.
Lastly, stationary groups can help us better understand scene structure. It is informative to investigate where stationary groups are likely to emerge and how long they tend to stay. An average stationary time map is shown in Figure 3. It can provide guidance for crowd management, as well as provision of facilities and support.

Figure 3. Average stationary time distribution over 4 hours. Stationary groups tend to emerge and stay long around the information booth and in front of the ticketing windows.
All of the applications mentioned above rely on one key technology: stationary time estimation. We propose a new method that estimates stationary time [1], i.e., the period for which a foreground pixel remains within a local region, allowing for local movements. As shown in Figure 4, given a video sequence, our method produces a 3D stationary time map in the spatio-temporal space. It is an important step for further analysis of stationary crowds.

Figure 4. Estimating a 3D stationary time map from a video sequence. Results from a few frames are shown. How long a pixel has been stationary up to each frame is encoded by the intensity level. Brighter pixels correspond to longer time.
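The notion of stationary time defined above can be illustrated with a naive per-pixel counter. This is only a sketch under stated assumptions: it takes already-computed binary foreground masks as input, does not model the tolerance to local movements, and is not the L0-regularized estimator of [1]. The function name `stationary_time_maps` and the toy data are hypothetical.

```python
def stationary_time_maps(masks):
    """For each frame, count how many consecutive frames each pixel has been foreground.

    masks: list of frames, each a list of rows of 0/1 foreground flags.
    Returns one map per frame; stacked, they form a (time, y, x) volume.
    """
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    running = [[0] * width for _ in range(height)]
    volume = []
    for mask in masks:
        for y in range(height):
            for x in range(width):
                # Extend the run while the pixel stays foreground, reset otherwise.
                running[y][x] = running[y][x] + 1 if mask[y][x] else 0
        volume.append([row[:] for row in running])
    return volume


# Toy 1x3 video: pixel 0 is always foreground, pixel 1 flickers, pixel 2 is background.
toy_masks = [[[1, 1, 0]], [[1, 0, 0]], [[1, 1, 0]]]
toy_volume = stationary_time_maps(toy_masks)
print(toy_volume[-1])  # [[3, 1, 0]]
```

The flickering pixel shows why a plain counter is fragile: a single missed foreground detection resets the estimate, which is the kind of instability a more careful estimator has to handle.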
Reference:
[1] Shuai Yi, Xiaogang Wang, Cewu Lu, and Jiaya Jia. “L0 Regularized Stationary Time Estimation for Crowd Group Analysis.” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014). [Paper] [Spotlight] [Demo] [Poster] [Presentation] [Abstract] [Bibtex]
Tiltor, Stop Protests From Becoming Riots
Here I would like to recommend a mobile app called Tiltor, which aims at stopping crowd protests from becoming riots. The following introduction to Tiltor is written by its CEO and Co-Founder, Greg Millington.
Anyone who has ever had the pleasure of joining a protest, doing the Mexican wave, or attending a flash mob knows that it doesn’t take long to snap into sync with the crowd. This is because formed crowds are self-organizing systems with many parts supporting a particular global behavior. An entrant to an existing crowd is swept into lockstep without having to be explicitly told what to do. Since self-organizing systems are not centrally controlled, but rather each person plays a role in maintaining the global behavior, they can withstand damage and perturbations such as the loss of participants. This feature makes protesting crowds notoriously difficult to terminate, and it is exactly what Tiltor attacks.
In this respect, Tiltor can be thought of as the “Anti-Flash Mob.” Instead of making people spontaneously form a crowd, we make an already formed crowd of people disperse.
Tiltor is designed to compromise the collective motivation of a protesting crowd. When a significant segment of a crowd has their allegiances flipped, it does more than only peel away that segment from the protest (which a self-organized system can survive). Rather, the “turned” population of protestors serves to disrupt the system. This has a much greater undermining effect on the crowd sync than if they had just quietly left the protest. By flipping the feedback loop within the crowd from positive (“do what you can to keep the protest going”) to negative (“do what you can to stop the protest”) the crowd behavior can be altered, or even extinguished.
Tiltor presents this non-violent means of crowd control to law enforcement authorities in the hope that the increasingly deadly clashes between protesters and police can be avoided. Upcoming applications of Tiltor include the Winter Olympics in Sochi and the World Cup in Brazil.
Measuring Crowd Collectiveness
As we know, crowds in nature have a variety of scales, shapes, and dynamics. To quantitatively analyze the dynamic properties of a crowd, we need a general descriptor that can measure the level of collective motion in the crowd.
The simplest and most naive measure is the average velocity of the whole crowd, but we found that this measure is sensitive to noise and to the global shape of the crowd movement. As with the crowds in the following figure, if the crowd moves globally in a C shape, the average velocity will be very small, but in fact the ‘collectiveness’ of the crowd is high.
In the recent work of Zhou et al. (CVPR 2013 oral, TPAMI 2014), a new crowd descriptor called Collectiveness is proposed. This descriptor uses the graph connectivity of individuals in the neighborhood to build a global indicator that measures the collective level of crowd motions. As shown below, crowd movement can be accurately estimated and quantified into different dynamic categories.
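The contrast between the naive average-velocity measure and a neighborhood-based one can be sketched on toy data. The code below is an illustration under stated assumptions, not the published Collectiveness descriptor: `neighbor_agreement` is a hypothetical, simplified stand-in that only averages the velocity correlation between each individual and its k nearest neighbors, and the C-shaped crowd is synthetic.

```python
import math


def c_shaped_crowd(n=60, start_deg=30.0, end_deg=330.0):
    """Unit-speed individuals moving along a C-shaped arc (toy data)."""
    positions, velocities = [], []
    for i in range(n):
        theta = math.radians(start_deg + (end_deg - start_deg) * i / (n - 1))
        positions.append((math.cos(theta), math.sin(theta)))
        velocities.append((-math.sin(theta), math.cos(theta)))  # tangent direction
    return positions, velocities


def average_velocity_magnitude(velocities):
    """Length of the mean velocity vector of the whole crowd."""
    n = len(velocities)
    mean_x = sum(v[0] for v in velocities) / n
    mean_y = sum(v[1] for v in velocities) / n
    return math.hypot(mean_x, mean_y)


def neighbor_agreement(positions, velocities, k=4):
    """Mean cosine similarity between each velocity and those of its k nearest neighbors."""
    total = 0.0
    for i, p in enumerate(positions):
        others = sorted(
            (j for j in range(len(positions)) if j != i),
            key=lambda j: math.dist(p, positions[j]),
        )
        for j in others[:k]:
            vi, vj = velocities[i], velocities[j]
            dot = vi[0] * vj[0] + vi[1] * vj[1]
            total += dot / (math.hypot(*vi) * math.hypot(*vj))
    return total / (len(positions) * k)


positions, velocities = c_shaped_crowd()
print("average velocity magnitude:", round(average_velocity_magnitude(velocities), 3))
print("neighbor agreement:", round(neighbor_agreement(positions, velocities), 3))
```

On this toy crowd every individual moves at unit speed, yet the mean velocity vector is much shorter than 1 because opposite sides of the C cancel out, while the neighborhood-based score stays close to 1.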
There are also many applications based on this general descriptor, such as monitoring crowd dynamics in videos, detecting collective motions in time-series data, and generating collective maps of scenes. See the TPAMI journal paper of this work for details.
Reference:
- Bolei Zhou, Xiaoou Tang, and Xiaogang Wang. “Measuring Crowd Collectiveness.” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013, oral paper)
- Bolei Zhou, Xiaoou Tang, Hepeng Zhang, and Xiaogang Wang. “Measuring Crowd Collectiveness.” The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, regular paper)
A Complete Review on Collective Motion
A review on collective motion written by Tamás Vicsek has finally been published, which introduces the state-of-the-art research on collective motion. In my opinion it is the greatest review on collective motion so far, for its thoroughness and wide coverage. Check it out here. The arXiv preprint is here.
Collective Motion, Tamás Vicsek, Anna Zafeiris, Physics Reports, 2012
Prof. Vicsek is a leading researcher on collective motion and complex networks. See his homepage for a lot of useful information on this exciting research direction.
A TEDx talk titled The Simple and the Complex given by Prof. Vicsek is here.
The Power of Swarms Can Help Us Fight Cancer, Understand the Brain, and Predict the Future
A new article published in Wired gives a very nice introduction to the state-of-the-art research on swarms and crowds. Enjoy!
The Power of Swarms Can Help Us Fight Cancer, Understand the Brain, and Predict the Future
Breathtaking View of the Underwater
A large group of bigeye trevallies at Cabo Pulmo National Park, Mexico, captured by Octavio Aburto.
Coherent Filtering: Detecting Coherent Motions from Crowd Clutter
Coherent motion is a universal phenomenon in nature and exists widely in many physical and biological systems. For example, tornadoes, storms, and atmospheric circulation are all caused by the coherent movements of physical particles in the atmosphere. Meanwhile, the collective behaviors of organisms such as schooling fish and pedestrian crowds have long captured the interest of social and natural scientists. Here are examples of coherent motions in videos.
Detecting these coherent motion patterns in crowds is the first step toward organizing low-level features into semantic clusters. It benefits high-level tasks such as scene understanding and activity analysis.
Recently I proposed a simple coherent motion detection technique called Coherent Filtering. It is published in the Proceedings of the 12th European Conference on Computer Vision (ECCV 2012). It is a generic clustering algorithm for analyzing time-series signals.
In our formulation, the low-level features are the keypoint trajectories (short time-series) automatically extracted from crowd video. Here are examples of the keypoint trajectories extracted from the crowd videos.

Figure 2. A) One frame of the crowd videos. B) The trajectories extracted from the videos. Colors of trajectories are randomly assigned.
Since the scenes in the videos are very crowded, there is a lot of dynamic noise and many cluttered trajectories. The purpose of the technique is thus to remove this noise and cluster the keypoint trajectories into different coherent motion patterns. Here are some clustering results:

Figure 3. The coherent motion detection results by Coherent Filtering. Keypoints with the same color belong to the same coherent motion pattern.
Our technique is based on a prior discovered in particle dynamics called Coherent Neighbor Invariance. The details can be found on the project page and in the technical paper.
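One way to read the Coherent Neighbor Invariance prior, taken here as an assumption for illustration rather than a statement of the published algorithm, is that points in coherent motion tend to keep the same nearest neighbors over a short time window, while clutter does not. The sketch below checks only that neighborship part on synthetic points; the helper names `k_nearest` and `invariant_neighbors` are hypothetical, and the full technique involves more than this step.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to point i in one frame."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return set(order[:k])


def invariant_neighbors(frames, i, k):
    """Neighbors of point i that stay among its k nearest in every frame."""
    common = k_nearest(frames[0], i, k)
    for points in frames[1:]:
        common &= k_nearest(points, i, k)
    return common


# Toy data: points 0-2 drift right together, point 3 is clutter moving the
# other way, point 4 is a static background point.
starts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.2), (0.5, -0.5), (4.0, 3.0)]
steps = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (-2.0, 0.0), (0.0, 0.0)]
frames = [
    [(x + t * dx, y + t * dy) for (x, y), (dx, dy) in zip(starts, steps)]
    for t in range(3)
]

print(sorted(invariant_neighbors(frames, 0, 3)))  # [1, 2]
```

Point 3 is the closest neighbor of point 0 in the first frame, but it does not survive the intersection over time, so only the two points that move together with point 0 remain.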
Understanding Collective Crowd Behavior: A Computer Vision Approach
Recently I published a research paper at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012 that analyzes collective crowd behavior in New York Grand Central Station. Here are the paper and the project page.

Grand Central Station
Generally speaking, the objective of this project is to learn collective crowd behavior patterns from real video of New York Grand Central Station. The learned patterns are then used in several important applications, such as crowd simulation, collective behavior classification, and abnormality detection.
Although quite a lot of pedestrians walk in the station at one time (a population of about 400) and form a variety of collective crowd behaviors, one key fact is that, instead of moving randomly, the majority of these pedestrians have a clear belief about the destination they want to reach, i.e., entering from one entry and walking to one exit on the other side of the station. Thus, the overall behavior of a pedestrian in the station is largely influenced by his belief about the starting point and destination, along with two other properties: his preferred movement dynamics and his timing of emerging (the frequency of entering the scene from the starting point).
Following this intuitive analysis, in the agent-based modeling of the crowd in the station, every pedestrian is driven by one type of agent with three properties: the belief of starting point and destination, the movement dynamics, and the timing of emerging. The whole crowd is modeled as a mixture of pedestrian-agents, each type with different values of these three properties.
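The three properties of a pedestrian-agent can be pictured as a small data structure. This is a hypothetical sketch for intuition only: the field names, the gate labels, the constant per-frame step standing in for movement dynamics, and the per-frame entry probability standing in for timing of emerging are all assumptions, not the model learned in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PedestrianAgent:
    start: str          # belief: where pedestrians of this type enter
    destination: str    # belief: where they intend to leave
    step: tuple         # stand-in for movement dynamics (displacement per frame)
    entry_rate: float   # stand-in for timing of emerging (entry probability per frame)


def simulate_entries(agents, n_frames, rng):
    """Return (frame, agent) pairs for every pedestrian that emerges.

    The crowd is a mixture: each agent type contributes pedestrians
    in proportion to its own entry rate.
    """
    entries = []
    for frame in range(n_frames):
        for agent in agents:
            if rng.random() < agent.entry_rate:
                entries.append((frame, agent))
    return entries


# Two made-up agent types crossing the scene in opposite directions.
agents = [
    PedestrianAgent("west gate", "east gate", (1.0, 0.0), 0.30),
    PedestrianAgent("east gate", "west gate", (-1.0, 0.0), 0.05),
]
entries = simulate_entries(agents, n_frames=200, rng=random.Random(0))
print(len(entries), "pedestrians emerged")
```

Sampling from such a mixture is one way a learned model could drive crowd simulation, since each sampled pedestrian inherits the start, destination, and dynamics of its agent type.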
For the computational modeling of the pedestrian-agents, please refer to the project page and paper for detailed information. Feel free to contact me if you have any questions or suggestions. The original video of the train station and the trajectories used in my paper can be downloaded here.