BEAT: Berkeley Emotion and Affect Tracking Dataset

Zhihang Ren1*, Jefferson Ortega1*, Yifan Wang1*, Ana Hernandez1, Zhimin Chen1, Yunhui Guo2, Stella X. Yu1,3, David Whitney1
1University of California, Berkeley 2University of Texas at Dallas 3University of Michigan, Ann Arbor
(*Equal Contribution)

Importance of context in emotion recognition.


The ability to perceive the emotions of others is essential for navigating and understanding the social world around us. To understand this visual perceptual mechanism, previous studies have focused on face processing, leading many earlier datasets to collect face-centric data. However, recent research has found that the visual system also uses background scene context to modulate and assign perceived emotion. Accordingly, "in-the-wild" datasets such as CAER and EMOTIC have been created to include contextual information. However, these datasets either track only categorical emotions, ignoring dimensional ratings (i.e., valence and arousal), or collect ratings only on static images. In this project, we propose BEAT: the Berkeley Emotion and Affect Tracking Dataset, the first video-based dataset that contains both categorical and continuous emotion annotations for a large number of videos (124 in total). BEAT provides more insight into human emotion perception by providing both categorical and dimensional ratings for individual videos, and it can help researchers better understand how emotion is processed temporally, as it is in the real world. Moreover, compared to other datasets, a large number of annotators (n = 245) were recruited to avoid idiosyncratic biases. BEAT can also benefit artificial intelligence (AI) models: trained on its unbiased, multi-modality annotations, models can be more robust and fair. Finally, we release a new AI benchmark for multi-task emotion recognition. The BEAT dataset will help increase our understanding of how humans perceive the emotions of others in natural scenes.

Dataset Preview



User interface used for video annotation. a) Participants were first shown the target character and were reminded of the task instructions before the start of each video. b) The overlaid valence and arousal grid / emotional states wheel that was present while observers annotated the videos. c) Observers were instructed to continuously rate the emotion of the target character in the video in real time.

Sample Ratings

Example valence and arousal ratings and categorical emotion state ratings for a single video (video 78). Transparent gray lines indicate individual subject ratings, and the red/blue line is the average rating across participants. For the categorical ratings, we show the proportion of participants choosing each category. The final categorical rating is determined by majority vote.
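The aggregation described above (frame-wise averaging of continuous ratings, majority vote for the categorical label) can be sketched as follows. This is a minimal illustration using synthetic ratings; the variable names and values are assumptions for demonstration, not the dataset's actual file format.

```python
from collections import Counter

# Hypothetical annotations for one video clip: each annotator provides a
# continuous valence trace (one value per frame) plus one categorical label.
valence_traces = [
    [0.2, 0.4, 0.6],   # annotator 1
    [0.0, 0.5, 0.7],   # annotator 2
    [0.1, 0.3, 0.8],   # annotator 3
]
category_votes = ["joy", "joy", "surprise"]

def mean_trace(traces):
    """Average the continuous ratings across annotators, frame by frame."""
    n = len(traces)
    return [sum(frame) / n for frame in zip(*traces)]

def majority_vote(votes):
    """Pick the most frequently chosen categorical label."""
    return Counter(votes).most_common(1)[0][0]

avg_valence = mean_trace(valence_traces)  # the blue/red average line
final_label = majority_vote(category_votes)  # the final categorical rating
```

The same `mean_trace` computation would apply to the arousal traces; averaging over many annotators (n = 245 in BEAT) is what smooths out idiosyncratic biases in the individual gray traces.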

Data Distributions

Distribution of valence and arousal ratings across participants. Individual white dots represent the average valence and arousal of the continuous ratings for each video clip for Hollywood movies. Blue squares and green triangles represent the average valence and arousal for documentaries and home videos, respectively.

Distribution of 11 categorical emotion states across participants.

Download (TBD)

BibTeX (TBD)