M2P2: Multimodal Persuasion Prediction using Adaptive Fusion

M2P2 is a multimodal sequence learning framework that predicts persuasion in debate videos. Given a speaking clip (including audio, video and text modalities), M2P2 learns both shared and heterogeneous embeddings to predict persuasion.

The QPS debate video dataset is released for future persuasion research.

QPS Dataset (Download)

We release the QPS dataset, collected from the popular Chinese debate TV show Qipashuo. The dataset contains multimodal (video, audio, and text) speaking segments of debaters. Each segment is annotated with the pre-vote and post-vote counts from an audience of 100. It is the first multimodal persuasion dataset that records persuasion intensity.

Dataset statistics:
Duration (minutes): 582
Number of debates: 62
Number of speakers: 48
Number of segments: 2,297

Figure: histogram of normalized vote changes, i.e. (post-vote - pre-vote)/100.

The details of the dataset format can be found on GitHub.
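
The normalized vote change defined above is simple arithmetic, and a minimal sketch may make it concrete. The function name `normalized_vote_change`, the `audience_size` parameter, and the sample vote counts are all hypothetical; they are not taken from the dataset or its loader.

```python
from typing import List, Tuple


def normalized_vote_change(pre_vote: int, post_vote: int, audience_size: int = 100) -> float:
    """Return (post_vote - pre_vote) / audience_size.

    A positive value means the speaker's side gained votes during the segment.
    """
    if audience_size <= 0:
        raise ValueError("audience_size must be positive")
    return (post_vote - pre_vote) / audience_size


# Hypothetical (pre-vote, post-vote) pairs, NOT real QPS records.
segments: List[Tuple[int, int]] = [(40, 46), (55, 52), (50, 50)]

changes = [normalized_vote_change(pre, post) for pre, post in segments]
for (pre, post), change in zip(segments, changes):
    print(f"pre={pre} post={post} normalized change={change:+.2f}")
```

With an audience of 100, the value always falls between -1 and 1, which is the range the histogram covers.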

IQ2US Dataset (Link)

This dataset is collected from the Intelligence Squared Debates (IQ2) TV series. A description of it can be found at Convokit.


Controversial topics (e.g. foreign policy, immigration, national debt, privacy issues) engender much debate amongst academics, businesses, and politicians. Identifying persuasive speakers in an adversarial environment is a critical task. In debate videos, multiple modalities (audio, video and text) carry persuasive cues. These modalities (1) are often semantically aligned, but (2) may each provide different information for prediction.

To leverage the alignment of different modalities while maintaining the diversity of the cues they provide, M2P2 devises a novel adaptive fusion learning framework which fuses embeddings obtained from two modules – an alignment module that extracts shared information between modalities and a heterogeneity module that learns the weights of different modalities with guidance from three separately trained unimodal reference models.

The example above shows the real-time prediction of debate persuasiveness (number of votes) using M2P2. The debate is from the Chinese debate TV show Qipashuo.
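
The idea of fusing a shared embedding with a weighted combination of modality embeddings can be illustrated with a small sketch. This is not the M2P2 implementation. It assumes that each modality already has a fixed-size embedding, uses an element-wise maximum as a stand-in for the alignment module, and takes the modality weight scores as given inputs rather than learning them with guidance from reference models. The names `fuse`, `softmax`, and `weight_scores`, and all numbers, are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(embeddings: Dict[str, List[float]], weight_scores: Dict[str, float]) -> List[float]:
    """Concatenate a shared embedding with a weighted sum of modality embeddings."""
    names = sorted(embeddings)
    if len({len(embeddings[name]) for name in names}) != 1:
        raise ValueError("all modality embeddings must have the same dimension")
    vectors = [embeddings[name] for name in names]

    # Stand-in for the alignment module (assumption): element-wise max
    # across modalities as the "shared" embedding.
    shared = [max(column) for column in zip(*vectors)]

    # Stand-in for the heterogeneity module (assumption): the scores are
    # supplied here; in a trained model they would be learned.
    weights = softmax([weight_scores[name] for name in names])
    weighted = [
        sum(w * value for w, value in zip(weights, column))
        for column in zip(*vectors)
    ]

    return shared + weighted  # list concatenation, length 2 * dim


# Hypothetical 2-dimensional embeddings and equal weight scores.
embeddings = {"audio": [0.2, 0.4], "video": [0.6, 0.0], "text": [0.1, 0.8]}
weight_scores = {"audio": 0.0, "video": 0.0, "text": 0.0}

fused = fuse(embeddings, weight_scores)
print(fused)
```

Equal scores give every modality the same weight, so the second half of the fused vector is the plain average of the three embeddings. Raising one modality's score shifts that half toward its embedding, which is the effect adaptive weighting is meant to have.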


A reference implementation of M2P2 in Python is available on GitHub.


The following people contributed to M2P2:
Chongyang Bai
Haipeng Chen
Srijan Kumar
Jure Leskovec
V.S. Subrahmanian


M2P2: Multimodal Persuasion Prediction using Adaptive Fusion. C. Bai, H. Chen, S. Kumar, J. Leskovec, V.S. Subrahmanian. arXiv, 2020.

The following BibTeX citation can be used:

Author = {Chongyang Bai and Haipeng Chen and Srijan Kumar and Jure Leskovec and V. S. Subrahmanian},
Title = {M2P2: Multimodal Persuasion Prediction using Adaptive Fusion},
Year = {2020},
Eprint = {arXiv:2006.11405},