MULTIDATA Workshop at AELCO 2026

Working with Multimodal Data: Speech, Gesture and Your Own Video Data

Learn how to use our multimodal pipeline for speech and gesture analysis

MULTIDATA (https://www.multi-data.eu) is an ERASMUS PLUS KA220 HED project involving the universities of Murcia, Radboud Nijmegen, and FAU Erlangen-Nürnberg as partners, with the MPI for Psycholinguistics and the Red Hen Lab as associated partners. The project offers an online platform for the study of multimodal communication, including AI-based tools to analyze speech and gesture data from videos, as well as resources for developing audiovisual collections and educational materials.

Join us for a hands-on MULTIDATA workshop at AELCO 2026. This practical session will introduce participants to the MULTIDATA platform and pipeline, with a focus on how to process video data for multimodal communication research. Participants will learn how to use the MULTIDATA tools for analyzing gesture together with co-occurring language and speech (prosody, pauses, vocal articulation).

The workshop will cover the main stages of the MULTIDATA workflow: platform access, aligned transcription and speech analysis, gesture extraction through pose estimation, and practical ways of working with participants’ own data. The session is designed for researchers, teachers and students interested in multimodal communication, cognitive linguistics, gesture studies, audiovisual data, and corpus-based approaches to language.

Workshop Information

Title: MULTIDATA Workshop: Working with Multimodal Data. Speech, Gesture and Your Own Video Data
Date, time, place: Tuesday 22 September, 17:00–20:00 CEST. Faculty of Arts (Facultad de Letras), University of Murcia, La Merced Campus (Calle Santo Cristo 1, Murcia, Spain). Co-located with AELCO 2026
Duration: 3 hours
Format: Hands-on workshop
Teams involved: University of Murcia and FAU Erlangen-Nürnberg
Structure: 1h45 workshop + 15 min coffee break + 1h practical session
Participants: Researchers, teachers, students, and anyone interested in multimodal communication, speech, gesture, audiovisual data, and corpus-based language research.

Workshop Description

This 3-hour hands-on workshop introduces participants to the MULTIDATA platform and multimodal pipeline. The session focuses on practical workflows for processing video data, including platform access, aligned transcription, speech and prosodic metrics, gesture extraction through pose estimation, and the use of participants’ own data.

The workshop begins with a short introduction to the MULTIDATA project and its relevance for multimodal communication research. Participants will then be guided through platform sign-up and basic navigation. The first demo will show how to extract aligned transcriptions and speech metrics such as pitch and intensity from video data. The second demo will focus on gesture analysis, showing how pose estimation can be used to extract body keypoints and prepare gesture features for research.

After a 15-minute break, the final hour will be devoted to participants’ own data. Attendees will be able to discuss possible applications, ask technical and methodological questions, and receive guidance on how MULTIDATA tools can support their research or teaching.

Workshop Objectives

By the end of the workshop, participants will have a general understanding of how to use the MULTIDATA platform to process video data for multimodal communication research.

Participants will learn how to:

Access and navigate the MULTIDATA platform.
Prepare video data for processing.
Extract aligned transcriptions from video content.
Obtain speech and prosodic metrics such as pitch and intensity.
Use pose estimation to extract gesture-related information from video.
Understand how body keypoints can be transformed into structured datasets.
Explore possible applications of MULTIDATA tools with their own data.

MULTIDATA Workshop Program

AELCO 2026. 3 hours

17:00–17:20 – Project Introduction

Speaker: Cristóbal Pagán Cánovas, University of Murcia, MULTIDATA coordinator

Welcome and general introduction to the MULTIDATA project.

This opening session introduces the main goals of MULTIDATA and explains how the platform supports the study of multimodal communication. It will present the general logic of the pipeline and the types of data that can be processed: video, speech, prosody, transcription and gesture.

Topics:

Overview of the MULTIDATA project.
Why multimodal data matter for language research.
What the MULTIDATA platform offers.
Main steps of the pipeline: video, transcription, speech, gesture and analysis.

17:20–17:35 – Platform Sign-Up and Access

Speaker: Raúl Sánchez Sánchez, University of Murcia, MULTIDATA technical coordinator

This short practical session helps participants access the MULTIDATA platform and prepare for the hands-on parts of the workshop.

Topics:

How to access the MULTIDATA platform.
Account creation and login.
Basic platform navigation.
Preparing data for upload and processing.

17:35–18:05 – Demo 1: Extracting Aligned Transcriptions and Speech Metrics

Speakers: Raúl Sánchez Sánchez and Rosa Illán Castillo

This session demonstrates how to use MULTIDATA tools to extract speech-related information from video data.

Participants will learn how to generate time-aligned transcriptions, obtain word-level alignment, export files compatible with ELAN or further analysis, and extract speech metrics such as pitch and intensity synchronized with video frames.

Topics:

Uploading or selecting video data.
Generating aligned transcriptions.
Obtaining word-level alignment.
Exporting transcription files.
Extracting speech metrics such as pitch and intensity.
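
To give a rough idea of what "synchronized with video frames" can mean in practice, the following minimal sketch maps word-level timestamps and pitch samples onto frame indices. Everything in it is an illustrative assumption rather than the MULTIDATA output format: the 25 fps frame rate, the sample data, and the helper names frame_of and word_rows are hypothetical.

```python
from statistics import mean

FPS = 25  # assumed video frame rate (hypothetical)

# Hypothetical word-level alignment: (word, start in seconds, end in seconds)
words = [("hello", 0.00, 0.40), ("world", 0.50, 1.00)]

# Hypothetical pitch track: (time in seconds, f0 in Hz); None marks unvoiced samples
pitch = [
    (0.10, 200.0), (0.20, 210.0), (0.30, 220.0),
    (0.45, None),
    (0.60, 180.0), (0.80, 190.0),
]


def frame_of(time_s, fps=FPS):
    """Return the index of the video frame on screen at time_s."""
    # The small epsilon guards against floating-point results such as 9.999999.
    return int(time_s * fps + 1e-9)


def word_rows(words, pitch, fps=FPS):
    """Build one row per word with its frame span and mean voiced pitch."""
    rows = []
    for word, start, end in words:
        voiced = [f0 for t, f0 in pitch if start <= t < end and f0 is not None]
        rows.append({
            "word": word,
            "start_frame": frame_of(start, fps),
            "end_frame": frame_of(end, fps),
            "mean_f0_hz": mean(voiced) if voiced else None,
        })
    return rows


rows = word_rows(words, pitch)
```

Each row pairs a word with the span of frames it covers, so the same frame indices can later be used to look up gesture data for that stretch of video.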

18:05–18:45 – Demo 2: Extracting Gesture Features from Video Data

Speakers: Members of the FAU team (TBA)

This session introduces the gesture analysis side of the MULTIDATA pipeline, focusing on pose estimation and the extraction of body movement features.

Participants will see how video data can be processed to obtain body keypoints and how these outputs can be prepared for gesture research and multimodal analysis.

Topics:

What pose estimation contributes to gesture research.
How MULTIDATA processes video for gesture analysis.
From body keypoints to structured datasets.
Cleaning and preparing gesture data.
Extracting gesture features for analysis.
Combining gesture information with speech and transcription data.
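
As an informal illustration of the step "from body keypoints to structured datasets", the sketch below flattens per-frame keypoints into table rows, drops low-confidence detections, and derives one simple movement feature. The input layout, the keypoint names, the 0.5 confidence threshold, and the helpers to_rows, displacement and to_csv are all hypothetical; the actual MULTIDATA outputs may be organized differently.

```python
import csv
import io
import math

# Hypothetical pose-estimation output: one dict per frame,
# mapping keypoint name -> (x, y, confidence)
frames = [
    {"right_wrist": (100.0, 200.0, 0.9), "left_wrist": (50.0, 200.0, 0.8)},
    {"right_wrist": (103.0, 196.0, 0.9), "left_wrist": (50.0, 200.0, 0.2)},
]


def to_rows(frames, min_conf=0.5):
    """Flatten frames into one row per (frame, keypoint), dropping weak detections."""
    rows = []
    for i, keypoints in enumerate(frames):
        for name, (x, y, conf) in sorted(keypoints.items()):
            if conf >= min_conf:
                rows.append({"frame": i, "keypoint": name, "x": x, "y": y, "conf": conf})
    return rows


def displacement(rows, keypoint):
    """Distance moved by one keypoint between consecutive retained frames."""
    pts = [(r["frame"], r["x"], r["y"]) for r in rows if r["keypoint"] == keypoint]
    return [
        (f1, math.hypot(x1 - x0, y1 - y0))
        for (f0, x0, y0), (f1, x1, y1) in zip(pts, pts[1:])
        if f1 == f0 + 1  # skip gaps left by dropped detections
    ]


def to_csv(rows):
    """Serialize the rows as CSV text for use in other tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["frame", "keypoint", "x", "y", "conf"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_rows(frames)
right_wrist_motion = displacement(rows, "right_wrist")
```

Because every row carries a frame index, a table like this can be joined with frame-aligned speech and transcription data.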

18:45–19:00 – Coffee Break

19:00–19:55 – Practical Session: Working with Your Own Data

Speakers: Members of the UMU and FAU teams

This final hands-on session is dedicated to practical experimentation and questions from participants.

Participants will be invited to discuss how their own video data could be processed with MULTIDATA, identify suitable types of recordings for the pipeline, ask questions about transcription, prosody, gesture extraction and data preparation, and receive guidance from the MULTIDATA team on possible research or teaching applications.

Participants will be able to:

Discuss their own video data and research interests.
Identify whether their data are suitable for MULTIDATA processing.
Ask practical questions about transcription, speech analysis and gesture extraction.
Explore possible educational or research applications.
Receive guidance on next steps for using the MULTIDATA platform.

19:55–20:00 – Closing Remarks

Speaker: Cristóbal Pagán Cánovas

Brief closing remarks and information about future MULTIDATA resources, documentation and collaboration opportunities.

All participants are then welcome to join the AELCO welcome party, with live music, starting at about 20:00 at a nearby venue. See the conference program for details.

Avda. Teniente Flomesta, 5. 30003. Murcia
hello@multi-data.eu
