Extract insights from videos



In this code pattern, learn how to extract speaker-diarized notes and meaningful insight reports from any video using IBM® Watson™ Speech to Text, Watson Natural Language Processing, and Watson Tone Analyzer.


In a virtually connected world, staying focused on work or education is very important. Studies suggest that many people lose focus in live virtual meetings or virtual classroom sessions after approximately 20 minutes. Therefore, many meetings and virtual classes are recorded so that participants can watch them later.

It might help if these recordings could be analyzed and a detailed report of the meeting or class generated by using artificial intelligence (AI). This code pattern explains how to do that. Given a video recording of a virtual meeting or virtual classroom, it explains how to:

  • Extract audio from the video file using the FFmpeg open source library
  • Transcribe the audio to get speaker-diarized notes with custom-trained language and acoustic speech to text models
  • Generate a natural language understanding report that consists of the category, concepts, emotion, entities, keywords, sentiment, top positive sentences, and word clouds

The application runs on a Python Flask runtime.
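The audio extraction step can be sketched in Python as a thin wrapper around the FFmpeg command line. This is a minimal illustration, not the pattern's actual code: the function names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the choice of 16 kHz mono WAV output is an assumption made here because it is a common input format for speech recognition.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes a WAV file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", video_path,        # input video file
        "-vn",                   # discard the video stream, keep audio only
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", "16000",          # 16 kHz sample rate (assumption, not from the pattern)
        "-ac", "1",              # single (mono) channel
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run FFmpeg (must be installed and on PATH) and return the WAV file path."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
    return audio_path
```

Calling `extract_audio("meeting.mp4")` would write `meeting.wav` next to the video, provided the `ffmpeg` binary is available on the system.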

After completing the code pattern, you will understand how to:

  • Use the Watson Speech to Text service to convert the human voice into the written word
  • Use advanced natural language processing to analyze text and extract metadata from content such as concepts, entities, keywords, categories, sentiment, and emotion
  • Leverage Watson Tone Analyzer cognitive linguistic analysis to identify a variety of tones at both the sentence and document level
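To make the metadata extraction more concrete, the sketch below builds a JSON request body that asks for the features named above (concepts, entities, keywords, categories, sentiment, and emotion). It follows the request shape of the Watson Natural Language Understanding `analyze` API as I understand it; the helper name `build_nlu_request` and the `limit` default are hypothetical, and the exact options should be checked against the service documentation before use.

```python
import json
from typing import Any, Dict


def build_nlu_request(transcript: str, limit: int = 10) -> Dict[str, Any]:
    """Build an analyze request body covering the features listed in the pattern."""
    return {
        "text": transcript,
        "features": {
            "categories": {},                                   # document categories
            "concepts": {"limit": limit},                       # high-level concepts
            "emotion": {},                                      # document-level emotion
            "entities": {"limit": limit, "sentiment": True},    # people, places, and so on
            "keywords": {"limit": limit, "sentiment": True, "emotion": True},
            "sentiment": {},                                    # document-level sentiment
        },
    }


if __name__ == "__main__":
    body = build_nlu_request("The team agreed to ship the release next week.")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent to the service with any HTTP client or passed, feature by feature, to an SDK; only the payload construction is shown here so that the example runs without credentials.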



  1. The user uploads a recorded video file of the virtual meeting or virtual classroom.
  2. The FFmpeg library extracts audio from the video file.
  3. The Watson Speech to Text service transcribes the audio to give a diarized textual output.
  4. (Optionally) The Watson Language Translator service translates a transcript in another language into English.
  5. Watson Tone Analyzer analyzes the transcript and picks out its top positive statements.
  6. Watson Natural Language Understanding reads the transcript to identify key points and to get the sentiments and emotions.
  7. The key points and summary of the video are presented to the user in the application.
  8. The user can download the textual insights.
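Step 3 of the flow above can be illustrated with a short sketch that turns a Speech to Text response into speaker-diarized notes. It assumes the response carries word timestamps as `[word, start, end]` triples and a `speaker_labels` list with `from`, `to`, and `speaker` fields, which is the shape returned when speaker labels are requested; the function name `diarized_notes` is hypothetical and this is not the pattern's own implementation.

```python
from typing import Any, Dict, List, Tuple


def diarized_notes(response: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Group transcribed words into consecutive (speaker, utterance) pairs."""
    # Map each word's start time to the speaker assigned to it.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }

    turns: List[Tuple[int, List[str]]] = []
    for result in response.get("results", []):
        timestamps = result["alternatives"][0].get("timestamps", [])
        for word, start, _end in timestamps:
            speaker = speaker_at.get(start)
            if speaker is None:
                continue  # no speaker label for this word; skip it
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)        # same speaker keeps talking
            else:
                turns.append((speaker, [word]))  # a new speaker turn begins

    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Each returned pair can then be rendered as a line such as "Speaker 0: ..." in the notes that the user downloads. Matching on the start time works here because both values come from the same parsed JSON document.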


Find the detailed steps in the README file. Those steps explain how to:

  1. Clone the GitHub repository.
  2. Add the credentials to the application.
  3. Deploy the application.
  4. Run the application.

This code pattern is part of the Extracting insights from videos with IBM Watson use case series, which shows how to extract meaningful insights from videos using the Watson Speech to Text, Watson Natural Language Processing, and Watson Tone Analyzer services.

Source: https://developer.ibm.com/patterns/extract-textual-insights-from-a-given-video/

