Building AI Agents with Multimodal Models (BAAMM)

 

Course Overview

Learn how to build neural network agents that reason across multiple data types using advanced fusion techniques, OCR, and NVIDIA AI Blueprints for real-world applications like robotics and healthcare.

Prerequisites

  • A basic understanding of deep learning concepts.
  • Familiarity with a deep learning framework such as TensorFlow, PyTorch, or Keras. This course uses PyTorch.

Course Objectives

In this course, you will learn about:

  • Different data types and how to prepare them as neural network input
  • Model fusion, and the differences between early, late, and intermediate fusion
  • PDF extraction using OCR
  • The difference between modality and agent orchestration
  • Customization of NVIDIA AI Blueprints with Video Search and Summarization (VSS)

Course Content

We'll begin with a robotics use case to show how different data types shape an effective neural network architecture. The mathematical concepts we learn in the robotics use case can then be applied to Large Language Models (LLMs) to modify these powerful models so that they accept non-language input. We'll end with orchestration, where multiple models work together to answer user queries.

Prices & Delivery methods

Online Training

Duration
8 hours

Price
  • £420

Classroom Training

Duration
8 hours

Price
  • on request
 

Schedule

Instructor-led Online Training: Dates marked this way in the schedule are conducted as instructor-led online training. If you have any questions about our online courses, feel free to contact us by phone or email at any time.

English

8 hours difference to British Summer Time (BST)

Online Training. Time zone: Pacific Daylight Time (PDT). Course language: English.