Course Overview
Learn how to build neural network agents that reason across multiple data types using advanced fusion techniques, OCR, and NVIDIA AI Blueprints for real-world applications like robotics and healthcare.
Prerequisites
- A basic understanding of deep learning concepts.
- Familiarity with a Deep Learning framework such as TensorFlow, PyTorch, or Keras. This course uses PyTorch.
Course Objectives
In this course, you will learn about:
- Different data types and how to make them neural network ready
- Model fusion, and the differences between early, late, and intermediate fusion
- PDF extraction using OCR
- The difference between modality orchestration and agent orchestration
- Customization of NVIDIA AI Blueprints with Video Search and Summarization (VSS)
Course Content
We'll begin with a robotics use case to show how different data types shape an effective neural network architecture. The mathematical concepts from that use case then carry over to Large Language Models (LLMs), where we'll modify these powerful models to accept non-language input. We'll end with orchestration, where multiple models work together to answer user queries.
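As a rough preview of the fusion strategies named in the objectives, here is a minimal PyTorch sketch contrasting early, intermediate, and late fusion of two input types. It is not taken from the course material: the two modalities (`rgb` and `depth`), the feature sizes, and the class names are all hypothetical, chosen only to show where the merge happens in each approach.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for two flattened modalities (illustrative assumptions).
RGB_DIM, DEPTH_DIM, HIDDEN, N_OUT = 16, 8, 32, 3


def mlp(in_dim: int) -> nn.Sequential:
    """Small two-layer network mapping in_dim features to N_OUT outputs."""
    return nn.Sequential(nn.Linear(in_dim, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, N_OUT))


class EarlyFusion(nn.Module):
    """Concatenate the raw inputs, then run one shared network."""

    def __init__(self):
        super().__init__()
        self.net = mlp(RGB_DIM + DEPTH_DIM)

    def forward(self, rgb, depth):
        return self.net(torch.cat([rgb, depth], dim=1))


class IntermediateFusion(nn.Module):
    """Encode each modality separately, concatenate hidden features, then predict."""

    def __init__(self):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Linear(RGB_DIM, HIDDEN), nn.ReLU())
        self.depth_enc = nn.Sequential(nn.Linear(DEPTH_DIM, HIDDEN), nn.ReLU())
        self.head = nn.Linear(2 * HIDDEN, N_OUT)

    def forward(self, rgb, depth):
        features = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
        return self.head(features)


class LateFusion(nn.Module):
    """Run an independent model per modality and average their outputs."""

    def __init__(self):
        super().__init__()
        self.rgb_net = mlp(RGB_DIM)
        self.depth_net = mlp(DEPTH_DIM)

    def forward(self, rgb, depth):
        return (self.rgb_net(rgb) + self.depth_net(depth)) / 2


# Random stand-in data: a batch of 4 samples per modality.
rgb = torch.randn(4, RGB_DIM)
depth = torch.randn(4, DEPTH_DIM)

for model in (EarlyFusion(), IntermediateFusion(), LateFusion()):
    print(type(model).__name__, tuple(model(rgb, depth).shape))
```

All three variants take the same inputs and produce outputs of the same shape. They differ only in whether the modalities are merged before any processing, partway through, or at the prediction stage.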