NV-BAAMM

Online Training

Duration: 8 hours
Price (excl. VAT): £420

Classroom Training

Duration: 8 hours
Price: on request

Building AI Agents with Multimodal Models (BAAMM) – Outline

Detailed Course Outline

1. Early and Late Fusion (1 hr)

Use camera and LiDAR data to predict object positions.
Convert various data types into neural-network-ready formats.
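
As a rough illustration of the two fusion strategies named in this module, the sketch below contrasts them on toy data. It is not course material: the feature values, weights, and the helper names `linear`, `early_fusion`, and `late_fusion` are all hypothetical, and a single linear layer stands in for a real network.

```python
# Toy sketch of early vs. late fusion; all values and names are hypothetical.

def linear(weights, bias, features):
    """Stand-in 'model': one linear layer that outputs a single number."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def early_fusion(camera, lidar, weights, bias):
    # Early fusion: concatenate the modality features, then run ONE model.
    fused = camera + lidar
    return linear(weights, bias, fused)

def late_fusion(camera, lidar, cam_model, lidar_model):
    # Late fusion: run one model PER modality, then average the predictions.
    cam_pred = linear(*cam_model, camera)
    lidar_pred = linear(*lidar_model, lidar)
    return (cam_pred + lidar_pred) / 2.0

camera_features = [0.5, 1.0]  # made-up camera features
lidar_features = [2.0]        # made-up LiDAR depth feature

early = early_fusion(camera_features, lidar_features, [1.0, 2.0, 0.5], 0.0)
late = late_fusion(camera_features, lidar_features,
                   ([1.0, 2.0], 0.0), ([1.0], 1.0))
print(early, late)
```

In early fusion the model can learn interactions between camera and LiDAR features directly; in late fusion each modality is modeled independently and only the outputs are combined.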

2. Intermediate Fusion (1 hr)

Explore the theory behind effective multimodal model architecture.
Train a contrastive pretraining model.
Create a vector database.
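
To make the last two items concrete, here is a minimal sketch of a contrastive objective and a nearest-neighbor lookup over stored embeddings. It assumes a simplified, one-directional CLIP-style loss (image to text) and an in-memory list in place of a real vector database; `contrastive_loss` and `TinyVectorDB` are hypothetical names, and the embeddings are made up.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Simplified one-directional contrastive loss: pair i matches pair i."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(images):
        logits = [dot(img, txt) / temperature for txt in texts]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        total += log_denominator - logits[i]  # cross-entropy with target i
    return total / len(images)

class TinyVectorDB:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []

    def add(self, key, vector):
        self.items.append((key, normalize(vector)))

    def query(self, vector, k=1):
        # Rank stored vectors by cosine similarity to the query.
        q = normalize(vector)
        scored = sorted(((dot(q, v), key) for key, v in self.items),
                        reverse=True)
        return [key for _, key in scored[:k]]

images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])

db = TinyVectorDB()
db.add("cat", [1.0, 0.0])
db.add("dog", [0.0, 1.0])
nearest = db.query([0.9, 0.1])
```

The loss is low when matching image and text embeddings point the same way and high when they are mismatched, which is what makes the resulting embeddings useful for similarity search.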

3. Cross-modal Projection (2 hr)

Convert a language model into a vision language model (VLM).
Process PDFs with Optical Character Recognition (OCR) tools.
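
One common way to read "cross-modal projection" is a small learned layer that maps a vision encoder's output into the language model's token-embedding space. The sketch below assumes that reading; the dimensions, weights, and the `project` helper are hypothetical, and in practice the projection weights would be learned rather than fixed by hand.

```python
def project(vector, weight_rows, bias):
    """Linear projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight_rows, bias)]

# Made-up image embedding from a vision encoder (dimension 3).
image_embedding = [1.0, 2.0, 3.0]

# Hypothetical projection from dimension 3 to the language model's dimension 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [0.0, 0.5]

# Made-up text token embeddings already in the language model's space.
text_token_embeddings = [[0.1, 0.2], [0.3, 0.4]]

# The projected image becomes an extra "token" placed before the text tokens.
visual_token = project(image_embedding, W, b)
llm_input = [visual_token] + text_token_embeddings
print(llm_input)
```

Because only the projection has to bridge the two spaces, the vision encoder and the language model can in principle be reused as they are.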

4. Model Orchestration (2 hr)

Analyze video using Cosmos Nemotron.
Use Video Search and Summarization (VSS) to answer user queries about video content.
Orchestrate with NVIDIA AI Blueprints.

5. Assessment (1 hr)

Use projection to adapt a pre-trained model to accept a different data type as input.
