Detailed Course Outline
Day 1
Module 1: Data Engineering Roles and Key Concepts
- Role of a Data Engineer
- Key functions of a Data Engineer
- Data Personas
- Data Discovery
- AWS Data Services
Module 2: AWS Data Engineering Tools and Services
- Orchestration and Automation
- Data Engineering Security
- Monitoring
- Continuous Integration and Continuous Delivery
- Infrastructure as Code
- AWS Serverless Application Model
- Networking Considerations
- Cost Optimization Tools
Module 3: Designing and Implementing Data Lakes
- Data lake introduction
- Data lake storage
- Ingest data into a data lake
- Catalog data
- Transform data
- Serve data for consumption
Hands-on lab: Setting up a Data Lake on AWS
Module 4: Optimizing and Securing a Data Lake Solution
- Open Table Formats
- Security using AWS Lake Formation
- Setting permissions with Lake Formation
- Security and governance
- Troubleshooting
Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints
Day 2
Module 5: Data Warehouse Architecture and Design Principles
- Introduction to data warehouses
- Amazon Redshift Overview
- Ingesting data into Amazon Redshift
- Processing data
- Serving data for consumption
Hands-on lab: Setting up a Data Warehouse using Amazon Redshift Serverless
Module 6: Performance Optimization Techniques for Data Warehouses
- Monitoring and optimization options
- Data optimization in Amazon Redshift
- Query optimization in Amazon Redshift
- Orchestration options
Module 7: Security and Access Control for Data Warehouses
- Authentication and access control in Amazon Redshift
- Data security in Amazon Redshift
- Auditing and compliance in Amazon Redshift
Hands-on lab: Managing Access Control in Redshift
Module 8: Designing Batch Data Pipelines
- Introduction to batch data pipelines
- Designing a batch data pipeline
- AWS services for batch data processing
Module 9: Implementing Strategies for Batch Data Pipelines
- Elements of a batch data pipeline
- Processing and transforming data
- Integrating and cataloging your data
- Serving data for consumption
Hands-on lab: A Day in the Life of a Data Engineer
Day 3
Module 10: Optimizing, Orchestrating, and Securing Batch Data Pipelines
- Optimizing the batch data pipeline
- Orchestrating the batch data pipeline
- Securing the batch data pipeline
Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions
Module 11: Streaming Data Architecture Patterns
- Introduction to streaming data pipelines
- Ingesting data from stream sources
- Streaming data ingestion services
- Storing streaming data
- Processing streaming data
- Analyzing streaming data with AWS services
Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink
Module 12: Optimizing and Securing Streaming Solutions
- Optimizing a streaming data solution
- Securing a streaming data pipeline
- Compliance considerations
Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka