Data Engineering on AWS (DEAWS) – Outline

Detailed Course Outline

Day 1

Module 1: Data Engineering Roles and Key Concepts

  • Role of a Data Engineer
  • Key functions of a Data Engineer
  • Data Personas
  • Data Discovery
  • AWS Data Services

Module 2: AWS Data Engineering Tools and Services

  • Orchestration and Automation
  • Data Engineering Security
  • Monitoring
  • Continuous Integration and Continuous Delivery
  • Infrastructure as Code
  • AWS Serverless Application Model
  • Networking Considerations
  • Cost Optimization Tools

Module 3: Designing and Implementing Data Lakes

  • Data lake introduction
  • Data lake storage
  • Ingest data into a data lake
  • Catalog data
  • Transform data
  • Serve data for consumption

Hands-on lab: Setting up a Data Lake on AWS

Module 4: Optimizing and Securing a Data Lake Solution

  • Open Table Formats
  • Security using AWS Lake Formation
  • Setting permissions with Lake Formation
  • Security and governance
  • Troubleshooting

Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints

Day 2

Module 5: Data Warehouse Architecture and Design Principles

  • Introduction to data warehouses
  • Amazon Redshift Overview
  • Ingesting data into Redshift
  • Processing data
  • Serving data for consumption

Hands-on lab: Setting up a Data Warehouse using Amazon Redshift Serverless

Module 6: Performance Optimization Techniques for Data Warehouses

  • Monitoring and optimization options
  • Data optimization in Amazon Redshift
  • Query optimization in Amazon Redshift
  • Orchestration options

Module 7: Security and Access Control for Data Warehouses

  • Authentication and access control in Amazon Redshift
  • Data security in Amazon Redshift
  • Auditing and compliance in Amazon Redshift

Hands-on lab: Managing Access Control in Redshift

Module 8: Designing Batch Data Pipelines

  • Introduction to batch data pipelines
  • Designing a batch data pipeline
  • AWS services for batch data processing

Module 9: Implementing Strategies for Batch Data Pipeline

  • Elements of a batch data pipeline
  • Processing and transforming data
  • Integrating and cataloging your data
  • Serving data for consumption

Hands-on lab: A Day in the Life of a Data Engineer

Day 3

Module 10: Optimizing, Orchestrating, and Securing Batch Data Pipelines

  • Optimizing the batch data pipeline
  • Orchestrating the batch data pipeline
  • Securing the batch data pipeline

Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions

Module 11: Streaming Data Architecture Patterns

  • Introduction to streaming data pipelines
  • Ingesting data from stream sources
  • Streaming data ingestion services
  • Storing streaming data
  • Processing streaming data
  • Analyzing streaming data with AWS services

Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink

Module 12: Optimizing and Securing Streaming Solutions

  • Optimizing a streaming data solution
  • Securing a streaming data pipeline
  • Compliance considerations

Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka