
Data Science at Scale using Spark and Hadoop (DSSH)


Course Overview

Data Science at Scale using Spark and Hadoop is a 3-day instructor-led class in which you will learn how data scientists use data to solve problems and gain an understanding of the tools and techniques they use. Through in-class simulations, participants apply data science methods to real-world challenges in different industries and prepare for data scientist roles in the field.

Who should attend

  • Developers
  • Data analysts
  • Statisticians

Prerequisites


  • Proficiency in a scripting language
    • Python is strongly preferred
    • Perl or Ruby is sufficient
  • Basic knowledge of Apache Hadoop
  • Experience working in Linux environments

Course Objectives

After completing this class, you will know:

  • How to identify potential business use cases where data science can provide impactful results
  • How to obtain, clean and combine disparate data sources to create a coherent picture for analysis
  • What statistical methods to leverage for data exploration that will provide critical insight into your data
  • Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
  • What machine learning technique to use for a particular data science project
  • How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
  • What the pitfalls are when deploying new analytics projects to production at scale

Classroom Training
Modality: G

Duration 3 days

Price (excl. VAT)
  • United Kingdom: £ 1,695.-
Online Training
Modality: U

Duration 3 days

Price (excl. VAT)
  • United Kingdom: £ 1,695.-
