Year 3

PySpark - Module 0.0: Course Overview

Learn PySpark for data science and data engineering (in Databricks)

PySpark - Module 0.1: Getting Started on Databricks

Create a free Databricks account and confirm your setup before the first session

PySpark - Module 1: Big Data and Apache Spark

Learn PySpark: an introduction to Big Data and Apache Spark

PySpark - Module 2: Spark Architecture and RDDs

Learn PySpark: Spark architecture and RDDs

PySpark - Module 3: DataFrames and Spark SQL

Learn PySpark: DataFrames and Spark SQL (in Databricks)

PySpark - Module 4: Data Sources and Sinks

Learn PySpark: data sources and sinks

PySpark - Module 5.0: Spark Streaming

Learn PySpark: Spark Streaming

PySpark - Module 5.1: Deploying Kafka

Deploying Apache Kafka on a Single Node

PySpark - Module 5.2: Kafka Producers and Consumers

Python-Based Producers and Consumers

PySpark - Module 5.3: Kafka and Structured Streaming

Apache Spark Structured Streaming

PySpark - Module 6.0: MLlib Fundamentals

MLlib fundamentals: the Pipeline API and feature engineering on Databricks

PySpark - Module 6.1: Supervised Learning at Scale

Supervised learning at scale: trees, ensembles, evaluation, and tuning with CrossValidator

PySpark - Module 6.2: Unsupervised Learning and Recommendation

Unsupervised learning and recommendation: K-Means clustering and ALS collaborative filtering

PySpark - Module 6.3: ML Pipelines in Production

ML pipelines in production: saving models, batch inference, MLflow tracking, and drift

PySpark - Module 6.4: H2O Sparkling Water

Scalable Machine Learning with H2O and Sparkling Water

PySpark - Module 6.5: Distributed ML with H2O

Introduction to Distributed Machine Learning

PySpark - Module 6.6: Running H2O Sparkling Water

Running H2O Sparkling Water (requires classic Databricks compute)

PySpark - Module 7: Capstone Project

The capstone: apply the full stack to a real problem, on one of two tracks

PySpark - Exercise: Reviewing AI-Generated Code

A hands-on exercise: find and fix the performance traps in an AI-generated Spark pipeline

PySpark - Tuning Essentials

Tuning essentials: reading the plan, partitioning, and caching, on serverless and classic compute

BRD for Successful Project Execution

Business Requirement Document for Successful Project Execution

Unlocking Entrepreneurial Success

Unlocking Entrepreneurial Success - Insights from Netflix's Co-Founder Marc Randolph

Shaping Your Success and Impact

Discover how practicing civility can enhance your leadership and boost performance.

Applying Conway's Law - A Blueprint for Successful Projects

Explore the impact of Conway's Law on student project success by aligning team structure and communication with project outcomes.