Big Data in 30 hours: collateral

This page contains materials for my Big Data in 30 hours class.

Thank-you note: I want to express my respect and gratitude to all those whose work is linked below or in the slides: the authors of data, software and insight generously shared online. I took care to use only legally available resources (by following the license terms or obtaining the author's explicit consent) and to give explicit credit where applicable. That said, should any of the intellectual property owners wish to withdraw their consent in the future, please notify me.

The course description and syllabus are here: Data Engineering and Data Science class syllabus

The LinkedIn discussion group (open to all): Big Data in 30 hours

Class participants may start with the environment (laptop) configuration instructions, which include a recommended $250 laptop configuration for students on tight budgets.

The slides and collateral for each lecture are listed below, and more will be added as the course progresses.

Lecture name and materials to download:

Lecture 1: Linux power tools, and the programmer environment.
1. Git version control: concise introduction
2. Git version control: part 2
3. Linux power tools slides / handouts: Big Data in 30 hours Lecture 1 handouts
4. Happiness worldwide: statistics from World Bank
5. The Enron corpus

Lecture 2: Making relations work. (Relational databases, sqlite)
1. Slides / handouts: Big Data in 30 hours Lecture 2 handouts
2. Chinook database
3. FARS data

Lecture 3: Data Warehousing (OLTP vs OLAP, 3NF vs star/snowflake, Oracle)
1. How to prepare the environment: guide to installation of Oracle Database Express Edition and SQL Developer
2. Slides / handouts: Big Data in 30 hours Lecture 3 handouts

Lecture 4: Business Intelligence (BI, OLAP Cubes, data viz, Tableau: drill down, roll-up, aggregate, slice, dice)
1. Slides / handouts: Big Data in 30 hours Lecture 4 (BI) handouts

Lecture 5: Non-relational. Mongo, BigTable, CosmosDB
1. CosmosDB Key Concepts by Michał Wierzbiński

Lecture 6: Distributed filesystems. Hadoop, MapReduce, Hive
1. Lecture notes: First Steps in Hadoop
2. Lecture notes: the basics of HDFS

Lecture 7: Apache Spark
1. Lecture notes: an intro to Apache Spark programming
2. 2018-12-Lecture07-spark-v05

Lecture 8: Kafka
1. Slides