Big Data in 30 hours: collateral

This page contains selected materials for my Big Data in 30 hours class. Note that only fragmentary material is available online, so it is not suitable as an online resource for self-learners. The page is designed as a reference for students who have already participated in the class.

Thank-you note: I want to express my respect and gratitude to all those whose work is linked below or in the slides: the authors of data, software and insight generously shared online. I took care to use only legally available resources and to give explicit credit. That said, should any of the IP owners wish to withdraw their consent in the future, please notify me.

The course description and syllabus are here: Data Engineering and Data Science class syllabus

The LinkedIn discussion group (open to all): Big Data in 30 hours

Class participants may start with the environment (laptop) configuration instructions, which include a recommended $250 laptop configuration for students on tight budgets.

The slides and collateral for each lecture are listed below, and more will be added systematically.

Lecture name / Materials to download

Lecture 1: Linux power tools, and the programmer environment.
1. Git version control: concise introduction
2. Git version control: part 2
3. Linux power tools slides / handouts: Big Data in 30 hours Lecture 1 handouts
4. Happiness worldwide: statistics from the World Bank
5. The Enron corpus

Lecture 2: Making relations work. (Relational databases, sqlite)
1. Slides / handouts: Big Data in 30 hours Lecture 2 handouts
2. Chinook database
3. FARS data

Lecture 3: Data Warehousing (OLTP vs OLAP, 3NF vs star/snowflake, Oracle)
1. How to prepare the environment: guide to installation of Oracle Database Express Edition and SQL Developer
2. Slides / handouts: Big Data in 30 hours Lecture 3 handouts

Lecture 4: Business Intelligence (BI, OLAP Cubes, data viz, Tableau: drill down, roll-up, aggregate, slice, dice)
1. Slides / handouts: Big Data in 30 hours Lecture 4 (BI) handouts

Lecture 5: Non-relational. Mongo, BigTable, CosmosDB
CosmosDB Key Concepts by Michał Wierzbiński

Lecture 6: Distributed filesystems. Hadoop, MapReduce, Hive
Lecture notes: First Steps in Hadoop
Lecture notes: the basics of HDFS

Lecture 7: Apache Spark
1. Lecture notes: an intro to Apache Spark programming

Lectures 8 to 16
This material is currently not available online. Feel free to contact me with any enquiries.