In Lecture 7 of the Big Data in 30 hours lecture series, we introduce Apache Spark. The purpose of this memo is to serve the students as a reference for some of the concepts learned. About Spark: Spark, managed by
With the help of this article, you made your first steps with Git, the version control software. You learned to commit your code so that it became version-controlled. You need just two more skills: to work with remote repositories, and to check
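As a minimal sketch of the remote-repository workflow the excerpt above mentions (the repository paths under /tmp are illustrative placeholders, not from the original article), the basic commands look like this:

```shell
# Create a bare repository to stand in for a remote server (placeholder path)
git init --bare /tmp/demo-remote.git

# Create a working repository and make a first commit
git init /tmp/demo-work
cd /tmp/demo-work
git config user.email "you@example.com"   # placeholder identity
git config user.name "You"
echo "hello" > README.md
git add README.md
git commit -m "first commit"

# Register the remote and push the current branch to it
git remote add origin /tmp/demo-remote.git
git push -u origin HEAD
```

Pushing with `-u` records the remote branch as the upstream, so later synchronization is just `git pull` and `git push`.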
This article is part of the Big Data in 30 hours course material, meant as a reference for the students. In our class we have looked at a number of Data Engineering and Data Science technologies. You may be wondering how they
This article is part of the Big Data in 30 Hours lecture series and is intended to serve as reference material for students. However, I hope others can also benefit. Why do we need version control in Data Science? Working with data
In Lecture 7 of our Big Data in 30 hours class, we discussed Apache Spark and did some hands-on programming. The purpose of this memo is to summarize the terms and ideas presented. Apache Spark is currently one of the most
We are in the middle of this semester’s Big Data in 30 Hours class. We have just finished lecture 7 out of 15. So far we have covered Relational Databases, Data Warehousing, BI (Tableau), NoSQL (MongoDB and ElasticSearch), Hadoop, HDFS and Apache
The purpose of this post is to recap the most important points from the recent Big Data in 30 hours Lecture 5. What is a graph? Vertices – discrete objects, such as a person, a place, or an event.
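To illustrate the idea of vertices sketched in the excerpt above, here is a minimal graph as a Python adjacency list (the vertex names and the helper function are illustrative assumptions, not from the lecture):

```python
# A tiny graph: each key is a vertex (a discrete object such as a person,
# a place, or an event); each value lists the vertices it is connected to.
graph = {
    "Alice": ["Warsaw", "Conference"],   # a person linked to a place and an event
    "Warsaw": ["Conference"],            # a place linked to an event
    "Conference": [],                    # an event with no outgoing edges
}

def neighbors(graph, vertex):
    """Return the vertices directly connected to the given vertex."""
    return graph.get(vertex, [])

print(neighbors(graph, "Alice"))  # -> ['Warsaw', 'Conference']
```

An adjacency list like this is the simplest way to experiment with graph concepts before moving to a dedicated graph database or library.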
In Lecture 6 of the Big Data in 30 hours class we cover HDFS. The purpose of this memo is to provide participants with a quick reference to the material covered.
In Lecture 6 of our Big Data in 30 hours class, we talk about Hadoop. The purpose of this memo is to summarize the terms and ideas presented. About Hadoop: Hadoop, by the Apache Software Foundation, is software used to run
In Lectures 5 and 6 of our Big Data in 30 Hours hands-on class, we will be experimenting with Data Warehousing and ETL. Prior to the class you need to install Oracle Database and the client utility