Should the broadly understood justice system (courts, police, penitentiary system, and the related government agencies) be banned by law from collecting Big Data and using Artificial Intelligence? Back in 2013, I was part of a data analytics project for State Police. On April Fool’s Day we received a hilarious hoax – an obviously fake internal announcement that police analytics could now predict crimes before they actually … Continue reading Should justice use AI?
In Lecture 7 of our Big Data in 30 hours class, we discussed Apache Spark and did some hands-on programming. The purpose of this memo is to summarize the terms and ideas presented. Apache Spark is currently one of the most popular platforms for parallel execution of computing jobs in a distributed environment. The idea is not new. Starting in the late 1980s, the HPC (high … Continue reading Lecture notes: an intro to Apache Spark programming
We are in the middle of this semester’s Big Data in 30 Hours class. We have just finished Lecture 7 of 15. So far we have covered Relational Databases, Data Warehousing, BI (Tableau), NoSQL (MongoDB and Elasticsearch), Hadoop, HDFS and Apache Spark. While we are about to move to Data Science in Python (with NumPy, Scikit-learn, Keras and TensorFlow), I received valuable feedback from the students. … Continue reading Top 10 Data Infrastructure technologies for a Data Scientist
The purpose of this post is to recap the most important points from the recent Big Data in 30 hours Lecture 5. What is a graph? Vertices – Vertices denote discrete objects, such as a person, a place, or an event. Edges – Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, and have recently been at a … Continue reading Graph Databases: Cosmos DB Graph API – Key Concepts and Best Practices
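To make the vertex and edge vocabulary above concrete, here is a minimal in-memory sketch in plain Python. It is an illustration only, not the Cosmos DB Graph API: the `Vertex`, `Edge` and `Graph` classes, the sample ids and the labels `knows` and `attended` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """A discrete object: a person, a place, or an event."""
    id: str
    label: str


@dataclass(frozen=True)
class Edge:
    """A directed, labeled relationship between two vertices."""
    source: str
    target: str
    label: str


class Graph:
    """Hypothetical toy graph; a real graph database adds indexing and persistence."""

    def __init__(self):
        self.vertices = {}  # vertex id -> Vertex
        self.edges = []     # list of Edge

    def add_vertex(self, vertex_id, label):
        self.vertices[vertex_id] = Vertex(vertex_id, label)

    def add_edge(self, source, target, label):
        # Refuse edges that point at vertices we do not know about.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added as vertices first")
        self.edges.append(Edge(source, target, label))

    def out_neighbors(self, vertex_id, label=None):
        """Ids reachable over one outgoing edge, optionally filtered by edge label."""
        return [
            e.target
            for e in self.edges
            if e.source == vertex_id and (label is None or e.label == label)
        ]


g = Graph()
g.add_vertex("alice", "person")
g.add_vertex("bob", "person")
g.add_vertex("conference", "event")
g.add_edge("alice", "bob", "knows")
g.add_edge("alice", "conference", "attended")

print(g.out_neighbors("alice"))           # all outgoing relationships
print(g.out_neighbors("alice", "knows"))  # only the "knows" relationships
```

Asking for a vertex's outgoing neighbors, optionally filtered by edge label, is the kind of one-hop traversal that a graph query language expresses declaratively; the list comprehension here only stands in for that idea.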
In Lecture 6 of the Big Data in 30 hours class we cover HDFS. The purpose of this memo is to provide participants with a quick reference to the material covered.
In Lecture 6 of our Big Data in 30 hours class, we talk about Hadoop. The purpose of this memo is to summarize the terms and ideas presented. About Hadoop: Hadoop, by the Apache Software Foundation, is software used to run other software in parallel. It is a distributed batch processing system that comes together with a distributed filesystem. It scales well over commodity hardware and … Continue reading Lecture notes: first steps in Hadoop
In Lectures 5 and 6 of our Big Data in 30 Hours hands-on class, we will be experimenting with Data Warehousing and ETL. Our classes are hands-on. Prior to the class you need to install Oracle Database and the client utility Oracle SQL Developer. Here are brief instructions on how to do it on your laptop. I wrote this summary because I found the Oracle documents too detailed. Continue reading “Installing Oracle Database on Windows 10”
I have been commissioned to prepare a complete Big Data class to fit in 30 hours of teaching. The goal is to introduce practical Data Engineering and Data Science to technical personnel (corporate or academic). The class is very technical and hands-on. Most subjects are introduced by examples that students are expected to play with. The class is designed for a technical audience, reasonably fluent in … Continue reading Data Engineering & Data Science in 30 hours
Summary: the basic local environment to learn Data Engineering and Data Science without overspending: Laptop (Thinkpad X220 or similar class), 16GB RAM and 120GB SSD, Windows 10, OS language set to English (United States), WSL (Windows Subsystem for Linux) enabled, Python 3, venv, Notepad++, vi. Total budget: hardware $250, software free. Continue reading “Big Data in 30 hours and…. $250”
Hello all, not much here yet. This first post serves just to initialize this blog and verify how it looks under the new layout. The work presented here is not a new project, but continues some ideas previously published on www.altanova.pl and the earlier Big Data Matters blog (no longer available). Real content to appear soon; meanwhile to keep you entertained, below contemplate … Continue reading To start this blog