Should justice use AI?

Should the justice system, broadly understood (courts, police, the penitentiary system, and related government agencies), be banned by law from collecting Big Data and using Artificial Intelligence? Back in 2013, I was part of a data analytics project for State Police. On April Fools’ Day we received a hilarious hoax – an obviously fake internal announcement that police analytics could now predict crimes before they actually …

Lecture notes: an intro to Apache Spark programming

In Lecture 7 of our Big Data in 30 Hours class, we discussed Apache Spark and did some hands-on programming. The purpose of this memo is to summarize the terms and ideas presented. Apache Spark is currently one of the most popular platforms for parallel execution of computing jobs in a distributed environment. The idea is not new. Starting in the late 1980s, the HPC (high …

Top 10 Data Infrastructure technologies for a Data Scientist

We are in the middle of this semester’s Big Data in 30 Hours class. We have just finished Lecture 7 of 15. So far we have covered Relational Databases, Data Warehousing, BI (Tableau), NoSQL (MongoDB and ElasticSearch), Hadoop, HDFS and Apache Spark. While we are about to move on to Data Science in Python (with NumPy, Scikit-learn, Keras and TensorFlow), I received valuable feedback from the students. …