OnData.blog

Big Data in 30 hours

Lecture notes: Introduction to Apache Spark

In Lecture 7 of the Big Data in 30 hours lecture series, we introduce Apache Spark. The purpose of this memo is to serve students as a reference for some of the concepts learned. About Spark: Spark, managed by

Pawel Plaszczak | March 14, 2019 | Articles | No Comments

Git version control: part 2

With the help of this article, you made your first steps with Git, the version control software. You learned to commit your software so that it became version-controlled. You need just two more skills: to work with remote repositories, and to check

Pawel Plaszczak | January 8, 2019 | January 10, 2019 | Articles | No Comments

Data Engineering + Data Science: building the full stack

This article is part of the Big Data in 30 hours course material, meant as a reference for students. In our class we have looked at a number of Data Engineering and Data Science technologies. You may be wondering how they

Pawel Plaszczak | January 5, 2019 | January 8, 2019 | Articles | No Comments

Git version control: concise introduction

This article is part of the Big Data in 30 Hours lecture series and is intended to serve as reference material for students. However, I hope others can also benefit. Why do we need version control in Data Science? Working with data

Pawel Plaszczak | January 1, 2019 | January 8, 2019 | Articles | 1 Comment

Lecture notes: an intro to Apache Spark programming

In Lecture 7 of our Big Data in 30 hours class, we discussed Apache Spark and did some hands-on programming. The purpose of this memo is to summarize the terms and ideas presented. Apache Spark is currently one of the most

Pawel Plaszczak | December 13, 2018 | Articles | No Comments

Top 10 Data Infrastructure technologies for a Data Scientist

We are in the middle of this semester’s Big Data in 30 Hours class. We just did Lecture 7 out of 15. So far we have covered Relational Databases, Data Warehousing, BI (Tableau), NoSQL (MongoDB and ElasticSearch), Hadoop, HDFS and Apache

Pawel Plaszczak | December 6, 2018 | Articles | No Comments

Graph Databases: Cosmos DB Graph API – Key Concepts and Best Practices

The purpose of this post is to recap the most important points from the recent Big Data in 30 hours Lecture 5. What is a graph? Vertices – Vertices denote discrete objects, such as a person, a place, or an event.

Michał Wierzbiński | November 23, 2018 | November 27, 2018 | Articles | No Comments

Lecture Notes: Hadoop HDFS orientation

In Lecture 6 of the Big Data in 30 hours class we cover HDFS. The purpose of this memo is to provide participants with a quick reference to the material covered.

Pawel Plaszczak | November 21, 2018 | November 27, 2018 | Articles | No Comments

Lecture notes: first steps in Hadoop

In Lecture 6 of our Big Data in 30 hours class, we talk about Hadoop. The purpose of this memo is to summarize the terms and ideas presented. About Hadoop: Hadoop, by the Apache Software Foundation, is software used to run

Pawel Plaszczak | November 21, 2018 | November 27, 2018 | Articles | 2 Comments

Installing Oracle Database on Windows 10

In Lectures 5 and 6 of our Big Data in 30 Hours hands-on class, we will be experimenting with Data Warehousing and ETL. Our classes are hands-on. Prior to the class you need to install Oracle Database and the client utility

Pawel Plaszczak | October 19, 2018 | November 27, 2018 | Articles | No Comments

Copyright © 2026 OnData.blog. All rights reserved.