Building End-to-End Production Machine Learning pipelines

Much of the public discourse in Data Science focuses on model optimization: selecting regressors or classifiers, tuning hyperparameters, training models and improving prediction accuracy. Far less material is available on deploying and using these trained Machine Learning models in production. I was asked to summarize my experience in this domain in a series of workshops, one of which I deliver next week at the TopHPC …

Audit logging for Amazon Redshift

I recently spent some time configuring and analysing the logs for an Amazon Redshift warehouse, and I am summarizing the experience here so others can get there faster. Amazon Redshift is an analytical data warehouse platform on the AWS cloud with a rapidly growing user base. It is optimized to work with the Amazon S3 storage service. Redshift is focused on performance (columnar architecture, Massively Parallel Processing (MPP) using …