Data Science allows us to create models that analyze data faster and more accurately than humans. If you’re a Python programmer, you’re likely to use libraries such as TensorFlow, Keras, Scikit-learn, or Pandas to create those models.

To turn those models into production systems, we need a bit of Data Engineering knowledge. We need to define the underlying data architecture, perhaps built from components such as Apache Spark, Kafka, Hadoop, Oracle Database, MongoDB, or Elasticsearch, to name just a few examples. Amazon Web Services (AWS) might provide the infrastructure. We then need some Software Engineering knowledge to properly build and deploy those models, perhaps using DevOps methodology, Docker container isolation, and Git version control.

Those are some elements of Data Engineering and Software Engineering that are useful, if not essential, to every Data Scientist interested in bringing their models out of the lab and into the production environment.

This post is an invitation to the workshop entitled Data Engineering for Data Scientists, which I recently put together and which will be held at several conferences later this year. During the hands-on workshop we should be able to build a simple end-to-end scenario connecting the pieces named above. If you would like to learn more beforehand, note that the workshop will selectively use the content of my Big Data in 30 hours class, partially available online here. I hope to see you there.

