Azure Machine Learning, Knime, and Spinning Your Own Hadoop Cluster

As part of learning about Big Data, I took an online course on machine learning and played around with some of the concepts. Big Data and machine learning are two different things that frequently get conflated. Big Data is the field of managing and deriving value from huge amounts of data, at volumes beyond what organizations have ever had to deal with before. Machine learning is a discipline that uses algorithms and statistical methods to find patterns in training data that can then be applied to new data to make predictions. It is frequently used in tandem with Big Data because part of the value of all that data lies in finding ways to learn and predict from it. The two overlap, but machine learning can involve data volumes that are not really “big,” and Big Data encompasses a lot more than just machine learning.
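To make the "learn from known examples, then predict on new data" idea concrete, here is a minimal sketch in plain Python. It is not from the course or from any of the tools mentioned here. The technique (a nearest-centroid classifier), the function names, and the tiny dataset are all my own illustrative assumptions, and the dataset is deliberately small to show that machine learning does not need "big" data.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(rows, labels):
    """Learn one centroid (the mean point) per label from the training data."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }


def predict(centroids, row):
    """Assign a new row to the label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Made-up training data: (hours of study, hours of sleep) -> exam outcome.
train_rows = [(1.0, 4.0), (2.0, 5.0), (8.0, 7.0), (9.0, 8.0)]
train_labels = ["fail", "fail", "pass", "pass"]

# "Training" here is just averaging the known examples for each label.
centroids = fit_centroids(train_rows, train_labels)

# Apply the learned pattern to a row the model has never seen.
prediction = predict(centroids, (7.5, 7.0))
print(prediction)
```

Real toolkits use far more capable algorithms, but the shape is the same: fit on data where the answer is known, then predict on data where it is not.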

Screenshot of Knime, a system that helps you run and manage machine learning routines.

While taking my machine learning course, I was introduced to Knime, an open source GUI toolkit for machine learning (it probably does much more, but that was my take) that I really liked. In fact, while experimenting with machine learning on Amazon’s AWS cloud servers, I kept thinking how nice it would be to have a tool like Knime that could link directly to my datasets in the cloud. It’s entirely possible that Knime supports this, but Azure has similar features built in.

As an aside, I built my own Hadoop cluster in AWS from scratch on generic Linux servers using this handy blog post. I don’t really recommend this except as a learning exercise (or if you know something I don’t), since Amazon offers its own flavor of Hadoop as a turnkey option. Amazon also offers specific machine learning instances, but I have to say I didn’t find them particularly intuitive or useful, at least for my use case.

That brings me to Azure’s machine learning solution. I assumed that AWS would be ahead in this area because of its reputation in cloud computing, but that doesn’t seem to be the case, as I recently discovered at a tech meetup at Ann Arbor’s DUO offices.

Jennifer Marsman, an evangelist for Microsoft, presented a great demo of using Azure’s Machine Learning tools to create a web service that predicts whether a Titanic passenger would survive, a scenario based on Kaggle’s learning exercise. You can see the full presentation in the link below. If you’re playing with machine learning, I highly recommend watching it, or at least checking out the Azure site. That’s all for now. Cheers.
