Key Takeaways from the Microsoft Hackathon
Is machine learning a buzzword? Will it change the way we work? With these questions in mind, we attended Microsoft's Machine Learning Open Hack in Sydney from 4th September to 6th September. The event aimed to sharpen machine learning skills through a series of structured challenges that solve real-world developer problems in the space. Throughout the event, we had the opportunity to use industry-standard machine learning tools and platforms to tackle challenges covering a wide range of topics, such as data wrangling, cognitive services, Azure Machine Learning, and neural networks on industry-standard frameworks such as TensorFlow.
Computer vision is a field of machine learning that aims to give computers a visual understanding of the world. Its goal is to emulate human vision using digital images, through image acquisition, image processing, and image analysis and understanding.
There are two common problems in computer vision that people try to solve: image classification and object detection. Image classification is the task of assigning an input image one label from a fixed set of categories; it is one of the core problems in computer vision and has a wide variety of practical applications. Object detection is the task of identifying which objects are present in an image and where they are located.
The core concepts of computer vision are already being integrated into products that we use. So what can we do with computer vision? One well-known application is the self-driving car, which uses AI to drive a vehicle with minimal human assistance. Cameras are used to perform tasks such as lane finding, obstacle detection and traffic sign detection. Those tasks are essentially object detection and classification.
In health care, computer vision is used for the detection of Alzheimer’s disease by analysing the hippocampus region from an MRI scan. In retail, computer vision technologies enable shoppers to purchase items without the need for a checkout. A person can enter the store, shop for products, collect them and walk out, all while the system automatically detects which and when products are taken and keeps track of them in a virtual cart.
In fact, tech giants such as Amazon, Google, Microsoft and IBM are all investing in machine learning and AI. According to a Forbes market report:
- The spending on AI and machine learning will grow from $12B in 2017 to $57.6B by 2021
- The global machine learning market is expected to grow from $1.41B in 2017 to $8.81B by 2022
- Worldwide revenues from cognitive and artificial intelligence systems will increase from $12.5B in 2017 to more than $46B by 2020
There is huge demand in the market for machine learning.
A neural network is a computational model that works in a similar way to the neurons in the human brain: each neuron takes an input, performs some operations, then passes the output to the following neuron. Convolutional neural networks are a category of neural networks that have proven very effective in areas such as image recognition and classification. They normally include a ReLU (Rectified Linear Unit) activation, a simple function that introduces non-linearity into the features: all negative values are simply set to zero.
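As a concrete illustration, ReLU can be written in a single line of NumPy (the array values below are made up for the example):

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: negative values become zero, positive values pass through.
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])
print(relu(feature_map))  # [[0.  1.5]
                          #  [0.  0. ]]
```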
During this three day event, we completed six challenges. Each challenge was designed to tackle a particular area of the machine learning process.
The first challenge was to create a classification model to predict whether an image shows a hard-shell jacket or an insulated jacket, using a portion of the jacket data within the gear catalogue images dataset. We treated this as a binary classification problem. The tool we used to build this custom image classifier was the Microsoft Custom Vision Service (Ref). It takes advantage of transfer learning, in which an existing machine learning model is reused to predict similar classes (Ref); this requires far less data than training from scratch.
Once the model was trained, we used the prediction endpoint of the Custom Vision Service, from Python in a Jupyter Notebook, to predict the class of an image not used in training. With less than 10 minutes of training, our model was able to identify any hard-shell jacket image with 100% accuracy. This surprised us, because if we had been asked to write a program by hand to do the same task, it would have taken a very long time.
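A call to the prediction endpoint looks roughly like the sketch below. The endpoint URL and prediction key are placeholders you get from your own Custom Vision project, and `build_headers`/`classify_image` are helper names we introduce here for illustration:

```python
import json
import urllib.request

def build_headers(prediction_key):
    # Custom Vision expects the key in a Prediction-Key header and the raw
    # image bytes as the request body.
    return {
        "Prediction-Key": prediction_key,
        "Content-Type": "application/octet-stream",
    }

def classify_image(image_path, endpoint_url, prediction_key):
    """Send an image to a Custom Vision prediction endpoint and return the
    parsed JSON response (tag/probability pairs)."""
    with open(image_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(endpoint_url, data=body,
                                     headers=build_headers(prediction_key))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```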
The second challenge was about data manipulation. There's a saying that "your model is only as good as your data". We had to transform all the classes of gear images into a particular format. We then applied various image-processing techniques, such as feature scaling to "normalise" the images, and handled outliers and missing values. The transformed images were saved to disk for further processing.
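Feature scaling for images often just means mapping raw pixel intensities into a common range. A minimal sketch — the [0, 255] → [0, 1] rescaling is our assumption of what "normalise" meant here:

```python
import numpy as np

def normalise_images(images):
    # Rescale 8-bit pixel values from [0, 255] to floats in [0, 1] so every
    # feature sits on the same scale before training.
    return images.astype(np.float32) / 255.0

batch = np.array([[[0, 128, 255]]], dtype=np.uint8)  # one tiny "image"
print(normalise_images(batch))  # values now lie in [0, 1]
```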
In this challenge, we started our journey into custom machine learning. The first task was to pick an algorithm from one of the most popular and well-established Python ML packages, Scikit-Learn. After discussing it as a team, we decided to use a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any individual model. It is also one of the most popular approaches, because of its simplicity and the fact that it can be used for both classification and regression tasks. After a couple of minutes of training, our model achieved an accuracy of over 85%.
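The write-up doesn't name the exact algorithm, but a random forest fits the description well: an ensemble that combines many decision trees, simple to use, and applicable to both classification and regression. A minimal sketch on synthetic stand-in data (the real challenge used the gear images):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for flattened image features: 200 samples, 64 features,
# with only the first feature carrying the (toy) class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; each tree votes and the forest
# averages the votes into a final prediction.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```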
We started exploring deep learning with a convolutional neural network (CNN) in this challenge. What differentiates deep learning from the more general neural networks is the hidden layers in its architecture which help to better “learn” features in complex data. Deep learning solutions require less pre-processing and feature engineering. Our CNN consists of the following types of layers:
- Input Layer
- Convolutional (2D)
- Max Pooling
- Convolutional (2D)
- Max Pooling
- Fully Connected (Output layer)
We managed to achieve a model with an accuracy of over 92%. The details are beyond the scope of this article; check the References for more information.
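A sketch of that layer stack in Keras — the filter counts, kernel sizes, input shape and two-class output are our assumptions, since the article doesn't specify them:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),              # Input layer
    layers.Conv2D(32, (3, 3), activation="relu"),   # Convolutional (2D)
    layers.MaxPooling2D((2, 2)),                    # Max Pooling
    layers.Conv2D(64, (3, 3), activation="relu"),   # Convolutional (2D)
    layers.MaxPooling2D((2, 2)),                    # Max Pooling
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),          # Fully connected output layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```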
Building a world-class machine learning model is not enough; you also need to know how to expose it so that others can consume it. So in this challenge, we learnt how to deploy our model as a REST API in the cloud. We first saved the model as a file (Ref), then built a Docker container and deployed it to the Azure cloud. For a detailed deployment walkthrough, check the Reference.
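Saving the model to a file can be as simple as a pickle round-trip; the web service inside the Docker container then loads the file at startup and serves predictions. A sketch with a trivial stand-in model:

```python
import pickle
from sklearn.linear_model import LogisticRegression

# A trivial stand-in for the trained model from the earlier challenges.
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# Serialise the fitted model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back, as the REST API process would at startup.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored model makes the same predictions as the original.
print(restored.predict([[0.9]]) == model.predict([[0.9]]))
```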
In the previous challenges, we had gone through the full machine learning process, from pre-processing data, creating a model, training it and testing it, through to final deployment. The last challenge was about a new problem for us in computer vision – object detection. This was much harder because we had to identify multiple objects within an image and locate each one. To tackle the problem, we used the Visual Object Tagging Tool (Ref) and the TensorFlow Object Detection API (Ref). The resulting model was able to detect each entity present in an image and draw a bounding box around it.
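Rendering the detections comes down to drawing labelled boxes on the image. A generic sketch with Pillow — the `(label, left, top, right, bottom)` tuple format here is a simplification we chose for illustration, not the actual structure the TensorFlow Object Detection API returns:

```python
from PIL import Image, ImageDraw

def draw_detections(image, detections):
    """Draw a labelled bounding box for each detection, given as a
    (label, left, top, right, bottom) tuple in pixel coordinates."""
    draw = ImageDraw.Draw(image)
    for label, left, top, right, bottom in detections:
        draw.rectangle([left, top, right, bottom], outline=(255, 0, 0))
        draw.text((left, top), label, fill=(255, 0, 0))
    return image

# A blank white test image with one hypothetical detection drawn on it.
img = Image.new("RGB", (100, 100), (255, 255, 255))
draw_detections(img, [("jacket", 10, 10, 60, 60)])
```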
The following are the primary tools and libraries we used during our challenges:
- Python – An interpreted high-level programming language for general-purpose programming.
- Jupyter Notebook – An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
- Azure ML – Microsoft’s cloud-based predictive analytics service.
- Microsoft Custom Vision Service – an Azure Cognitive Service that lets you build custom image classifiers.
- Matplotlib – A Python 2D plotting library which produces publication quality figures in a variety of hard copy formats and interactive environments across platforms.
- NumPy – The fundamental package for scientific computing with Python.
- Scikit-Learn – A free software machine learning library for the Python programming language.
- TensorFlow – An open-source software library for high-performance numerical computation.
There really is no way to learn faster than by attending an open hack event. Throughout the three days, we further developed our skills in computer vision and machine learning with lots of hands-on experience, and gained a deeper understanding of the pros and cons of various tools and libraries. It was a great experience and well worth it. Python surprised us: it's a very powerful language that lets you achieve a lot with minimal code. The CNN approach to object detection was complicated, and it was challenging to get it working. As machine learning continues to transform the way we interact with the world, we will see more and more adoption of AI technology in the future.