This is a story of a software engineer’s head-first dive into the “deep” end of machine learning. The CTO of Amplifr shares notes taken on his still ongoing journey from Ruby developer to deep learning enthusiast and provides tips on how to start from scratch and make the most of a life-changing experience.
Let’s start with an imaginary portrait and see if you recognize yourself or someone you know.
You are a software engineer who works with code every day, building complex things, turning business requirements into application logic and shipping mostly on time. You have tried your hand at different programming languages and have chosen your primary weapon. You are confident enough in what you do and you are ready to learn something entirely new, something to give you new powers and to keep you relevant in the profession.
You keep hearing about the “new electricity” that is artificial intelligence. Whole industries are being transformed by advances in machine learning while the most optimistic researchers compare the rise of AI to the industrial revolution.
That revolution started in your lifetime, and you want to take part. You feel the itch you don’t know how to scratch. You want to learn more about AI/ML/DL and experiment with it. You have no idea where to start.
Hi, my name is Alexey, and two years ago I was that person. Today, I oversee machine learning at Amplifr, a social media management startup I lead as CTO. I am still very much involved with “traditional” code: we are a Ruby on Rails application, after all. When not working on our main codebase, I participate in machine learning contests (and recently took a prize in a major NLP competition), attend AI-related conferences, read scientific papers, experiment daily and explore how I can apply machine learning to make Amplifr stand out from the competition (more on that in our future posts).
First, let’s set the record straight. The “proper” use of the terms Artificial Intelligence, Machine Learning, and Deep Learning is the subject of a never-ending debate that quickly gets rather high-brow and confusing for beginners. To keep things simple, let’s regard ML as a set of tools that come from AI, and DL as a particular subset of them. I will use all three terms somewhat interchangeably, but mostly in the context of deep learning.
As I still remember very well how it feels starting out fresh, I want to offer some tips to those who only begin to explore the new field.
1. Don’t fret
Let’s assume that you are not that much into math. Personally, I graduated from a technical university eight years ago and hadn’t read a math textbook since (at least before I started with DL). You know how it goes: you read your language/framework documentation more often than anything else.
After some initial googling and talking to more math-inclined people around you, you get an impression of how much knowledge you need to amass before attempting to solve real-world problems with neural nets. At least, that was the impression I got two years ago:
- Get a good grasp of linear algebra. Textbooks. Matrix operations. Pen and paper, rows of numbers in tall square brackets.
- Cozy up with probability theory. More textbooks. The Reverend Thomas Bayes is your friend again. P(A|B) = P(B|A) P(A) / P(B), and such.
- Study all classic ML concepts, starting with linear regression.
- Learn how to implement all those algorithms in Python, C, C++ or Java.
- Learn how to cook datasets, extract features, fine-tune parameters and develop an intuition on which particular algorithm suits the task at hand.
- Get familiar with DL frameworks/libraries (in my time, it was Theano and Torch, now it’s probably PyTorch, TensorFlow, and Keras).
And only after mastering all that, according to some experts, you’d be fit enough to solve some practical problem like telling cats from dogs.
If you are somewhat like me, the list above is enough to make your ego shrink, and that leads to nothing but sweet, sweet procrastination.
Don’t fret, though! While technically everything in the list holds true, those are not entry-level requirements. If you know how to program, you already know how to train a model.
2. Remember it’s still code
Take a look at this code:
    # a bunch of Python imports like:
    from fastai.imports import *

    data = ImageClassifierData.from_paths("./dogscats", tfms=tfms_from_model(resnet34, 224))
    learn = ConvLearner.pretrained(resnet34, data, precompute=True)
    learn.fit(0.01, 3)
    # Wait 17 seconds or so...
What can we tell from it?
- It’s Python.
- It uses the fastai deep learning library.
- It’s three lines long (not counting imports).
- resnet34 seems to be important here. Quick googling explains why.
The code above tweaks a pre-trained image classification model (trained on ImageNet, a dataset of roughly 15 million images) so that it can solve the already mentioned Dogs vs. Cats task. It reaches 98% accuracy in just three epochs (passes over the data). Training takes 17 seconds on a computer equipped with a GPU. Those results blow early attempts to solve the same problem out of the water.
Sure, behind those three lines of code lie years of research, dozens of academic papers and thousands of man-hours of trial and error. But these are the lines you can use now. And once you get the gist of it, classifying images for your own use case (and using it in production) is not that different from separating cats and dogs.
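To make “tweaking a pre-trained model” a bit less magical, here is a minimal NumPy sketch of the general idea. It is a toy under stated assumptions, not what fastai does internally: the “pre-trained” body is just a fixed random projection, the data is synthetic, and names such as `frozen_weights` and `extract_features` are made up for illustration. The frozen body is never updated, and only a small new “head” is trained for three epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained network body (an assumption for this toy):
# fixed weights that are never updated during training.
frozen_weights = rng.normal(size=(32, 16))
frozen_snapshot = frozen_weights.copy()


def extract_features(inputs):
    """Run inputs through the frozen body (one ReLU layer), scale rows to unit length."""
    activations = np.maximum(inputs @ frozen_weights, 0.0)
    norms = np.linalg.norm(activations, axis=1, keepdims=True)
    return activations / (norms + 1e-8)


def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))


def log_loss(probabilities, labels):
    """Mean binary cross-entropy."""
    return -np.mean(labels * np.log(probabilities)
                    + (1 - labels) * np.log(1 - probabilities))


# Synthetic two-class data standing in for "cats" (0) and "dogs" (1).
inputs = rng.normal(size=(200, 32))
labels = (inputs[:, 0] > 0).astype(float)

# The body never changes, so its outputs can be computed once up front.
features = extract_features(inputs)

# The new "head": a single logistic unit, the only part that gets trained.
head_weights = np.zeros(16)
head_bias = 0.0
learning_rate = 1.0

losses = []
for epoch in range(3):  # here one epoch is one full pass over the data
    probabilities = sigmoid(features @ head_weights + head_bias)
    losses.append(log_loss(probabilities, labels))
    error = probabilities - labels
    head_weights -= learning_rate * features.T @ error / len(labels)
    head_bias -= learning_rate * error.mean()

# Loss after the last update, for comparison with the starting point.
losses.append(log_loss(sigmoid(features @ head_weights + head_bias), labels))
print([round(float(loss), 4) for loss in losses])
```

The point is the division of labor: all the expensive knowledge lives in the frozen part, and the part you actually train is tiny, which is why a few passes over the data can be enough.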
3. Find a partner in crime
In my off-work time, I played double bass and guitar, got into oil painting, took up surfing.
Procrastination, though, is not a stranger to me either: in my less productive periods I binge-watch TV series and waste hours on MMORPGs and fantasy books. I’m a ~~nerd~~ human, after all.
When I was ready to start with machine learning, and deep learning in particular, my friend and I were on a Heroes of the Storm binge.
To trick myself into taking the first step on a long road to new knowledge, I had to make a pact with my friend, who also dreamed of AI. We decided to stop procrastinating together, enroll in the same course and oversee each other’s achievements. Now we regularly participate in contests together.
4. Avoid cognitive overload
It is common knowledge that learning something that is too hard is a frustrating experience. As humans, we are wired to avoid frustration. At the same time, learning something that is too easy is not satisfying either: you quickly lose any motivation. The trick is to bite precisely as much as you can chew.
The first online course that I took was Udacity’s Deep Learning Nanodegree: a pricey ($999 now, about $400 at the time I took it) program that promises a four-month-long primer on the theory plus the practice required to apply what you learned in the real world. As a bonus, completing the course unlocks a discounted enrollment into Self-Driving Cars Nanodegree.
My mistake was that I went too deep, without covering my bases first. When feeling a bit behind on some concept introduced in the course, I panicked and started reading everything I could find online: articles, books, other courses.
As a result, I could not stay focused on the material that was supposed to give me a foundation to build upon. In hindsight, I’d highly recommend sticking to one course and not trying to learn things in parallel. People, after all, are notoriously bad at multi-tasking.
If I were starting now, I would first look at Jeremy Howard’s fast.ai, which I already mentioned, and Andrew Ng’s latest Coursera offering (there is a certificate cost attached to it, but you can take it for free). The latter consists of five courses: from the introduction to neural networks and deep learning, through convolutional neural networks (which are essential to working with image data), to sequence models (speech recognition, music synthesis, any time series data).
Andrew Ng’s program is more focused on theory, while fast.ai puts the emphasis on “quick and dirty” implementations, which, I believe, is the best way to get started. Just remember to pace yourself, avoid multi-tasking and take small steps.
5. Set your sights
Instead of trying to learn it all at once, try to pick areas where using deep learning techniques provides results that are more satisfying to you, personally. Working with something you can relate to (as opposed to dealing with random abstract data points) will keep you motivated. You need a feedback loop, a way to get tangible results from your experiments.
Here are some ideas for starter projects:
- If you are passionate about visual arts (cinema, photography, video, fine arts), dive into Computer Vision. Neural networks are used to classify objects in images, highlight areas of interest (anomalies on MRI scans or pedestrians on the road), detect emotions or age in portraits, transfer artistic styles, even generate original artworks.
- If you are more into sound, you can compose music with neural networks, classify genres and recommend new tracks as Spotify does. Or, you can explore voice style transfer and pretend to speak in another person’s voice.
- If you are into video games, you definitely should take a look at Reinforcement learning. You can train game AI to be better than yourself in your favorite game. Also, you get to play, and no one gets to blame you for that, because, you know, research.
- If you are passionate about user experience and customer support, look into Natural Language Processing and chatbots. This way you can automate (to a certain extent) interactions with your customers: deduce intent from a message, categorize support tickets, offer immediate answers to the most common questions.
After trying our hand at computer vision, my friend and I shifted our attention to Automatic Speech Recognition (ASR) and natural language processing. Computer vision, with industry giants (Google, Apple) backing self-driving car projects, is now probably the most funded area of research and also the one where deep learning techniques have cemented their positions the most: in image classification, the accuracy of neural network predictions grew from below 75% in 2010 to 98% and up in 2018.
Language-related challenges (especially the ones that have to do with written language), on the other hand, have only recently started to benefit from neural networks. The hottest field right now is Machine Translation (MT). Everyone has probably noticed that the quality of Google Translate has improved drastically in the past few years. Deep learning has played a major role in that since 2015.
The main reason for this late bloom was hardware limitations: machine translation tasks require enormous amounts of memory and processing power to train large neural nets.
To get an idea of just how fast DL can transform an area of research that hasn’t seen significant advances in decades, here’s a fun fact:
Neural networks made their first appearance in a machine translation competition only three years ago, in 2015. In 2016, 90% of contenders in such competitions were neural network-based.
There is a tremendous amount of knowledge to be extracted from academic papers on the subject and applied to real-world tasks, especially if your startup has anything to do with text (and Amplifr does).
If this convinces you to try your hand at applying deep learning to NLP, take a look at Stanford’s CS224n course: “Natural Language Processing with Deep Learning”. You don’t have to be a Stanford student to follow the course: all lecture videos are available on YouTube. If you progress best in a group setting, there is a whole subreddit dedicated to the course where you can find online study partners.
6. Be competitive
The field of machine learning is inherently competitive. In 2010, Kaggle, now the world’s largest community of data scientists and machine learners, brought the spirit of a hackathon to what had been mostly an academic field. Since then, the competitive way to solve ML tasks has become standard practice. Companies and institutions, from Microsoft to CERN, offer prizes for solving challenges in exchange for a royalty-free license to use the technique behind the winning entry.
An ML competition is the best way to assess your skill, get the feeling of the “baseline” in a certain field, draw inspiration from more advanced competitors, find colleagues to collaborate with and just get your name out there.
Consider participating in a contest as a rite of passage for an amateur machine learner. For my friend and me, this passage happened in 2017, a year into our self-study. We chose the Understanding the Amazon from Space competition at Kaggle, as this was our chance to play with multi-class image classification (and we also care about the environment). For over two months, we spent every weekend solving the task: detecting deforestation in satellite imagery and differentiating its causes.
Another sign of rapid progress in everything DL-related: a year ago we spent a lot of time and effort coming up with a way to trick Google Cloud Platform into running our experiments for less money. Today, Google offers a free GPU-backed Jupyter notebook environment, and there is a plethora of services that will be happy to train your models for you.
We did not take a prize and only got into the top 15% on the leaderboard (which is nothing to brag about). We made every beginner’s mistake in the book, but the experience still proved invaluable: we gained the confidence to continue our efforts and chose the next competition, this time in the field of NLP.
Russian is a morphologically rich language that is considered under-resourced in terms of NLP research; some details are in this paper.
The challenge to develop a question answering system was hosted by a major Russian bank, and the contenders got to work with a unique dataset created in the spirit of the renowned SQuAD (Stanford’s reading comprehension dataset consisting of 150 000 questions created by volunteers based on a set of Wikipedia articles), but for the Russian language.
The task was to train a system to answer questions based on a piece of text. Given a question in natural language, the model, submitted in a Docker container (with RAM limited to 8GB), had to highlight the relevant part of a paragraph of text. As often happens with the most challenging competitions, we had to submit not an already trained model but a solution that had to fully train and answer the test questions (the test dataset was only partially public, to ensure fair play) in under 2 hours of machine time.
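To make the input and output of such a task concrete, here is a deliberately naive baseline in plain Python. It is not our competition solution and involves no learning at all: it simply returns the sentence that shares the most words with the question. The function names `tokenize` and `answer_span` and the sample paragraph are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of its word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def answer_span(paragraph, question):
    """Return the sentence of `paragraph` that shares the most words with `question`."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    question_words = tokenize(question)
    return max(sentences, key=lambda s: len(tokenize(s) & question_words))


# Made-up sample text, only to show the shape of the problem.
paragraph = (
    "Kaggle was founded in 2010. "
    "It hosts machine learning competitions. "
    "Many companies offer prizes there."
)
print(answer_span(paragraph, "When was Kaggle founded?"))
```

A real system has to cope with paraphrases, morphology and answers shorter than a sentence, which is exactly where neural models earn their keep, but the contract stays the same: a paragraph and a question go in, a highlighted span comes out.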
Our solution took second place on the public leaderboard, but we were so focused on solving the task that we forgot to read the rules properly: they stated that teaming up was forbidden and only individual entries were accepted. We had to come clean and were offered a “consolation” bronze prize (sort of a Cannes festival thing, when a movie is shown “out of competition”).
We were lucky to avoid disqualification, but we learned our lesson, and now I urge everyone to fight the first impulse to tackle the problem head-on and to take some time to read the rules carefully.
As my interest in deep learning is primarily production-oriented (coming up with solutions that can be applied to the real-world needs of my startup), I also noticed that looking at the leaderboard gives you a nice baseline for how “production close” you are. Top solutions are usually academic and are not yet ready to be deployed commercially, while silver, bronze, and a bit below are often the most promising ones in terms of application.
7. Stay in the loop
The blessing and the curse of deep learning is the pace at which the field is evolving. Even this article (as introductory, personal and non-academic as it is) was probably outdated in some regard before being published.
The best way to stay up to date is to become a part of a large online forum filled with ML enthusiasts. If you are lucky enough to understand Russian, definitely join the Open Data Science community: a public Slack server with over 12 000 users and more than 140 public channels. There is always a way to find smaller and more local groups either through Reddit’s r/MachineLearning or Meetup.com. If you happen to know any international Slack group that matches the scale of ODS (not to be confused with ODSC, which is also an AI resource), please make sure to let us know!
I used to think that summer camps in Spain were for surfing until I visited the International Summer School on Deep Learning in Bilbao. It was a rash decision, but I don’t regret it: it was a perfect fit for my level (after a year in the field). In the absence of practicals, the school was more of a conference, though a really intense one: nine to six, five days in a row. The whole program was split into sections that ran in parallel, with each speaker presenting a set of three hour-and-a-half lectures.
Once you feel more confident, try to get into one of the major conferences on AI, ML, and DL: this year, I was lucky to visit ICLR. Other notable international conferences are CVPR (specifically on Computer Vision) and NIPS. Yes, in the field of AI your life almost entirely consists of acronyms.
8. Use your programming chops
Let’s admit the obvious: Python completely won over the AI and data science community. There is probably no reason today to start with a different language unless you are really good at it or plan to be dealing with some really low-level optimizations.
For me, as a Ruby developer, switching to Python was an easy and overall pleasant experience. It takes just a couple of weeks of practice (and learning array indexing tricks and comprehensions) to feel somewhat comfortable. However, I took some time to complete a free mid-level Python programming course and it definitely did not hurt.
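For a Rubyist, the indexing tricks and comprehensions mentioned above look roughly like this. It is a small illustrative sketch with made-up data; the Ruby equivalents in the comments are approximate.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Slicing uses start:stop:step, where Ruby would use ranges or helper methods.
first_three = numbers[:3]      # roughly numbers.first(3)
last_two = numbers[-2:]        # roughly numbers.last(2)
every_other = numbers[::2]     # every second element
reversed_copy = numbers[::-1]  # roughly numbers.reverse

# A list comprehension filters and maps in one expression,
# roughly numbers.select(&:even?).map { |n| n * n }
even_squares = [n * n for n in numbers if n % 2 == 0]

# A dict comprehension builds a lookup table the same way.
word_lengths = {word: len(word) for word in ["ruby", "python", "numpy"]}

print(first_three, last_two, every_other, reversed_copy)
print(even_squares, word_lengths)
```

The same slicing syntax carries over to NumPy arrays and tensors in DL frameworks, so it is worth getting comfortable with it early.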
For a practicing software engineer, a language “barrier” is not a problem. For enthusiasts coming from non-programming backgrounds, getting into DL is harder. So you already have a head start.
However, don’t expect stellar OOP code or intuitive APIs. The majority of public code examples would not pass a serious code review on my team. This is not about software engineering, after all, it’s about math: matrix multiplication needs to multiply matrices first, and a clean DSL (and Ruby gets you used to good DSLs) is always an afterthought.
The same functionality can have different APIs even inside the same library. It might seem confusing that making an array of ones is done with np.ones((2,3)) (takes a tuple), while creating an array of random numbers of the same shape is done with two separate integer arguments, as in np.random.rand(2,3).
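Here is a short NumPy sketch of that inconsistency. The choice of `np.random.rand` as the function that takes separate integers is my assumption for illustration; `np.random.randn` follows the same convention.

```python
import numpy as np

ones = np.ones((2, 3))        # shape is passed as a single tuple
noise = np.random.rand(2, 3)  # dimensions are passed as separate integers

# Mixing the two conventions up fails: the second positional
# argument of np.ones is the dtype, and 3 is not a valid dtype.
try:
    np.ones(2, 3)
    mixed_up_call_failed = False
except TypeError:
    mixed_up_call_failed = True

# The newer Generator API goes back to taking a tuple.
rng = np.random.default_rng(seed=0)
noise_new = rng.random((2, 3))

print(ones.shape, noise.shape, noise_new.shape, mixed_up_call_failed)
```

None of this is hard, but it is the kind of detail you end up looking up again and again.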
Also, don’t get your hopes up about documentation or style. Once you run into some non-trivial details when translating some academic paper into code, you will have to read libraries’ source code, and it will not be easy. Test coverage is also often lacking.
However! This is a chance to put your best programming practices to good use: feel free to make good reusable libraries out of publicly available Jupyter notebooks.
9. Finally, brush up your math
Of course, I left the best for last. Eventually, you will have to close any mathematical gaps you have. Especially if, after covering your bases, you are willing to stay on the cutting edge and follow academic publications.
Luckily, machine learning has its own “bible” in the form of an 800-page-long ultra-dense textbook “Deep Learning (Adaptive Computation and Machine Learning)” by Ian Goodfellow, Yoshua Bengio and Aaron Courville, known as just the Deep Learning Book. Also luckily, it is available online, for free and in full.
Part I (Linear Algebra, Probability and Information Theory, Numerical Computation, Machine Learning Basics) is the bare introductory minimum that is surprisingly enough to feel much less intimidated when following up on the current research. Yes, that’s 130 pages of a not so leisurely read, but you will not regret reading it.
Thank you for reading!
I hope this article was able to convey my passion for deep learning and made the field seem approachable for people like me, who come from applied programming. I genuinely believe that with recent advances in AI and deep learning the world is approaching yet another light bulb moment, and yes, I mean Edison’s light bulb.
“Curious software developers” like you and me will be the main driving force behind the revolution that has already started. Perhaps we are not exactly on the cutting edge of science (otherwise you would probably not be reading this article), but we have the capacity to implement the best ideas from academia, one application at a time, and that is how we change the world.
So go ahead, browse some resources from the text above and the list below, build up your confidence and get started!
- Deep Learning Book: everything you need to get up to speed with some formal math. Be warned though, the text is pretty fast-paced, but you can easily find more down-to-earth explanations of the described concepts online.
- deeplearning.ai—a Coursera offering from Andrew Ng. Can be taken for free. The only prerequisite, in my opinion, is knowing how to program.
- Practical Deep Learning For Coders by Jeremy Howard. Completely free. A seven-week course for programmers who want to try their hand at deep learning but don’t know where to start.
- Must-read blogs for AI and Deep Learning enthusiasts—start building your reading list!
- Some email subscriptions to stay informed.
- Distill.pub—a platform that presents machine learning research in the best possible, interactive way. Perfect for “visual” learners.
- A great recap of matrix calculus for deep learning. Hosted at explained.ai—a notable (but still rather small) collection of clear explanations on ML-related topics.