Let’s take a close look at three related terms (Deep Learning vs Machine Learning vs Pattern Recognition), and see how they relate to some of the hottest tech-themes in 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time.
Fig 1. Putting a human inside a computer is not Artificial Intelligence
If you look around, you’ll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data-scientist? You learn how they think.
The three highly-related “learning” buzz words
“Pattern recognition,” “machine learning,” and “deep learning” represent three different schools of thought. Pattern recognition is the oldest (and as a term is quite outdated). Machine Learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And Deep Learning is the new, the big, the bleeding-edge — we’re not even close to thinking about the post-deep-learning era. Just take a look at the following Google Trends graph. You’ll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast.
1. Pattern Recognition: The birth of smart programs
Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something “smart” like recognize the character “3”. And it really took a lot of cleverness and intuition to build such a program. Just think of “3” vs “B” and “3” vs “8”. Back in the day, it didn’t really matter how you did it as long as there was no human-in-a-box pretending to be a machine (see Figure 1). So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community. Optical Character Recognition grew out of this community, and it is fair to call “Pattern Recognition” the “Smart” Signal Processing of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc. all came out of this era. Pattern Recognition became something CS folks did, and not EE folks. One of the most popular books from that time period is the invaluable Duda & Hart “Pattern Classification” book, which is still a great starting point for young researchers. But don’t get too caught up in the vocabulary; it’s a bit dated.
The character “3” partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom “smart” programs used to be all the rage.
Quiz: The most popular Computer Vision conference is called CVPR and the PR stands for Pattern Recognition. Can you guess the year of the first CVPR conference?
2. Machine Learning: Smart programs can learn from examples
Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers). So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish. This is the spirit of machine learning. “Machine Learning” emphasizes that the computer program (or machine) must do some work after it is given data. The Learning step is made explicit. And believe me, waiting 1 day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.
“What is Machine Learning” from
. The most important part of this diagram is the “Gears,” which suggest that crunching/working/computing is an important step in the ML pipeline.
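To make the “collect examples, choose an algorithm, wait for the computations” recipe concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two-dimensional “features” are synthetic stand-ins for whatever you would extract from real face and non-face images, and the nearest-centroid classifier is just one of the simplest algorithms you could choose. The names make_examples, fit_centroids, and predict are hypothetical.

```python
import random

def make_examples(center, n, rng):
    """Draw n noisy 2-D feature vectors around a class center (synthetic stand-in for image features)."""
    return [[c + rng.gauss(0, 0.5) for c in center] for _ in range(n)]

def fit_centroids(examples, labels):
    """The explicit 'learning' step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(examples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

# Assumed toy data: 50 "face" and 50 "non-face" feature vectors.
rng = random.Random(0)
faces = make_examples([2.0, 2.0], 50, rng)
non_faces = make_examples([-2.0, -2.0], 50, rng)
X = faces + non_faces
y = ["face"] * 50 + ["non-face"] * 50

centroids = fit_centroids(X, y)          # the machine does some work after it is given data
print(predict(centroids, [1.8, 2.2]))    # a point near the "face" cluster
print(predict(centroids, [-1.5, -2.5]))  # a point near the "non-face" cluster
```

Notice that no rule about pixels was written by hand. The only “expert knowledge” is the labeled data, and swapping in a different algorithm would leave the rest of the pipeline untouched.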
As Machine Learning grew into a major research topic in the mid 2000s, computer scientists began applying these ideas to a wide array of problems. No longer was it only character recognition, cat vs. dog recognition, and other “recognize a pattern inside an array of pixels” problems. Researchers started applying Machine Learning to Robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, as well as to predict financial markets. Machine Learning was married with Graph Theory under the brand “Graphical Models,” every robotics expert had no choice but to become a Machine Learning Expert, and Machine Learning quickly became one of the most desired and versatile computing skills. However, “Machine Learning” says nothing about the underlying algorithm. We’ve seen convex optimization, kernel-based methods, Support Vector Machines, as well as Boosting have their winning days. Together with some custom manually engineered features, we had lots of recipes, lots of different schools of thought, and it wasn’t entirely clear how a newcomer should select features and algorithms. But that was all about to change…
Further reading: To learn more about the kinds of features that were used in Computer Vision research see my blog post: From feature descriptors to deep learning: 20 years of computer vision.
3. Deep Learning: one architecture to rule them all
Fast forward to today and what we’re seeing is a large interest in something called Deep Learning. The most popular kinds of Deep Learning models, as they are used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets.
Deep Learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters. But with deep-learning comes great responsibility. Because you are starting with a model of the world which has a high dimensionality, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially computer vision applications), and the architectures are far from shallow.
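If you have never seen a convolution written out, the following sketch may help. It is a deliberately naive, pure-Python version of the sliding-window operation at the heart of a ConvNet layer, assuming no padding and a stride of 1. The function name conv2d_valid, the toy image, and the hand-picked vertical-edge kernel are all made up for illustration. In a real ConvNet the kernel values are exactly the “missing parameters” that get filled in from data.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the response map.

    Like most ConvNet libraries, this computes cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# Toy 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
response = conv2d_valid(image, vertical_edge)
print(response)  # [[3, 3], [3, 3]]
```

A deep architecture stacks many such filters, layer after layer, with nonlinearities in between, which is part of why the data and GPU requirements grow so quickly.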
If you’re starting out with Deep Learning, simply brush up on some elementary Linear Algebra and start coding. I highly recommend Andrej Karpathy’s Hacker’s guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution based problem is a good place to start.
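As a rough idea of what such a first exercise could look like, here is a sketch of CPU-only backpropagation on XOR, a classic non-convolutional toy problem. The problem choice, the network size (one hidden layer of 4 sigmoid units), the squared-error loss, the learning rate, and all function names are my assumptions for this example, not a recommended recipe.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_in, n_hidden, rng):
    """Small random weights for a one-hidden-layer network with a single output unit."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Return the hidden activations and the scalar output for input x."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    out = sigmoid(sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"])
    return h, out

def loss(p, data):
    """Mean of 0.5 * (prediction - target)^2 over the dataset."""
    return sum(0.5 * (forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def gradients(p, data):
    """Backpropagation: apply the chain rule from the output back to every weight."""
    n_hidden, n_in = len(p["W1"]), len(p["W1"][0])
    g = {"W1": [[0.0] * n_in for _ in range(n_hidden)],
         "b1": [0.0] * n_hidden, "W2": [0.0] * n_hidden, "b2": 0.0}
    for x, t in data:
        h, out = forward(p, x)
        # Derivative of the loss w.r.t. the output unit's pre-activation.
        d_out = (out - t) * out * (1.0 - out) / len(data)
        g["b2"] += d_out
        for j in range(n_hidden):
            g["W2"][j] += d_out * h[j]
            # Push the error back through hidden unit j.
            d_hid = d_out * p["W2"][j] * h[j] * (1.0 - h[j])
            g["b1"][j] += d_hid
            for i in range(n_in):
                g["W1"][j][i] += d_hid * x[i]
    return g

def gd_step(p, g, lr):
    """Plain full-batch gradient-descent update, in place."""
    p["b2"] -= lr * g["b2"]
    for j in range(len(p["W2"])):
        p["W2"][j] -= lr * g["W2"][j]
        p["b1"][j] -= lr * g["b1"][j]
        for i in range(len(p["W1"][j])):
            p["W1"][j][i] -= lr * g["W1"][j][i]

xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
params = init_params(n_in=2, n_hidden=4, rng=random.Random(0))
initial_loss = loss(params, xor)
for _ in range(5000):
    gd_step(params, gradients(params, xor), lr=0.5)
final_loss = loss(params, xor)
print(initial_loss, final_loss)  # the loss should go down as training proceeds
```

A good habit when you write your own backpropagation is to compare the analytic gradient of a single weight against a finite-difference estimate. If the two disagree, the bug is in your chain rule and not in your data.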
There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience. There are lots of reasons why Deep Learning is gaining popularity, but Deep Learning is not going to take over the world. As long as you continue brushing up on your machine learning skills, your job is safe. But don’t be afraid to chop these networks in half, slice ‘n dice at will, and build software architectures that work in tandem with your learning algorithm. The Linux Kernel of tomorrow might run on
(one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity.
Other related buzz-words
Big-data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information. For business, this big-data approach can give you actionable insights. In the context of learning algorithms, we’ve only started seeing the marriage of big-data and machine learning within the past few years. Cloud-computing, GPUs, DevOps, and PaaS providers have brought large-scale computing within reach of the researcher and ambitious “everyday” developer.
Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs in the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say.
Further reading: My 2011 Blog post Computer Vision is Artificial Intelligence.
Machine Learning is here to stay. Don’t think about it as Pattern Recognition vs Machine Learning vs Deep Learning, just realize that each term emphasizes something a little bit different. But the search continues. Go ahead and explore. Break something. We will continue building smarter software and our algorithms will continue to learn, but we’ve only begun to explore the kinds of architectures that can truly rule-them-all.
If you’re interested in real-time vision applications of deep learning, namely those suitable for robotic and home automation applications, then you should check out what we’ve been building at
. Hopefully in a few days, I’ll be able to say a little bit more. 🙂
Until next time.