When should linear regression be called “machine learning”?

There’s no law that says that a cabinet maker can’t use a barrel maker’s saw.

Machine learning and statistics are vague labels, but even when they are pinned down, the two overlap a great deal. That holds for the methods of the two areas and, separately, for the people who label themselves with them. As far as the math goes, though, machine learning is entirely within the field of statistics.

Linear regression is a very well-defined mathematical procedure. I tend to associate it with the area of statistics, with people who call themselves ‘statisticians’, and with those who come out of academic programs with labels like ‘statistics’. SVM (Support Vector Machines) is likewise a very well-defined mathematical procedure that has some very similar inputs and outputs and solves similar problems. But I tend to associate it with the area of machine learning and with people who call themselves computer scientists or who work in artificial intelligence or machine learning, fields that tend to be considered part of computer science as a discipline.

But some statisticians might use SVM and some AI people use logistic regression. Just to be clear, it is more likely that a statistician or AI researcher would develop a method than actually put it to practical use.

I put all the methods of machine learning squarely inside the domain of statistics, even such recent things as Deep Learning, RNNs, CNNs, LSTMs, and CRFs. An applied statistician (biostatistician, agronomist) may well not be familiar with them. Those are all predictive modeling methods usually labeled ‘machine learning’ and rarely associated with statistics. But they are predictive models, and like any predictive model they can be judged using statistical methods.

In the end, logistic regression must be considered part of machine learning.

But, yes, I see and often share your distaste for the misapplication of these words. Linear regression is such a fundamental part of things called statistics that it feels very strange and misleading to call its use ‘machine learning’.

To illustrate, logistic regression is mathematically identical to a Deep Learning network with no hidden nodes and the logistic function as the activation function for the single output node. I wouldn’t call logistic regression a machine learning method, but it is certainly used in machine learning contexts.
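A minimal sketch of that equivalence, in plain Python with no libraries. The function names and the toy data are made up for illustration, and a single input feature is assumed to keep it short. The "network" is trained by gradient descent on cross-entropy loss, and the weights it ends up with are read back as a logistic regression intercept and slope.

```python
import math

def sigmoid(z):
    """Logistic function, used here as the output activation."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression_prob(intercept, slope, x):
    """Statistics view: P(y=1 | x) from the log-odds intercept + slope * x."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

def single_node_network(bias, weight, x):
    """Network view: one input, no hidden layer, one logistic output node."""
    return sigmoid(weight * x + bias)

def cross_entropy_gradient(bias, weight, xs, ys):
    """Gradient of the mean cross-entropy loss. Up to sign and scaling it is
    the score of the logistic regression log-likelihood."""
    n = len(xs)
    errors = [single_node_network(bias, weight, x) - y for x, y in zip(xs, ys)]
    return sum(errors) / n, sum(e * x for e, x in zip(errors, xs)) / n

def train(xs, ys, learning_rate=0.5, epochs=10000):
    """Plain gradient descent, the way a network would be trained."""
    bias, weight = 0.0, 0.0
    for _ in range(epochs):
        grad_bias, grad_weight = cross_entropy_gradient(bias, weight, xs, ys)
        bias -= learning_rate * grad_bias
        weight -= learning_rate * grad_weight
    return bias, weight

# Made-up toy data; the classes overlap, so a finite best fit exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

bias, weight = train(xs, ys)

# The trained network and the regression formula give the same probabilities.
for x in xs:
    assert math.isclose(single_node_network(bias, weight, x),
                        logistic_regression_prob(bias, weight, x))

print("intercept/bias:", bias, "slope/weight:", weight)
print("gradient at the fit:", cross_entropy_gradient(bias, weight, xs, ys))
```

The gradient printed at the end should be essentially zero, which is the same condition a statistician would write down as the maximum likelihood equations. Only the vocabulary differs: "bias and weight" on one side, "intercept and coefficient" on the other.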

It’s mostly an issue of expectation.

A: “I used machine learning to predict readmission to a hospital after heart surgery.”

B: “Oh yeah? Deep Learning? Random Forests?!!?”

A: “Oh, no, nothing as fancy as that, just Logistic Regression.”

B: (extremely disappointed look)

It’s like saying, while washing a window with water, that you’re using quantum chemistry. Well, sure, that’s not technically wrong, but you’re implying a lot more than what’s needed.

But really, that is a difference of culture rather than of substance: the connotations of a word and its associations with groups of people (LR is totally not ML!) vs. the math and applications (LR is totally ML!).

