A young man used machine learning to help himself get out of singlehood, and the result...

Bai Li, a young Chinese student at the University of Waterloo, shared an ingenious trick on Medium: “how to use logistic regression from ML to help yourself find a partner”.

A technique this practical is one we simply have to learn.

The University of Waterloo is one of the best-known and best universities in Canada. Its teaching in mathematics, computer science, and engineering ranks among the top in the world, and computer science, its flagship program, ranked 18th in the 2017 U.S. News world university rankings.

However, like all science and engineering schools, the University of Waterloo is short on social activities, and finding a partner there is hard, not to mention the extremely lopsided male-to-female ratio.

Some people think that something like love cannot be quantified: you just “be yourself and let nature take its course”.

However, as a data scientist at the University of Waterloo, Bai Li disagrees. Since he works with computers anyway, why not try to find a girlfriend with the help of machine learning?

The methodology of finding a girlfriend: arm yourself

Acting beats daydreaming, so he immediately began to study how to use machine learning to find a girlfriend.

The core question of this research is: which attributes make a guy stand out from the crowd and win girls' favor?

He listed a set of candidate attributes of male students and set out to see which hypotheses the data would support.

[Image: the candidate attributes]

For each attribute above, a person is assigned a value of 1 or 0 depending on whether the criterion is met. In other words, the study measures the relationship between these attributes and whether a person has found a partner.

Some of these attributes are very subjective: how do you prove that a person is fun? So if you are looking for hard-core, rigorous statistical research, what follows may not be your cup of tea.

To collect the data, he listed everyone he could think of in a table and scored each of them 0 or 1 on every attribute. The final dataset has N=70 rows. As he puts it, if you went to the same school as him in the past two years and know him, you are most likely in this table.
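The 0/1 scoring described above can be sketched as a small encoding step. This is only an illustration: the attribute names are a subset mentioned in this article, while the `encode` helper and the sample person are hypothetical and not taken from the author's project.

```python
# Illustrative subset of the attributes discussed in the article.
ATTRIBUTES = ["fitness", "glasses", "confidence"]

def encode(traits):
    """Turn a dict of yes/no judgments into a 0/1 row, one column per attribute.

    Attributes missing from the dict are scored 0.
    """
    return [1 if traits.get(name, False) else 0 for name in ATTRIBUTES]

# Hypothetical person: exercises regularly, no glasses, not judged confident.
row = encode({"fitness": True, "glasses": False})
print(dict(zip(ATTRIBUTES, row)))
```

Each person becomes one such row, and the target (has a girlfriend or not) is stored as one more 0/1 value alongside it.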

Carefully analyze the reasons for being single

First, Fisher’s exact test is used to examine the relationship between the target variable (dating) and each explanatory variable. Three variables turn out to have the most significant effect:

  • Fitness: People who go to the gym or exercise regularly are more than twice as likely to have a girlfriend (p = 0.02)
  • Glasses: People without glasses are 70% more likely to have a girlfriend than people with glasses (p = 0.08)
  • Self-confidence: People who look confident are more likely to have a girlfriend (p = 0.09)
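To show what Fisher’s exact test computes for one 0/1 attribute against the 0/1 dating outcome, here is a minimal standard-library sketch of the two-sided test on a 2x2 table. The counts in the usage lines are made up for illustration and are not the study's data; the article does not say which implementation the author used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Add up every table that is at least as unlikely as the observed one
    # (the small tolerance keeps exact ties despite float rounding).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Made-up counts: rows = exercises / does not, columns = girlfriend / none.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.3f}")
```

An exact test suits a dataset of this size because it does not rely on the large-sample approximation behind the chi-squared test.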

He was surprised by the effect of wearing glasses, and wondered whether glasses generally give people the impression of a “nerd”.

So he looked into it and found that this is indeed the case. A research paper reports that most people consider glasses to reduce attractiveness, for men and women alike.

Some other variables may also be predictive of dating success, but this is hard to confirm because the sample size is small:

  • Foreign students have a higher dating success rate than Canadian students.
  • Asian students are less likely to be dating than students of other ethnicities.

As for the other factors, male computer science students do not seem to be at a disadvantage, even though there are few female students in the program. The remaining variables (height/career/fun/sociability/fashion/residence) have little to do with dating success. After all, dating is only the first step toward a relationship, and few young people think that far ahead.

Complete results of this experiment:
[Image: complete results]

Next, the relationships among the variables themselves are examined, which helps identify incorrect model assumptions.

In the figure, red indicates positive correlation and blue indicates negative correlation. Only correlations significant at p < 0.1 are shown, so most cells are blank.

From the figure, {having a girlfriend, looking confident, going to the gym, not wearing glasses} appear to be correlated with one another. A model trained on these data will reflect these biases as well. He plans to widen the survey and collect more data in the future.
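For 0/1 variables, one common way to measure the pairwise correlation shown in such a figure is the phi coefficient, which is the Pearson correlation applied to binary data. The article does not say which measure the author used, so treat the following as an assumption; the sample columns are made up.

```python
from math import sqrt

def phi_coefficient(x, y):
    """Phi coefficient (Pearson correlation) of two equal-length 0/1 sequences."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A constant column has no defined correlation; report 0 here.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

# Made-up columns for six people.
gym = [1, 1, 1, 0, 0, 0]
confident = [1, 1, 0, 1, 0, 0]
print(round(phi_coefficient(gym, confident), 3))
```

A positive value would be drawn in red and a negative one in blue; deciding which cells to leave blank additionally needs a significance test per pair, for example Fisher’s exact test on the same 2x2 counts.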

Predicting girlfriends with logistic regression

Wouldn’t it be nice if an algorithm could predict your chances of finding a girlfriend?

He trained a logistic regression model, a kind of generalized linear model, to predict whether a person has a girlfriend from the explanatory variables listed above.

Using the glmnet and caret packages in R, he trained this generalized linear model with elastic net regularization. The hyperparameters were tuned with a standard grid search, using cross-validation for each candidate setting and the kappa coefficient as the metric to optimize.
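To make the idea of elastic net regularization concrete, here is a minimal pure-Python sketch of logistic regression with a combined L1/L2 penalty, fitted by proximal gradient descent. It is not the author's code and not how glmnet works internally (glmnet uses coordinate descent); the function names, the toy data, and the default values of `lam`, `alpha`, `lr`, and `epochs` are assumptions for illustration. Grid search and cross-validation are left out.

```python
from math import exp

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    # Proximal step for the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def predict_proba(w, b, row):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, epochs=2000):
    """Minimize mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    alpha = 1 gives the lasso, alpha = 0 gives ridge; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        errs = [predict_proba(w, b, row) - yi for row, yi in zip(X, y)]
        b -= lr * sum(errs) / n
        for j in range(p):
            # Gradient of the smooth part (log-loss plus ridge term) ...
            grad = sum(e * row[j] for e, row in zip(errs, X)) / n
            grad += lam * (1 - alpha) * w[j]
            # ... followed by the L1 shrinkage step.
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * alpha)
    return w, b

# Toy data: the first column determines the label, the second is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
w, b = fit_elastic_net_logistic(X, y)
print("weights:", w, "intercept:", b)
```

The L1 part of the penalty can push the weights of uninformative attributes to exactly zero, which is useful when there are many 0/1 attributes and only 70 rows.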

Final conclusion

[Image]

The cross-validated ROC AUC of the final model is 0.673, which means that the model predicts your chances of finding a girlfriend somewhat better than guessing on gut feeling (an AUC of 0.5 corresponds to random guessing).

Of course, life always has its uncertainties, and its surprises too. All right, enough talk: Bai Li is off to the gym, and he is going to try taking off his glasses!
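For readers unfamiliar with the metric, ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following standard-library sketch computes it directly from that definition; the labels and scores are made up, and this is not the caret implementation.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up predictions for four people (1 = has a girlfriend).
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

A score of 1.0 means every positive is ranked above every negative, and 0.5 is what a model with no information achieves on average.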

[Photo: a close-up of Bai Li]

Easter egg: how is he doing now?

The original author, Bai Li, completed this research in April this year, and the article was widely praised after he published it on Medium. You can learn more about the project on his GitHub.

Follow our official account and reply “single dog” to get the address of his GitHub repository.

It has now been almost four months since the article was published. How is he doing? We contacted the man himself through a "non-existent" website, also known as Facebook, and found out first-hand: