
Notes on machine learning, part 1: What is it

This is the first part of a series (that’s the intention at least 🤣) trying to give a broad overview of what I do in my academic field. The field is machine learning - ML1 - and in this part I’ll focus on what it is, while I’ll talk about its applications, research and development in later parts. Keep in mind also that this is the first time I write extensively about what I do in general terms and in a popular format! I’ll explain mainly through examples, trying to avoid formal definitions as much as possible. I’ll try my best 🙃 and please send feedback, it’ll help!!

ML can help in many fields but the problem to solve often boils down to a prediction. Some pretty famous examples of usage are:

  • how will the rates of infection from Covid develop over specific locations in the following weeks and months?
  • which advertisements to put in front of a specific user’s eyes to maximize the chance that they will stop scrolling, click and buy?
  • how does my iPhone correctly identify its owner’s face and unlock?
  • can we estimate future rates of reoffending for people convicted of crimes?
  • how to evaluate workers’ performance in any given field?2

There are also less popular, yet no less interesting, applications, such as:

  • how will a community’s population evolve in the coming years, with regard to phenomena like gentrification and segregation?3
  • how to help doctors in making more informed diagnostic and treatment decisions?
  • how to ensure a prediction is made for ethical reasons?
  • how do we isolate, recognize and measure unfairness?

Classical predictive models

There are several ways to try to predict the future. A few common ways are:

  1. based on human-set rules,
  2. based on models of reality,
  3. based on statistical knowledge of historical behavior.

Let’s look at each of them, because ML draws concepts from all three.

Human-set rules

This is what I refer to as the “classical” way of controlling and predicting behavior in artificial systems. An example would be: if the industrial machine for baking cookies reaches 90 °C on its outside surface, shut it off automatically before it burns. This rule (90 °C on the external sensor → shut down) is set in this example by a human expert (a tiny code sketch of such a rule follows after the list below). They know that:

  1. 90 °C is above the normal temperatures during usage, and
  2. it’s far enough above normal to be dangerous to the materials used, to the factory in which the machine is installed, or to the humans operating it.
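To make the idea concrete, here is a minimal sketch of such a human-set rule in Python. The threshold comes from the cookie-machine example above; the function name and the way a reading arrives are invented for illustration.

```python
# A human-set rule: the threshold and the action are chosen by an expert,
# not learned from data. Names and values are made up for this sketch.
SHUTDOWN_THRESHOLD_C = 90.0

def action_for_reading(external_temp_c: float) -> str:
    """Return the action dictated by the expert's rule."""
    if external_temp_c >= SHUTDOWN_THRESHOLD_C:
        return "shut down"
    return "keep running"

print(action_for_reading(92.5))  # -> shut down
print(action_for_reading(55.0))  # -> keep running
```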

There are fields in which this approach is the gold standard, and probably should remain so for a long time (e.g. nuclear power plant control systems). Still, this approach represents some of the roots of ML.

Models of reality

Many engineering designs are first represented in a computer model through programming and design languages. For instance, a highway bridge will be modeled in a computer before being built. The same happens during the design of a space rocket4. The software employed allows for artificial perturbation of conditions, like introducing strong winds or earthquakes, to check what would happen to the bridge as designed.

Statistics and historical behavior

There are several assumptions that we make when trying to infer conclusions from historical behavior using statistics. I won’t talk about all of them here, just one: the concept of a “data generating machine”. The idea is that phenomena are directed by invisible, highly complex mathematical functions that, for a given set of values of the variables (the input of the function), give an outcome to the phenomenon (the output of the function).

A (made-up) example

Variables:
  • number of trains passing on the same tracks today
  • number of passengers for each of those trains
  • detailed weather characteristics
  • experience of train staff, including conductors

Generates:
  • number of minutes a train will be late at each station

Once again, this example is 100% made up. All I’m trying to picture is the hypothesis, made in statistics, that a specific relationship exists between the variables (the characteristics of the causes of the phenomenon) and the outcome of the phenomenon itself (in this case, how many minutes the train will be late). The idea of a data generating machine is that there exists a maths function that formalizes this relationship and assigns a unique value of the outcome to each set of inputs.
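As a toy illustration, the invisible “machine” for the train example could be imagined as a function like the one below. The functional form and every number are invented; the whole point is that the real function, if it exists at all, is unknown to us.

```python
# A made-up stand-in for the hypothetical "data generating machine" of the
# train-delay example. The coefficients and the formula are pure invention.
def delay_minutes(trains_today: int, passengers: int,
                  bad_weather_score: float, staff_experience_years: float) -> float:
    base = 0.4 * trains_today + 0.002 * passengers
    weather_penalty = 3.0 * bad_weather_score
    experience_bonus = 0.5 * staff_experience_years
    return max(0.0, base + weather_penalty - experience_bonus)

# For one specific set of inputs, the "machine" produces one specific outcome.
print(delay_minutes(trains_today=12, passengers=4500,
                    bad_weather_score=2.0, staff_experience_years=6.0))  # -> 16.8 minutes
```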

Much of the predictive statistical modeling work is to try and figure out this function5 as accurately as possible. There are many possible techniques. We’re getting closer to machine learning.

Inference of an approximate function

A regression is one of these statistical predictive techniques and is used to extract this function from a set of data. The data consists of sets of variable values and outcomes. We call the variables “features” and the outcome “class” or “dependent/target variable”6.

Another example

Let’s suppose that the phenomenon in question is the causal connection between the number of years of formal education a person has had and their current monthly income. We’re hypothesizing that the first determines the second. These might be the data we have (made up, but realistic7):

The process of applying regression might extract this function, in pink:

This function does not provide, for each value of the feature, the exact outcome values found in the input data. Instead, for each value of X it provides the value lying on the line drawn. It’s just the best function that can be inferred from the data given the algorithm chosen by the operator; in this case, a linear regression with one regressor, fitted by ordinary least squares (OLS).
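Here is a minimal sketch of that fitting step in Python with NumPy. The (education, income) pairs below are invented placeholders standing in for the data mentioned above, not the OECD figures.

```python
# Fit y ≈ beta_0 + beta_1 * x by ordinary least squares.
# The data points are invented placeholders, for illustration only.
import numpy as np

years_of_education = np.array([9, 11, 12, 14, 16, 18], dtype=float)
monthly_income = np.array([2500, 2800, 3100, 3600, 4200, 4800], dtype=float)

# np.polyfit with deg=1 returns (slope, intercept) of the least-squares line.
beta_1, beta_0 = np.polyfit(years_of_education, monthly_income, deg=1)
print(f"income ≈ {beta_0:.0f} + {beta_1:.0f} * years_of_education")
```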

A deep dive on linear regression is here, in a really well-made 30-minute video by a fantastic YouTube channel, StatQuest.

Machine learning predictive models

All three of these concepts play a role in ML. Applying ML to a predictive problem means all of the following:

  1. Modeling reality, that is, creating a model of reality - although the model is automatically inferred instead of being intelligently designed by the operator;
  2. A model that is (often) based on rules - although:
    • the rules might be so complex they don’t make sense to a human, or
    • the rules might be different kinds of rules than you would expect8, or, finally,
    • the rules might not look like rules at all.
  3. A model whose behavior and/or rules are (often) inferred from statistical, historical data9.

The purpose of creating this model is to try and predict the behavior of our phenomenon of interest based on the data (circumstances and outcomes, a.k.a. features and target variables) that same phenomenon generated in the past.
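Putting the three ingredients together, here is a minimal sketch with scikit-learn: a small decision tree (i.e. automatically learned if/then rules) is inferred from historical feature/target pairs and then asked to predict a new case. All numbers are invented placeholders, reusing the train-delay example.

```python
# A tiny model of reality, made of rules, inferred from historical data.
# The feature/target values below are invented placeholders.
from sklearn.tree import DecisionTreeRegressor, export_text

# Historical features: [trains today, passengers, bad-weather score]
X_history = [[12, 4500, 2.0], [6, 1200, 0.0], [15, 5200, 3.5], [8, 2000, 1.0]]
y_history = [14.0, 2.0, 25.0, 5.0]  # observed minutes of delay

model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_history, y_history)

# The learned "rules" - readable here, but often far too complex to make sense of.
print(export_text(model, feature_names=["trains", "passengers", "weather"]))

# A prediction for circumstances the model has never seen.
print(model.predict([[10, 3000, 1.5]]))
```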

Next articles…

I’m thinking some interesting points to touch on in the next parts are:

  • what is the role of a human operator in ML?
  • what are objective functions?
  • what are some of the most crucial issues with ML?
  • what are interesting research directions right now?
  • how does ML influence society?

Do let me know some points you’d like to read about, as well as any questions!! I’m particularly interested in how accessible this text was for non-experts. Looking forward to your feedback!

Further reading

  • An MIT article, a bit more in depth on specific ML categories and on the relationship between private business and ML.

  1. Part of the field that is sometimes called artificial intelligence, but I don’t like that name. ↩︎

  2. In trying to paint a complete picture, I’ll mention all applications, not only the ethical ones. These last two examples, in particular, have come to the attention of the American press for their callousness. ↩︎

  3. For those interested, an article announcing the publication of an interesting scientific paper regarding this very issue. ↩︎

  4. In fact, there’s even a simulation game to build space vehicles. It’s Kerbal Space Program. I got it and tried it, but never got the hang of it; I’m really bad at building things. Same reason why I never got the hang of Minecraft. ↩︎

  5. A function that, again, is hypothetical and might not exist. As an example, one might make the hypothesis that at the core of weather is a maths function. It’s a conjecture that we might never be able to verify, because there is no theoretical limit to the possible complexity of a function. We can exclude many functions from being candidates for generating a phenomenon, but it’s hard to prove that we can exclude all of them. ↩︎

  6. One possible form of this function might look like $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$. This is a linear regression with two features $x_1$ and $x_2$ and an intercept $\beta_0$. In regression, the features can also be called independent variables or regressors. ↩︎

  7. Source: OECD’s Education and earnings, Australia, 2020, 25-64 years of age. ↩︎

  8. For example, rules based on geometry, on unusual measures of dissimilarity, or on proxy representations of reality. ↩︎

  9. There are two ways to do statistical inference: one is inferring the future behavior of the objects you are investigating, based on historical data about those same objects. The other is inferring the behavior of objects different from those you have data on, but still belonging to the same population. An example is inferring the behavior of all voters when you have data (surveys) on just a smaller group of voters. In this article I spoke about the first of these two approaches, but most of the same concepts apply to the second. ↩︎

I see so much misinformation around AI on LinkedIn. Blanket statements like “a human can solve a problem in 5 minutes; AI can solve 10 in one minute” are so disingenuous that I don’t know if it’s malice at play or just simplicity. 🤷‍♀️ Is the algorithm just messing with me?

Why I don’t call what I do “artificial intelligence”

While I use machine learning, ML, or sometimes applied or computational statistics to describe1 what I do, the term artificial intelligence, AI, is now very widespread in society, industry, politics and marketing. In Italian too, where the shorthand is very similar: IA. However, I try to refrain from using it as much as possible. Why? For reasons of coherence, and for cultural and societal concerns.

AI is overused and already characterized in literature and cinema.

While we like to think that everyone now knows what we do, we’re nowhere near that place. There are still millions of people who, when hearing artificial intelligence, think of the machines in The Matrix. They think of consciousness, of AI-human wars, even of AI being alive. They know that AI is either malevolent and something to stop, or a benevolent force that will consciously help humanity. Does it sound silly? A Google engineer some time ago thought that their chat model was alive, and recently a letter by a few AI practitioners foresaw the danger of a war against AI.

AI is already heavily characterized in pop culture, and what we’re doing has nothing to do with that.

I don’t think what we create qualifies as intelligence.

The question of “what is intelligence” is philosophical in nature and possibly will never have one uniquely correct answer. For me, the main component of intelligence is creativity. When we see something that’s scalding hot and we want to move it, we might poke it with a stick, protect our hand with a thick glove, kick it quickly so that our skin doesn’t get burned, or more. We might even come up with something that nobody has ever done, ever. We all find it strange to think that something so simple might have a historically unique answer, but in the end, everything that exists was first made by someone, or a team, for the first time ever.

A computer isn’t able to create. If nobody has ever done something, a computer won’t invent it.2

And I’ll hazard a prediction: a computer will never be able to actually be creative. Of course, what it means to be creative is also a philosophical question. Painting something beautiful used to be cited as an example of creativity, but a computer can emulate it by using known painting and image patterns, rearranging them randomly or according to a distribution, and joining different known techniques and patterns in new ways. It turns out that not every painting is actually an act of creativity. Who would have guessed?

Of course there are more components to intelligence: memory, the ability to learn from knowledge and experience, the ability to make calculations. And while the computer obviously has memory and calculation power, I would reckon that the way a computer learns is nowhere near the way a human learns. When we get down to the nitty gritty, a computer “learns” by optimizing a mathematical function with math and statistics. What is that function? Who chooses it? Who chooses the metrics? Who chooses the datasets and the algorithms? Humans do, because that requires real reasoning, real intelligence.
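For the curious, here is a minimal sketch of what “learning as optimizing a mathematical function” means in practice: plain gradient descent nudging a single parameter to minimize a squared error on a few invented points. Everything the machine does not choose for itself is marked in the comments.

```python
# "Learning" reduced to its nitty gritty: minimize a loss function.
# The data, the model form (y = w * x), the loss and the learning rate
# are all chosen by a human; the numbers are invented for illustration.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # made-up (x, y) pairs, roughly y = 2x

w = 0.0                # the one parameter the machine "learns"
learning_rate = 0.01   # chosen by the human operator
for _ in range(1000):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad

print(w)  # ends up close to 2.0 - dictated by the data and the chosen loss
```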

And while there are some parallels between machine learning and the sociology of human growth and acquired behavior - where behavior is encouraged or discouraged by the social groups we find ourselves in - I still think there’s more than enough difference to see the two processes of learning as deeply different.

Much of the human learning process has to do with rewards, with the human need for belonging and social recognition, with the very human emotions of fear and love, with the existence of death.

Can rewards and punishments be emulated well by mathematical functions? I donā€™t know. Maybe? Possibly not, possibly never? But not today, for sure.

It makes it sound like AI is not the work of humans, or that the results of AI are not the work of humans.

This is crucial. An AI denied your mortgage application? No, a human did that. Most likely a team. We as machine learning developers and data scientists need to own the results of our work. Especially its deficiencies, especially its biases, its idiosyncrasies, its reinforcements of historical unfairness. As well as its successes.

When we train our models on historical data without accounting for the fact that historical data paints a picture of an unfair world, our ML models will replicate that unfairness. Experts know this all too well; in fact, it’s taught in data science courses. Computers don’t have ethics; they don’t see the bias themselves. They don’t know what discrimination is, and even if we taught them that (again, with mathematical functions3), it is only humans who can tell a computer that discrimination is bad. Is adherence to the optimization of mathematical functions the same as, or will it ever be the same as, a fair mind, empathy, the experience of pain and the hope for a better future? Maybe. Maybe not, maybe never. But for sure it is only humans who can tell an algorithm what to optimize.

Who creates the content?

The only reason ChatGPT is able to write your college essay is that it has read billions of college essays. So if the only way AI can produce results is based on humans’ work, is AI really anything at all without the human experience? I would argue not.

In fact, it is the developers and financiers of ChatGPT who are writing your college essay. And all the millions of people who authored those billions of pieces of original work.

My conclusions

What I think should really be at the forefront of social discussion is the impact and consequences of AI. The European Union - following the GDPR work on privacy4 - is doing massive work on AI regulation, which looks to be a good step forward, but this discussion cannot be left to experts only. We need to decide as a society how and in what direction to employ our collective efforts. And the place of experts is to educate, yes, but most importantly to own our work, its results and its impact.

We are data scientists developing machine learning algorithms. We are the artificial intelligence. And - we are not so artificial ourselves, and our computers are not so very intelligent, at all.


  1. Not much of a description, yes. As I am proofreading this, I’ve realized my next post should probably be “How would I describe what I do?” ↩︎

  2. An exception: if we asked a computer to list 1 million things that could work for moving something scalding, and then looked at those million ideas, there might be something new in there - not because of intelligence, but because there were 1 million minus one silly ideas. ↩︎

  3. Today, it’s not even clear how we would model discrimination and make it part of our loss functions. I read some really cool ideas though. ↩︎

  4. Work which, while good, is not perfect at all. Already it looks like the protection against unsolicited marketing communications is being hollowed out by a “legitimate interest” interpretation that almost completely empties the GDPR’s protections against the use of personal data for commercial reasons. ↩︎

Facebook’s latest (production!) version of their AI chatbot spews fake news about American politics, @gruber reports. Many Facebook users will think that FB’s chatbot might have particular authority as it belongs to the platform itself.

404: responsibility not found.

We all agree privacy is great. (And possibly, a fundamental human right.) But which specific topics are you only comfortable talking about on a privacy-respecting platform/medium?

I wanted to try something regarding data. Data science enthusiasts, data analysis professionals, evidence-informed activists and decision makers, machine learning and AI researchers, data visualization designers and developers, and more… Are you here on Micro.blog? 🔬