A Physicist’s Guide to a Career in Machine Learning

Part one of our ‘Physicist’s Guide to a Career in Machine Learning’ by our CTO Giacomo Piccinini ...

Embarking on a PhD in Physics and wondering about potential career opportunities? Or perhaps you’ve just finished your studies in Theoretical Physics and are trying to figure out your next move. One of the most interesting career paths outside of academia is the world of Machine Learning and Artificial Intelligence. The Machine Learning industry is growing rapidly, and the journey to a career in AI can take many different forms. Drawing on the insights of Gemmo’s own Machine Learning Engineers, we have created a two-part ‘Physicist’s Guide to a Career in Machine Learning’. Our goal is to answer the most common questions about how to turn a foundation in Theoretical Physics into your dream role in AI!

Meet Gemmo’s CTO: Giacomo Piccinini 

Hey there, I’m Giacomo!


Two years ago, I rerouted my career from a Ph.D. in Theoretical Physics to a role in AI and Machine Learning. Although others have followed the same path, I feel that more often than not it is a leap in the dark. When I changed career, there was no roadmap for moving from physics into machine learning. Even now, I am often asked questions such as, ‘What is Machine Learning? What does a Machine Learning Engineer do? How different is a Machine Learning Engineer from a Data Engineer?’

Others go further, with more specific questions like, ‘What hard skills do I need to pursue this career? Do I need certifications to prove my competence? Which technology should I start learning right away?’

Below, I will share what I’ve learned from my own transition. I hope to shed some light on the nitty-gritty of this process and provide a physicist’s guide to a career in machine learning. I’ll also reflect on the experience I’ve gained over the last two years at Gemmo AI, the Dublin-based start-up where I work.

Understanding the Role: What does a Machine Learning Engineer Do?

Let’s start from the basics and try to answer the question, ‘What is Machine Learning and why should I care?’. In short, Machine Learning involves showing a computer enough data that it becomes able to make its own predictions on new, unseen data. Here, ‘data’ refers to whatever media you can think of: text, tables, images, videos, songs, you name it. For example, the by-now famous ChatGPT takes your prompt (that is, a written text) and predicts what answer you are looking for.
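To make the idea of "learn from data, then predict on unseen data" concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of made-up measurements and then predicts a value it has never seen. The data, the function names `fit_line` and `predict`, and the choice of a straight line are all assumptions for illustration; real ML models are far richer, but the workflow is the same.

```python
# Toy example of learning from data: fit y = slope * x + intercept by
# ordinary least squares, then predict on an input that was not in the data.
# All numbers and names below are hypothetical, chosen for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# "Training data": noisy measurements of something close to y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]

model = fit_line(train_x, train_y)

# x = 10 never appeared in the training data, yet the model can predict it.
print("learned (slope, intercept):", model)
print("prediction at x = 10:", predict(model, 10.0))
```

The prediction lands close to the value the underlying rule would give, even though the computer was never told the rule, only shown examples of it.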

The Machine Learning Engineer plays the role of a personal tutor for the computer, deciding how much data it should ingest, how to organise the “study”, how many hours to spend in the learning phase, and so on. Although I am admittedly biased by my background, I reckon the analogy ‘computer : student’ equals ‘ML Engineer : maths teacher’ holds very much true. Notice I am being very specific here: not English, not geography, an actual maths teacher.

My Process

My experience of learning maths and physics was always the same: I would try to solve an exercise, fail, look up the solution, and try again. Once I solved it, I moved on to the next one and repeated. The saying with maths is that you should “understand it” and not “learn it by heart”. That is, if I simply memorise the solutions to the exercises, I will most likely fail the test. A crucial part of ML engineering is precisely to prevent this memorisation (known as overfitting) from happening. And it’s damn hard!
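The difference between "learning by heart" and "understanding" can be sketched with two toy models. The first simply stores every exercise and its answer; the second fits a rule. Everything here (the data, the names `memoriser`, `proportional_fit` and `mean_abs_error`, and the assumption that the true rule is a simple proportionality) is invented for illustration.

```python
# Hypothetical data following y = 3x, split into "exercises" (train)
# and "exam questions" (test) that the models have never seen.
train = [(1, 3), (2, 6), (3, 9), (4, 12)]
test = [(5, 15), (6, 18)]


def memoriser(train_pairs):
    """Learn by heart: store every answer, guess the average answer otherwise."""
    table = dict(train_pairs)
    fallback = sum(y for _, y in train_pairs) / len(train_pairs)
    return lambda x: table.get(x, fallback)


def proportional_fit(train_pairs):
    """'Understand' the rule: fit y = k * x by least squares."""
    k = sum(x * y for x, y in train_pairs) / sum(x * x for x, _ in train_pairs)
    return lambda x: k * x


def mean_abs_error(model, pairs):
    """Average absolute difference between predictions and true answers."""
    return sum(abs(model(x) - y) for x, y in pairs) / len(pairs)


for name, build in [("memoriser", memoriser), ("fitted rule", proportional_fit)]:
    model = build(train)
    print(
        f"{name}: error on exercises = {mean_abs_error(model, train)}, "
        f"error on exam = {mean_abs_error(model, test)}"
    )
```

The memoriser is perfect on the exercises and poor on the exam, which is exactly why ML engineers always evaluate a model on data it was not trained on.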

So, why should you care? Well, as physicists we are expected to frame a phenomenon in a theory, and a theory means equations. In theoretical physics, we are used to a top-down approach: fix the equations, and then see if the world fits in. However, finding the right set of equations is an excruciating pain and, if the phenomenon we are trying to model is not simple or constrained enough, it is very, very close to impossible.

In many cases, it’s pretty evident that this inherent difficulty does not play well with the real world, where a zillion variables interact with one another to animate a very complicated landscape. Finding or solving the right equations is simply not doable, or perhaps simply not worth the effort. In recent decades, we have often resorted to coding to partially get around this, for example with Monte Carlo simulations or similar numerical methods. Once again, this is harder than it looks on the surface, because a good understanding of the theoretical foundations is still needed.

Breaking it Down

Let’s use a simple example. If I gave you a picture and asked you to point out the faces of the people in it, it would be a no-brainer. We (humans) just “know” what a face is. Now suppose you ask a computer to do the same; at its core, it works with a sequence of ‘if-else’ statements. We could try and cook something up like: “if the region is oval-shaped and if it contains two eyes, one nose and one mouth, then it’s a face”. Cool, except a computer does not have the faintest understanding of what an eye or a mouth is. Therefore, with “classical” methods, this seemingly simple task would require a lot of effort, and the results would not even be that satisfactory (people have done it!).

Long story short, we need a paradigm shift. That is, we should simply stop trying to hard-code rules and let the computer figure them out by itself. Ambitious, isn’t it? Well, this is the core of Machine Learning. How is this done in practice, you might ask?

Well, the idea is very simple: every object (datum) a computer ingests is somehow turned into an array of numbers. Think of these numbers as coordinates in a given system. If the problem is hard to solve, take the General Relativity approach: change coordinates! In a way, Machine Learning consists of applying transformations (loosely speaking, diffeomorphisms), largely via matrix multiplication, until the problem becomes easy.
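Here is a small sketch of the "change coordinates until the problem becomes easy" idea, combined with letting the computer find the rule instead of hard-coding it. Two classes of points sit on concentric rings. A tiny learner, `learn_threshold`, searches for the best rule of the form "value greater than t". In the Cartesian coordinate x it cannot separate the rings; in the polar coordinate r it can. Note the assumptions: the data is synthetic, the function names are made up, and the change of coordinates is picked by hand here, whereas a real ML model would learn its transformations from the data.

```python
import math


def ring(radius, n=8):
    """Return n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# Hypothetical toy data: class 0 on a small ring, class 1 on a large ring.
points = [(p, 0) for p in ring(1.0)] + [(p, 1) for p in ring(3.0)]


def learn_threshold(samples):
    """Find the threshold t that best fits the rule 'label is 1 if value > t'.

    samples is a list of (value, label) pairs; returns (t, accuracy).
    """
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(t):
        correct = sum((value > t) == (label == 1) for value, label in samples)
        return correct / len(samples)

    best = max(candidates, key=accuracy)
    return best, accuracy(best)


# Original coordinates: a threshold on x cannot separate the two rings.
_, acc_x = learn_threshold([(x, label) for (x, _y), label in points])

# New coordinates: the same learner, fed the radius r, finds a perfect rule.
t_r, acc_r = learn_threshold([(math.hypot(x, y), label) for (x, y), label in points])

print(f"accuracy using x: {acc_x:.2f}")
print(f"accuracy using r: {acc_r:.2f} (rule: r > {t_r:.2f})")
```

The learner itself did not change between the two runs; only the coordinates did. That is the spirit of the paragraph above: most of the work goes into finding a representation in which a simple rule suffices.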

Partners in Crime: How About the Other Roles?

If you are looking for a job in Machine Learning, you will often encounter other professional roles that might look somewhat mysterious at first. Ideally, these roles are very different from one another, but in practice they often get mixed up in job descriptions. Let’s try and bring some clarity to these roles, simplified for a physicist’s guide to a career in machine learning.

  • Software Engineer

A very broad term that generally refers to people who focus on designing and building entire software systems. They have a vast knowledge of programming languages and software development. In principle, it has nothing to do with ML.

  • Frontend Developer

Frontend developers work on the visual and interactive aspects of a website or web application. They implement the design and user interface using languages like HTML, CSS, and JavaScript. In principle, it has nothing to do with ML.

  • Backend Developer

Backend developers concentrate on what is called the server-side of applications. In simpler terms, they are not responsible for the visual and interactive part of an application. Instead, they focus on handling database interactions, user authentication and all the core machinery happening under the hood. In principle, it has nothing to do with ML.

  • Full-stack Developer

Full-stack developers combine the roles of frontend and backend development, and are capable of building both client and server software. Thus, they handle the full spectrum of a software application. In principle, it has nothing to do with ML.

  • Data Analyst

Data analysts are the first role on this list that actually uses data to study and/or predict something. Crucially, they work closely with the business side, so in some respects it is a less tech-oriented job.

  • Data Engineer

You might have heard that ML is ‘data-hungry’. This simply means that, generally speaking, the more good-quality data you feed in, the more trustworthy and reliable your ML model becomes. Depending on the context, you might have a ton of information to handle (think of banks processing thousands of transactions every minute), and you certainly can’t deal with it manually. To use this data effectively in training an ML model, pre-processing, cleaning and preparation phases are needed. This is where the Data Engineer comes in, focusing on the architecture, design, and management of the data infrastructure to enable data storage, processing, and flow for analytical or operational uses.

  • Data Scientist

Data scientists are odd figures, because their role varies a lot depending on the company they work for. I suggest you pay attention when you see this title, and try to understand whether the job actually falls into one of the other categories. Remember that the title is sometimes used deceptively, just because it might look cooler on paper. Used properly, the term refers to people who apply advanced analytics techniques, such as machine learning and predictive modelling, to derive insights and predictions from both structured and unstructured data.

  • DevOps

DevOps engineers are the answer to a question you might not have considered just yet. Suppose I have a cool application, with a nice frontend and a robust backend. How do I make it available to everyone? How do I check that it is working properly? DevOps engineers orchestrate all of this, managing the cloud services, infrastructure and workflows. In principle, it has nothing to do with ML.
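Going back to the Data Engineer role above, the "pre-processing, cleaning and preparation" step can be pictured with a very small sketch. The records, the field names (`id`, `amount`, `currency`) and the cleaning rules below are all hypothetical; real pipelines run on dedicated infrastructure and handle far messier cases, but the kind of work is similar.

```python
# Hypothetical raw transaction records, as they might arrive from upstream.
raw_transactions = [
    {"id": "t1", "amount": "12.50", "currency": "eur"},
    {"id": "t2", "amount": "", "currency": "EUR"},       # missing amount
    {"id": "t1", "amount": "12.50", "currency": "eur"},  # duplicate of t1
    {"id": "t3", "amount": "7", "currency": " usd "},    # untidy currency code
    {"id": "t4", "amount": "n/a", "currency": "EUR"},    # unparseable amount
]


def clean(records):
    """Drop duplicates and unusable rows, and normalise the remaining fields."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # skip records we have already kept
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows whose amount is empty or not a number
        seen_ids.add(record["id"])
        cleaned.append(
            {
                "id": record["id"],
                "amount": amount,
                "currency": record["currency"].strip().upper(),
            }
        )
    return cleaned


for row in clean(raw_transactions):
    print(row)
```

Only after a step like this does the data become something an ML model can sensibly be trained on.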

Frequently Asked Questions

  • What does a Machine Learning Engineer do?
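     A Machine Learning Engineer acts as a personal tutor for the computer, deciding how much data it should ingest, how to organise the training, and how long the learning phase should last, so that the resulting model can make its own predictions on new, unseen data.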
  • What other roles are there (e.g. data engineer), and how do they differ?
    1. Software Engineer: Focuses on designing and building entire software systems, with extensive knowledge of programming languages, unrelated to machine learning in principle.
    2. Frontend Developer: Specializes in the visual and interactive elements of websites or web apps using HTML, CSS, and JavaScript, generally unrelated to machine learning.
    3. Backend Developer: Concentrates on server-side application aspects, handling database interactions and core functionalities, typically unrelated to machine learning.
    4. Full-stack Developer: Merges frontend and backend development roles, capable of building both client and server software, usually unrelated to machine learning.
    5. Data Analyst: Partners with the business side, using data for studies or predictions; less tech-oriented, but the first role on this list that actually works with data.
    6. Data Engineer: Manages data infrastructure for effective machine learning, focusing on data storage, processing, and preparation.
    7. Data Scientist: Role varies by company, involves advanced analytics and machine learning for insights and predictions, sometimes misused as a job title.
    8. DevOps: Manages cloud services, infrastructure, and workflows for application deployment and maintenance, generally unrelated to machine learning.
  • What are some tips and tricks to succeed in Machine Learning as a physicist?
    1. Do not waste too much time trying to obtain certifications. You might end up spending quite some cash for no real added value (at least interview-wise)
    2. An ML engineer’s work is useful if and only if other people can interact with it. Once your model is good enough, always think about what comes before and after it.
    3. Python is your best friend, but not your only friend: other languages and tools might serve some purposes better (e.g. creating an interface with HTML/CSS/JavaScript, containerising your app with Docker, running fast code with C or Rust, etc.)
    4. Get your hands very dirty: it’s the only way to learn ML
    5. Pick a problem you like, “solve it” and document everything on GitHub. It looks much better than a certificate.
    6. Even if working alone, treat your project as a team effort: document everything, create branches when modifying your code, ensure reproducibility. Remember that to excel at ML you also need to be a good software engineer. Keep this in mind and you will be better than 99% of candidates applying for roles.