Becoming a Practical Data Scientist

Written by EHS Analytics | Oct 29, 2020 6:00:00 AM


When I told my colleagues at the university that I was leaving to start a new career in the commercial sector, their first response was mostly shock and disbelief. In their defense, an academic leaving after five years of successful research and teaching is not something you see every day. It may not seem like a safe or even a rational decision, but I guess I’ve never had a problem with making this kind of decision if I truly believe in it.

Why I left university and joined EHS Analytics

The short answer is that I believed that I could put my academic expertise to practical use. That, and the fact that both my bosses wore graphic tee-shirts at the interview!

During my time at the university, I was involved with both theoretical and practical research. While I value the theoretical research, I’ve always leaned toward research ideas with immediate impact on industry. It is probably because of my engineer-self that I find practical research and solving industry challenges using my scientific skills more satisfying. So, the longer version of the answer is that I joined EHS Analytics to be a practical data scientist in a one-of-a-kind company, whose sole purpose is to use data to solve one of the biggest challenges of industry – workplace injuries and fatalities.

Being a practical data scientist is when I feel like a data science superhero!

Data science became an academic major only very recently, and few universities offer it as a stand-alone program. Almost none of the nerds who call themselves data scientists today graduated from university as one. Data scientists come from different backgrounds, yet I believe they all share some characteristics: they are challenge-seeking people with good analytical abilities. They are good at math, and they enjoy coding and spending hours at the computer. You will enjoy data science if you have these characteristics, but for me the real joy comes from being a practical data scientist.

After years of experience, I have come up with a list of characteristics of a good practical data scientist, or practical scientist in general. Here is my take on the most important ones:

Put your boots on the ground!

The first and most important skill of a practical data scientist is the ability to understand the field under study. We, as data scientists, are prone to becoming snobbish and thinking that we can ascertain all the answers.

It is true that there is a large demand for us: the modern world is a world of data. The value we offer is turning that data into novel insights and solutions. But you can drown in this abundant data if you don’t know where to look or how to use it. Never undervalue expertise in the field.

If you don’t have enough experience, leave your computer, get your hands dirty and explore; don’t stop until you understand the challenges and questions completely. One of the characteristics that make our company unique is that we have boots on the ground. We have practical experience that gives us the ability to ask the right questions of the data and understand the challenges that our clients face every day.

Get clean, ready and understandable data in your sleep!

If you are thinking of becoming a data scientist, you probably took one of those fancy courses where they taught you how to train deep neural networks using big data. One of the things that they probably didn’t teach you in those machine learning courses is how to acquire the data itself.

The first step of becoming practical with data is to know how to get the data efficiently. Your data most likely comes from a relational (SQL) database, and sometimes from a non-relational (NoSQL) one, and it usually arrives in a nasty, unusable format. If you are lucky, you get a table from a SQL database with a messed-up data format; if you are not, you may get nested JSON from an API where the initial format does not make any sense whatsoever. Of course, I’m exaggerating, and there is no nasty data that a practical data scientist can’t clean with a bit of smart coding.
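To make the nested JSON case concrete, here is a minimal sketch of the kind of "smart coding" I mean. It uses only the Python standard library, and the payload and its field names (worker, site, injury and so on) are made up for illustration; they are not a real API response.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists and scalar values are kept as-is; only dicts are unpacked.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical API payload: every field name here is an assumption.
raw = """
[
  {"id": 1,
   "worker": {"name": "A. Lee", "site": {"city": "Calgary"}},
   "injury": {"type": "strain", "days_lost": 3}},
  {"id": 2,
   "worker": {"name": "B. Roy", "site": {"city": "Edmonton"}},
   "injury": {"type": "cut", "days_lost": 0}}
]
"""

# One flat dict per record, ready to load into a table.
rows = [flatten(record) for record in json.loads(raw)]

for row in rows:
    print(row)
```

Each record becomes a single flat row with keys such as worker.site.city, which is much easier to load into a table. Real payloads usually need more care (lists of sub-records, missing keys), so treat this as a starting point only.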

Python is very productive for working with data, and that is why it is the most popular programming language among our tribe. A practical data scientist codes in Python standing on their head! Knowing how to get the data from the database without crashing it is also a must. Almost every relationship between a data scientist and the data starts with a “SELECT” statement. That fancy TensorFlow code may end up useless if you can’t write the proper “SELECT” statement.
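As a rough illustration of a “SELECT” that is gentle on the database, the sketch below asks only for the columns it needs, filters on the server side, and reads the result in small chunks instead of all at once. It uses Python's built-in sqlite3 module with an in-memory database; the incidents table, its columns, and the fetch_in_chunks helper are hypothetical names invented for this example.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incidents ("
    "id INTEGER PRIMARY KEY, site TEXT, severity INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO incidents (site, severity, notes) VALUES (?, ?, ?)",
    [("north" if i % 2 else "south", i % 5, "free text") for i in range(1000)],
)


def fetch_in_chunks(connection, query, params=(), chunk_size=50):
    """Yield query results in lists of at most `chunk_size` rows."""
    cursor = connection.execute(query, params)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# Select only the needed columns and let the database do the filtering.
# Parameters are passed separately instead of being pasted into the SQL.
query = "SELECT id, severity FROM incidents WHERE site = ? AND severity >= ?"

total = 0
chunks = 0
for chunk in fetch_in_chunks(conn, query, ("north", 3)):
    total += len(chunk)
    chunks += 1

print(f"read {total} rows in {chunks} chunks")
```

The same pattern carries over to other databases through their own Python drivers, although the connection call and the parameter placeholder style differ from one driver to another.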

One of the strongest advantages of our company is our data hub, which brings different safety data such as incident forms, WCB claims, and training/audit records from diverse sources into one place.

Our data science team has built AI solutions that read and understand data coming from different sources and convert them into a standard format.

LRs before DNNs!

Deep learning and deep neural networks are very trendy terms, but how often will you implement them in practice? Probably not much, unless you work with big data. Deep learning may have little use for a practical data scientist, whereas a good understanding of the simplest models, like linear regression, can go a long way. Don’t get me wrong, I have nothing against neural nets; in fact, I am working with them on my current project. What I’m trying to say is: don’t jump to DNNs because of their trendiness.

Getting proficient in regression will help you solve many problems and, at the same time, help you understand the theory behind neural networks better. Imagine watching the last fight scene in Avengers: Endgame without having watched any other Avengers movie. Wouldn’t it be confusing?! Learning the ins and outs of regression will equip you to understand more advanced techniques. Aren’t neural nets just fancy regressions anyway?
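Here is a small sketch of what I mean by that last question. It fits a straight line to a handful of made-up points in two ways: first with the textbook least-squares formulas, then with a single "neuron" that has no activation function and is trained by gradient descent on the mean squared error. The data, learning rate and number of epochs are arbitrary choices for illustration. Real neural networks stack many such units with non-linear activations, so this shows only the simplest case.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_neuron(xs, ys, lr=0.05, epochs=2000):
    """One linear unit (y = w*x + b) trained by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data that roughly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

slope, intercept = fit_ols(xs, ys)
w, b = fit_neuron(xs, ys)

print(f"regression: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"neuron:     w={w:.4f}, b={b:.4f}")
```

Both approaches land on essentially the same line, because they minimize the same squared-error objective. If you understand why, you already understand the core training loop of a neural network.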

Statistics is the practical math!

Statistics is a big part of the scientific method. That is why statistical skill is an essential characteristic of a good scientist, especially one working with data!

Engineering students are generally good at math and usually find statistics courses boring, especially the undergrad ones. I skipped my statistics courses when I was an undergrad, so trust me, I’m speaking from experience! Later, when I got into serious academic research, my lack of statistics proficiency showed itself. I had to revisit statistics, and this time I found it exciting. If it were not for my academic experience, I would not have enough knowledge of statistics to be a good practical data scientist. As a practical data scientist, half of your job is to analyze data and extract insights from it. Skills in statistics will make you stand out among your peers.

Here at EHS Analytics, we love to do research on the data we bring into our data hub. The outcome of our work is often delivered to our customers in the form of research summaries and white papers. It is a wonderful feeling to see your team’s research helping clients and enabling them to make data-driven decisions.

At EHS Analytics we have one and only one goal and that is using data to eliminate serious workplace injuries and fatalities. Like other departments in our company, our data science team is working hard towards achieving this. Our goal may seem out of reach, but this won’t stop us. We are all superheroes here and that’s what superheroes do: We make the impossible possible…