“Data analyst” is a term that employers, colleges, and universities use more and more often to cover a wide range of work – but the concept of a data analyst is also the source of a lot of confusion. In short, everyone is talking about data analysts, yet there’s generally little clarity about what they do.
When starting their journey into the complex world of data, many organizations think the first step is to hire a data analyst to help. That’s not always the case. To help you make an informed decision, we want to take a moment to give you some additional background.
We’ll explain what data analysts are, what they do, and when you need one. We’ll also explain how data analysts differ from data engineers and data scientists. By the end of this article, you’ll understand how the puzzle pieces fit together.
Let’s start with some context. The role of a data analyst often overlaps other data-related roles in the overall data and analytics lifecycle. Typically, data projects are driven by a particular set of use cases or user stories.
Use cases and user stories are strongly tied to decisions that need to be made, assumptions that require validation, or simply questions that need answers backed by data.
Understanding how those information needs map to actual sources is one of the initial points in a data project where a data analyst would get involved. Get involved to do what specifically? Glad you asked.
The data analyst will – get ready for it – analyze the user story or use case to understand the logical data points that are needed.
Working with subject matter experts, they will examine the possible sources of data to determine the best source (there may be more than one) for the data that’s needed.
They will then profile the available data to understand its quality, completeness, integrity, and overall sufficiency to answer the question that the user story or use case is asking. This can be done using a variety of tools and techniques, but the bottom line is that the data analyst will determine whether the data is suitable as is, requires some cleansing, or is not sufficient to meet the use case.
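As a rough illustration of that profiling step, here is a minimal Python sketch using only the standard library. The sample rows, the field names, the `profile` and `verdict` helpers, and the 50% completeness threshold are all assumptions made for this example; a real project would set its own rules with subject matter experts.

```python
from collections import Counter

# Hypothetical sample rows standing in for a real source table.
rows = [
    {"customer_id": "C1", "region": "East", "revenue": "120.50"},
    {"customer_id": "C2", "region": "", "revenue": "80.00"},
    {"customer_id": "C3", "region": "West", "revenue": "n/a"},
    {"customer_id": "C3", "region": "West", "revenue": "45.25"},
]


def is_number(value):
    """Return True when the text can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile(rows, key_field, numeric_fields):
    """Summarize completeness, key uniqueness, and numeric validity."""
    total = len(rows)
    report = {"row_count": total, "completeness": {}, "invalid_numbers": {}}
    # Completeness: share of rows where the field is not blank.
    for field in rows[0]:
        filled = sum(1 for r in rows if r[field].strip() != "")
        report["completeness"][field] = filled / total
    # Integrity: key values that appear more than once.
    key_counts = Counter(r[key_field] for r in rows)
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)
    # Quality: non-blank values that should be numeric but are not.
    for field in numeric_fields:
        report["invalid_numbers"][field] = sum(
            1 for r in rows if r[field].strip() and not is_number(r[field])
        )
    return report


def verdict(report, min_completeness=0.5):
    """Classify the source; the threshold is an illustrative assumption."""
    lowest = min(report["completeness"].values())
    if lowest < min_completeness:
        return "not sufficient"
    has_issues = (
        bool(report["duplicate_keys"])
        or any(report["invalid_numbers"].values())
        or lowest < 1.0
    )
    return "requires cleansing" if has_issues else "suitable as is"


report = profile(rows, key_field="customer_id", numeric_fields=["revenue"])
print(report)
print(verdict(report))
```

The three possible results of `verdict` mirror the three outcomes described above: suitable as is, requires cleansing, or not sufficient.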
If the data quality isn’t good enough and remediation is required, the data analyst may suggest updates that need to be made and may even write the code to make those updates.
There are a variety of tools available for a data analyst to carry out their primary role of data profiling. Depending on the source of the data, the tools and techniques will vary.
For data sitting in tables in a database, options to help profile the data include writing custom SQL or using separate third-party tools such as Alteryx that have built-in profiling capabilities. These tools can help you understand the distribution of values, cardinality, and even show correlations between fields to assist with more advanced use cases that are focused more on feature selection for machine learning.
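To make the custom SQL option concrete, the sketch below runs two simple profiling queries against an in-memory SQLite database from Python. The `orders` table, its columns, and its rows are hypothetical; the same query patterns (row counts, distinct counts, null counts, and a value distribution) should carry over to most SQL databases with minor syntax changes.

```python
import sqlite3

# Hypothetical table with a few rows, including missing values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "shipped", 20.0),
        (2, "shipped", 35.5),
        (3, "pending", None),
        (4, None, 12.0),
    ],
)

# Row count, cardinality of one column, and null counts per column.
summary = conn.execute(
    """
    SELECT COUNT(*) AS row_count,
           COUNT(DISTINCT status) AS status_cardinality,
           SUM(CASE WHEN status IS NULL THEN 1 ELSE 0 END) AS status_nulls,
           SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS amount_nulls
    FROM orders
    """
).fetchone()

# Distribution of values in the status column.
distribution = conn.execute(
    """
    SELECT status, COUNT(*) AS n
    FROM orders
    GROUP BY status
    ORDER BY n DESC
    """
).fetchall()

print(summary)
print(distribution)
```

Note that `COUNT(DISTINCT status)` ignores nulls, which is why the null count is reported separately.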
For data residing in spreadsheets in Excel, native capabilities such as pivot tables or pivot charts can help profile data.
Finally, languages more familiar to data scientists such as R and Python can be extremely helpful in data profiling and overlap with the subsequent data engineering tasks that typically follow. Speaking of data engineering . . .
While the work of a data analyst and a data engineer may overlap on a project, data engineering is a separate role. A data engineer focuses on the movement of data: extracting it, transforming it when necessary, and loading it into a target destination.
There is no shortage of ETL (extract, transform, and load) and ELT (extract, load, and transform) tools to choose from, in addition to data-centric programming languages to accomplish this task. Data engineering sits between the work of a data analyst and that of a data scientist or business intelligence (BI) developer.
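For readers who have not seen one, a bare-bones ETL flow can be sketched in a few lines of Python. Everything here is an assumption for illustration: the inline CSV stands in for a real source system, the in-memory SQLite database stands in for a real target, and the `extract`, `transform`, and `load` functions are hypothetical names rather than part of any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or an API response.
SOURCE_CSV = """order_id,order_date,amount
1,2024-01-05, 20.00
2,2024-01-06,35.50
3,2024-01-06,
"""


def extract(text):
    """Extract: read raw records from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim whitespace, convert types, and skip rows with no amount."""
    cleaned = []
    for rec in records:
        amount = rec["amount"].strip()
        if not amount:
            continue
        cleaned.append((int(rec["order_id"]), rec["order_date"].strip(), float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(loaded)
```

In an ELT flow, the raw records would be loaded first and the cleanup in `transform` would run inside the target system instead, typically as SQL.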
There are situations where data engineering work is done by a data scientist or a BI developer, and times when a data analyst may also be the individual doing the actual data engineering work as well. But data engineering is a separate role, regardless of who is wearing the data engineer hat.
While a few sentences won’t do justice to the question of what a data scientist is, the gist is this:
A data scientist is the rare individual who has the business domain expertise, programming skills, knowledge of statistics, and understanding of working with data to balance the art and science of developing meaningful models that help us predict, classify, segment, or recommend outcomes based on the data.
A big part of this job is actually doing data analyst work to understand the sufficiency of data, as well as data engineering. Not just doing ETL-type work, but rather engineering with more of a focus on understanding the relationships in the data to better select or engineer the right “features” to inform a model.
Note that “feature engineering” in this sense is a particular shade of data engineering that is a bit different from what a typical data engineer might do, as the end goal is slightly different. In a sense, all data scientists are data analysts and data engineers, but not all data analysts and data engineers are data scientists.
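To show how feature engineering differs from moving data around, the sketch below turns raw order records into per-customer features that a model could consume. The order data, the `build_features` helper, and the chosen features (order count, total spend, average order value, and days since last order) are assumptions picked for illustration, not a recommended feature set.

```python
from datetime import date

# Hypothetical customer order history.
orders = [
    {"customer_id": "C1", "order_date": date(2024, 1, 5), "amount": 20.0},
    {"customer_id": "C1", "order_date": date(2024, 3, 1), "amount": 40.0},
    {"customer_id": "C2", "order_date": date(2024, 2, 10), "amount": 15.0},
]


def build_features(orders, as_of):
    """Aggregate raw orders into one feature record per customer."""
    features = {}
    for order in orders:
        f = features.setdefault(
            order["customer_id"],
            {"order_count": 0, "total_spend": 0.0, "last_order": order["order_date"]},
        )
        f["order_count"] += 1
        f["total_spend"] += order["amount"]
        f["last_order"] = max(f["last_order"], order["order_date"])
    # Derive features that do not exist in the raw data.
    for f in features.values():
        f["avg_order_value"] = f["total_spend"] / f["order_count"]
        f["days_since_last_order"] = (as_of - f.pop("last_order")).days
    return features


features = build_features(orders, as_of=date(2024, 3, 31))
for customer_id, f in features.items():
    print(customer_id, f)
```

The raw data is the same data an ETL job would move; the difference is that the output is shaped around what a model needs to learn from rather than around a target table.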
One source of confusion around the distinction between data analysts and data scientists is that many companies will list an opening for a data analyst with a job description that reads like a data scientist. This isn’t by accident.
By labeling a data scientist role as a data analyst role, companies lower both salary expectations and what they feel compelled to pay. This doesn’t have to be a bad thing – it presents a way for aspiring data scientists to get some experience. They may make less money, but they’ll be able to build up the real-world data science skills they need to get to the next step and a higher-paying data scientist role.
If you’re still wondering when you need a data analyst, ask yourself where your company is on the data management maturity spectrum.
Most of the time, data analyst functions can be and are performed by your existing resources, either as part of their data-related work or simply out of necessity to answer questions or develop reports.
That said, there is a point where your organization shifts to focus on establishing a data-driven culture and data-driven decision making. When that happens, having resources dedicated to data management in general becomes a necessity. Early in that transition is when you should consider hiring a data analyst.
The biggest factors preventing you from getting the most from your data are data quality and sufficiency for analytics. The data analyst can help you understand where potential data land mines lie and get ahead of data remediation work. This can pave the way and tip the scales in your favor as you undertake your analytics journey.
Also, if your organization is looking to modernize its analytics and data platform in general, a data analyst should be a part of that larger team.
So let’s go back to answering whether you need to hire a data analyst. Reflect on where you are as an organization, understand what needs you have, and look inside the organization for those skills and capabilities. If there is a clear gap, the time is right to add a data analyst to your team.
Have more questions about data analysts? Contact us