An insight into the field of data science
This text reflects experiences lived and shared with data professionals in various projects I have participated in, and I believe that those reading it may identify with some experiences in their own careers.
My original article in Portuguese here.
The Role and Resilience of the Data Scientist
At times, we wonder what it truly takes for the data scientist’s role to work fully within institutions. Most often we stick to the idea of building models and analyzing data. It is indeed that, and much more.
A definition that underlies the ideas in this text is a passage from an article discussing the methodologies employed in data science projects:
“Data science is a multidisciplinary field that sits between computer science, mathematics, and statistics, and involves the use of scientific methods and techniques to extract knowledge and value from large amounts of structured and/or unstructured data.”
In my view, this passage closely reflects what many data scientists do: understand and extract knowledge from data. Understanding and extracting do not necessarily mean finding a ready-made structured database or a file sent by the business area (the latter scenario being more common). Everyday life is not a Kaggle competition. It often involves evaluating the quality of the data to be used, understanding how it is generated, analyzing how it will be made available, and ensuring it is “ready for consumption.”
The path to well-defined pipelines, whether in terms of work methodology or how models will be deployed, is built on this understanding and on interactions with various data and technology teams. These interactions make your role as a data scientist better known across the organization and give you greater sensitivity when constructing analyses and machine learning models.
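As a rough illustration of what evaluating data quality can look like in practice, here is a minimal Python sketch that counts missing values and repeated records before any modeling starts. The field names, the sample records, and the `quality_report` helper are all hypothetical, invented only for this example; real checks depend on how the data is generated and delivered in your institution.

```python
from collections import Counter


def quality_report(rows, required_fields):
    """Summarize basic quality signals for a list of dict records."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        # Count fields that are absent, None, or an empty string.
        for field in required_fields:
            value = row.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Treat a record as a duplicate if all required fields repeat.
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": len(rows),
        "missing": {field: missing[field] for field in required_fields},
        "duplicates": duplicates,
    }


# Hypothetical sample, standing in for a file sent by the business area.
sample = [
    {"customer_id": 1, "signup_date": "2023-01-05", "plan": "basic"},
    {"customer_id": 2, "signup_date": "", "plan": "pro"},
    {"customer_id": 2, "signup_date": "", "plan": "pro"},
    {"customer_id": 3, "signup_date": "2023-02-11", "plan": None},
]

report = quality_report(sample, ["customer_id", "signup_date", "plan"])
print(report)
```

A summary like this is only a starting point for the conversation with the teams that own the data: it tells you what to ask about, not why the gaps exist.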
Positioning and Understanding of Scenarios
So far, I have emphasized that the data scientist’s role is to build machine learning models and bring forth the information they yield. However, I understand that data scientists can also contribute to data availability, not necessarily by getting hands-on, but by working with the responsible teams, since this data is what they will work with. Hence, having this knowledge is crucial.
I’m not here to say that this is the ideal scenario, but that this is the common scenario, still experienced in many institutions for various reasons, from the lack of data structure to the nascent way data is collected. Additionally, I believe that expanding your vision and scope contributes greatly to your career development and enhances interactions with colleagues.
Yes, the ideal scenario is to have data ready for at least a minimal analysis, so you can decide whether to propose a project, formulate hypotheses, and build and deploy models in production environments, but that is not always the case.
Also, understand that none of this justifies deprioritizing your ideas and skills as a data scientist. My point is that the role of the data scientist is broad, and there are many opportunities for development and for contributing to results in your field.
Methodologies Applied in Projects
One challenge I always see for data scientists in projects is the way activities are conducted and developed. Because data science is still considered a new field, some traditional software engineering methodologies should not be applied strictly, and this mismatch can lead to misunderstandings within the squad and with the business area. Established process models such as CRISP-DM, from the data mining and delivery area, are interesting, but new methodologies are being tested and applied to deliver these activities more efficiently and to align with product managers and other team members.
I emphasize this point because this misunderstanding can lead to frustration on all sides. Data scientists may feel frustrated at not performing their role, and clients may be disappointed at not receiving the product they expected, leading to turnover and greater resistance to the ideas presented. Overcoming these challenges is not simple and not always possible, as the scientist may have to propose a methodology that suits their team’s dynamics, which can be a significant challenge regardless of seniority.
In this context, it becomes an important skill for the data scientist to negotiate these activities to stay true to their role and preserve their functions. As I mentioned, the field requires a broad perspective, and it’s not just about building models; I believe this phrase applies to other contexts in different areas.
P.S.: Just for reference, some of the methodologies applied in data science are presented in the first chapters of this book.
Final Reflection
Perhaps this is just me venting, but the path of the data scientist demands greater resilience to the contexts you are embedded in, and an understanding that the structuring of institutions in the data field, and the changes it brings, will be a long and challenging journey. Never forget that your role is broad, and your scope is likely to expand over time, whether under the title of data scientist or a new title yet to be defined.