The Data Science Hierarchy of Needs

    The Data Science Hierarchy of Needs, modelled on Maslow’s Hierarchy of Needs, shows what data science practitioners need at each stage and helps them understand and prioritise their projects. It outlines the phases a data science project should pass through before it can benefit from AI. More data scientists should be familiar with it, because it gives them a better understanding of their projects and helps them decide on the order of activities.
     
    Data collection
    Data collection is crucial and therefore sits at the base of the hierarchy. There can be no insightful analysis without access to data in a suitable format.
    You need to know how the data is collected, how it flows through the organisation, which analyses are run on it to derive valuable insight, and how that insight influences business decisions.
    It is beneficial for data scientists to take part in the data collection process, so that they understand the data’s history and can make the best decision about its format. Choosing a suitable file format (for example, CSV or Parquet) can improve processing speed for large files.
    Activities at this level include:
    Recording transactions
    Logging errors
    Digitising analogue data
    Data management planning
    Data generation
    Data platform development and database management
    Data acquisition
     
    Move, store and organise data
    Once data scientists have collected their data, they have to move it somewhere safe and accessible. Organising data makes it easy for researchers to find the information they need.
    Data sets are often messy. Data scientists need a way to ensure that the data they collect is appropriately structured and ready to be analysed. They do this by writing their own scripts or by using dedicated software.
    Activities at this level include:
    Data migration
    ETL / ELT
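As a rough illustration of the ETL step above, here is a minimal sketch using only Python's standard library. The CSV contents, the `orders` table and the column names are made up for the example, and an in-memory SQLite database stands in for whatever storage your team actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1001, alice ,19.99
1002,BOB,5.00
1003,carol,not_a_number
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalise customer names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed rows; a real pipeline would log them
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(stored)
```

In an ELT setup the order changes: the raw rows would be loaded first and the cleaning would run inside the database afterwards.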

     
    Explore, transform and analyse data
    Next, we need to explore the data, transform it into a suitable format, and analyse it. Data can help us understand what’s happening in the organisation and why. This generally starts with basic data analysis tools, like reports, dashboards, and KPIs.
    As the company matures, more robust solutions such as ETL pipelines, a data warehouse, or a data lake will be put in place.
    Activities at this level include:
    Building ETL pipelines
    Data cleaning
    Descriptive analytics
    Reporting/dashboards
    Exploratory data analysis
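To make the cleaning and descriptive steps above more concrete, here is a small sketch using Python's `statistics` module. The `raw_sales` values are invented for illustration, and dropping missing values is only one possible cleaning choice (imputing them is another).

```python
import statistics

# Hypothetical daily sales readings; None marks a missing value.
raw_sales = [120.0, 135.5, None, 128.0, 135.5, 142.0, None, 131.0]

# Data cleaning: drop missing values before analysing.
sales = [x for x in raw_sales if x is not None]

# Descriptive analytics: summarise what the cleaned data looks like.
summary = {
    "count": len(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
    "min": min(sales),
    "max": max(sales),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Numbers like these are usually what ends up in a first report or dashboard.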
     
    Generate insights from data – predictive, prescriptive, diagnostic, descriptive analytics
    Once data has been collected, stored, transformed, and analysed, we need to use it to generate insights that drive business decisions. Four types of data analytics can help create insights: descriptive, diagnostic, predictive and prescriptive.
    The organisation may incorporate predictive analytics, prescriptive analytics, and machine learning into its data science pipeline.
    Activities at this level include:
    Statistical analysis
    Descriptive analytics – Reporting and Dashboards
    Diagnostic analytics – anomaly detection
    Predictive analytics – Supervised and unsupervised ML
    Prescriptive analytics – AI & Machine learning
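As one example of the diagnostic analytics listed above, the sketch below flags anomalies with a simple z-score rule. The `daily_errors` data, the `find_anomalies` helper and the threshold of 2.0 are assumptions made for this illustration. Real anomaly detection often uses more robust methods.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Hypothetical count of logged errors per day; day 8 looks unusual.
daily_errors = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
anomalies = find_anomalies(daily_errors)
print(anomalies)
```

A flagged point is a prompt to ask why it happened, which is the question diagnostic analytics is meant to answer.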
     
    Automate the process
    Automation is a critical aspect of data science, since data scientists should spend most of their time solving business problems rather than repeating manual steps. We therefore need to automate our data science processes.
    When appropriately applied, data-driven AI can reduce our costs and increase our revenue. This type of AI sets the industry leaders apart from everyone else.
    Activities at this level include:
    ML pipeline development
    AutoML
    A/B testing and experimentation
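For the A/B testing activity above, one common way to check whether two variants really differ is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The function name `two_proportion_z_test` and the visitor and conversion counts are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: A converts 200 of 2000 visitors, B converts 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone, but the sample size and significance level should be agreed before the experiment starts.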

     
    Conclusion
    The Data Science Hierarchy of Needs shows what data science practitioners need at each stage and helps them understand and prioritise their projects. It outlines the phases a data science project goes through, from data collection to automation, with a focus on the needs of the team. More data scientists should be familiar with it, because it gives them a better understanding of their projects and helps them decide on the order of activities.
    See you next week. Thank you!