Published Sep 3, 2019

SE-Radio Episode 315: Jeroen Janssens on Tools for Data Science

Dive into the dynamic world of data science with Jeroen Janssens as he and Felienne Hermans discuss the transformative power of open source, the nuances of educational pathways, the pros and cons of Python and R, and the importance of machine learning in data-driven insights.

Popular Clips

Episode Highlights

  • Statistical Methods

    Jeroen Janssens emphasizes that statistics is a crucial component of data science but not the whole of it. He explains that data science covers a broader range of activities, including obtaining, cleaning, and visualizing data, which must happen before statistical methods can be applied 1. Jeroen refers to the OSEMN model (pronounced "awesome") by Hilary Mason and Chris Wiggins, which outlines five steps: obtaining, scrubbing, exploring, modeling, and interpreting data 1.

    Data science basically comes down to obtaining data. Without any data, there's little data science that you can do.

    ---

    These steps highlight the multifaceted nature of data science, where statistics plays a role but is complemented by other processes.
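To make the five steps described above concrete, here is a minimal sketch of them as a small Python pipeline. It is not from the episode: the function names, the in-memory CSV, and the study-hours data are all hypothetical, and a hand-written least-squares fit stands in for whatever modeling tool would be used in practice.

```python
import csv
import io
import statistics

# Hypothetical raw data: hours studied vs. exam score, with one incomplete row.
RAW_CSV = """hours,score
1,52
2,54
3,
4,58
5,60
"""


def obtain(text):
    """Obtain: read rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values and convert strings to floats."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def explore(pairs):
    """Explore: compute simple summary statistics."""
    scores = [y for _, y in pairs]
    return {"n": len(pairs), "mean_score": statistics.mean(scores)}


def model(pairs):
    """Model: fit score = slope * hours + intercept by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def interpret(slope, intercept):
    """Interpret: turn the fitted numbers into a plain-language statement."""
    return (
        f"Each extra hour studied is associated with about {slope:.1f} more "
        f"points, starting from a baseline of about {intercept:.0f}."
    )


pairs = scrub(obtain(RAW_CSV))
print(explore(pairs))
print(interpret(*model(pairs)))
```

Note that only one of the five functions involves a statistical method. The rest is about getting the data and making sense of the result.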

       

  • Testing Methods

    Testing in data science takes several forms, including dividing data into training and test sets to evaluate model performance. Jeroen explains that while this type of testing is common, unit testing is less frequently practiced among data scientists 2. He notes that unit testing, like documentation, is often overlooked despite its importance in ensuring algorithm accuracy 2.

    It's one of those things that data scientists are not really fond of. It's kind of like a documentation thing. It should be done more often.

    ---

    This highlights a gap in data science practices, suggesting a need for more rigorous testing methodologies.
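The two kinds of testing mentioned above can be combined in one short example: a train/test split helper, plus unit tests that check the helper itself. This is only a sketch. `train_test_split` here is a hypothetical standard-library function written for illustration, not the scikit-learn function of the same name and not anything shown in the episode.

```python
import random
import unittest


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class TrainTestSplitTest(unittest.TestCase):
    def test_sizes_follow_the_requested_fraction(self):
        train, test = train_test_split(range(20), test_fraction=0.25)
        self.assertEqual(len(train), 15)
        self.assertEqual(len(test), 5)

    def test_no_row_is_lost_or_shared(self):
        train, test = train_test_split(range(20))
        self.assertEqual(set(train) & set(test), set())
        self.assertEqual(sorted(train + test), list(range(20)))

    def test_same_seed_gives_same_split(self):
        self.assertEqual(
            train_test_split(range(20), seed=1),
            train_test_split(range(20), seed=1),
        )

    def test_invalid_fraction_is_rejected(self):
        with self.assertRaises(ValueError):
            train_test_split(range(20), test_fraction=1.5)
```

Saved to a file, the tests can be run with `python -m unittest <file>`. The second test guards against a row appearing in both sets, a leakage bug that would quietly inflate a model's measured performance.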

       

  • Machine Learning

    Machine learning is a significant aspect of data science, providing tools for pattern recognition and prediction. Jeroen describes data science as extracting value from data, which can take the form of an analysis or of integration into products such as recommender systems 3. He underscores the versatility of machine learning in improving data-driven decision-making 4.

    If you ask a dozen people what data science means, you probably get back 13 different answers.

    ---

    This reflects the diverse applications and interpretations of data science, with machine learning being a pivotal component.
