Published May 2, 2023

675: Pandas for Data Analysis and Visualization — with Stefanie Molin

Data scientist Stefanie Molin delves into sophisticated data wrangling techniques using Pandas, revealing the power of chaining operations and the efficiency of the assign method, while also offering insightful guidance on leveraging Pandas, Matplotlib, and Seaborn for effective data visualizations.
Super Data Science: ML & AI Podcast with Jon Krohn

Episode Highlights

  • Chaining Ops

    Chaining operations in Pandas can significantly streamline your data analysis workflow. Stefanie Molin emphasizes that chaining operations together not only saves effort but also results in cleaner, more readable code. It avoids the confusion of managing multiple intermediate DataFrames and makes the analytical pipeline easier to follow 1. Jon Krohn agrees, noting that chaining is akin to using the pipe operator in bash, which makes the code more intuitive and easier to review 1.

    The more you use pandas, the more you will see that just chaining the operations together is going to save you a lot of effort and just make a lot cleaner code.

    ---

    This approach is particularly useful for creating plots quickly with Pandas, while libraries like Matplotlib and Seaborn offer more flexibility and aesthetic options 2.
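
As a rough illustration of the chaining style discussed above, here is a minimal sketch. The `sales` data and its column names are made up for this example and do not come from the episode.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "units": [10, 3, 7, 12, 0],
    }
)

# One chain instead of several intermediate DataFrames;
# each step feeds the next, much like a pipe in bash
units_by_region = (
    sales
    .query("units > 0")             # keep only rows with at least one unit sold
    .groupby("region")["units"]     # group by region and select the units column
    .sum()                          # total units per region
    .sort_values(ascending=False)   # largest total first
)

print(units_by_region)
```

Because the result is a Series, appending `.plot(kind="bar")` to the end of the chain would draw a quick bar chart, assuming Matplotlib is installed.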

       

  • Assign Method

    The assign method in Pandas plays a crucial role in chaining operations, allowing for the efficient creation of new variables and columns. Molin explains that this method lets users create or overwrite multiple columns in a single call, streamlining the data manipulation process 1. This technique is particularly useful when preparing data for visualization, as it allows new variables to be integrated seamlessly into the workflow.

    When you're chaining, let's say you want to create a new variable that's based on some other columns in there, and you want to create a new column that's just part of the flow.

    ---

    The discussion also highlights that this method is not only efficient but also makes the code more readable and easier to maintain 1.
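
To make the role of `assign` in a chain more concrete, here is a small sketch. The `orders` data, the column names, and the thresholds are assumptions chosen for illustration, not examples from the episode.

```python
import pandas as pd

# Hypothetical order data, invented purely for illustration
orders = pd.DataFrame({"units": [10, 3, 7], "unit_price": [2.5, 4.0, 2.5]})

result = (
    orders
    .assign(
        # overwrite an existing column: enforce a minimum of 5 units
        units=lambda df: df["units"].clip(lower=5),
        # create a new column; it sees the updated units from the line above
        revenue=lambda df: df["units"] * df["unit_price"],
    )
    .query("revenue >= 20")  # the new column is usable later in the same chain
)

print(result)
```

`assign` returns a new DataFrame, so `orders` itself is left unchanged. Passing lambdas means each new column is computed from the DataFrame as it exists at that point in the chain.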

       

  • Data Morph

    Molin introduces her Data Morph library, which transforms 2D scatter plots into various shapes without altering key statistical properties. This innovative tool is particularly useful for creating engaging visualizations in presentations 2. She shares her journey of developing the library, including the challenges of refactoring code and ensuring it works for any dataset 3.

    I spent, like, the last three months or so building this, but it's been a really insightful experience for me, also on the software engineering side.

    ---

    This project has enriched her understanding of software engineering practices and has had a positive impact on her professional work at Bloomberg 3.
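
The idea of moving points around while keeping key statistics fixed can be sketched in a few lines. This is not Data Morph's API or its actual algorithm; it is a simplified illustration, and the function names, the choice of statistics (means, standard deviations, correlation), and the two-decimal rounding are all assumptions made for this example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def summary(points, ndigits=2):
    """Means, standard deviations, and correlation, rounded to ndigits."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    stats = (
        statistics.mean(xs),
        statistics.mean(ys),
        statistics.stdev(xs),
        statistics.stdev(ys),
        pearson(xs, ys),
    )
    return tuple(round(value, ndigits) for value in stats)


def nudge_points(points, steps=2000, scale=0.05, seed=0):
    """Randomly nudge one point at a time, keeping only the moves
    that leave the rounded summary statistics unchanged."""
    rng = random.Random(seed)
    target = summary(points)
    current = list(points)
    for _ in range(steps):
        i = rng.randrange(len(current))
        x, y = current[i]
        candidate = current.copy()
        candidate[i] = (x + rng.gauss(0, scale), y + rng.gauss(0, scale))
        if summary(candidate) == target:
            current = candidate
    return current


# Hypothetical starting scatter plot: 50 uniformly random points
data_rng = random.Random(1)
points = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(50)]
moved = nudge_points(points)

print(summary(points))
print(summary(moved))
```

A tool that morphs data into a specific shape would also need some way to steer points toward that shape; this sketch only demonstrates the constraint that the rounded statistics stay the same.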
