675: Pandas for Data Analysis and Visualization — with Stefanie Molin

Episode Highlights
Chaining Ops
Chaining operations in Pandas can significantly streamline a data analysis workflow. Stefanie Molin emphasizes that chaining operations together saves effort and produces cleaner, more readable code. It avoids the confusion of managing multiple intermediate DataFrames and makes the analytical pipeline easier to follow 1. The host agrees, likening chaining to the pipe operator in bash, which makes the code more intuitive and easier to review 1.
The more you use pandas, the more you will see that just chaining the operations together is going to save you a lot of effort and just make a lot cleaner code.
---
Chaining is also a quick way to create plots directly from Pandas, whereas libraries such as Matplotlib and Seaborn offer more flexibility and aesthetic control 2.
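A minimal sketch of the idea, using made-up sales data. The same pipeline is written first with intermediate DataFrames and then as a single chain. The data and the `top_n` helper are hypothetical. `Series.pipe` hands the current object to an ordinary function, which is the closest Pandas analogue to the bash pipe comparison.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "east"],
        "units": [10, 4, 7, 3, 12, 5],
        "price": [2.5, 3.0, 2.5, 4.0, 3.0, 4.0],
    }
)


def top_n(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: keep the first n values of an already sorted Series."""
    return s.head(n)


# Step-by-step style: every step leaves an intermediate object behind.
filtered = sales[sales["units"] >= 5]
with_revenue = filtered.assign(revenue=filtered["units"] * filtered["price"])
by_region = with_revenue.groupby("region")["revenue"].sum()
result_stepwise = top_n(by_region.sort_values(ascending=False), n=2)

# Chained style: the same pipeline as one expression, read top to bottom.
result_chained = (
    sales
    .query("units >= 5")                                     # filter rows
    .assign(revenue=lambda df: df["units"] * df["price"])    # derive a column in the flow
    .groupby("region")["revenue"]
    .sum()
    .sort_values(ascending=False)
    .pipe(top_n, n=2)                                        # pass the result to a plain function
)
```

Appending `.plot(kind="bar")` to the chain would chart the result directly. That call uses Matplotlib as the default plotting backend, so Matplotlib must be installed.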
Assign Method
The assign method plays a central role in chaining because it adds columns as part of the flow: it returns a new DataFrame rather than modifying the original in place. Stefanie Molin explains that a single assign call can create or overwrite multiple columns, which streamlines data manipulation 1. This is particularly useful when preparing data for visualization, since derived variables can be added without breaking the chain.
When you're chaining, let's say you want to create a new variable that's based on some other columns in there, and you want to create a new column that's just part of the flow.
---
The discussion also highlights that assign is not only efficient but also makes the code more readable and easier to maintain 1.
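A minimal sketch of assign in a chain, using hypothetical temperature data. It creates two columns and overwrites a third in one call. Passing lambdas lets each new column be computed from the DataFrame as it exists at that point in the chain, and later keyword arguments can refer to columns created earlier in the same call.

```python
import pandas as pd

# Hypothetical readings, invented for illustration.
readings = pd.DataFrame(
    {
        "city": ["Oslo", "Cairo", "Lima"],
        "temp_c": [-3.0, 31.0, 18.0],
    }
)

result = readings.assign(
    # New column derived from an existing one.
    temp_f=lambda df: df["temp_c"] * 9 / 5 + 32,
    # Can use temp_f, which was created earlier in this same call.
    is_hot=lambda df: df["temp_f"] > 80,
    # Reusing an existing name overwrites that column in the returned copy.
    city=lambda df: df["city"].str.upper(),
)
```

Because assign returns a new object, `readings` itself is left untouched, which is what makes it safe to drop into the middle of a chain.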
Data Morph
Stefanie Molin introduces her Data Morph library, which transforms the points of a 2D scatter plot into various shapes while preserving the data's key summary statistics. This makes it particularly useful for creating engaging visualizations for presentations 2. She describes her journey of developing the library, including the challenges of refactoring the code and making it work for any input dataset 3.
I spent, like, the last three months or so building this, but it's been a really insightful experience for me, also on the software engineering side.
---
This project has enriched her understanding of software engineering practices and has had a positive impact on her professional work at Bloomberg 3.
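Data Morph's own API is not shown here. As a rough illustration of the invariant such a tool maintains, the sketch below nudges random points and keeps a move only if the rounded summary statistics are unchanged. The function names, the choice of statistics (means, sample standard deviations, correlation), the step size, and the two-decimal rule are all assumptions for this example. A real morphing tool would also steer points toward a target shape (for example with simulated annealing), which this sketch deliberately leaves out.

```python
import numpy as np


def summary_stats(points: np.ndarray, decimals: int = 2) -> tuple:
    """Means, sample standard deviations and x-y correlation, rounded.

    Which statistics to hold fixed, and at how many decimals, is an
    assumption made for this sketch.
    """
    x, y = points[:, 0], points[:, 1]
    stats = [x.mean(), y.mean(), x.std(ddof=1), y.std(ddof=1), np.corrcoef(x, y)[0, 1]]
    return tuple(np.round(stats, decimals))


def perturb_preserving_stats(
    points: np.ndarray,
    iterations: int = 2000,
    scale: float = 0.05,
    decimals: int = 2,
    seed: int = 0,
) -> tuple[np.ndarray, int]:
    """Hypothetical helper: randomly nudge single points, keeping only
    moves that leave the rounded summary statistics unchanged.

    Nothing here pulls points toward a target shape; it only
    demonstrates the "same stats" constraint.
    """
    rng = np.random.default_rng(seed)
    target = summary_stats(points, decimals)
    current = points.copy()  # never modify the caller's array
    accepted = 0
    for _ in range(iterations):
        candidate = current.copy()
        i = rng.integers(len(candidate))                # pick one point at random
        candidate[i] += rng.normal(0.0, scale, size=2)  # small random step in x and y
        if summary_stats(candidate, decimals) == target:
            current = candidate                         # accept: stats still match
            accepted += 1
    return current, accepted


# Usage: 100 random points drawn for illustration only.
original = np.random.default_rng(1).normal(size=(100, 2))
morphed, accepted = perturb_preserving_stats(original)
```

The points in `morphed` have moved, yet `summary_stats(morphed)` matches `summary_stats(original)`, which is the property that makes the resulting plots look different while the numbers stay the same.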
Related Episodes
SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow) — with Wes McKinney
SDS 557: Effective Pandas — with Matt Harrison
629: Software for Efficient Data Science — with Jodie Burchell
SDS 563: How to Rock at Data Science — with @TinaHuang1
815: DataFrame Operations 100x Faster than Pandas — with Marco Gorelli
SDS 433: Data Science Trends for 2021 — with Ben Taylor
765: NumPy, SciPy and the Economics of Open-Source — with Dr. Travis Oliphant
SDS 467: High-Impact Data Science Made Easy — with Noah Gift
SDS 489: Monetizing Machine Learning — with Vin Vashishta
SDS 599: MLOps: Machine Learning Operations — with @Miki_ML
SDS 493: Bringing Data to the People — with Anjali Shrivastava
732: Data Science for Astronomy — with Dr. Daniela Huppenkothen
SDS 587: Data Engineering for Data Scientists — with Mark Freeman