815: DataFrame Operations 100x Faster than Pandas — with Marco Gorelli

Episode Highlights
Community Role
Community contributions play a crucial role in the development of open-source projects like Polars. Marco explains that while people are often more motivated to work on their own projects than to review others' work, commercial support for open-source work can benefit both the community and companies like Quansight Labs [1]. This hybrid model lets maintainers balance their time between community projects and consulting work, which helps sustain critical infrastructure [2].
People are often much more motivated to work on their own things than to review other people's. The other side, though, is that Quansight Labs is not a charity. They don't just, out of the goodness of their heart, give maintainers time to do things. It also helps the bottom line, because there are companies that then know Quansight as experts in open source.
---
Personal Journey
Marco shares his personal journey in contributing to Polars, highlighting the challenges and rewards of open-source work. He notes that modern tools like GitHub make it possible to coordinate hundreds of contributors, but people-related issues remain the hardest part [3]. His initial contributions to Polars involved fixing time zone issues, which also helped him learn Rust [4].
When I started contributing to Polars, I noticed that a lot of the time zone stuff just hadn't been done. The other maintainers just didn't find it very interesting, or found it frustrating and all of that. And I was like, okay, well, I don't know Rust, maybe this can be a bit of a win-win situation. Like, I'll help you with your time zones, you help me with my Rust.
---
Diversity Issues
Addressing the lack of diversity in open-source contributions, Marco discusses the low percentage of women in the field and the need for proactive steps to improve it [5]. He emphasizes that while the pipeline problem is often cited, there are deeper societal issues at play. Mentorship and creating more paid roles can help bridge this gap [6].
There are a lot of projects where it does feel a bit like an old boys' club, in the way that people use humour, the kinds of things that people might say or discuss. There are a lot of things that have historically been tolerated that probably shouldn't have been.
---
Mentorship
Marco highlights the importance of mentorship in sustaining contributions from underrepresented groups. He shares his experience with initiatives like Pandas sprints and emphasizes the need for active efforts to mentor and support new contributors [7]. Creating more paid roles in open-source projects can also remove some of the barriers faced by women and other underrepresented groups [8].
Unless you're actively going to set aside time and money towards mentoring people, it's very difficult. And this becomes doubly difficult in a project which has already been going on for like 15 years or something and which has historically been all male.
---
Related Episodes

827: Polars: Past, Present and Future — with Polars Creator Ritchie Vink
675: Pandas for Data Analysis and Visualization — with Stefanie Molin
826: In Case You Missed It in September 2024 — with Jon Krohn (@JonKrohnLearns)
SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow) — with Wes McKinney
669: Streaming, reactive, real-time machine learning — with Adrian Kosowski
SDS 557: Effective Pandas — with Matt Harrison
629: Software for Efficient Data Science — with Jodie Burchell
803: How to Thrive in Your (Data Science) Career — with Daliana Liu
653: Efficiently Glean-ing Insights from Vast Data Warehouses — with Carlos Aguilar
780: How to Become a Data Scientist — with Dr. Adam Ross Nelson
817: The Positron IDE, Tidy NLP and MLOps — with Dr. Julia Silge
SDS 429: 2020's Biggest Data Science Breakthroughs — with Jon Krohn
SDS 467: High-Impact Data Science Made Easy — with Noah Gift