Published Sep 3, 2024

815: DataFrame Operations 100x Faster than Pandas — with Marco Gorelli

Explore how Polars outperforms Pandas for lightning-fast DataFrame operations with insights from Marco Gorelli on open-source development, community contributions, and his experiences in data science and forecasting competitions.
Super Data Science: ML & AI Podcast with Jon Krohn

Episode Highlights

  • Community Role

    Community contributions play a crucial role in the development of open-source projects like Polars. Marco explains that while people are often more motivated to work on their own projects than to review other people's, commercial support for open-source work can benefit both the community and companies like Quansight Labs 1. This hybrid model allows maintainers to balance time between community projects and consulting work, ensuring the sustainability of critical infrastructure 2.

    People are often much more motivated to work on their own things than to review other people's. The other side, though, is that Quansight Labs is not a charity. They don't just, out of the goodness of their heart, give maintainers time to do things. It also helps the bottom line, because there are companies that then know Quansight as experts in open source.

    ---

  • Personal Journey

    Marco shares his personal journey in contributing to Polars, highlighting the challenges and rewards of open-source work. He notes that modern tools like GitHub facilitate coordination among hundreds of contributors, but people-related issues remain the hardest part 3. His initial contributions to Polars involved fixing time zone issues, which also helped him learn Rust 4.

    When I started contributing to Polars, I noticed that a lot of the time zone stuff just hadn't been done. The other maintainers just didn't find it very interesting or found it frustrating and all of that. And I was like, okay, well, I don't know Rust, maybe this can be a bit of a win-win situation. Like, I'll help you with your time zones, you help me with my Rust.

    ---

  • Diversity Issues

    Addressing the lack of diversity in open-source contributions, Marco discusses the low percentage of women in the field and the need for proactive steps to improve this 5. He emphasizes that while the pipeline problem is often cited, there are deeper societal issues at play. Mentorship and creating more paid roles can help bridge this gap 6.

    There's a lot of projects where it does feel a bit like an old boys' club, like in the way, maybe, that people use humour, the kinds of things that people might say or discuss. There's a lot of things that have historically been tolerated that probably shouldn't have.

    ---

  • Mentorship

    Marco highlights the importance of mentorship in sustaining contributions from underrepresented groups. He shares his experience with initiatives like Pandas sprints and emphasizes the need for active efforts to mentor and support new contributors 7. Creating more paid roles in open-source projects can also alleviate some of the barriers faced by women and other underrepresented groups 8.

    Unless you're actively going to set aside time and money towards mentoring people, it's very difficult. And this becomes doubly difficult in a project which has already been going on for like 15 years or something and which has historically been all male.

    ---

Related Episodes