Published Nov 16, 2021

SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow) — with Wes McKinney

Wes McKinney, creator of the pandas library, reflects on his journey in open-source analytics, highlighting the evolution of pandas and Apache Arrow and their transformative impact on data science. Discover insights into open-source development, data processing with Python, and how community-driven projects enhance scalability and innovation.

Episode Highlights

  • Python Tools

    Wes McKinney discusses the evolution of data tools and his journey with Python. He was initially drawn to Python while working at AQR, where he found it more efficient for data manipulation than Excel and SQL. This led him to create pandas, inspired by both the strengths and limitations of R. Jon Krohn highlights Wes's book, Python for Data Analysis, as a top reference for working with data in Python 1 2.

    I was trying to capture some of the good things about R data frames and that way of working with data, but adding a bunch of additional features that R did not have.

  • Advanced Hardware

    Wes emphasizes the integration of advanced hardware like GPUs in data processing. He explains how the RAPIDS team at NVIDIA demonstrated the effectiveness of GPUs for analytics, leading to significant performance and cost benefits. This integration is crucial for handling exponentially growing data sets efficiently 3 4.

    We've got to accelerate, we've got to do more with less. Otherwise, the explosion in data volumes can become a real problem.

  • Digital Tools

    Wes shares his preference for digital note-taking tools like the reMarkable tablet, which has replaced his paper notebooks. He also discusses the importance of tools like Jira and Notion for project management and collaboration in a globally distributed team. These tools help maintain a written culture and facilitate asynchronous collaboration 5 6.

    Using tools like Notion has been really helpful in building that kind of knowledge curation, like written culture.

Related Episodes