Published Nov 16, 2021

SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow) — with Wes McKinney

Wes McKinney, creator of the pandas library, reflects on his journey in open-source analytics, highlighting the evolution of pandas and Apache Arrow and their transformative impact on data science. Discover insights into open-source development, data processing with Python, and how community-driven projects enhance scalability and innovation.
Episode Highlights

  • Community

    The open-source community plays a crucial role in the development and evolution of projects like pandas and Apache Arrow. Wes McKinney highlights how these projects thrive through both digital and real-world interactions, such as conferences and community meetups 1. This dynamic allows for continuous growth and innovation, making tools like pandas essential in various systems, including Dask and Spark 1.

    Pandas has become this essential glue between different types of systems. It's used by over a half million other projects on GitHub.

    ---

    Additionally, McKinney discusses the relationship between Voltron Data and the open-source Apache Arrow project, emphasizing how commercial entities can accelerate the impact of open-source initiatives 2.

       

  • Challenges

    Open-source development faces several challenges, particularly around dependency management and project sustainability. McKinney expresses concerns about the growing complexity of dependency management in Python, highlighting issues with incompatible versions and the need for better solutions like Conda 3. He also discusses the importance of building a sustainable business model to support ongoing investments in open-source infrastructure 4.

    We need to build a sustainable business so that we can continue to make these kinds of investments in open-source infrastructure.

    ---

    These challenges underscore the need for a collaborative community and mindful engagement to ensure the long-term success of open-source projects.

       

  • Opportunities

    The open-source ecosystem offers numerous opportunities for both developers and organizations. McKinney notes the commercial potential of open-source projects like Apache Arrow, which can be adapted for various use cases, creating consulting and development opportunities 4. He also emphasizes the diverse ways developers can contribute, from Rust and Go development to JavaScript, highlighting the collaborative nature of these projects 5.

    There's so much interesting stuff going on. There's Go development, there's Rust development, there's JavaScript development. So many ways to be involved.

    ---

    This vibrant ecosystem not only drives innovation but also provides a platform for developers to make significant contributions to the computing world.

Related Episodes