SDS 523: Open-Source Analytical Computing (pandas, Apache Arrow) — with Wes McKinney

Episode Highlights
Python Tools
Wes McKinney discusses the evolution of data tools and his journey with Python. He was initially drawn to Python while working at AQR, where he found it more efficient for data manipulation than Excel and SQL. This led him to create pandas, inspired by both the strengths and limitations of R. Jon Krohn highlights Wes's book, Python for Data Analysis, as a top reference for working with data in Python 1 2.
I was trying to capture some of the good things about R data frames and that way of working with data, but adding a bunch of additional features that R did not have.
---
Advanced Hardware
Wes emphasizes the integration of advanced hardware like GPUs in data processing. He explains how the RAPIDS team at NVIDIA demonstrated the effectiveness of GPUs for analytics, leading to significant performance and cost benefits. This integration is crucial for handling exponentially growing datasets efficiently 3 4.
We've got to accelerate, we've got to do more with less. Otherwise, the explosion in data volumes can become a real problem.
---
Digital Tools
Wes shares his preference for digital note-taking tools like the reMarkable tablet, which has replaced his paper notebooks. He also discusses the importance of tools like Jira and Notion for project management and collaboration in a globally distributed team. These tools help maintain a written culture and facilitate asynchronous collaboration 5 6.
Using tools like Notion has been really helpful in building that kind of knowledge curation, like written culture.
---
Related Episodes


SDS 675: Pandas for Data Analysis and Visualization — with Stefanie Molin

SDS 557: Effective Pandas — with Matt Harrison

SDS 765: NumPy, SciPy and the Economics of Open-Source — with Dr. Travis Oliphant

SDS 587: Data Engineering for Data Scientists — with Mark Freeman

SDS 535: How to Found, Grow, and Sell a Data Science Start-up — with Austin Ogilvie

SDS 433: Data Science Trends for 2021 — with Ben Taylor

SDS 567: Open-Access Publishing — with Amy Brand

SDS 595: Data Engineering 101 — with Joe Reis and Matt Housley

SDS 537: Data Science Trends for 2022 — with Sadie St. Lawrence

SDS 571: Collaborative, No-Code Machine Learning — with Tim Kraska

SDS 511: Data Science for Private Investing — LIVE with Drew Conway

SDS 493: Bringing Data to the People — with Anjali Shrivastava

SDS 575: Optimizing Computer Hardware with Deep Learning — with Magnus Ekman
