Roger & DJ — The Rise of Big Data and CA's COVID-19 Response

Episode Highlights
Hadoop Issues
The limitations of Hadoop became apparent as data storage needs evolved. One speaker recalls the challenges faced at eBay, where storing all user data was impractical, so 99.9% of it was erased [1]. This inefficiency prompted a shift towards Hadoop, though it wasn't without its issues. DJ highlights the need for more sophisticated data models during California's COVID-19 response, emphasizing the inadequacy of one-size-fits-all models [2].
Every time we want to do something interesting, we have to go to the lords of the data warehouse and ask permission.
---
The need for adaptable and scalable data solutions became evident, pushing the industry to seek alternatives.
Spark Shift
The transition from Hadoop to Spark marked a significant evolution in data processing. Roger explains that Hadoop's limitations as a write engine necessitated a shift to Spark, which offered better analytics support and in-memory processing [3]. This shift was further facilitated by Spark's integration with Python, which made it accessible to a broader range of developers. Roger also discusses the role of NoSQL databases, noting their utility but emphasizing the importance of structured data for effective analytics [4].
Spark was just better at that than Hadoop was.
---
The transition to Spark represented a move towards more efficient and user-friendly data processing tools.
Open Source
Open source communities have played a crucial role in advancing data infrastructure technologies. One speaker emphasizes the collaborative nature of these communities, where sharing skills and techniques leads to collective improvement [5]. Another adds that open source initiatives like Hadoop and Spark have democratized access to powerful tools, enabling widespread participation in data science [6].
The community owns this collectively.
---
This collaborative spirit has been instrumental in driving innovation and making advanced data technologies accessible to a global audience.
Related Episodes


Daphne Koller — Digital Biology and the Next Epoch of Science
Richard Socher — The Challenges of Making ML Work in the Real World
Sean and Greg — Biology and ML for Drug Discovery
Jeff Hammerbacher — From data science to biomedicine
Alyssa Simpson Rochwerger — Responsible ML in the Real World
Robert Nishihara — The State of Distributed Computing in ML
Dave Rogenmoser & Saad Ansari on Growing & Maintaining Jasper AI
Accelerating drug discovery with AI: Insights from Isomorphic Labs
Angela & Danielle — Designing ML Models for Millions of Consumer Robots
D. Sculley — Technical Debt, Trade-offs, and Kaggle
Cade Metz — The Stories Behind the Rise of AI
Vicki Boykis — Machine Learning Across Industries
The Power of AI in Search with You.com's Richard Socher
Johannes Otterbach — Unlocking ML for Traditional Companies