Published Feb 3, 2020

Designing Data-Intensive Applications – Data Models: Query Languages

    Delve into the world of data models as the hosts discuss the power of declarative programming, the mechanics of MapReduce in NoSQL databases, and the flexibility of graph databases in managing complex relationships.
    Episode Highlights
    Coding Blocks logo

    Popular Clips

    Episode Highlights

    • MapReduce Basics

      MapReduce, popularized by Google, is a programming model designed for processing large datasets in a horizontally distributed manner. Joe Zack explains that it involves two main functions: map() and reduce(), which are pure functions that reshape and aggregate data, respectively 1. Initially, MapReduce was challenging for developers accustomed to SQL due to its different paradigm, but its concepts have become more familiar with the rise of functional programming languages 2.

      With MapReduce, you implement two methods: a map method and a reduce method. The map method reshapes data into a particular pattern, and the reduce method performs an aggregate summation on the values for each key that gets passed to it.

      --- Joe Zack

      Despite its initial complexity, MapReduce's ability to handle large-scale data processing efficiently has made it a valuable tool in data-intensive applications.

         

      MapReduce in NoSQL

      MapReduce is also utilized in NoSQL databases like MongoDB and CouchDB to perform read-only queries across many documents. However, Joe Zack notes that writing MapReduce functions can be more effort than writing a single SQL query due to the need for two interdependent functions 3. To address this, MongoDB introduced the aggregation pipeline, a declarative query language that wraps around MapReduce functionality, offering a more SQL-like experience 3.

      A purely declarative SQL query would better be able to take advantage of the optimizer, which your quasi-imperative MapReduce code isn't.

      --- Joe Zack

      Interestingly, the book warns that NoSQL systems like MongoDB might inadvertently reinvent SQL by adding such functionalities, highlighting the evolving nature of database technologies 4.

    Related Episodes