Published Sep 3, 2019

Episode 193: Apache Mahout

Grant Ingersoll, founder of the Apache Mahout project, discusses machine learning: classification, recommendation, and clustering techniques, and applications such as fraud detection and natural language processing, all made practical by large volumes of data and the computing power to process them.
Software Engineering Radio - the podcast for professional software developers

Episode Highlights

  • Fraud Detection

    Machine learning has become a cornerstone of fraud detection and analytics, giving businesses a tool to combat fraudulent activity. Ingersoll explains that the growth in machine learning is driven by the availability of vast amounts of data and the computational power to process it, which lets companies better understand user behavior and refine their systems to prevent fraud. He notes, "Fraud analytics that's been using machine learning techniques for quite a number of years...generally speaking, they do pretty well there" 1. Models also have to adapt: they must be periodically reevaluated to stay effective against evolving threats 2.

       

  • Recommendations

    Recommendation systems use machine learning to improve the user experience by suggesting relevant content or products. Ingersoll describes collaborative filtering as a key technique, in which recommendations are based on either user-to-user or item-to-item similarity. He explains, "Collaborative filtering is more of just simply mechanism...if everybody is buying some particular book and you're similar...then we should recommend that book to you" 3. These systems are flexible in how likeness is measured: different measures, such as Euclidean distance or cosine similarity, can be used to determine how alike two users are 4.
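The user-based variant of collaborative filtering can be sketched in a few lines of plain Python. This is a minimal illustration, not Mahout's API or anything shown in the episode: the `ratings` data, the user and book names, and the `cosine_similarity` and `recommend` helpers are all hypothetical, and scoring by a similarity-weighted sum of ratings is only one of several reasonable choices.

```python
from math import sqrt

# Hypothetical ratings, invented for illustration: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_c": 2.0, "book_e": 5.0},
}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


print(recommend("alice", ratings))
```

Here `bob` has rated the same books as `alice` in much the same way, so the book only he has rated (`book_d`) ranks ahead of `book_e`, which comes from the less similar `carol`. Swapping `cosine_similarity` for a function based on Euclidean distance would change the ranking logic without touching `recommend`.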

       

  • NLP

    Natural language processing (NLP) is another area where machine learning excels, particularly in text categorization and understanding. Ingersoll highlights the use of open-source tools like Mahout and Lucene to solve NLP problems. He mentions his book, "Taming Text," which serves as an engineer's introduction to NLP and machine learning 5. Much of machine learning in NLP comes down to organizing text into a consumable form, such as classifying news articles into categories like sports or politics 6.
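To make the news-categorization example concrete, here is a small sketch of a text classifier in plain Python. The summary does not say which algorithm is used, so the choice of multinomial naive Bayes with add-one smoothing is an assumption made for illustration; the training headlines and the `train` and `classify` helpers are hypothetical, and a real system would use far more data and a proper tokenizer.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical, hand-made training headlines: (label, text).
training = [
    ("sports", "team wins the championship game"),
    ("sports", "star player scores late goal"),
    ("politics", "senate passes the budget bill"),
    ("politics", "president signs new election law"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for label, text in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training)
print(classify("player scores in the game", model))  # prints: sports
```

The classifier only counts words, so it captures the idea of sorting articles into categories without any of the linguistic processing a production NLP pipeline would add.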
