Published Jun 24, 2020

Hamel Husain — Building Machine Learning Tools

Hamel Husain, Staff Machine Learning Engineer at GitHub, delves into the trailblazing efforts of CodeSearchNet in semantic code search, the transformative use of GitHub Actions in CI/CD for machine learning, and the empowering role of AutoML in enhancing data scientists' workflows and analysis capabilities.

Episode Highlights

  • Project Goals

    Hamel Husain, a Staff Machine Learning Engineer at GitHub, spearheaded the CodeSearchNet project to improve code search using machine learning techniques. The project aims to create a representation of code, similar to natural language embeddings, to enhance search capabilities on GitHub. This approach addresses the limitations of traditional keyword search by enabling semantic search, which lets users find code based on concepts rather than specific syntax 1.

    What if you're trying to search for some kind of concept in a code? You know, is that possible?

    ---

    By leveraging ideas from natural language processing, the project explores the potential of machine learning to transform code search, making it more intuitive and efficient for developers.

       

    Challenge and Data

    The CodeSearchNet Challenge was structured to tackle the complexities of matching code with natural language descriptions, such as docstrings, to improve code discoverability. Husain explains that the challenge involved creating a large parallel corpus of code and natural language, which was then used to benchmark information retrieval tasks 2. This initiative was supported by a partnership with Weights & Biases, which provided transparency and detailed logs of the training process.

    The task isn't perfect, but we sort of did whatever we could in the time we had.

    ---

    The challenge also included real-world search queries to simulate practical applications, aiming to enhance the discoverability of code even when specific keywords are absent 3.
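
As a rough illustration of the embedding idea described above, and not the CodeSearchNet implementation, the sketch below ranks code snippets against a query by cosine similarity between vectors. Everything in it is an assumption made for illustration: the snippet names, the hand-written 3-dimensional vectors standing in for learned embeddings, and the choice of mean reciprocal rank (a common retrieval metric that the episode summary does not name) as the score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec, snippet_vecs):
    """Return snippet names sorted from most to least similar to the query."""
    return sorted(
        snippet_vecs,
        key=lambda name: cosine(query_vec, snippet_vecs[name]),
        reverse=True,
    )


def mean_reciprocal_rank(queries, snippet_vecs):
    """Average of 1/rank of the correct snippet over (query_vec, name) pairs."""
    total = 0.0
    for query_vec, correct_name in queries:
        ranking = rank_snippets(query_vec, snippet_vecs)
        total += 1.0 / (ranking.index(correct_name) + 1)
    return total / len(queries)


# Hypothetical embeddings: a real system would produce these with a trained
# model that maps both code and natural language into the same vector space.
SNIPPET_VECS = {
    "read_json_file": [0.9, 0.1, 0.0],
    "sort_list": [0.1, 0.9, 0.1],
    "send_http_request": [0.0, 0.2, 0.9],
}

# Hypothetical query embeddings paired with the snippet each should retrieve.
QUERIES = [
    ([0.8, 0.2, 0.1], "read_json_file"),     # e.g. "load settings from disk"
    ([0.2, 0.8, 0.0], "sort_list"),          # e.g. "order items ascending"
    ([0.1, 0.1, 0.9], "send_http_request"),  # e.g. "call a web API"
]

if __name__ == "__main__":
    print(rank_snippets(QUERIES[0][0], SNIPPET_VECS))
    print(mean_reciprocal_rank(QUERIES, SNIPPET_VECS))
```

The point of the sketch is that the query and the snippet never need to share a keyword: the match depends only on how close their vectors are, which is what allows search by concept rather than by exact syntax.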
