Published Sep 3, 2019

SE-Radio Episode 282: Donny Nadolny on Debugging Distributed Systems

Donny Nadolny from PagerDuty shares expert insights into debugging distributed systems, discussing the intricacies of TCP connections, the challenges of maintaining Zookeeper clusters, and innovative techniques that emphasize persistence and cross-disciplinary collaboration for effective problem-solving.

Episode Highlights

  • Persistence

    Persistence is a crucial trait when debugging distributed systems. Donny Nadolny shares his experience with a complex issue at PagerDuty, where he spent days examining logs and code without visible progress. He emphasizes the importance of perseverance despite the lack of tangible results, stating, "I was making some progress in terms of gaining an understanding of Zookeeper and of TCP behavior and other things like that. But in terms of making real, tangible progress to figure out what was going on, it was just no progress at all."

    This persistence eventually led to breakthroughs, highlighting the need to keep exploring even when the investigation appears to have stalled.

       

  • Cross-disciplinary

    Debugging distributed systems often requires cross-disciplinary knowledge, as Nadolny discovered while tackling issues at PagerDuty. He worked through layers from Java to the Linux kernel and the TCP protocol, collaborating with colleagues who had expertise in different areas. Nadolny notes, "I had lots of ideas of things that it could be, or kind of the next step of something new that I should investigate, or some new bit of the code that I should look at, or things like that."

    This collaborative approach underscores the value of diverse expertise in solving complex problems: people who know different layers of the stack can build a more complete picture of the failure together.

       

  • Zookeeper Debugging

    The debugging of Zookeeper issues at PagerDuty illustrates the importance of innovative problem-solving techniques. Donny Nadolny describes the check the team added: "What we had added was an external health check that was just simple. It would just do a write to a key in Zookeeper and then it would try to read that one."

    This approach allowed the team to identify the root cause of the issue, demonstrating the effectiveness of creative strategies in debugging complex systems.
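
    The write-then-read health check described above can be sketched roughly as follows. This is only an illustration, not PagerDuty's actual code: the `KeyValueClient` interface, the `InMemoryClient` stand-in, the `check_health` function, and the `/healthcheck` path are all hypothetical names chosen for this example. A real Zookeeper client library would need a thin adapter to this interface, and a production check would presumably also need timeouts around each call.

```python
import uuid
from typing import Dict, Protocol, Tuple


class KeyValueClient(Protocol):
    """Hypothetical minimal interface; not the API of any real library."""

    def set(self, path: str, value: bytes) -> None: ...

    def get(self, path: str) -> bytes: ...


class InMemoryClient:
    """Stand-in for a real Zookeeper client so the sketch runs on its own."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def get(self, path: str) -> bytes:
        return self._data[path]


def check_health(client: KeyValueClient, path: str = "/healthcheck") -> Tuple[bool, str]:
    """Write a fresh token to `path`, read it back, and compare.

    Returns (healthy, message). Any exception from the client counts
    as unhealthy rather than propagating to the caller.
    """
    # A unique token per run, so an old value cannot pass as a fresh write.
    token = uuid.uuid4().hex.encode()
    try:
        client.set(path, token)
        observed = client.get(path)
    except Exception as exc:
        return False, f"error: {exc!r}"
    if observed != token:
        return False, "read did not return the value just written"
    return True, "ok"


if __name__ == "__main__":
    print(check_health(InMemoryClient()))
```

    Because the check goes through the same read and write path that ordinary clients use, it can report a cluster as unhealthy even when each server process still appears to be running.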
