The discussion examines mixture-of-experts (MoE) models and corrects common misconceptions about how they are structured and how they work. An MoE is not a collection of distinct experts that each handle a specific domain; the architecture is more intricate and closer in spirit to matrix factorization. The discussion also introduces a novel merging hack: a creative, albeit impractical, way to combine multiple models within a mixture framework.
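To make the point about experts concrete, here is a minimal sketch, not taken from the discussion, of token-level top-k routing as it is commonly described for sparse MoE layers. All names (`moe_layer`, `router`, `experts`, `top_k`) and sizes are illustrative assumptions, and each expert is simplified to a single random linear map rather than a full feed-forward block.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_layer(token, router, experts, top_k=2):
    """Send one token vector through its top_k highest-scoring experts.

    Assumption: each expert is a plain square matrix, a stand-in for the
    feed-forward block a real model would use.
    """
    scores = matvec(router, token)  # one routing logit per expert
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])  # renormalise over the chosen experts
    out = [0.0] * len(token)
    for w, i in zip(weights, chosen):
        expert_out = matvec(experts[i], token)
        out = [o + w * v for o, v in zip(out, expert_out)]
    return out, chosen


# Hypothetical toy sizes and random weights, for illustration only.
rng = random.Random(0)
dim, n_experts = 4, 8
router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(n_experts)
]

tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
routes = [moe_layer(t, router, experts)[1] for t in tokens]
print(routes)  # expert indices picked for each token
```

The router scores each token vector independently, so nothing in this sketch ties an expert to a topic or domain; `routes` only records which expert indices each token happened to select.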