Sash explains the innovative transformer architecture, revolutionizing NLP by utilizing attention mechanisms over recurrent connections. The architecture's ability to access all past data at each step led to significant advancements in translation accuracy and scalability.