What I Read: Reducing the High Cost of Training NLP Models
      
    https://www.asapp.com/blog/reducing-the-high-cost-of-training-nlp-models-with-sru/
Reducing the High Cost of Training NLP Models With SRU++
By Tao Lei, PhD
Research Leader and Scientist at ASAPP
“The Transformer architecture was proposed to accelerate model training in NLP…. A couple of interesting questions arise following the development of Transformer: Is attention all we need for modeling? If recurrence is not a compute bottleneck, can we find better architectures?”
