IEEE (Style) Citations
Deep Learning
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need”, in Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
[2] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization”, arXiv preprint arXiv:1711.05101, [Submitted on 14 Nov 2017 (v1), last revised 4 Jan 2019 (this version, v3)].
[3] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, arXiv preprint arXiv:1412.6980, [Submitted on 22 Dec 2014 (v1), last revised 30 Jan 2017 (this version, v9)].
[4] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the Variance of the Adaptive Learning Rate and Beyond”, arXiv preprint arXiv:1908.03265, [Submitted on 8 Aug 2019 (v1), last revised 26 Oct 2021 (this version, v4)].
[5] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, “Dive Into Deep Learning”, Cambridge University Press, 2023.
[6] D. Jurafsky and J. H. Martin, “Speech and Language Processing”, 3rd ed., Pearson, 2023.
[7] I. Loshchilov and F. Hutter, “SGDR: Stochastic Gradient Descent with Restarts”, CoRR, vol. abs/1608.03983, 2016.
[8] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the Variance of the Adaptive Learning Rate and Beyond”, arXiv preprint arXiv:1908.03265, [Submitted on 8 Aug 2019 (v1), last revised 26 Oct 2021 (this version, v4)].
[9] D. A. Roberts, S. Yaida, and B. Hanin, “The Principles of Deep Learning Theory”, arXiv preprint arXiv:2106.10165, [Submitted on 18 Jun 2021 (v1), last revised 24 Aug 2021 (this version, v2)].
[10] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning”, MIT Press, 2016.
[11] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
[12] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training”, OpenAI, Tech. Rep., 2018.
[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners”, OpenAI, Tech. Rep., 2019.
Computer Science
[1] Z. Luo, S. Soloviev, and T. Xue, “Coercive subtyping: Theory and implementation”, Information and Computation, vol. 223, pp. 18–42, Feb. 2013. doi:10.1016/j.ic.2012.10.020
[2] C. Muñoz, “Type Theory and Its Applications to Computer Science”, National Institute of Aerospace, Hampton, VA, Tech. Rep., Apr. 10, 2007.