IEEE (Style) Citations
Deep Learning
[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need”, in Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
[2] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization”, arXiv preprint arXiv:1711.05101, submitted Nov. 14, 2017, last revised Jan. 4, 2019 (v3).
[3] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, arXiv preprint arXiv:1412.6980, submitted Dec. 22, 2014, last revised Jan. 30, 2017 (v9).
[4] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the Variance of the Adaptive Learning Rate and Beyond”, arXiv preprint arXiv:1908.03265, submitted Aug. 8, 2019, last revised Oct. 26, 2021 (v4).
[5] A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, “Dive Into Deep Learning”, Cambridge University Press, 2023.
[6] D. Jurafsky and J. H. Martin, “Speech and Language Processing”, 3rd ed., Pearson, 2023.
[7] I. Loshchilov and F. Hutter, “SGDR: Stochastic Gradient Descent with Restarts”, CoRR, vol. abs/1608.03983, 2016.
[8] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the Variance of the Adaptive Learning Rate and Beyond”, arXiv preprint arXiv:1908.03265, submitted Aug. 8, 2019, last revised Oct. 26, 2021 (v4).
[9] D. A. Roberts, S. Yaida, and B. Hanin, “The Principles of Deep Learning Theory”, arXiv preprint arXiv:2106.10165, submitted Jun. 18, 2021, last revised Aug. 24, 2021 (v2).
[10] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning”, MIT Press, 2016.
[11] C. M. Bishop, “Pattern Recognition and Machine Learning”, New York: Springer, 2006.
[12] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training”, OpenAI, 2018.
[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners”, OpenAI, 2019.
Computer Science
[1] Z. Luo, S. Soloviev, and T. Xue, “Coercive subtyping: Theory and implementation”, Information and Computation, vol. 223, pp. 18–42, Feb. 2013. doi:10.1016/j.ic.2012.10.020
[2] C. Muñoz, “Type Theory and Its Applications to Computer Science”, National Institute of Aerospace, Hampton, VA, Tech. Rep., Apr. 10, 2007.