Bibliography
A. Jung. Machine Learning: The Basics. Springer, Singapore, 2022.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. 2017. arXiv:1706.03762.
Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.
Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. The natural language decathlon: multitask learning as question answering. 2018. arXiv:1806.08730.
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, and Jakob Uszkoreit. One model to learn them all. 2017. arXiv:1706.05137.
Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. 2017. arXiv:1703.03400.
Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. Cambridge University Press, 2023. URL: https://D2L.ai.
Minhyeok Lee. A mathematical interpretation of autoregressive generative pre-trained transformer and self-supervised learning. Mathematics, 11(11):2451, 2023. URL: https://www.mdpi.com/2227-7390/11/11/2451, doi:10.3390/math11112451.
Phillip Lippe. UvA Deep Learning Tutorials. https://uvadlc-notebooks.readthedocs.io/en/latest/, 2023.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. 2014. arXiv:1409.0473.
Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. 2014. URL: https://arxiv.org/abs/1412.6980, arXiv:1412.6980.
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. 2017. URL: https://arxiv.org/abs/1711.05101, arXiv:1711.05101.
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. 2019. URL: https://arxiv.org/abs/1908.03265, arXiv:1908.03265.
Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with warm restarts. 2016. URL: http://arxiv.org/abs/1608.03983, arXiv:1608.03983.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: low-rank adaptation of large language models. 2021. URL: https://arxiv.org/abs/2106.09685, arXiv:2106.09685.
Hal Daumé III. A Course in Machine Learning. 2017.
Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. MIT Press, 2022. URL: probml.ai.
Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st edition, 2007. ISBN 0387310738. URL: http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738.
Stanley H. Chan. Introduction to Probability for Data Science. Michigan Publishing, 2021.
Alexander Jung. Machine Learning: The Basics. Springer Nature Singapore, 2023.
Marc Peter Deisenroth, Cheng Soon Ong, and Aldo A. Faisal. Mathematics for Machine Learning. Cambridge University Press, 2021.
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. The MIT Press, 2018.
Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. Springer, 2022.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Hossein Pishro-Nik. Introduction to Probability, Statistics, and Random Processes. Kappa Research LLC, 2014. URL: https://www.probabilitycourse.com/.
John M. Shea. Foundations of Data Science with Python. 2021. URL: https://jmshea.github.io/Foundations-of-Data-Science-with-Python/intro.html.
Empirical histogram and PMF. Cross Validated (Stack Exchange), Oct 2022. URL: https://stats.stackexchange.com/questions/590792/empirical-histogram-and-pmf.
Tony Yiu. Fun with the binomial distribution. Towards Data Science, Jul 2019. URL: https://towardsdatascience.com/fun-with-the-binomial-distribution-96a5ecabf65b.
Wikipedia. Poisson distribution. Oct 2022. URL: https://en.wikipedia.org/wiki/Poisson_distribution.
Jeremy Orloff and Jonathan Bloom. Reading 5b: continuous random variables. MIT OpenCourseWare, 18.05 Introduction to Probability and Statistics, Spring 2014. URL: https://ocw.mit.edu/courses/18-05-introduction-to-probability-and-statistics-spring-2014/1f88c7c765d2532fd57d8ee719a751b3_MIT18_05S14_Reading5b.pdf (accessed on 10/28/2022).
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 4th edition, 2022.
Runestone Interactive. Problem Solving with Algorithms and Data Structures Using Python. 2023. URL: https://runestone.academy/ns/books/published/pythonds3/index.html.
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning. Cambridge University Press, 2020. URL: https://mml-book.github.io/.
Joseph Muscat. Functional Analysis: An Introduction to Metric Spaces, Hilbert Spaces, and Banach Algebras. Springer, 2014.