Bibliography

[1]

A. Jung. Machine Learning: The Basics. Springer, Singapore, 2022.

[2]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. 2017. arXiv:1706.03762.

[3]

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.

[4]

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.

[5]

Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. The natural language decathlon: multitask learning as question answering. 2018. arXiv:1806.08730.

[6]

Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, and Jakob Uszkoreit. One model to learn them all. 2017. arXiv:1706.05137.

[7]

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. 2017. arXiv:1703.03400.

[8]

Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. Cambridge University Press, 2023. URL: https://D2L.ai.

[9]

Minhyeok Lee. A mathematical interpretation of autoregressive generative pre-trained transformer and self-supervised learning. Mathematics, 2023. URL: https://www.mdpi.com/2227-7390/11/11/2451, doi:10.3390/math11112451.

[10]

Phillip Lippe. UvA Deep Learning Tutorials. 2023. URL: https://uvadlc-notebooks.readthedocs.io/en/latest/.

[11]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. 2014. arXiv:1409.0473.

[12]

Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. URL: https://arxiv.org/abs/1412.6980.

[13]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. URL: https://arxiv.org/abs/1711.05101.

[14]

Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265, 2019. URL: https://arxiv.org/abs/1908.03265.

[15]

Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with restarts. CoRR, 2016. URL: http://arxiv.org/abs/1608.03983, arXiv:1608.03983.

[16]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: low-rank adaptation of large language models. 2021. URL: https://arxiv.org/abs/2106.09685, arXiv:2106.09685.

[17]

Hal Daumé III. A course in machine learning. 2017.

[18]

Kevin P. Murphy. Probabilistic Machine Learning: An introduction. MIT Press, 2022. URL: https://probml.ai.

[19]

Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st edition, 2007. ISBN 0387310738.

[20]

Stanley H. Chan. Introduction to probability for Data Science. Michigan Publishing, 2021.

[21]

Alexander Jung. Machine learning: The basics. Springer Nature Singapore, 2023.

[22]

Marc Peter Deisenroth, Cheng Soon Ong, and Aldo A. Faisal. Mathematics for Machine Learning. Cambridge University Press, 2021.

[23]

Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. The MIT Press, 2018.

[24]

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning: With applications in R. Springer, 2022.

[25]

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. URL: http://www.deeplearningbook.org.

[26]

Hossein Pishro-Nik. Introduction to probability, statistics, and Random Processes. Kappa Research LLC, 2014. URL: https://www.probabilitycourse.com/.

[27]

John M. Shea. Foundations of data science with python. 2021. URL: https://jmshea.github.io/Foundations-of-Data-Science-with-Python/intro.html.

[28]

Empirical histogram and pmf. Cross Validated (Stack Exchange), Oct 2022. URL: https://stats.stackexchange.com/questions/590792/empirical-histogram-and-pmf.

[29]

Tony Yiu. Fun with the binomial distribution. Towards Data Science, Jul 2019. URL: https://towardsdatascience.com/fun-with-the-binomial-distribution-96a5ecabf65b.

[30]

Wikipedia. Poisson distribution. Oct 2022. URL: https://en.wikipedia.org/wiki/Poisson_distribution.

[32]

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 4th edition, 2022.

[33]

Runestone Interactive. Problem solving with algorithms and data structures using python. 2023. URL: https://runestone.academy/ns/books/published/pythonds3/index.html.

[34]

Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning. Cambridge University Press, 2020. URL: https://mml-book.github.io/.

[35]

Joseph Muscat. Functional Analysis: An Introduction to Metric Spaces, Hilbert Spaces, and Banach Algebras. Springer, 2014.