Skip to content

simonespa/machine-learning-playground

Repository files navigation

Machine Learning Playground

This is a personal playground repository to practice machine learning and data science techniques and algorithms using the most popular Python libraries for the job. I use this repo to collect and organise notes for a quick reference of things I learned during my studies.

The notebooks are written using Jupyter on my local machine. You can also use Kaggle or Google Colab to edit them on the cloud.

Notice

📚 The resources contained in this repository are notes written down while studying topics of Machine Learning and Data Science and exercises I wrote to practice. They reflect my understanding of the topic and as such, they are not meant to be used as an authoritative source of information and/or reference documentation about the subject they refer to.

🗒️ I take notes by summarising the concepts I read and/or watch. I also put together pieces from different materials I consult about a specific topic. I write stuff to conceptualise my own understanding of a broader topic or my thoughts about it. I even attempt to jot down ideas.

📦 I've built this repository mainly for myself, to have a place where to collect my notes and to practice. I made it public because it doesn't hurt to give other people access to it, in the hope it could be useful in the process.

⚠️ Feel free to use these resources as you wish - according to the LICENSE - with the knowledge that inaccuracy may be likely. So, please use these resources with a pinch of salt, by knowing that they could contain mistakes. Please, always compare and integrate the concepts you read from this repo with other material you have access to (there is plenty of other excellent resources out there, more complete and accurate than this), in order to make sure you are not inadvertently taking only my word for it.

⛔ Any mistake, blunder, typo, inaccuracy - if present - is there in good faith. No hard feelings. If you care enough about this work though, please report any of the above to me by raising an issue so that I can fix it. Even better, you can also raise a Pull Request if you like to propose a fix yourself.

❤️ If you feel grateful about this collection and feel that this work may have helped you even so slightly in any form or shape with your study, consider crediting this repo as a way to say thank you 😊

Creative Commons Licence
Machine Learning Playground by Simone Spaccarotella is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Credits

I started to be interested in topics gravitating the Artificial Intelligence world since my academic studies at the Università della Calabria, when I was then studying topics like Intelligent Systems, Answer-Set Programming and this thing called Data Mining at the Department of Mathematics and Informatics

I'm currently attending a Level 7 staff apprenticeship in AI and Data Science with Cambridge Spark at the BBC.

I'm also expanding these concepts and beyond by reading and watching further material available on the internet (papers, resources, video).

I'd like to give a shout out to StatQuest with Josh Starmer. It's an excellent YouTube channel with a vast and "clearly explained" catalogue of concepts spanning from Statistics to Data Science and Machine Learning. For example, I can personally say that I'm now able to grasp the main concepts behind Encoder-Decoder and Transformer architectures thanks to Josh. We don't know each other nor I get compensated to say this, so please, "checkout the quest" and subscribe, it's worth your time.

I also attended courses on LinkedIn Learning, Pluralsight and Coursera as well as training courses provided by the BBC, spanning from introduction to ML to TensorFlow and Keras to data manipulation and visualisation with Pandas, Matplotlib and Seaborn.

Python prerequisites

Make sure to have a suitable stable version of Python 3.x and pip installed on your machine. Consider using Pyenv to manage your Python versions.

The desired Python version can now be set by running pyenv install 3.n.m, where n and m are the minor and patch version respectively. If you are not sure which version to install, you can check the available ones by running pyenv install --list.

Please read Python Version section to check what's the latest python version compatible with the installed packages.

Getting started with PIP and Virtualenv

Install Virtualenv in order to setup an isolated virtual environment to manage the Python project and dependencies (read this installation guide on how to).

You need to create a virtual environment with a clean installation of Python. The following command do so, by creating a folder called ml-playground (which is automatically excluded from revision control) containing a vanilla installation of Python with just the initial depdendencies installed.

Create a virtual environment (only if you don't have an ml-playground folder yet)

virtualenv ml-playground

Enable the virtual environment

source ml-playground/bin/activate

Check the python interpreter used is the one from the virtual environment

which python

Install the required dependencies

pip install -r requirements.txt

Download the English pipeline for Spacy

python -m spacy download en_core_web_md

Start the development environment

jupyter lab

NOTE: remember to deactivate the virtual environment by running the deactivate command once finished or if you switch project. If you don't do this and run python in another project through the same terminal session, you'll be running the same local version of Python with dependencies you may not want or need.

Getting started with Anaconda

  • Download and install Anaconda on your machine
  • Start the Anaconda Navigator
  • Install and launch the Jupyter notebook or JupyterLab from the "home" tab

Dependencies

This is the list of the main DS libraries included in the requirements.txt file.

The full list of dependencies directly installed via PIP is the following:

pip install black isort split-folders rdflib notebook jupyterlab ipywidgets voila numpy scipy sympy statsmodels pandas polars 'dask[complete]' distributed 'dask-ml[complete]' ydata-profiling sweetviz autoviz lux matplotlib seaborn altair plotly scikit-learn tensorflow tensorflow_datasets keras-tuner torch torchvision torchaudio xgboost lightgbm prophet awswrangler sagemaker pyspark pyarrow optuna imbalanced-learn category_encoders shap lime anchor-exp dowhy econml causal-learn spacy gensim lightfm

Python Version

Read Tensorflow Software Requirements to check the latest Python version compatibility

Tech Radar

Technology worth investigating:

How to generate the requirements file

If you want to generate a new "requirements" file or add/remove dependencies and update the existing one

pip freeze > requirements.txt

How to upgrade the dependencies

To upgrade the dependencies we first need to replace all == symbols in the requirements.txt file with >=, so that we unlock the version and allow PIP to download the latest. We then run the upgrade command, and finally freze the packages again with the == symbol to lock the latest versions.

Unlock the current versions

sed -i '' 's/[~=]=/>=/' requirements.txt

Upgrade to the latest versions

pip install --upgrade -r requirements.txt

Lock the latest versions

pip freeze > requirements.txt

Useful PIP commands

To list all the installed libraries in site-packages

pip list

To "show" the details of a specific library

pip show numpy

References

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published