Ashutosh Sanzgiri

Updates from the week of November 26, 2019

Nov 26, 2019

  • Gilbert Strang playlist on MIT OCW (Matrix Algebra): https://www.youtube.com/playlist?list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k
  • The Hundred-Page Machine Learning Book: http://themlbook.com/wiki/doku.php?id=start
  • Clean Code ML repo: https://github.com/davified/clean-code-ml
  • Transformers:
    • Simple Transformers blog post: https://medium.com/swlh/simple-transformers-multi-class-text-classification-with-bert-roberta-xlnet-xlm-and-8b585000ce3a
    • Simple Transformers repo: https://github.com/ThilinaRajapakse/simpletransformers
    • NVIDIA Apex: https://github.com/NVIDIA/apex
    • Hugging Face: Blog: https://medium.com/tensorflow/using-tensorflow-2-for-state-of-the-art-natural-language-processing-102445cda54a, Examples: https://github.com/huggingface/transformers/tree/master/examples, Docs: https://huggingface.co/transformers/quickstart.html
    • Illustrated BERT: http://jalammar.github.io/illustrated-bert/
    • AllenNLP: https://github.com/allenai/allennlp
  • Financial Models:
    • Notebook collection: https://github.com/cantaro86/Financial-Models-Numerical-Methods, in particular the one on Kalman filters: https://github.com/cantaro86/Financial-Models-Numerical-Methods/blob/master/5.1%20Linear%20regression%20-%20Kalman%20filter.ipynb
    • Another gem on Kalman filters: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python and its PDF version: https://drive.google.com/file/d/0By_SW19c1BfhSVFzNHc0SjduNzg/view
  • Data Sampling in Presto: https://ragrawal.wordpress.com/2017/08/11/data-sampling-in-presto/
  • Clean PyTorch implementation of Style Transfer: https://github.com/shivamswarnkar/Style-Transfer/tree/871b2607d68d7dfa46c0242e4fdd9e98f77bbd93
  • Kaggle class on TWIML AI:
    • GitHub: https://github.com/philpackmohr/kaggle-twimlai
    • Kaggle Winning Solutions & Pipeline: http://kagglesolutions.com/r/?ref=headerlinkh
    • Incredible Glossary: https://www.kaggle.com/shivamb/data-science-glossary-on-kaggle/
    • Winning Kaggle Solutions: https://www.kaggle.com/sudalairajkumar/winning-solutions-of-kaggle-competitions/notebook
    • Short clean kernel: https://www.kaggle.com/lopuhin/mercari-golf-0-3875-cv-in-75-loc-1900-s
    • Pavel Pleskov's secrets: https://www.youtube.com/watch?v=fXnzjJMbujc
    • Model stacking (Kaggle Blog): http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/
    • No Free Hunch: http://blog.kaggle.com/
    • Feature Selection via Target Permutations: https://www.kaggle.com/ogrellier/feature-selection-target-permutations and https://www.kaggle.com/ogrellier/feature-selection-with-null-importances
    • Feature Importances: https://medium.com/the-artificial-impostor/feature-importance-measures-for-tree-models-part-i-47f187c1a2c3
    • Slide decks with tricks:
      • https://www.slideshare.net/markpeng/general-tips-for-participating-kaggle-competitions
      • https://www.slideshare.net/HJvanVeen/kaggle-presentation?qid=9945759e-a06f-447d-bcfb-2a15592f30b6&v=&b=&from_search=11
      • https://www.slideshare.net/DariusBaruauskas/tips-and-tricks-to-win-kaggle-data-science-competitions?qid=2ea2c741-a9af-4c84-9292-d11725c0c68c&v=&b=&from_search=5
      • https://www.slideshare.net/gabrielspmoreira/feature-engineering-getting-most-out-of-data-for-predictive-models-tdc-2017
      • https://www.slideshare.net/jeongyoonlee/winning-data-science-competitions-74391113
    • Good AMA: https://towardsdatascience.com/ask-me-anything-session-with-a-kaggle-grandmaster-vladimir-i-iglovikov-942ad6a06acd
  • MLCourse.ai:
    • Resources: https://mlcourse.ai/resources
    • Kernels: https://www.kaggle.com/kashnitsky/mlcourse/kernels
    • GitHub: https://github.com/Yorko/mlcourse.ai
    • Open Data Science courses: https://medium.com/open-machine-learning-course
  • Optuna vs HyperOpt: https://neptune.ml/blog/optuna-vs-hyperopt
  • Free PyTorch Intro Book: https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf
  • Makefiles everywhere: https://blog.mindlessness.life/makefile/2019/11/17/the-language-agnostic-all-purpose-incredible-makefile.html
  • DVC for model version control: https://dvc.org/
  • ELI5 Model Explainability: https://github.com/TeamHG-Memex/eli5
  • Autoencoders: https://www.kaggle.com/shivamb/how-autoencoders-work-intro-and-usecases

  • sanzgiri@gmail.com

Musings on Data Science and Machine Learning