====== Data Mining ======

===== Imbalanced data problem =====

  * http://www.svds.com/learning-imbalanced-classes/
  * http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
  * P. Branco, L. Torgo, and R. Ribeiro, “A Survey of Predictive Modelling under Imbalanced Distributions,” arXiv:1505.01658 [cs], May 2015.

===== Survival Analysis =====

  * https://github.com/aburke99/gdc-analyzing-churn/blob/master/gdc.ipynb
  * https://plot.ly/ipython-notebooks/survival-analysis-r-vs-python/
  * https://lifelines.readthedocs.io/en/latest/index.html

===== Data Cleaning =====

  * http://openrefine.org/

===== Datasets =====

  * https://research.google.com/audioset/
  * https://research.googleblog.com/2017/03/announcing-audioset-dataset-for-audio.html
  * https://archive.org/details/2015_reddit_comments_corpus
  * https://github.com/mdeff/fma
  * https://magenta.tensorflow.org/datasets/nsynth
  * https://github.com/zalandoresearch/fashion-mnist/blob/master/README.md
  * https://github.com/hardmaru/pytorch_notebooks/blob/master/pytorch_tiny_custom_mnist_adam.ipynb
  * Self-driving cars
    * http://blog.mapillary.com/product/2017/05/03/mapillary-vistas-dataset.html