====== Pandas ======

A Python-based data analysis library.

  * [[http://pandas.pydata.org/|Pandas]]

Correlation analysis:

<code python>
import matplotlib.pyplot as plt
import pandas as pd
# scatter_matrix lives in pandas.plotting (pandas >= 0.20);
# the old pandas.tools.plotting module no longer exists.
from pandas.plotting import scatter_matrix

infile = 'test-in.csv'
outfile = 'test-out.csv'

df = pd.read_csv(infile)

# Pairwise correlation between columns, saved as CSV
coff = df.corr()
coff.to_csv(outfile)

# Scatter plot matrix with kernel density estimates on the diagonal
scatter_matrix(df, alpha=0.2, figsize=(10, 10), diagonal='kde')
plt.savefig('test-fig.png')

print(df)
print(coff)
</code>

  * Data preprocessing
    * [[https://towardsdatascience.com/the-simple-yet-practical-data-cleaning-codes-ad27c4ce0a38|The Simple Yet Practical Data Cleaning Codes]]
  * Minimal syntax
    * https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428

===== multiprocessing =====

  * https://towardsdatascience.com/make-your-own-super-pandas-using-multiproc-1c04f41944a1

===== Out-of-Memory, Out-of-Core =====

Tools that provide faster DataFrames through parallel or distributed processing.

  * Polars
    * https://www.pola.rs/
    * https://betterprogramming.pub/this-library-is-15-times-faster-than-pandas-7e49c0a17adc
  * Modin
    * https://github.com/modin-project/modin
    * https://towardsdatascience.com/get-faster-pandas-with-modin-even-on-your-laptops-b527a2eeda74
  * Dask
  * Vaex
    * https://towardsdatascience.com/how-to-process-a-dataframe-with-billions-of-rows-in-seconds-c8212580f447

===== Speed Improvement =====

  * [[https://github.com/jmcarpenter2/swifter|Swifter]]

{{tag>pandas dataframe}}
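
To try the correlation analysis from the top of this page without a CSV file on hand, here is a minimal sketch that builds a small DataFrame in memory. The column names ''x'', ''y'', ''z'' and their values are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # rises linearly with x
    "z": [4, 3, 2, 1],  # falls linearly as x rises
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

print(corr)
```

Because ''y'' is an exact linear function of ''x'', their coefficient should come out as 1 (up to floating-point rounding), and ''x'' against ''z'' as -1. Real data will normally fall somewhere in between.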