Note: In the data science life cycle, more than 60% of the time goes into data analysis, such as feature engineering and feature selection. Feature engineering means cleaning data and handling missing values, unbalanced data, and categorical features.

Python packages
Pandas - read and prepare datasets: read_csv, head, isnull, get_dummies, drop, concat
Numpy - work with arrays
matplotlib.pyplot - for visualization
Seaborn - for visualization: heatmap, countplot, boxplot

Handling categorical features
One-hot encoding for nominal variables
Label encoding for ordinal variables

Ways of finding outliers
Scatter plot
Box plot
z-score
IQR

Correlation: Strength of association between two variables. It is symmetric: the correlation of A and B equals the correlation of B and A.

Regression: Applies when one of the variables is dependent and the other is independent.
Regression equation = the average value of 'y' is a function of x
R Square
Significance of F & P values

Covariance (cov): Quantifies the relationship between features, rand...
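Several of the notes above (one-hot and label encoding, IQR and z-score outlier checks, and the symmetry of correlation) can be tried out in a few lines of pandas. This is only a minimal sketch: the DataFrame, its column names (city, size, price, area), the size ordering, and the z-score cutoff of 2 are all assumptions made up for illustration, not part of the original notes.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],      # nominal
    "size": ["small", "large", "medium", "small", "medium", "large"],  # ordinal
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0],
    "area": [1.0, 1.2, 1.1, 1.3, 1.2, 1.25],
})

# One-hot encoding for the nominal column: one indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Label encoding for the ordinal column, using an explicit (assumed) order
# so that the integer codes preserve small < medium < large.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# Replace the raw categorical columns with their encoded versions.
encoded = pd.concat([df.drop(columns=["city", "size"]), one_hot], axis=1)

# IQR rule: flag values beyond 1.5 * IQR from the first or third quartile.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

# z-score: distance from the mean in standard deviations.
# A cutoff of 3 is common, but with only 6 rows no point can reach it,
# so this sketch assumes a cutoff of 2.
z = (df["price"] - df["price"].mean()) / df["price"].std(ddof=0)
z_outliers = df[z.abs() > 2]

# Correlation is symmetric: corr(A, B) equals corr(B, A).
corr_ab = df["price"].corr(df["area"])
corr_ba = df["area"].corr(df["price"])

print(encoded.head())
print(iqr_outliers)
print(z_outliers)
print(corr_ab, corr_ba)
```

On this toy data both outlier checks flag the same row, the one with price 95. On real data the two rules can disagree, so it is worth comparing them against a box plot or scatter plot before dropping anything.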