Posts

Showing posts from May, 2020

Natural Language Processing

Uses

Text Classification
  Used for filtering information in web search
  Helps to avoid spam mail
Sentiment Analysis
  Identifies opinions and sentiments of the audience
Chatbots
  Used for customer support
  Used in HR systems
  Used in e-commerce systems
Customer service
  Gives insights into audience preferences
  Helps improve customer satisfaction
Advertisement
  Helps target the right customers

Tokenization
  Process of breaking up text into smaller pieces (tokens). A token can be a word or a sentence.
Stop Words
  Words such as "an", "a", "when" that carry little meaning on their own
Part of speech (POS) Tags
  Nouns, verbs, adjectives, etc.
Stemming
  Process of reducing a word to its root, or stem; the stem need not be a dictionary word
Lemmatization
  Process of reducing a word to its root in dictionary form (the lemma)
Named Entity Recognition
  Recognizes entities such as people, organizations, places, etc.
Bag of words
  Convert to lower case
  Perform stemming and lemmatization
  Remove stop ...
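The tokenization, stop-word, stemming and bag-of-words steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the stop-word list, the `naive_stem` suffix rules and the function names are made up for this example, and a real project would more likely use a library stemmer or lemmatizer. Lemmatization is not shown, since it needs a dictionary.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "when", "is", "are", "and", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def naive_stem(token):
    """Crude suffix stripping, standing in for a real stemmer (hypothetical rules)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, then count each stem."""
    kept = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(naive_stem(t) for t in kept)

bow = bag_of_words("The cats are chasing a cat when the dog barks")
print(bow)
```

Here "cats" and "cat" fall into the same bucket, while "chasing" is cut down to "chas", which shows why a stem is not always a dictionary word.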

Statistics in Machine Learning

Note: In the data science life cycle, more than 60% of the time goes into data analysis such as feature engineering and feature selection. Feature engineering means cleaning data and handling missing values, unbalanced data, and categorical features.

Python packages
  Pandas - read and work with datasets: read_csv, head, isnull, get_dummies, drop, concat
  Numpy - work with arrays
  matplotlib.pyplot - for visualization
  Seaborn - for visualization: heatmap, countplot, boxplot
Handling Categorical features
  One-hot encoding for nominal variables
  Label encoding for ordinal variables
Ways for finding Outliers
  Scatter plot
  Box plot
  z-score
  IQR
Correlation
  Strength of association between two variables
  It is symmetric: the correlation of A and B equals the correlation of B and A
Regression
  Used when one of the variables is dependent and the other is independent
  Regression equation: the average value of y is a function of x
  R Square
  Significance of F & P values
Covariance (cov)
  Quantify relationship between features, rand...
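A few of the ideas above (one-hot encoding, IQR and z-score outlier checks, covariance and correlation) can be sketched with the Python standard library alone, so the example runs without installing the packages listed. The function names, the sample data and the thresholds are assumptions chosen for illustration; with pandas one would more likely reach for `get_dummies` and the built-in DataFrame statistics.

```python
import statistics

def one_hot(values):
    """One-hot encode nominal values; columns follow sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Made-up sample data for illustration only.
data = [10, 12, 11, 13, 12, 11, 95]
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(one_hot(["red", "green", "red"]))
print(iqr_outliers(data))
# With only 7 points no z-score can exceed about 2.45, so a cutoff of 2 is used here.
print(zscore_outliers(data, 2.0))
print(covariance(xs, ys), correlation(xs, ys))
```

Swapping the arguments of `correlation` gives the same value, which is the "both ways" property noted above. Covariance has the same symmetry but depends on the units of the variables, whereas correlation is scaled to lie between -1 and 1.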