What are feature selection techniques in machine learning?
While building a machine learning model, the output depends on many factors; these factors are called input variables or features. Nowadays we have access to large data sets with a large number of features, but we cannot use all of them to predict the final outcome, because many features contribute nothing to the model's accuracy. Redundant variables reduce the overall performance of the model, and adding more and more variables increases the model's complexity while decreasing its accuracy. So what should we do to solve this problem?
This is where feature selection comes in.
So what are feature selection techniques in machine learning, and why are they used?
Feature selection techniques have many uses in machine learning and data science; some of the most important are as follows -
- It simplifies the model
- It improves accuracy and efficiency
- It reduces training time
- It reduces overfitting
- It avoids the curse of dimensionality
- It improves data compatibility
Feature selection is also one of the techniques of dimensionality reduction, and it can be classified into two categories -
1. Unsupervised Technique -
- This technique is used when the data is unlabelled
- It can automatically extract the desired number of features
- Because the data is unlabelled, clustering is typically used here
- It has relatively low time complexity, even on large data sets
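As a concrete illustration of the unsupervised case, here is a minimal sketch (assuming scikit-learn is available) using `VarianceThreshold`, which needs no labels: it simply drops features whose variance falls below a cutoff, on the reasoning that a near-constant feature carries little information.

```python
# Unsupervised feature selection: drop low-variance features.
# No target variable is used anywhere.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data: the second column is constant, so it carries no information.
X = np.array([[1.0, 5.0, 0.1],
              [2.0, 5.0, 0.4],
              [3.0, 5.0, 0.2],
              [4.0, 5.0, 0.9]])

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)         # the constant column is removed: (4, 2)
print(selector.get_support())  # boolean mask of kept features
```

The threshold is a tuning knob: raising it above 0.0 also removes features that vary only slightly.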
2. Supervised Technique -
- This technique is used when the data is labelled
- It uses the target variable to remove irrelevant variables
Under the supervised approach there are three types of feature selection methods. Although all of them try to find the best subset of features, they work somewhat differently.
1. Filter methods - These methods first select the best subset of features using statistical measures, independently of any learning algorithm, and only then train the model on the selected features, increasing its overall performance. Some of the most widely used filter techniques are as follows -
- Information gain - An entropy-based feature evaluation method; it computes the gain of each variable with respect to the target variable.
- Correlation coefficient - Correlation is a statistical measure of the dependency between two variables. It quantifies their linear relationship on a scale from -1 to 1.
- Chi-square test - A statistical test of the independence of two categorical variables. It compares the observed frequencies with the expected frequencies.
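The two scored techniques above can be sketched with scikit-learn (assumed available here): `SelectKBest` with the `chi2` score function keeps the features most dependent on the target, and `mutual_info_classif` estimates the information gain of each feature.

```python
# Filter-method feature selection: score each feature against the target
# without training any downstream model.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Chi-square test: keep the 2 features most dependent on the class label.
chi2_selector = SelectKBest(score_func=chi2, k=2)
X_chi2 = chi2_selector.fit_transform(X, y)
print(X_chi2.shape)                 # (150, 2)
print(chi2_selector.get_support())  # mask of the 2 selected features

# Information gain: a mutual-information score per feature (higher = better).
mi_scores = mutual_info_classif(X, y, random_state=0)
print(mi_scores)
```

Note that the chi-square test expects non-negative feature values, since it is designed for counts or frequencies.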
2. Wrapper methods - These methods treat feature selection as a search problem: they repeatedly train the learning algorithm on candidate subsets of features and keep evaluating subsets until the best one is found, thereby increasing the overall performance of the model. The commonly used search strategies for eliminating unneeded features are as follows -
- Forward selection
- Backward elimination
- Bidirectional elimination
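Forward selection and backward elimination can be sketched with scikit-learn's `SequentialFeatureSelector` (assuming scikit-learn >= 0.24). Starting from an empty set, forward selection adds one feature at a time, keeping whichever addition gives the best cross-validated score for the wrapped estimator.

```python
# Wrapper-method feature selection: the model itself is trained and
# scored repeatedly to search for the best feature subset.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

estimator = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(
    estimator,
    n_features_to_select=2,
    direction="forward",   # use "backward" for backward elimination
    cv=3,
)
sfs.fit(X, y)

print(sfs.get_support())   # boolean mask of the selected features
X_selected = sfs.transform(X)
print(X_selected.shape)    # (150, 2)
```

Because the estimator is refit for every candidate subset and fold, wrapper methods are much more expensive than filter methods, which is the usual trade-off for their higher selection quality.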
3. Embedded methods - These methods perform feature selection as part of the model training process itself: the learning algorithm evaluates the usefulness of features while it fits the model. The difference from wrapper methods is that embedded methods build selection into a single training run, whereas wrapper methods repeatedly train and evaluate the model on different subsets from the outside. Two common embedded approaches are as follows -
- Regularization - It adds a penalty term to the model's loss function to avoid overfitting. With L1 (Lasso) regularization, the coefficients of some variables shrink to exactly zero, and those features are effectively removed from the data set.
- Tree-based methods - These methods use the concept of feature importance. Models such as Gradient Boosting and Random Forest report the importance of each feature, making it easy to select features based on their importance.
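Both embedded approaches can be sketched in a few lines with scikit-learn (assumed available; the diabetes data set is just a convenient stand-in): Lasso's L1 penalty zeroes out some coefficients during fitting, and a fitted random forest exposes a `feature_importances_` attribute as a by-product of training.

```python
# Embedded feature selection: the selection happens inside model training.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)  # 442 samples, 10 features

# Regularization: features whose Lasso coefficient is exactly zero
# have effectively been removed by the L1 penalty.
lasso = Lasso(alpha=1.0).fit(X, y)
kept = [i for i, coef in enumerate(lasso.coef_) if coef != 0.0]
print("features kept by Lasso:", kept)

# Tree-based importance: rank features by their impurity reduction.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(enumerate(forest.feature_importances_),
                 key=lambda t: t[1], reverse=True)
print("most important feature index:", ranking[0][0])
```

The `alpha` parameter controls how aggressively Lasso prunes: larger values zero out more coefficients, smaller values keep more features.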
Article By: Abhinav Kaushik