Machine-Learning Interview Questions
Master the most commonly asked interview questions with comprehensive, expert-crafted answers designed to help you succeed.
What are Different Kernels in SVM?
Support Vector Machine (SVM) uses kernel functions to implicitly map the input data into a higher-dimensional feature space, making it easier to separate data points that are not linearly separable in the original space.
Different Types of Kernels in SVM:
- Linear Kernel: Used when the data is linearly separable. It is the fastest kernel and best used when features are many and data is simple.
- Polynomial Kernel: Suitable for datasets where the data points are not linearly separable and exhibit polynomial relationships.
- Radial Basis Function (RBF) Kernel / Gaussian Kernel: Most commonly used; it maps the data into an infinite-dimensional space and is effective for non-linear problems.
- Sigmoid (Hyperbolic Tangent) Kernel: Behaves like the tanh activation function used in neural networks; it is less commonly used in practice than the other kernels.
- ANOVA Kernel: Useful for capturing interactions between subsets of features; sometimes applied in regression and complex pattern recognition tasks.
Choosing the right kernel is critical for model performance, especially in high-dimensional or non-linear datasets.
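As a rough illustration, scikit-learn exposes several of these kernels through the `kernel` parameter of `SVC`. The sketch below (assuming scikit-learn is installed; the two-moons dataset and parameter values are purely illustrative) compares a few of them on the same data:

```python
# A minimal sketch comparing SVM kernels (assumes scikit-learn);
# the dataset and parameters are illustrative, not a recommendation.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit the same classifier with different kernels and compare held-out accuracy.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, gamma="scale")
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```

On a non-linear dataset like this, the RBF kernel will typically outscore the linear kernel, which is why it is the usual first choice for non-linear problems.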
Explain the Difference Between Classification and Regression?
Classification and Regression are two types of supervised machine learning techniques used for predicting outcomes based on input data.
Key Differences Between Classification and Regression:
| Classification | Regression |
|---|---|
| Used to predict discrete categories or labels. | Used to predict continuous numeric values. |
| Example: Classifying emails as Spam or Not Spam. | Example: Predicting the price of a house based on its features. |
| Outputs are labels such as 'Yes' or 'No', 'Cat' or 'Dog'. | Outputs are real-valued, such as 72.5 or 120.0. |
| Common algorithms: Logistic Regression, Decision Trees, Random Forest, SVM (classification mode). | Common algorithms: Linear Regression, Polynomial Regression, Support Vector Regression (SVR). |
| Common metrics: Accuracy, Precision, Recall. | Common metrics: Mean Absolute Error, Mean Squared Error. |
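To make the distinction concrete, here is a minimal sketch (assuming scikit-learn; the synthetic datasets are illustrative) in which a classifier outputs discrete labels and a regressor outputs continuous values:

```python
# Minimal sketch contrasting a classifier and a regressor (assumes scikit-learn);
# datasets and settings are illustrative only.
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: discrete labels (0/1).
Xc, yc = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)
print(clf.predict(Xc[:3]))        # discrete labels, e.g. [1 0 1]

# Regression: continuous targets.
Xr, yr = make_regression(n_samples=200, noise=10.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))        # real-valued predictions
```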
What are some real-life applications of clustering algorithms?
Clustering algorithms group data points into clusters based on similarity, and they are widely used across industries for uncovering hidden patterns in data.
Real-life Applications of Clustering Algorithms:
- Customer Segmentation: Grouping customers based on behavior, preferences, or demographics to personalize marketing strategies.
- Recommendation Systems: Clustering similar items or users to suggest relevant products, movies, or content.
- Fraud Detection: Identifying unusual patterns in financial transactions that deviate from regular clusters to detect fraud.
- Image Compression: Reducing the number of colors or patterns by clustering similar pixels, improving storage efficiency.
- Healthcare: Grouping patients with similar symptoms or genetic markers for better treatment planning and diagnosis.
- Document Categorization: Clustering similar documents or web pages to improve information retrieval and search relevance.
These applications help businesses and systems make data-driven decisions, improve efficiency, and personalize user experiences.
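As a concrete illustration of the customer-segmentation use case, a minimal k-means sketch might look like the following (assuming scikit-learn; the "customer" features are synthetic and purely illustrative):

```python
# A minimal sketch of customer segmentation with k-means (assumes scikit-learn);
# the synthetic "customer" features are purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: annual spend and visits per year.
customers = np.column_stack([
    rng.normal(500, 150, 300),   # annual spend
    rng.normal(12, 4, 300),      # visits per year
])

X = StandardScaler().fit_transform(customers)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # number of customers per segment
```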
What Are the Different Types of Machine Learning?
Machine Learning is a subset of Artificial Intelligence that enables systems to learn from data and improve their performance over time without being explicitly programmed. It is broadly classified into three types based on the type of learning:
1. Supervised Learning
In supervised learning, the model is trained using labeled data (input-output pairs). The goal is to learn a function that maps inputs to desired outputs.
- Examples: Email spam detection, sentiment analysis, price prediction.
- Algorithms: Linear regression, decision trees, support vector machines.
2. Unsupervised Learning
In unsupervised learning, the model is given data without labels and must find patterns, groupings, or hidden structures within it.
- Examples: Customer segmentation, anomaly detection, topic modeling.
- Algorithms: K-means clustering, hierarchical clustering, PCA.
3. Reinforcement Learning
Reinforcement learning is based on the reward-punishment mechanism. An agent learns by interacting with an environment and receiving feedback in the form of rewards or penalties for its actions.
- Examples: Game playing, robotics, self-driving cars.
- Key Components: Agent, environment, reward signal, policy.
Each type of learning is suited for specific problems and scenarios, making machine learning a versatile approach in various domains.
What is Bias in Machine Learning?
Bias in Machine Learning refers to the error introduced by approximating a real-world problem, which may be complex, with a simplified model. It can also indicate a preference in the data or algorithms that leads to inaccurate or unfair outcomes.
In simpler terms, bias occurs when the model makes assumptions about the data that may not be true, potentially leading to underfitting or systematic errors.
Example: Consider a case where a company like Amazon creates a resume filtering system. If the historical hiring data is biased towards one gender or school, the model might learn and reproduce that bias, preferring male candidates or candidates from specific universities.
Types of Bias in ML:
- Prejudice Bias: Bias in the training data due to human prejudices.
- Measurement Bias: Inaccuracies in measuring features or labels.
- Algorithmic Bias: Bias introduced by how the model processes data or learns patterns.
Detecting and reducing bias is crucial to ensure fairness, especially in applications like hiring, loan approval, or healthcare predictions.
What is overfitting in machine learning and how can it be avoided?
Overfitting occurs when a machine learning model learns not only the underlying patterns in the training data but also the noise and random fluctuations. This results in excellent performance on training data but poor generalization to new, unseen data.
Symptoms of Overfitting: Low error on training data, but high error on test/validation data.
How to Avoid Overfitting:
- Early Stopping: Stop training the model once the performance on the validation data starts to deteriorate, even if training error continues to decrease.
- Regularization: Use L1 (Lasso) or L2 (Ridge) regularization to add penalties to large weights, discouraging the model from relying too heavily on specific features.
- Cross-Validation: Use techniques like k-fold cross-validation to ensure the model performs well across different data splits.
- Pruning (for trees): Remove parts of the model that do not contribute significantly to prediction.
- Dropout (for neural networks): Randomly drop neurons during training to prevent co-adaptation between neurons.
- More Data: Adding more relevant training data can help the model generalize better.
By applying these techniques, we can build models that generalize better and perform consistently on both training and test data.
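For example, a minimal sketch of two of these tools, L2 regularization and k-fold cross-validation, could look like this (assuming scikit-learn; the dataset and alpha values are illustrative):

```python
# A minimal sketch of two anti-overfitting tools (assumes scikit-learn):
# L2 regularization (Ridge) and k-fold cross-validation. Settings are illustrative.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=50, noise=15.0, random_state=0)

for alpha in [0.01, 1.0, 10.0]:          # larger alpha = stronger penalty on weights
    model = Ridge(alpha=alpha)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(alpha, scores.mean())          # mean validation R^2 across folds
```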
Why can't we use linear regression for a classification task?
Linear Regression is not suitable for classification tasks due to the fundamental differences between regression and classification problems.
Key Reasons:
- Continuous vs. Discrete Output: Linear regression predicts continuous and unbounded values, whereas classification requires discrete labels (like 0 or 1 in binary classification).
- Probability Interpretation: Classification often requires probabilistic outputs between 0 and 1 to make decisions (e.g., via thresholding). Linear regression doesn't naturally confine predictions within this range.
- Loss Function Issues: Squared-error loss is a poor fit for classification; it penalizes confident, correct predictions that lie far from the decision boundary, and if a sigmoid is applied to bound the output, the resulting squared-error loss becomes non-convex, so optimization can get stuck in local minima rather than finding the global minimum.
- Performance Issues: Linear regression can produce values outside the target class boundaries, making predictions unreliable (e.g., predicting a probability of 1.2 or -0.3 in binary classification).
Instead of linear regression, algorithms like logistic regression are used for classification as they provide bounded, probabilistic outputs and are optimized using convex loss functions, ensuring reliable classification performance.
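A small sketch (assuming scikit-learn; the synthetic data is illustrative) makes the difference visible: linear regression produces unbounded values on binary labels, while logistic regression returns probabilities in [0, 1]:

```python
# A minimal sketch (assumes scikit-learn) showing why linear regression is a poor
# fit for binary labels: its outputs are unbounded, while logistic regression
# returns probabilities in [0, 1]. Data and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

lin = LinearRegression().fit(X, y)
print(lin.predict(X[:5]))              # values may fall below 0 or above 1

log = LogisticRegression().fit(X, y)
print(log.predict_proba(X[:5])[:, 1])  # probabilities always within [0, 1]
```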
What is the difference between Precision and Recall?
Precision and Recall are two key evaluation metrics used in classification problems, especially when dealing with imbalanced datasets.
Definitions:
- Precision: The proportion of predicted positive cases that are actually positive. It focuses on reducing false positives.
- Recall: The proportion of actual positive cases that are correctly predicted. It focuses on reducing false negatives.
Formulas:
| Metric | Formula | Meaning |
|---|---|---|
| Precision | TP / (TP + FP) | Out of all predicted positives, how many are actually correct? |
| Recall | TP / (TP + FN) | Out of all actual positives, how many did we correctly identify? |
Example: In a cancer detection system, if the goal is to catch all possible positive cases (patients with cancer), then recall is more important. If we want to be sure that those we diagnose as positive really are positive, then precision becomes more important.
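A quick way to compute both metrics is shown below (assuming scikit-learn; the label vectors are made up for illustration):

```python
# A minimal sketch computing precision and recall (assumes scikit-learn);
# the label vectors are illustrative only.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
```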
What is Cross-Validation?
Cross-validation is a resampling technique used in machine learning to evaluate and improve the performance of a model by training and testing it on different subsets of the dataset. It helps in ensuring that the model generalizes well to unseen data and prevents overfitting.
K-Fold Cross-Validation:
- The dataset is divided into k equal-sized folds (subsets).
- The model is trained on k-1 folds and tested on the remaining 1 fold.
- This process is repeated k times, with each fold used once as the test set.
- The final performance score is obtained by averaging the results from each fold.
Advantages:
- Reduces variance in performance estimates.
- Makes efficient use of limited data.
- Improves the reliability of model evaluation.
Example: In 5-fold cross-validation, the dataset is split into 5 parts. The model trains on 4 parts and validates on the remaining 1, repeating this process 5 times. The average of the 5 validation scores gives the final model evaluation metric.
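The same procedure written out with scikit-learn's KFold might look like this (a minimal sketch; the diabetes dataset and linear model are illustrative choices):

```python
# A minimal sketch of 5-fold cross-validation done "by hand" with KFold
# (assumes scikit-learn); dataset and model are illustrative.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 on each held-out fold

print("Mean R^2 across folds:", np.mean(scores))
```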
How to choose an optimal number of clusters?
- Elbow Method: Plot the within-cluster sum of squares (WCSS) against the number of clusters. The "elbow" point, where the curve starts to flatten, indicates the optimal number of clusters.
- Silhouette Score: Measures how similar each point is to its own cluster compared to other clusters. A higher average silhouette score indicates better-defined clusters, so the optimal number of clusters is the one that maximizes it.
- Gap Statistic: Compares the within-cluster dispersion of the actual clustering with the dispersion expected under a random reference distribution of the same data. A larger gap suggests a more appropriate number of clusters.
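A minimal sketch of the first two approaches (assuming scikit-learn; the blob data and candidate values of k are illustrative) could look like this:

```python
# A minimal sketch of the elbow method and silhouette score (assumes scikit-learn);
# the blob data and candidate k values are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss = km.inertia_                       # within-cluster sum of squares (elbow curve)
    sil = silhouette_score(X, km.labels_)    # higher is better
    print(k, round(wcss, 1), round(sil, 3))
```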
What is feature engineering? How does it affect the model’s performance?
Feature engineering is the process of creating new features from existing ones. Often there is a subtle mathematical relationship between features, and when it is identified, new features can be derived through the corresponding transformations or combinations.
There are also cases where several pieces of information are packed into a single column. Splitting such a column into separate features gives deeper insight into the data, and if the derived features carry enough signal, they can improve the model's performance considerably.
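A small, hypothetical pandas sketch illustrates both ideas: deriving a feature from a mathematical relation, and splitting a column that packs several pieces of information (the column names are made up for illustration):

```python
# A minimal, hypothetical sketch of feature engineering with pandas;
# the column names are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "total_price": [250.0, 90.0, 400.0],
    "quantity": [5, 3, 8],
    "purchase_datetime": pd.to_datetime(
        ["2023-01-05 10:30", "2023-02-11 18:45", "2023-03-20 09:15"]),
})

# Derive a new feature from a mathematical relation between existing ones.
df["unit_price"] = df["total_price"] / df["quantity"]

# Split a column that packs several pieces of information into separate features.
df["purchase_hour"] = df["purchase_datetime"].dt.hour
df["purchase_dayofweek"] = df["purchase_datetime"].dt.dayofweek
print(df.head())
```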
Why do we perform normalization?
Normalization brings all features onto a comparable scale or range of values, which makes training faster and more stable. Without it, features with large magnitudes dominate the gradient updates, and gradient descent may oscillate back and forth instead of converging smoothly to a (local or global) minimum.
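For instance, two common normalization approaches in scikit-learn are standardization and min-max scaling (a minimal sketch; the feature matrix is illustrative):

```python
# A minimal sketch of two common normalization approaches (assumes scikit-learn);
# the feature matrix is illustrative.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 8000.0]])   # features on very different scales

print(StandardScaler().fit_transform(X))  # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))    # rescaled to the [0, 1] range
```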
What is data leakage and how can we identify it?
Data leakage occurs when information that will not be available at prediction time (often information derived from the target itself) finds its way into the input features. A telltale sign is a feature that is almost perfectly correlated with the target: the model receives most of the target's information during training and has to do very little to achieve high accuracy.
In this situation the model performs suspiciously well on both the training and the validation data, but its performance drops sharply when it is used for real predictions, where the leaked information is unavailable. This gap is how data leakage is usually identified.
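One quick, hypothetical check is to inspect feature-target correlations (the column names below are made up; a correlation close to 1.0 deserves scrutiny):

```python
# A minimal, hypothetical sketch of one quick leakage check: look for features
# with suspiciously high correlation with the target. Column names are made up.
import pandas as pd

df = pd.DataFrame({
    "income": [40, 55, 30, 80, 65],
    "age": [25, 38, 22, 50, 41],
    "loan_approved": [0, 1, 0, 1, 1],
    # Leaky feature: derived from the decision itself, unknown at prediction time.
    "approval_letter_sent": [0, 1, 0, 1, 1],
})

corr = df.corr()["loan_approved"].drop("loan_approved").abs().sort_values(ascending=False)
print(corr)   # a correlation near 1.0 deserves scrutiny
```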
What are some of the hyperparameters of the random forest regressor which help to avoid overfitting?
The most important overfitting-related hyperparameters of a Random Forest are:
- max_depth: Deeper trees can memorize the training data and overfit, so the depth should be limited.
- n_estimators: The number of decision trees in the forest.
- min_samples_split: The minimum number of samples an internal node must hold in order to be split into further nodes.
- max_leaf_nodes: Caps the number of leaf nodes per tree, which in turn also restricts the depth of the trees.
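A minimal sketch with these hyperparameters constrained might look like the following (assuming scikit-learn; the values are illustrative, not tuned):

```python
# A minimal sketch of a RandomForestRegressor with overfitting-related
# hyperparameters constrained (assumes scikit-learn); values are illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

model = RandomForestRegressor(
    n_estimators=200,        # number of trees in the forest
    max_depth=8,             # limit tree depth
    min_samples_split=10,    # require enough samples before splitting a node
    max_leaf_nodes=64,       # cap the number of leaves per tree
    random_state=0,
).fit(X, y)
print(model.score(X, y))
```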
Is it always necessary to use an 80:20 ratio for the train test split?
No, the data does not have to be split in an 80:20 ratio. The main purpose of the split is to hold back some data the model has not seen before so that we can evaluate its performance on it.
If the dataset contains, say, 50,000 rows, then even 1,000-2,000 held-out rows can be enough to evaluate the model's performance.
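For example (a minimal sketch assuming scikit-learn), the test size can be set as an absolute number of rows rather than a fixed percentage:

```python
# A minimal sketch (assumes scikit-learn): the split ratio is a knob, not a rule.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, random_state=0)

# A small absolute test set can be enough when the dataset is large.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2000, random_state=0)
print(len(X_train), len(X_test))
```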
What is one-shot learning?
One-shot learning is a machine learning approach in which a model learns to recognize a class from a single example (or very few examples) rather than from a large training set. It is useful when large labelled datasets are unavailable, and it is commonly applied to measuring the similarity and dissimilarity between two images, for example in face verification.
Which is more robust to outliers: a decision tree or a random forest?
Both decision trees and random forests are relatively robust to outliers. A random forest is an ensemble of many decision trees, so its output is an aggregate (an average or majority vote) of the individual trees' predictions.
Averaging across many trees reduces variance and the influence of any single outlier-driven split, so random forests are generally the more robust of the two.
Explain SMOTE method used to handle data imbalance.
In SMOTE (Synthetic Minority Oversampling Technique), we synthesize new data points for the minority class by linearly interpolating between existing minority samples. The advantage over simple random oversampling is that the model is not trained on exact duplicates of the minority data.
The disadvantage is that the synthetic points can add undesired noise to the dataset, which can negatively affect the model's performance.
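A minimal sketch of SMOTE in practice (this assumes the third-party imbalanced-learn package is installed; the synthetic imbalanced dataset is illustrative):

```python
# A minimal sketch of SMOTE oversampling. This assumes the third-party
# imbalanced-learn package (imblearn) is installed; data is illustrative.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("Before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("After: ", Counter(y_res))
```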
How Do You Handle Missing or Corrupted Data in a Dataset?
One of the easiest ways to handle missing or corrupted data is to drop those rows or columns or replace them entirely with some other value.
There are two useful methods in Pandas:
- isnull() and dropna() will help to find the columns/rows with missing data and drop them
- fillna() will replace missing values with a placeholder, or with a statistic such as the column mean or median
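A small sketch of these options in pandas (the DataFrame is made up for illustration):

```python
# A minimal sketch of handling missing values with pandas; the frame is illustrative.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 41], "salary": [50_000, 62_000, np.nan]})

print(df.isnull().sum())                        # count missing values per column
dropped = df.dropna()                           # option 1: drop rows with missing data
filled = df.fillna(df.mean(numeric_only=True))  # option 2: impute with column means
print(dropped, filled, sep="\n")
```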
What Is a False Positive and False Negative and How Are They Significant?
False positives are cases that are wrongly classified as positive but are actually negative.
False negatives are cases that are wrongly classified as negative but are actually positive.
In the term 'False Positive', the word 'Positive' refers to the predicted value in the confusion matrix: the system predicted the positive class, but the actual value is negative. Their significance depends on the application: in medical screening a false negative (a missed disease) is usually the more costly error, whereas in spam filtering a false positive (a legitimate email sent to spam) is the more harmful one.
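Both counts can be read directly off a confusion matrix, for example (a minimal sketch assuming scikit-learn; the label vectors are illustrative):

```python
# A minimal sketch extracting FP and FN counts from a confusion matrix
# (assumes scikit-learn); the label vectors are illustrative.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("False positives:", fp, "| False negatives:", fn)
```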