Machine Learning Interview Questions | Machine Learning Interview Preparation | Intellipaat
Summary
TLDR: This video dives into essential machine learning interview questions, explaining key concepts such as the differences between machine learning, artificial intelligence, and deep learning. It covers topics like bias and variance, clustering, linear regression, decision trees, and overfitting. The script also explores hypothesis testing, supervised vs. unsupervised learning, PCA, SVM, cross-validation, entropy, epochs, and the variance inflation factor. It discusses metrics like confusion matrices, type 1 and type 2 errors, and the use of logistic regression. Additionally, it provides insights on handling missing data in datasets, offering a comprehensive guide for those preparing for a career in data science.
Takeaways
- Machine Learning, Artificial Intelligence (AI), and Deep Learning are distinct yet interrelated fields, with Deep Learning being a subset of Machine Learning, and Machine Learning being a subset of AI.
- Bias in machine learning refers to the difference between a model's average prediction and the correct value, while Variance measures the fluctuation in the model's output, with lower values being preferable for both.
- Clustering is an unsupervised learning technique that groups similar data points together based on features and properties, with algorithms like K-Means and Mean Shift Clustering being commonly used.
- Linear Regression is a supervised learning algorithm that models the linear relationship between dependent and independent variables for predictive analysis.
- Decision Trees are a hierarchical model used to map out decisions and actions, helping to predict outcomes based on a sequence of choices.
- Overfitting occurs when a model learns the training data too well, including its noise and outliers, which can be mitigated by techniques like cross-validation.
- Hypothesis Testing in machine learning involves using a dataset to approximate an unknown target function that maps inputs to outputs effectively.
- Supervised Learning uses labeled data to train models that can predict outcomes, while Unsupervised Learning works with unlabeled data to discover underlying structures and patterns.
- The Bayes' Theorem is fundamental in machine learning, particularly for Bayesian Belief Networks and Naive Bayes classifiers, providing a way to calculate conditional probabilities.
- Principal Component Analysis (PCA) is a technique used to reduce the dimensions of multi-dimensional data by keeping only the most relevant dimensions, helping with data visualization and analysis.
- Support Vector Machines (SVM) are used for classification tasks and work by finding the hyperplane that best separates data into different classes.
- Cross-Validation is a technique to ensure that a machine learning model generalizes well to an independent dataset, involving methods like hold-out, k-fold, and leave-one-out.
- Entropy measures the randomness or unpredictability in data, with higher entropy indicating more difficulty in drawing conclusions from the data.
- Epoch refers to a complete pass through the entire training dataset in machine learning, with the number of epochs affecting the model's training.
- Variance Inflation Factor (VIF) is used to estimate the amount of multicollinearity in regression variables, helping to identify and manage it.
- Confusion Matrix is a tool used to evaluate the performance of classification models by summarizing the counts of correct and incorrect predictions.
- Type 1 and Type 2 errors refer to False Positives and False Negatives respectively, which are critical to understand when evaluating the accuracy of predictive models.
- The choice between using Classification or Regression depends on the nature of the prediction task, with regression used for numerical predictions and classification for categorical outcomes.
- Logistic Regression is used for binary or categorical dependent variables, predicting the probability of an event occurring.
- Handling Missing Values in datasets can be done using methods like detecting with `isnull()`, removing with `dropna()`, or filling with placeholder values using `fillna()` in Python's pandas library.
Q & A
What is the average salary of a machine learning engineer in the United States according to the video?
-According to the video, the average salary of a machine learning engineer in the United States is around $112,742 per year.
How much does a machine learning engineer typically earn in India per year?
-The video states that the average salary of a machine learning engineer in India is around 9 LPA (lakhs per annum).
What is the relationship between machine learning, artificial intelligence, and deep learning?
-As explained in the video, deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence. These technologies are interrelated but distinct, with overlapping terms and techniques.
What is the difference between bias and variance in machine learning?
-Bias in machine learning is the difference between the average prediction of a model and the correct value, while variance is the difference between predictions made over one training set and the values anticipated over other training sets. High bias leads to inaccurate predictions, and high variance leads to large fluctuations in the output.
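As a rough numeric illustration of these two definitions, the sketch below assumes we already have predictions for a single input from several models, each trained on a different training set. The function name `bias_and_variance` and the sample numbers are made up for illustration and do not come from the video.

```python
from statistics import mean, pvariance


def bias_and_variance(predictions, true_value):
    """Return (bias, variance) for predictions of one input point.

    predictions: outputs of models trained on different training sets
                 (hypothetical numbers, not taken from the video).
    """
    avg_prediction = mean(predictions)
    bias = avg_prediction - true_value   # average prediction vs. correct value
    variance = pvariance(predictions)    # spread of predictions across training sets
    return bias, variance


bias, variance = bias_and_variance([9.0, 11.0, 10.0, 14.0], true_value=10.0)
print(bias, variance)
```

A bias close to zero means the models are right on average; a small variance means they agree with one another regardless of which training set was used.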
Can you explain what clustering is in the context of machine learning?
-Clustering, as mentioned in the video, is an unsupervised learning technique used for grouping data points with similar features and properties into distinct categories. Algorithms like k-means and mean shift clustering help in classifying data points into their respective groups.
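The k-means idea can be sketched in a few lines for one-dimensional data. This is a minimal illustration, assuming the starting centroids are supplied by the caller; the function name `k_means_1d` and the sample points are hypothetical, and a real project would more likely use a library implementation.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around k centroids, where k = len(centroids)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = k_means_1d(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0]
)
print(centroids, clusters)
```

Here K is fixed in advance by the number of starting centroids, which is the main contrast with mean shift clustering, where the number of clusters is discovered by the algorithm.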
What is linear regression and how is it used in machine learning?
-Linear regression is a supervised machine learning algorithm used to find the linear relationship between dependent and independent variables for predictive analysis. It is represented by the equation y = a + b * x, where 'a' is the intercept, 'b' is the coefficient, 'x' is the independent variable, and 'y' is the dependent variable.
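One common way to find the best values of 'a' and 'b' is ordinary least squares. The video only says the values are adjusted to reduce prediction error, so the choice of least squares here is an assumption, as are the function name `fit_line` and the made-up height and weight figures.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / spread_x    # coefficient of x
    a = mean_y - b * mean_x      # intercept
    return a, b


heights = [150.0, 160.0, 170.0, 180.0]   # x: made-up heights in cm
weights = [50.0, 60.0, 70.0, 80.0]       # y: made-up weights in kg
a, b = fit_line(heights, weights)
predicted = a + b * 175.0                # predicted weight for a height of 175 cm
print(a, b, predicted)
```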
What is a decision tree in machine learning and how does it work?
-A decision tree in machine learning is a hierarchical diagram used to explain a sequence of actions that must be performed to get a desired output. It helps in making decisions by breaking down a complex problem into simpler steps based on a set of conditions.
What is overfitting in machine learning and how can it be avoided?
-Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, leading to poor generalization on new data. It can be avoided by using techniques like cross-validation, which divides the data set into training and testing subsets to ensure the model performs well on unseen data.
What is hypothesis testing in machine learning and what is its purpose?
-Hypothesis testing in machine learning involves using a dataset to understand a specific function that maps inputs to outputs in the best possible way, known as function approximation. The goal is to find a model that approximates the target function and performs necessary input-output mappings.
What is the main difference between supervised and unsupervised learning in machine learning?
-Supervised learning uses labeled data to train the model, providing both input and output data, with the aim of predicting outputs for new data. Unsupervised learning, on the other hand, uses unlabeled data to identify hidden trends without any feedback, aiming to extract information from unknown datasets.
What is the purpose of Principal Component Analysis (PCA) in machine learning?
-PCA is used in machine learning to reduce the dimensions of multi-dimensional data by removing irrelevant dimensions and keeping only the most relevant ones. It finds a new set of uncorrelated dimensions or orthogonal dimensions and ranks them based on variance.
What is a Support Vector Machine (SVM) and how is it used in machine learning?
-A Support Vector Machine (SVM) is a machine learning algorithm primarily used for classification tasks. It operates on high-dimensional feature spaces and is designed to find the optimal hyperplane that separates data points into different classes.
What are the different techniques of cross-validation in machine learning?
-The video mentions several cross-validation techniques: hold-out method, k-fold cross-validation, stratified k-fold cross-validation, and leave-p-out cross-validation. These methods help in evaluating the performance of a machine learning model by using different subsets of the data for training and testing.
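To make the k-fold idea concrete, the sketch below splits row indices into k consecutive folds and uses each fold once as the test set. It is a simplified illustration with no shuffling or stratification; the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(n_samples=6, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every row lands in a test set exactly once, so the model is always scored on data it did not see during training.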
What does entropy measure in the context of machine learning?
-In machine learning, entropy measures the randomness or unpredictability in the data. The higher the entropy, the more difficult it is to draw useful conclusions from the data, as it indicates a higher level of disorder or randomness.
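The video does not give a formula, but entropy is commonly computed as Shannon entropy over class labels. The sketch below assumes that definition; the function name `entropy` and the sample labels are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Each class contributes p * log2(1 / p), where p is its share of the data.
    return sum((c / total) * log2(total / c) for c in counts.values())


mixed = entropy(["yes", "yes", "no", "no"])    # evenly split labels
pure = entropy(["yes", "yes", "yes", "yes"])   # a single class only
print(mixed, pure)
```

An evenly split set is the hardest to draw conclusions from and has the highest entropy for two classes, while a set with a single class has none.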
What is an Epoch in machine learning and how is it related to training a model?
-An Epoch in machine learning refers to a complete pass through the entire training dataset. The number of epochs is the number of times the training process has worked through the entire dataset. The relationship between epochs, dataset size, iterations, and batch size can be understood through the formula D * E = I * B, where D is the dataset size, E is the number of epochs, I is the number of iterations, and B is the batch size.
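The formula D * E = I * B can be rearranged to get the total number of iterations. The sketch below assumes the dataset size is an exact multiple of the batch size; the function name `total_iterations` and the numbers are illustrative.

```python
def total_iterations(dataset_size, epochs, batch_size):
    """Solve D * E = I * B for I, the total number of iterations."""
    return dataset_size * epochs // batch_size


iterations = total_iterations(dataset_size=10_000, epochs=5, batch_size=100)
print(iterations)
```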
What is a confusion matrix and how does it help in evaluating a classification model?
-A confusion matrix is a tool used to evaluate the performance of a classification model by summarizing the predictions and comparing them with the actual outcomes. It provides counts of correct and incorrect predictions and helps identify the uncertainty between classes, contributing to the calculation of accuracy and other performance metrics.
What are Type 1 and Type 2 errors in the context of testing and evaluation?
-Type 1 error, also known as a false positive, occurs when a test incorrectly indicates that a condition is present when it is not. Type 2 error, or false negative, happens when a test fails to detect a condition that is actually present. These errors are important considerations in the evaluation and interpretation of test results.
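The confusion matrix and the two error types can be tied together with a small counting sketch for a binary classifier. The label lists are made up, and the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1
        elif p == 1 and a == 0:
            counts["FP"] += 1   # Type 1 error: predicted positive, actually negative
        elif p == 0 and a == 1:
            counts["FN"] += 1   # Type 2 error: predicted negative, actually positive
        else:
            counts["TN"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```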
When should classification be used over regression in predictive modeling?
-Classification should be used over regression when the task involves predicting categorical or discrete outcomes, such as determining whether an event belongs to a specific category. Regression, on the other hand, is used for predicting continuous numerical values, like the price of a house.
What is logistic regression and how is it different from linear regression?
-Logistic regression is a type of regression analysis used when the dependent variable is categorical or binary. Unlike linear regression, which predicts continuous outcomes, logistic regression is used to predict the probability of a certain class or event occurring and is particularly useful for binary classification tasks.
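Logistic regression turns a linear term into a probability by passing it through the logistic (sigmoid) function. The sketch below assumes a single input and hand-picked coefficients rather than fitted ones; the function name `predict_probability` and the 0.5 threshold are illustrative choices.

```python
from math import exp


def predict_probability(x, intercept, coefficient):
    """Logistic (sigmoid) function applied to the linear term a + b * x."""
    return 1.0 / (1.0 + exp(-(intercept + coefficient * x)))


p = predict_probability(x=2.0, intercept=-4.0, coefficient=2.0)
label = 1 if p >= 0.5 else 0   # threshold the probability to get a class
print(p, label)
```

The output always lies between 0 and 1, which is why it can be read as the probability of the positive class instead of a continuous prediction.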
How can missing values in a dataset be handled using Python's pandas library?
-In Python's pandas library, missing values can be handled using functions like 'isnull()' to detect missing values, 'dropna()' to remove rows or columns with null values, and 'fillna()' to fill missing values with placeholder values or statistics like mean or median.
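A short sketch of the three pandas calls named above, using a tiny made-up DataFrame. The column names and values are illustrative, and filling with the column mean is only one of several reasonable choices.

```python
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "city": ["Pune", "Delhi", None],
})

missing_per_column = df.isnull().sum()            # count missing values per column
rows_without_missing = df.dropna()                # drop rows containing any missing value
age_filled = df["age"].fillna(df["age"].mean())   # fill missing ages with the column mean

print(missing_per_column)
print(rows_without_missing)
print(age_filled)
```

Dropping rows is simplest but discards data, so filling is often preferred when only a few values are missing.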
Outlines
Introduction to Machine Learning Interview Questions
The video begins with an introduction to machine learning as an exciting field that enables computers to learn from data and make decisions without explicit programming. It highlights the popularity and high demand for machine learning engineers across various industries and mentions average salaries in the United States and India. The video then invites viewers to subscribe and prepare for a series of interview questions starting with the basic definitions of machine learning, artificial intelligence, and deep learning, emphasizing their differences and relationships.
Understanding Bias and Variance in Machine Learning
This section delves into the concepts of bias and variance in machine learning. Bias refers to the difference between a model's average predictions and the correct value, with a lower bias indicating more accurate predictions. Variance is the difference between predictions over a training set and the expected value from other training sets, with a lower variance desired to avoid large fluctuations in output. The video explains the trade-off between bias and variance and provides a visual representation to help viewers understand the balance needed for optimal model performance.
Clustering Techniques in Machine Learning
The script explains clustering as an unsupervised learning technique that groups similar data points together. It introduces k-means clustering as a common algorithm used to find hidden patterns in data and classify it into groups based on feature similarity. The mean shift clustering is also mentioned, which differs from k-means by automatically discovering the number of clusters without pre-specifying it. The video aims to provide a brief understanding of clustering algorithms and their applications in machine learning.
Linear Regression and Decision Trees in Machine Learning
The video discusses linear regression, a supervised learning algorithm used to find linear relationships between dependent and independent variables for predictive analysis. It uses the equation y = a + b*x to illustrate this relationship and explains the process of finding the best fit line by adjusting coefficients. The decision tree is also introduced as a hierarchical diagram that represents a sequence of actions to achieve a desired output, using an example of driving with or without a license to demonstrate its application in machine learning.
Overfitting, Hypothesis Testing, and Learning Types
This part of the video addresses the issue of overfitting, which occurs when a model learns from an inadequate dataset, and how it can be mitigated using cross-validation techniques. Hypothesis testing is introduced as a method to understand the function that maps inputs to outputs, with the model approximating this function. The difference between supervised and unsupervised learning is also explained, with supervised learning using labeled data for training and unsupervised learning identifying hidden trends in unlabeled data.
Principal Component Analysis (PCA) and Support Vector Machines (SVM)
The script introduces principal component analysis (PCA) as a method to reduce the dimensions of multi-dimensional data by removing irrelevant dimensions and keeping the most relevant ones, which is essential for data visualization and analysis. Support vector machines (SVM) are then discussed as a classification algorithm used in high-dimensional spaces, highlighting their role in machine learning for classification tasks.
Cross-Validation and Entropy in Machine Learning
Cross-validation is explained as a technique to enhance the performance of a machine learning algorithm by breaking the dataset into smaller parts for training and testing. Various cross-validation techniques are mentioned, including hold-out method, k-fold cross-validation, and others. Entropy is introduced as a measure of randomness in data, with higher entropy indicating more difficulty in drawing conclusions from the data.
Epoch, Variance Inflation Factor, and Confusion Matrix
The concept of an epoch in machine learning is defined as the count of passes over a training dataset, with the relationship between epochs, iterations, and batch size explained through a formula. The variance inflation factor is introduced as an estimate of multicollinearity in regression variables. The confusion matrix is also discussed as a tool to evaluate the performance of classification models by providing a summary of correct and incorrect predictions.
Type 1 and Type 2 Errors, and Choosing Between Classification and Regression
The video clarifies type 1 and type 2 errors, with type 1 being a false positive and type 2 being a false negative, using examples to illustrate each. It also discusses when to use classification over regression, explaining that classification is for identifying groups while regression is for predicting numerical outcomes, with examples provided to distinguish between the two.
Logistic Regression and Handling Missing Values
Logistic regression is introduced as a method for predictive analysis when the dependent variable is categorical or binary, used to predict probabilities of categorical outcomes. The video provides examples of its application, such as predicting seniority or disease presence. The final topic is handling missing values in a dataset using Python's pandas library, with methods like 'isnull' for detection, 'dropna' for removal, and 'fillna' for replacing missing values.
Keywords
Machine Learning
Artificial Intelligence
Deep Learning
Bias
Variance
Clustering
Linear Regression
Decision Tree
Overfitting
Hypothesis Testing
Supervised Learning
Unsupervised Learning
PCA (Principal Component Analysis)
SVM (Support Vector Machine)
Cross-Validation
Entropy
Epoch
Variance Inflation Factor (VIF)
Confusion Matrix
Type 1 and Type 2 Errors
Logistic Regression
Handling Missing Values
Highlights
Machine learning is an exciting field involving algorithms and statistical models that enable computers to learn from data.
Machine learning is popular in various industries such as Finance, Healthcare, and e-commerce.
The average salary of a machine learning engineer in the United States is around $112,742 per year.
In India, the average salary for a machine learning engineer is approximately 9 LPA (lakhs per annum).
Machine learning, artificial intelligence, and deep learning are distinct yet interrelated technologies.
Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.
Bias in machine learning refers to the difference between a model's average prediction and the correct value.
Variance in machine learning is the difference between predictions over one training set and the values anticipated over other training sets.
Clustering is an unsupervised learning technique used for grouping similar data points.
K-means clustering is a popular algorithm for finding hidden patterns in data and classifying it into various groups.
Linear regression is a supervised machine learning algorithm used to find the linear relationship between variables for predictive analysis.
Decision trees are used to explain a sequence of actions that must be performed to get the desired output.
Overfitting occurs when a machine learning model learns from an inadequate dataset.
Hypothesis testing in machine learning involves using a dataset to understand a specific function that maps input to output.
Supervised learning uses labeled data to train models and confirm the correctness of predictions.
Unsupervised learning uses unlabeled data to identify hidden trends without feedback.
Bayes' theorem provides the probability of an event occurring using prior knowledge.
Principal Component Analysis (PCA) is used to reduce the dimension of data by keeping only the most relevant dimensions.
Support Vector Machines (SVM) are used for classification on high-dimensional characteristic vectors.
Cross-validation is used to increase the performance of a machine learning algorithm by using sample data.
Entropy in machine learning measures the randomness in data, affecting the ease of drawing conclusions.
An Epoch in machine learning indicates the count of passes in a training dataset by the algorithm.
Variance Inflation Factor (VIF) estimates the amount of multicollinearity in regression variables.
A confusion matrix is used to explain a model's performance and summarize predictions of classification problems.
Type 1 error (false positive) and Type 2 error (false negative) are significant concepts in understanding test outcomes.
Classification should be used over regression when predicting categorical outcomes, while regression is for numerical predictions.
Logistic regression is used for binary or categorical dependent variables and predicts probabilities of outcomes.
Handling missing values in a dataset can be done using methods like isnull, dropna, and fillna in Python pandas.
Transcripts
[Music]
hello everyone and welcome to today's
video on machine learning interview
questions on Intellipaat
do you know friends that machine
learning is an exciting field that
involves developing algorithms and
statistical models that enable
computers to learn from data and make
predictions or decisions without being
explicitly programmed machine learning
is becoming increasingly popular in a wide
range of Industries including Finance
Healthcare and e-commerce
according to PayScale the average
salary of a machine learning engineer in
the United States is around 112,742
dollars per year while at the upper end
it's around 160,000 dollars per year in India
the average salary of a machine learning
engineer is around 9 LPA
so without further ado let's dive in
and discuss our interview questions but
before that do not forget to hit the
Subscribe button and click the Bell icon
so let's start with machine learning
interview question here is your first
question it is a kind of pretty basic
question and the question is explain
machine learning artificial intelligence
and deep learning
it is very common to get confused
between the three in-demand Technologies
which are machine learning artificial
intelligence and deep learning these
three technologies though a little
different from one another are
quite interrelated while deep learning
is a subset of machine learning machine
learning is a subset of artificial
intelligence some terms and techniques
May overlap in these Technologies and it
is quite easy to get confused among them
so let's learn about these Technologies
if I talk about machine learning machine
learning involves various statistical
and deep learning techniques that allow
machines to use their past experiences
and get better at performing specific
tasks without having to be
monitored if I talk about artificial
intelligence artificial intelligence
uses numerous machine learning and deep
learning techniques that enable computer
systems to perform tasks using
human-like intelligence with logic and
rules
if I talk about deep learning then deep
learning comprises several algorithms
that enable software to learn by
itself and perform various business
tasks including image and speech
recognition deep learning is possible
when systems expose their multi-layered
neural networks to a large volume of
data
I hope so guys you would have got brief
idea regarding machine learning
artificial intelligence and deep
learning
so our next question is what is the
difference between bias and variance in
machine learning
the answer to the same question is that
bias is the difference between the average
prediction of a model and the correct
value if the bias value is
high the prediction of the model is not
accurate hence the bias value should be
as low as possible to make the desired
predictions if I talk about the variance
variance is a number that gives the
difference between predictions over one
training set and the anticipated value over
other training sets high variance may
lead to large fluctuations in the output
therefore a model's output should have a
low variance in the
diagram you can see the following
trade-off so here is the bias and variance
trade-off the desired result is in
the blue circle at the center if you move
away from the blue circle then the
predictions go wrong I hope so guys
you would have got a brief idea
regarding the difference between bias
and variance in the machine learning now
let's move on to our next question
our next question is what is clustering
in machine learning
if I talk about clustering clustering is
a technique which is used in
unsupervised learning that involves
grouping data points the clustering
algorithms can be used with a set of
data points this technique will allow
you to classify all the data points into
their particular groups the data points
that are thrown into the same category
have similar features and properties
while the data points that belong to a
different group have distinct features
and properties statistical data analysis
can be performed by this method let us
look at some examples one
example is k-means clustering if I
talk about k-means clustering this
algorithm is commonly used when
there is data with no specific group
or category k-means clustering allows
you to find the hidden patterns in the
data which can be used to classify the
data into the various groups the
variable K is used to represent the
number of groups the data is divided
into and the data points are clustered
using the similarity of features here
the centroids of the Clusters are used
for labeling new data another clustering
algorithm is mean shift clustering
if I talk about mean shift
clustering the main aim of this
algorithm is to update the center
point candidates to be the mean and
find the center points of all the groups
in mean shift clustering unlike k-means
clustering the possible number of
clusters need not be selected as it
can be automatically discovered by
the mean shift so here are some of the
examples of the clustering algorithms I
hope so guys you would have got a brief
idea regarding what is clustering in
machine learning now let's move on and
discuss our next question
which is what is linear regression in
machine learning this is one of the most
popular questions asked in
machine learning interviews now let's
discuss the answer to the same
linear regression is a supervised machine
learning algorithm which is used to find
the linear relationship between the
dependent and the independent variables
for predictive analysis
here consider the equation y
equals a plus b times x where x is the
input or the independent variable
y is the output or the dependent variable
a is the intercept and b is the
coefficient of x you can see here the
diagram with the best fit line shows the
data of the weight y the dependent
variable and the height x the
independent variable
here the straight line shows the
best linear relationship that would help
in predicting the weight of the
candidates according to their height to
get this best fit line the best values
of A and B should be found by adjusting
the values of A and B the errors in the
prediction of Y can be reduced this is
how the linear regression helps in
finding the linear relationship and
predicting the output I hope so guys you
would have got a brief idea regarding
what is linear regression in machine
learning now let's move forward and
discuss our next question
our next question is what is decision
Tree in machine learning if I talk about
the decision tree a decision tree is
used to explain a sequence of actions
that must be performed to get the
desired output it is a hierarchical
diagram that shows the actions an
algorithm can be created for a decision
tree on the basis of the set
hierarchy of actions in the
decision tree diagram a sequence of
actions has been made for driving a
vehicle with or without a license so you
can see how the decision tree algorithm
works in the machine learning domain I
hope so guys you would have got a brief
idea regarding what is decision Tree in
machine learning now let's move forward
and discuss our next question
our next question is what is overfitting
in machine learning
actually overfitting happens when a
machine learning model has an inadequate data
set and tries to learn from it so
overfitting is inversely proportional to
the amount of data for small data sets
overfitting can be bypassed by the cross
validation method in this approach a
data set is divided into two sections
these two sections will comprise the
testing and training data set to train
the model the training data set is used
and for testing of the model for new
inputs the testing data set is used this
is how we can avoid the overfitting I
hope so guys you would have got a brief
idea regarding what is overfitting in
machine learning now let's move forward
and discuss our next question
our next question is what is hypothesis
testing if I talk about the hypothesis
testing machine learning allows the use
of available data set to understand a
specific function that Maps the input to
the output in the best possible way this
problem is known as a function
approximation here an approximation
needs to be used for the unknown target
function that maps all plausible
observations based on a given problem in
the best manner hypothesis in machine
learning is a model that helps in
approximating the Target function and
Performing the necessary input to Output
mappings the choice and configuration of
algorithms allow defining the space of
the plausible hypothesis that may be
represented by the model in hypothesis notation
the lowercase h is used for a specific
hypothesis whereas the uppercase or
capital H is used for the hypothesis
space that is being searched so this is
what exactly the hypothesis testing is I
hope so guys you would have got a fair
idea regarding what exactly is
hypothesis testing now let's move
forward and discuss our next question
so the next question is what is the
difference between supervised and
unsupervised learning if I talk about
supervised learning the algorithms
of supervised learning use
labeled data to get trained the model
takes direct feedback to confirm
whether the output that is being predicted
is indeed correct moreover both the
input data and the output data are
provided to the model and the main aim
here is to train the model to predict
the output upon receiving the new data
supervised learning offers accurate
results and can largely be divided into
two parts which is classification and
regression if I talk about the
unsupervised learning the algorithms of
the unsupervised learning use unlabeled
data for training purposes in
unsupervised learning the models
identify hidden data Trends and do not
take any feedback the unsupervised
learning model is only provided with
input data unsupervised learning's main
aim is to identify hidden patterns to
extract information from the unknown
sets of data it can also be classified
into two parts which is clustering and
association unfortunately unsupervised
learning can offer results that are
comparatively less accurate I hope so
guys you would have got a brief idea
regarding what is the difference between
supervised and unsupervised learning now
let's move on and discuss our next
question
so our next question is what is Bayes'
theorem if I talk about Bayes' theorem
Bayes' theorem offers the probability of
any given event occurring using
prior knowledge in mathematical terms it
can be defined as the true positive rate
of the given condition sample divided
by the sum of the true positive rate of
the said condition and the false positive
rate of the entire population
two of the most significant applications
of Bayes' theorem in machine learning are
and Bayesian belief networks this theorem
is also the foundation behind the
machine learning branch that involves the
naive Bayes classifier so as you can see
the formula here P(A|B) equals
P(B|A) times P(A) divided by P(B) where
P(A|B) is the probability of A occurring
given the evidence B has already
occurred P(B|A) is the
probability of B occurring given the
evidence A has already occurred P(A)
is the probability of A occurring and P(B)
is the probability of B occurring I hope
so guys you would have got the idea
regarding what is Bayes' theorem now let's
move on and discuss our next question
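
To make the Bayes' theorem formula concrete, here is a minimal sketch in Python. The function name `bayes_posterior` and the diagnostic-test numbers are assumptions chosen for illustration; they do not come from the video.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers for a diagnostic test (illustrative only).
prevalence = 0.01           # P(A): the person has the condition
true_positive_rate = 0.95   # P(B|A): test is positive given the condition
false_positive_rate = 0.05  # P(B|not A): test is positive without the condition

# P(B): overall probability of a positive test (law of total probability).
p_positive = (true_positive_rate * prevalence
              + false_positive_rate * (1 - prevalence))

posterior = bayes_posterior(true_positive_rate, prevalence, p_positive)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed numbers, the posterior is much lower than the true positive rate, because the prior probability of the condition is small.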
So our next question is: what is PCA in machine learning? Multi-dimensional data plays an important role in the real world. Data visualization and computation become more challenging as the number of dimensions increases. In such scenarios, the dimensionality of the data might have to be reduced to analyze and visualize it easily. This is done by removing irrelevant dimensions and keeping only the most relevant ones. This is where principal component analysis (PCA) is used. The goal of PCA is to find a fresh collection of uncorrelated, or orthogonal, dimensions and rank them on the basis of variance, which defines the process of PCA in machine learning.

I hope you now have a brief idea of what PCA in machine learning is. Now let's move on and discuss our next question.
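
The PCA idea described above (find uncorrelated directions, rank them by variance, keep the top ones) could be sketched as follows. This is one possible implementation using NumPy's eigendecomposition of the covariance matrix; the helper name `pca` and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components.

    Returns the projected data and all eigenvalues (variances),
    sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # center each column
    cov = np.cov(X_centered, rowvar=False)   # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # rank by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # keep the most relevant directions
    return X_centered @ components, eigvals


# Synthetic 3-D data in which the second column is almost a multiple of the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.05, size=200),
])

projected, variances = pca(data, n_components=2)
print(projected.shape)
print(variances)
```

The projected columns are uncorrelated with each other, and the first one carries most of the variance.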
Our next question is: what is SVM in machine learning? SVM, or support vector machine, is a machine learning algorithm that is mainly used for classification. It is applied on top of the high-dimensional feature vector, which basically defines what SVM is in machine learning.

I hope you now have a fair idea of what support vector machines are in machine learning. Now let's move on and discuss our next question.
So our next question is: what is cross-validation in machine learning? Cross-validation lets a system estimate, and so improve, the performance of a given machine learning algorithm, which is fed a number of samples from the dataset. This sampling process breaks the dataset into smaller parts that have the same number of rows, out of which a random part is selected as the test set and the rest of the parts are kept as training sets. Cross-validation consists of the following techniques: the holdout method, k-fold cross-validation, stratified k-fold cross-validation, and leave-p-out cross-validation.

I hope you now have a fair idea of what cross-validation in machine learning is.
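
As a rough sketch of the k-fold technique mentioned above, the following standard-library Python splits row indices into k parts of (nearly) equal size and lets each part serve once as the test set. The function name `k_fold_splits` and the choice of 10 rows and 5 folds are assumptions for illustration.

```python
import random


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)     # randomize row order once

    # Cut the shuffled indices into k folds of (nearly) equal size.
    fold_size, remainder = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold is the test set once; the remaining folds form the training set.
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_splits(n_rows=10, k=5))
for train_idx, test_idx in splits:
    print("test rows:", sorted(test_idx), "| train size:", len(train_idx))
```

In practice a model would be trained on each training set and scored on the matching test set, and the k scores would be averaged.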
Our next question is: what is entropy in machine learning? Entropy in machine learning measures the randomness in the data that needs to be processed. The more entropy there is in the given data, the more difficult it becomes to draw any useful conclusion from it. For example, take the flipping of a coin. The result of this act is random, and it does not favor heads or tails. Here, the result of any number of tosses cannot be predicted easily, as there is no definite relationship between the action of flipping and the possible outcomes.

I hope you now have a brief idea of entropy in machine learning. Now let's move forward and discuss our next question. Our next question is: what is an epoch in machine learning?
If I talk about an epoch in machine learning, it is used to indicate the count of passes over a given training dataset in which the machine learning algorithm has done its job. Generally, when there is a large chunk of data, it is grouped into several batches, and all these batches go through the given model. The pass of each batch through the model is referred to as an iteration.

Now, if the batch size comprises the complete training dataset, the count of iterations is the same as the number of epochs. In case there is more than one batch, the relationship is $d \times e = i \times b$, where $d$ is the dataset size, $e$ is the number of epochs, $i$ is the number of iterations, and $b$ is the batch size. So this equation defines the relationship between the epochs, the dataset, the iterations, and the batch size.

I hope you now have a fair idea of what an epoch in machine learning is. Now let's move on and discuss our next question.
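
The relationship d × e = i × b can be checked with a few lines of Python. This is a small sketch with assumed numbers (1000 samples, 5 epochs, batch size 100); the helper name `iterations_needed` is hypothetical, and the exact equality is assumed to hold only when the batch size divides the dataset size evenly.

```python
def iterations_needed(dataset_size, epochs, batch_size):
    """Solve d * e = i * b for i (the number of iterations)."""
    if dataset_size % batch_size != 0:
        # Assumption of this sketch: batches divide the dataset evenly.
        raise ValueError("batch_size must divide dataset_size evenly")
    return dataset_size * epochs // batch_size


d, e, b = 1000, 5, 100           # illustrative values only
i = iterations_needed(d, e, b)
print("iterations:", i)

# When one batch holds the whole dataset, iterations equal epochs.
full_batch_iterations = iterations_needed(d, e, batch_size=d)
print("iterations with a single full batch:", full_batch_iterations)
```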
Our next question is: what is the variance inflation factor? The variance inflation factor (VIF) is an estimate of the amount of multicollinearity in a collection of many regression variables, where $\text{VIF} = \frac{\text{variance of the model}}{\text{variance of the model with a single independent variable}}$.

That's it for the variance inflation factor. I hope you now have a brief idea of what the variance inflation factor is. It is one of the most important questions that can be asked in a machine learning interview. Now let's move on and discuss our next question.
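
For a hands-on view of the variance inflation factor, here is a sketch that uses the widely used per-variable form VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on all the other variables. That form is not stated in the video and is swapped in here as the usual way to compute VIF; the function name `variance_inflation_factors` and the synthetic data are also assumptions.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X, using VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Least-squares fit of column j on the other columns plus an intercept.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


# Synthetic data: x2 is nearly a copy of x1, while x3 is independent of both.
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print([round(v, 2) for v in vifs])
```

The two nearly collinear columns get large VIF values, while the independent column stays close to 1.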
So our next question is: what is a confusion matrix? A confusion matrix is used to explain a model's performance and gives a summary of predictions on classification problems. It assists in identifying the uncertainty between classes. A confusion matrix gives the count of correct and incorrect values and the error types according to the model, where, as you can see, accuracy is defined as $\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.

I hope you now have a fair idea of what a confusion matrix is. Now let's move on and discuss our next question.

So our next question is: what are type 1 and type 2 errors? A type 1 error, or false positive, is an error where the outcome of a test shows the non-acceptance of a true condition. For example, suppose a person gets diagnosed with depression even when they are not suffering from it. That is a case of a false positive.

A type 2 error, or false negative, is an error where the outcome of a test shows the acceptance of a false condition. For example, the CT scan of a person shows that they do not have a disease, but in fact they do have it. Here the test accepts the false condition that the person does not have the disease. This is a case of a false negative.

I hope you now have an idea of what type 1 and type 2 errors are. Now let's move on and discuss our next question.
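
To tie the confusion matrix to type 1 and type 2 errors, the following sketch counts the four cells for a binary classifier and computes accuracy as (TP + TN) / (TP + TN + FP + FN). The labels and predictions are made-up values, and the helper name `confusion_counts` is an assumption for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP (type 1 errors), and FN (type 2 errors)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Made-up ground truth and model predictions (1 = has the disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())

print(counts)
print("type 1 errors (false positives):", counts["FP"])
print("type 2 errors (false negatives):", counts["FN"])
print("accuracy:", accuracy)
```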
So our next question is: when should classification be used over regression? Both classification and regression are associated with prediction. Classification involves identifying values or entities that lie in a specific group. Regression entails predicting a response value from a continuous set of outcomes. For example, if you want to predict the price of a house, you should use regression, since the price is a numerical variable. However, if you are trying to predict whether a house situated in a particular area is going to be high, medium, or low priced, then a classification model should be used.

I hope you now have a fair idea of when to use classification over regression. Now let's move forward and discuss our next question.
So our next question is: explain logistic regression. This is also one of the most asked questions in a machine learning interview. Logistic regression is the appropriate regression analysis when the dependent variable is categorical or binary. Like all regression analyses, logistic regression is a technique for predictive analysis. It is used to explain data and the relationship between one dependent binary variable and one or more independent variables. Logistic regression is also employed to predict the probability of a categorical dependent variable. It can be used in scenarios such as predicting whether a citizen is a senior or not, or checking whether a person has a disease or not.

I hope you now have a fair idea of what exactly logistic regression is. Now let's move on and discuss our next question.
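
The senior-citizen example can be turned into a tiny logistic regression sketch. The code below fits one weight and one bias with plain gradient descent using only the standard library. The ages, labels, learning rate, and step count are all assumptions picked for illustration, not values from the video, and a real project would more likely use an established library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # prediction minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Made-up data: age in years, label 1 = senior citizen.
ages = [25, 32, 41, 50, 68, 72, 77, 85]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Standardize the single feature so gradient descent behaves well.
mean_age = sum(ages) / len(ages)
std_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / len(ages))
scaled = [(a - mean_age) / std_age for a in ages]

w, b = fit_logistic(scaled, labels)


def predict_proba(age):
    """Estimated probability that a person of this age is a senior."""
    return sigmoid(w * (age - mean_age) / std_age + b)


predictions = [1 if predict_proba(a) >= 0.5 else 0 for a in ages]
print(predictions)
print(round(predict_proba(30), 3), round(predict_proba(80), 3))
```

The model outputs a probability between 0 and 1, and thresholding it at 0.5 gives the binary class.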
So our final question is: how do I handle missing values in a dataset? To answer this question correctly: in Python pandas, there are two possible methods to locate the lost or corrupted data and discard those values. The first function is `isnull()`, which can be used for detecting the missing values. The second one is `dropna()`, which can be used for removing a column or row with null values. There is also another, `fillna()`, which can be used to fill the missing values with placeholder values.

I hope you now have a fair idea of how to handle the missing values in a dataset.
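
Here is a short sketch of the three pandas methods named above, `isnull()`, `dropna()`, and `fillna()`, applied to a small made-up DataFrame. The column names and the placeholder values are assumptions for illustration.

```python
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, None, 41, 50],
    "city": ["Pune", "Delhi", None, "Chennai"],
})

# isnull(): boolean mask of missing cells; summing it counts them per column.
missing_per_column = df.isnull().sum()

# dropna(): remove rows (default) or columns (axis=1) that contain nulls.
rows_dropped = df.dropna()
cols_dropped = df.dropna(axis=1)

# fillna(): replace nulls with placeholders, here the column mean and a label.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(rows_dropped)
print(filled)
```

Whether to drop or fill depends on how much data is missing and how the column is used later. Dropping is simpler, but it discards the other values in those rows or columns.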
Thank you guys for watching this video. That was all for today's session. I hope you enjoyed today's video on machine learning interview questions. If you want to make a career in data science, Intellipaat has the IIT Madras Advanced Data Science and AI certification program. This course is of very high quality and cost effective, as it is taught by IIT professors and industry experts. Thank you.