Structural Equation Modeling: what is it and what can we use it for? (part 1 of 6)
Summary
TLDR: Structural Equation Modeling (SEM) is a comprehensive framework integrating multivariate techniques from various disciplines. It is particularly useful for analyzing complex constructs and relationships, including indirect effects. SEM uses latent variables and path analysis to model causal systems and correct for measurement errors, with software like AMOS and LISREL available for implementation.
Takeaways
- 🔍 Structural Equation Modeling (SEM) is a comprehensive framework that integrates various multivariate techniques from disciplines such as psychology, statistics, and econometrics.
- 📊 SEM is not a single technique but a dynamic modeling environment that evolves to incorporate new ways of fitting models over time.
- 🎯 SEM is particularly suitable for addressing complex research questions involving multifaceted constructs, such as psychological concepts, and for modeling causal systems with multiple outcomes or dependent variables.
- 🔍 SEM is adept at handling indirect or mediated effects, where the effect of one variable on another is transmitted through a third variable.
- 📚 SEM is known by various names, including covariance structure analysis, analysis of moment structures, and LISREL models, reflecting its diverse origins and applications.
- 💻 There are numerous software packages available for SEM, each with its own advantages and disadvantages, and some offer free versions for students to explore.
- 🛣 SEM can be thought of as path analysis using latent variables, which are underlying constructs that influence attitudes and behaviors but are not directly observable.
- 📝 Latent variables are measured using observable indicators, such as questionnaire items, which are believed to be caused by the underlying latent constructs.
- 🔧 SEM employs a true score equation, where the observed variable is comprised of a true score and error, and the goal is to isolate the true score while removing the error variance.
- 📈 The use of multiple indicators for latent constructs in SEM allows for the correction of measurement errors and provides a better estimate of the true score.
- 🔗 Path analysis, a key component of SEM, visually represents models diagrammatically, emphasizing direct, indirect, and total effects in the relationships between variables.
Q & A
What is the primary distinction of Structural Equation Modeling (SEM) compared to other statistical techniques?
-SEM is not a single technique but a general modeling framework that integrates various multivariate techniques into one environment, drawing on disciplines like psychology, statistics, epidemiology, and econometrics.
How does SEM differ from traditional statistical methods in terms of the research questions it can address?
-SEM is particularly suitable for addressing complex research questions involving multifaceted constructs, systems of relationships, causal systems, and indirect or mediated effects, which might be more challenging with traditional statistical methods.
What are some alternative names for SEM found in literature, and why might they be used?
-SEM is also known as covariance structure analysis, analysis of moment structures, and the LISREL model, the last named after LISREL, the first software for fitting SEMs. The term causal modeling is sometimes used but is controversial, as claims of causality come from the research design, not the statistical model.
What are some of the software packages available for conducting SEM analysis?
-Several software packages are available for SEM, including LISREL, Mplus, EQS, AMOS, CALIS, Stata, and R. Each package has its advantages and disadvantages, and some offer free or limited versions for students.
How is SEM defined in terms of path analysis and latent variables?
-SEM can be thought of as path analysis using latent variables, which are hypothetical or not directly observable constructs that are measured using observable indicators.
Why are latent variables important in social science research, and how are they measured?
-Latent variables are important because many social science concepts, like intelligence or trust, are not directly observable. They are measured using observable indicators, such as questionnaire items, which are believed to be caused by the underlying latent constructs.
What is the significance of the true score equation X = t + e in SEM?
-The true score equation represents the idea that the observed variable X is comprised of a true score (t), which is the individual's actual level on the measured construct, and error (e), which includes random and systematic error variance.
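The distinction between random and systematic error in X = t + e can be illustrated with a small simulation. This is only a sketch under stated assumptions: the happiness scale, the error standard deviation and the 0.5-point bias are invented for illustration and do not come from the video, and in real data t is never observed directly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 10_000

# Hypothetical true happiness scores t (unobservable in real data).
t = [random.gauss(6.0, 1.5) for _ in range(n)]

# Random error: zero-mean noise, as likely to raise a rating as to lower it.
e_random = [random.gauss(0.0, 1.0) for _ in range(n)]

# Systematic error: an assumed constant upward bias, e.g. respondents
# overstating their happiness to an interviewer.
bias = 0.5

# Observed score X = t + e, where e = random error + systematic error.
x = [ti + ei + bias for ti, ei in zip(t, e_random)]

# The random part of the error averages out to roughly zero ...
mean_random_error = statistics.fmean(e_random)
# ... but the total error does not: its mean stays near the bias.
mean_total_error = statistics.fmean(xi - ti for xi, ti in zip(x, t))

# The variance of X mixes true-score variance and error variance.
var_t = statistics.pvariance(t)
var_x = statistics.pvariance(x)
true_score_share = var_t / var_x

print(f"mean random error: {mean_random_error:.3f}")
print(f"mean total error:  {mean_total_error:.3f}")
print(f"share of Var(X) due to true score: {true_score_share:.3f}")
```

Note that a constant bias shifts the mean of X but adds nothing to its variance, so in this sketch the extra variance in X comes entirely from the random error.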
Why is it necessary to have multiple indicators for latent constructs in SEM?
-Having multiple indicators is necessary to over-identify the true score equation, allowing for the estimation of true scores and error variance for each indicator, and providing a better measure of the concept by correcting for measurement error.
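The "unknowns versus knowns" argument can be made concrete with a simple counting sketch. The counting rule used here is a standard one that the video does not spell out, so treat it as an assumption: for a one-factor model with p indicators, the known quantities are the p(p + 1)/2 distinct variances and covariances, and the unknowns are p factor loadings plus p error variances, with the factor variance fixed to 1 to set its scale. The function name is hypothetical.

```python
def one_factor_counts(p):
    """Count knowns and unknowns for a one-factor model with p indicators.

    Assumes the factor variance is fixed to 1, so the free parameters
    are p loadings and p error variances.
    """
    knowns = p * (p + 1) // 2      # distinct variances and covariances
    unknowns = 2 * p               # loadings + error variances
    return knowns, unknowns, knowns - unknowns


for p in range(1, 5):
    knowns, unknowns, df = one_factor_counts(p)
    if df < 0:
        status = "unidentified"
    elif df == 0:
        status = "just identified"
    else:
        status = "over-identified"
    print(f"{p} indicator(s): {knowns} known, {unknowns} unknown -> {status}")
```

With one indicator this reproduces the case described in the video: one known and two unknowns. Having at least as many knowns as unknowns is a necessary condition for identification, not a guarantee of it.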
How does the use of multiple indicators benefit the measurement of latent constructs in SEM?
-Multiple indicators help to capture the complexity of the construct, reduce random error in the measurement, and provide more precise and accurate estimates of the construct, leading to better model estimates and less bias in effect sizes.
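The bias in effect sizes mentioned above (attenuation when a predictor is measured with random error) can be seen in a short simulation. This is an illustrative sketch, not the presenter's example: the true effect of 0.5, the unit variances and the helper `ols_slope` are all assumptions made for the demonstration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


n = 20_000
true_effect = 0.5  # assumed population effect of the true score on y

# True scores t, and an outcome y that depends on t.
t = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [true_effect * ti + random.gauss(0.0, 1.0) for ti in t]

# Observed predictor: true score plus random measurement error
# with the same variance as the true score.
x_noisy = [ti + random.gauss(0.0, 1.0) for ti in t]

slope_true = ols_slope(t, y)         # regression on the error-free score
slope_noisy = ols_slope(x_noisy, y)  # regression on the error-laden score

print(f"slope using true score:     {slope_true:.3f}")
print(f"slope using observed score: {slope_noisy:.3f}")
```

Because half of the variance in the observed predictor is error in this setup, the estimated slope should shrink toward zero by roughly that proportion, which is the underestimation of effect sizes the answer refers to.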
What is the role of path analysis in SEM, and how does it differ from traditional regression analysis?
-Path analysis in SEM is used to represent the model diagrammatically, focusing on both direct and indirect effects in the relationships between variables. It differs from traditional regression analysis by providing a visual representation and allowing for the examination of complex causal pathways.
How can indirect effects be identified and calculated in SEM?
-Indirect effects can be identified through path diagrams that show the pathways between variables. They are calculated by multiplying the direct effects along the pathway (e.g., beta2 * beta3 in the example given), and the total effect is the sum of direct and indirect effects.
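The decomposition into direct, indirect and total effects can be sketched with simulated data for the three-variable model (x1 affects y directly and also through x2). The coefficient values 0.3, 0.6 and 0.5 and the helper `cov` are assumptions chosen for illustration; the regressions are fitted with plain least squares rather than SEM software.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


n = 20_000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumed path x1 -> x2 (beta2 = 0.6).
x2 = [0.6 * a + random.gauss(0.0, 1.0) for a in x1]
# Assumed paths x1 -> y (beta1 = 0.3) and x2 -> y (beta3 = 0.5).
y = [0.3 * a + 0.5 * b + random.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

# beta2: regression of x2 on x1.
beta2 = cov(x1, x2) / cov(x1, x1)

# beta1 and beta3: regression of y on x1 and x2 (two-predictor normal equations).
det = cov(x1, x1) * cov(x2, x2) - cov(x1, x2) ** 2
beta1 = (cov(x2, x2) * cov(x1, y) - cov(x1, x2) * cov(x2, y)) / det
beta3 = (cov(x1, x1) * cov(x2, y) - cov(x1, x2) * cov(x1, y)) / det

indirect = beta2 * beta3    # effect of x1 on y transmitted through x2
total = beta1 + indirect    # direct effect plus indirect effect

# For least squares the total effect equals the slope of y on x1 alone.
slope_y_on_x1 = cov(x1, y) / cov(x1, x1)

print(f"direct   (beta1):         {beta1:.3f}")
print(f"indirect (beta2 * beta3): {indirect:.3f}")
print(f"total:                    {total:.3f}")
print(f"slope of y on x1 alone:   {slope_y_on_x1:.3f}")
```

The last two printed values should agree, which is a useful check: leaving the mediator x2 out of the regression recovers the total effect, while including it separates the direct part from the indirect part.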
Outlines
🧠 Structural Equation Modeling (SEM) Overview
Structural Equation Modeling (SEM) is introduced as a comprehensive framework that integrates various multivariate techniques from disciplines like psychology, statistics, and econometrics. It's not a single technique but a dynamic environment that evolves with new modeling methods. SEM is particularly suitable for complex research questions involving psychological or social constructs that are difficult to measure directly. It allows for the correction of measurement errors and is adept at modeling systems of relationships, causal systems, and indirect or mediated effects.
🔍 Research Questions and SEM's Versatility
SEM is highlighted as a versatile tool for addressing a wide range of research questions, especially those involving complex constructs and systems of relationships. It's noted for its ability to handle multiple outcomes and dependent variables in a more intricate system, making it ideal for modeling causal systems. SEM is also recognized for its utility in analyzing indirect or mediated effects, where the relationship between variables is not direct but influenced through other variables.
📚 SEM's Multiple Names and Software Options
The script discusses the various names by which SEM is known in literature, such as covariance structure analysis models and analysis of moment structures, which can be confusing. It also mentions the historical association of SEM with causal modeling, although this is considered controversial since causal inference relies more on research design than on statistical models. The paragraph lists several software packages available for fitting SEMs, including LISREL, Mplus, EQS, AMOS, and R, noting that each has its own advantages and disadvantages.
🔑 Understanding Latent Variables in SEM
This section delves into the concept of latent variables, which are hypothetical or not directly observable, such as intelligence or trust. The challenge of measuring these variables is addressed, and the use of observable indicators, like questionnaire items, to infer the latent constructs is explained. The true score equation, X = t + e, is introduced to represent the observed variable (X), the true score (t), and the error (e), emphasizing the need to isolate the true score and remove error variance for accurate modeling.
📈 Path Diagrams and Latent Variable Models
The importance of path diagrams in representing SEM is underscored, as they visually depict the relationships between variables, which is particularly appealing for those less comfortable with equations. The paragraph explains the notation used in path diagrams, including the representation of latent variables, observed variables, error terms, and the directional paths indicating causality. The benefits of using multiple indicators for latent constructs are discussed, such as better coverage of complex concepts and the reduction of random error, leading to more precise measurements and accurate effect sizes.
🛤️ Path Analysis in Structural Equation Modeling
Path analysis is described as a key feature of SEM, emphasizing its visual representation of models through diagrams and its focus on both direct and indirect effects. The standardized notation for path analysis is outlined, with examples provided to illustrate how path diagrams can represent simple and complex models. The ability to decompose regression coefficients into direct, indirect, and total effects is highlighted, showcasing the depth of analysis possible with SEM and path analysis.
Keywords
💡Structural Equation Modelling (SEM)
💡Latent Variables
💡Path Analysis
💡Measurement Error
💡Covariance Structure Analysis
💡Factor Analysis
💡Regression Modeling
💡Direct and Indirect Effects
💡Software Packages
💡Causal Modeling
Highlights
Structural Equation Modeling (SEM) is a general modeling framework that integrates various multivariate techniques.
SEM draws on techniques from several disciplines, including measurement theory, factor analysis, path analysis, regression modeling, and simultaneous equations.
SEM is dynamic, often integrating new ways of fitting models as the technique develops over time.
SEM is particularly suitable for complex, multifaceted constructs often related to psychological or social concepts.
SEM can correct for measurement errors, which is beneficial for concepts that are difficult to measure directly.
SEM is well-suited for modeling systems of relationships and causal systems with numerous outcomes or dependent variables.
SEM is often used to address indirect or mediated effects in research questions.
SEMs are known by various names such as covariance structure analysis models and analysis of moment structures.
The term 'causal modeling' for SEMs is controversial as causal inference claims come from research design, not the statistical model.
Multiple software packages are available for fitting SEMs, each with its advantages and disadvantages.
SEM can be thought of as path analysis using latent variables.
Latent variables are hypothetical or not directly observable, such as intelligence or trust.
Observable indicators, like questionnaire items, are used to measure latent variables.
The true score equation represents the relationship between observed variables, true scores, and error.
Path diagrams are key to representing SEM, showing the causal relationships between variables.
Having multiple indicators of latent constructs allows for the estimation of true scores and error variance.
Latent variable models provide benefits such as better coverage of complex concepts and reduction of random error.
Path analysis is visually represented and focuses on direct, indirect, and total effects in the model.
Standardized notation in path analysis includes specific symbols for latent variables, observed variables, and error terms.
Path diagrams can represent complex relationships and decompose effects into direct, indirect, and total components.
Transcripts
What is structural equation modelling? Well, I think one of the first useful
things to understand about SEM, as I'll refer to it, is that it isn't a single
technique as such. We wouldn't want to compare it to, say, learning ordinary
least squares regression, logistic regression or log-linear modeling. Although
these techniques have a number of different aspects, we can think of
them as, if you like,
single approaches to addressing research questions. I think SEM is much better
thought of as a general modeling framework that integrates a number of
different multivariate techniques into this overall framework. It is a framework
which draws on a number of different disciplines: it brings together
measurement theory from psychology, factor analysis also from psychology and
statistics, path analysis from epidemiology and biology, regression
modeling from statistics, and simultaneous equations from econometrics,
and all these different techniques come together to form structural equation
modeling as a general modeling environment. And it's also an environment
which is somewhat dynamic: it is not set in stone at this point in time; it is
actually often integrating new ways of fitting models as the technique
develops over time. |
What sort of research questions would SEM be particularly suitable for addressing? Well, I think,
being a general model-fitting environment, it can address many
different kinds of research questions, but I think it is particularly suitable in
situations where the key constructs, the key concepts that a researcher is interested
in, are complex and multifaceted, often relating to psychological or social
psychological
concepts. So these kinds of concepts can be quite difficult to measure and are
often measured with error, and one of the useful aspects of SEM, as we'll see, is
its ability to make corrections for errors of measurement. Other kinds of
research questions that SEM is well suited to are ones which specify systems
of relationships. Rather than, as we may be used to if we're fitting regression
models, having a single dependent variable and a set of predictors or
independent variables, structural equation models may have numerous
different outcomes or dependent variables, each of which is affecting
other dependent variables in a more complex system. So if a researcher is
interested in modeling a causal system, then structural equation models are
particularly suitable. Another kind of research question that structural
equation models are often used to address is where the researcher is
interested in indirect or mediated effects. So in many research questions
we're interested in the effect of variable X on variable Y; that would be
thought of as the direct effect of X on Y. But in many research contexts we're
interested in more complex kinds of relationships, where the first variable X
perhaps influences a second variable Z, which then has an effect on
Y. That would be seen as an indirect effect, and SEMs are very well suited
to addressing those kinds of mediated research questions. | Now SEMs are known
by a number of different names in the existing literature, and this can be
somewhat confusing. Sometimes they are referred to as covariance structure
analysis models; this relates to
the fact that with SEMs we're actually analyzing covariance matrices, not
variables directly. We will come on to that in later films. They're also known as
analysis of moment structures; this is what gives the SEM
software AMOS its name, in recognition of the fact that more
modern SEMs analyze not just covariances but also means, so moments more
generally. It's also known sometimes as the LISREL model, which again takes its name
from possibly the most well-known software, certainly the first software for
fitting SEMs, which is LISREL. More controversially, SEMs have been
referred to as causal modeling, and they often are, and certainly
historically have been, associated with analyses which get at cause and effect.
But I think that is probably a more controversial name to give to any
modeling technique, because the claims for causal inference will come from the
research design rather than the statistical model that we apply to
analyze the data. | There are many different software packages that are
available for fitting SEMs, and this is a list that's changing and growing all the
time. As I mentioned, probably the best known is LISREL, which was developed by
Jöreskog and Sörbom, one of the first available packages. Now there are
many more software packages available: Mplus, EQS, AMOS, CALIS, Stata, and R, which is a free package.
Many of these packages have more limited versions that are available for free for
students to download and try, to see which one is most suitable. I wouldn't
want to make a recommendation for any particular software package; each one has
its own particular advantages and
disadvantages. | So what is structural equation modeling? Well, there are many
possible answers to that question. The one that I'm gonna propose in this film
is that SEM can be thought of as path analysis using latent variables. Now this
definition may not be very helpful to you if you are not very familiar with either
path analysis or latent variables, so for the remainder of the module I'm gonna
run through what path analysis is and what latent variables are. | So what are
latent variables? Well, most of the concepts that we're interested in in social science
are not directly observable: things like intelligence, social capital, trust. It is
impossible to go and put some kind of meter into people and get a direct
reading of their level of social capital or trust, so this makes these concepts
hypothetical, or latent as we refer to them. We believe that they are latent
within people at some level and they drive attitudes and behavior, but we
can't actually directly observe them. So we're in a bit of a difficult position
if we can't measure these concepts that we're interested in, but fortunately we
can use approaches which measure these latent variables using observable
indicators, using variables that we can measure directly that we believe to be
caused by the underlying latent constructs. So if we think of a
questionnaire item, a question in a questionnaire that has been administered to a
sample of people, this would be a good example of an observable indicator
of a latent construct. So let's imagine that this question asked people how happy
they are with their lives on a scale
of 1 to 10. Now some people will give higher answers or lower answers; there will be variability,
variance in this variable across the individuals in the sample. Now we
don't think that all of that variability is only to do with people's level of
happiness. Some of it will be, so some of the variability will be caused by
variability in the true level of happiness across people, but there will
be other factors that also cause variability, possibly to do with the
questionnaire design, the temperature in the room, whether the question is
administered by an interviewer or completed on a computer.
These are all other factors that we're not really interested in, in what we're
trying to measure, which is happiness. So some of the variability will be to do with
happiness, the latent construct, but some of the variability will be due to
other factors: error and unique variance. | So we can summarize these ideas quite
simply in this formula, the true score equation, where X = t + e. So
here the measured variable, the observed indicator, is X and, as I said,
the variability in X is comprised of both true score and error. So the true
score is simply where the individual is on a true happiness dimension, their
true underlying level of happiness. The error comprises two components. The first
is what we could think of as systematic error. This is a bias, where perhaps the
question is phrased in a way which makes people give higher happiness ratings
than their actual level of happiness. Maybe it's because the
question is administered by an interviewer and they don't want to seem unhappy because that is
socially undesirable. This would be a systematic error.
A random error would be one where you're just as likely to overrate as to
underrate your happiness. So we can think of the systematic error as
being one where the mean of the individual errors doesn't cancel out, it
doesn't equal 0, whereas with a random error you are as likely to give a higher as a lower
score, so the expectation would be that the mean of the error would
cancel out and be zero. So this is all by way of saying that when we measure
a variable, when we measure X, ideally what we would be able to isolate would be the
t part of the variance, the true score, and to remove the error variance when we're
trying to predict t or use t as a predictor in a model. | So we can now translate this
true score equation into a very simple path diagram, which is key to
representing structural equation models. So here we can see that the X reads
over to being the observed item in the rectangle, the t reads over to being the
latent variable, the true score, in the ellipse, and the e reads over to being
the circle at the top of the diagram, the error. The arrows indicate that the
observed score is caused by both the true score, the latent variable, and by
other factors, the error. So we can encapsulate those ideas in this simple
path diagram. | It would be nice if we could implement this as a statistical
model. Unfortunately, when we only have one indicator of the latent variable,
in this case happiness, then this equation is what we would call unidentified. We have
more unknown quantities that we're trying to estimate, the t and the e
(we don't know what they are; we would like to estimate them), than we have known
pieces of information, the X (we've measured X in our sample). We have two
unknowns and one known, so we can't solve that equation
uniquely. The equation is unidentified, so we can't separate the true score from
the error when we only have one measure of the underlying concept. What this then
tells us is that we need to have multiple indicators of our latent
constructs. When we have multiple indicators, then we can start to
over-identify the true score equation and estimate the quantities of t and e for each
indicator. So we can apply many different kinds of latent variable
models: we can use principal components analysis, factor analysis, latent class models,
depending on the metrics of the observed indicators that we have in our dataset.
But what these are all going to do is to provide us with a summary score, a reduced
set of factors or components relative to the full set of indicators that we start
out with, and in doing that they will correct for the error in each of the
individual indicators and give us a better measure of the true score of the
concept. | We can represent this simply here with a common factor model. Here
we have four measured variables. Let's think of these as questionnaire items
again. They might be measuring happiness, different aspects of happiness: are you
happy at home, with your work, with your friends and so on. So we've got four
indicators of the same underlying latent variable, happiness. Now because they
measure the same thing, we would generally expect these variables to be
correlated in our
population, and that's what these double-headed arrows indicate. The curved
double-headed arrows indicate that the Xs are all correlated with one another.
That's one way of representing what's going on here.
Another way would be to do away with these correlations and add in the
underlying latent variable, someone's true level of happiness, which we've here
denoted as eta. In this model we now have the happiness latent variable having
a causal effect on each of the indicators, and that causal effect is
what we can think of as the true score, or the t part in our X = t + e
equations. Now if that's the case, then we also need to have error
terms for each of these equations; that's what we have shown in the diagram there.
| So with these multiple indicators we can apply a latent variable model, in this case
a factor model, and we can get empirical estimates of these key quantities. Here
the lambda coefficients in this model we'll refer to as factor
loadings, and these are the correlations between the factor, the eta, and each of the X
variables. Now if these indicators are good indicators of
happiness, we would expect these correlations to be high; we would expect
the correlation between a good indicator of the latent construct and the latent
construct to be close to 1. | So if we are able to measure our
constructs with multiple indicators, we can apply latent variable models, and
this brings a number of benefits.
Well, firstly, the kinds of things that we're interested in modeling in social
science are generally complex and multifaceted. If we think of happiness,
for example, it's difficult to come up with a single question which covers all
aspects of a person's
individual well-being, so we probably need to have multiple indicators to get
a good coverage of the concept. As I mentioned, it also enables us to remove
or at least reduce random error in the construct that we are measuring. I think
we can convince ourselves that removing error seems to be a good thing to do, but
more formally we can demonstrate that if we have random error in the dependent
variable, although it leaves the estimates in a model unbiased, these will
be less precisely measured; there'll be a noisier measure with wider confidence
intervals. More seriously perhaps, if we have random error in independent variables,
then regression coefficients that we estimate using those independent
variables will be attenuated: they will be smaller than they are in the
population, systematically smaller, tending toward zero. So we will
underestimate effect sizes and we may falsely fail to reject the null hypothesis.
| So what is
path analysis? Well, again there are many ways that we can answer this question, but I
think a key feature of path analysis, and one that makes it very appealing as part
of structural equation modeling for social scientists, is that the model that
you're wanting to fit to the data is represented diagrammatically rather than
in the form of equations. Of course we can represent the structural equation
model as a system of equations, but we can also represent it as a diagram,
and this visual aspect again is very appealing for social scientists
perhaps less comfortable and less intuitive in their reading of equations.
So the standardized notation of path analysis is a very important feature. The
path analysis presents regression equations between our measured variables,
so we're interested again in systems of relationships between
multiple observed variables. Now that's important, and I'm saying observed
variables there because in a standard path analysis we would not be using
latent variables but variables which are directly observed, again perhaps single
questionnaire items or other kinds of measure. A third key feature of path
analysis is its focus not just on direct effects but also, as I was talking about
earlier, on indirect effects and total effects. So this is for research questions where
we don't have a simple linear model in which we're estimating the effects of some
set of predictor variables on an outcome, dependent or criterion variable, but where we're
interested in the pathways between multiple independent variables and
possibly multiple dependent
variables. | So in this slide I'm presenting some of the standardized
notation, the way that we represent different parts of the model using
diagrammatic notation. We can see at the top a measured latent variable; a latent variable
will be presented as an ellipse. An observed or manifest variable,
such as a questionnaire item that we might use as an indicator of a measured
latent variable, would be a rectangle, and an error variance or disturbance term
is a small circle. There's a similarity with the measured latent
variable, they are both circular shaped, because an error variance is also a
latent variable; it's just that we are not specifying it as measuring anything in
particular. It is what's left over, the residual or disturbance term.
A covariance path, where we're specifying that two variables in the model are
related or correlated with one another,
would be represented as a curved double-headed arrow. This is a
non-directional association, i.e. we're not specifying there is any causal link from
one variable to another, but we want to indicate that they are correlated. And
finally the single-headed straight arrow represents a directional path, or what we
would generally think of as implying causality in the model, a regression path
from one variable to another. | So here are some examples of some simple path
diagrams that we could represent in equation form or using standardized path
notation. In this simple diagram we can see that the variable X has a causal
effect on Y, and the D term there is the disturbance term, so the error term in
this model. This is essentially a bivariate regression
model; we can also write this in standard equation notation. This second
path diagram is somewhat more complicated but really is just adding in
a second independent variable X2, so again this is equivalent to a multiple linear
regression with two independent variables, a dependent variable Y and an
error term, which in this path diagram is labeled D for the disturbance term. | Now
as I mentioned, one of the things that path diagrams and path analysis are
particularly useful for is studying not just direct
effects but also indirect effects. We can see now that we've introduced a more
complex relationship between these variables, where x1 has a direct effect
on x2 but x2 also has a direct effect on y, so we now have an indirect
effect of x1 on y through x2. And we can use standard formulae
to decompose these regression coefficients, indicated by beta1 to beta3, into
the direct, indirect and total components. So here
beta1 represents the direct effect of x1 on y, beta2 is the direct effect of x1
on x2, beta3 is the direct effect of x2 on y, and beta2 × beta3 will
give us the indirect effect of x1 on y. And we can also compute from this path
diagram the total effect, which is the sum of the indirect and the
direct effects between one variable and another. So if we take the sum of beta1
and the product of beta2 and beta3, this will give us the total effect of x1 on
y.
| So that's given a very brief overview of both latent variables and path analysis
and what I'm encouraging you to think about to understand what we're doing
with structural equation models is that when we have a path diagram that
includes latent variables rather than just observed variables as we can see in
this diagram then we're representing a structural equation model.