SEM Series (2016) 3. Exploratory Factor Analysis (EFA)

James Gaskin
22 Apr 2016, 13:48

Summary

TL;DR: This video details a comprehensive guide to conducting Exploratory Factor Analysis (EFA). It emphasizes the importance of reflective latent measures, excluding formative and categorical variables, and setting up the analysis with maximum likelihood extraction. The speaker discusses the significance of KMO and Bartlett's test, communalities, and factor extraction. They address issues like cross-loadings and low factor loadings, suggesting strategies for resolving them. The script concludes with reporting the final pattern matrix, factor correlations, and reliability analysis, ensuring convergent and discriminant validity.

Takeaways

  • 📊 The speaker begins by emphasizing the importance of saving the data set after making changes, saving it as 'trimmed and no missing'.
  • 🔄 The process of reordering variables in the data set is discussed: the previously imputed variables are sorted back into place and the key demographic variables (age, ID, gender, frequency, experience) are moved to the bottom.
  • 🔍 The speaker conducts an exploratory factor analysis (EFA), highlighting the need to include only reflective latent measures and excluding formative measures, categorical variables, and demographics not part of a reflective latent construct.
  • 🎯 The choice of extraction method is discussed, with the speaker preferring maximum likelihood estimation due to its use in subsequent confirmatory factor analysis (CFA).
  • 🔄 Promax rotation is used; the speaker notes it is less forgiving and that they may have to switch rotation methods if issues arise.
  • 📉 The speaker discusses the suppression of small coefficients, aiming to focus on loadings greater than 0.5 for meaningful results.
  • 📊 The Kaiser-Meyer-Olkin (KMO) measure and the significance of the Bartlett's test are highlighted as part of assessing the adequacy of the factor analysis.
  • 📈 The total variance explained by the factors is discussed, with the speaker aiming for over 60% as an ideal threshold.
  • 🔍 The pattern matrix is scrutinized for high loadings and cross-loadings, with the speaker identifying and addressing issues such as low loadings and Heywood cases (loadings above one).
  • 🔄 The speaker demonstrates how to refine the factor analysis by iteratively removing problematic items and rerunning the analysis until a satisfactory model is achieved.
  • 📝 The final step involves reporting the KMO and Bartlett's test results, the total variance explained, non-redundant residuals, the pattern matrix, and factor correlations to assess convergent and discriminant validity.

Q & A

  • What is the first step the speaker takes in the exploratory factor analysis process?

    -The first step the speaker takes is to save the dataset with the changes made, renaming it to 'trimmed and no missing'.

  • Why does the speaker reorder the variables in the dataset?

    -The speaker reorders the variables to restore the imputed variables to their original positions and to move age, ID, gender, frequency, and experience to the bottom of the dataset for easier analysis.

  • What type of measures should be included in an Exploratory Factor Analysis (EFA) according to the speaker?

    -The speaker emphasizes that only reflective, not formative, measures should be included in EFA. Categorical variables and demographics not part of a reflective latent construct should be excluded.

  • What extraction method does the speaker prefer to use in EFA?

    -The speaker prefers to use the maximum likelihood extraction method because it is the same algorithm used in Amos for confirmatory factor analysis.

  • What rotation method is suggested by the speaker for EFA?

    -The speaker suggests the Promax rotation method, noting it is less forgiving and that they may switch to another rotation if issues arise with the factor loadings.

  • Why does the speaker choose to suppress small coefficients at 0.3 in the analysis?

    -The speaker suppresses small coefficients at 0.3 because they are not interested in loadings below 0.5, the threshold for meaningful factor loadings, and because cross-loadings within 0.2 of a primary loading still need to be visible.

  • What does the speaker look for in the KMO (Kaiser-Meyer-Olkin) measure to assess the adequacy of the factor analysis?

    -The speaker looks for a KMO value above 0.7 and ideally about 0.9 to ensure the factor analysis is adequate.

  • How does the speaker interpret the total variance explained by the factors in the analysis?

    -The speaker interprets the total variance explained by the factors as good if it is more than 60%, with 64.5% being a satisfactory result in this case.

  • What issue does the speaker identify with the pattern matrix and how does it affect the analysis?

    -The speaker identifies a discriminant validity issue between decision quality and information acquisition, which affects the convergent validity of information acquisition.

  • What action does the speaker take to resolve the discriminant validity issue between decision quality and information acquisition?

    -The speaker runs another factor analysis excluding all items except decision quality and information acquisition to isolate and resolve the issue.

  • How does the speaker address items with low loadings in the pattern matrix?

    -The speaker considers dropping items with low loadings, such as decision quality one and eight, and re-runs the analysis to see the impact on the pattern matrix and overall model.

  • What additional analysis does the speaker perform to assess the reliability of the items?

    -The speaker performs a Cronbach's alpha scale reliability analysis to assess the internal consistency reliability of the items.

Outlines

00:00

📊 Exploratory Factor Analysis Setup

The speaker begins by discussing the process of setting up an exploratory factor analysis (EFA). They emphasize the importance of saving the dataset with changes and reordering variables. The speaker then proceeds to conduct an EFA, highlighting the need to include only reflective latent measures and exclude formative measures, categorical variables, and demographics not part of a reflective latent construct. They also mention the importance of using maximum likelihood as the extraction method and Promax rotation. The speaker sets parameters for the analysis, such as suppressing small coefficients and allowing a certain number of iterations, and checks the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test significance as part of the adequacy assessment. The goal is to identify the number of factors that explain the variance in the model.
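
The video does all of this through SPSS's Analyze > Dimension Reduction > Factor menus. Purely as a rough analogue — not the author's procedure — a minimal Python sketch of the same setup using the factor_analyzer package might look like the following; the file name and the dq/ia/joy/play/use item prefixes are hypothetical placeholders.

    import pandas as pd
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

    # Keep only the reflective indicators; formative items, categorical variables
    # (e.g., gender), and stand-alone demographics stay out of the EFA.
    df = pd.read_csv("trimmed_no_missing.csv")                 # hypothetical file name
    items = df.filter(regex=r"^(dq|ia|joy|play|use)")          # hypothetical item prefixes

    # Adequacy: KMO above 0.7 (ideally ~0.9) and a significant Bartlett's test.
    _, kmo_total = calculate_kmo(items)
    chi2, p = calculate_bartlett_sphericity(items)
    print(f"KMO = {kmo_total:.3f}, Bartlett chi2 = {chi2:.1f}, p = {p:.4f}")

    # SPSS picks the factor count from eigenvalues > 1; here we inspect first, then refit.
    probe = FactorAnalyzer(rotation=None)
    probe.fit(items)
    eigenvalues, _ = probe.get_eigenvalues()
    n_factors = int((eigenvalues > 1).sum())

    # Maximum-likelihood extraction with Promax rotation, mirroring the video's choices.
    fa = FactorAnalyzer(n_factors=n_factors, method="ml", rotation="promax")
    fa.fit(items)

    # Pattern matrix with coefficients below 0.3 suppressed, as in the Options dialog.
    pattern = pd.DataFrame(fa.loadings_, index=items.columns)
    print(pattern.where(pattern.abs() >= 0.3).round(2))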

05:02

🔍 Addressing Factor Analysis Issues

In this segment, the speaker identifies issues in the factor analysis, particularly cross-loadings between decision quality and information acquisition items. They decide to rerun the EFA on only these two sets of items to isolate the problem. After rerunning the analysis and examining the pattern matrix, they find that certain items, such as decision quality six and one, are causing problems and eliminate them one at a time. The speaker also discusses Heywood cases, where a factor loading exceeds one, and chooses to ignore these until other issues are resolved, since they often resolve themselves. The goal is to improve the discriminant and convergent validity of the factors.
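
The pruning in this segment is done by eye in SPSS, one item per rerun. As an illustrative sketch only (not the author's code), a small helper like the one below can flag the same three symptoms — a weak primary loading, a cross-loading within 0.2 of the primary loading, and a Heywood case — given a pattern-matrix DataFrame such as the `pattern` object from the earlier sketch.

    import pandas as pd

    def flag_items(pattern: pd.DataFrame, min_loading: float = 0.5, min_gap: float = 0.2) -> pd.DataFrame:
        """Flag items the way the video screens them: low primary loadings,
        cross-loadings too close to the primary loading, and Heywood cases."""
        abs_p = pattern.abs()
        primary = abs_p.max(axis=1)
        second = abs_p.apply(lambda row: row.nlargest(2).iloc[-1], axis=1)
        return pd.DataFrame({
            "primary": primary.round(2),
            "second": second.round(2),
            "low_loading": primary < min_loading,
            "cross_loading": (primary - second) < min_gap,
            "heywood": primary > 1.0,
        })

    # As in the video, drop only the single worst offender, then rerun the EFA
    # and re-check, rather than removing every flagged item at once.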

10:05

📉 Finalizing Factor Analysis and Reporting

The speaker concludes the EFA by rerunning the analysis with the revised set of items. They report on the KMO and communalities, indicating the adequacy of the analysis, and the total variance explained by the six-factor model. They also discuss the non-redundant residuals and the pattern matrix, noting improvements in factor loadings. The speaker addresses the issue of discriminant validity by examining the factor correlation matrix and ensuring that no factors share a majority of variance. They also perform a reliability analysis using Cronbach's alpha to assess the internal consistency of the scales. The speaker provides a detailed explanation of what to report from the EFA, including the pattern matrix, factor correlation matrix, and reliability analysis results.
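
For the reported numbers — communalities above 0.3, cumulative variance above 60%, and fewer than 5% non-redundant residuals — a hedged Python sketch is below. It assumes the fitted `fa` and `items` objects from the setup sketch, and it assumes factor_analyzer exposes the Promax factor correlations as `phi_` (falling back to an identity matrix otherwise).

    import numpy as np

    # `fa` and `items` come from the setup sketch above.
    # Adequacy: every communality should be above roughly 0.3.
    communalities = fa.get_communalities()
    print("lowest communality:", round(float(communalities.min()), 3))

    # Total variance explained by the retained factors (want > 60%).
    cumulative = fa.get_factor_variance()[2]        # (variance, proportion, cumulative)
    print("variance explained:", round(float(cumulative[-1]) * 100, 1), "%")

    # Non-redundant residuals: share of off-diagonal residual correlations above 0.05 (want < 5%).
    lam = fa.loadings_
    phi = getattr(fa, "phi_", np.eye(lam.shape[1]))  # assumed attribute for oblique rotations
    residuals = items.corr().to_numpy() - lam @ phi @ lam.T
    off_diag = residuals[~np.eye(len(residuals), dtype=bool)]
    print("non-redundant residuals:", round(float((np.abs(off_diag) > 0.05).mean()) * 100, 1), "%")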

Keywords

💡Exploratory Factor Analysis (EFA)

Exploratory Factor Analysis (EFA) is a statistical method used to identify underlying relationships between measured variables and group them into latent factors. In the video, EFA is applied to explore patterns in the dataset, aiming to extract factors that explain the variance among the observed variables. The speaker uses EFA to ensure that only reflective latent measures are included and discusses how it reveals issues like cross-loadings.

💡Reflective Latent Measures

Reflective latent measures refer to variables that are influenced by an underlying factor. In this video, the speaker emphasizes that only reflective measures should be included in the EFA. Reflective indicators are used to measure a construct that influences them, unlike formative measures, which are not suitable for factor analysis.

💡Formative Measures

Formative measures are variables that cause or contribute to a construct, rather than being caused by it. The speaker warns against using formative measures in EFA, as they do not fit the model of reflective latent variables. This distinction is key to performing proper factor analysis, which focuses on identifying reflective relationships.

💡Kaiser-Meyer-Olkin (KMO)

The Kaiser-Meyer-Olkin (KMO) measure is a statistic that assesses the adequacy of the data for factor analysis. The speaker mentions that a good KMO value should be above 0.7, ideally closer to 0.9, indicating that the variables are suitable for factor analysis. A high KMO suggests that there is sufficient correlation among variables for the analysis to produce meaningful factors.

💡Maximum Likelihood Extraction

Maximum Likelihood Extraction is a method used in factor analysis to estimate factor loadings. The speaker prefers this method because it is the same algorithm used in the confirmatory factor analysis (CFA) conducted later in Amos. It estimates loadings by maximizing the likelihood of the observed data under the proposed factor model.

💡Promax Rotation

Promax rotation is an oblique rotation method used in factor analysis to make the output easier to interpret by allowing factors to correlate. The speaker prefers Promax rotation over other methods like Varimax because it is less restrictive and can handle more complex factor structures, which often reveal underlying relationships more clearly.

💡Cross-Loadings

Cross-loadings occur when a variable has significant loadings on more than one factor. The speaker identifies cross-loadings as an issue, particularly between decision quality and information acquisition. Cross-loadings are problematic because they can indicate that a variable is not clearly associated with one specific factor, complicating the interpretation of the results.

💡Cronbach's Alpha

Cronbach's Alpha is a measure of internal consistency or reliability of a set of items in a survey or test. The speaker uses Cronbach's Alpha to evaluate the reliability of the factors extracted in the EFA. A value above 0.7 is considered acceptable, with the speaker achieving 0.901 for decision quality, indicating high reliability for that factor.
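
SPSS reports alpha and the "alpha if item deleted" column directly; the formula itself is simple enough to sketch. The snippet below is a generic illustration (the ia1–ia5 column names are hypothetical), not the video's output.

    import pandas as pd

    def cronbach_alpha(scale: pd.DataFrame) -> float:
        """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
        k = scale.shape[1]
        item_variances = scale.var(axis=0, ddof=1).sum()
        total_variance = scale.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)

    def alpha_if_item_deleted(scale: pd.DataFrame) -> pd.Series:
        """Mirror SPSS's 'Cronbach's Alpha if Item Deleted' column."""
        return pd.Series({col: cronbach_alpha(scale.drop(columns=col)) for col in scale.columns})

    # e.g., ia = items[["ia1", "ia2", "ia3", "ia4", "ia5"]]   # hypothetical item names
    #       print(cronbach_alpha(ia), alpha_if_item_deleted(ia), sep="\n")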

💡Convergent Validity

Convergent validity refers to the degree to which two measures of the same concept are correlated. In the video, the speaker checks for convergent validity by ensuring that all factor loadings are above 0.5, indicating that the items within a factor are strongly related to the underlying construct.

💡Discriminant Validity

Discriminant validity is the extent to which factors are distinct and not too highly correlated with each other. The speaker checks for discriminant validity by ensuring that the factor correlation matrix values are below 0.7, confirming that the factors are not sharing a majority of variance. This helps to establish that each factor measures a unique aspect of the data.
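
A quick sketch of this check, continuing from the fitted `fa` object in the earlier setup sketch: it assumes an oblique solution exposes its factor correlations as `phi_`, and falls back to correlating factor scores if that attribute is absent.

    import numpy as np

    # `fa` and `items` come from the setup sketch above.
    phi = getattr(fa, "phi_", np.corrcoef(fa.transform(items).T))  # factor correlation matrix (assumed attribute)
    off_diag = np.abs(phi[~np.eye(phi.shape[0], dtype=bool)])
    print("largest factor correlation:", round(float(off_diag.max()), 3))
    print("discriminant validity OK (< 0.7):", bool(off_diag.max() < 0.7))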

Highlights

Exploratory Factor Analysis (EFA) is initiated with saving the dataset after minor changes.

Variables are reordered so the imputed ones return to their original positions and the demographics sit at the bottom.

Factor analysis is conducted with all reflective latent measures, excluding formative or categorical variables.

Descriptives, KMO, and extraction methods are set, favoring maximum likelihood estimation for consistency with Amos.

Promax rotation is chosen; it is less forgiving, and the speaker may switch methods if issues arise.

Factor scores can be saved as new variables for further analysis (see the sketch after this list).

KMO measure indicates the data's suitability for factor analysis, with values above 0.7 acceptable and about 0.9 ideal.

Communalities above 0.3 in the extraction column suggest adequate shared variance for each item.

Six factors are extracted, aligning with the expected model.

Total variance explained by the model is over 64%, exceeding the minimum threshold of 60%.

Non-redundant residuals in the reproduced correlation matrix fall below the 5% threshold, indicating good model fit.

Issues with discriminant validity between decision quality and information acquisition are identified.

A focused factor analysis is conducted on decision quality and information acquisition to resolve issues.

Items with cross-loadings or low factor loadings are considered for removal to improve model clarity.

Reliability analysis is performed to assess the impact of item removal on Cronbach's alpha.

Final model retains information acquisition 5 despite its low loading, anticipating improvement in the confirmatory factor analysis.

Reporting includes KMO, eigenvalues, total variance explained, non-redundant residuals, and pattern matrix.

Cronbach's alpha values are reported to assess the reliability of the factors.

Factor correlation matrix is examined for evidence of discriminant validity.
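
As flagged in the factor-scores highlight above, SPSS's "Save as variables" option can be approximated in Python as well. The sketch below is hypothetical (same made-up file name and item prefixes as the earlier sketches) and simply appends one score column per factor.

    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    items = pd.read_csv("trimmed_no_missing.csv").filter(regex=r"^(dq|ia|joy|play|use)")  # hypothetical
    fa = FactorAnalyzer(n_factors=6, method="ml", rotation="promax")
    fa.fit(items)

    # One new column per extracted factor, analogous to SPSS's "Save as variables" checkbox.
    scores = pd.DataFrame(fa.transform(items),
                          columns=[f"factor_{i + 1}" for i in range(6)],
                          index=items.index)
    data_with_scores = pd.concat([items, scores], axis=1)
    print(data_with_scores.head())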

Transcripts

00:00

All right, moving right along — next is exploratory factor analysis. First things first, I would go back to the data set and save it; we've made several minor changes here and there. So File > Save As, and I'm going to save it as "trimmed and no missing" — let's see, "no missing," there we go, save. Now all of our changes are recorded, and if we make a mistake somewhere we're good to go. I'm also going to reorder these variables. Look, we have these at the end here because we imputed them — we replaced the missing values. I'm going to re-sort this column (sort ascending); SPSS asks whether you're sure you want to do this — the answer is yes — and whether you want to save it as a different data set — no. Okay, we're in order now. I'm just going to put age, ID, gender, frequency, and experience all at the bottom again. Okay, there we go.

00:59

The next thing to do is a factor analysis — I'm excited, I like factor analyses. So Analyze > Dimension Reduction > Factor Analysis. We're going to start with everything — throw it all in there, all the reflective latent measures. This is critical to bear in mind: you must have reflective, not formative, latent (meaning multi-indicator) measures. If you have formative measures, don't include them in the EFA. If you have categorical variables like gender, don't include them in the EFA. If you have demographics that are clearly not part of a reflective latent construct, don't include them in the EFA or CFA — they don't belong in a reflective measurement model. You're only going to include reflective latent factors. Hope that was clear enough. Cool — throw this in here. If you're not sure what reflective versus formative means, please refer to my YouTube video called "Formative versus Reflective Measures in a Factor Analysis" — I think it's called that, something like that. Okay, throw these all over. Under Descriptives — I've done this many times before — select Reproduced and KMO; Continue. Under Extraction I like to use maximum likelihood. Why? Well, that's the same algorithm that Amos is going to use when we do the confirmatory factor analysis. I like to do it based on eigenvalues instead of a fixed number of factors, at least initially, just to see what it's going to give me. How many iterations do we want to allow? 25 is fine; Continue. Under Rotation I like to use Promax — it's less forgiving, but we might have to switch if we have issues; Continue. Nothing in Scores, although just FYI: if you wanted to save each of the factors as a variable — these are called factor scores — you would click here and Save as variables, and for however many factors you came up with in your pattern matrix it will create that many new variables in the data set to represent each of those factors. I'm not going to check that; Cancel. In the Options I am going to suppress small coefficients at 0.3, because I'm really not interested in loadings less than 0.5 and we need them to be at least a 0.2 difference apart — I've talked about this in other videos, so I'm not going to go into depth here. Okay, this is more of a procedural video anyway.

play03:36

this above 0.7 um ideally about 0.9

play03:39

you're at the sig value to be

play03:42

significant this is all part of adequacy

play03:45

we're going to talk about adequacy

play03:49

validity convergent discriminant and

play03:51

reliability okay back here we look at

play03:55

the extraction column and we want to see

play03:58

if there's anything less than about 0.3

play04:00

we're at point three a decision called

play04:02

the eighth order the line dude do

play04:06

looking pretty good okay moving on and

04:10

We have the total variance explained — we look at this cumulative column. It came up with six factors. How many were we expecting? Well, if we go back to our model we were expecting one, two, three, four, five, six — it came up with exactly what we wanted. This rarely happens, so I'm kind of surprised, and it doesn't provide us an opportunity to demonstrate mitigation strategies, so maybe see my other videos for mitigation strategies. Okay, it explains sixty-four and a half percent of the variance in the model — that's good. We want more than sixty percent; at a minimum we want more than fifty percent, but again above sixty percent is ideal. Skip the factor matrix, skip the goodness of fit, go down to the reproduced correlations — we want a number here less than about five percent, and we're looking pretty good. And the pattern matrix is looking stellar-ish... whoa, actually, we do have some issues.

05:02

We have a few issues — this is fun; I don't like it when it just works. So let's go through this one at a time. The first construct looks fabulous: there are no cross-loadings anywhere, and all the items loaded onto a single factor. Factor five — decision quality — not so fabulous. It's still good, but not fabulous: look, we have 0.47, that's fairly low, and we also have these two other items from information acquisition that loaded with all the decision quality items. That's a problem we'll have to resolve separately. Information acquisition loaded onto its own factor, but look at those loadings — they're awful — so I'm not sure what to do about that; I'll have to address that next. Joy looks joyful, no problems there whatsoever; playfulness looks incredible; and usefulness looks incredible as well. So the only real problem is this discriminant validity issue between decision quality and information acquisition, which my guess is causing the convergent validity issue with information acquisition.

06:11

So what would you do here? I would actually just run another factor analysis but get rid of everything except decision quality and information acquisition — there we go — and just run it again with just those two sets of items. Looking good, looking good. Really what I want to do is go down to the pattern matrix. Good — it did come up with two factors, which is what we wanted, but you can see there are some issues: decision quality six is loading almost equally on both sides. That is the first one I would delete. So let's do that factor analysis again — decision quality six, sayonara. Okay, run it again, jump down to the pattern matrix. Decision quality one is loading on both sides. Hey, look at these loadings though — those are looking better. Okay, this one's no good: decision quality one. You may say, "Hey James, wait, wait, wait — what is this? It's above one." It was last time, too. This is what's called a Heywood case, and we're going to ignore this Heywood case until we resolve other issues, because it'll probably just resolve itself. So decision quality one, you are gone — kicked off the island. There we go, jump down to the pattern matrix — looking better. But look at this: decision quality eight is not really contributing very well, so I'm going to drop decision quality eight. And the pattern matrix is much better. This one is borderline, we might keep it; this one is also borderline, we might keep it. What I'm going to do at this point is recreate the larger pattern matrix and see if everything is resolved; if not, we can see where we'll go — probably decision quality seven and information acquisition five will be the next to be eliminated.

08:00

So back to the full factor analysis. We're going to throw everything in there except decision quality one, six, and eight — yep, run it again — and I am just going to do a few cursory things. I'm going to jump down here: looks like we still have six factors, excellent; good variance explained, actually better than before; and we have only three percent non-redundant residuals this time. And here's the pattern matrix, and it already looks better. Okay, decision quality — that looks really good. Information acquisition — also very good. Wow, actually, I wasn't expecting it to be that good. And everything else looks just as good as before. What I might do is drop information acquisition five — it is still fairly low, and you can see these loadings here, around 0.7, 0.7, 0.76 — these aren't going to average out to above 0.7, which is a problem.

08:59

If I want to verify this, what I might do is a reliability analysis: Analyze > Scale > Reliability Analysis, and just stick those information acquisition items in here — I'll pull them over — then go to Statistics, check "Scale if item deleted," Continue, and OK. What this is going to tell us is whether dropping that item will actually do us any good. Let me go back over here to the pattern matrix — it was information acquisition five. Now if we go down to the reliability analysis and look at this last column, it says what our Cronbach's alpha would be if we deleted each of these items. The current Cronbach's alpha is 0.842, but if we deleted information acquisition five it would go up to 0.846. This isn't a big difference, and so if I was struggling — if I wanted to keep all these items — I'm fully justified in keeping all of them even though that is a low loading; the most likely scenario is it will bump up a little bit during the confirmatory factor analysis in Amos, so I can keep it. But if I really don't care — and these are scales I made up myself, so I had the liberty to do so — then I might just drop information acquisition five, which is what I am going to do at this point.

10:16

So I'm going to run this one more time, dropping information acquisition five, and watch what happens — see if it makes big differences. That's an uptick, which is good; three percent is the same. Ooh — okay, so it actually caused some problems: it threw in a new loading here above 0.3. What happened is that information acquisition five helped distinguish this factor from decision quality, whereas now it's having a hard time distinguishing itself. So I'm going to retain information acquisition five — even though this is a greater-than-0.2 difference, dropping it did bring up that discriminant, cross-loading issue. So my final pattern matrix is actually going to be the one with information acquisition five still in it. Here we go, run it.

11:06

What do I report? I report the KMO and say it's awesome; I report the sig and say it's awesome — these are all under adequacy. Under communalities — this is another adequacy measure — I look at the extraction column and say all my communalities were above 0.3; looks like they are, the lowest one is this one at 0.397. Then I'd say the six-factor model explains sixty-six point three percent of the variance, which is good, and that we had less than three percent non-redundant residuals, which is great. And here's the pattern matrix: as evidence of convergent validity, we have all the loadings above 0.5 except this one at about 0.4, which I'd mention. Then evidence of discriminant validity is that we had no strong cross-loadings. Another bit of evidence for discriminant validity is this factor correlation matrix: we can look at all these off-diagonal values and make sure they're not above 0.7, which would indicate sharing a majority of the variance. The closest one is this one, factor four to factor six — I'm guessing that is information acquisition and decision quality. I'd go here and check four and six — yep, those are the two — and they are highly related, but not so related that they're sharing a majority of their variance. So that's the closest one.

play12:31

what would I report I would report the

play12:34

pattern matrix at a minimum you may also

play12:36

want to report this factor correlation

play12:38

matrix okay that is adequacy convergent

play12:46

validity discriminant validity if we

play12:48

want to do reliability you just do like

play12:50

I did before go do a cronbach's alpha

play12:53

scale reliability analysis we did it for

play12:56

information acquisition move those over

play12:58

we'll do another one for decision

play12:59

quality but not all decision quality

play13:02

items will they have two three four five

play13:06

and seven two three four five and do two

play13:09

six eight loops so two three four five

play13:13

and seven throw those in there okay and

play13:16

report this number here 0.90 one

play13:19

what I like to do is just stick it at

play13:20

the top of my pattern matrix so this

play13:23

point 901 I would go stick it right here

play13:24

that was decision quality so I'd

play13:26

replaced this 4 with 0.9 on one and that

play13:29

put cronbach's alpha right over here

play13:30

okay you want all those cronbach's

play13:33

alphas to be greater than 0.7 if they're

play13:36

not there's actually literature that

play13:38

says it can get down 0.6 particularly if

play13:41

you have only a few items 2 or 3 and

play13:44

that is the EFA

