In social science, collecting data is an interesting process. Whether we observe or ask questions, it takes time, thought, and precious energy to select the right procedures or questions to answer our research questions. Even if we select the perfect question, we can still never exactly measure the true nature of a phenomenon. This inability to measure the real world is known as *measurement error*, specifically *random measurement error*. (Its more insidious cousin is *systematic measurement error*, which occurs when we, the researchers, make the wrong decisions and introduce bias into a study.) Because of this error, I am highly jealous of the “hard” sciences (e.g., biology, chemistry, physics). Yes, not every reaction works as predicted, and sensors need to be calibrated correctly, but their research doesn’t have to deal with people!

And let’s be realistic. People are not good research subjects. They forget things. They give different answers to the same question, and they give the same answer to different questions. We get around this inherent difficulty of working with people, or at least try to, through a pretty simple mechanism: we ask multiple questions about the same topic.

Let’s use depression as an example. Depression is a multi-faceted disease. Each person can have a unique manifestation of depression, and each person can recover from depression in a unique way. Depression will resolve spontaneously in some but require lifelong treatment in others. Even the diagnosis of depression is rather complex.

How do we accurately determine if someone is suffering from depression? We can use a series of reliable and validated questions, such as the Beck Depression Inventory (BDI). The BDI consists of 21 multiple-choice questions that can be answered by interview or self-report, and each response option to each question is coded with a number. In its simplest implementation, all a clinician needs to do to reasonably screen for depression is add up the numbers and see which category the person falls in.
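As a concrete sketch, the add-up-and-categorize step might look like the following. The cutoff ranges here are made up for illustration and are not official clinical thresholds.

```python
# Illustrative scoring of a BDI-style questionnaire.
# The category cutoffs below are hypothetical, NOT official clinical thresholds.

def score_inventory(responses):
    """Sum item responses (each coded 0-3) and return (total, category)."""
    total = sum(responses)
    if total < 14:
        category = "minimal"
    elif total < 20:
        category = "mild"
    elif total < 29:
        category = "moderate"
    else:
        category = "severe"
    return total, category

# 21 hypothetical item responses, each coded 0-3
responses = [1, 0, 2, 1, 0, 1, 1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
total, category = score_inventory(responses)
print(total, category)  # 15 mild
```

The single number this produces is exactly the kind of index variable discussed next: compact, but silent about which facets of the disease drove the score.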

This aggregating of responses across questions creates an *index variable* because within the single number, say a BDI of 14, multiple facets of depression are represented. At the risk of repeating myself, depression is a complex disease, and we ask multiple questions about depression because we don’t want to miss any aspect of the disease that may be important to research or treatment. While this single number is useful, it is not necessarily informative, because we can’t tell from it how depression is being externalized in any given individual. Instead of an index variable, which contains information on multiple facets, it is often more fruitful to work with *scale variables*, which are created by aggregating the responses to multiple questions that measure the same thing.

*Latent Variables*

This leads us into the idea of *latent variables*. Latent variables are a little strange. They exist. We give them names. They are real, but we can never directly measure them. In actuality, latent variables are THE thing we want to measure in the real world but can’t because of measurement error (and people; it’s always people, too). Because we can’t directly measure these real-world things, which we know really exist, we ask multiple questions and then combine their responses into a scale variable. Essentially, a scale variable is a numerical representation of a real-world thing that exists but can’t be directly measured, and each scale variable represents one latent variable.

Now back to depression. A total BDI score is not a scale variable because depression has multiple facets and can’t be represented by a single value. Each facet of depression is a separate latent variable that makes up the disease we see as depression. It turns out depression, according to the BDI, consists of 2 facets, or latent variables: an affective facet, which is the psychological side of the disease, and a somatic facet, which is the physical side. Affective and somatic are 2 latent variables within depression. We can’t directly measure them, but we can construct scale variables that come pretty close.
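In code, turning one index score into two scale variables is just a matter of summing the right subsets of items. The item groupings below are invented for illustration; they are not the published BDI factor structure.

```python
# Hypothetical split of 21 BDI-style items into two subscales.
# The item groupings are illustrative, NOT the published BDI structure.
AFFECTIVE_ITEMS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]  # psychological side
SOMATIC_ITEMS = [14, 15, 16, 17, 18, 19, 20]                      # physical side

def subscale_scores(responses):
    """Aggregate one set of item responses into two scale variables,
    one per hypothesized latent variable."""
    affective = sum(responses[i] for i in AFFECTIVE_ITEMS)
    somatic = sum(responses[i] for i in SOMATIC_ITEMS)
    return {"affective": affective, "somatic": somatic}

responses = [1, 0, 2, 1, 0, 1, 1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(subscale_scores(responses))  # {'affective': 11, 'somatic': 4}
```

Two people with the same total index score can now be distinguished: one may score high on the affective subscale, the other on the somatic one.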

*Factor Analysis*

Alright, if you bought into the idea of facets of disease and latent variables so far, a logical question to ask is: how do we know what questions to combine to create these scale variables?

This is where factor analysis comes in, along with a bit of methodology that isn’t necessarily the most scientific. In its simplest form, *factor analysis* is the act of identifying and confirming which questions measure different parts of the same underlying latent variable. Ideally, we would know how to combine the questions while they were being written. Unfortunately, this is often impossible because we can’t predict how the questions will perform in the real world. A layperson’s interpretation of a question may be remarkably different from what the researcher intended. That information is still useful, just in a slightly different way than envisioned.

Instead of assuming how questions should be combined to form scale variables to represent latent variables, we conduct an *exploratory factor analysis*, which is just how it sounds. We explore the data. We let the data tell us how to combine the questions. We, for lack of a better term, go on a small fishing expedition. In an exploratory factor analysis, we look for sets of questions whose responses are highly correlated with each other. (Thankfully, some very sophisticated algorithms exist to do this for us, so we aren’t staring at correlation tables for hours on end.)

Suppose we run an exploratory factor analysis on a 10-item questionnaire. The results of the analysis suggest that 3 latent variables are being measured by this questionnaire. Questions 1, 2, and 7 are highly correlated (let’s call it Physical Health); questions 3, 4, 6, and 10 are highly correlated (Mental Health); and questions 5, 8, and 9 are highly correlated (Spiritual Health). So it appears that our 10-item questionnaire measures 3 different facets, or latent variables, of health: physical, mental, and spiritual.
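To make the fishing expedition concrete, here is a minimal simulation (NumPy only). The latent variables, loadings, and item groupings are all invented for illustration; real EFA software estimates the factor structure from the data rather than assuming it, as this sketch does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # simulated respondents

# Three hypothetical latent variables (unobservable in real data)
physical = rng.normal(size=n)
mental = rng.normal(size=n)
spiritual = rng.normal(size=n)

# 10 observed items: each loads on one latent variable plus noise.
# Grouping mirrors the example: 1,2,7 -> physical; 3,4,6,10 -> mental; 5,8,9 -> spiritual.
factor_for_item = {1: physical, 2: physical, 7: physical,
                   3: mental, 4: mental, 6: mental, 10: mental,
                   5: spiritual, 8: spiritual, 9: spiritual}
items = np.column_stack([0.8 * factor_for_item[q] + 0.6 * rng.normal(size=n)
                         for q in range(1, 11)])

corr = np.corrcoef(items, rowvar=False)
# Items driven by the same latent variable correlate strongly...
print(round(corr[0, 1], 2))  # q1 vs q2 (both physical): high
# ...while items from different clusters barely correlate at all.
print(round(corr[0, 2], 2))  # q1 vs q3 (physical vs mental): near zero
```

The “sophisticated algorithms” mentioned above essentially search this correlation matrix for exactly these blocks of high within-group correlation.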

*How do we know an exploratory factor analysis is correct?*

Replicate. Replicate. Replicate.

After we run an exploratory factor analysis, we need to confirm our findings. The best method to do that is to recruit new samples of people from the same population as the original study and recruit samples of people from different populations compared to the original study. Once these new samples are recruited and the questions have been answered, we can test whether questions 1, 2 and 7, questions 3, 4, 6 and 10, and questions 5, 8 and 9 remain highly correlated. When we attempt to confirm the findings of an exploratory factor analysis, the procedure is called a *confirmatory factor analysis* because we want to confirm the findings (get it?).
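A full confirmatory factor analysis is a structural equation model fit with dedicated software (e.g., lavaan in R or semopy in Python). As a minimal stand-in, here is a sketch of the replication idea: draw a fresh sample and check whether the proposed item clusters stay internally correlated. The data-generating process, cluster assignments, and threshold are all assumptions for illustration.

```python
import numpy as np

def cluster_check(items, clusters, threshold=0.4):
    """Check whether each proposed item cluster remains internally correlated
    in a fresh sample. `items` is an (n_people, n_items) response matrix;
    `clusters` maps a facet name to 0-based item indices (hypothetical grouping)."""
    corr = np.corrcoef(items, rowvar=False)
    results = {}
    for name, idx in clusters.items():
        pairwise = [corr[i, j] for i in idx for j in idx if i < j]
        results[name] = min(pairwise) >= threshold
    return results

# A new simulated sample from the same hypothetical data-generating process
rng = np.random.default_rng(42)
n = 1000
latents = {"physical": rng.normal(size=n),
           "mental": rng.normal(size=n),
           "spiritual": rng.normal(size=n)}
assignment = ["physical", "physical", "mental", "mental", "spiritual",
              "mental", "physical", "spiritual", "spiritual", "mental"]
items = np.column_stack([0.8 * latents[f] + 0.6 * rng.normal(size=n)
                         for f in assignment])

clusters = {"physical": [0, 1, 6], "mental": [2, 3, 5, 9], "spiritual": [4, 7, 8]}
print(cluster_check(items, clusters))  # all True if the structure replicates
```

If a cluster fails this check in a new sample, that is the signal, described below, that the questionnaire’s structure may be more complicated than the exploratory analysis suggested.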

If the findings of a confirmatory factor analysis replicate those of an exploratory factor analysis, you have just discovered a method to reliably measure a real, but unmeasurable, latent variable. If your findings differ between samples of the same population, perhaps the questionnaire has a more complicated structure than originally thought. If your findings differ between samples of different populations, then you need to explore why the findings differ between populations, a research path that can be very intriguing.

*The take-home message:* We often use multiple questions to measure some real-world construct because it is impossible to do so with a single question. These unmeasurable constructs are called latent variables. We identify how to combine these questions into scale variables using exploratory factor analysis, and we confirm the findings using confirmatory factor analysis.