I admit it. Part of my business’ marketing strategy is having at least some minimal social media presence. That is what all the marketing advice columns say to do, and who am I to buck this trend? I have focused my efforts on LinkedIn because the Groups function provides me with an opportunity to answer, hopefully intelligently, research-related questions posed by other investigators, and a recent question I answered got me thinking.

A LinkedIn user who listed her job title as a research associate asked about how to analyze data that contained different levels. I answered her question to the best of my abilities, but with so little information to work on (I still don’t know what “levels” meant to her), I’m confident I only partially answered her inquiry.

*But the idea of levels in data is interesting.*

There are many ways to interpret data levels. The most traditional interpretation likely occurs when we discuss the response options to a single question. If a question asks about your age, each possible response is a different level. Strongly disagree is a different level than strongly agree for Likert scale questions. Perhaps the mysterious researcher was asking how to test for differences between different levels of an intervention. Another possible interpretation is in how data is collected. Most researchers conceptualize a hierarchy of research designs, with randomized controlled trials at the top and a myriad of observational designs closer to the bottom. I don’t think this is what the researcher was referring to because I can’t think of many circumstances where you would have the capability of even testing, for example, the findings of an RCT versus a cohort study.

Instead, I think our researcher was talking about how the things that produce data, typically humans but not necessarily so, tend to cluster into groups, and this clustering creates a hierarchy within the data that should be accounted for. Each *level* of the hierarchy is a different *level* of data available to the researcher.

The simple answer is that data that contains some type of hierarchical structure should be evaluated using hierarchical linear modeling/multi-level modeling (HLM), structural equation modeling (SEM), or generalized linear mixed models (GLMM). But providing such a simple answer doesn’t provide any information about why you should use such complex statistical methods.

(To provide a reference point, it is relatively easy to hand calculate a t-test or a chi-square. Odds ratios are a breeze and even ANOVAs aren’t beyond our reach. Those only take a few minutes. Because of the iterative processes that are used, it would probably take years to solve an HLM, SEM, or GLMM model by hand.)

*Data Likes to Cluster*

ANOVAs, regressions, t-tests, and chi-square tests all make the same large assumption: all observations are independent, or at least their errors are uncorrelated. But in the research world, we very often experience correlated observations. The simplest example of correlated observations is when a study incorporates a longitudinal design. Since the same people (or other units of analysis) are being measured multiple times, we would expect that different measurements from the same person will be correlated. In fact, they should be correlated because *the same* person is answering *the same* question. Even when responses change over time, we would not expect such differences to occur at random (e.g. age). If these within-subjects measurements are not correlated, we should question how the data was collected, labeled, and cleaned.
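
To make the longitudinal case concrete, here is a minimal Python sketch using simulated (not real) data. It assumes each person has a stable baseline plus independent noise at each measurement wave; the function names, the sample size, and the standard deviations are all made up for illustration.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def simulate_two_waves(n_people=500, person_sd=1.0, noise_sd=0.5, seed=1):
    """Simulate the same people answering the same question twice."""
    rng = random.Random(seed)
    wave1, wave2 = [], []
    for _ in range(n_people):
        baseline = rng.gauss(0, person_sd)  # stable part of this person's score
        wave1.append(baseline + rng.gauss(0, noise_sd))
        wave2.append(baseline + rng.gauss(0, noise_sd))
    return wave1, wave2


wave1, wave2 = simulate_two_waves()

# Shuffle wave 2 to mimic participant IDs that were mislabeled during cleaning.
shuffled_wave2 = wave2[:]
random.Random(2).shuffle(shuffled_wave2)

r_matched = pearson_r(wave1, wave2)
r_mislabeled = pearson_r(wave1, shuffled_wave2)
print(f"matched waves:    r = {r_matched:.2f}")
print(f"mislabeled waves: r = {r_mislabeled:.2f}")
```

Under these assumptions, the matched waves should correlate strongly, while the shuffled second wave should show a correlation close to zero, which is exactly the kind of red flag that should send you back to check how the data was labeled and cleaned.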

But there are other situations where data can cluster even if we don’t expect it to. When we try to determine the difference of a between-subjects effect, we assume that the participants don’t know each other, but that isn’t necessarily true. When I was conducting research on tobacco control policies, convenience samples were routinely recruited. Several study subjects knew of each other; some were friends and completed the study as a group; and in one scenario, a subject was actually a participant in another subject’s research study!

Connections like these are relatively random (a participant in another subject’s study? Really?), and our basic statistical tools are typically robust enough to withstand such correlations. However, there are numerous other situations that require relationships between participants to be considered when determining statistical effects because these correlations can have dramatic effects on our results. Twin studies should take into account shared genomes and family environments. School-based programs must consider how students are clustered into classrooms or even schools. Clinical trials need to consider how patients may cluster within hospitals. Evaluation studies may need to assess how program participants cluster within neighborhoods.

*Why do we need to account for clustering?*

When performing a statistical test, we are trying to see if the distribution of scores in Group A is different from the distribution of scores in Group B. The spread of each distribution is summarized by its variance. When study participants are clustered or related, there is a greater likelihood that these individuals will provide similar responses to the questions being asked or measurements being taken. The greater sameness between members of the same group reduces the variance within that group, which makes our estimates look more precise than they really are and creates a statistical bias towards finding a between-subjects difference. In effect, you increase the probability that you’ll find a significant difference when one doesn’t truly exist, known as a Type I error.
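
A small simulation can illustrate this inflation. The sketch below is my own illustration with made-up settings: both groups are drawn from the same population (so every "significant" result is a false positive), participants are nested in clusters of 20, and the test statistic deliberately ignores the clustering. For simplicity it uses the large-sample cutoff of 1.96 rather than an exact t-test.

```python
import random


def mean_and_variance(scores):
    """Sample mean and unbiased sample variance."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return mean, var


def naive_z(group_a, group_b):
    """Two-sample statistic that treats every score as independent."""
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    se = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    return (mean_a - mean_b) / se


def simulate_group(rng, n_clusters, cluster_size, cluster_sd):
    """Scores for one study group made up of clusters (e.g. classrooms)."""
    scores = []
    for _ in range(n_clusters):
        shared = rng.gauss(0, cluster_sd)  # effect shared by everyone in a cluster
        scores.extend(shared + rng.gauss(0, 1) for _ in range(cluster_size))
    return scores


def false_positive_rate(cluster_sd, n_sims=500, n_clusters=10,
                        cluster_size=20, seed=42):
    """Share of simulated null studies that the naive test calls significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        group_a = simulate_group(rng, n_clusters, cluster_size, cluster_sd)
        group_b = simulate_group(rng, n_clusters, cluster_size, cluster_sd)
        if abs(naive_z(group_a, group_b)) > 1.96:
            hits += 1
    return hits / n_sims


rate_independent = false_positive_rate(cluster_sd=0.0)  # no clustering at all
rate_clustered = false_positive_rate(cluster_sd=1.0)    # strong clustering
print(f"false positives, independent data: {rate_independent:.1%}")
print(f"false positives, clustered data:   {rate_clustered:.1%}")
```

With no cluster effect, the false positive rate should land near the nominal 5%. Once members of a cluster share a common effect, the naive test should reject far more often, even though no true difference exists between the groups.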

HLM, SEM, and GLMM can account for this bias and make statistical adjustments to ensure that this sameness among the participants does not influence the final conclusions of the study. Yes, for many reasons detecting significant effects becomes more difficult, but when significant effects do occur, there is greater confidence that any differences truly exist.
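
These models do far more than apply a single correction factor, but a back-of-the-envelope calculation can show how much information clustering costs. The sketch below uses the standard design effect for equally sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. The function names and the example numbers (400 students, classrooms of 20, an ICC of 0.1) are assumptions for illustration, not values from any study.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical example: 400 students in classrooms of 20, ICC of 0.1.
deff = design_effect(cluster_size=20, icc=0.1)
n_eff = effective_sample_size(n_total=400, cluster_size=20, icc=0.1)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f} of 400")
```

In this hypothetical example, 400 clustered students carry roughly the information of 138 independent ones, which is one reason significant effects become harder to detect once clustering is handled properly.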

*What tests should I use?*

Before you try to use HLM, SEM, or GLMM, you might need to take a class or two, read a couple of books (stay away from journal articles unless you really like statistical theory), and/or watch a whole bunch of YouTube videos.

With that note of caution out of the way, here are my recommendations. If you are working with within-subjects comparisons, HLM and SEM perform best. If your time component is structured (e.g. all measurements were taken exactly 12 months apart), HLM and SEM work equally well. If your time component is unstructured (e.g. some measurements were taken at 6 months while others at 9 months), HLM performs better. If your data contains multiple measurements for each participant without respect to time (e.g. 3 cholesterol tests run on the same blood sample), GLMM is appropriate. If you are concerned about between-subjects clusters, HLM and SEM both perform well.
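
For readers who like their rules of thumb written down, the recommendations above can be sketched as a small lookup function. This only restates my suggestions in code; the function name and the labels for each scenario are invented here, and it is no substitute for thinking through your own design.

```python
def recommend_models(design, time_structure=None):
    """Return the model families suggested in the text for a given design.

    design: "within_subjects", "repeated_no_time", or "between_clusters"
    time_structure: "structured" or "unstructured" (used for within_subjects)
    """
    if design == "within_subjects":
        if time_structure == "unstructured":
            # Measurements taken at irregular intervals across participants.
            return ("HLM",)
        # Evenly spaced measurements, or timing not specified.
        return ("HLM", "SEM")
    if design == "repeated_no_time":
        # Several measurements per participant with no time ordering.
        return ("GLMM",)
    if design == "between_clusters":
        # Participants nested in classrooms, hospitals, neighborhoods, etc.
        return ("HLM", "SEM")
    raise ValueError(f"unknown design: {design!r}")


print(recommend_models("within_subjects", "structured"))
print(recommend_models("within_subjects", "unstructured"))
print(recommend_models("repeated_no_time"))
print(recommend_models("between_clusters"))
```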

*The take-home message:* We often work with data that is structured in levels or hierarchies, and measurements within such levels are often correlated. When such hierarchies are non-random and measurement correlation is expected to be high, sophisticated statistical models are required to account and adjust for the clustering effects. If no adjustments are made, the analysis is prone to finding significant differences that don’t really exist.