Designing a good experiment is both a science and a skill. Whether you are testing a new educational intervention, studying consumer behavior, or evaluating a policy program, the way you structure your experiment determines whether your findings are credible, generalizable, and ethically sound.
This guide walks through the core principles of experimental design, from choosing the right type of experiment to calculating your required sample size, drawing on frameworks used in behavioral economics and social science research.
The Four Types of Experiments
Not all experiments are created equal. Researchers generally work within four distinct experimental frameworks, each with its own trade-offs between control and realism.
Lab Experiment
Conducted in a controlled setting, typically with student subjects. High internal validity, but often criticized for using non-representative samples and artificial conditions.
Artefactual Field Experiment
A lab experiment run with non-standard subjects such as traders, politicians, or board members, improving external validity over traditional lab studies.
Framed Field Experiment
A structured experiment conducted in the participant's natural environment. Subjects know they are in a study. Susceptible to Hawthorne effects and selection bias.
Natural Field Experiment
Subjects are unaware they are participating. Combines experimental rigor with real-world realism. Considered the gold standard for causal inference.
The key tension in this taxonomy is between control and realism. Lab experiments offer the most control but least realism. Natural field experiments offer the most realism but require the most careful ethical and logistical planning.
Informed Consent and IRB Approval
Any research involving human subjects raises important ethical questions. Informed consent is a cornerstone of ethical research: the process of ensuring subjects understand what they are participating in, what data will be collected, and how it will be used.
In the United States, the Institutional Review Board (IRB) exists to ensure that research with human subjects is conducted in accordance with federal regulations and basic ethical principles. Researchers at universities are typically required to complete Collaborative Institutional Training Initiative (CITI) training before running studies, particularly in the social and behavioral sciences.
A critical question researchers must ask: when is IRB approval required? Natural field experiments, where subjects do not know they are being observed, raise particular challenges for informed consent, since by design, consent cannot be obtained in advance.
Tips for Running Successful Field Experiments
Running a field experiment inside a real organization is very different from running a controlled lab study. Here is what experienced researchers have learned:
- Use economic theory to guide your design and help interpret results.
- Become a genuine expert about the market or context you are studying.
- Always have a proper control group. This is non-negotiable for causal claims.
- Obtain sufficient sample size before starting (see power calculations below).
- Get a champion within the organization at a senior level.
- Understand the organization's internal dynamics and incentives.
- Ensure the organization has real "skin in the game." They should care about the outcome.
- Emphasize what it costs not to run the experiment, not just the cost of running it.
- Be transparent that you do not have the answers. That is why you are running the experiment.
- Be open to running experiments that benefit the organization even if there is no direct academic payoff for you.
- Understand fairness concerns. Organizations worry about treating similar people differently.
- Always get IRB approval before beginning any data collection.
Randomization Strategies
How you assign subjects to treatment and control groups matters enormously for the validity of your results. There are two primary approaches.
Complete Randomization
Each subject is independently assigned to a condition with a fixed probability. Simple to implement, but it can result in unequal group sizes and groups that differ systematically by chance, especially in smaller samples.
Block Randomization
Subjects are grouped into "blocks" based on relevant characteristics such as age, baseline score, or location, and randomization happens within each block. This ensures better balance across groups. Within-subject designs are a special case of block randomization where the same individuals appear in multiple conditions.
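As a minimal sketch of the idea, the helper below (the function name and data layout are illustrative, not from any particular library) shuffles subjects within each block and alternates assignments, so every block contributes near-equal numbers to treatment and control:

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_key, seed=0):
    """Assign subjects to 'treatment'/'control', balancing within each block.

    subjects: list of dicts with an 'id' field; block_key names the field
    that defines the block (e.g. school, age band, baseline score bucket).
    Returns a dict mapping subject id to arm.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[s[block_key]].append(s["id"])
    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)
        for i, sid in enumerate(ids):
            # Alternate arms within the shuffled block for near-equal sizes.
            assignment[sid] = "treatment" if i % 2 == 0 else "control"
    return assignment

subjects = [
    {"id": 1, "school": "A"}, {"id": 2, "school": "A"},
    {"id": 3, "school": "B"}, {"id": 4, "school": "B"},
]
arms = block_randomize(subjects, "school")
```

With two subjects per school, each school ends up with exactly one treated and one control subject, which complete randomization cannot guarantee.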
Three important questions to check after any randomization:
- Was selection into treatment truly random? Watch for cases where assignment procedures broke down.
- Do treated subjects behave differently because they know they are in a study? (Hawthorne effect)
- Is there anything about your sample that would limit how broadly you can generalize the results?
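The first of these checks is often operationalized as a covariate balance test: for each pre-treatment characteristic, compare its mean across arms. A minimal sketch using only the standard library (the function name and the |z| > 2 rule of thumb are illustrative assumptions):

```python
from statistics import mean, stdev

def balance_z(treat_vals, control_vals):
    """Approximate z-statistic for a covariate's difference in means
    between arms. A large |z| (say, above 2) is a warning sign that
    assignment may not have been truly random."""
    n1, n2 = len(treat_vals), len(control_vals)
    # Standard error of the difference in means (unequal variances).
    se = (stdev(treat_vals) ** 2 / n1 + stdev(control_vals) ** 2 / n2) ** 0.5
    return (mean(treat_vals) - mean(control_vals)) / se
```

For example, identical baseline scores in both arms yield z = 0, while a large gap between arms yields a large |z| and a reason to inspect the assignment procedure.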
Power Calculations and Sample Size
One of the most common mistakes in experimental research is running a study that is too small to reliably detect the effect you care about. Power calculations help you determine the right sample size before you begin.
The Key Ingredients
A proper power calculation requires you to specify four things:
- Null hypothesis: Usually that the treatment has zero effect.
- Significance level (α): Almost always 5%, the probability of a false positive (Type I error).
- Desired power (1 - β): Often 80%, meaning you accept a 20% chance of missing a real effect (Type II error).
- Minimum detectable effect and its variance: The smallest effect size you would consider meaningful, and how variable your outcome is.
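To show how these four ingredients combine, here is the standard normal-approximation formula for a two-sided, two-sample comparison of means, n = 2(z(1-α/2) + z(1-β))² (σ/δ)² per arm, sketched with only the standard library (the function name and defaults are illustrative):

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Required sample size per arm for a two-sided, two-sample test
    of means, assuming equal variance sigma**2 in both arms.

    delta: minimum detectable effect; sigma: outcome standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84 at power = 0.80
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)
```

With the conventional α = 5% and 80% power, detecting an effect of 0.2 standard deviations requires 393 subjects per arm; an effect of 0.5 standard deviations requires only 63. Halving the detectable effect quadruples the required sample, which is why the minimum detectable effect is the most consequential input.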
Practical Guidance on Sample Allocation
Research by List, Sadoff, and Wagner (2011) offers useful rules of thumb for allocating sample size across groups:
- With a continuous outcome, the relative sample sizes between groups should depend on the variance of the outcome in each group and the relative cost of sampling from each group.
- With a binary outcome, equal sample sizes across groups are optimal.
- When observations are clustered, such as students within classrooms, you must adjust the power formula for the within-cluster correlation. Standard errors are larger when observations within a cluster are correlated.
- For a linear treatment effect, two groups suffice. Place half at each extreme.
- For a quadratic treatment effect, use three groups. Place half at the midpoint and the rest at extremes.
Also important: if you expect the treatment to increase both the mean and the variance of the outcome, equal sample sizes may not be optimal; instead, each group's sample size should be proportional to its standard deviation. For example, if the standard deviations between groups are in a 2:1 ratio, an equal 50/50 split requires approximately 11% more total observations to achieve the same power. The more unequal the variances, the larger this penalty grows.
Ethical Considerations in Practice
Real-world experiments frequently generate ethical dilemmas that go beyond standard IRB requirements. Two common scenarios illustrate these challenges well.
Scenario 1: The "Everyone Wants the Treatment" Problem
In a summer 2013 Khan Academy experiment, four middle school principals agreed to participate but insisted on placing all their students in the treatment group. Without a control group, causal inference is impossible. What do you do when partners resist randomization for ethical or political reasons?
Scenario 2: Randomizing When Need Is Unequal
Organizations running humanitarian or welfare programs often want to give treatment first to those who need it most or want it most. Neither of those approaches constitutes an RCT. How do you preserve scientific rigor while honoring the organization's mission?
Two practical solutions researchers have developed:
Random rollout: Recruit the people who most want or need the program, then randomize only the timing of when they receive it. This works as a valid experimental design, but only when effects are detectable quickly after the treatment begins.
Alternative treatment for the control group: Provide the control group with a different, less-intensive version of the treatment. The challenge: if the alternative is too good, you lose your true control. If it is too clearly inferior, you risk creating resentment and harming the organization's reputation and future research relationships.
Key Takeaways
Great experimental design is about making deliberate trade-offs: between control and realism, between statistical power and cost, and between scientific rigor and organizational ethics.
Mastering these trade-offs and knowing how to navigate them with partners and IRBs is what separates a good researcher from a great one. The four types of experiments, careful power calculations, and thoughtful ethical planning are the foundation of credible, actionable research.
At Shaqti Ventures, we apply rigorous research-backed frameworks to marketing strategy and experimentation.