A Stata program that plots multiple superimposed distributions.
Install histover by opening Stata and running the command:
The documentation for histover is included with the installation. To read the documentation, run the command:
Suppose we want to compare two distributions. Stata’s hist command can be used to make the graph below, which is taken from “The Political Ideologies of Law Clerks,” 19 American Law and Economics Review 97 (2017). The graph plots the distribution of law clerk ideology separately for male clerks and female clerks. The x-axis is a measure of ideology known as the Campaign Finance score (“CFscore”), which situates individuals on a unidimensional ideological scale from extremely liberal to extremely conservative. As explained in the article from which the example is drawn, “[t]he scale is normalized such that it has a mean of zero and a standard deviation of one with respect to the population of U.S. donors. For example, Barack Obama and Hillary Clinton, on the ideological left side of the spectrum, have CFscores of -1.65 and -1.16, respectively; Joseph Lieberman and Chris Christie, ideologically more moderate, have CFscores of -0.54 and 0.46, respectively; and Scott Walker and Ron Paul, on the ideological right, have CFscores of 1.28 and 1.57, respectively.”
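As a rough sketch of how side-by-side histograms of this kind can be produced, the example below uses Stata's bundled auto dataset rather than the law clerk data. The variables mpg and foreign are stand-ins chosen for illustration only, and the bin settings are arbitrary assumptions.

```stata
* Illustrative only: the bundled auto data stands in for the law clerk data
sysuse auto, clear

* One histogram panel per group, placed side by side
histogram mpg, by(foreign) width(5) start(10) percent

* The same panels stacked vertically in a single column
histogram mpg, by(foreign, cols(1)) width(5) start(10) percent
```

Giving both panels the same width() and start() keeps the bins aligned, which makes the panels at least roughly comparable by eye.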
While this side-by-side comparison is useful, it is still difficult to compare the distributions visually. We can easily stack the histograms vertically, but comparing the distributions remains difficult. A better way to compare distributions is to superimpose them. One can plot multiple superimposed kernel density plots (in Stata, to plot the distribution of y separately for x=0 and x=1, use twoway (kdensity y if x == 0) || (kdensity y if x == 1)), but the smoothing may not be desirable. In my research I often superimpose distributions directly, without kernel density estimation, to make them easier to compare. I wrote a Stata command to help others make similar superimposed distributions. Since I wrote the command, Stata has added options for changing the transparency of graph elements, which also allow for superimposed histograms (read about it here).
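The two built-in alternatives mentioned above can be sketched as follows. This is a minimal example, again assuming Stata's bundled auto dataset with mpg as the outcome and foreign as the grouping variable. The colors, bin width, and opacity levels are arbitrary choices, and the %40 opacity suffix assumes Stata 15 or later.

```stata
* Illustrative only: the bundled auto data stands in for real data
sysuse auto, clear

* Superimposed kernel density plots, one per group
twoway (kdensity mpg if foreign == 0) ///
       (kdensity mpg if foreign == 1), ///
       legend(order(1 "Domestic" 2 "Foreign")) xtitle("Miles per gallon")

* Superimposed histograms using transparency (assumes Stata 15 or later)
twoway (histogram mpg if foreign == 0, width(5) start(10) color(navy%40)) ///
       (histogram mpg if foreign == 1, width(5) start(10) color(maroon%40)), ///
       legend(order(1 "Domestic" 2 "Foreign")) xtitle("Miles per gallon")
```

Both histograms share the same width() and start() so that their bins line up when overlaid.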
The plot below is made using histover. The syntax takes the form: histover var_outcome, by(var_unique) bmin(min) bmax(max) interval(step), where var_outcome is the outcome variable (the variable whose distribution will be plotted), var_unique is the variable that defines each of the distributions to be plotted, min is the user-selected minimum value to be plotted, max is the user-selected maximum value to be plotted, and step is the user-selected step, or interval, used to define the bins.
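To make the syntax concrete, here is a hypothetical call that follows the form given above. It assumes histover is already installed and uses Stata's bundled auto dataset. The variables and the bmin, bmax, and interval values are illustrative assumptions, not the settings used for the law clerk plot.

```stata
* Illustrative only: assumes histover is installed
sysuse auto, clear

* Distribution of mpg, one superimposed distribution per value of foreign,
* plotted from 10 to 45 in bins of width 5
histover mpg, by(foreign) bmin(10) bmax(45) interval(5)
```

Here bmin and bmax are chosen to cover the observed range of mpg (12 to 41 in this dataset), so no observations fall outside the plotted bins.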