A scatterplot displays the relationship between two quantitative variables plotted along two axes. A series of dots represents the position of observations from the data set.

The independent variable is generally plotted along the horizontal (X) axis, and the dependent (or response) variable along the vertical (Y) axis. If no dependent variable exists, either variable can be plotted on either axis; in this case, the scatterplot illustrates only the degree of correlation (and not causation) between the two variables.

Scatterplots often include a single fitted line (called a regression line) running through the points. This line represents the trend of the relationship between the two variables, rather than joining the dots together as in a line graph. In some circumstances, the regression line can be used as a predictive tool, for example to estimate the Y value for an X value that was not observed.
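
As a rough illustration of how a regression line can serve as a predictive tool, the sketch below fits an ordinary least-squares line using only the Python standard library. The data (`hours`, `scores`) and the helper names `fit_line` and `predict` are hypothetical and chosen only for this example; statistical or spreadsheet software would normally do this calculation for you.

```python
# Hypothetical sample data: hours studied (x) and test score (y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)


def predict(x):
    """Estimate y for a given x using the fitted line."""
    return slope * x + intercept


print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"estimated score at 3.5 hours: {predict(3.5):.2f}")
```

Predictions of this kind are most trustworthy inside the range of the observed X values; the line says little about what happens beyond them.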

Scatterplots are used to analyse patterns of the relationship between two sets of continuous data. Scatterplots can visually show the strength of the relationship between the variables (i.e., the “scatter” in the plot: the more concentrated the dots are along the line, the stronger the relationship); whether there is a positive or negative association between the variables (i.e., whether the slope is positive or negative); whether the data pattern is linear (straight) or nonlinear (curved); and whether unusual features such as outliers, clusters and gaps exist in the data sets.
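
The strength and direction of a linear relationship that a scatterplot shows visually can also be summarised numerically. The sketch below computes the Pearson correlation coefficient, one common measure that is not named in the text above; the function name `pearson_r` and both data sets are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one tight upward pattern, one loose downward pattern.
x = [1, 2, 3, 4, 5, 6]
tight_positive = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
loose_negative = [9.0, 4.0, 8.0, 3.0, 6.0, 1.0]

print(f"tight positive: r = {pearson_r(x, tight_positive):.3f}")
print(f"loose negative: r = {pearson_r(x, loose_negative):.3f}")
```

A value near +1 or -1 corresponds to dots concentrated along a line, while a value near 0 corresponds to widely scattered dots. This coefficient only captures linear patterns, so a clearly curved relationship can still produce a low value.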

## Examples

### Scatterplot from NC State University

"With a scatter plot a mark, usually a dot or small circle, represents a single data point. With one mark (point) for every data point a visual distribution of the data can be seen. Depending on how tightly the points cluster together, you may be able to discern a clear trend in the data."

Source: NC State University

"Because the data points represent real data collected in a laboratory setting rather than theoretically calculated values, they will represent all of the error inherent in such a collection process. A regression line can be used to statistically describe the trend of the points in the scatter plot to help tie the data back to a theoretical ideal. This regression line expresses a mathematical relationship between the independent and dependent variable. Depending on the software used to generate the regression line, you may also be given a constant that expresses the 'goodness of fit' of the curve. That is to say, to what degree of certainty can we say this line truly describes the trend in the data. The correlational constant is usually expressed as R² (R-squared). Whether this regression line should be linear or curved depends on what your hypothesis predicts the relationship is. When a curved line is used, it is typically expressed as either a second order [quadratic] or third order [cubic] curve. Higher order curves may follow the actual data points more closely, but rarely provide a better mathematical description of the relationship."

Source: NC State University
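
To make the "goodness of fit" constant in the quotation more concrete, the sketch below computes R² for a straight-line fit as one minus the ratio of residual variation to total variation. It uses only the Python standard library; the function name `r_squared` and the laboratory-style data are hypothetical.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit, and total variation around the mean.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical measurements with a little experimental error.
concentration = [0.0, 1.0, 2.0, 3.0, 4.0]
reading = [0.1, 1.9, 4.2, 5.8, 8.1]

r2 = r_squared(concentration, reading)
print(f"R-squared = {r2:.4f}")
```

An R² close to 1 means the line accounts for most of the variation in the readings. As the quotation notes, a higher order curve would push this number up further without necessarily describing the relationship any better.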

## Advice for choosing this method

Scatterplots are appropriate when you want to graph two continuous quantitative variables, such as height and weight. Both variables must be continuous. Likert scale scores, for example, do not work well, because the dots simply stack up on the few possible scale values rather than scattering across the plot.
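
The Likert problem can be seen by counting how many distinct dot positions a data set actually produces. The sketch below compares randomly generated, purely hypothetical responses to two 5-point Likert items with hypothetical continuous measurements; all names and value ranges are assumptions made for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical survey: 200 respondents answering two 5-point Likert items.
likert_pairs = [(rng.randint(1, 5), rng.randint(1, 5)) for _ in range(200)]

# Hypothetical continuous data: 200 height (cm) and weight (kg) measurements.
continuous_pairs = [
    (rng.uniform(150.0, 200.0), rng.uniform(50.0, 100.0)) for _ in range(200)
]

likert_positions = Counter(likert_pairs)
print(f"distinct Likert positions: {len(likert_positions)} (at most 25)")
print(f"largest pile of overlapping dots: {max(likert_positions.values())}")
print(f"distinct continuous positions: {len(set(continuous_pairs))}")
```

With two 5-point items there are only 25 possible positions, so hundreds of responses pile on top of one another and the plot hides how many observations sit at each position.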

## Advice for using this method

Sometimes data points are tightly clustered, even overlapping. There are a couple of ways to clarify those areas of the scatterplot. One way is to nudge each data point very slightly away from the cluster centre by moving it manually. Another way is to make the fill colour of each point slightly transparent and to change the border colour of all data points in the scatterplot so that they stand out better against one another.
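
Nudging points by hand becomes tedious for large data sets. A common automated alternative, sketched below, is "jittering": adding a small random offset to every point. This swaps random offsets in for the manual nudging described above, and the function name `jitter`, its parameters, and the data are hypothetical.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of points with each coordinate shifted by up to +/- amount."""
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Hypothetical data: five observations that share exactly the same position.
overlapping = [(3.0, 4.0)] * 5
spread = jitter(overlapping, amount=0.1)

print(f"distinct positions before: {len(set(overlapping))}")
print(f"distinct positions after:  {len(set(spread))}")
```

Keep the offset small relative to the axis scale so the plot still reflects the data, and mention the jitter in a note, since the plotted positions are no longer exact. For the transparency approach, many plotting tools expose a setting for it; in matplotlib, for instance, the `alpha` and `edgecolors` arguments of `scatter` control fill transparency and border colour.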

Often scatterplots are too busy to label each data point. However, it can be helpful to add labels to identify key data points, such as outliers, targets, program sites, or the average.