Data analysis methods

Descriptive statistics can be useful for two purposes:

1. To provide basic information about the characteristics of a sample or population. These characteristics are represented by variables in a research study dataset.
2. To highlight potential relationships between these characteristics, or the relationships among the variables in the dataset.

The four most common descriptive statistics are:

One of the most basic ways of describing the characteristics of a sample or population is to classify its individual members into mutually exclusive categories and count the number of cases in each category. In research, variables with discrete, qualitative categories are called nominal or categorical variables. The categories can be given numerical codes, but they cannot be ranked, added, or multiplied. Examples of nominal variables include gender (male, female), preschool program attendance (yes, no), and race/ethnicity (White, African American, Hispanic, Asian, American Indian).

Researchers calculate proportions, percentages and ratios in order to summarize the data from nominal or categorical variables and to allow comparisons between groups.

Proportion: The number of cases in a category divided by the total number of cases across all categories of a variable.
Percentage: The proportion multiplied by 100 (or the number of cases in a category divided by the total number of cases across all categories of a variable, times 100).
Ratio: The number of cases in one category divided by the number of cases in a second category.

A researcher selects a sample of 100 students from a Head Start program. The sample includes 20 White children, 30 African American children, 40 Hispanic children and 10 children of mixed race/ethnicity.

Proportion of Hispanic children in the program = 40 / (20+30+40+10) = 40/100 = 0.40.
Percentage of Hispanic children in the program = 0.40 × 100 = 40%.
Ratio of Hispanic children to White children in the program = 40/20 = 2.0; that is, Hispanic and White children are enrolled in the Head Start program in a ratio of 2 to 1.

Proportions, percentages and ratios are used to summarize the characteristics of a sample or population that fall into discrete categories.

Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics when those characteristics are measured using an interval scale. The values of an interval variable are ordered, and the distance between any two adjacent values is the same, but the zero point is arbitrary. Values on an interval scale can be added and subtracted. Examples of interval scales or interval variables include household income, years of schooling, hours a child spends in child care and the cost of child care.

Measures of central tendency describe the "average" member of the sample or population of interest. There are three measures of central tendency:

Mean: The arithmetic average of the values of a variable. To calculate the mean, all the values of a variable are summed and divided by the total number of cases.
Median: The value that divides a set of values in half (i.e., 50% of the variable's values lie above the median, and 50% lie below it).
Mode: The value of a variable that occurs most often.

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000 annually, a handful of individuals earn millions.

Measures of dispersion provide information about the spread of a variable's values. There are three key measures of dispersion:

Range: The difference between the smallest and largest values in the data. Researchers often simply report the lowest and highest values instead (e.g., 75–100).
Variance: A commonly used measure of dispersion, or of how spread out a set of values is around the mean. It is calculated by taking the average of the squared differences between each value and the mean. The variance is the standard deviation squared.
Standard deviation: Like variance, a measure of the spread of a set of values around their mean. The wider the spread, the greater the standard deviation and the farther the values tend to lie from their mean. A small standard deviation indicates that most of the values are close to the mean; a large standard deviation, on the other hand, indicates that the values are more spread out.
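The definitions above translate directly into code. The following Python sketch computes each measure from its definition; it is an illustration under stated assumptions, not a definitive implementation. It reuses the Head Start counts from the example, while the `scores` and `incomes` lists are hypothetical data, and the function names are illustrative rather than taken from any particular package. The `variance` function divides by the number of cases, matching the "average of the squared differences" definition (the population variance); Python's standard `statistics` module offers this version as `pvariance` and `pstdev`, while its `variance` and `stdev` divide by n - 1 to estimate a population's spread from a sample.

```python
import math
from collections import Counter

# Head Start example from the text: number of children in each category.
counts = {
    "White": 20,
    "African American": 30,
    "Hispanic": 40,
    "Mixed race/ethnicity": 10,
}

# Hypothetical interval-scale data, for illustration only.
scores = [75, 80, 80, 90, 100]
incomes = [30_000, 45_000, 50_000, 60_000, 5_000_000]  # skewed by one large value


def proportion(counts, category):
    """Cases in one category divided by the total cases across all categories."""
    return counts[category] / sum(counts.values())


def percentage(counts, category):
    """The proportion multiplied by 100."""
    return proportion(counts, category) * 100


def ratio(counts, first, second):
    """Cases in the first category divided by cases in the second category."""
    return counts[first] / counts[second]


def mean(values):
    """Sum of the values divided by the number of cases."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; on a tie, Counter returns the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


def variance(values):
    """Average squared difference from the mean (population variance, divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


if __name__ == "__main__":
    print(proportion(counts, "Hispanic"))      # 0.4
    print(percentage(counts, "Hispanic"))      # 40.0
    print(ratio(counts, "Hispanic", "White"))  # 2.0

    print(mean(scores), median(scores), mode(scores))  # 85.0 80 80
    print(value_range(scores), variance(scores))       # 25 80.0
    print(round(standard_deviation(scores), 2))        # 8.94

    # One very large income pulls the mean far above the median.
    print(mean(incomes), median(incomes))  # 1037000.0 50000
```

The income figures show why the median is preferred for skewed data: a single income of $5,000,000 raises the mean to $1,037,000, while the median stays at $50,000.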