STATISTICS - CH 3, LECTURE
3.1
Measures of Location: Mean
Sample mean:
x-bar = (x1 + x2 + ... + xn) / n, or in summation notation:
x-bar = Σx / n
Population mean: corresponding formula contains Greek μ and N.
How good is mean as a measure of the "center" of the data?
See list, pg 49
1. Can be calculated for any numerical data
2. Unique
3. Can be used for further stats (e.g. combining data sets)
4. If every value is replaced by the mean, the mean stays the same
5. Takes into account every item in the data
6. Relatively "reliable" -- for multiple samples from same population, mean varies less than median or mode
Ex 3.3: The point: a careless error in recording one value can affect the mean significantly
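The effect of one bad value can be seen in a short Python sketch. The readings below are made up for illustration (they are not the data of Ex 3.3), and trimmed_mean is a hypothetical helper that drops 5% of the items from each end, as in the trimmed mean noted in this section.

```python
def mean(values):
    """Ordinary mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def trimmed_mean(values, percent=5):
    """Drop the lowest and highest `percent` of the sorted values, then average."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of items dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical readings: 20 values, 50 through 69.
readings = list(range(50, 70))

# Same readings, but the last value (69) was carelessly recorded as 690.
with_typo = readings[:-1] + [690]

print(mean(readings))           # mean of the correct data
print(mean(with_typo))          # mean pulled upward by the single error
print(trimmed_mean(with_typo))  # trimming removes the bad value
```

With 20 items, 5% is exactly one item per end, so the trimmed mean discards the smallest reading and the mistyped one.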
"Trimmed mean" omits top and bottom 5% of data to eliminate "outliers"
3.2
Measures of Location: Weighted Mean
Formula: weighted mean = Σwx / Σw (each value x is multiplied by its weight w)
Shopping example (5 at $4, 7 at $3, 8 at $2, 2 at $1)
Apply to grade point avg (4A, 6B, 3C, 1D, 2F)
Grand Mean: Find the overall mean of data sets having known means when x-bars and n's are known
Formula is essentially the same as weighted mean (but notation is different)
Application (from Lesson 3.8, page 81): mean of a frequency distribution
Weight is the frequency
Value is the class mark (Assumption: every data item is located at the class mark)
Table technique for recording x, f, x*f and sums of f and x*f
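A minimal Python sketch of the weighted mean, applied to the shopping and grade point examples above. Two things are assumptions rather than part of the notes: the usual 4-point scale (A=4, B=3, C=2, D=1, F=0) and the sample means and sizes used to illustrate the grand mean.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Shopping example: 5 items at $4, 7 at $3, 8 at $2, 2 at $1.
avg_price = weighted_mean([4, 3, 2, 1], [5, 7, 8, 2])

# Grade point average: 4 A's, 6 B's, 3 C's, 1 D, 2 F's.
# Assumes the usual scale A=4, B=3, C=2, D=1, F=0.
gpa = weighted_mean([4, 3, 2, 1, 0], [4, 6, 3, 1, 2])

# Grand mean: the values are the sample means, the weights are the n's.
# Hypothetical samples: x-bar = 70 with n = 10, x-bar = 80 with n = 30.
grand_mean = weighted_mean([70, 80], [10, 30])

print(round(avg_price, 2), gpa, grand_mean)
```

The same function covers all three cases because only the meaning of the weight changes: a count of items, a count of grades, or a sample size.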
3.3
Measures of Location: Median and Quartiles (etc)
Formula for median position: (n+1)/2 th item
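Even vs odd n: for odd n the position is a whole number (one data item); for even n it falls halfway between two items, so the median is the average of those two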
Note: the median position is not the same as the median item
You must arrange the data before looking for the median.
Ex from a stem and leaf plot (Use X2.13 answer, pg 508, n=30)
Quartile defs (not universal)
Q1: median of all values to left of median position
Q2: median
Q3: median of all values to right of median position
Ex: from stem and leaf plot
Data: 22 23
35 37 38 38
41 42
50 51 51 51 53
n = 13
Median position is 7; median value is 41 (there are 6 values above 41 and 6 below 41)
For data below (or above) the median, n = 6
Median position is 3.5
Q1 is between the 35 and 37: Q1 = 36
Q3 is between the first and second 51s: Q3 = 51
Note that there are 3 values below Q1, 3 more below median, 3 more below Q3, 3 above Q3
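The worked example above can be reproduced with a short Python sketch. It follows the quartile definitions given in these notes (median of the values on each side of the median position); other textbooks and software use slightly different rules, so the function names here are illustrative only.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' definition."""
    ordered = sorted(values)   # the data must be arranged first
    half = len(ordered) // 2   # items strictly on each side of the median position
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)


data = [22, 23, 35, 37, 38, 38, 41, 42, 50, 51, 51, 51, 53]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

For n = 13 each half holds 6 items, so Q1 and Q3 are each the average of two neighboring values, as in the hand calculation.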
Box and Whisker Plot
On a number line, plot:
Max and min data points -- ends of whiskers
Q1 and Q3 -- ends of box
median (Q2) -- line inside the box (not necessarily at its center)
Result: good visual indication of where the data is bunched.
3.4
Measures of Location: Mode
Data item that occurs most frequently
A data set may have several modes, or none at all.
Best: arrange the data in order before examining it to find the mode(s)
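A small Python sketch for finding the mode(s). It adopts one common convention as an assumption: if every value occurs only once, the data set is reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s); empty if none repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([22, 23, 35, 37, 38, 38, 41, 42, 50, 51, 51, 51, 53]))  # one mode
print(modes([1, 1, 2, 2, 3]))                                        # two modes
print(modes([1, 2, 3]))                                              # no mode
```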
3.5
Measures of Variation: Range
Max - min
Does not take into account the data in the middle
3.6
Measures of Variation: Standard Deviation
Population (μ) versus sample (x-bar)
Formulas -- See text
Variance is (std dev)^2
Definition vs. computing formula
For comparison, find the standard deviation of a data set of seven items both ways, treating it as a population (using μ) and as a sample (using x-bar), correct to 2 decimal places,
by the def and by the computing formula
Data: 14, 22, 37, 41, 45, 52, 71
Use table approach for items to be added
Note the rounding issue when using the def.
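The comparison can be checked with a short Python sketch. It assumes the standard textbook formulas: the sum of squared deviations is divided by n for a population and by n - 1 for a sample. The variable names are illustrative.

```python
import math

data = [14, 22, 37, 41, 45, 52, 71]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean.
sxx_def = sum((x - mean) ** 2 for x in data)

# Computing formula: sum of squares minus (sum)^2 / n.
sxx_comp = sum(x * x for x in data) - sum(data) ** 2 / n

sigma = math.sqrt(sxx_def / n)    # population standard deviation
s = math.sqrt(sxx_def / (n - 1))  # sample standard deviation

# Rounding issue: using a mean rounded to 2 decimal places in the
# definition gives a slightly different sum of squares.
mean_2dp = round(mean, 2)
sxx_rounded = sum((x - mean_2dp) ** 2 for x in data)

print(round(sigma, 2), round(s, 2))
print(sxx_def, sxx_comp, sxx_rounded)
```

The computing formula never uses the mean, so it avoids the rounding error that enters the definition when a rounded mean is carried through the table.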
3.7
Applications of Standard Deviation
Empirical rule: 68%, 95%, 99.7% of data within 1, 2, 3 SD of the mean
Applies to data sets with a "normal" "bell-shaped" distribution
Chebyshev's Theorem: Fraction of data within k std dev of the mean ≥ 1 - 1/k^2
So, we are guaranteed at least 0%, 75%, 89% of the data within 1, 2, 3 SD of the mean
Chebyshev gives the worst case for data sets that are "ugly" (not "normal")
Standard units "z" (which count the number of std dev's from the mean) (same as k in Chebyshev)
Converting from std dev and mean to std units:
z = (x - mean) / std dev
Example: IQ scores: mean 100, std dev 15
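A brief Python sketch of the Chebyshev bound and the conversion to standard units. The IQ mean and standard deviation come from the example above; the individual scores 130 and 85 are hypothetical values chosen for illustration.

```python
def chebyshev_bound(k):
    """Guaranteed minimum fraction of data within k std dev of the mean."""
    return 1 - 1 / k ** 2


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


for k in (1, 2, 3):
    print(k, round(100 * chebyshev_bound(k)), "%")

# IQ scores: mean 100, std dev 15 (the scores themselves are hypothetical).
print(z_score(130, 100, 15))  # two std dev above the mean
print(z_score(85, 100, 15))   # one std dev below the mean
```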
Relative variation (coefficient of variation)
V = std dev / mean
Ex: Heights of trees in a forest
A: mean 10 ft, std dev 3 ft, V = 0.3
B: mean 100 ft, std dev 4 ft, V = 0.04
Which forest has the more consistent (or uniform) height?
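The forest comparison takes only a few lines of Python. The helper name is illustrative; the numbers are those of the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variation: standard deviation as a fraction of the mean."""
    return std_dev / mean


v_a = coefficient_of_variation(3, 10)   # forest A
v_b = coefficient_of_variation(4, 100)  # forest B

# The smaller V marks the more uniform forest, even though
# forest B has the larger standard deviation in feet.
more_uniform = "A" if v_a < v_b else "B"
print(v_a, v_b, more_uniform)
```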
*3.8
Description of Grouped Data
We assume all data in a class is at the class mark
So: To calculate mean and std dev:
Σx gets replaced by Σxf (order of operations: multiply each x by its f before adding)
and Σx^2 gets replaced by Σx^2 f
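A Python sketch of the grouped-data calculation. The class marks and frequencies are hypothetical, and the sample formula (dividing by n - 1) is assumed for the standard deviation.

```python
import math

# Hypothetical frequency distribution: class marks x and frequencies f.
class_marks = [10, 20, 30, 40]
frequencies = [2, 5, 2, 1]

n = sum(frequencies)                                             # Σf
sum_xf = sum(x * f for x, f in zip(class_marks, frequencies))    # Σxf
sum_x2f = sum(x * x * f for x, f in zip(class_marks, frequencies))  # Σx^2 f

grouped_mean = sum_xf / n

# Computing formula with Σx replaced by Σxf and Σx^2 by Σx^2 f.
sxx = sum_x2f - sum_xf ** 2 / n
grouped_sd = math.sqrt(sxx / (n - 1))  # sample standard deviation (assumed)

print(n, sum_xf, sum_x2f)
print(grouped_mean, round(grouped_sd, 2))
```

The three sums correspond to the column totals in the table technique: f, x*f, and x^2*f.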
*3.9
Further Descriptions
"Skewness:" If a distribution is not symmetric, it is skewed
Positive skewness: tail is on the positive side
Negative skewness: tail is on the negative side
Concept [refer to the diagram below]:
Mode: highest point
Mean: closest to tail (in this example, on the left)
Median: between mean and mode

Coefficient of skewness: SK = 3(mean-median)/std dev
Ex: mean = 100, median = 110, std dev = 15; SK = -30/15 = -2
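The coefficient of skewness as a one-line Python function, checked against the example above. The function name is illustrative.

```python
def skewness_coefficient(mean, median, std_dev):
    """SK = 3 * (mean - median) / std dev."""
    return 3 * (mean - median) / std_dev


sk = skewness_coefficient(100, 110, 15)
print(sk)  # negative: the mean lies below the median, so the tail is on the left
```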
3.10 Technical Note on Summations
Meaning of Σ
Proof of equivalence of the two formulas for Sxx: Σ(x-μ)^2 = Σx^2 - (Σx)^2 / n
Σ(x-μ)^2   [Square the binomials: (x1-μ)^2 + (x2-μ)^2 + ... (n terms)]
= Σ(x^2 - 2μx + μ^2)   [Separate sums (commute): (x1^2 - 2μx1 + μ^2) + (x2^2 - 2μx2 + μ^2) + ... (n terms)]
= Σx^2 - Σ2μx + Σμ^2   [Remove common factors: x1^2 + x2^2 + ... - 2μx1 - 2μx2 - ... + μ^2 + μ^2 + ... (n terms)]
= Σx^2 - 2μΣx + μ^2 Σ1   [Replace Σ1 by n because Σ1 = 1 + 1 + 1 + ... + 1 (n terms)]
= Σx^2 - 2μΣx + μ^2 n   [Replace μ by Σx / n (definition of the mean)]
= Σx^2 - 2(Σx)(Σx) / n + n(Σx/n)^2   [Simplify each fraction]
= Σx^2 - 2(Σx)^2 / n + (Σx)^2 / n   [Combine like terms (i.e. the last two fractions)]
= Σx^2 - (Σx)^2 / n
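As a sanity check on the proof, the two forms can be compared exactly in Python using fractions, which avoids any rounding. The data are the seven values from section 3.6; this checks one data set only and is not a substitute for the algebra.

```python
from fractions import Fraction

data = [14, 22, 37, 41, 45, 52, 71]
n = len(data)
mu = Fraction(sum(data), n)  # exact mean, kept as a fraction

lhs = sum((x - mu) ** 2 for x in data)                        # Σ(x-μ)^2
rhs = sum(x * x for x in data) - Fraction(sum(data) ** 2, n)  # Σx^2 - (Σx)^2 / n

print(lhs, rhs, lhs == rhs)
```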
Updated 01/27/06 by Don Power