Merced College; Don Power

 

STATISTICS - CH 3, LECTURE

 

3.1  Measures of Location:  Mean

 

Sample mean:  x-bar = (x1 + x2 + ... + xn) / n, or in summation notation:  x-bar = Σx / n

Population mean:  corresponding formula contains Greek μ and N.

 

 

How good is mean as a measure of the "center" of the data?

            See list, pg 49

            1.  Can be calculated for any numerical data

            2.  Unique

            3.  Can be used for further stats (e.g. combining data sets)

            4.  If every value is replaced by the mean, the mean stays the same

            5.  Takes into account every item in the data

6.  Relatively "reliable" -- for multiple samples from same population, mean varies less than median or mode

Ex 3.3:  The point:  a careless error in recording one value can affect the mean significantly
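A small sketch of the point above, using hypothetical data (twenty scores, one mis-recorded as 710 instead of 71). The function names and the scores are illustrative assumptions, not taken from the text. The trimmed mean here drops the top and bottom 5% of the sorted data.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def trimmed_mean(values, fraction=0.05):
    """Mean after dropping the lowest and highest `fraction` of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)      # number of items cut from each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Hypothetical data: the last score was meant to be 71 but was recorded as 710.
scores = [62, 64, 65, 66, 67, 68, 68, 69, 70, 70,
          71, 71, 72, 73, 74, 75, 76, 77, 78, 710]

plain = mean(scores)              # pulled far upward by the single bad value
trimmed = trimmed_mean(scores)    # drops 62 and 710, then averages the other 18

print(f"mean = {plain:.2f}, 5% trimmed mean = {trimmed:.2f}")
```

With n = 20, trimming 5% removes exactly one item from each end, so the bad value no longer affects the result.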

 

"Trimmed mean" omits top and bottom 5% of data to eliminate "outliers"

 

 

3.2  Measures of Location:  Weighted Mean

 

Formula:  weighted mean = Σwx / Σw   (w is the weight attached to each value x)

            Shopping example (5 at $4, 7 at $3, 8 at $2, 2 at $1)

            Apply to grade point avg (4A, 6B, 3C, 1D, 2F)
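The two examples above can be worked as a short sketch. The helper name weighted_mean is illustrative, and the grade point scale (A = 4, B = 3, C = 2, D = 1, F = 0) is an assumption, since the notes do not state it.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Shopping example: the price is the value, the quantity bought is the weight.
prices = [4, 3, 2, 1]
quantities = [5, 7, 8, 2]
avg_price = weighted_mean(prices, quantities)      # 59 / 22

# Grade point average: assumed scale A=4, B=3, C=2, D=1, F=0.
grade_points = [4, 3, 2, 1, 0]
grade_counts = [4, 6, 3, 1, 2]
gpa = weighted_mean(grade_points, grade_counts)    # 41 / 16

print(f"average price per item = ${avg_price:.2f}")
print(f"grade point average    = {gpa:.2f}")
```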

 

Grand Mean:  Find the overall mean of data sets having known means when x-bars and n's are known

            Formula is essentially the same as weighted mean (but notation is different)
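A minimal sketch of the grand mean, with each set's size n playing the role of the weight. The three section means and sizes are hypothetical numbers chosen only for illustration.

```python
def grand_mean(means, sizes):
    """Overall mean of several data sets, given each set's mean and size."""
    # n * x-bar recovers the sum of each data set, so this is total sum / total count.
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


# Hypothetical: three class sections with known means and enrollments.
section_means = [70, 80, 90]
section_sizes = [10, 20, 30]

overall = grand_mean(section_means, section_sizes)       # 5000 / 60
unweighted = sum(section_means) / len(section_means)     # ignores the sizes

print(f"grand mean = {overall:.2f}, simple average of the means = {unweighted:.2f}")
```

The two results differ because the largest section also has the highest mean.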

 

Ex from Lesson 3.8:  (page 81)  App:  Mean of a frequency distribution

            Weight is the frequency

            Value is the class mark   (Assumption:  every data item is located at the class mark)

 

Table technique for recording x, f, x*f and sums of f and x*f

 

3.3  Measures of Location:  Median and Quartiles (etc)

 

Formula for median position:  (n+1)/2 th item

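Even vs odd n:  for odd n the position is a whole number (the median is a data item); for even n it falls halfway between two items (the median is their average)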

Note:  the median position is not the same as the median item

 

You must arrange the data before looking for the median.

 

Ex from a stem and leaf plot (Use X2.13 answer, pg 508, n=30)

 

Quartile defs (not universal)

            Q1:  median of all values to left of median position

            Q2:  median

            Q3:  median of all values to right of median position

 

Ex:  from stem and leaf plot

            Data:   22 23

                        35 37 38 38

                        41 42

                        50 51 51 51 53

            n = 13

            Median position is 7; median value is 41 (there are 6 values above 41 and 6 below 41)

            For data below (or above) the median, n = 6

                        Median position is 3.5

                        Q1 is between the 35 and 37:  Q1 = 36

                        Q3 is between the first and second 51s:  Q3 = 51

            Note that there are 3 values below Q1, 3 more below median, 3 more below Q3, 3 above Q3
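The worked example above can be checked with a short sketch that follows the quartile definition used in these notes (median of the values on each side of the median position). Other textbooks and software may use different quartile rules, so the function names and approach here are only one possible reading.

```python
def median(ordered):
    """Median of already-sorted data, located at position (n + 1) / 2."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle items


def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)                       # the data must be arranged first
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                         # values left of the median position
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data = [22, 23, 35, 37, 38, 38, 41, 42, 50, 51, 51, 51, 53]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```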

 

Box and Whisker Plot

            On a number line, plot:

Max and min data points -- ends of whiskers

Q1 and Q3 -- ends of box

median (Q2) -- middle of box.

            Result:  good visual indication of where the data is bunched.

 

3.4  Measures of Location:  Mode

 

Data item that occurs most frequently

A data set may have several modes, or none at all.

Best:  arrange the data in order before examining it to find the mode(s)
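A small sketch for finding the mode(s) by counting. It assumes the convention that a data set in which every item occurs only once has no mode; the function name and the two short test lists are illustrative.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent items (empty if nothing repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                 # assumed convention: no repeats means no mode
    return sorted(x for x, c in counts.items() if c == top)


one_mode = modes([22, 23, 35, 37, 38, 38, 41, 42, 50, 51, 51, 51, 53])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
print(one_mode, two_modes, no_mode)
```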

 

3.5  Measures of Variation:  Range

 

Max - min

Does not take into account the data in the middle

 

3.6  Measures of Variation:  Standard Deviation

 

Population (σ, computed with μ and N) versus sample (s, computed with x-bar and n - 1)

Formulas -- See text

 

Variance is (std dev)²

 

Definition vs. computing formula

For comparison, find both σ and s for a data set of seven items, correct to 2 decimal places,

by the def and by the computing formula

Data:  14, 22, 37, 41, 45, 52, 71

Use table approach for items to be added

Note the rounding issue when using the def.
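The comparison can be sketched as follows for the seven data items, computing the sum of squared deviations both ways and then the population and sample standard deviations. Variable names are illustrative. The last lines show the rounding issue: rounding the mean to 2 decimal places before using the definition changes the sum slightly.

```python
import math

data = [14, 22, 37, 41, 45, 52, 71]
n = len(data)
mean = sum(data) / n

# Definition: add up the squared deviations from the mean.
sxx_def = sum((x - mean) ** 2 for x in data)

# Computing formula: sum of squares minus (square of the sum) / n.
sxx_comp = sum(x * x for x in data) - sum(data) ** 2 / n

sigma = math.sqrt(sxx_comp / n)        # population standard deviation
s = math.sqrt(sxx_comp / (n - 1))      # sample standard deviation

# Rounding issue: using a mean rounded to 2 places inflates the sum a little.
rounded_mean = round(mean, 2)
sxx_rounded = sum((x - rounded_mean) ** 2 for x in data)

print(f"sigma = {sigma:.2f}, s = {s:.2f}")
print(f"Sxx exact = {sxx_def:.4f}, Sxx with rounded mean = {sxx_rounded:.4f}")
```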

 

3.7  Applications of Standard Deviation

 

Empirical rule:  68%, 95%, 99.7% of data within 1, 2, 3 SD of the mean

            Applies to data sets with a "normal" "bell-shaped" distribution

 

Chebyshev's Theorem:  Fraction of data within k std dev of the mean ≥ 1 - 1/k²

            So, we are guaranteed at least 0%, 75%, 89% of the data within 1, 2, 3 SD of the mean

            Chebyshev gives the worst case for data sets that are "ugly" (not "normal")
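A quick sketch comparing the two rules. The Chebyshev bound is taken as 1 - 1/k², which is consistent with the 0%, 75%, 89% figures above; the function and dictionary names are illustrative.

```python
def chebyshev_bound(k):
    """Guaranteed minimum fraction of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


# Empirical rule percentages, valid only for bell-shaped ("normal") data.
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (1, 2, 3):
    print(f"k = {k}: Chebyshev at least {chebyshev_bound(k):.1%}, "
          f"empirical rule about {EMPIRICAL[k]:.1%}")
```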

 

Standard units "z" (which count the number of std dev's from the mean) (same as k in Chebyshev)

Converting from std dev and mean to std units:

            z = (x - mean) / std dev

Example:  IQ scores:  mean 100, std dev 15
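A minimal sketch of the conversion for the IQ example. The two scores, 130 and 85, are hypothetical values picked for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


IQ_MEAN, IQ_SD = 100, 15

z_high = z_score(130, IQ_MEAN, IQ_SD)   # two standard deviations above the mean
z_low = z_score(85, IQ_MEAN, IQ_SD)     # one standard deviation below the mean
print(z_high, z_low)
```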

 

Relative variation (coefficient of variation)

            V = std dev / mean

            Ex:  Heights of trees in a forest

                        A:  mean 10 ft, std dev 3 ft, V = 0.3

                        B:  mean 100 ft, std dev 4 ft, V = 0.04

                        Which forest has the more consistent (or uniform) height?
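The forest comparison as a short sketch (the function name is illustrative). The smaller V indicates less variation relative to the mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variation: standard deviation as a fraction of the mean."""
    return std_dev / mean


v_a = coefficient_of_variation(3, 10)     # forest A
v_b = coefficient_of_variation(4, 100)    # forest B

more_uniform = "A" if v_a < v_b else "B"
print(f"V_A = {v_a:.2f}, V_B = {v_b:.2f}, more uniform forest: {more_uniform}")
```

Forest B has the larger standard deviation in feet but the smaller variation relative to its mean.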

 

*3.8  Description of Grouped Data

 

We assume all data in a class is at the class mark

            So:  To calculate mean and std dev:

                        Σx gets replaced by Σxf          (order of operations: multiply each x by its f before adding)

                        and Σx² gets replaced by Σx²f
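A sketch of the grouped-data calculation, using a hypothetical frequency table (class marks 5, 15, 25, 35 with frequencies 2, 5, 8, 5). Every item in a class is treated as sitting at the class mark, and the sample standard deviation is shown.

```python
import math

# Hypothetical frequency distribution: class marks x and frequencies f.
class_marks = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

n = sum(frequencies)                                                  # Σf
sum_xf = sum(x * f for x, f in zip(class_marks, frequencies))         # Σxf
sum_x2f = sum(x * x * f for x, f in zip(class_marks, frequencies))    # Σx²f

grouped_mean = sum_xf / n
sxx = sum_x2f - sum_xf ** 2 / n        # computing formula with the grouped sums
s = math.sqrt(sxx / (n - 1))           # sample standard deviation

print(f"n = {n}, mean = {grouped_mean}, s = {s:.2f}")
```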

 

*3.9  Further Descriptions

 

"Skewness:"  If a distribution is not symmetric, it is skewed

            Positive skewness:  tail is on the positive side

            Negative skewness:  tail is on the negative side

 

            Concept [refer to the diagram below]:

                        Mode:  highest point

                        Mean:  closest to tail
                           (in this example, on the left)

                        Median:  between mean and mode

 

 

 

 

 

 

 


Coefficient of skewness:  SK = 3(mean-median)/std dev

            Ex:  mean = 100, median = 110, std dev = 15;  SK = -30/15 = -2
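The same example as a short sketch (the function name is illustrative). A negative result means the mean lies below the median, so the tail is on the negative side.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: SK = 3 * (mean - median) / std dev."""
    return 3 * (mean - median) / std_dev


sk = skewness_coefficient(mean=100, median=110, std_dev=15)
direction = "negative" if sk < 0 else "positive" if sk > 0 else "none"
print(sk, direction)
```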

 

3.10  Technical Note on Summations

 

Meaning of Σ

 

Proof of equivalence of two formulas for Sxx:  Σ(x-μ)² = Σx² - (Σx)² / n

            Σ(x-μ)²                         Square the binomials:  (x₁-μ)² + (x₂-μ)² + ... (n terms)

            = Σ(x² - 2μx + μ²)       Separate sums (commute):  (x₁² - 2μx₁ + μ²) + (x₂² - 2μx₂ + μ²) + ... (n terms)

            = Σx² - Σ2μx + Σμ²      Remove common factors:   x₁² + x₂² + ... - 2μx₁ - 2μx₂ - ... + μ² + μ² + ... (n terms each)

            = Σx² - 2μΣx + μ²Σ1   Replace Σ1 by n because Σ1 = 1 + 1 + 1 + ... + 1  (n terms)

            = Σx² - 2μΣx + μ²n     Replace μ by Σx / n  (definition of the mean)

            = Σx² - 2(Σx)(Σx) / n + n(Σx/n)²   Simplify each fraction

            = Σx² - 2(Σx)² / n + (Σx)² / n      Combine like terms (i.e. the last two fractions)

            = Σx² - (Σx)² / n

 

 

Updated 01/27/06 by Don Power