Lecture - Ch 3 Triola

 

3.2 Measures of Center

 

Mean

Median

Mode

Midrange

Weighted Mean / Mean from frequency distribution / GPA
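
The measures of center listed above can be sketched in Python. This is only an illustration: the data values, course grades, and credit hours are made up, and the helper names `modes`, `midrange`, and `weighted_mean` are hypothetical. Returning an empty list when no value repeats follows the usual textbook convention that such data has no mode.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every value that occurs with the highest frequency.

    Returns an empty list when no value repeats (assumed convention: no mode).
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def midrange(values):
    """Halfway point between the smallest and largest value."""
    return (min(values) + max(values)) / 2

def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

data = [2, 3, 3, 5, 7, 10]           # made-up sample
print(mean(data))                    # 5
print(median(data))                  # 4.0
print(modes(data))                   # [3]
print(midrange(data))                # 6.0

# GPA as a weighted mean: grade points weighted by credit hours (made-up courses)
grade_points = [4, 3, 2]             # A, B, C
credits = [3, 4, 1]
print(weighted_mean(grade_points, credits))   # 3.25
```

The same `weighted_mean` helper gives the mean from a frequency distribution if the class midpoints are passed as the values and the frequencies as the weights.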

 

Meaning of Greek capital Sigma:  Σ

 

Pg 88:  round-off suggestion

 

3.3 Measures of Variation:  How spread out is the data?

 

Possible measures:

 

Range = Max - Min     Doesn't consider all data

 

Mean deviation from mean

            Σ (x - x-bar) / N          Always 0, since the positive and negative deviations cancel

 

Mean absolute deviation from mean

            Σ |x - x-bar| /  N           Usable, but awkward due to abs value

 

RMS (Root mean square)

            sqrt [ Σ x^2 / N ]            Used for electricity, with alternating current

 

Sqrt of mean of squares of deviations from mean [Standard Deviation]

            sqrt [ Σ (x - x-bar)^2 / N ]          Most frequently used measure

 

Variance = (Std Dev)^2
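
As a rough illustration, the candidate measures above can be computed side by side. The data set is made up (chosen so the mean is 5), and every formula divides by N exactly as written in the list above.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up data; the mean is 5
N = len(data)
x_bar = sum(data) / N

data_range = max(data) - min(data)                      # Max - Min
mean_dev = sum(x - x_bar for x in data) / N             # deviations cancel
mean_abs_dev = sum(abs(x - x_bar) for x in data) / N    # absolute deviations
rms = sqrt(sum(x**2 for x in data) / N)                 # root mean square of the raw values
std_dev = sqrt(sum((x - x_bar)**2 for x in data) / N)   # RMS of the deviations
variance = std_dev**2

print(data_range)      # 7
print(mean_dev)        # 0.0
print(mean_abs_dev)    # 1.5
print(round(rms, 3))   # 5.385
print(std_dev)         # 2.0
print(variance)        # 4.0
```

Note that the RMS of the raw values (about 5.385) is not a measure of spread about the mean; the standard deviation is the RMS of the deviations from the mean.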

 

Issues with StDev

            1.         Sample StDev vs Pop StDev

            2.         Computing Formula to avoid having to calculate mean first

 

Computing formula

 

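            Definition:  s = sqrt [ Σ (x - x-bar)^2 / (n - 1) ]

            Computing formula:  s = sqrt [ ( Σ x^2 - (Σ x)^2 / n ) / (n - 1) ], where n is the number of data values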

 

Note:  Formula in book is slightly different, and has the advantage that if the data consists of integers, both the numerator and denominator under the radical will also be integers.
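
A small Python check can make the comparison concrete. This sketch assumes the standard textbook forms: the definition s = sqrt[Σ(x - x-bar)^2 / (n - 1)], a computing formula that uses only Σx and Σx^2, and an all-integer form obtained by multiplying numerator and denominator by n (assumed here to be the book's version). The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def s_definition(xs):
    """Sample standard deviation from the definition (needs the mean first)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

def s_computing(xs):
    """Computing formula: uses only the sum of x and the sum of x^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

def s_integer_form(xs):
    """Assumed book form: numerator and denominator stay integers for integer data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

data = [1, 3, 14]              # made-up sample: mean 6, squared deviations sum to 98
print(s_definition(data))      # 7.0
print(s_computing(data))       # 7.0
print(s_integer_form(data))    # 7.0
print(stdev(data))             # 7.0 (library value, for comparison)
```

For this sample the integer form works with 294 / 6 under the radical, while the computing formula works with 98 / 2; both equal 49.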

 

Definition vs Computing formula:  Prove that they are the same

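   We need to show  Σ (x - x-bar)^2 = Σ x^2 - (Σ x)^2 / n

            Note  Σ 1 means the sum of 1 over all n data values, so Σ 1 = 1 + 1 +...+1 (n terms) = n

           

   Proof:   Σ (x - x-bar)^2 = Σ (x^2 - 2 x x-bar + x-bar^2)          Expand binomial square

                                    = Σ x^2 - Σ 2 x x-bar + Σ x-bar^2        Linearity: Σ (a + b) = Σ a + Σ b

                                    = Σ x^2 - 2 x-bar Σ x + x-bar^2 Σ 1       Factor out constants: 2 and x-bar

                                    = Σ x^2 - 2 (Σ x / n) Σ x + (Σ x / n)^2 n    Def. of x-bar is Σ x / n; Σ 1 = n

                                    = Σ x^2 - (Σ x)^2 / n   Simplify algebraically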

                                   

                                                                                   

 

Range Rule of Thumb:  s ≈ range / 4
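
A quick illustration of how rough the estimate is, using made-up data and the sample standard deviation from Python's `statistics` module:

```python
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up sample
estimate = (max(data) - min(data)) / 4   # range rule of thumb
actual = stdev(data)                     # sample standard deviation (divides by n - 1)

print(estimate)            # 1.75
print(round(actual, 2))    # 2.14
```

The rule gives the right order of magnitude, not an exact value.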

 

Empirical Rule – For Normal data sets:  What % of data is within _______ SD of mean?

 

Chebyshev’s Theorem – For general data sets:  Same issue, but data cannot be expected to be as close to the mean.
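
The two rules can be compared numerically. In this sketch the Chebyshev bound is the standard 1 - 1/k^2 (valid for k > 1), and the normal percentages are computed from the standard normal curve rather than quoted from a table; the function names are hypothetical.

```python
from statistics import NormalDist

def chebyshev_min_fraction(k):
    """At least 1 - 1/k^2 of ANY data set lies within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k**2

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()                 # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    normal_pct = 100 * normal_fraction_within(k)
    if k > 1:
        bound = f"at least {100 * chebyshev_min_fraction(k):.1f}%"
    else:
        bound = "no useful bound"
    print(f"within {k} SD: normal about {normal_pct:.1f}%, Chebyshev {bound}")
```

For k = 2 this gives roughly 95% for normal data against a guaranteed minimum of 75% for any data set, which is the sense in which general data cannot be expected to be as close to the mean.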

 

Coefficient of variation – For comparing variation in different data sets:

            CV = s / x-bar, converted to a percent.

            The data set with the highest CV is the most spread out
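
A minimal sketch of the comparison, using two made-up samples measured in different units (the heights and weights below are invented for illustration, and `coefficient_of_variation` is a hypothetical helper):

```python
from statistics import mean, stdev

def coefficient_of_variation(xs):
    """CV = s / x-bar, expressed as a percent."""
    return 100 * stdev(xs) / mean(xs)

heights_in = [62, 65, 68, 70, 75]         # made-up heights, inches
weights_lb = [120, 150, 165, 180, 235]    # made-up weights, pounds

cv_heights = coefficient_of_variation(heights_in)
cv_weights = coefficient_of_variation(weights_lb)

print(f"heights: CV = {cv_heights:.1f}%")   # about 7.3%
print(f"weights: CV = {cv_weights:.1f}%")   # about 25.0%
```

Because the CV has no units, the two samples can be compared directly; here the weights vary more relative to their mean.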

 

 

3.4 Measures of Relative Standing and Boxplots

 

z-scores

 

            Formula:  z = (x - x-bar) / s, or the equivalent for populations, z = (x - μ) / σ

           

            Round-off rule:  to 2 decimal places (like table A-2)

 

            Ex 1:  Which score is relatively more extreme?

 

            What constitutes an “unusual” value?  Interpret using the empirical rule; Chebyshev’s Thm

 

            Negative z-scores:  How do they happen?  What do they tell us?
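
The z-score ideas above can be tried on made-up numbers. In this sketch the test scores, means, and standard deviations are invented, and the cutoff of 2 for "unusual" is an assumption based on the empirical rule (about 95% of normal data lies within 2 SD of the mean).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)       # 2 decimal places, per the round-off rule

def is_unusual(z, cutoff=2):
    """Assumed convention: unusual if more than `cutoff` SD from the mean."""
    return abs(z) > cutoff

z_a = z_score(85, mean=70, sd=10)   # made-up score on test A
z_b = z_score(60, mean=75, sd=5)    # made-up score on test B

print(z_a)                 # 1.5
print(z_b)                 # -3.0  (negative: the score is below the mean)
print(is_unusual(z_a))     # False
print(is_unusual(z_b))     # True

# The relatively more extreme score is the one with the larger |z|
more_extreme = "A" if abs(z_a) > abs(z_b) else "B"
print(more_extreme)        # B
```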

 

Percentiles – See definition, pg 116:  they are locations that divide the data into 100 groups

 

            Two basic problems

 

                        Given a value (data item), find its percentile

 

                                    Formula:  percentile of x = (number of values less than x) / (total number of values), converted to a percent

 

                        Given a percentile, find the location (data value)

 

                                    Compute L = (k/100)*n, then

                                                If L is not an integer, round L up (always up), then take the Lth item

                                                If L is an integer, average the Lth item and the next item
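
Both percentile problems can be sketched in Python. The function names and the data are hypothetical; the locator L = (k/100)*n is computed with integer arithmetic so that the "is L an integer?" test is not affected by floating-point round-off.

```python
def percentile_of_value(data, x):
    """Percent of the data values that are less than x."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)

def value_at_percentile(data, k):
    """Data value at the k-th percentile (k a whole number from 1 to 99)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)
    whole, remainder = divmod(k * len(ordered), 100)   # L = whole + remainder/100
    if remainder == 0:
        # L is an integer: average the L-th item and the next item
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not an integer: round up, then take that item (index `whole`, counting from 0)
    return ordered[whole]

data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 20]   # made-up data, n = 10

print(percentile_of_value(data, 13))   # 60.0  (6 of the 10 values are below 13)
print(value_at_percentile(data, 25))   # 7     (L = 2.5, round up to the 3rd item)
print(value_at_percentile(data, 50))   # 10.0  (L = 5, average the 5th and 6th items)
print(value_at_percentile(data, 75))   # 15    (L = 7.5, round up to the 8th item)
```

The 50th percentile reproduces the median, and the 25th and 75th percentiles are the quartiles Q1 and Q3.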

 

            Relate percentiles to the median and to quartiles

 

Boxplots:

           

            Plot the 5-number summary:  Min, Q1, Median, Q3, Max

            For modified boxplot, you can show outliers as dots outside the main graph

                        Outliers, for this purpose, are items below Q1 or above Q3 by more than 1.5 * interquartile range (IQR = Q3 - Q1)
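
A minimal sketch of the numbers behind a modified boxplot, on made-up data. The quartiles here are found with the percentile rule from these notes (round the locator up, or average two items when it is a whole number); statistical software often uses other quartile conventions, so its boxplots may differ slightly. All function names are hypothetical.

```python
def percentile_value(ordered, k):
    """k-th percentile of already-sorted data, using the locator rule L = (k/100)*n."""
    whole, remainder = divmod(k * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2   # L is an integer
    return ordered[whole]                                  # L rounded up

def five_number_summary(data):
    """Return (Min, Q1, Median, Q3, Max)."""
    ordered = sorted(data)
    return (ordered[0],
            percentile_value(ordered, 25),
            percentile_value(ordered, 50),
            percentile_value(ordered, 75),
            ordered[-1])

def find_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]

data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 40]   # made-up data with one extreme value

print(five_number_summary(data))   # (3, 7, 10.0, 15, 40)
print(find_outliers(data))         # [40]  (IQR = 8, so the fences are -5 and 27)
```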