Lecture - Ch 2 Triola

 

2.1 Overview

 

Important characteristics of data

            1.         Center - Where is the middle?  What is "typical"?

            2.         Variation - How spread out?

            3.         Distribution - What shape? (Uniform, bell-shaped, bi-modal, skewed...)

            4.         Outliers - Are there data items separated from the main body of the data?

            5.         Time - How do the above characteristics change as time passes?

 

2.2 Frequency Distributions

 

A frequency distribution is a table that summarizes data.  It has categories (classes) and frequencies.  Examples:  Exercises 5, 6, 7, 8

 

            Class                                        Frequency
            Lower class limit    Upper class limit
            90                   100                     10
            80                   89                      20
            70                   79                      40
            60                   69                      22
            0                    59                      18

 

 

            If categories are numerical, there are

                        Class limits (lower and upper)

                        Class boundaries  -0.5, 59.5, 69.5, 79.5, 89.5, 100.5

                        Class midpoints ("class marks")  29.5, 64.5, 74.5, 84.5, 95

                        Class width  60, 10, 10, 10, 11
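
The boundaries, midpoints, and widths listed above can be checked with a short Python sketch. It assumes the data are whole numbers, so each boundary sits half a unit outside the class limits; the variable names are for illustration only.

```python
# Class limits (lower, upper) copied from the table above, lowest class first.
class_limits = [(0, 59), (60, 69), (70, 79), (80, 89), (90, 100)]

# Assumption: data are whole numbers, so adjacent classes are 1 unit apart
# and each boundary sits half a unit beyond the class limit.
gap = 1

# Boundaries: one below the first class, then one above each class.
boundaries = [class_limits[0][0] - gap / 2] + [upper + gap / 2 for _, upper in class_limits]

# Midpoint (class mark): average of the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Width: difference between consecutive class boundaries.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

print(boundaries)  # [-0.5, 59.5, 69.5, 79.5, 89.5, 100.5]
print(midpoints)   # [29.5, 64.5, 74.5, 84.5, 95.0]
print(widths)      # [60.0, 10.0, 10.0, 10.0, 11.0]
```

The unequal widths in the last line are one way to spot the consistency problem in this distribution.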

 

            A good freq dist:  Classes will be consistent, and the result will be readable

            (Which of these principles are violated by the distribution above?)

                        consistency:  same class width for every class

                        readability:  "round numbers" for lower (usually) or upper class limits

                        no ambiguity as to where a particular data item belongs

                                    (no overlap between classes)

                        No “open” classes (e.g. “59 and below”)

                        5 - 20 classes (7 - 10 best for most purposes)   Formula pg 48

                       

            Relative freq dist (relative frequency = class freq / total of all frequencies, usually written as a %).  Example, Table 2-3

            Cumulative freq dist.  Example, Table 2-4

            Normal dist (bell-shaped).  Example, Table 2-5
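
As a rough illustration, the relative and cumulative frequency distributions can be built from the grade table in 2.2 with a few lines of Python. The frequencies come from that table; the variable names and the print layout are just one possible choice.

```python
from itertools import accumulate

# Classes and frequencies from the table in 2.2, lowest class first.
classes = ["0-59", "60-69", "70-79", "80-89", "90-100"]
frequencies = [18, 22, 40, 20, 10]

total = sum(frequencies)  # total of all frequencies

# Relative frequency: class frequency divided by the total of all frequencies.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# One row per class: label, frequency, relative frequency (%), cumulative frequency.
for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:>7}  {f:>3}  {r:6.1%}  {c:>4}")
```

The relative frequencies should add up to 1 (100%), and the last cumulative frequency should equal the total.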

           

 

2.3  Histograms

 

A histogram summarizes data in a frequency distribution

            Classes are numerical, and are on the horizontal scale

            Frequencies are on the vertical scale

Histogram is a specific kind of bar chart

            Bars are of equal width

            There are no gaps between the bars

            The boundaries between the bars are located at the class boundaries

            How can you label the horizontal axis?

 

The main point about histograms:  We can use histograms to examine the shape of the data.

A normal distribution is bell-shaped.

 

Also (not in text at this point):  For a relative frequency histogram (e.g. Fig 2-4), the height of each bar gives the probability that a randomly selected data item falls within that bar's class.  So in Fig 2-4, what is the probability of a pulse rate between 59.5 and 79.5?
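
Fig 2-4 is not reproduced in these notes, so the sketch below applies the same idea to the grade table from 2.2 as stand-in data: the probability of landing in a range of classes is the sum of the relative frequencies of the bars in that range. The function name `probability_between` is made up for this example.

```python
# Stand-in data: the grade table from 2.2 (NOT the pulse rates in Fig 2-4).
# Each entry is (lower class boundary, upper class boundary, frequency).
bars = [(-0.5, 59.5, 18), (59.5, 69.5, 22), (69.5, 79.5, 40),
        (79.5, 89.5, 20), (89.5, 100.5, 10)]

total = sum(freq for _, _, freq in bars)

def probability_between(low, high):
    """Sum the relative frequencies of the bars lying entirely within [low, high]."""
    inside = [freq for lo, hi, freq in bars if lo >= low and hi <= high]
    return sum(inside) / total

p = probability_between(59.5, 79.5)
print(f"{p:.3f}")  # (22 + 40) / 110, about 0.564
```

This only works cleanly when the range starts and ends on class boundaries; a range that cuts through a bar cannot be read off the histogram exactly.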

 

2.4  Statistical Graphics

 

Lots of options are listed, in addition to the histograms in 2.3.  Know what each type is, how to read it, and how to construct it.

 

Frequency polygon and ogive (Cumulative frequency polygon)

 

Dotplots

 

Stemplots (often called stem-and-leaf plots).  Be able to start with raw data, construct a stemplot, turn it into a frequency distribution, and then construct a histogram.
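
A minimal sketch of the first two steps (raw data to stemplot, stemplot to frequency distribution) might look like this in Python. The scores are hypothetical, made up only for illustration, and the sketch assumes two-digit values so that the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict

# Hypothetical raw scores, made up for illustration only.
scores = [62, 75, 71, 88, 93, 67, 79, 74, 85, 70, 58, 91]

# Stem = tens digit, leaf = ones digit; sorting first keeps the leaves in order.
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

low_stem, high_stem = min(stems), max(stems)

# Print the stemplot, one row per stem (an empty stem would still get a row).
for stem in range(low_stem, high_stem + 1):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")

# Each stem is a class of width 10; the number of leaves is its frequency.
frequency = {f"{stem * 10}-{stem * 10 + 9}": len(stems[stem])
             for stem in range(low_stem, high_stem + 1)}
print(frequency)
```

Turned on its side, the stemplot already has the shape of the histogram, with one bar per stem.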

 

Bar graphs

 

Pareto charts:  Categories are qualitative (i.e. names), and bars are in descending order by size (frequency)

 

Pie charts (the angle for each piece is the relative frequency times 360 degrees).  I won’t ask you to construct these by hand, since it requires a protractor.
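
The angle calculation itself is easy to sketch in Python. The frequencies below are borrowed from the grade table in 2.2 purely as sample input; any set of category frequencies would work the same way.

```python
# Sample input: frequencies borrowed from the table in 2.2.
frequencies = {"0-59": 18, "60-69": 22, "70-79": 40, "80-89": 20, "90-100": 10}

total = sum(frequencies.values())

# Angle of each piece = relative frequency x 360 degrees.
angles = {label: freq / total * 360 for label, freq in frequencies.items()}

for label, angle in angles.items():
    print(f"{label:>7}: {angle:5.1f} degrees")
```

A quick check on any pie chart: the angles should add up to 360 degrees.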

 

Scatterplots (for paired data)

 

Time-series graphs

 

2.5  Bad Graphs

 

Two things to watch for:

 

  1. Starting the vertical scale at some point other than 0.  See the figures on page 71
  2. Showing deceptive area or volume pictographs:  the brain confuses the increase in height with the increase in area or volume, and they are not the same.
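
Both effects come down to simple arithmetic, sketched below in Python. The two data values and the axis starting point are hypothetical numbers chosen for illustration, and the pictograph part assumes the picture keeps its proportions when it is enlarged.

```python
# 1. Vertical scale that does not start at 0 (hypothetical values).
a, b = 50, 55          # two data values being compared
axis_start = 45        # where the vertical scale begins

true_ratio = b / a                                  # how much bigger b really is
drawn_ratio = (b - axis_start) / (a - axis_start)   # how much taller b's bar looks

print(true_ratio, drawn_ratio)  # 1.1 2.0

# 2. Pictograph scaled up in every dimension (assumes proportions are kept).
scale = 2                   # the data value doubles, so the height doubles

height_ratio = scale        # what the data says: 2 times as much
area_ratio = scale ** 2     # what a flat picture shows: 4 times the area
volume_ratio = scale ** 3   # what a 3-D picture suggests: 8 times the volume

print(height_ratio, area_ratio, volume_ratio)  # 2 4 8
```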