Lecture - Ch 2 Triola
2.1 Overview
Important characteristics of data
1. Center - Where is the middle? What is "typical"?
2. Variation - How spread out?
3. Distribution - What shape? (Uniform, bell-shaped, bimodal, skewed...)
4. Outliers - Are there data items separated from the main body of the data?
5. Time - How do the above characteristics change as time passes?
2.2 Frequency Distributions
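A frequency distribution is a table that summarizes data by listing categories (classes) together with their frequencies (the number of data items in each class). Examples: Exercises 5, 6, 7, 8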
Lower class limit   Upper class limit   Frequency
90                  100                 10
80                  89                  20
70                  79                  40
60                  69                  22
0                   59                  18
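If the categories are numerical, each class has the following (values below are for the table above, listed from the lowest class, 0-59, up to the highest, 90-100):
Class limits (lower and upper)
Class boundaries (the numbers separating the classes, halfway between one class's upper limit and the next class's lower limit): -0.5, 59.5, 69.5, 79.5, 89.5, 100.5
Class midpoints ("class marks") = (lower limit + upper limit) / 2: 29.5, 64.5, 74.5, 84.5, 95
Class widths (upper boundary minus lower boundary): 60, 10, 10, 10, 11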
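A short sketch of how the boundaries, midpoints, and widths above follow from the class limits. It assumes whole-number data, so consecutive classes are separated by a gap of 1 and each boundary sits 0.5 from the nearest limit; the function names are hypothetical. Width is computed as upper boundary minus lower boundary, which also works for the last class (where there is no next lower class limit to subtract from).

```python
# Classes from the table above, lowest class first: (lower limit, upper limit)
classes = [(0, 59), (60, 69), (70, 79), (80, 89), (90, 100)]

# Assumption: whole-number data, so the gap between one class's upper limit
# and the next class's lower limit is 1; boundaries sit half a gap away.
GAP = 1

def boundaries(classes, gap=GAP):
    """Lowest lower boundary, followed by every class's upper boundary."""
    half = gap / 2
    return [classes[0][0] - half] + [upper + half for _, upper in classes]

def midpoints(classes):
    """Class midpoint = (lower limit + upper limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in classes]

def widths(classes, gap=GAP):
    """Class width = upper boundary minus lower boundary."""
    b = boundaries(classes, gap)
    return [b[i + 1] - b[i] for i in range(len(classes))]

print(boundaries(classes))  # [-0.5, 59.5, 69.5, 79.5, 89.5, 100.5]
print(midpoints(classes))   # [29.5, 64.5, 74.5, 84.5, 95.0]
print(widths(classes))      # [60.0, 10.0, 10.0, 10.0, 11.0]
```

The unequal widths (60 and 11 versus 10) are exactly what the consistency principle below flags.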
A good freq dist: Classes will be consistent, and the result will be readable
(Which of these principles are violated by the distribution above?)
consistency: same class width for every class
readability: "round numbers" for the lower (usually) or upper class limits
no ambiguity as to where a particular data item belongs
(no overlap between classes)
No “open” classes (e.g. “59 and below”)
5 - 20 classes (7 - 10 is best for most purposes); formula on pg 48
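Once the number of classes is chosen, a common rule of thumb for the class width is (maximum value - minimum value) / number of classes, rounded up. We are assuming this is the kind of formula the pg 48 reference points to; the sketch below is illustrative, the scores are made up, and the function names are hypothetical. The "+1 when the result is already whole" step is our own safeguard, not necessarily the textbook's wording.

```python
import math

def class_width(data, num_classes):
    """Rule of thumb: (max - min) / number of classes, rounded up."""
    raw = (max(data) - min(data)) / num_classes
    width = math.ceil(raw)
    # Safeguard (our assumption): if raw is already a whole number, that many
    # classes of that width starting at min(data) would stop one short of
    # max(data) for whole-number data, so widen each class by 1.
    if raw.is_integer():
        width += 1
    return width

def lower_class_limits(start, width, num_classes):
    """Each lower class limit is the previous one plus the class width."""
    return [start + i * width for i in range(num_classes)]

# Hypothetical exam scores, for illustration only
scores = [41, 52, 58, 63, 66, 70, 71, 74, 77, 79, 82, 85, 88, 93, 98]
w = class_width(scores, 7)  # (98 - 41) / 7 = 8.14..., rounded up to 9
print(w)                                      # 9
print(lower_class_limits(min(scores), w, 7))  # [41, 50, 59, 68, 77, 86, 95]
```

For readability you would usually round further, for example a width of 10 starting at 40, so the lower class limits are round numbers.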
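Relative freq dist: relative frequency = class frequency / sum of all frequencies (multiply by 100% to express it as a percent). Example: Table 2-3
Cumulative freq dist: the cumulative frequency of a class is its frequency plus the frequencies of all previous classes. Example: Table 2-4
Normal dist (bell-shaped). Example: Table 2-5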
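The relative and cumulative frequencies can be computed straight from a frequency table. This sketch reuses the grade table from 2.2, reordered so the lowest class comes first (cumulative frequencies accumulate upward from the lowest class); the variable names are our own.

```python
from itertools import accumulate

# The grade table from 2.2, lowest class first: (class, frequency)
table = [("0-59", 18), ("60-69", 22), ("70-79", 40), ("80-89", 20), ("90-100", 10)]

labels = [label for label, _ in table]
freqs = [f for _, f in table]
total = sum(freqs)  # 110

# Relative frequency, expressed as a percent of the total
rel_pct = [100 * f / total for f in freqs]

# Cumulative frequency: running total from the lowest class upward
cum = list(accumulate(freqs))  # [18, 40, 80, 100, 110]

for label, f, pct, c in zip(labels, freqs, rel_pct, cum):
    print(f"{label:>7} {f:4d} {pct:6.1f}% {c:5d}")
```

Two quick checks: the relative frequencies add up to 100% (up to rounding in the printout), and the last cumulative frequency equals the total number of data items.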
2.3 Histograms
A histogram summarizes data in a frequency distribution
Classes are numerical, and are on the horizontal scale
Frequencies are on the vertical scale
A histogram is a specific kind of bar chart:
Bars are of equal width
There are no gaps between the bars
The boundaries between the bars are located at the class boundaries
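A small sketch of the three bar rules above (equal widths, no gaps, edges at class boundaries). The frequency distribution is hypothetical, with equal-width classes of whole-number data, so each boundary sits 0.5 below a lower class limit. The text rendering is sideways (frequency runs horizontally) just to keep it printable.

```python
# Hypothetical distribution with equal-width classes of whole-number data
lower_limits = [60, 70, 80, 90, 100]
freqs = [3, 9, 14, 6, 2]
WIDTH = 10  # every class has the same width

# Each bar runs from its class's lower boundary to its upper boundary
# (boundaries sit 0.5 below each lower limit because the data are whole numbers).
bars = [(lo - 0.5, lo - 0.5 + WIDTH, f) for lo, f in zip(lower_limits, freqs)]

# Check the histogram rules: equal widths and no gaps between neighboring bars.
for (left1, right1, _), (left2, right2, _) in zip(bars, bars[1:]):
    assert right1 - left1 == right2 - left2 == WIDTH  # equal widths
    assert right1 == left2                            # shared edge, no gap

# Sideways text rendering: one row per bar, frequency shown as '#' marks
for left, right, f in bars:
    print(f"{left:5.1f} - {right:5.1f} | {'#' * f}")
```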
How can you label the horizontal axis?
The main point about histograms: we can use histograms to examine the shape of the data.
A normal distribution is bell-shaped.
Also (not in the text at this point): in a relative frequency histogram (e.g. Fig 2-4), the height of each bar gives the probability that a randomly selected data item falls within that bar's class. So in Fig 2-4, what is the probability of a pulse rate between 59.5 and 79.5?
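Because each bar's relative frequency is a probability, the probability for a range that spans several adjacent bars is the sum of their relative frequencies. Fig 2-4's pulse-rate frequencies are not reproduced in these notes, so this sketch uses the grade table from 2.2 as stand-in data; the function name is hypothetical, and it assumes the endpoints a and b are class boundaries (only bars lying entirely inside the range are counted).

```python
from fractions import Fraction

# Stand-in data: the grade table from 2.2, keyed by (lower boundary, upper boundary)
freqs = {(-0.5, 59.5): 18, (59.5, 69.5): 22, (69.5, 79.5): 40,
         (79.5, 89.5): 20, (89.5, 100.5): 10}

def prob_between(freqs, a, b):
    """Sum the relative frequencies of the bars lying entirely between a and b."""
    total = sum(freqs.values())
    inside = sum(f for (lo, hi), f in freqs.items() if a <= lo and hi <= b)
    return Fraction(inside, total)

p = prob_between(freqs, 59.5, 79.5)  # (22 + 40) / 110
print(p, round(float(p), 3))         # 31/55 0.564
```

The same reasoning, applied to the bars in Fig 2-4, answers the question above.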
2.4 Statistical Graphics
The text lists many options in addition to the histograms of Section 2.3. For each type, know what it is, how to read it, and how to construct it:
Frequency polygon and ogive (Cumulative frequency polygon)
Dotplots
Stemplots (often called stem-and-leaf plots). Be able to start with raw data, construct a stemplot, turn it into a frequency distribution, and then construct a histogram.
Bar graphs
Pareto charts: bar graphs for qualitative categories (i.e. names), with the bars arranged in descending order of frequency
Pie charts: the angle for each piece is the relative frequency times 360 degrees. (I won’t ask you to construct these by hand; it requires a protractor.)
Scatterplots (for paired data)
Time-series graphs
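Several of the constructions listed above boil down to a few lines of arithmetic. This is a minimal sketch, assuming non-negative whole-number data with the tens digit as the stem and equal-width classes of 10; the pulse rates and complaint categories are made-up data, and all function names are hypothetical. It follows the stemplot-to-frequency-distribution path described above, then computes the plotting points for an ogive and a frequency polygon, the bar order for a Pareto chart, and pie-chart angles.

```python
from collections import Counter

def stemplot(data):
    """Stem = tens digit, leaf = units digit; empty stems in between are kept."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

def print_stemplot(plot):
    for stem, leaves in plot.items():
        print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")

def stemplot_to_freq(plot):
    """Each stem row becomes a class of width 10, e.g. stem 6 -> class 60-69."""
    return [(10 * s, 10 * s + 9, len(leaves)) for s, leaves in plot.items()]

def ogive_points(dist):
    """Cumulative frequency plotted at each upper class boundary, starting at 0."""
    points, running = [(dist[0][0] - 0.5, 0)], 0
    for _, hi, f in dist:
        running += f
        points.append((hi + 0.5, running))
    return points

def polygon_points(dist):
    """Frequencies at class midpoints, extended to 0 one class width beyond each end."""
    mids = [((lo + hi) / 2, f) for lo, hi, f in dist]
    width = dist[1][0] - dist[0][0]  # assumes equal widths and at least 2 classes
    return [(mids[0][0] - width, 0)] + mids + [(mids[-1][0] + width, 0)]

def pareto_order(categories):
    """Qualitative categories with counts, in descending order of frequency."""
    return Counter(categories).most_common()

def pie_angles(counts):
    """Angle of each piece = relative frequency x 360 degrees."""
    n = sum(counts.values())
    return {k: 360 * v / n for k, v in counts.items()}

# Hypothetical pulse rates
pulses = [62, 68, 71, 74, 75, 77, 79, 80, 83, 86, 91]
plot = stemplot(pulses)
print_stemplot(plot)
dist = stemplot_to_freq(plot)  # [(60, 69, 2), (70, 79, 5), (80, 89, 3), (90, 99, 1)]
print(ogive_points(dist))
print(polygon_points(dist))

# Hypothetical complaint categories
complaints = ["late", "lost", "late", "damaged", "late", "lost"]
print(pareto_order(complaints))              # [('late', 3), ('lost', 2), ('damaged', 1)]
print(pie_angles(dict(pareto_order(complaints))))
```

From `dist`, a histogram is drawn with bars from 59.5 to 69.5, 69.5 to 79.5, and so on, with heights equal to the class frequencies.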
2.5 Bad Graphs
Two things to watch for: