# AP Statistics Curriculum 2007 EDA DataTypes

Current revision as of 18:43, 28 June 2010

## General Advanced-Placement (AP) Statistics Curriculum - Types of Data

### Definitions

• Population: A population is an entire group, collection or space of objects which we want to characterize.
• Sample: A sample is a collection of observations on which we measure one or more characteristics. Frequently, we use (small) samples of (large) populations to characterize the properties and affinities within the space of objects in the population of interest. For example, to characterize the US population, we can take a sample (poll or survey); the summaries that we obtain from the sample (e.g., mean age, distribution of race, mean income, mean body weight, etc.) may then be used to study the properties of the population in general.
• Variable: A variable is a characteristic of an observation that can be assigned a number or a category. For instance, year in college is a variable measured on a student (the observational unit).
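
The relationship between a population, a sample and a variable can be sketched in a few lines of Python. This is only an illustration: the population below is simulated (uniform ages between 0 and 90 are an assumption, not real US data), and the names `population` and `sample` are hypothetical.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people (simulated, not real data).
rng = random.Random(0)
population = [rng.randint(0, 90) for _ in range(10_000)]

# A sample is a (small) collection of observations drawn from the population.
sample = rng.sample(population, 100)

# The variable measured on each observational unit (a person) is age.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean age: {population_mean:.1f}")
print(f"sample mean age:     {sample_mean:.1f}")
```

In practice the population mean is usually unknown; the sample mean is the summary we can compute and then use to study the population.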

### Types of Variables

Appropriate classification of process and variable types is important because it directly influences our decisions on how to collect, explore, analyze and interpret data and results. For example, we can carry out arithmetic (e.g., compute an average) on quantitative variables, but we need to analyze frequencies of occurrence for qualitative variables.
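
As a minimal sketch of this distinction, the snippet below averages a quantitative variable and tabulates frequencies for a qualitative one. The five observations are made-up illustrative values, and the variable names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations on five people (illustrative values only).
heights_cm = [162.5, 170.0, 158.2, 181.4, 175.9]   # quantitative variable
party = ["democrat", "green party", "democrat",
         "independent", "republican"]               # qualitative variable

# Arithmetic such as averaging is meaningful for a quantitative variable.
mean_height = mean(heights_cm)

# For a qualitative variable we count how often each category occurs.
party_counts = Counter(party)

print(f"mean height: {mean_height:.1f} cm")
print(party_counts.most_common())
```

Averaging the `party` list would raise an error, which mirrors the statistical point: the summary we choose depends on the variable type.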

There are two types of variables: categorical and quantitative. These types of variables can be split further.

• Categorical: Categorical variables are qualitative measurements of samples or populations that are classified into groups:
  • Ordinal: Ordinal categorical variables are qualitative descriptions whose values have a natural arrangement or order -- e.g., rank in college (freshman, sophomore, junior, senior), size of soda (small, medium, large), etc.
  • Nominal (not ordinal): A nominal variable is a categorical variable whose values have no naturally imposed (or meaningful) order -- e.g., gender, race, political affiliation (democrat, republican, independent, green party, other), etc.
• Quantitative: Quantitative variables are measurements that have a meaningful numerical value representation. There are two types of quantitative variables:
  • Continuous: Continuous variables are numerical observations whose possible values fill entire intervals, so any such interval contains infinitely (uncountably) many possible values -- e.g., weight, height, time, speed, etc.
  • Discrete: Discrete variables are also numerical measurements, but they are typically sparse in space, and any interval contains at most countably many possible values -- e.g., number of students in a school, a quantity restricted to the rational numbers in a given interval [a ; b], age in completed years, etc.
The classification of a quantitative variable as discrete or continuous is to some degree subjective. Several factors influence our decision to label a process as discrete or continuous: the physical, biological or psychological laws that govern the observed system; the data-measuring apparatus; and prior understanding of the relationship between changes in the variable or process and their practical effects (e.g., fetal/infant and adult ages are measured in weeks and years, respectively, even though time is generally continuous). There is a general duality between the continuous and discrete worlds, just as light can be considered either a collection of discrete photons or a continuous wave.
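
Two of these ideas can be sketched in code: the explicit order that makes a categorical variable ordinal, and the way the unit of measurement turns a continuous quantity into a discrete one. The age value and the approximation of 52 weeks per year are assumptions made for illustration.

```python
import math

# Ordinal: categories have a natural order, which we encode explicitly.
RANKS = ["freshman", "sophomore", "junior", "senior"]
rank_order = {name: position for position, name in enumerate(RANKS)}

students = ["junior", "freshman", "senior", "sophomore"]
sorted_students = sorted(students, key=rank_order.__getitem__)

# Continuous vs. discrete: the same elapsed time, recorded in two units.
age_in_years = 34.62                                   # hypothetical value
age_completed_years = math.floor(age_in_years)         # discrete, whole years
age_completed_weeks = math.floor(age_in_years * 52)    # discrete, ~52 weeks/year

print(sorted_students)
print(age_completed_years, age_completed_weeks)
```

A nominal variable such as political affiliation has no counterpart to `RANKS`: any ordering imposed on its categories would be arbitrary.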

### Example

Most breast cancer patients (>80%) are over the age of 50 at diagnosis. A researcher at a particular New York cancer center believes that his patients are even older than the norm, typically older than 65 years at diagnosis. To investigate, he reviews the ages of a random sample of 100 of his female patients diagnosed with breast cancer.
Identify the following:

• Population
• Sample
• Sample size
• Variable of interest
• Is the variable of interest quantitative or qualitative?
• Observational unit
• Other variables
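
One way to make the study design concrete is to sketch how such a sample might be drawn and summarized. Everything below is an assumption for illustration: the patient ages are simulated rather than real records, and the names `patient_ages` and `sample_ages` are hypothetical.

```python
import random

rng = random.Random(42)

# Hypothetical stand-in for the researcher's patient records: one age at
# diagnosis per patient. These values are simulated, not real data.
patient_ages = [rng.randint(40, 90) for _ in range(1_000)]

# Random sample of 100 patients, drawn without replacement.
sample_ages = rng.sample(patient_ages, 100)
sample_size = len(sample_ages)

# Summary relevant to the researcher's belief: the fraction of sampled
# patients who were older than 65 at diagnosis.
share_over_65 = sum(age > 65 for age in sample_ages) / sample_size

print(f"n = {sample_size}, share older than 65 = {share_over_65:.2f}")
```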