The course introduces students to the SAS software through practical examples. In addition, selected procedures from SAS/STAT software are taught so that students can apply multivariate statistics in demography. SAS/STAT procedures covered: BOXPLOT, ANOVA, FACTOR, STDIZE, CLUSTER, DISTANCE, TREE, VARCLUS.
Last update: Rychtaříková Jitka, prof. RNDr., CSc. (29.06.2021)
Obligatory literature:
Base SAS 9.4 Statistical Procedures.
SAS/STAT 14.2 User's Guide.
Hendl, J. (2004): Přehled statistických metod zpracování dat. Praha: Portál.
Last update: Rychtaříková Jitka, prof. RNDr., CSc. (29.06.2021)
Examination: written. Preconditions: the final written test (program preparation) and active participation in lessons.
Last update: Rychtaříková Jitka, prof. RNDr., CSc. (29.06.2021)
1. The BOXPLOT Procedure (SAS/STAT). A box-and-whisker plot, also called a box plot, displays the mean, quartiles, and minimum and maximum observations for a group. The length of the box represents the interquartile range (the distance between the 25th and 75th percentiles), the dot in the box interior represents the mean, the horizontal line in the box interior represents the median, and the vertical lines issuing from the box (the whiskers) extend to the minimum and maximum values of the analysis variable. BOXSTYLE=SKELETAL: the whiskers are drawn from the edges of the box to the extreme values of the group. BOXSTYLE=SCHEMATIC: a whisker is drawn from the upper edge of the box to the largest observed value within the upper fence, and from the lower edge of the box to the smallest observed value within the lower fence.
2. The UNIVARIATE Procedure (Base SAS Statistical Procedures). Descriptive (summary) statistics based on moments (mean, variance, standard deviation, coefficient of variation, skewness, kurtosis), quantiles, mode(s), extreme values, frequencies. Confidence intervals for the mean, standard deviation, and variance. FREQ and WEIGHT statements. Histograms (HISTOGRAM) and their options (parametric distributions, nonparametric kernel density estimation, graphic options). Placement of a box or a table of summary statistics in the graph (INSET). Quantile-quantile plots (Q-Q plots) and probability-probability plots (P-P plots). Grouping data or creating comparative plots with the CLASS statement. Rounding values of a variable (ROUND). Goodness-of-fit tests for a variety of distributions, including the normal.
3. The FREQ Procedure (Base SAS Statistical Procedures). Creating one-way and n-way frequency and contingency (crosstabulation) tables. Goodness-of-fit tests for equal proportions or specified null proportions, and confidence limits. Testing for association in a crosstabulation table. TABLES (specify the type of a table, 2x2 Tables – Odds ratio and Relative Risks). TEST (Chi-Square Test), Pearson Correlation Coefficient, Spearman Rank Correlation Coefficient. WEIGHT statement.
4. The CORR Procedure (Base SAS Statistical Procedures). Pearson product-moment correlation (a parametric measure of association between two variables that captures the strength and direction of a linear relationship), Spearman rank-order correlation (a nonparametric measure of association based on ranks), Kendall's tau-b coefficient (a measure of association based on the number of concordances and discordances in paired observations). Pearson, Spearman, and Kendall partial correlation (PARTIAL statement; a partial correlation measures the strength of a relationship between two variables while controlling for the effect of other variables). FREQ and WEIGHT statements are available.
5. ODS Graphics (Base SAS). The ODS Graphics procedures, sometimes called Statistical Graphics procedures, produce plots for exploratory data analysis. The SGPLOT procedure creates line plots, scatter plots, histograms, area plots, etc. Different types of scatter plots, also in panels and with different layouts. The SGSCATTER procedure creates a paneled graph of scatter plots for multiple combinations of variables. The SGPIE procedure produces pie charts and donut charts. The SGPANEL procedure creates a panel of graph cells for the values of one or more classification variables.
6. Standardization Procedures: Standard, Stdize. · The STANDARD Procedure (Base SAS). The procedure standardizes variables in a SAS data set to a given mean and standard deviation, and it creates a new SAS data set containing the standardized values. · The STDIZE Procedure (SAS/STAT). The STDIZE procedure standardizes one or more numeric variables in a SAS data set by subtracting a location measure and dividing by a scale measure. A variety of location and scale measures are provided.
7. The ANOVA Procedure (SAS/STAT). The analysis of variance (ANOVA) for balanced data. The goal is to test for differences among the means of the levels and to quantify these differences. The classification variable is specified in the CLASS statement. Response variables must be numeric. Tukey’s multiple comparison tests for each level of the main effects can be produced. Procedure GLM handles unbalanced data.
8. MACRO Facility. Macro variables, macro functions, macro statements, macro programs. Defining and invoking a macro variable. Macros with parameters: positional and keyword macro arguments. Defining and invoking a macro program. Including external macros; autocall macro libraries.
9. The FACTOR Procedure (SAS/STAT) performs a variety of common factor and component analyses and rotations. The purpose of common factor analysis is to explain the correlations or covariances among a set of variables in terms of a limited number of unobservable, latent variables. Topics: factor extraction (including principal component analysis), factor rotation, factor loadings, and factor scores. FREQ and WEIGHT statements.
10. The DISTANCE Procedure (SAS/STAT) computes various measures of distance, dissimilarity, or similarity between the observations (rows) of an input data set, which can contain numeric or character variables, or both, depending on which proximity measure is used. Various nonparametric and parametric methods can be used for standardizing variables.
11. The CLUSTER Procedure (SAS/STAT). The purpose of cluster analysis is to place objects into groups or clusters suggested by the data, not defined a priori, such that objects in a given cluster tend to be similar to each other in some sense, and objects in different clusters tend to be dissimilar. The CLUSTER procedure performs hierarchical clustering of observations. The data can be coordinates or distances. Scaling or transforming variables. Computing Euclidean distances, or using the DISTANCE procedure to supply a distance matrix. Eleven clustering methods are available. Creating an output data set (OUTTREE) from which the TREE procedure draws a tree diagram.
12. The TREE Procedure (SAS/STAT). The TREE procedure reads a data set created by the CLUSTER or VARCLUS procedure and produces a tree diagram (dendrogram or phenogram). Horizontal or vertical tree diagram. ID statement (identifies objects).
13. The VARCLUS Procedure (SAS/STAT) divides a set of numeric variables into disjoint or hierarchical clusters. Associated with each cluster is a linear combination of the variables in the cluster. This linear combination can be either the first principal component (the default) or the centroid component. The first principal component is a weighted average of the variables that explains as much variance as possible. Centroid components are unweighted averages of either the standardized variables (the default) or the raw variables. A dendrogram of variable clusters is displayed.
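Illustration for topic 1 (BOXPLOT). A minimal Python sketch, not SAS code, of the quantities a box plot displays and of the difference between skeletal and schematic whiskers. It assumes the usual fence definition (1.5 times the interquartile range beyond the box edges); the quartile convention (`method="inclusive"`), the function name `box_summary`, and the data are assumptions made for illustration, and SAS may compute percentiles differently.

```python
from statistics import mean, median, quantiles


def box_summary(values):
    """Return the quantities a box-and-whisker plot displays for one group."""
    # Assumption: "inclusive" quartiles; SAS may use another percentile definition.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # length of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "mean": mean(values),
        "median": median(values),
        "q1": q1,
        "q3": q3,
        # Skeletal style: whiskers reach the extreme values of the group.
        "skeletal_whiskers": (min(values), max(values)),
        # Schematic style: whiskers stop at the last observation inside the fences.
        "schematic_whiskers": (min(inside), max(inside)),
        "outside_fences": [v for v in values if v < lower_fence or v > upper_fence],
    }


ages = [1, 2, 3, 4, 5, 6, 7, 8, 30]    # made-up data with one extreme value
summary = box_summary(ages)
print(summary)
```

With these data the skeletal whiskers reach the extreme value 30, while the schematic whiskers stop at 8 and 30 is reported as lying outside the fences.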
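Illustration for topic 3 (FREQ). A minimal Python sketch, not SAS code, of three statistics commonly read from a 2x2 table: the odds ratio, the relative risk, and the Pearson chi-square statistic. The function name `two_by_two`, the cell layout, and the counts are assumptions made for illustration.

```python
def two_by_two(a, b, c, d):
    """Statistics for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are groups, column 1 is "event", column 2 is "no event".
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))

    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed[i][j] - expected) ** 2 / expected
    return odds_ratio, relative_risk, chi_square


# Made-up counts: 20 of 100 events in group 1, 10 of 100 in group 2.
odds_ratio, relative_risk, chi_square = two_by_two(20, 80, 10, 90)
print(odds_ratio, relative_risk, round(chi_square, 4))
```

The chi-square statistic has one degree of freedom for a 2x2 table; the p-value is omitted because it would need a chi-square distribution function that the Python standard library does not provide.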
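Illustration for topic 4 (CORR). A minimal Python sketch, not SAS code, of the three association measures: Pearson on the raw values, Spearman as Pearson applied to average ranks, and Kendall's tau-b from concordant and discordant pairs with a correction for ties. All function names and the data are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant, and tied pairs."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
            tied_x += dx == 0
            tied_y += dy == 0
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / sqrt((pairs - tied_x) * (pairs - tied_y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotone but not linear in x (made-up data)
r, rho, tau = pearson(x, y), spearman(x, y), kendall_tau_b(x, y)
print(round(r, 4), rho, tau)
```

Because y increases monotonically but not linearly with x, the two rank-based measures equal 1 while the Pearson coefficient stays below 1.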
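Illustration for topic 6 (STANDARD, STDIZE). A minimal Python sketch, not SAS code, of the two ideas described there: subtracting a location measure and dividing by a scale measure, and rescaling a variable to a chosen mean and standard deviation. The helper names `standardize` and `rescale`, the target values 50 and 10, and the data are assumptions made for illustration.

```python
from statistics import mean, stdev


def standardize(values, location, scale):
    """Subtract a location measure and divide by a scale measure."""
    return [(v - location) / scale for v in values]


def rescale(values, new_mean, new_std):
    """Standardize to mean 0 and standard deviation 1, then move to the target."""
    z = standardize(values, mean(values), stdev(values))
    return [new_mean + new_std * s for s in z]


values = [1, 2, 3, 4, 5]   # made-up data

# Location = mean, scale = sample standard deviation (z-scores).
z_scores = standardize(values, mean(values), stdev(values))

# Location = minimum, scale = range: results fall between 0 and 1.
range_scaled = standardize(values, min(values), max(values) - min(values))

# Rescaling to a given mean and standard deviation.
t_scores = rescale(values, new_mean=50, new_std=10)

print(z_scores)
print(range_scaled)
print(t_scores)
```

Swapping in another location and scale pair, for example the median and the interquartile range, only changes the two arguments passed to `standardize`.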
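Illustration for topic 7 (ANOVA). A minimal Python sketch, not SAS code, of the F statistic in a one-way analysis of variance with balanced data: the variation between level means is compared with the variation within levels. The function name `one_way_anova` and the data are assumptions made for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Variation of the level means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own level mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Made-up balanced data: three levels with three observations each.
levels = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic, df_between, df_within = one_way_anova(levels)
print(f_statistic, df_between, df_within)
```

A large F value indicates that the level means differ by more than the within-level variation would suggest; which levels differ is the question that multiple comparison tests address.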
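Illustration for topics 10 and 11 (DISTANCE, CLUSTER). A minimal Python sketch, not SAS code, of hierarchical clustering on coordinate data: Euclidean distances between observations, followed by repeated merging of the two closest clusters. Single linkage is used here only because it is the simplest of the available methods to write down; the function name `single_linkage`, the merge-history format, and the points are assumptions made for illustration.

```python
from math import dist   # Euclidean distance, Python 3.8+


def single_linkage(points):
    """Agglomerative clustering; returns the merge history.

    Each history entry is (members_a, members_b, distance), where the
    members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history


# Made-up coordinates: two tight pairs and one isolated observation.
points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (10, 0)]
history = single_linkage(points)
for members_a, members_b, distance in history:
    print(members_a, members_b, round(distance, 3))
```

The merge history carries the information a tree diagram displays: which clusters join, and at what distance.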
Last update: Rychtaříková Jitka, prof. RNDr., CSc. (04.07.2021)