Data are the raw material of statistics. Descriptive statistics are used to describe data from a population or a sample. We measure characteristics of study subjects using variables. Age, gender, race, income, systolic blood pressure, serum cholesterol, and blood group are examples of variables.
To demonstrate descriptive statistics, I will use a subset of the data (n=3,000) collected in the Framingham Heart Study. The Framingham Heart Study is a longitudinal study to assess risk factors for cardiovascular disease. Details of the study, its design, data, and other information can be found at www.framingham.com/heart.
The Stata dataset is named framingham.dta. All Stata datasets have the .dta extension. Download the dataset and save it to the data folder.
The following table (codesheet) shows variable names, as they appear in the Stata dataset, along with brief descriptions and coding details for each variable.
Table 1. Codesheet
Variable Name  Description  Coding
ID  Random, unique number for each participant  1–3000
AGE  Age at exam, in years  32–70
MALE  Male sex  1=male, 0=female
TOTAL CHOL  Total cholesterol, mg/dL  113–696
SBP  Systolic blood pressure, mmHg  83.5–295
DBP  Diastolic blood pressure, mmHg  48–141
BP MEDS  Antihypertensive medications  0=no, 1=yes
BMI  Body mass index, kg/m^2  15.54–51.28
CURRENT SMOKER  Currently smoking cigarettes  0=no, 1=yes
CIGS PER DAY  Number of cigarettes smoked per day  0–70
GLUCOSE  Serum glucose, mg/dL  40–394
DIABETES  Diabetic  0=no, 1=yes
HEART RATE  Heart rate, beats/minute  45–143
DEATH  Death from any cause over 24-year follow-up  0=no, 1=yes
STROKE  Stroke over 24-year follow-up  0=no, 1=yes
CVD  Cardiovascular disease over 24-year follow-up  0=no, 1=yes
HYPERTENSION  Hypertension over 24-year follow-up  0=no, 1=yes
BP4  Blood pressure, 4 categories  0=normal, 1=prehypertension, 2=stage 1 hypertension, 3=stage 2 hypertension
Now, you are going to open the Framingham dataset. First, start a new Stata session. Next, change the working directory (to link the Stata session to a folder) and create a log file. Name the log file framingham.log. To open the Framingham dataset (framingham.dta) from the menu bar, select File > Open. Browse to the data folder, click on framingham.dta, and click Open. The dataset is now loaded in Stata. Since you have already linked the Stata session to the data folder, you can also load the dataset by typing
use framingham.dta
in the Command Window and pressing the Enter or Return key to execute (run) the command. If you had not changed the working directory, you would have to type the complete folder path in the Command Window to execute the Stata command.
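The setup steps described above can also be typed as commands. This is a minimal sketch, not the only way to do it: the folder path is a hypothetical placeholder that you must replace with the location of your own data folder, and the replace and clear options are additions that overwrite an existing log file and discard any data already in memory.

```stata
* Hypothetical path: replace with the location of your own data folder
cd "C:/stata/data"

* Create the log file as plain text; replace overwrites an existing log of the same name
log using framingham.log, text replace

* Load the dataset from the working directory; clear drops any data already in memory
use framingham.dta, clear
```

When you finish the session, typing log close closes the log file.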
Once the dataset is loaded all variables in the dataset will appear in the Variables Window.
Now, browse the complete dataset by typing browse in the Command Window. Press the Enter or Return key to execute the command.
Let’s run some descriptive statistics now. There are three Stata commands that are frequently used to describe data: summarize and tabstat to describe quantitative (continuous or discrete) variables and tabulate to describe qualitative/categorical (nominal or ordinal) data.
Look at the codesheet in Table 1 and identify the type of each variable: quantitative-continuous, quantitative-discrete, qualitative-nominal, or qualitative-ordinal. Download the answer key.
Suppose we want to know the mean age of study participants. In the Command Window type
summarize age
and press the Enter or Return key to execute the command. The results of executing the command will appear in the Results Window.
Note: All Stata commands are lowercase.
The mean age of participants is 49.9 years, with a standard deviation of 8.6 years. The minimum age is 32 years and the maximum is 70 years.
What if we want to know the median age of participants? You can get an expanded output by typing:
summarize age, detail
You can also use Stata’s graphical user interface (GUI), a.k.a. point-and-click, to execute commands. All statistics commands can be accessed from the Statistics menu at the top of the screen.
Descriptive statistics for quantitative variables can be computed by clicking Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics. Select age from the dropdown menu. Select Display additional statistics. Click OK.
The output will appear in the Results Window. The median (50%) age is 49 years. The interquartile range (IQR = Q3 - Q1) is 57 - 43 = 14 years. The variance (SD^2) is 73.9.
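Rather than reading these values off the table, you can ask Stata to compute them from the results that summarize stores after it runs. The lines below are a small sketch of that idea, using the stored results r(p25), r(p50), r(p75), r(Var), and r(sd) that summarize with the detail option leaves behind.

```stata
* Run summarize without printing the table, keeping only the stored results
quietly summarize age, detail

* Median is the 50th percentile
display "Median   = " r(p50)

* IQR is the 75th percentile minus the 25th percentile
display "IQR      = " (r(p75) - r(p25))

* The variance equals the squared standard deviation
display "Variance = " r(Var)
display "SD^2     = " (r(sd)^2)
```

The last two lines should print the same number, which is a quick check that the variance is the square of the standard deviation.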
The mean, median, and mode are called measures of central location, whereas the standard deviation and IQR are called measures of variability or measures of dispersion.
tabstat – Compact table of summary statistics
We can also use tabstat to summarize quantitative variables in a single table. This Stata command is especially useful for stratified analysis. Let’s say we want to know the mean, median, standard deviation, and IQR of the age of participants stratified by gender. Using the point-and-click menu:
Statistics > Summaries, tables, and tests > Other tables > Compact table of summary statistics
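The same stratified table can be requested from the Command Window. This is a sketch that assumes the sex variable is stored in lowercase as male, in the same way that AGE on the codesheet is typed as age; check the Variables Window for the exact name in your copy of the dataset.

```stata
* Mean, median, standard deviation, and IQR of age, one row per value of male
* (male is an assumed variable name: 0=female, 1=male per the codesheet)
tabstat age, statistics(mean median sd iqr) by(male)
```

The by() option produces one row of statistics for each gender plus a row for all participants combined.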
Qualitative variables can be described using Stata’s tabulate command. In the Framingham dataset there are eight nominal (0/1, binary) variables and one ordinal variable. To compute the frequency distribution of CURRENT SMOKER, click Statistics > Summaries, tables, and tests > Frequency tables > One-way table. Alternatively, type
tabulate current_smoker
or
tab current_smoker
in the Command Window. The output shows that 49% of participants are current smokers. To see the frequency and percentage of missing values, add the missing option (abbreviated here as miss), e.g.,
tab current_smoker, miss
Lastly, calculate the frequency and percentage of current smoking by gender: follow the steps outlined above, then click by/if/in and select the variable MALE so that the command is repeated for each gender. Sixty percent of men and 41% of women are current smokers.
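The by/if/in step corresponds to a by prefix on the command line. The sketch below again assumes the sex variable is named male in lowercase. The second command is an alternative and not part of the steps above: it puts both genders in a single two-way table with column percentages.

```stata
* Repeat the one-way table separately for females (male=0) and males (male=1);
* bysort sorts the data by male before running the command in each group
bysort male: tabulate current_smoker

* Alternative: one two-way table, with percentages computed within each column of male
tabulate current_smoker male, column
```

In both versions the percentages are computed within each gender, so they answer the question "what share of men, and of women, currently smoke?"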
Now, download and complete Mock Table 2 and Mock Table 3.
SUMMARY
 Descriptive statistics are used to describe data
 To summarize quantitative (continuous or discrete) variables, use Stata’s summarize or tabstat commands
 To describe qualitative/categorical (nominal or ordinal) variables, use Stata’s tabulate command