SAS Training: How to Learn SAS Programming Fast
You might have heard of the 80/20 rule, also known as the ‘Pareto’ principle or distribution. The general gist is that roughly 80% of the output comes from only 20% of the input. One example of this phenomenon is how approximately 20% of sales representatives tend to generate 80% of total sales, or how approximately 20% of patients account for 80% of healthcare spending.
As an instructor in the data science space, I have found this principle a great help in developing my courses. If I set up the first 20% of the course properly, students are 80% of the way there in terms of their understanding. This is because once you understand the general structure and overall pattern of a thing, you are already well on your way to understanding it.
So, enough talk, on to what you want to know.
To understand SAS programming, you need to know the data step and the proc step. The data step consists of a group of SAS statements beginning with the data statement and ending with the run statement. A SAS statement either requests SAS to perform an operation or gives information to the system. Typically, you want one statement per line, and every statement ends with a semicolon. For example, here is the most barebones data step:
data empincome; /* Name you chose for the output dataset */
infile '/home/ermin0/salary.txt'; /* File path of dataset*/
input Year Income; /* Names of the variables in the dataset */
run; /* Executes the data step */
These four statements (data, infile, input, and run) are all you need to create an output dataset from a simple space-delimited .txt file. A .csv file takes a little more work, but not much.
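As a rough sketch of that extra work for a .csv file, the infile statement can be given a few options. The file path below is hypothetical, and the sketch assumes the file has a header row followed by comma-separated Year and Income values.

```sas
data empincome_csv;                    /* Hypothetical output dataset name */
  infile '/home/ermin0/salary.csv'     /* Hypothetical path, assumed for illustration */
         dlm=','                       /* Values are separated by commas */
         dsd                           /* Treat consecutive commas as a missing value */
         firstobs=2;                   /* Skip the assumed header row */
  input Year Income;                   /* Same variables as in the .txt example */
run;
```

The only change from the .txt version is the set of options on the infile statement; the input statement stays the same.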
Let’s add a bit more code to our previous data step. We have created a SAS dataset but that’s all we’ve done. We have not performed any data manipulation.
data empincome;
infile '/home/ermin0/salary.txt';
input Year Income;
if Income < 26000; /* We are filtering by Income Variable */
run;
Above, we have performed a bit of manipulation: I have filtered observations by the Income variable. Only observations with an Income below $26,000 are kept in the output dataset named ‘empincome’.
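If you want to try this kind of filter without an external file, here is a minimal sketch that reads a few rows typed directly into the program with datalines. The dataset name empincome_demo and the values are made up purely for illustration.

```sas
data empincome_demo;        /* Hypothetical dataset name */
  input Year Income;        /* Read two numeric variables from the lines below */
  if Income < 26000;        /* Keep only observations with Income below 26000 */
  datalines;
2018 24500
2019 25800
2020 26400
2021 27100
;
run;

proc print data=empincome_demo;  /* Shows the two rows that pass the filter (2018 and 2019) */
run;
```

The if statement here has no then clause, so it acts as a filter: observations that fail the condition are simply not written to the output dataset.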
Another piece of advice: use several short data steps rather than one long one. This is the convention, and it also makes the code much easier to read and lets you return to a specific point in the manipulation of your dataset. So, do a group of related manipulations in one data step, then start a new data step that simply reads the dataset created by the previous one. For example:
data empincome;
infile '/home/ermin0/salary.txt';
input Year Income;
if Income < 26000; /* We are filtering by Income Variable */
run;

data empincome2; /* A new dataset built from the previous one */
set empincome; /* Set allows you to read an existing dataset. */
/* more statements */
run;
Finally, you can also use SAS functions within the data step. There are arithmetic functions (e.g. returning the square root), character functions (e.g. extracting part of a string), date and time functions (e.g. returning the current date and time), and truncation functions (e.g. returning integers).
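To make these categories concrete, here is a small sketch that uses one function from each. The dataset and variable names are made up for illustration; sqrt, substr, datetime, and int are standard SAS functions.

```sas
data func_demo;                               /* Hypothetical dataset name */
  root  = sqrt(16);                           /* Arithmetic: square root, gives 4 */
  part  = substr('SAS Programming', 1, 3);    /* Character: first 3 characters, gives 'SAS' */
  now   = datetime();                         /* Date and time: current date and time */
  whole = int(3.7);                           /* Truncation: integer part, gives 3 */
  format now datetime20.;                     /* Display now as a readable date and time */
run;

proc print data=func_demo;                    /* Show the single resulting observation */
run;
```

Because this data step reads no input, it runs once and writes a single observation containing the four computed values.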
Now that you know the structure of the data step and its primary purpose (the creation and manipulation of datasets), you just need to look up more SAS statements, and more functions, to see what each of them allows you to do.
On to the proc step. The proc step consists of a group of SAS statements that call and execute a procedure, usually with a SAS data set as input. A proc step starts with proc and ends with the run statement. The primary purpose of a proc step is to analyze data (run statistics on a dataset) and produce formatted reports.
proc corr data=empincome plots=SCATTER(NVAR=ALL);
var Gender Income; /* The Gender variable is coded 0/1 here*/
run;
Above, I am using the empincome dataset as the input to my proc step (this assumes it also contains a Gender variable, which the data steps above did not read in). The corr procedure computes correlations, Pearson by default. I am also requesting scatter plots for all of the variables. The var statement specifies which variables to use when calculating the correlation statistics; without it, the procedure would compute them for all numeric variables in the dataset.
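Here is a self-contained sketch of the same idea that you can run without any external file. The dataset name empincome_gender and all of the values are invented for illustration, and the plots option is left out to keep the example minimal.

```sas
data empincome_gender;          /* Hypothetical dataset with made-up values */
  input Year Gender Income;     /* Gender is coded 0/1 */
  datalines;
2018 0 24500
2018 1 25100
2019 0 25800
2019 1 26900
2020 0 26400
2020 1 27300
;
run;

proc corr data=empincome_gender;
  var Gender Income;            /* Only these two variables are correlated; Year is left out */
run;
```

Removing the var statement would make the procedure include Year as well, since it is also a numeric variable.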
So, that is all. Now that you understand the purpose of the data step and the proc step, and can appreciate their structure, you are on your way to understanding SAS programming.
Hope you enjoyed reading.