The bnet action in the Bayesian Net Classifier (bayesianNetClassifier) action set uses Bayesian network models to classify a target variable. It supports several network structures and variable selection methods for building the predictive model.
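For orientation, here is the standard way a Bayesian network classifier scores an observation. The notation is generic and not taken from the SAS documentation. The network factorizes the joint distribution of the target $Y$ and the inputs $X_1$ through $X_p$ over each node's parents $\mathrm{pa}(\cdot)$. The predicted level is the one with the largest posterior, and that posterior is proportional to the joint probability:

$$
P(Y, X_1, \dots, X_p) = P\bigl(Y \mid \mathrm{pa}(Y)\bigr) \prod_{i=1}^{p} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr),
\qquad
\hat{y} = \arg\max_{y} P(Y = y, X_1 = x_1, \dots, X_p = x_p)
$$

In a naive Bayes structure (NAIVE), $Y$ has no parents and $\mathrm{pa}(X_i) = \{Y\}$ for every input. Structures such as TAN add links among the inputs on top of that.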
| Parameter | Description |
|---|---|
| alpha | Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model. |
| attributes | Changes the attributes of variables used in the action. |
| bestModel | When set to True, selects the best model based on validation data or cross-validation. |
| code | Specifies the settings for generating SAS DATA step scoring code. |
| codeGroup | Specifies a group for the generated code. |
| display | Specifies a list of results tables to be displayed. |
| freq | Specifies the frequency variable for the analysis. |
| id | Specifies variables to be copied to the output table. |
| indepTest | Specifies the method for independence tests (e.g., CHISQUARE, GSQUARE, MI). |
| inNetwork | Specifies the input table that defines links to be included or excluded from the network structure. |
| inputs | Specifies the input variables for the analysis. |
| maxParents | Specifies the maximum number of parents for each node in the network. |
| miAlpha | Specifies the significance level for independence tests that use mutual information. |
| missingInt | Specifies how to handle missing values for interval variables (IGNORE or IMPUTE). |
| missingNom | Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL). |
| nominals | Specifies the nominal variables to be used in the analysis. |
| numBin | Specifies the number of bins to use for interval variables. |
| outNetwork | Specifies the output table for the network structure and probability distributions. |
| output | Specifies the output table to store predicted values. |
| outputTables | Lists the names of results tables to save as CAS tables. |
| parenting | Specifies the structure learning method (BESTONE or BESTSET). |
| partByFrac | Partitions the input data by specifying fractions for training, testing, and validation. |
| partByVar | Partitions the input data based on the values of a specified variable. |
| preScreening | Specifies the initial screening method for input variables (ONE or ZERO). |
| printtarget | When set to True, generates names for the predicted target and probability variables. |
| resident | Specifies whether the model should be kept in memory. |
| saveState | Specifies the table in which to save the model state for future scoring. |
| structures | Specifies the network structure types to be learned (e.g., NAIVE, TAN, PC). |
| table | Specifies the input data table for the analysis. |
| target | Specifies the target variable for classification. |
| varSelect | Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE). |
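The alpha, indepTest, and miAlpha parameters refer to classical tests of independence between two categorical variables. The Python sketch below is a rough illustration, not SAS's internal code. It computes the Pearson chi-square statistic, the likelihood-ratio G-square statistic, and the empirical mutual information for Outlook versus Play, using the golf data from the examples that follow. It then compares the chi-square p-value with a significance level. The helper names `chi2_sf` and `independence_stats` are hypothetical. The chi-square tail probability uses a closed-form recurrence that is valid only for integer degrees of freedom.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square(df) variable, integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exp), then applies
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def independence_stats(pairs):
    """Chi-square, G-square, mutual information (nats), and df for two categorical variables."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    chi2 = g2 = 0.0
    for a in row:
        for b in col:
            observed = joint[(a, b)]
            expected = row[a] * col[b] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells contribute 0 to G-square
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(row) - 1) * (len(col) - 1)
    mi = g2 / (2 * n)  # with natural logs, G-square = 2 * N * MI
    return chi2, g2, mi, df


# Outlook and Play columns of the 14-row golf table, in row order
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

alpha = 0.05
chi2, g2, mi, df = independence_stats(list(zip(outlook, play)))
p_chi2 = chi2_sf(chi2, df)
p_g2 = chi2_sf(g2, df)
print(f"chi2={chi2:.3f} (p={p_chi2:.3f}), G2={g2:.3f} (p={p_g2:.3f}), MI={mi:.4f}")
print("dependent at alpha" if p_chi2 < alpha else "no evidence of dependence at alpha")
```

Because G-square equals 2N times the mutual information, a mutual-information test referred to a chi-square distribution behaves like the G-square test. The sketch does not show how bnet itself converts miAlpha into a threshold.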
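This example creates a small table named golf in the CASUSER caslib. Each of its 14 rows records weather conditions (Outlook, Temperature, Humidity, Windy) and whether golf was played (Play). The table is used to train the Bayesian network classifiers in the examples that follow.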

    /* Assumes an active CAS session and an assigned CASUSER libref,   */
    /* for example from:  cas;  caslib _all_ assign;                   */
    DATA casuser.golf;
       INFILE DATALINES delimiter=',';
       /* List input gives character variables a default length of 8, */
       /* so longer values would be truncated ("Overcast" fits exactly) */
       INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
       DATALINES;
    Sunny,Hot,High,False,No
    Sunny,Hot,High,True,No
    Overcast,Hot,High,False,Yes
    Rainy,Mild,High,False,Yes
    Rainy,Cool,Normal,False,Yes
    Rainy,Cool,Normal,True,No
    Overcast,Cool,Normal,True,Yes
    Sunny,Mild,High,False,No
    Sunny,Cool,Normal,False,Yes
    Rainy,Mild,Normal,False,Yes
    Sunny,Mild,Normal,True,Yes
    Overcast,Mild,High,True,Yes
    Overcast,Hot,Normal,False,Yes
    Rainy,Mild,High,True,No
    ;
    RUN;

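This example trains a naive Bayes classifier (structures={'NAIVE'}) on the golf table to predict Play from the other four variables. It writes the scored rows, with Play copied over, to golf_scored_simple and saves the model state to bnet_model_simple for later scoring.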

    PROC CAS;
       /* Train a naive Bayes network on CASUSER.GOLF and score the same rows */
       bayesianNetClassifier.bnet /
          TABLE={name='golf', caslib='casuser'},
          target='Play',
          inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
          structures={'NAIVE'},
          OUTPUT={casout={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
          saveState={name='bnet_model_simple', replace=true};
    RUN;

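To see what the NAIVE structure computes, the Python sketch below estimates the prior P(Play) and each P(input | Play) from the 14 golf rows by plain relative frequencies. It then scores a hypothetical new day (Sunny, Cool, High, windy). This is an illustration of textbook naive Bayes under the assumption of no smoothing. SAS may estimate probabilities differently, so the posteriors in golf_scored_simple need not match these numbers. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Same 14 rows as the golf table: (Outlook, Temperature, Humidity, Windy, Play)
GOLF = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def train_naive_bayes(rows):
    """Count class frequencies and, per class, the level frequencies of each input."""
    class_counts = Counter(r[-1] for r in rows)
    level_counts = defaultdict(Counter)  # (input index, class) -> Counter of levels
    for r in rows:
        for i, level in enumerate(r[:-1]):
            level_counts[(i, r[-1])][level] += 1
    return class_counts, level_counts


def predict_proba(x, class_counts, level_counts):
    """Posterior P(Play | x) from unsmoothed relative frequencies."""
    n = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / n  # prior P(y)
        for i, level in enumerate(x):
            score *= level_counts[(i, y)][level] / n_y  # P(x_i | y)
        scores[y] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("every class has zero probability; smoothing would be needed")
    return {y: s / total for y, s in scores.items()}


class_counts, level_counts = train_naive_bayes(GOLF)
probs = predict_proba(("Sunny", "Cool", "High", "True"), class_counts, level_counts)
prediction = max(probs, key=probs.get)
print(prediction, round(probs[prediction], 3))
```

Without smoothing, a level never seen with a class forces that class's probability to zero. This is why practical implementations often add a small pseudo-count.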
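This example demonstrates a more advanced use case. It randomly assigns about 30% of the rows to a validation partition and the remaining rows (about 70%) to training, with seed=1234 seeding the random assignment. It then trains a tree-augmented naive Bayes (TAN) network. It declares all five variables as nominal, treats missing nominal values as a separate level (missingNom='LEVEL'), limits each node to at most two parents, and saves the learned network structure to bnet_network_detailed. With only 14 rows, the validation partition holds just a handful of observations, so the example illustrates syntax rather than a reliable validation estimate.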

    PROC CAS;
       /* Train a TAN network on a random training/validation split */
       bayesianNetClassifier.bnet /
          TABLE={name='golf', caslib='casuser'},
          target='Play',
          inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
          nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
          /* 30% of rows go to validation; the remaining rows are used for training */
          partByFrac={validate=0.3, seed=1234},
          structures={'TAN'},
          maxParents=2,
          missingNom='LEVEL',   /* treat missing nominal values as their own level */
          OUTPUT={casout={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
          outNetwork={name='bnet_network_detailed', replace=true},
          saveState={name='bnet_model_detailed', replace=true};
    RUN;

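A TAN structure like the one requested above is commonly learned in three steps. First, weight every pair of inputs by their conditional mutual information given the target. Second, keep a maximum-weight spanning tree over the inputs. Third, add the target as a parent of every input. The Python sketch below follows that textbook recipe on all 14 golf rows. It is an illustration, not SAS's implementation: bnet may rely on its own independence tests, alpha settings, and the training partition from partByFrac, so its learned links can differ. The function names are hypothetical, and Outlook is arbitrarily chosen as the root of the tree.

```python
import math
from collections import Counter
from itertools import combinations

# Same 14 rows as the golf table: (Outlook, Temperature, Humidity, Windy, Play)
GOLF = [
    ("Sunny", "Hot", "High", "False", "No"), ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"), ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"), ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"), ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"), ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"), ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"), ("Rainy", "Mild", "High", "True", "No"),
]
INPUTS = ["Outlook", "Temperature", "Humidity", "Windy"]


def cond_mutual_info(rows, i, j):
    """Estimate I(X_i; X_j | Play) in nats from relative frequencies."""
    n = len(rows)
    n_ijy = Counter((r[i], r[j], r[-1]) for r in rows)
    n_iy = Counter((r[i], r[-1]) for r in rows)
    n_jy = Counter((r[j], r[-1]) for r in rows)
    n_y = Counter(r[-1] for r in rows)
    return sum(
        c / n * math.log(c * n_y[y] / (n_iy[(a, y)] * n_jy[(b, y)]))
        for (a, b, y), c in n_ijy.items()
    )


def tan_parents(rows, n_inputs, root=0):
    """Return {input: parent input} for a maximum-weight spanning tree over the inputs.

    In a TAN network every input also has the target as a parent, so each
    non-root input ends up with two parents and the root input with one.
    """
    weight = {(i, j): cond_mutual_info(rows, i, j) for i, j in combinations(range(n_inputs), 2)}
    in_tree, parent = {root}, {}
    while len(in_tree) < n_inputs:
        # Prim's algorithm: add the heaviest edge leaving the current tree,
        # directed away from the root.
        i, j = max(
            ((a, b) for a in sorted(in_tree) for b in range(n_inputs) if b not in in_tree),
            key=lambda e: weight[(min(e), max(e))],
        )
        parent[j] = i
        in_tree.add(j)
    return parent


parents = tan_parents(GOLF, len(INPUTS))
print(f"{INPUTS[0]} <- Play")
for child, par in sorted(parents.items()):
    print(f"{INPUTS[child]} <- {INPUTS[par]}, Play")
```

In this construction no input has more than two parents, which matches the maxParents=2 setting in the example.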