bnet - WeAreCAS
bayesianNetClassifier

bnet

Description

The bnet action from the Bayesian Net Classifier action set uses Bayesian network models to classify a target variable. It allows for various network structures and variable selection methods to build a predictive model.

bayesianNetClassifier.bnet result=results status=rc / alpha={double-1, double-2, ...}, attributes={{name='variable-name', format='string', formattedLength=integer, label='string', nfd=integer, nfl=integer}, ...}, bestModel=TRUE | FALSE, code={casOut={...}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, intoCutPt=double, iProb=TRUE | FALSE, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, pCatAll=TRUE | FALSE, tabForm=TRUE | FALSE}, codeGroup='string', display={...}, freq='string', id={'variable-name-1', 'variable-name-2', ...}, indepTest='ALL' | 'CHIGSQUARE' | 'CHISQUARE' | 'GSQUARE' | 'MI', inNetwork={...}, inputs={{...}, ...}, maxParents=integer, miAlpha=double, missingInt='IGNORE' | 'IMPUTE', missingNom='IGNORE' | 'IMPUTE' | 'LEVEL', nominals={{...}, ...}, numBin=integer, outNetwork={...}, output={...}, outputTables={...}, parenting={'BESTONE', 'BESTSET'}, partByFrac={...}, partByVar={...}, preScreening={'ONE', 'ZERO'}, printtarget=TRUE | FALSE, resident=TRUE | FALSE, saveState={...}, structures={'GENERAL', 'GN', 'MB', 'NAIVE', 'PC', 'TAN'}, table={...}, target='string', varSelect={'ONE', 'THREE', 'TWO', 'ZERO'} ;
Settings
Parameter: Description
alpha: Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model.
attributes: Changes the attributes of variables used in the action.
bestModel: When set to True, selects the best model based on validation data or cross-validation.
code: Specifies the settings for generating SAS DATA step scoring code.
codeGroup: Specifies a group for the generated code.
display: Specifies a list of results tables to be displayed.
freq: Specifies the frequency variable for the analysis.
id: Specifies variables to be copied to the output table.
indepTest: Specifies the method for independence tests (e.g., CHISQUARE, GSQUARE, MI).
inNetwork: Specifies the input table that defines links to be included or excluded from the network structure.
inputs: Specifies the input variables for the analysis.
maxParents: Specifies the maximum number of parents for each node in the network.
miAlpha: Specifies the significance level for independence tests that use mutual information.
missingInt: Specifies how to handle missing values for interval variables (IGNORE or IMPUTE).
missingNom: Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL).
nominals: Specifies the nominal variables to be used in the analysis.
numBin: Specifies the number of bins to use for interval variables.
outNetwork: Specifies the output table for the network structure and probability distributions.
output: Specifies the output table to store predicted values.
outputTables: Lists the names of results tables to save as CAS tables.
parenting: Specifies the structure learning method (BESTONE or BESTSET).
partByFrac: Partitions the input data by specifying fractions for training, testing, and validation.
partByVar: Partitions the input data based on the values of a specified variable.
preScreening: Specifies the initial screening method for input variables (ONE or ZERO).
printtarget: When set to True, generates names for the predicted target and probability variables.
resident: Specifies whether the model should be kept in memory.
saveState: Specifies the table in which to save the model state for future scoring.
structures: Specifies the network structure types to be learned (e.g., NAIVE, TAN, PC).
table: Specifies the input data table for the analysis.
target: Specifies the target variable for classification.
varSelect: Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE).
Data Preparation
Data Creation

This example creates a simple dataset named 'golf' with weather conditions and a decision on whether to play golf. This dataset will be used to train a Bayesian network classifier.

DATA casuser.golf;
   INFILE DATALINES delimiter=',';
   INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
   DATALINES;
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
;
RUN;
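
Before training, it can help to confirm that the table is visible to CAS and that the target has both classes. The following is a minimal sketch, assuming an active CAS session and that the table was written to the casuser caslib as in the DATA step above; it uses the standard table.fetch and simple.freq actions.

```sas
proc cas;
   /* Show the first five rows of the golf table */
   table.fetch / table={name='golf', caslib='casuser'}, to=5;

   /* Count observations per level of the target variable */
   simple.freq / table={name='golf', caslib='casuser'}, inputs={'Play'};
run;
```

For the 14 rows listed above, the frequency table should show 9 observations with Play=Yes and 5 with Play=No.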

Examples

This example trains a simple Naive Bayes classifier on the 'golf' dataset to predict the 'Play' variable. It uses all other variables as inputs.

SAS® / CAS Code
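PROC CAS;
   /* Load the action set; this is harmless if it is already loaded */
   loadactionset 'bayesianNetClassifier';

   /* The slash separates the action name from its parameters */
   bayesianNetClassifier.bnet /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      structures={'NAIVE'},
      output={casout={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
      saveState={name='bnet_model_simple', replace=true};
RUN;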
Result:
The action trains a Naive Bayes model and creates two tables: 'golf_scored_simple' with predictions and 'bnet_model_simple' containing the model state for future scoring.
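
The saved model state can later be applied to data without retraining. The sketch below assumes that the table written by saveState is an analytic store that the aStore.score action accepts, and that the astore action set is available on the server; the output table name golf_rescored is a hypothetical name chosen for illustration. It rescores the training table only because no other data exists in this example.

```sas
proc cas;
   /* The astore action set provides the score action */
   loadactionset 'astore';

   /* Apply the saved model state to the golf table */
   astore.score /
      table={name='golf'},
      rstore={name='bnet_model_simple'},
      out={name='golf_rescored', replace=true},   /* hypothetical output name */
      copyVars={'Play'};

   /* Inspect a few scored rows */
   table.fetch / table={name='golf_rescored'}, to=5;
run;
```

In practice the table= parameter would point at new observations that have the same input columns as the training data.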

This example demonstrates a more advanced use case. It partitions the data into training (70%) and validation (30%) sets, then trains a Tree-Augmented Naive (TAN) network. It treats missing values in nominal variables as a separate level (missingNom='LEVEL') and allows at most two parents per node (maxParents=2). It also writes the learned network to a table with outNetwork.

SAS® / CAS Code
PROC CAS;
   /* Load the action set; this is harmless if it is already loaded */
   loadactionset 'bayesianNetClassifier';

   bayesianNetClassifier.bnet /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
      /* Hold out 30% for validation; the remaining 70% is used for training */
      partByFrac={valid=0.3, seed=1234},
      structures={'TAN'},
      maxParents=2,
      missingNom='LEVEL',
      output={casout={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
      outNetwork={name='bnet_network_detailed', replace=true},
      saveState={name='bnet_model_detailed', replace=true};
RUN;
Result:
This trains a TAN model using 70% of the data and validates on the remaining 30%. It creates 'golf_scored_detailed' with predictions, 'bnet_network_detailed' with the network structure, and 'bnet_model_detailed' to save the model.
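
The syntax summary lists result= and status= options on the action call. A minimal sketch of how they might be used to check whether training succeeded before looking at the results is shown below; the variable names r and rc are arbitrary, and the check relies on the standard CASL status fields severity and reason.

```sas
proc cas;
   /* Capture the results tables in r and the action status in rc */
   bayesianNetClassifier.bnet result=r status=rc /
      table={name='golf'},
      target='Play',
      inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
      structures={'NAIVE'};

   /* A severity of 0 means the action completed without warnings or errors */
   if rc.severity == 0 then print r;
   else print "bnet reported a problem: " rc.reason;
run;
```

Capturing the results in a variable also makes it possible to pick out individual results tables rather than printing everything.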

FAQ

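What is the primary purpose of the bnet action?
It uses Bayesian network models to classify a target variable.

What are the different network structures that can be learned with the bnet action?
The structures parameter accepts GENERAL, GN, MB, NAIVE, PC, and TAN.

How does the bnet action handle missing values in interval variables?
The missingInt parameter controls this; its values are IGNORE and IMPUTE.

What options are available for handling missing values in nominal variables?
The missingNom parameter accepts IGNORE, IMPUTE, and LEVEL.

What methods are available for independence tests in the bnet action?
The indepTest parameter accepts ALL, CHIGSQUARE, CHISQUARE, GSQUARE, and MI.

What is the function of the 'maxParents' parameter?
It sets the maximum number of parents for each node in the network.

How can the model be saved for future scoring?
Use the saveState parameter to name the table in which the model state is saved.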