bnet - WeAreCAS

Description

The bnet action from the Bayesian Net Classifier action set uses Bayesian network models to classify a target variable. It allows for various network structures and variable selection methods to build a predictive model.

bayesianNetClassifier.bnet result=results status=rc /
    alpha={double-1, double-2, ...},
    attributes={{name='variable-name', format='string', formattedLength=integer, label='string', nfd=integer, nfl=integer}, ...},
    bestModel=TRUE | FALSE,
    code={casOut={...}, comment=TRUE | FALSE, fmtWdth=integer, indentSize=integer, intoCutPt=double, iProb=TRUE | FALSE, labelId=integer, lineSize=integer, noTrim=TRUE | FALSE, pCatAll=TRUE | FALSE, tabForm=TRUE | FALSE},
    codeGroup='string',
    display={...},
    freq='string',
    id={'variable-name-1', 'variable-name-2', ...},
    indepTest='ALL' | 'CHIGSQUARE' | 'CHISQUARE' | 'GSQUARE' | 'MI',
    inNetwork={...},
    inputs={{...}, ...},
    maxParents=integer,
    miAlpha=double,
    missingInt='IGNORE' | 'IMPUTE',
    missingNom='IGNORE' | 'IMPUTE' | 'LEVEL',
    nominals={{...}, ...},
    numBin=integer,
    outNetwork={...},
    output={...},
    outputTables={...},
    parenting={'BESTONE', 'BESTSET'},
    partByFrac={...},
    partByVar={...},
    preScreening={'ONE', 'ZERO'},
    printtarget=TRUE | FALSE,
    resident=TRUE | FALSE,
    saveState={...},
    structures={'GENERAL', 'GN', 'MB', 'NAIVE', 'PC', 'TAN'},
    table={...},
    target='string',
    varSelect={'ONE', 'THREE', 'TWO', 'ZERO'};

Settings

Parameter	Description
alpha	Specifies the significance level for independence tests using chi-square or G-square statistics. You can specify up to five values to find the best model.
attributes	Changes the attributes of variables used in the action.
bestModel	When set to True, selects the best model based on validation data or cross-validation.
code	Specifies the settings for generating SAS DATA step scoring code.
codeGroup	Specifies a group for the generated code.
display	Specifies a list of results tables to be displayed.
freq	Specifies the frequency variable for the analysis.
id	Specifies variables to be copied to the output table.
indepTest	Specifies the method for independence tests (e.g., CHISQUARE, GSQUARE, MI).
inNetwork	Specifies the input table that defines links to be included or excluded from the network structure.
inputs	Specifies the input variables for the analysis.
maxParents	Specifies the maximum number of parents for each node in the network.
miAlpha	Specifies the significance level for independence tests that use mutual information.
missingInt	Specifies how to handle missing values for interval variables (IGNORE or IMPUTE).
missingNom	Specifies how to handle missing values for nominal variables (IGNORE, IMPUTE, or LEVEL).
nominals	Specifies the nominal variables to be used in the analysis.
numBin	Specifies the number of bins to use for interval variables.
outNetwork	Specifies the output table for the network structure and probability distributions.
output	Specifies the output table to store predicted values.
outputTables	Lists the names of results tables to save as CAS tables.
parenting	Specifies the structure learning method (BESTONE or BESTSET).
partByFrac	Partitions the input data by specifying the fractions of rows to use for testing and validation; the remaining rows are used for training.
partByVar	Partitions the input data based on the values of a specified variable.
preScreening	Specifies the initial screening method for input variables (ONE or ZERO).
printtarget	When set to True, generates names for the predicted target and probability variables.
resident	Specifies whether the model should be kept in memory.
saveState	Specifies the table in which to save the model state for future scoring.
structures	Specifies the network structure types to be learned (e.g., NAIVE, TAN, PC).
table	Specifies the input data table for the analysis.
target	Specifies the target variable for classification.
varSelect	Specifies the variable selection method beyond prescreening (ZERO, ONE, TWO, THREE).
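
The indepTest, alpha, and miAlpha settings all refer to independence tests between pairs of variables. As a rough illustration of the statistics involved, the following Python sketch computes the textbook chi-square statistic, G-square statistic, and mutual information for a small table of counts. The function name independence_stats and the example counts are hypothetical, and this is not a description of how bnet computes or applies these tests internally.

```python
import math

def independence_stats(table):
    """Return (chi_square, g_square, mutual_information) for a 2-D table of counts.

    Assumes every row total and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # an empty cell contributes nothing to G-square
                g2 += 2.0 * observed * math.log(observed / expected)

    # Empirical mutual information (in nats) is G-square rescaled by 2n.
    mi = g2 / (2.0 * n)
    return chi2, g2, mi

# Hypothetical 2x2 table of counts: rows are levels of one variable,
# columns are levels of the other.
counts = [[6, 2], [3, 3]]
chi2, g2, mi = independence_stats(counts)
print(f"chi-square={chi2:.4f}  G-square={g2:.4f}  MI={mi:.4f}")
```

In a test of this kind, a statistic is compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, and a significance level such as alpha sets the threshold for rejecting independence.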

Data Preparation

Data Creation

This example creates a simple dataset named 'golf' with weather conditions and a decision on whether to play golf. This dataset will be used to train a Bayesian network classifier.

DATA casuser.golf;
    INFILE DATALINES delimiter=',';
    INPUT Outlook $ Temperature $ Humidity $ Windy $ Play $;
    DATALINES;
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
;
RUN;

Examples

Simple Naive Bayes Model

This example trains a simple Naive Bayes classifier on the 'golf' dataset to predict the 'Play' variable. It uses all other variables as inputs.

SAS® / CAS Code

PROC CAS;
    bayesianNetClassifier.bnet /
        table={name='golf'},
        target='Play',
        inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
        structures={'NAIVE'},
        output={casout={name='golf_scored_simple', replace=true}, copyVars={'Play'}},
        saveState={name='bnet_model_simple', replace=true};
RUN;

Result:
The action trains a Naive Bayes model and creates two tables: 'golf_scored_simple' with predictions and 'bnet_model_simple' containing the model state for future scoring.
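
To make the idea of a Naive Bayes classifier concrete, the following Python sketch applies the textbook method to the same 14 rows: it multiplies the class prior by the per-variable conditional frequencies and normalizes the result. The helper names (parse, fit, posterior) and the query row are hypothetical, and the sketch uses raw frequency counts with no smoothing, so its probabilities are not guaranteed to match what bnet writes to the scored table.

```python
from collections import Counter, defaultdict

FEATURES = ["Outlook", "Temperature", "Humidity", "Windy"]
TARGET = "Play"

ROWS = """
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
"""

def parse(text):
    """Turn the comma-separated rows into a list of dicts keyed by variable name."""
    names = FEATURES + [TARGET]
    return [dict(zip(names, line.split(","))) for line in text.strip().splitlines()]

def fit(records):
    """Count class frequencies and, per class, the frequency of each feature value."""
    class_counts = Counter(r[TARGET] for r in records)
    value_counts = defaultdict(Counter)  # (feature, class) -> Counter of values
    for r in records:
        for f in FEATURES:
            value_counts[(f, r[TARGET])][r[f]] += 1
    return class_counts, value_counts

def posterior(class_counts, value_counts, query):
    """Return P(class | query) under the naive independence assumption (no smoothing)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class prior
        for f in FEATURES:
            # Conditional frequency of this feature value within the class.
            score *= value_counts[(f, label)][query[f]] / n_label
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

class_counts, value_counts = fit(parse(ROWS))
query = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Windy": "True"}
probs = posterior(class_counts, value_counts, query)
print(max(probs, key=probs.get), probs)
```

Because no smoothing is applied, a feature value never seen with a given class drives that class's score to zero; production implementations usually guard against this.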

Detailed Tree-Augmented Naive (TAN) Model with Data Partitioning

This example demonstrates a more advanced use case. It partitions the data into training (70%) and validation (30%) sets, then trains a Tree-Augmented Naive (TAN) network. It specifies how to handle missing values and sets a maximum of 2 parents for any node.

SAS® / CAS Code

PROC CAS;
    bayesianNetClassifier.bnet /
        table={name='golf'},
        target='Play',
        inputs={'Outlook', 'Temperature', 'Humidity', 'Windy'},
        nominals={'Outlook', 'Temperature', 'Humidity', 'Windy', 'Play'},
        /* 30% of rows go to validation; the remaining 70% are used for training */
        partByFrac={validate=0.3, seed=1234},
        structures={'TAN'},
        maxParents=2,
        /* treat missing nominal values as a level of their own */
        missingNom='LEVEL',
        output={casout={name='golf_scored_detailed', replace=true}, copyVars={'Play'}},
        outNetwork={name='bnet_network_detailed', replace=true},
        saveState={name='bnet_model_detailed', replace=true};
RUN;

Result:
This trains a TAN model using 70% of the data and validates on the remaining 30%. It creates 'golf_scored_detailed' with predictions, 'bnet_network_detailed' with the network structure, and 'bnet_model_detailed' to save the model.
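
For intuition about what a seeded 70/30 split means on a table this small, here is a minimal Python sketch that shuffles row indices with a fixed seed and assigns the first 30% to validation. The function assign_roles is hypothetical, and the sampling scheme is an assumption made for illustration; the source does not say how CAS selects rows, so the actual assignment in the example above may differ.

```python
import random

def assign_roles(n_rows, validate_fraction, seed):
    """Label each row 'train' or 'validate' using a seeded shuffle of row indices."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_validate = round(n_rows * validate_fraction)
    validate_rows = set(indices[:n_validate])
    return ["validate" if i in validate_rows else "train" for i in range(n_rows)]

roles = assign_roles(14, 0.3, seed=1234)
print(roles.count("train"), roles.count("validate"))  # prints "10 4"
```

With only 14 rows, a 30% validation set holds about four observations, so any validation statistics on this dataset are very noisy and best treated as a demonstration of the syntax.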

FAQ

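What is the primary purpose of the bnet action?

It uses Bayesian network models to classify a target variable, with a choice of network structures and variable selection methods.

What are the different network structures that can be learned with the bnet action?

The structures parameter accepts GENERAL, GN, MB, NAIVE, PC, and TAN.

How does the bnet action handle missing values in interval variables?

The missingInt parameter controls this; it can be set to IGNORE or IMPUTE.

What options are available for handling missing values in nominal variables?

The missingNom parameter accepts IGNORE, IMPUTE, or LEVEL.

What methods are available for independence tests in the bnet action?

The indepTest parameter accepts ALL, CHIGSQUARE, CHISQUARE, GSQUARE, or MI.

What is the function of the 'maxParents' parameter?

It specifies the maximum number of parents for each node in the network.

How can the model be saved for future scoring?

Use the saveState parameter to name the table in which the model state is saved.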

