applyConcept - WeAreCAS

applyConcept

Description

The applyConcept action performs concept extraction using a predefined or custom concept extraction model (a LITI file). It is part of the Text Analytics Rule Score action set, which provides tools for linguistic rule scoring for categorization, concept extraction, and sentiment analysis. This action processes an input text document or a table of documents and identifies occurrences of concepts defined in the model, outputting detailed match information.

textRuleScore.applyConcept {
    casOut={...},
    docId="string",
    dropConcepts={"string-1", ...},
    factOut={...},
    language="string",
    litiChunkSize="string",
    matchType="ALL"|"BEST"|"LONGEST",
    model={...},
    parseTableIn={...},
    parseTableOut={...},
    ruleMatchOut={...},
    table={...},
    text="string"
};
Settings
| Parameter | Description |
| --- | --- |
| casOut | Specifies the output CAS table that stores the concept match results. |
| docId | Specifies the name of the variable in the input table that contains the document IDs. |
| dropConcepts | Specifies a list of concept names to exclude from the output tables. This is useful for filtering out predefined concepts without modifying the model. |
| factOut | Specifies the output CAS table that stores the fact match results. |
| language | Specifies the language of the input text. Default is 'ENGLISH'. |
| litiChunkSize | Specifies the chunk size for document processing (e.g., '32K', '1M', 'ALL'). Smaller sizes can help manage memory for large documents. Default is '32K'. |
| matchType | Specifies the matching strategy: 'ALL' for all matches, 'BEST' for the best match, or 'LONGEST' for the longest match. Default is 'ALL'. |
| model | Specifies the input CAS table that contains the user-defined LITI (Language Interpretation for Textual Information) model for concept extraction. |
| parseTableIn | Specifies a CAS table that contains pre-parsed documents from a previous run, which can improve performance, especially when using the CLAUS_n operator. |
| parseTableOut | Specifies a CAS table in which to save pre-parsed documents, which can be used as input for future runs to improve performance. |
| ruleMatchOut | Specifies the output CAS table that stores detailed rule match information, which can be used as input for the ruleGen action. |
| table | Specifies the input CAS table that contains the documents to be processed. |
| text | Specifies the name of the variable in the input table that contains the document text. |
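As a rough illustration of how several of the optional settings above combine in one call, the following sketch sets `language`, `litiChunkSize`, `matchType`, and `dropConcepts` explicitly. The table and column names match the sample data created later on this page. The concept name 'nlpDate' and the output table name 'concept_matches_filtered' are assumptions made for illustration; substitute concept names that actually exist in your model.

```sas
PROC CAS;
   /* Load the action set in case it is not already loaded in the session */
   loadactionset "textRuleScore";

   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      language='ENGLISH',          /* the documented default, stated explicitly */
      litiChunkSize='1M',          /* larger chunk than the '32K' default */
      matchType='ALL',             /* the documented default */
      dropConcepts={'nlpDate'},    /* assumed concept name to exclude */
      casOut={name='concept_matches_filtered', replace=true};
RUN;
QUIT;
```

Listing the defaults explicitly is optional; it simply makes the intended behavior visible to whoever reads the program later.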
Data Preparation
Data Creation

This example creates a sample CAS table named 'my_documents' with two columns: 'doc_id' for the document identifier and 'text' for the document content. This table will be used as input for the concept extraction.

/* Assumes an active CAS session and a CAS engine libref named mycas */
DATA mycas.my_documents;
   INFILE DATALINES delimiter='|';
   LENGTH doc_id $ 10 text $ 300;
   INPUT doc_id $ text $;
   DATALINES;
doc1|The new SAS Viya platform is a powerful analytics tool.
doc2|SAS Cloud Analytic Services (CAS) is the engine behind Viya.
doc3|You can use LITI models for concept extraction.
;
RUN;
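The DATA step above writes to a libref named `mycas`, which the page does not define. A minimal setup, run once before the DATA step, might look like the sketch below. The session name `mysess` is a hypothetical choice, and the libref is bound to whatever caslib is active in your session.

```sas
/* Start a CAS session; the name mysess is an assumption */
cas mysess;

/* Bind the libref mycas to the session's active caslib via the CAS engine */
libname mycas cas;
```

If your site already assigns CAS librefs automatically, this step may not be needed.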

Examples

This example applies the default concept extraction model to the 'my_documents' table. It identifies concepts in the 'text' column, using 'doc_id' as the document identifier. The results are stored in a CAS table named 'concept_matches'.

SAS® / CAS Code
PROC CAS;
   /* No model is specified, so the default concept model is applied */
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      casOut={name='concept_matches', replace=true};
RUN;
QUIT;
Result:
The action creates the output table 'concept_matches' in the active caslib. Each row describes one concept match found in a document (for example, a match on 'SAS' or 'platform'), together with the start and end positions of the match in the text.
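To check what the action actually produced, the output table can be inspected with standard table actions. The sketch below lists the columns of 'concept_matches' and fetches its first ten rows; the row limit of 10 is an arbitrary choice.

```sas
PROC CAS;
   /* Show the column names and types of the output table */
   table.columnInfo / table={name='concept_matches'};

   /* Print the first 10 rows of concept matches */
   table.fetch / table={name='concept_matches'}, to=10;
RUN;
QUIT;
```

Looking at the column list first is a simple way to see exactly which match attributes your release returns before building further processing on top of them.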

This example demonstrates a more advanced use case. It assumes that a custom LITI model is already available in a CAS table named 'my_liti_model' and applies this model to the 'my_documents' table. It specifies 'LONGEST' for the match type so that only the longest matching string is returned for overlapping concepts. It generates three output tables: 'concept_matches' for the main results, 'fact_matches' for extracted facts, and 'rulematch_details' for detailed rule match information that can be used for debugging or further analysis.

SAS® / CAS Code
PROC CAS;
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      model={name='my_liti_model'},      /* custom LITI model table */
      matchType='LONGEST',               /* keep only the longest overlapping match */
      casOut={name='concept_matches', replace=true},
      factOut={name='fact_matches', replace=true},
      ruleMatchOut={name='rulematch_details', replace=true};
RUN;
QUIT;
Result:
Three tables are created in the active caslib: 'concept_matches' with the longest concept matches, 'fact_matches' with any facts extracted by the LITI rules, and 'rulematch_details' with granular data about which rules were triggered for each match.
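The example above expects the model table 'my_liti_model' to be in memory before `applyConcept` runs. If the compiled model was previously saved to a caslib, one way to bring it into memory is `table.loadTable`, as sketched below. The file name 'my_liti_model.sashdat' and the caslib 'casuser' are hypothetical; use the path and caslib where your model is actually stored.

```sas
PROC CAS;
   /* Load a previously saved model table into memory.
      The file name and caslib are assumptions for illustration. */
   table.loadTable /
      path='my_liti_model.sashdat',
      caslib='casuser',
      casOut={name='my_liti_model', replace=true};
RUN;
QUIT;
```

Run this step before the `applyConcept` call so that `model={name='my_liti_model'}` resolves to an in-memory table.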

This example shows a two-step process to improve performance. First, `applyConcept` is called with the `parseTableOut` parameter to create a table of pre-parsed documents named 'parsed_docs'. In the second call, this 'parsed_docs' table is used as input via the `parseTableIn` parameter, which can speed up processing, especially with complex models or large documents.

SAS® / CAS Code
PROC CAS;
   /* Step 1: Parse documents and save the intermediate table */
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      parseTableOut={name='parsed_docs', replace=true};

   /* Step 2: Use the pre-parsed table for faster concept extraction */
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      parseTableIn={name='parsed_docs'},
      casOut={name='concept_matches_fast', replace=true};
RUN;
QUIT;
Result:
The first step creates the 'parsed_docs' table. The second step uses this intermediate table to create 'concept_matches_fast', which will contain the same concept matches as a single run but may complete more quickly.
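A quick way to confirm that both steps produced their tables is to request table metadata, including row counts, with `table.tableInfo`. This is only a sanity check under the table names used above, not part of the scoring itself.

```sas
PROC CAS;
   /* Confirm that the intermediate parse table exists and has rows */
   table.tableInfo / name='parsed_docs';

   /* Confirm that the final concept match table was created */
   table.tableInfo / name='concept_matches_fast';
RUN;
QUIT;
```

Because 'parsed_docs' is meant to be reused as input for later runs, keep it in memory (or save it) for as long as you plan to score the same documents again.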

FAQ

What is the purpose of the `applyConcept` action in SAS Viya?
What are the primary input and output parameters for the `applyConcept` action?
How can I use a custom concept model with the `applyConcept` action?
What does the `matchType` parameter control?
How can I optimize the performance of the `applyConcept` action, especially with large documents or complex models?
Is it possible to exclude certain concepts from the output results?
What does the `litiChunkSize` parameter do?