The applyConcept action performs concept extraction using a predefined or custom concept extraction model (a LITI file). It is part of the Text Analytics Rule Score action set, which provides tools for linguistic rule scoring for categorization, concept extraction, and sentiment analysis. This action processes an input text document or a table of documents and identifies occurrences of concepts defined in the model, outputting detailed match information.
| Parameter | Description |
|---|---|
| casOut | Specifies the output CAS table to store the concept match results. |
| docId | Specifies the name of the variable in the input table that contains the document IDs. |
| dropConcepts | Specifies a list of concept names to exclude from the output tables. This is useful for filtering out predefined concepts without modifying the model. |
| factOut | Specifies the output CAS table for storing fact match results. |
| language | Specifies the language of the input text. Default is 'ENGLISH'. |
| litiChunkSize | Specifies the chunk size for document processing (e.g., '32K', '1M', 'ALL'). Smaller sizes can help manage memory for large documents. Default is '32K'. |
| matchType | Specifies the matching strategy: 'ALL' for all matches, 'BEST' for the best match, or 'LONGEST' for the longest match. Default is 'ALL'. |
| model | Specifies the input CAS table containing the user-defined LITI (Language Interpretation for Textual Information) model for concept extraction. |
| parseTableIn | Specifies a CAS table containing pre-parsed documents from a previous run, which can improve performance, especially when using the CLAUS_n operator. |
| parseTableOut | Specifies a CAS table to save pre-parsed documents, which can be used as input for future runs to improve performance. |
| ruleMatchOut | Specifies the output CAS table to store detailed rule match information, which can be used as input for the ruleGen action. |
| table | Specifies the input CAS table that contains the documents to be processed. |
| text | Specifies the name of the variable in the input table that contains the document text. |
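
To make the optional parameters in the table more concrete, the following is a minimal sketch of a call that sets several of them at once. It is an illustration, not a recommended configuration: the table names 'my_documents' and 'concept_matches_filtered' are placeholders, and 'nlpDate' and 'nlpMoney' are assumed to be predefined concept names available for the chosen language.

```sas
proc cas;
   /* Make the action set available in the session. */
   loadactionset "textRuleScore";

   textRuleScore.applyConcept /
      table={name='my_documents'},        /* placeholder input table */
      docId='doc_id',
      text='text',
      language='ENGLISH',                 /* the documented default, shown explicitly */
      litiChunkSize='1M',                 /* larger chunk than the '32K' default */
      dropConcepts={'nlpDate', 'nlpMoney'},  /* assumed predefined concept names */
      casOut={name='concept_matches_filtered', replace=true};
run;
```

Concepts listed in dropConcepts are excluded from the output tables only; the model itself is left unchanged.
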

This example creates a sample CAS table named 'my_documents' with two columns: 'doc_id' for the document identifier and 'text' for the document content. This table will be used as input for the concept extraction.

    /* mycas must be a libref that is assigned to a CAS session. */
    DATA mycas.my_documents;
       INFILE DATALINES delimiter='|';
       LENGTH doc_id $ 10 text $ 300;
       INPUT doc_id $ text $;
       DATALINES;
    doc1|The new SAS Viya platform is a powerful analytics tool.
    doc2|SAS Cloud Analytic Services (CAS) is the engine behind Viya.
    doc3|You can use LITI models for concept extraction.
    ;
    RUN;

This example applies the default concept extraction model to the 'my_documents' table. It identifies concepts in the 'text' column, using 'doc_id' as the document identifier. The results are stored in a CAS table named 'concept_matches'.

    PROC CAS;
       textRuleScore.applyConcept /
          table={name='my_documents'},
          docId='doc_id',
          text='text',
          casOut={name='concept_matches', replace=true};
    RUN;

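
After a run like the one above, it is usually worth looking at a few rows of the output table. The following is a small sketch that uses the general-purpose table.fetch action; it assumes that the 'concept_matches' table exists in the active caslib, and the row limit of 10 is an arbitrary choice.

```sas
proc cas;
   /* Print the first 10 rows of the concept match results. */
   table.fetch /
      table={name='concept_matches'},
      to=10;
run;
```
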
This example demonstrates a more advanced use case. It assumes that a custom LITI model is already available in a CAS table named 'my_liti_model' and applies that model to the 'my_documents' table. It specifies 'LONGEST' for the match type so that only the longest matching string is returned when concept matches overlap. It generates three output tables: 'concept_matches' for the main results, 'fact_matches' for extracted facts, and 'rulematch_details' for detailed rule match information that can be used for debugging or further analysis.

    PROC CAS;
       textRuleScore.applyConcept /
          table={name='my_documents'},
          docId='doc_id',
          text='text',
          model={name='my_liti_model'},   /* custom LITI model table; must already exist */
          matchType='LONGEST',
          casOut={name='concept_matches', replace=true},
          factOut={name='fact_matches', replace=true},
          ruleMatchOut={name='rulematch_details', replace=true};
    RUN;

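
One way to see the effect of matchType on a given set of documents is to run the action once per strategy and compare the number of rows in each result. The sketch below does this with the default model so that it does not depend on a custom model table. The output table names 'matches_all' and 'matches_longest' are placeholders, and the size of any difference depends entirely on the model and the documents.

```sas
proc cas;
   /* Run 1: keep every match. */
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      matchType='ALL',
      casOut={name='matches_all', replace=true};

   /* Run 2: keep only the longest match. */
   textRuleScore.applyConcept /
      table={name='my_documents'},
      docId='doc_id',
      text='text',
      matchType='LONGEST',
      casOut={name='matches_longest', replace=true};

   /* Compare the row counts of the two result tables. */
   table.recordCount / table={name='matches_all'};
   table.recordCount / table={name='matches_longest'};
run;
```
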
This example shows a two-step process to improve performance. First, `applyConcept` is called with the `parseTableOut` parameter to create a table of pre-parsed documents named 'parsed_docs'. In the second call, this 'parsed_docs' table is used as input via the `parseTableIn` parameter, which can speed up processing, especially with complex models or large documents.

    PROC CAS;
       /* Step 1: Parse documents and save the intermediate table */
       textRuleScore.applyConcept /
          table={name='my_documents'},
          docId='doc_id',
          text='text',
          parseTableOut={name='parsed_docs', replace=true};

       /* Step 2: Use the pre-parsed table for faster concept extraction */
       textRuleScore.applyConcept /
          table={name='my_documents'},
          docId='doc_id',
          text='text',
          parseTableIn={name='parsed_docs'},
          casOut={name='concept_matches_fast', replace=true};
    RUN;