/* Copyright Dave Bone 1998 - 2014 All Rights Reserved. No part of this document may be reproduced without written consent from the author. FILE: epsilon_rules.lex Dates: 23 Sept. 2005 Purpose: determine epsilon rules. How: Read the rules' tree in prefix order filtering out: ::, rules-phrase, identifier, NULL adding rules to yes or no piles or maybe list */ /@ @i "/usr/local/yacco2/copyright.w" @** |epsilon_rules| grammar.\fbreak Determine and check for the following:\fbreak \ptindent{1: epsilon rules} \ptindent{2: rules not deriving any string} \ptindent{3: referenced rules are defined} \ptindent{4: recursion cycling deriving only epsilon} \ptindent{5: ambiguous subrules caused by multiple recursion cycling} \fbreak Epsilon definition:\fbreak A rule having a subrule with no grammar symbols present: ie, it derives an empty symbol string. This is explicit epsilon. A second condition is when a rule's subrule has all its symbols as rules and they are all epsiloned. I call this transient epsilon. It doesn't matter how the condition is arrived at. Now a third condition is the ``invisible-shift'' operator usuage. Though it's not part of the token stream per say it is normally used to get out of ambigous situations and so is part of the lookahead set. Once apon a time i thought this was an implicit epsilon: IT IS NOT.\fbreak \fbreak Some epsilon terminalogy:\fbreak Noun: epsilon represents the empty string of symbols. The Greek flavour ($\epsilon$) represents it visually.\fbreak Adjective: epsilonable - has the empty symbol string condition and non-epsilonable generates terminal strings.\fbreak Verb: shake and bake the tenses.\fbreak Verbal: epsiloned.\fbreak \fbreak This grammar demonstrates the elegance of flow control. Review of a grammar's flow control:\fbreak This depends on how the grammar tree is walked. This grammar receives its tokens in prefix formation: a ``rule-def'' token is processed before its ``subrule-def''. Using sentences, a subrule's expression will be processed before its left-handside rule. This means the dependencies are in bottom-up left-to-right order of token recognition. One can use the prefix order to initialize the various conditions, and postfix the results from the bottom-up order. A grammar of no tokens can be programmed using all epsiloned rules whereby the rules act as logic triggers. In this paricular example, the grammar was originally written to detect ``epsilon'' rules. Now it detects whether all rules derive T. The grammar recognition just loads up into a 2 dimensional list the rule x subrule x element. Once the language is accepted, it then traverses the list to derive the terminal strings per subrule.\fbreak \fbreak The Algorithm.\fbreak Pass one: fill up the derives list\fbreak Fill up the triplet list of all subrules: rule $\otimes$ subrule $\otimes$ element. One can view it as a 2 dimensional array where the row is subrule dominent with its subordinent element columns. The rule just tags along providing the parental anchor. The grammar tree is walked top-down (prefix order) to load up the derives list. Only the ``rule-def'' and ``subrule-def'' come in as tokens while the elements are filtered out. This was done for efficiency reasons. Within the grammar at the ``subrule-def'' logic point, the ``subrule-def'' start element is fetched by syntax directed code. \fbreak \fbreak Pass 2: derive T for each subrule\fbreak Try to derive terminal strings for each indeterminate subrule. This stage does multilpe passes thru the derives list whereby each pass is predicated on the fact that the previous pass has a change in the status of a subrule --- advance to the next element or the subrule's elements have been consumed. How is this a 2 dimensional walk? The subrules in the list are the dominant axis while its elements are the secondary axis. This is a dynamic grid of points being consumned. The final outcome is either the grammar's subrules all derive terminal strings and the derives list is empty or the remainder of items are the pathological subrules.\fbreak \fbreak Examples of Pathological conditions:\fbreak Rule {\bf A} calls itself or a variation of {\bf A} calls {\bf B} calls {\bf A} without any terminal strings in their subrules.\fbreak \fbreak Please look at the following test grammars that exercise the pathos...\fbreak \ptindent{1: |ts_path1.lex|} \ptindent{2: |ts_path1a.lex|} \ptindent{3: |ts_path2.lex|} \ptindent{4: |ts_path3.lex|} \ptindent{5: |ts_path4.lex|} \ptindent{6: |ts_path5.lex|} \ptindent{7: |ts_path6.lex|} \ptindent{8: |ts_path7.lex|} They are exercised by |qa_alltests.bat| batch command file. @*2 Recursion and derives a string.\fbreak Derives:\fbreak Greek symbols are used to denote a mixed string of symbol(s) drawn from the nonterminal, terminal vocabularies. The empty string is represented by $\epsilon$. Derives is a relationship that starts with a string of symbols and substitutes equivalent strings of symbols until eventually nomore substitutions can take place. The resulting string is either empty or a string of terminals. The order of substitution will be from left to right. The $\longrightarrow$ symbol represents the derives relation. Sometimes i will subscript the greek symbol with $\epsilon$ denoting that it derives the empty string. The $\not\longrightarrow$\ symbol represents ``no derives'' takes place. I use this where a specific subrule pattern is being discussed and the greek symbol is not present. \fbreak Left recursion: A \subrule \ A $\beta${} \fbreak As shown, the rule {\bf A} calls itself from the left part of its subrule. $\beta${} represents strings of Rules and Terminals\fbreak \fbreak Right recursion: A \subrule \ $\alpha${} \ A \fbreak The rule {\bf A} calls itself from the end of the subrule as shown. $\alpha${} represents strings of Rules and Terminals.\fbreak @*2 Pathological Grammars.\fbreak The below pictures grids the good the bad and the ugly. \fbreak Note: the emitted code contains some imperfections caused by |cweave|. When i have time i'll deal with |cweave|. Reader beware. \fbreak \fbreak @/ fsm (fsm-id "epsilon_rules.lex" ,fsm-filename epsilon_rules ,fsm-namespace NS_epsilon_rules ,fsm-class Cepsilon_rules{ user-prefix-declaration #include "yacco2_stbl.h" using namespace NS_yacco2_terminals; #define EPSILON_YES 0 #define EPSILON_NO 1 #define EPSILON_DONT_KNOW 2 #define EPSILON_MAYBE 3 struct elem_list_type{ rule_def* rule_; T_subrule_def* subrule_; AST* elem_t_; int gen_epsilon_; int gen_t_; elem_list_type (rule_def* Rule ,T_subrule_def* Subrule ,AST* Elem_t){ rule_ = Rule; subrule_ = Subrule; elem_t_ = Elem_t; gen_epsilon_ = EPSILON_DONT_KNOW; gen_t_ = false; }; elem_list_type(){ rule_ = 0; subrule_ = 0; elem_t_ = 0; gen_epsilon_ = false; gen_t_ = false; }; }; *** user-declaration public: std::list derives_list_; rule_def* rule_def_; T_subrule_def* subrule_def_; AST* elem_t_; tok_can< AST* > * ip_can_; void deal_with_derives_list(); void deal_with_undecided_derives_list(); AST* advance_element(AST* Elemt); *** user-implementation AST* Cepsilon_rules::advance_element(AST* Elemt){ AST* nxt_t = AST::brother(*Elemt);// next element for(;;){// bypass thread expression: go to jail ... eosubrule using namespace NS_yacco2_T_enum; CAbs_lr1_sym* sym = AST::content(*nxt_t); switch(sym->enumerated_id__){ case T_Enum::T_T_identifier_: break; case T_Enum::T_T_2colon_: break; case T_Enum::T_T_NULL_: break; case T_Enum::T_T_cweb_marker_: break; case T_Enum::T_T_cweb_comment_: break; default: return nxt_t; } nxt_t = AST::brother(*nxt_t);// skip thread expr: next element } } void Cepsilon_rules::deal_with_derives_list(){ // force cweave to using namespace NS_yacco2_T_enum; bool has_list_chged(false); std::list::iterator i; Read_list:; i = derives_list_.begin(); if(i == derives_list_.end()) return; for(;i != derives_list_.end();){ // force cweave elem_list_type& el = *i; CAbs_lr1_sym* rteos = AST::content(*el.elem_t_);// force cweave switch(rteos->enumerated_id__){// force cweave case T_Enum::T_refered_rule_:{ refered_rule* rr = (refered_rule*)rteos; rule_in_stbl* rstbl = rr->Rule_in_stbl(); using namespace yacco2_stbl; T_sym_tbl_report_card report_card; get_sym_entry_by_sub(report_card,rstbl->stbl_idx()); if(report_card.status_ == T_sym_tbl_report_card::failure){ report_card.err_entry_->set_rc(*rr,__FILE__,__LINE__); parser__->add_token_to_error_queue(*report_card.err_entry_); return;// caused by problem with sym. tbl } if(report_card.tbl_entry_->defined_ == false){ CAbs_lr1_sym* sym = new Err_used_rule_but_undefined; sym->set_rc(*rr,__FILE__,__LINE__); parser__->add_token_to_error_queue(*sym); // keep pouring them into the error queue has_list_chged = true;// go to next element el.elem_t_ = advance_element(el.elem_t_); ++i;// next sr in derive list break; } rule_def* r = rr->Rule_in_stbl()->r_def(); if(r->epsilon() == YES){ has_list_chged = true; el.elem_t_ = advance_element(el.elem_t_); ++i;// next sr in derive list if(el.gen_epsilon_ != EPSILON_NO){ el.gen_epsilon_ = EPSILON_MAYBE; } break; } if(r->derive_t() == YES){ has_list_chged = true; el.elem_t_ = advance_element(el.elem_t_); el.gen_epsilon_ = EPSILON_NO; ++i;// next sr in derive list break; } ++i;// indeterminate spot: next pass might free it up so next sr break; } case T_Enum::T_T_eosubrule_:{ if(el.gen_epsilon_!= EPSILON_NO){ el.rule_->epsilon(YES); el.subrule_->epsilon(YES); }else{ el.subrule_->epsilon(NO); el.rule_->derive_t(YES); } has_list_chged = true; i = derives_list_.erase(i);// finished with sr break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ el.gen_epsilon_ = EPSILON_NO; el.subrule_->epsilon(NO); el.rule_->derive_t(YES); has_list_chged = true; i = derives_list_.erase(i);// finished with sr break; } case T_Enum::T_T_called_thread_eosubrule_:{ el.gen_epsilon_ = EPSILON_NO; el.subrule_->epsilon(NO); el.rule_->derive_t(YES); has_list_chged = true; i = derives_list_.erase(i);// finished with sr break; } case T_Enum::T_refered_T_:{ el.gen_epsilon_ = EPSILON_NO; el.subrule_->epsilon(NO); has_list_chged = true; el.elem_t_ = advance_element(el.elem_t_); ++i;// next sr in derive list refered_T* rt = (refered_T*)rteos; break; } default:{ // lr k el.gen_epsilon_ = EPSILON_NO; el.subrule_->epsilon(NO); has_list_chged = true; el.elem_t_ = advance_element(el.elem_t_); ++i;// next sr in derive list break; } } } if(derives_list_.empty() == true) return;// kosher grammar if(has_list_chged == true){// still o/s but going forward has_list_chged = false; goto Read_list; } } void Cepsilon_rules::deal_with_undecided_derives_list(){ // chk if undefined rules were found: yes out damn spot if(parser__->error_queue()->empty() != true) return; // errors: pathological grammar std::list::iterator i = derives_list_.begin(); if(i == derives_list_.end()) return; for(;i != derives_list_.end();++i){ elem_list_type& el = *i; CAbs_lr1_sym* elem = AST::content(*el.elem_t_); CAbs_lr1_sym* sym = new ERR_sick_grammar; sym->set_rc(*elem,__FILE__,__LINE__); parser__->add_token_to_error_queue(*sym); } } *** op rule_def_ = 0; subrule_def_ = 0; elem_t_ = 0; ip_can_ = (tok_can< AST* >*)parser__->token_supplier__; *** } ,fsm-version "1.0",fsm-date "26 sept. 2005",fsm-debug "false" ,fsm-comments "Determine whether rules are epsilon, derive T, or are pathological.") @"/usr/local/yacco2/compiler/grammars/yacco2_T_includes.T" rules{ Repsilon_rules (){ -> Rrules eog { /@ Pass 2 and greater. @/ op Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__; fsm->deal_with_derives_list(); if(fsm->derives_list_.empty() != true){ fsm->deal_with_undecided_derives_list(); } *** } } Rrules (){ -> Rrule -> Rrules Rrule } Rrule (){ -> Rrule_def Rsubrules } Rrule_def (){ -> "rule-def" { /@ Initialize for its subrule findings. @/ op Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__; fsm->rule_def_ = sf->p1__; *** } } Rsubrules (){ -> Rsubrule -> Rsubrules Rsubrule } Rsubrule (){ -> Rsubrule_def } Rsubrule_def (){ -> "subrule-def" { /@ Create the entry within the |derives_list_|. This will be walked until either all the subrules' elements are derived or they are pathological. @/ op Cepsilon_rules* fsm = (Cepsilon_rules*)rule_info__.parser__->fsm_tbl__; fsm->subrule_def_ = sf->p1__; AST* sr_t = fsm->subrule_def_->subrule_s_tree(); AST* et = AST::get_spec_child(*sr_t,1); fsm->derives_list_.push_back (elem_list_type(fsm->rule_def_,fsm->subrule_def_,et)); *** } } }// end of rules