/* Copyright Dave Bone 1998 - 2014 All Rights Reserved. No part of this document may be reproduced without written consent from the author. FILE: first_set_rules.lex Dates: 23 Sept. 2005 Purpose: determine first set per rule. How: Read the rules' tree in prefix order filtering out: accepting only rule-def and subrule-def tokens. all other tokens with the grammar tree are bypassed */ /@ @i "/usr/local/yacco2/copyright.w" @** |first_set_rules| grammar.\fbreak Create a rule's first set by building a closure-only state. The terminals within this state are its first set. Each rule's first set within the grammar is built this way including the ``start rule'' of the grammar and possibly rules used only in a ``parallel-la-boundary'' expression. The difference between this first set calculation and the one described in ``The Theory of Parsing, Translation, and Compiling Volume 1: Parsing'' by Aho and Ullman on Page 300 is mine only evaluates the terminal(s) within the state while theirs ``derives'' the terminal string with all their substitutions to arrive at a set of terminals. {\bf Theirs is correct} as i did not completely walk the rule's subsubrules when evaluating a partially epsiloned string at follow set calculation --- it assumed that the rule's first set was already calculated: $\alpha$ string is made up of an epsiloned rule Ra$_{\epsilon}$ followed by $\beta$ string of rules or terminals. The first set was not filled in properly composed of FS($\alpha$) and FS($\beta$) but only the first string FS($\alpha$). When the follow set was calculated, it did not explode the follow set string into its symbol composites. Boy i'm dumb. Lets look at the short of it: the original first set calculation was for efficiency at the expense (when viewed now) to explode per follow string its composite first sets. What a dumb idea --- as all mistakes are! So now its a closure only state with interior symbols added due to epsilon rules. The first set view is all the terminals brought in from the rule's subrules. As other rules can be brought in due to closure, their subrules are added to the state. Now epsilon rules that start the subrule's symbol string are the camillions. They add their own teminals to the first set but also disappear and allow their next right symbol to be included in the state. This is recursive as these partially consumed subrules are then evaluated as above. Below is my then thought process and its assumptions:\fbreak So what's the difference on k = 1 symbol lookahead? Epsilon rules. I do not pursue the lookahead terminal string that would be derived. For the record, this gets done when the {\bf follow set} of a symbol string is calculated. The First set of a rule is used with an epsilon check to determine whether the next symbol in the string should be followed. Why advance if it is an epsilon rule? Epsilon is like a window where u see past the rule into its neighbour's setting; one can view it another way, an epsilon rule plays 2 parts: it provides its first set and provides its to-its-right string. When the follow set calculation hits the end-of-the-string --- i call this right-bounded condition, the remaining follow set is found from the Follow set of its spawning rule(s). This is transitive as the spawning rule's follow set calculation could also hit the end-of-string condition. Now where are these follow set strings found? --- in the state that spawned them. Each state contains the rule's follow set graph. \fbreak \fbreak Definition of First set:\fbreak Terminals that start all substrings generated by the rule's productions. The grammar tree is walked in prefix formation accepting only ``rule-def'' followed by its ``subrule-def'' terminals. Each rule within the grammar follows this pattern: ie, the start-rule is the first to be evaluated. Though it is never referenced in a subrule i still create its first set. \fbreak \fbreak The Algorithm.\fbreak The grammar reads each individual rule-def and all its subrule-def(s). Using its bottom-up recognition, |Rsubrule_def| adds the 1st element of the subrule into the |fs_list_|. |Rrule| processes the |fs_list_| as a closure-only state generating the rule's first set. In generating the first set, the elements in |fs_list_| are consumed as they are evaluated by removal from the list. Referenced terminals are added to the rule's first set. For 1st time referenced rules, their subrules are added at the end of |fs_list_| for eventual consumption. The neat thing about this algorithm is the 1st element in the |fs_list_| is only visited! It's a singular point of evaluation that is thrown out to be replaced by its next in line element: ahh the bank queue and the teller.\fbreak \fbreak Due to |cweave| irregularities in formatting C++ code of this grammar, please see |o2externs| documentation where the routines |GEN_FS_OF_RULE| is coded an external to overcome this deficiency. @/ fsm (fsm-id "first_set_rules.lex" ,fsm-filename first_set_rules ,fsm-namespace NS_first_set_rules ,fsm-class Cfirst_set_rules{ user-prefix-declaration #include "o2_externs.h" *** user-declaration public: rule_def* rule_def_; *** op rule_def_ = 0; *** } ,fsm-version "1.0",fsm-date "26 sept. 2005",fsm-debug "false" ,fsm-comments "Determine first set per rule.") @"/usr/local/yacco2/compiler/grammars/yacco2_T_includes.T" rules{ Rfirst_set_rules (){ -> Rrules eog } Rrules (){ -> Rrule -> Rrules Rrule } Rrule (){ -> Rrule_def { op Cfirst_set_rules* fsm = (Cfirst_set_rules*)rule_info__.parser__->fsm_tbl__; GEN_FS_OF_RULE(fsm->rule_def_); *** } } Rrule_def (){ -> "rule-def" { /@ Initialize for its subrule findings. @/ op Cfirst_set_rules* fsm = (Cfirst_set_rules*)rule_info__.parser__->fsm_tbl__; fsm->rule_def_ = sf->p1__; *** } } }// end of rules