/*
  Copyright Dave Bone 1998 - 2014 
  All Rights Reserved. 
  No part of this document may be reproduced without written consent from the author.
	
FILE:		first_set_rules.lex
Dates:		23 Sept. 2005
Purpose:	determine first set per rule.
How:	Read the rules' tree in prefix order filtering out:
			accepting only rule-def and subrule-def tokens.
			all other tokens with the grammar tree are bypassed
*/
/@
@i "/usr/local/yacco2/copyright.w"
@** |first_set_rules| grammar.\fbreak
Create a rule's first set by building a closure-only
state. The terminals within this state are its first set.
Each rule's first set within the grammar is built this way including the ``start rule'' 
of the grammar and possibly rules used only in a 
  ``parallel-la-boundary'' expression.

The difference between this first set calculation and the one described in
``The Theory of Parsing, Translation,
and Compiling Volume 1: Parsing'' by Aho and Ullman on 
Page 300 is mine only evaluates the terminal(s) within
the state while theirs ``derives'' the terminal string with all their
 substitutions
to arrive at a set of terminals. 
{\bf Theirs is correct} as i did not completely walk
the rule's subsubrules when evaluating a partially epsiloned string
at follow set calculation --- it assumed that the rule's first set was already
calculated: $\alpha$ string is made up of an epsiloned rule Ra$_{\epsilon}$
 followed by $\beta$ string of rules or terminals.
The first set was not filled in properly composed of FS($\alpha$) and FS($\beta$)
but only the first string FS($\alpha$).
When the follow set was calculated, it did not
explode the follow set string into its symbol composites.
Boy i'm dumb. Lets look at the short of it: the original
first set calculation was for efficiency at the expense (when viewed now)
to explode per follow string its composite first sets.
What a dumb idea --- as all mistakes are!
So now its a closure only state with interior symbols added due to epsilon rules.
The first set view is all the terminals brought in from the rule's subrules.
As other rules can be brought in due to closure, their subrules are added to the state.
Now epsilon rules that start the subrule's symbol string
are the camillions. They add their own teminals
 to the first set but also disappear
and allow their next right symbol to be included in the state.
This is recursive as these partially consumed subrules are then evaluated
as above.

Below is my then thought process and its assumptions:\fbreak
So what's the difference on k = 1 symbol lookahead? Epsilon rules.
I do not pursue the lookahead terminal string that would be derived.
For the record, this gets done when the {\bf follow set} of a 
symbol string is calculated.
The First set of a rule is used with an epsilon check to determine whether
the next symbol in the string should be followed. 
Why advance if it is an epsilon rule?
Epsilon is like a window where u see past the rule into its neighbour's setting;
one can view it another way, an epsilon rule plays 2 parts: it provides
its first set and provides its to-its-right string.
When the follow set calculation hits the end-of-the-string --- i call this right-bounded condition,
the remaining follow set is found from the Follow set of its spawning rule(s). 
This is transitive as the spawning rule's follow set calculation could
also hit the end-of-string condition.
Now where are these follow set strings found? --- in the state that spawned them.
Each state contains the rule's follow set graph.
\fbreak
\fbreak
Definition of First set:\fbreak
Terminals that start all substrings generated by the rule's productions. 
The grammar tree
is walked in prefix formation accepting only ``rule-def'' followed by its 
``subrule-def'' terminals. 
Each rule within the grammar follows this pattern:
 ie, the start-rule is the first to be evaluated. 
 Though it is never referenced in a subrule i still create its first set.
\fbreak
\fbreak
The Algorithm.\fbreak
  The grammar reads each individual rule-def and all its subrule-def(s).
  Using its bottom-up recognition, |Rsubrule_def| adds the 1st element
  of the subrule 
  into the |fs_list_|.
  |Rrule| processes the |fs_list_| 
  as a closure-only state
  generating the rule's first set.
  In generating the first set, the elements in |fs_list_| are consumed
  as they are evaluated by removal from the list. 
  Referenced terminals are added to the rule's
  first set.
  For 1st time referenced rules, their subrules are added at the end of |fs_list_| 
  for eventual consumption.
  The neat thing about this algorithm is the 1st element in the |fs_list_| is 
  only visited!
  It's a singular point of evaluation that is thrown out to be replaced by its
  next in line element: ahh the bank queue and the teller.\fbreak
  \fbreak
  Due to |cweave| irregularities in formatting C++ code of this grammar, 
  please see |o2externs| documentation 
  where the routines |GEN_FS_OF_RULE| is coded an external 
to overcome this deficiency.
@/

fsm	
(fsm-id "first_set_rules.lex"
,fsm-filename first_set_rules
,fsm-namespace NS_first_set_rules
,fsm-class Cfirst_set_rules{
  user-prefix-declaration
#include "o2_externs.h"
  ***
  user-declaration
    public:
	rule_def* rule_def_;
  ***
  op
    rule_def_ = 0;
  ***
}
,fsm-version "1.0",fsm-date "26 sept. 2005",fsm-debug "false"
,fsm-comments "Determine first set per rule.")
@"/usr/local/yacco2/compiler/grammars/yacco2_T_includes.T"

rules{
Rfirst_set_rules  
(){
  -> Rrules eog
}

Rrules  (){
  -> Rrule
  -> Rrules Rrule
}

Rrule  (){
  -> Rrule_def {
    op
      Cfirst_set_rules* fsm = (Cfirst_set_rules*)rule_info__.parser__->fsm_tbl__;
      GEN_FS_OF_RULE(fsm->rule_def_);
    ***  
  } 
}

Rrule_def  (){
  -> "rule-def" {
  /@
  Initialize for its subrule findings.
  @/ 
  op
    Cfirst_set_rules* fsm = (Cfirst_set_rules*)rule_info__.parser__->fsm_tbl__;
    fsm->rule_def_ = sf->p1__;
  ***
  }
}

}// end of rules