# Recognizer for Simple Mathematical Expressions
# H. Conrad Cunningham
# 24 September 2006

# This program provides a recognizer for a simple expression language with
# the following grammar.

# BNF for expression
#	expression ::= term    |  term addOp expression
#	term	   ::= factor  |  factor mulOp term
#	factor	   ::= number  |  identifier  |  ( expression )

# Lexical symbols
#	addOp;	    char "+" or "-"
#	mulOp:	    char "*" or "/"
#	identifier: first char alphabetic, all alphanumerics and
#		    underscores that follow    
#	number:	    first char numeric, all numerics that follow
#	space:	    space characters (blanks, tabs, newlines, etc.)
#		    occur anywhere except in identifiers and numbers

# The program is designed and implemented using an ad hoc recursive descent
# parser for the grammar.  For each nonterminal of the grammar, we introduce
# a method whose task is to recognize valid instances of that nonterminal.
# A method calls the appropriate methods to recognize the nonterminals or
# terminal symbols that it can generate in the grammar.

# This program is adapted from one the author wrote in the Haskell
# functional programming language in November 1994.  Hence, this
# implementation is basically a functional program encapsulated within a
# Ruby class.

# The program could perhaps be optimized somewhat by making the token array
# an instance attribute and changing the code to mutate the array directly
# rather than creating copies.

# Design and implementation issues:
#     - little testing has been done so far

class RecogExpr

  # Regular expression constants for lexical processing

  LEXPAT = /([A-Za-z_]\w*)|(\d+)|(\S)/
  IDENT  = /([A-Za-z_]\w*)/ 
  NUM    = /(\d+)/
  ADDOP  = /([+\-])/
  MULOP  = /([*\/])/

# At initialization, store the expression string.

  def initialize(xs)
    if xs.respond_to? :to_s  # If it is String-like, then store string
      @exp = xs.to_s
    else                     # default is zero-length string, not nil
      @exp = ""
    end
  end

# Method "valid?"  returns "true" if and only if the stored expression
# string is an acceptable expression according to the grammar.

  def valid?
    exprRes = expr(lex)
    exprRes[0] && exprRes[1].empty?
  end

  private  # make all methods that follow private to this class

# Method "lex" takes the stored expression string and returns the
# corresponding array of lexical tokens.  Except for spaces, identifiers,
# and numbers, each character is considered a token.

# The lex implementation breaks up the stored expression string using
# String#scan.  The regular expression passed to the method scan has three
# groups: one for identifiers, one for numbers, and one everything else as a
# single character.  Method scan returns an array or arrays of the matches.
# Each of the elements of the outer array holds a match of the whole regular
# expression in order from left to right in the expression string.  Each
# inner array has three elements corresponding to the three different groups
# in the regular expression, left to right.  The element corresponding to
# the group matched will be set to the matched text; the others are nil.
# This method removes all the nils and flattens the array of arrays into a
# single array.

  def lex
    (@exp.scan(LEXPAT).map {|ta| ta.compact}).flatten
  end

# Method "expr" represents the "expression" nonterminal in the grammar.  The
# method takes a token array and returns a result pair.  The first element
# of the pair is "true" if and only if an expression is recognized at the
# beginning of the token array.  If the first element is "true", then the
# second element of the pair is the token array remaining after the
# expression is removed.  Otherwise, the second element is the token array
# remaining at the point an error is discovered.

  def expr(toks)
    if toks.empty?
      return [false,toks]
    end
    tRes = term(toks)
    ntoks = tRes[1].size
    case
    when ntoks == 0                     : tRes
    when tRes[0] && tRes[1][0] == ")"   : tRes
    when tRes[0] && tRes[1][0] =~ ADDOP : expr(tRes[1][1...ntoks])
    else                                  [false,tRes[1]]
    end
    # Note that a pair is returned by each branch, hence by the method.
  end

# Method "term" represents the "term" nonterminal in the grammar.  The
# method takes a token array and returns a result pair.  The first element
# of the pair is "true" if and only if a term is recognized at the beginning
# of the token array.  If the first element is "true", then the second
# element of the pair is the token array remaining after the term is
# removed.  Otherwise, the second element of the pair is the token array
# remaining at the point an error is discovered.

  def term(toks)
    if toks.empty?
      return [false,toks]
    end
    fRes = factor(toks)
    ntoks = fRes[1].size
    case
    when ntoks == 0                     : fRes
    when fRes[0] && fRes[1][0] == ")"   : fRes
    when fRes[0] && fRes[1][0] =~ ADDOP : fRes
    when fRes[0] && fRes[1][0] =~ MULOP : term(fRes[1][1...ntoks])
    else                                  [false,fRes[1]]
    end
  end

# Methods "factor" and "nestexpr" represent the "factor" nonterminal in the
# grammar.

# Method "factor" takes a token array and returns a result pair.  The first
# element of the pair is "true" if and only if a factor is recognized at the
# beginning of the token array.  If the first element is "true", then the
# second element is the token array remaining after the factor is removed.
# Otherwise, the second element is the token array remaining at the point an
# error is discovered.

  def factor(toks)
    ntoks = toks.size
    if ntoks == 0
      return [false,toks]
    end
    case
    when toks[0] =~ IDENT : [true,toks[1...ntoks]]
    when toks[0] =~ NUM   : [true,toks[1...ntoks]]
    when toks[0] == "("   : nestexpr(toks[1..ntoks])
    else                    [false,toks]
    end
  end

# Method "nestexpr" recognizes a nested expression and its closing
# parenthesis.

  def nestexpr(toks)
    if toks.empty?
      return [false,toks]
    end
    eRes = expr(toks)
    ntoks = eRes[1].size
    case
    when ntoks == 0                   : [false,[]]
    when eRes[0] && eRes[1][0] == ")" : [true,eRes[1][1...ntoks]]
    else                                [false,eRes[1]]
    end
  end

end