% Begin file:  surveyLangFinal.bib
%
% A Little Language for Surveys:  Constructing an Internal DSL in Ruby
% Final manuscript submission
% ACM SouthEast 2008
% Date:   22 February 2008
%
% H. Conrad Cunningham, Professor and Chair
% Department of Computer and Information Science 
% University of Mississippi
% 201 Weir Hall
% University, MS 38677 USA
%
% Voice:  (662) 915-5358
% Fax:    (662) 915-5623
% Email:  cunningham at cs.olemiss.edu
%
%2345678901234567890123456789012345678901234567890123456789012345678901234567890
%
\documentclass{sig-alternate}

\newcommand{\tab}{\hspace*{\tabbingsep}}

\begin{document}

\title{A Little Language for Surveys:  \\
Constructing an Internal DSL in Ruby}

\numberofauthors{1} 
\author{
% 1st. author
\alignauthor
H. Conrad Cunningham\\ \affaddr{Department of Computer and Information
Science}\\ \affaddr{University of Mississippi}\\ \affaddr{University,
MS 38677 USA}\\ \email{cunningham@cs.olemiss.edu}
}
\date{21 February 2008}

\conferenceinfo{ACM SE}{'08, March 28-29, 2008, Auburn, Alabama, USA}
\CopyrightYear{2008}

\bibliographystyle{abbrv}

\maketitle
\begin{abstract}
Using a problem domain motivated by Bentley's ``Little Languages''
column \cite{bentley-1986}, this paper explores the use of the Ruby
programming language's flexible syntax, dynamic nature, and reflexive
metaprogramming facilities to implement an internal domain-specific
language (DSL) for surveys.
\end{abstract}

\category{D.3.2}{Programming Languages}{Language Classifications}[Specialized application languages]

\terms{Design, Languages}

\keywords{Domain specific language, Ruby, reflexive metaprogramming}

\section{Introduction}\label{sec:intro}

Hudak defines a \emph{domain-specific language} (or DSL) as ``a
programming language tailored to a particular application domain''
\cite{hudak-1998}. DSLs are usually not general-purpose languages; they
instead ``trade generality for expressiveness in a limited domain''
\cite{mernik-2005}. Thus DSLs must be precise in capturing the
semantics of their application areas \cite{hudak-1998}.  They are
usually small, declarative languages targeted at end users or domain
specialists who are not expert programmers
\cite{hudak-1998,vandeursen-2000}.

% In recent years, interest in DSLs has grown.  Some of this interest
% arises from industry, e.g., such languages play a part in Microsoft's
% software factories technology \cite{greenfield-short-2004} and in
% software product lines \cite{coplien-1998}, and some arises from the
% software engineering research community
% \cite{mernik-2005,vandeursen-2000}.

% DSLs have a long history.  Although not called DSLs at the time, their
% design and use go back to the 1950s.  As Ross notes, ``APT was the
% first and most widely used of the special purpose application-oriented
% languages'' \cite{ross-1978}.  APT is a language for
% \emph{Automatically Programming} numerically controlled (NC) machine
% \emph{Tools} used in manufacturing.  It is a textual language that
% enables a programmer to specify the paths that a machine tool needs to
% follow to shape a needed part.  Programs in the language compile into
% a sequence of specialized instructions for NC machine tools.

DSLs, often known as \emph{little languages}, have long been important
in the Unix operating systems community.  For example, in an
influential 1986 column \cite{bentley-1986}, Bentley describes the
little line-drawing language \texttt{pic} and its preprocessors
\texttt{scatter} (a language for drawing scatter plots of
two-dimen\-sional data) and \texttt{chem} (a language for drawing
molecular structures).  He also describes other well-known little
languages that are used to implement
\texttt{pic}: \texttt{lex} for specifying lexical analyzers,
\texttt{yacc} for specifying parsers, and \texttt{make} for specifying
build processes.

% Dropped citation of fowler-2005a below

Fowler classifies DSLs into two styles---external and internal
\cite{fowler-2007}.  An \emph{external DSL} is a language
that is different from the main programming language for an
application, but that is interpreted by or translated into a program
in the main language.  The little languages from the Unix platform are
in this category. The document preparation languages \LaTeX\ and
BibTeX, which the author is using to format this paper, are also
external DSLs.  External DSLs may use ad hoc techniques, such as
hand-coded delimiter-directed or recursive descent parsers
\cite{fowler-2007}, or may use parser-generation tools such as
\texttt{lex} and \texttt{yacc} or ANTLR \cite{parr-2007}.

What Fowler calls an \emph{internal DSL} (and Hudak calls a
domain-specific \emph{embedded} language \cite{hudak-1998}) transforms
the main programming language itself into the DSL.  This is not a new
idea; usage of syntactic macros has been a part of the Lisp tradition
for several decades.  However, the features of a few contemporary
languages offer new opportunities for constructing internal DSLs.  

% % ATTACH TO PARAGRAPH ABOVE
% For example, the rich algebraic type system of the functional programming
% language Haskell has stimulated research on Haskell-based DSLs for
% several domains including reactive animation and music
% \cite{hudak-1998,hudak-2000}.

The rise in popularity of the Ruby programming language
\cite{thomas-2005} and the associated Ruby on Rails web framework
\cite{thomas-dhh-2006} has stimulated new interest in DSLs among
practitioners. In the Ruby environment, there is significant interest
in developing internal DSLs \cite{buck-2006,freeze-2006} that are made
possible by the extensive reflexive metaprogramming facilities of Ruby
\cite{carlson-richardson-2006}. One interesting language of this
nature is \texttt{rake}, a build language implemented as a DSL in Ruby
\cite{fowler-2005b}.

This paper takes a problem motivated by Bentley's ``Little Languages''
column \cite{bentley-1986}, that of constructing a little language for
surveys.  It explores the DSL capabilities of the Ruby language and
designs an internal DSL for specifying and executing surveys. Section 2 describes
the Ruby facilities for constructing internal DSLs.  Section 3
analyzes the survey problem domain and designs a simple DSL based on
the analysis.  Section 4 sketches the design and implementation of the
survey DSL processor.  Sections 5 and 6 examine this work from a
broader perspective and conclude the paper.


\section{Ruby's Internal DSL Support}\label{sec:ruby}

Ruby is an interpreted, dynamically typed, object-oriented application
programming language with extensive metaprogramming facilities
\cite{carlson-richardson-2006,thomas-2005}.  Its features
are convenient for both defining and implementing DSLs.  Although
those tasks are interrelated, it is useful to examine them separately.

% \subsection{Defining DSLs}

Ruby has two groups of features that especially support \emph{defining}
internal DSLs \cite{buck-2006,freeze-2006}---its flexible syntax and
its support for blocks (i.e., closures).

A flexible syntax is important for making DSLs read in a natural
way. Of special importance is the syntax for method calls because
methods are the primary means for adding operators and
declarators---the verbs---to an internal DSL.  Ruby method calls are
flexible in three key ways: the parentheses enclosing argument lists
are optional, methods may have a variable number of arguments, and the
use of hash data structures in argument lists provides a mechanism
similar to keyword arguments.  The survey DSL exploits the first two
of these features. For example, the Ruby code

\begin{flushleft}
\tab\tab \texttt{question "Male or female?"}
\end{flushleft}

\noindent calls method \texttt{question} with one argument, a string
literal giving the text for a survey question. The method
\texttt{question} is defined with a second, optional parameter giving
the expected number of responses if greater than 1. If the optional
argument is given in a call, then it is separated from the first
argument by a comma.  

Of course, a DSL must also identify the entities to be operated
upon---the nouns---and their attributes---the adjectives. In some
circumstances, the nouns may be identifiers for host language
variables or classes, but, in other circumstances, quoted string
literals may need to be introduced.  Quoted strings, with their
``noisy'' pairs of quotation marks, tend to make the DSL more
difficult to read and write.  Ruby provides a less ``noisy''
alternative, the \emph{symbol}.  A symbol is an identifier-like
textual constant that is preceded by a colon. For example, a
programmer might combine the use of symbols with ``keyword parameters''
in a call such as

\begin{flushleft}
\tab\tab \texttt{question :text\;=>\;"Male or female?", :nresp\;=>\;1}
\end{flushleft}

\noindent where the two ``arguments'' are collected as mappings in
a hash data structure passed as an argument to the method.
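
The three kinds of call flexibility can be sketched with a small
stand-in method.  The method body below is an assumption made for
illustration only; it is not the survey DSL's actual \texttt{question}
method, and it simply reports the arguments it received.

```ruby
# Hypothetical stand-in for a DSL method; it only reports what it received.
def question(*args)
  opts  = args.last.is_a?(Hash) ? args.pop : {}   # trailing hash, if any
  text  = args[0] || opts[:text]
  nresp = args[1] || opts[:nresp] || 1            # default: one response
  { :text => text.to_s, :nresp => nresp.to_i }
end

a = question "Male or female?"                        # parentheses omitted
b = question "Pick two colors", 2                     # optional second argument
c = question :text => "Male or female?", :nresp => 1  # hash used like keywords
```

In this sketch the first and third calls describe the same question,
which suggests why a DSL designer may choose either form according to
which reads more naturally.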

Any Ruby method call may have a block attached.  A block (also called
a Ruby \texttt{Proc} or a closure) is a group of executable statements
defined in the environment of the caller and passed unevaluated as an
implicit argument to the called method.  The called method may then
execute the block zero or more times, supplying any needed arguments
for each call. The block executes with access to the environment in
which it was defined. As an example, consider the Ruby code

\begin{flushleft}
\tab\tab \texttt{response "Female" \{ @female = true \}}
\end{flushleft}

\noindent that associates a parameterless block with the one-argument
method call \texttt{response}.  When executed, the block sets instance
variable \texttt{@female} to the constant \texttt{true}.  Blocks can
be enclosed either in a pair of braces, as above, or in a
\texttt{do}-\texttt{end} construct, which is better for multi-line blocks.

The block feature is useful for DSL construction in at least two ways
\cite{buck-2006}.  First, a block provides a structuring mechanism for
DSL statements.  The execution of the Ruby (or DSL) statements inside
the block is controlled by the called method.  Second, a block enables
deferred evaluation.  A block can be stored in a data structure or
passed on as an argument to other methods.  The first way is useful in
defining DSLs. Both are useful in implementing DSLs.
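
Both uses of blocks can be illustrated with a minimal sketch.  The
class \texttt{BlockDemo} and its methods \texttt{group} and
\texttt{later} are hypothetical names introduced only for this example.
The arguments are parenthesized so that each brace block binds to the
intended method call.

```ruby
# Hypothetical sketch: one method runs its block at once (structuring),
# another captures the block as a Proc and keeps it for later (deferral).
class BlockDemo
  attr_reader :log, :stored

  def initialize
    @log = []
    @stored = []
  end

  # Structuring: the called method decides when the nested statements run.
  def group(name)
    @log << "begin #{name}"
    yield
    @log << "end #{name}"
  end

  # Deferred evaluation: keep the closure without running it.
  def later(&blk)
    @stored << blk
  end
end

demo = BlockDemo.new
counter = 0
demo.group("outer") { demo.later { counter += 1 } }
before = counter                       # the stored block has not run yet
demo.stored.each { |blk| blk.call }    # now it runs, in its defining scope
after = counter
```

The stored block still sees the local variable \texttt{counter} when it
is finally called, which is the closure behavior described above.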

% \subsection{Implementing DSLs}

% As an application programming language, Ruby supports many built-in
% features that are quite powerful for implementing DSLs.  It supports
% regular expressions, dynamically sized arrays, hash data structures,
% iterators, and garbage-collected storage management.  Its malleable
% approach to typing, sometimes called \emph{duck typing}
% \cite{thomas-2005}, gives DSL designers more flexibility than
% most static languages do.  The type of a Ruby object is characterized
% by what methods calls it can accept rather than by what class it
% extends.

Ruby's extensive reflexive metaprogramming facilities are especially
important for \emph{implementing} internal DSLs
\cite{buck-2006,freeze-2006}.  \emph{Metaprogramming} is the
capability of a program to manipulate programs as
data. \emph{Reflexive metaprogramming} is the capability of a program
to manipulate its own program structures
\cite{wikipedia-metaprogramming-2007}. Reflexive metaprogramming is 
possible primarily because Ruby is an interpreted language whose
interpreter provides programmers with hooks into its internal
state. Ruby's facilities include the ability to query an object to
determine its methods, instance variables, and class---features that
are available in mainstream languages such as Java---and also more
exotic facilities such as the ability to evaluate strings as code, to
intercept calls to undefined methods, to define new classes and
methods dynamically, and to react to changes in classes and methods
via \emph{callback} methods. Such features have long been a staple of
languages such as Lisp and Smalltalk, but the recent interest in Ruby
has helped renew interest in metaprogramming.

The implementation of the survey DSL uses the following Ruby reflexive
metaprogramming facilities \cite{freeze-2006,thomas-2005}:

\begin{description}

\item[\texttt{obj.instance\_eval(str)}] takes a string \texttt{str}
and executes it as Ruby code in the context of \texttt{obj}.  This
method allows internal DSL code from a string or file to be executed
by the Ruby interpreter.

\item[\texttt{mod.class\_eval(str)}] takes a string \texttt{str} and
executes it as Ruby code in the context of module \texttt{mod}.  This
enables new methods and classes to be declared dynamically in the
running program.

\item[\texttt{obj.method\_missing(sym,*args)}] is invoked when
there is an attempt to call an undefined method with the name
\texttt{sym} and argument list \texttt{args} on the object \texttt{obj}.
This enables the object to take appropriate remedial action.

\item[\texttt{obj.send(sym,*args)}] calls method \texttt{sym} on
object \texttt{obj} with argument list \texttt{args}. In Ruby
terminology, this \emph{sends a message} to the object.

\end{description}
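
A minimal sketch exercising these four facilities together follows.
The class \texttt{Sandbox} and the methods \texttt{title},
\texttt{subtitle}, and \texttt{footer} are assumptions made for this
illustration and are not part of the survey DSL implementation.

```ruby
# Hypothetical class used only to exercise the four facilities.
class Sandbox
  attr_reader :seen

  def initialize
    @seen = []
  end

  def title(text)
    @seen << [:title, text]
  end

  # Invoked by Ruby when an undefined method is called on this object.
  def method_missing(sym, *args)
    @seen << [:unknown, sym, args]
  end
end

box = Sandbox.new

# instance_eval: run a string of code with self set to box.
box.instance_eval('title "Demo survey"')

# class_eval: add a method to the class while the program is running.
Sandbox.class_eval('def subtitle(text); @seen << [:subtitle, text]; end')

# send: invoke a method chosen by name at run time.
box.send(:subtitle, "Part 1")

# method_missing: an undefined name is intercepted instead of raising.
box.footer "bye"
```

A production class that overrides \texttt{method\_missing} would
normally delegate unrecognized names to \texttt{super}; the sketch omits
this for brevity.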

\noindent Now we examine how these language facilities can be used to
define an internal DSL for a nontrivial domain.


\section{Little Language for Surveys}\label{sec:defineDSL}

The problem domain addressed here is similar to the one for Bentley's
``little language for surveys'' \cite{bentley-1986}.  We first analyze
the domain systematically and then use the results to design an
appropriate DSL syntax and semantics.

% \subsection{Domain Analysis}\label{subsec:analyzeDSL}

The domain is a family of applications for administering simple
surveys.  We \emph{analyze the domain} using
\emph{commonality/variability analysis} \cite{coplien-1998} and
produce four outputs.

\begin{description}

\item[Scope:] the boundaries of the domain, i.e., what we must address
and what we can ignore.

\item[Terminology:] definitions for the specialized terms, or
concepts, relevant to the domain.

\item[Commonalities:]  the aspects of the family of applications that
do not change from one instance to another. 

\item[Variabilities:] the aspects of the family of applications that
may change from one instance to another.

\end{description}

The \emph{scope} focuses on the definition of a simple survey and its
presentation to individuals in various forms.  We do not address
issues related to tabulation of the survey results.

Within this scope, we identify several specialized \emph{terms}. These
include \emph{survey}, \emph{title}, \emph{question}, and
\emph{response} related to the survey's structure.  To have a scope
similar to Bentley's surveys \cite{bentley-1986}, we need the
concepts of \emph{conditional question}, which may be omitted under
some conditions, and \emph{silent question}, whose result is
calculated from previous responses.  We also have the concept of
\emph{execution} of the survey and its presentation to a
\emph{respondent}.

The \emph{commonalities} we identify include:

\begin{enumerate}

\item A survey has a \emph{title}.

\item A survey consists of a \emph{sequence of questions} that are to be
presented to the respondents.

\item Each question has a \emph{sequence of responses} that can be
chosen by the respondent.

\item A \emph{conditional question} may be omitted based on the
responses to questions earlier in the sequence.

\item The response to a \emph{silent question} is calculated based on
the responses to questions earlier in the sequence.

\item \emph{Execution} of the survey results in presentation of the
appropriate questions and the possible responses to the
\emph{respondent} and the collection of his or her choices.

\end{enumerate}

The \emph{variabilities} we identify include the:

\begin{enumerate}

\item actual texts displayed for the title, questions, and responses

\item number and order of questions within the sequence

\item number of responses required or allowed for a question

\item number and order of responses within a question

\item condition under which a question may be omitted

\item method for calculating the results of a silent question

\item source of the survey specification

\item manner in which the questions are displayed and the
responses collected during execution.

\end{enumerate}

% \subsection{DSL Design}\label{subsec:designDSL}

% Given that the DSL primarily defines a structure for the survey with
% much of the processing implied by the nature of the domain, we adopt
% an approach that is primarily declarative.  
% % ATTACH TO PARAGRAPH BELOW

We use the analysis above to guide our choices for elements of the
\emph{DSL design} \cite{mernik-2005,thibault-1999}.  The terminology
and commonalities suggest the DSL statements and constructs.  The
commonalities also suggest the semantics of the constructs and the
nature of the underlying computational model.  The variabilities
represent syntactic elements to which the survey programmer can assign
values.

To avoid unnecessary language elements, we do not introduce a
construct for the \emph{survey} itself. Instead, we define a survey as
the content of one DSL ``file''. The syntax of this file consists of
one \texttt{title} statement (commonality 1) and a sequence of
questions.  The \texttt{title} statement has the syntax

\begin{flushleft}
\tab\tab \texttt{title \textit{TEXT}}
\end{flushleft}

\noindent where \texttt{title} is a keyword and \texttt{\textit{TEXT}} is the
user-supplied title text (variability 1).  Syntactically, this is a
call of the Ruby method \texttt{title} with one argument.

According to commonality 2 and variability 2, the body of the survey
consists of a user-defined sequence of questions.  We thus introduce a
\texttt{question} statement to denote the basic survey question.  It
has the syntax 

\begin{flushleft}
\tab\tab \texttt{question \textit{TEXT}, \textit{NUM\_RESPONSES} do} \\
\tab\tab\tab\tab \texttt{\textit{QUEST\_BODY}} \\
\tab\tab \texttt{end}
\end{flushleft}

\noindent where the argument \texttt{\textit{TEXT}} gives the text of
the question (variability 1) and optional argument
\texttt{\textit{NUM\_RESPONSES}} (variability 3) gives the number of
responses expected, with a default value of 1. This construct defines
a question in the survey at that point in the sequence (variability
2).  Syntactically, this is a Ruby method call with one required
argument, one optional argument, and an attached \texttt{do-end}
block.

The \texttt{\textit{QUEST\_BODY}} must be structured according to
commonality 3 and variabilities 1 and 4.  That is, the block consists
of a sequence of possible responses.  However, sufficient data must be
captured to enable commonalities 4 and 5 and variabilities 5 and 6 to
be implemented.  We thus structure the \texttt{\textit{QUEST\_BODY}}
as a sequence consisting of an optional \texttt{condition} statement,
some number of \texttt{response} statements, and an optional
\texttt{action} statement.  It has the syntax:

\begin{flushleft}
\tab\tab \texttt{condition \textit{COND\_BLOCK}} \\
\tab\tab \texttt{response \textit{TEXT} \textit{RESP\_BLOCK}} \\
\tab\tab ... \\
\tab\tab \texttt{action \textit{ACTION\_BLOCK}}
\end{flushleft}

\noindent The statements above are Ruby method calls with blocks
attached.  When the system executes the survey, if a
\texttt{condition} statement is present and its
\texttt{\textit{COND\_BLOCK}} evaluates to false, the question is 
omitted (commonality 4).  Otherwise, the system presents the
\texttt{response} texts in the order given (commonality 3).  When the
respondent selects one or more of these (up to the number given on the
\texttt{question} statement), the system executes the corresponding
\texttt{\textit{RESP\_BLOCK}}s.

% % ATTACH TO PARAGRAPH ABOVE
% Note that the blocks associated with the \texttt{action} and
% \texttt{response} statements are imperative; they change the state of
% the executing survey program.

To respond to commonality 5 and variability 6 for silent questions, we
introduce the statement \texttt{result} at the same syntactic level as
\texttt{question}.  However, the \texttt{result} body defines a
sequence of \texttt{alternative} statements (commonality 3) that, when
executed, will be selected based on the previous responses.  The
\texttt{result} has the following syntax, which also responds to
variabilities 1 and 3 for questions:

\begin{flushleft}
\tab\tab \texttt{result \textit{TEXT}, \textit{NUM\_RESPONSES} do} \\
\tab\tab\tab\tab \texttt{condition \textit{COND\_BLOCK}} \\
\tab\tab\tab\tab \texttt{alternative \textit{TEXT} \textit{GUARD\_BLOCK}} \\
\tab\tab\tab\tab ... \\
\tab\tab\tab\tab \texttt{action \textit{ACTION\_BLOCK}} \\
\tab\tab \texttt{end}
\end{flushleft}

\noindent The \texttt{condition} and \texttt{action} statements are
the same as for \texttt{\textit{QUEST\_BODY}}. The
\texttt{alternative} statements execute in the given sequence.  If a
\texttt{\textit{GUARD\_BLOCK}} is omitted or evaluates to true, then
that alternative is chosen. 

The blocks on the \texttt{action} and \texttt{response} statements
consist of Ruby code that creates and modifies instance variables in
the environment in which the survey is executed.  These instance
variables can then be used in the ``boolean'' blocks on the
\texttt{condition} and \texttt{alternative} statements to allow
commonalities 4 and 5 to be realized, in accordance with the
flexibility needed for variabilities 5 and 6.
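
The interplay between the imperative blocks and the ``boolean'' blocks
can be sketched as follows.  The class \texttt{SurveyEnv} and the use of
\texttt{instance\_exec} to run each block against a shared environment
object are assumptions of this sketch; other mechanisms would serve
equally well.

```ruby
# Hypothetical environment object that holds the survey's instance variables.
class SurveyEnv; end

env = SurveyEnv.new

resp_block   = proc { @female = true }    # as on a response statement
action_block = proc { @male = !@female }  # as on an action statement
cond_block   = proc { @female }           # as on a condition statement

# Run the imperative blocks with self set to the environment object.
[resp_block, action_block].each { |blk| env.instance_exec(&blk) }

# A later question's condition reads the state left by earlier blocks.
ask_followup = env.instance_exec(&cond_block)
```

Because every block runs against the same object, a value recorded by
one question's response is visible to the condition guarding a later
question.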

Commonality 6 defines the basic computational model for execution of a
survey program; variabilities 7 and 8 define flexible aspects that
must exist in the implementation.  Now we look at the implementation
of this internal DSL in Ruby.

\section{Internal DSL Implementation}\label{sec:implementDSL}

% Language processing is often built around a \emph{read-eval-print
% loop} (REPL) \cite{kamin-1990}.  First, the language processor
% ``reads'' and parses an expression into its syntactic components.
% Second, the processor ``evaluates'' the expression to produce a result
% according to the language's semantics.  Third, the processor formats
% and ``prints'' the result of the expression's evaluation.  If all of
% these steps are done during a single pass through the input (e.g., an
% expression in a sequence of expressions is evaluated and written as
% soon as it has been read and parsed), then the language processor is
% \emph{single-pass}.  Otherwise, it is \emph{multipass}.  For complex
% languages, multipass processors are often necessary.  Even if not
% necessary, a two-pass processor often results in a more elegant and
% flexible software design.

The survey DSL could be implemented with a processor that executes
each question-level statement immediately after it has been parsed.
This \emph{single-pass} approach would, however, strongly couple the
evaluation logic with the parsing logic and make support for
variabilities 7 and 8 difficult.

In most cases the use of a \emph{two-pass} architecture is a better
technique. The first pass reads the input, parses it, generates any
needed error messages, and builds the corresponding \emph{abstract
syntax tree} (AST) \cite{fowler-2007,parr-2007}.  The AST is a
tree-like data structure that represents the input expressions in an
abstract form. The second pass takes the AST, presents the questions
to the respondent in the required order, and collects the responses
(commonality 6).  This approach allows any first-pass processor to be
configured with any second-pass processor, thus supporting
variabilities 7 and 8. In this section, we look at the design and
implementation of the AST, first-pass, and second-pass classes.


% \subsection{Abstract Syntax Tree}\label{subsec:AST}
\subsection{DSL Parsing}\label{subsec:AST}\label{subsec:firstPass}

The Ruby implementation of the survey DSL uses several classes to
implement the AST.  At the top (survey) level, the
class \texttt{SurveyRoot} represents the entire survey as specified in
a DSL input file.  It holds the survey title from the
\texttt{title} statement and a sequence of question-level nodes.

At the second (question) level are the ``abstract'' class
\texttt{QuestionLevelNode} and its two subclasses
\texttt{QuestionNode} and \texttt{ResultNode}.  The subclasses
represent the \texttt{question} and \texttt{result} statements in the
survey DSL.  They store the question text, the guarding condition (if
any) from the associated \texttt{condition} statement, the action (if
any) from the associated \texttt{action} statement, and a sequence of
``responses'' from the associated \texttt{response} or
\texttt{alternative} statements.

The third (response) level consists of the ``abstract'' class
\texttt{ResponseLevelNode} and its two subclasses
\texttt{ResponseNode} and \texttt{AlternativeNode}.  The subclasses
represent the DSL's \texttt{response} and \texttt{alternative}
statements.
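
One possible shape for this three-level hierarchy is sketched below.
The class names at the top and third levels follow the text, and
\texttt{add\_question} and the \texttt{action} attribute appear in the
figures; the second-level subclass for \texttt{result} statements is
named \texttt{ResultNode} here to keep it distinct from the third-level
\texttt{ResponseNode}.  The remaining attribute and method names are
assumptions of the sketch.

```ruby
# Top level: the whole survey.
class SurveyRoot
  attr_accessor :title
  attr_reader :questions

  def initialize
    @title = nil
    @questions = []
  end

  def add_question(node)
    @questions << node
  end
end

# Second level: question and result statements share one structure.
class QuestionLevelNode
  attr_reader :text, :num_responses, :responses
  attr_accessor :condition, :action

  def initialize(text, num_responses = 1)
    @text = text
    @num_responses = num_responses
    @responses = []      # response-level nodes, in source order
    @condition = nil     # optional guarding block
    @action = nil        # optional action block
  end

  def add_response(node)   # hypothetical helper name
    @responses << node
  end
end

class QuestionNode < QuestionLevelNode; end
class ResultNode   < QuestionLevelNode; end

# Third level: response and alternative statements.
class ResponseLevelNode
  attr_reader :text, :block

  def initialize(text, block = nil)
    @text = text
    @block = block       # stored unevaluated
  end
end

class ResponseNode    < ResponseLevelNode; end
class AlternativeNode < ResponseLevelNode; end

root = SurveyRoot.new
root.title = "Demo"
q = QuestionNode.new("What is your gender?", 1)
q.add_response(ResponseNode.new("Female", proc { @female = true }))
q.add_response(ResponseNode.new("Male",   proc { @female = false }))
root.add_question(q)
```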

% \subsection{First Pass: Parsing}\label{subsec:firstPass}

% \begin{figure}
% \begin{verbatim}
% class SurveyDSL # Object scoping base
%   ...           # uses Context Variable
%   def question(text,*args) 
%   ... 
% end 
% 
% class SurveyBuilder < SurveyDSL 
%   ... 
%   def read_DSL(rb_dsl_file) 
%   ... 
% end 
% 
% class DSLContext # Context Variable
%   ... 
% end 
% \end{verbatim}
% \vspace*{-1.5\baselineskip}
% \caption{First-pass classes\label{fig:firstPass}}
% \vspace*{-.5\baselineskip}
% \end{figure}

% The first-pass classes (shown in Figure~\ref{fig:firstPass})

The \emph{first-pass parser classes} are structured according to
Fowler's \emph{Object Scoping} DSL pattern \cite{fowler-2007} using an
approach Buck calls \emph{sandboxing} \cite{buck-2006}.  The
``abstract'' class \texttt{SurveyDSL} implements the DSL statements as
methods.  Its subclass \texttt{SurveyBuilder} ``evaluates'' the DSL
statements from a DSL input file using its superclass's methods.  This
evaluation parses the DSL input and builds the AST using the node
classes described above. Figure~\ref{fig:question} shows the
\texttt{question} method.

\begin{figure}
\begin{verbatim}
def question(txt,*args) # txt, ns, block
  if @context.level==:survey_level && block_given?
    @context.level = :question_level
    @context.qtype = :question_type
    ns = 1
    ns = args[0].to_i if args.size > 0
    @context.question=QuestionNode.new(txt.to_s,ns)
    yield   # execute block on DSL question call
    @context.survey.add_question(@context.question)
    @context.level    = :survey_level
    @context.qtype    = :no_type
  else
    # output appropriate error messages
  end
  @context.question = nil
end#question
\end{verbatim}
\vspace*{-1.5\baselineskip}
\caption{Method \texttt{SurveyDSL\#question}\label{fig:question}}
\vspace*{-.5\baselineskip}
\end{figure}

\begin{figure}
\begin{verbatim}
def action(&action)
  if @context.level==:question_level &&
     block_given? &&
     @context.question.action == nil
    @context.question.action = action
  else
    # output appropriate error messages
  end
end#action
\end{verbatim}
\vspace*{-1.5\baselineskip}
\caption{Method \texttt{SurveyDSL\#action}\label{fig:action}}
\vspace*{-.5\baselineskip}
\end{figure}

The \texttt{read\_DSL} method of class \texttt{SurveyBuilder} reads
the DSL input from a file and evaluates it by calling the method
\texttt{instance\_eval} described in Section~\ref{sec:ruby}.  The
safety of the program is maintained by encapsulating the relatively
unsafe \texttt{instance\_eval} method call within the ``sandbox''
provided by a \texttt{SurveyBuilder} object.

In this design, the \texttt{SurveyDSL} class uses the \emph{Memento}
design pattern \cite{gamma-1995}.  It uses an object of class
\texttt{DSLContext} to store the state of the DSL parser (i.e.,
\texttt{SurveyDSL} object) so that it can be saved and restored.  This
is also what Fowler calls the \emph{Context Variable} DSL pattern
\cite{fowler-2007}. 

The first-pass design separates the DSL implementation methods and the
parser state from the class that executes the DSL input.  This, along
with the filename of the DSL source being a parameter of the
\texttt{read\_DSL} method, provides the flexibility needed for
variability 7.
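
The overall arrangement of the first pass can be sketched in a reduced
form.  The class names \texttt{SurveyDSL}, \texttt{SurveyBuilder}, and
\texttt{DSLContext} and the method \texttt{read\_DSL} follow the text;
the node structures \texttt{StubRoot} and \texttt{StubQuestion}, the
error handling, and the use of a temporary file are assumptions made so
that the sketch is self-contained.  It is not the prototype's actual
code.

```ruby
require 'tempfile'

# Minimal stand-ins for AST nodes (assumed shapes, not the paper's classes).
StubRoot     = Struct.new(:title, :questions)
StubQuestion = Struct.new(:text, :responses)

# Context Variable: the parser state lives in one object.
DSLContext = Struct.new(:survey, :level, :question)

# The DSL statements are implemented as methods.
class SurveyDSL
  def initialize
    @context = DSLContext.new(StubRoot.new(nil, []), :survey_level, nil)
  end

  def title(text)
    @context.survey.title = text.to_s
  end

  def question(text)
    unless @context.level == :survey_level && block_given?
      raise "question not allowed here"
    end
    @context.level = :question_level
    @context.question = StubQuestion.new(text.to_s, [])
    yield                                   # run the nested DSL statements
    @context.survey.questions << @context.question
  ensure
    @context.level = :survey_level
    @context.question = nil
  end

  def response(text, &blk)
    raise "response not allowed here" unless @context.level == :question_level
    @context.question.responses << [text.to_s, blk]   # block kept unevaluated
  end
end

# The DSL text is evaluated only inside a builder object.
class SurveyBuilder < SurveyDSL
  def read_DSL(rb_dsl_file)
    instance_eval(File.read(rb_dsl_file), rb_dsl_file)
    @context.survey
  end
end

dsl_text = <<~DSL
  title "Demo"
  question "What is your gender?" do
    response("Female") { @female = true }
    response("Male")   { @female = false }
  end
DSL

ast = Tempfile.create(['survey', '.rb']) do |f|
  f.write(dsl_text)
  f.flush
  SurveyBuilder.new.read_DSL(f.path)
end
```

Since the source is named by a parameter, a builder that reads from a
string or a network resource could replace \texttt{read\_DSL} without
changing the statement methods in \texttt{SurveyDSL}.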

Because the survey DSL input is just Ruby code, much of the work of
parsing the DSL is done by the Ruby interpreter.  The DSL parser must
verify that the DSL program is syntactically correct and generate the
corresponding AST.  In addition to what DSL method has been called in
\texttt{SurveyDSL}, the parser state can be characterized by four
primary attributes:
% COMPACT LIST
(1) \texttt{survey}, which is a reference to the partially constructed
AST,
%
(2) \texttt{level}, which identifies the current level of the DSL
syntax (survey, question, or response) being parsed,
%
(3) \texttt{qtype}, which gives the current type of question-level
statement (none, \texttt{question}, or \texttt{result}) being parsed,
%
and
(4) \texttt{question}, which is a reference to the
\texttt{Question\-Level\-Node} object currently being constructed.
% \begin{itemize}
% 
% \item \texttt{survey}, which is a reference to the partially constructed AST
% 
% \item \texttt{level}, which identifies the current level of the DSL
% syntax (survey, question, or response) being parsed
% 
% \item \texttt{qtype}, which gives the current type of question-level
% statement (i.e., none, \texttt{question}, or \texttt{result}) being
% parsed
% 
% \item \texttt{question}, which is a reference to the
% \texttt{Question\-Level\-Node} object currently being constructed.
% 
% \end{itemize}

Suppose the following \texttt{question} statement appears in the DSL
input:

\begin{verbatim}
    question "What is your gender?" do
      response "Female" { @female = true }
      response "Male" { @female = false }
      action { @male = if @female then false 
                                  else true end }
    end
\end{verbatim}

\noindent The \texttt{read\_DSL} method of class
\texttt{SurveyBuilder} reads this text and evaluates it as Ruby code
by calling \texttt{instance\_eval}.  This causes the
\texttt{question} method in class \texttt{SurveyDSL}
(Figure~\ref{fig:question}) to be called with a string argument and an
attached block.  The ``optional'' second argument (i.e., number of
responses to be given) is set to the default value of 1.

If the parser is in the proper state, the \texttt{question} method
processes the DSL statement to create a new \texttt{question} node. It
changes the parser \texttt{level} to question-level and the parser
\texttt{qtype} to question-type and creates a new
\texttt{QuestionNode} to put in the AST. It then invokes the attached
\texttt{do-end} block using the Ruby \texttt{yield} statement.  When
control returns from the \texttt{yield}, the \texttt{question} method
stores the new node in the AST, resets the parser \texttt{level} to
survey-level, and returns control to the executing
\texttt{instance\_eval} call.

The execution of the block attached to the \texttt{question} statement
invokes the \texttt{response} method in \texttt{SurveyDSL} twice and
the \texttt{action} method once.  The \texttt{response} method calls
create the two response-level nodes in the AST.  The
\texttt{action} method call sets the \texttt{QuestionNode}'s action
attribute to the given block. Figure~\ref{fig:action} shows the code
for the \texttt{action} method.

The blocks attached to the \texttt{response} and \texttt{action}
statements are not executed.  Instead, the blocks themselves (i.e.,
the closures) are stored in the AST node for execution in the second
pass.  If the \texttt{question} statement included a
\texttt{condition} statement, it would be processed similarly to the
\texttt{action} statement.  This technique of storing blocks uses what
Fowler calls the \emph{Deferred Evaluation} DSL pattern
\cite{fowler-2007}.
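
As a minimal sketch of Deferred Evaluation, separate from the
prototype, the following Ruby fragment captures an attached block as a
\texttt{Proc} through an \texttt{\&block} parameter and invokes it only
later.  The class \texttt{MiniNode} and its methods
\texttt{on\_action} and \texttt{run} are hypothetical names introduced
for illustration.

\begin{verbatim}
# Hypothetical sketch; not the prototype's code.
class MiniNode
  attr_reader :action

  # First pass: capture the block; do not run it.
  def on_action(&block)
    @action = block
  end

  # Second pass: run the stored closure, if any.
  def run
    @action.call unless @action.nil?
  end
end

log  = []
node = MiniNode.new
node.on_action { log << :ran }
before = log.dup   # block has not run yet
node.run
after  = log.dup   # block has now run once
\end{verbatim}

\noindent The block executes only when \texttt{run} calls the stored
\texttt{Proc}, so \texttt{before} is empty and \texttt{after} holds one
entry.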

% Although not exercised in the current version, the prototype system
% also implements the DSL methods according to what Fowler calls the
% \emph{Method Chaining} DSL pattern \cite{fowler-2007}.  That is, the
% DSL methods, which are inherently procedures rather than functions,
% explicitly return references to the \texttt{SurveyBuilder} object as
% their return values.  This enables several calls of the methods to be
% chained together in a single expression to achieve a DSL-like effect.
% This technique enables one to design a \emph{fluent interface} to the
% DSL library using Fowler's \emph{Expression Builder} DSL pattern
% \cite{fowler-2007}. The JMock package used in testing Java programs
% \cite{fowler-2007,freeman-pryce-2006} implements this kind of fluent
% interface DSL.


% \subsection{Second Pass: Interpretation}\label{subsec:secondPass}
\subsection{DSL Interpretation}\label{subsec:secondPass}

The interactions between the \emph{second pass} and the AST must support
variability 8, enabling different ``interpreters'' in the second pass
to be configured into the system.  Because the little survey language
has a straightforward syntax and semantics, the prototype design uses
the \emph{Visitor} design pattern \cite{gamma-1995} to structure this
interaction.  A complex language may require more sophisticated
tree-walking logic \cite{parr-2007}.

To implement the Visitor pattern, each AST class must provide a method
\texttt{accept} that takes a \texttt{SurveyVisitor} object.  This method
must call the appropriate visit operations on the Visitor object and
pass the object to the next lower level of the AST as needed.  Figures
\ref{fig:acceptRoot} and \ref{fig:acceptQuestion} show the
\texttt{accept} methods of the top-level AST node \texttt{SurveyRoot} and
second-level AST node \texttt{QuestionNode}, respectively.

\begin{figure}
\begin{verbatim}
def accept(survey_visitor)
  @env.survey_title   = @title # used by DSL block
  @env.survey_answers = []     # used by DSL block
  @env.question_num   = 1      # used by DSL block
  survey_visitor.execute_title(@env,self)
  questions.each do |q| 
    q.accept(@env,survey_visitor)
  end
end#accept
\end{verbatim}
\vspace*{-1.5\baselineskip}
\caption{Method \texttt{SurveyRoot\#accept}\label{fig:acceptRoot}}
\vspace*{-.5\baselineskip}
\end{figure}


\begin{figure}
\begin{verbatim}
def accept(env,survey_visitor)
  env.question_text       = @text # used by DSL
  env.question_num_to_sel = @num_to_sel  #  block
  survey_visitor.execute_question(env,self)
  env.question_text       = nil
  env.question_num_to_sel = nil
end#accept
\end{verbatim}
\vspace*{-1.5\baselineskip}
\caption{Method \texttt{QuestionNode\#accept}\label{fig:acceptQuestion}}
\vspace*{-.5\baselineskip}
\end{figure}

The Visitor class must extend the ``abstract'' superclass
\texttt{SurveyVisitor} and override methods \texttt{execute\_question}
and \texttt{execute\_result} and, if the default behavior is not
appropriate, override method \texttt{execute\_title}.  The
\texttt{execute\_*} methods represent the Visitor pattern's visit
operations for the various AST nodes. The prototype provides the
concrete Visitor class \texttt{SurveyInteractiveText} that implements
an interactive, textual user interface using the standard input and
output streams.  Figure \ref{fig:executeQuestion} shows the visit
operation \texttt{execute\_question} from that class.

\begin{figure}
\begin{verbatim}
def execute_question(env,q) 
  if q.condition == nil || q.condition.call
    display_question(env.question_num,q.text)
    resp  = {}; label = 'a'  # labels from 'a'
    q.responses.each do |r|
      display_response(label,r.text)
      resp[label] = [r.action,r.text]
      label       = label.succ
    end
    answers = get_answers(q.num_to_sel,'a'...label)
    env.survey_answers << [env.question_num,answers]
    answers.each do |a| # evaluate selected actions
      env.response_label = a          # used by
      env.response_text  = resp[a][1] #  DSL block
      act = resp[a][0]
      act.call unless act == nil
      env.response_label = nil
      env.response_text  = nil
    end
    q.action.call unless q.action == nil # eval
  else                           
    env.survey_answers << [env.question_num,[]]
  end
  env.question_num += 1
end#execute_question
\end{verbatim}
\vspace*{-1.5\baselineskip}
\caption{\texttt{SurveyInteractiveText\#execute\_question}
\label{fig:executeQuestion}} 
\vspace*{-.5\baselineskip}
\end{figure}

The second pass starts execution by calling the \texttt{accept} method
of the AST's root, passing a \texttt{SurveyInteractiveText} instance.
In Figures~\ref{fig:acceptRoot} and \ref{fig:acceptQuestion}, we see
that this method calls the \texttt{execute\_title} operation and then
calls the \texttt{accept} operations for each of the question-level
nodes, which, in turn, call the \texttt{execute\_question}
operation. In Figure~\ref{fig:executeQuestion}, we see the
implementation of the desired survey question semantics.  First, the
method checks whether the associated condition is satisfied.  If it is
not satisfied, then the question is skipped.  If it is satisfied, then
the method displays the question text and the possible responses and
gathers the selections from the respondent.

A key aspect of the \texttt{execute\_question} call is the evaluation
of the blocks (i.e., Ruby \texttt{Proc}s or closures) stored in the
\texttt{QuestionNode} for the conditions and actions.  These are
groups of Ruby statements whose executions have been deferred from the
first to the second pass.  The blocks are invoked using the
\texttt{Proc}'s \texttt{call} method.  These stored blocks are
parameterless, but they can create new instance variables in the
\texttt{SurveyBuilder} object in which they are defined.
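
The following sketch, which assumes the DSL text is evaluated with
\texttt{instance\_eval} as in the first pass, suggests why a stored
block can later create an instance variable in the builder: the block
keeps the \texttt{self} that was in effect where it was written.  The
class \texttt{MiniBuilder} and the method \texttt{read\_dsl} are
hypothetical simplifications, not the prototype's code.

\begin{verbatim}
# Hypothetical sketch; not the prototype's code.
class MiniBuilder
  attr_reader :stored

  def action(&block)   # DSL method: store block
    @stored = block
  end

  def read_dsl(text)   # evaluate DSL text as Ruby
    instance_eval(text)
  end
end

b = MiniBuilder.new
b.read_dsl('action { @male = true }')
defined_before =
  b.instance_variable_defined?(:@male)
b.stored.call          # self inside block is b
value_after = b.instance_variable_get(:@male)
\end{verbatim}

\noindent Here \texttt{@male} does not exist after the first step and
is created on \texttt{b} only when the stored block is called.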

The Visitor's \texttt{execute\_*} methods must also make the question
number, response label, and the previous responses available to the
executing condition and action blocks.  The second-pass code uses the
\texttt{method\_missing} callback in the \texttt{SurveyBuilder} class
to dynamically create the needed writer and reader methods.  This
method traps calls to undefined writer methods and uses Ruby's
\texttt{class\_eval} facility to create what is needed.  It then uses
Ruby's \texttt{send} method to re-dispatch the writer call to the new
method. 
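
A simplified sketch of this technique follows.  It is an illustration
under assumed names (\texttt{MiniEnv} is hypothetical), it omits the
prototype's pass checking and error reporting, and it adds a
\texttt{respond\_to\_missing?} override, which is common Ruby practice
but is not described in this paper.

\begin{verbatim}
# Hypothetical sketch; not the prototype's code.
class MiniEnv
  def method_missing(sym, *args)
    name = sym.to_s
    if name.end_with?('=')
      base = name.chomp('=')
      # Define reader and writer on the class.
      self.class.class_eval do
        attr_accessor base
      end
      send(sym, *args)  # re-dispatch the call
    else
      super             # unknown non-writer call
    end
  end

  def respond_to_missing?(sym, include_all = false)
    sym.to_s.end_with?('=') || super
  end
end

env = MiniEnv.new
env.question_num = 1    # creates the accessors
env.question_num += 1   # uses generated methods
\end{verbatim}

\noindent The first assignment reaches \texttt{method\_missing}, which
defines the accessor pair and re-sends the call; later reads and
writes use the generated methods directly.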
% Figure~\ref{fig:missingMethod} shows the \texttt{missing\_method} code. 

% \begin{figure}
% \begin{verbatim}
% def method_missing(sym, *args)
%   if @pass == 1 
%     # DSL syntax error, output error messages
%   elsif @pass == 2 
%     str = sym.to_s
%     if str[-1,1] == "="
%       base = str[0..-2].to_sym
%       if self.respond_to? base
%         # output error message
%       else
%         SurveyBuilder.class_eval
%             "attr_accessor :#{base}"
%         send(sym, *args)
%       end
%     else
%       # output appropriate error message
%     end
%     else
%       # output appropriate error messages
%   end
% end#method_missing
% \end{verbatim}
% \vspace*{-1.5\baselineskip}
% \caption{\texttt{SurveyBuilder\#missing\_method} callback
% \label{fig:missingMethod}} 
% \vspace*{-.5\baselineskip}
% \end{figure}

% In Figure~\ref{fig:missingMethod}, note that the
% ABOVE IS ALTERNATIVE TO FIRST WORD BELOW
% The \texttt{missing\_method} callback serves a different purpose in
% the first pass.  It intercepts attempts to call undefined methods from
% inside the DSL input file.  This likely means that a syntax error
% exists in the DSL input.

Of course, class \texttt{SurveyInteractiveText} is only one possible
Visitor class.  Others could be implemented to provide a GUI user
interface or to print a listing of the survey.  Thus the design of the
second pass supports variability 8.
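
As a rough sketch of such an alternative, the following
non-interactive Visitor collects a listing of the survey.  It is
self-contained for illustration only: the \texttt{Struct} definitions
are hypothetical stand-ins for the prototype's environment and AST
classes, the \texttt{SurveyVisitor} superclass and the
\texttt{execute\_result} operation are omitted, and conditions and
actions are ignored because a listing does not evaluate them.

\begin{verbatim}
# Hypothetical stand-ins for prototype classes.
Env      = Struct.new(:question_num)
Root     = Struct.new(:title)
Response = Struct.new(:text)
Question = Struct.new(:text, :responses)

# Sketch of a non-interactive listing Visitor.
class SurveyListing
  attr_reader :lines

  def initialize
    @lines = []
  end

  def execute_title(env, root)
    @lines << "Survey: #{root.title}"
  end

  def execute_question(env, q)
    @lines << "#{env.question_num}. #{q.text}"
    label = 'a'          # labels from 'a'
    q.responses.each do |r|
      @lines << "   #{label}) #{r.text}"
      label = label.succ
    end
    env.question_num += 1
  end
end

env = Env.new(1)
v   = SurveyListing.new
v.execute_title(env, Root.new('Demo'))
q = Question.new('Pick one',
                 [Response.new('Yes'),
                  Response.new('No')])
v.execute_question(env, q)
puts v.lines
\end{verbatim}

\noindent The sketch reuses the operation names
\texttt{execute\_title} and \texttt{execute\_question} from the
prototype, so a class of this shape could be passed to the
\texttt{accept} methods in place of \texttt{SurveyInteractiveText}.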

\section{Discussion} 

Drawing on the experience in designing and evolving the JMock DSL,
Freeman and Pryce make four recommendations for constructing a DSL in
Java \cite{freeman-pryce-2006}.  Although Ruby provides better support
for DSL construction than Java, these ideas are still relevant to the
Survey DSL.
% The recommendations are \cite{freeman-pryce-2006}:

% \begin{enumerate}
% 
% \item ``Separate syntax and interpretation into layers.''
% 
% \item ``Use, and abuse, the host language.''
% 
% \item  ``Don't trap the user'' in the internal DSL.
% 
% \item ``Map error reports to the syntax layer.''
% 
% \end{enumerate}

Freeman and Pryce's first recommendation is to ``separate syntax and
interpretation into layers'' \cite{freeman-pryce-2006}.
% In recommendation 1, Freeman and Pryce suggest separating the syntax
% recognition from the execution of the DSL program.
% For JMock, the syntax layer of the DSL pairs a complex set of Java
% interface hierarchies with an API accessed according to the unusual
% Method Chaining and Expression Builder patterns
% \cite{fowler-2007,freeman-pryce-2006}.  However, the interpretation
% layer is a more conventional object-oriented framework.  The oddball
% aspects are separated from the more routine aspects.
% DETACH BELOW AS NEW PARAGRAPH
% The survey DSL work finds this the separation of the language processing
% into two passes to be beneficial.  
The survey DSL work finds this approach to be beneficial.  The first
pass utilizes various unusual Ruby features (e.g., reflexive
metaprogramming) to express and process the little language for
surveys.  Except for the execution of the closures, the second pass is
more conventional, using the Visitor pattern to execute the survey
stored in the AST.  The first pass implementation is relatively
complex; the second pass more straightforward. As with JMock, the
oddball aspects are separated from the more routine aspects.

The second recommendation is to ``use, and abuse, the host language''
\cite{freeman-pryce-2006} to enable the writing of readable DSL programs.
% In recommendation 2, Freeman and Pryce suggest ignoring the
% conventions of the host language if necessary to enable the writing of
% readable DSL programs. 
% For example, JMock uses relatively long chains of method calls,
% including calls to procedure methods that return the modified objects
% as their function values.  Both of these practices violate the normal
% Java programming conventions. However, by violating the conventions in
% a careful and principled way, JMock's design achieves a readable and
% consistent DSL syntax.
% DETACH BELOW AS NEW PARAGRAPH ADDING TRANSITION WORD
% Similarly, the 
The survey DSL uses the flexible syntax of Ruby---optional parentheses
in method calls, variable-length parameter lists, and blocks---to
express the survey programs in a readable, mostly declarative syntax.
The DSL is defined so that its execution as Ruby code gives a shell of
a recursive descent parser for the language. The DSL parser then
leverages the dynamic, reflexive metaprogramming features of Ruby to
recognize the syntax.  Similarly, storing closures for execution in
the second pass makes the implementation of the interpreter
challenging.  Ruby internal DSLs use the distinctive features of Ruby
and perhaps flirt a bit with danger by using the reflexive
metaprogramming facilities.

Freeman and Pryce's third recommendation is ``don't trap the user'' 
in the internal DSL \cite{freeman-pryce-2006}.
% In recommendation 3, Freeman and Pryce suggest providing extension
% points in both the language and in the interpreter.  This is based on
% their experience of getting many user requests for specialized
% features during the five-plus years of JMock's development. 
% Both the syntax and interpretation layers of JMock are implemented as
% frameworks with components that can be replaced by user-built
% alternatives.  JMock seeks to make the internal facilities used in
% implementing the DSL available to the users in a safe way.
% DETACH BELOW AS NEW PARAGRAPH
The survey DSL implementation addresses this issue in a preliminary
way.  The use of the Visitor pattern in the second pass enables users
to write visitors for other purposes.  In the first pass, new
subclasses can also be implemented for \texttt{SurveyDSL} or 
\texttt{SurveyBuilder} to extend the DSL. However, because the
prototype is a relatively coarse-grained whitebox framework, a
programmer wishing to extend these classes must have considerable
knowledge of the current implementation.  The tight coupling of the
parser classes with the concrete \texttt{DSLContext} class may also
cause some complications.  Clearly, enabling users to extend the
survey DSL is an area for future work.

The fourth recommendation is to ``map error reports to the syntax
layer'' \cite{freeman-pryce-2006} rather than to the interpretation
layer, which is invisible to the DSL user.
% They attack this difficult issue by requiring that all core objects in
% their system be self-describing.  When an error occurs, the system
% collects the descriptions from the objects involved in the error to
% reconstruct which syntax layer features are involved.
% DETACH BELOW AS NEW PARAGRAPH
Although this paper does not describe the error-reporting facilities,
the survey DSL parser does report error messages linked closely to the
input statements.  Giving specific error messages is not trivial given
that the Ruby interpreter does much of the parsing.  The current
design does not provide error messages during execution that tie back
to the syntactic components.  However, the straightforward mapping
from DSL statements to AST nodes should enable such an approach.
This, too, is an area for future work.

% Bentley observes in his ``Little Languages'' column
% \cite{bentley-1986}, ``Like all great software, great little languages
% are grown, not built.''  He suggests starting with a simple language
% and exploring the domain by trying many examples.  He recommends,
% ``After the language is up and running, iterate designs to add
% features as dictated by real use.''  The designers of JMock seem to
% have taken a similar approach during the multi-year evolution of
% JMock.  They did not hesitate to ``refactor mercilessly'' in their
% quest for a better design \cite{freeman-pryce-2006}.

% The development of the survey DSL has been a similar experience for
% the author.  The initial version of the DSL developed in fall 2006 was
% a simple prototype whose purposes were to explore the capabilities of
% Ruby for the construction of internal DSLs and to provide a concrete
% example to present to students in a graduate class.  As the author
% learned more about the capabilities of Ruby, he added new features and
% refactored the simple framework.  The prototype for this paper is the
% result of new work on the DSL a year later, with a more systematic
% attempt to explore the design space using commonality/variability
% analysis.

\section{Conclusion}

This paper relates some of the author's experiences in the systematic
analysis of the survey domain and the use of the analysis to design
and implement a novel domain-specific language (DSL) for expressing
survey programs.  The language is designed as an internal DSL in Ruby
and implemented using Ruby's distinctive metaprogramming features.
Ruby and the analysis, design, and implementation techniques employed
prove to be effective in this instance.  However, further research is
needed to delineate more rigorous techniques for analyzing the domain
and using the analysis results to motivate a Ruby internal DSL design.
In addition, future work should seek to formulate guidelines for
achieving effective and safe Ruby implementations of the resulting DSL
designs.

\section{Acknowledgments} 

The author thanks Chuck Jenkins, Yi Liu, Pallavi Tadepalli, and Jian
Weng for their helpful suggestions.

\bibliography{surveyLang.bib}
% \bibliography{alsace.bib}

\end{document}

% End file:  surveyLangFinal.bib
