Parsing Engine

danbikel.parser.lang
Class AbstractHeadFinder

java.lang.Object
  extended by danbikel.parser.lang.AbstractHeadFinder
All Implemented Interfaces:
HeadFinder, Serializable
Direct Known Subclasses:
BrokenHeadFinder, HeadFinder, HeadFinder, HeadFinder

public abstract class AbstractHeadFinder
extends Object
implements HeadFinder, Serializable

Provides a default abstract implementation of the HeadFinder interface. Subclasses are encouraged to make use of the defaultFindHead(Symbol,SexpList) method, which finds a head according to head rules that are gotten from a resource specified in the parser's settings file.

See Also:
Serialized Form

Nested Class Summary
protected static class AbstractHeadFinder.HeadFindInstruction
          Data structure for specifying a way to search for a head in a grammar production: a set of symbols to scan for and the direction of that scan.
 
Field Summary
protected static Symbol defaultSym
          The wildcard character used in default head-finding instructions.
protected static String fallbackDefaultHeadTableResource
          The fallback default in case the property for specifying the path to the head table file or resource is not set.
protected  HashMap headFindInstructions
          The map of parent nonterminals to their arrays of AbstractHeadFinder.HeadFindInstruction.
static String headSuffix
          The augmentation for new head nodes added by addHeadInformation(danbikel.lisp.Sexp).
protected static boolean LEFT
          Constant to indicate a left-to-right scan (makes for more readable code for this class and its subclasses).
protected static Symbol leftSym
          The character from a head table's head-finding instruction that indicates a left-to-right scan.
static String postHeadSuffix
          The augmentation for new post-head nodes added by addHeadInformation(danbikel.lisp.Sexp).
static String preHeadSuffix
          The augmentation for new pre-head nodes added by addHeadInformation(danbikel.lisp.Sexp).
protected  double probRandom
          The probability that the head child of a production will be chosen at random.
protected  Random rand
          This class’ random number generator.
protected static boolean RIGHT
          Constant to indicate a right-to-left scan (makes for more readable code for this class and its subclasses).
protected static Symbol rightSym
          The character from a head table's head-finding instruction that indicates a right-to-left scan.
protected  boolean useRand
          Set to true if probRandom is greater than 0.0; otherwise, set to false.
protected  boolean warnDefaultRule
          The value of Settings.headFinderWarnDefaultRule, cached here for readability and convenience.
 
Constructor Summary
protected AbstractHeadFinder()
          Constructs a head-finding object, getting the name of the head table from the value of Settings.get(Settings.headTablePrefix + language), where language is the value of Settings.get(Settings.language).
  AbstractHeadFinder(Sexp headTableSexp)
          Constructs a head-finding object with the specified head table.
 
Method Summary
 Sexp addHeadInformation(Sexp tree)
          Perform head-finding in tree, augmenting nodes that are the head children of their respective parents.
protected  int defaultFindHead(Symbol lhs, SexpList rhs)
          Provides a default mechanism to use the head table to find a head in the specified grammar production.
 int findHead(Sexp tree)
          Finds the head for the production at the root of the specified subtree.
abstract  int findHead(Sexp tree, Symbol lhs, SexpList rhs)
          Finds the head for the grammar production lhs -> rhs.
 String headSuffix()
          Returns the string "-HEAD".
protected  void readHeadTable(Sexp headTableSexp)
          Reads the head table contained in the specified S-expression.
protected  int scan(boolean direction, SexpList rhs, Symbol[] matchTags)
          Scans the RHS of a production in the specified direction.
protected  int scanLeftToRight(SexpList rhs, Symbol[] matchTags)
          Scans the RHS of a production from left to right, returning the index of the first nonterminal that is in the matchTags array.
protected  int scanRightToLeft(SexpList rhs, Symbol[] matchTags)
          Scans the RHS of a production from right to left, returning the index of the first nonterminal that is in the matchTags array.
protected  boolean tagMatches(Symbol tag, Symbol[] matchTags)
          A helper method that returns true if any of the nonterminals in matchTags is tag and returns false otherwise.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fallbackDefaultHeadTableResource

protected static final String fallbackDefaultHeadTableResource
The fallback default in case the property for specifying the path to the head table file or resource is not set.

See Also:
Constant Field Values

LEFT

protected static final boolean LEFT
Constant to indicate a left-to-right scan (makes for more readable code for this class and its subclasses).

See Also:
Constant Field Values

RIGHT

protected static final boolean RIGHT
Constant to indicate a right-to-left scan (makes for more readable code for this class and its subclasses).

See Also:
Constant Field Values

defaultSym

protected static final Symbol defaultSym
The wildcard character used in default head-finding instructions.

See Also:
readHeadTable(danbikel.lisp.Sexp)

leftSym

protected static final Symbol leftSym
The character from a head table's head-finding instruction that indicates a left-to-right scan.

See Also:
readHeadTable(danbikel.lisp.Sexp)

rightSym

protected static final Symbol rightSym
The character from a head table's head-finding instruction that indicates a right-to-left scan.

See Also:
readHeadTable(danbikel.lisp.Sexp)

preHeadSuffix

public static final String preHeadSuffix
The augmentation for new pre-head nodes added by addHeadInformation(danbikel.lisp.Sexp).

See Also:
Constant Field Values

postHeadSuffix

public static final String postHeadSuffix
The augmentation for new post-head nodes added by addHeadInformation(danbikel.lisp.Sexp).

See Also:
Constant Field Values

headSuffix

public static final String headSuffix
The augmentation for new head nodes added by addHeadInformation(danbikel.lisp.Sexp).

See Also:
Constant Field Values

headFindInstructions

protected HashMap headFindInstructions
The map of parent nonterminals to their arrays of AbstractHeadFinder.HeadFindInstruction. When a head is being found, each HeadFindInstruction is applied in order until one succeeds.

See Also:
readHeadTable(danbikel.lisp.Sexp)

warnDefaultRule

protected boolean warnDefaultRule
The value of Settings.headFinderWarnDefaultRule, cached here for readability and convenience.


probRandom

protected double probRandom
The probability that the head child of a production will be chosen at random.

See Also:
Settings.headFinderRandomProb

useRand

protected boolean useRand
Set to true if probRandom is greater than 0.0; otherwise, set to false.


rand

protected Random rand
This class’ random number generator.

Constructor Detail

AbstractHeadFinder

protected AbstractHeadFinder()
                      throws IOException,
                             FileNotFoundException
Constructs a head-finding object, getting the name of the head table from the value of Settings.get(Settings.headTablePrefix + language), where language is the value of Settings.get(Settings.language). The named head table is searched for in the locations that are searched by the method Settings.getFileOrResourceAsStream(Class,String). See readHeadTable(danbikel.lisp.Sexp) for a BNF description of the syntax of head table files.

This constructor will necessarily be invoked whenever the default constructor of a language-specific HeadFinder is invoked, which is done upon initialization of the Language class.

Throws:
IOException
FileNotFoundException
See Also:
readHeadTable(danbikel.lisp.Sexp), Language, Settings.getFileOrResourceAsStream(java.lang.Class, java.lang.String)

AbstractHeadFinder

public AbstractHeadFinder(Sexp headTableSexp)
Constructs a head-finding object with the specified head table.

Method Detail

defaultFindHead

protected int defaultFindHead(Symbol lhs,
                              SexpList rhs)
Provides a default mechanism to use the head table to find a head in the specified grammar production.

N.B.: The list rhs is, as are all SexpList objects, 0-indexed; however, the return value of this method is meant to map to the children of a node in an S-expression tree, where child indices begin at 1. Therefore, this method returns a 1-based index.

Parameters:
lhs - the left-hand side of a grammar production
rhs - a list of symbols representing the right-hand side of a grammar production
Returns:
the 1-based index of the nonterminal in rhs that is the head, according to the head table of this head-finding object
See Also:
Settings.headFinderWarnDefaultRule

findHead

public int findHead(Sexp tree)
Finds the head for the production at the root of the specified subtree. The general contract of this method is to extract the root nonterminal label of the specified tree, create a list of the child nonterminal labels and call findHead(Sexp,Symbol,SexpList).
Efficiency note: This default implementation creates a new SexpList containing the labels of the children of the root of tree and then calls findHead(Sexp,Symbol,SexpList).

Specified by:
findHead in interface HeadFinder
Parameters:
tree - the subtree for whose root production to find the head
Returns:
the 1-based index of the head child of the production at the root of the specified subtree
See Also:
findHead(Sexp,Symbol,SexpList)

findHead

public abstract int findHead(Sexp tree,
                             Symbol lhs,
                             SexpList rhs)
Finds the head for the grammar production lhs -> rhs. This method may destructively modify rhs.

Specified by:
findHead in interface HeadFinder
Parameters:
tree - the original subtree in which to find the head child, or null if the subtree is not available
lhs - the nonterminal label that is the left-hand side of a grammar production
rhs - a list of symbols that is the right-hand side of a grammar production
Returns:
the 1-based index of the head child in rhs

addHeadInformation

public Sexp addHeadInformation(Sexp tree)
Perform head-finding in tree, augmenting nodes that are the head children of their respective parents. This method is useful for head-finding debugging.

Specified by:
addHeadInformation in interface HeadFinder
Returns:
a reference to the modified tree object
See Also:
headSuffix()

headSuffix

public String headSuffix()
Returns the string "-HEAD". If this conflicts with an existing nonterminal augmentation for a particular treebank or an augmentation added by Training during preprocessing, this method should be overridden.

Specified by:
headSuffix in interface HeadFinder
Returns:
the string "-HEAD"

readHeadTable

protected void readHeadTable(Sexp headTableSexp)
Reads the head table contained in the specified S-expression. The format for the head table is as follows:
(<headrule>+)
where
<headrule>::= (<parent> <instruction>+)
<parent>::= the parent whose head child is to be found or the symbol "*"
(which indicates a default rule)
<instruction>::= (<direction> <scanelement>*)
<direction>::= l | r
where l indicates left-to-right, r right-to-left
<scanelement>::= a child nonterminal label to scan for in the specified direction
For a particular <parent>, each <instruction> is applied in the order in which it appears in the head table, looking for the first element of the right-hand side of a grammar production that is a member of the scanelement list, and the head is the result of the first instruction that succeeds. If the scanelement list is empty, then the rule indicates to choose the first element in its scan (a default instruction). If none of the head-finding instructions succeeds for a particular parent, an implicit default instruction is added that is of the form (d), where d is the direction specified by the last instruction for the parent.

Bugs: We eventually need to replace default instructions such as (l) with a syntax that includes the wildcard character, such as (l *), which will be more consistent with the default rule, where the wildcard matches any parent. Also, we should possibly require default instructions, instead of allowing them to be implicit.

See Also:
findHead(Sexp,Symbol,SexpList)

scanLeftToRight

protected int scanLeftToRight(SexpList rhs,
                              Symbol[] matchTags)
Scans the RHS of a production from left to right, returning the index of the first nonterminal that is in the matchTags array.

Parameters:
rhs - a list of symbols representing the right-hand side (RHS) of a grammar production
matchTags - an array of symbols representing the set of nonterminal labels to try to match in rhs
Returns:
the index of the first nonterminal in rhs that is in the set of nonterminals contained in matchTags when scanning from left to right, or -1 if there is no such matching nonterminal

scanRightToLeft

protected int scanRightToLeft(SexpList rhs,
                              Symbol[] matchTags)
Scans the RHS of a production from right to left, returning the index of the first nonterminal that is in the matchTags array.

Parameters:
rhs - a list of symbols representing the right-hand side (RHS) of a grammar production
matchTags - an array of symbols representing the set of nonterminal labels to try to match in rhs
Returns:
the index of the first nonterminal in rhs that is in the set of nonterminals contained in matchTags when scanning from right to left, or -1 if there is no such matching nonterminal

scan

protected int scan(boolean direction,
                   SexpList rhs,
                   Symbol[] matchTags)
Scans the RHS of a production in the specified direction.

Parameters:
direction - the direction of scan: LEFT for left-to-right, RIGHT for right-to-left.
rhs - a list of symbols representing the right-hand side (RHS) of a grammar production
matchTags - an array of symbols representing the set of nonterminal labels to try to match in rhs
Returns:
the index of the first nonterminal in rhs that is in the set of nonterminals contained in matchTags, or -1 if there is no such matching nonterminal

tagMatches

protected boolean tagMatches(Symbol tag,
                             Symbol[] matchTags)
A helper method that returns true if any of the nonterminals in matchTags is tag and returns false otherwise.

Parameters:
tag - the tag to match
matchTags - an array of symbols

Parsing Engine

Author: Dan Bikel.