Parsing Engine

danbikel.parser.util
Class Util

java.lang.Object
  extended by danbikel.parser.util.Util

public class Util
extends Object

Contains basic utility functions for Sexp objects that represent parse trees.

Author:
Dan Bikel

Method Summary
static <K,V> void addToValueSet(Map<K,Set<V>> map, K key, V value)
          Adds value to the set that is the value of key in map, creating this set if no mapping for key exists yet.
static Sexp collectLeaves(Sexp tree)
          Returns a SexpList that contains all the leaves of the specified parse tree.
static CountsTable collectNonterminals(CountsTable counts, Sexp tree, boolean includeTags)
          Adds the nonterminals in the specified tree to the specified counts table.
static Sexp collectTaggedWords(Sexp tree)
          Returns a SexpList that contains all the words of the specified parse tree as well as their part of speech tags, where each word is its own SexpList of the form (word (tag)).
static CountsTable collectTags(CountsTable counts, Sexp tree)
          Adds the part of speech tags in the specified tree to the specified counts table.
static ArrayList collectWordObjects(Sexp tree)
           
static SexpTokenizer ibmTokenizer(Reader inStream, boolean comments)
          Returns a new SexpTokenizer instance where the “ordinary characters” (metacharacters) are '[' and ']'.
static Sexp ibmToPenn(Sexp sexp)
          A utility method that converts the specified IBM-format tree to a Penn Treebank–format tree.
static String prettyPrint(Sexp tree)
          Returns a string containing the pretty-printed version of the specified parse tree.
static Sexp readIbmTree(SexpTokenizer tok)
          Returns the S-expression for the IBM-format parse tree in the stream to be tokenized by the specified SexpTokenizer instance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

collectLeaves

public static Sexp collectLeaves(Sexp tree)
Returns a SexpList that contains all the leaves of the specified parse tree.

Parameters:
tree - the tree from which to collect leaves (words)
Returns:
a list of the words contained in the specified tree
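The following is a minimal sketch of leaf collection, not the library's implementation. Because the danbikel Sexp classes are not reproduced here, it assumes a hypothetical stand-in in which a String is a symbol and a List<Object> is a list whose first element is the node label, and it assumes the usual Penn Treebank shape in which a preterminal is a two-element list (tag word). The class name LeafSketch is likewise hypothetical. Leaves are gathered by a left-to-right depth-first traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree model: a String is a symbol (word or label) and a
// List<Object> is a list whose first element is the node label.
final class LeafSketch {
    // Returns the words of the tree in left-to-right order.
    static List<String> collectLeaves(Object tree) {
        List<String> leaves = new ArrayList<>();
        collect(tree, leaves);
        return leaves;
    }

    private static void collect(Object node, List<String> leaves) {
        if (node instanceof String) {
            leaves.add((String) node); // a bare symbol is treated as a leaf
            return;
        }
        List<?> kids = (List<?>) node;
        // Assumed preterminal shape: (tag word), so the word is element 1.
        if (kids.size() == 2 && kids.get(1) instanceof String) {
            leaves.add((String) kids.get(1));
            return;
        }
        // Skip the label at index 0 and visit the children in order.
        for (int i = 1; i < kids.size(); i++) {
            collect(kids.get(i), leaves);
        }
    }
}
```

For the tree (S (NP (DT the) (NN dog)) (VP (VBD barked))) this returns [the, dog, barked]. Note that a unary node such as (VP (VBD barked)) is not mistaken for a preterminal, because its second element is a list rather than a symbol.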

collectTaggedWords

public static Sexp collectTaggedWords(Sexp tree)
Returns a SexpList that contains all the words of the specified parse tree as well as their part of speech tags, where each word is its own SexpList of the form (word (tag)).

Parameters:
tree - the tree from which to collect tagged words
Returns:
a list of tagged words from the specified tree
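As an illustration only, the sketch below turns each preterminal (tag word) into the (word (tag)) shape described above. It assumes a hypothetical tree model (a String is a symbol, a List<Object> is a list headed by its label) in place of the real Sexp and SexpList classes, and the class name TaggedWordSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree model: String = symbol, List<Object> = labeled list.
final class TaggedWordSketch {
    // Returns one (word (tag)) list per preterminal, left to right.
    static List<Object> collectTaggedWords(Object tree) {
        List<Object> out = new ArrayList<>();
        collect(tree, out);
        return out;
    }

    private static void collect(Object node, List<Object> out) {
        if (!(node instanceof List)) {
            return; // a bare symbol has no tag attached to it
        }
        List<?> kids = (List<?>) node;
        // Assumed preterminal shape: (tag word).
        if (kids.size() == 2 && kids.get(1) instanceof String) {
            String tag = (String) kids.get(0);
            String word = (String) kids.get(1);
            out.add(List.of(word, List.of(tag))); // builds (word (tag))
            return;
        }
        // Skip the label and recurse into each child.
        for (int i = 1; i < kids.size(); i++) {
            collect(kids.get(i), out);
        }
    }
}
```

Applied to (S (NP (DT the) (NN dog)) (VP (VBD barked))), it yields ((the (DT)) (dog (NN)) (barked (VBD))). Wrapping the tag in its own list mirrors the documented (word (tag)) form.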

collectWordObjects

public static ArrayList collectWordObjects(Sexp tree)

collectNonterminals

public static CountsTable collectNonterminals(CountsTable counts,
                                              Sexp tree,
                                              boolean includeTags)
Adds the nonterminals in the specified tree to the specified counts table.

Parameters:
counts - the counts table to which to add the nonterminals present in the specified tree
tree - the tree from which to collect nonterminals
includeTags - indicates whether to treat part of speech tags as nonterminals
Returns:
the specified counts table, modified to contain the counts of the nonterminals present in the specified tree
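A hedged sketch of how the includeTags flag can affect nonterminal counting is shown below. It assumes a Map<String, Integer> as a stand-in for CountsTable (whose API is not shown on this page), the same hypothetical String/List<Object> tree model, and a two-element (tag word) list as the preterminal shape. When includeTags is false, preterminal labels are skipped; when it is true, they are counted like any other nonterminal. NonterminalCountSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical tree model: String = symbol, List<Object> = labeled list.
final class NonterminalCountSketch {
    // Increments counts for every nonterminal label in tree and returns counts.
    static Map<String, Integer> collectNonterminals(Map<String, Integer> counts,
                                                    Object tree,
                                                    boolean includeTags) {
        if (!(tree instanceof List)) {
            return counts; // words are not nonterminals
        }
        List<?> kids = (List<?>) tree;
        boolean preterminal = kids.size() == 2 && kids.get(1) instanceof String;
        // Count the label unless it is a tag and tags are excluded.
        if (!preterminal || includeTags) {
            counts.merge((String) kids.get(0), 1, Integer::sum);
        }
        // Only non-preterminals have subtrees worth visiting.
        if (!preterminal) {
            for (int i = 1; i < kids.size(); i++) {
                collectNonterminals(counts, kids.get(i), includeTags);
            }
        }
        return counts;
    }
}
```

For (S (NP (DT the) (NN dog)) (VP (VBD barked))), passing includeTags = false gives {S=1, NP=1, VP=1}; passing true additionally counts DT, NN and VBD once each. Returning the same table that was passed in allows counts to be accumulated over many trees.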

collectTags

public static CountsTable collectTags(CountsTable counts,
                                      Sexp tree)
Adds the part of speech tags in the specified tree to the specified counts table.

Parameters:
counts - the counts table to which to add the tags present in the specified tree
tree - the tree from which to collect part of speech tags
Returns:
the specified counts table, modified to contain the counts of the part of speech tags present in the specified tree
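The sketch below counts only the labels of preterminals, which is one way to read "part of speech tags" here. As with the other sketches on this page, it assumes a Map<String, Integer> in place of CountsTable, a hypothetical String/List<Object> tree model, and a two-element (tag word) preterminal; TagCountSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical tree model: String = symbol, List<Object> = labeled list.
final class TagCountSketch {
    // Increments counts for each preterminal label (tag) and returns counts.
    static Map<String, Integer> collectTags(Map<String, Integer> counts, Object tree) {
        if (!(tree instanceof List)) {
            return counts;
        }
        List<?> kids = (List<?>) tree;
        if (kids.size() == 2 && kids.get(1) instanceof String) {
            counts.merge((String) kids.get(0), 1, Integer::sum); // count this tag
            return counts;
        }
        // Not a preterminal: skip the label and recurse into the children.
        for (int i = 1; i < kids.size(); i++) {
            collectTags(counts, kids.get(i));
        }
        return counts;
    }
}
```

For (S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat)))), the table ends up with DT=2, NN=2 and VBD=1.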

prettyPrint

public static String prettyPrint(Sexp tree)
Returns a string containing the pretty-printed version of the specified parse tree.

Parameters:
tree - the tree to pretty-print
Returns:
a string containing the pretty-printed version of the specified parse tree.
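The exact layout produced by prettyPrint is not documented here. As one possible approach, the sketch below starts each child of a non-preterminal on a new line, indented two spaces per level, and keeps each preterminal (tag word) on a single line. The indentation width, the hypothetical String/List<Object> tree model, and the class name PrettyPrintSketch are all assumptions.

```java
import java.util.List;

// Hypothetical tree model: String = symbol, List<Object> = labeled list.
final class PrettyPrintSketch {
    static String prettyPrint(Object tree) {
        StringBuilder sb = new StringBuilder();
        print(tree, 0, sb);
        return sb.toString();
    }

    private static void print(Object node, int depth, StringBuilder sb) {
        if (node instanceof String) {
            sb.append(node);
            return;
        }
        List<?> kids = (List<?>) node;
        sb.append('(').append(kids.get(0));
        boolean preterminal = kids.size() == 2 && kids.get(1) instanceof String;
        for (int i = 1; i < kids.size(); i++) {
            if (preterminal) {
                sb.append(' ');                                   // keep (tag word) on one line
            } else {
                sb.append('\n').append("  ".repeat(depth + 1));   // indent child subtrees
            }
            print(kids.get(i), depth + 1, sb);
        }
        sb.append(')');
    }
}
```

For (S (NP (DT the) (NN dog)) (VP (VBD barked))) it returns:

```
(S
  (NP
    (DT the)
    (NN dog))
  (VP
    (VBD barked)))
```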

addToValueSet

public static final <K,V> void addToValueSet(Map<K,Set<V>> map,
                                             K key,
                                             V value)
Adds value to the set that is the value of key in map, creating this set if no mapping for key exists yet.

Parameters:
map - the map to be updated
key - the key in map whose value set is to be updated
value - the value to be added to key's value set
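The create-if-absent behavior described above maps directly onto Map.computeIfAbsent, available since Java 8. The sketch below is one way to write it; it assumes HashSet as the set implementation, which the documentation does not specify, and ValueSetSketch is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ValueSetSketch {
    static <K, V> void addToValueSet(Map<K, Set<V>> map, K key, V value) {
        // Create the value set the first time key is seen, then add value to it.
        map.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }
}
```

Calling it twice for the same key adds both values to a single set, while a key seen for the first time gets a fresh, empty set before value is added.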

ibmTokenizer

public static SexpTokenizer ibmTokenizer(Reader inStream,
                                         boolean comments)
Returns a new SexpTokenizer instance where the “ordinary characters” (metacharacters) are '[' and ']'.

Parameters:
inStream - the character stream from which to read IBM-format S-expressions/trees
comments - whether semicolon-delimited line comments are allowed
Returns:
a new SexpTokenizer instance capable of reading IBM-format S-expressions/trees
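SexpTokenizer's constructor is not documented on this page, so the sketch below illustrates the same idea with the standard java.io.StreamTokenizer instead: every printable character is a word constituent except '[' and ']', which come back as single-character tokens, and ';' optionally starts a line comment. The class name IbmTokenizerSketch and the tokens helper are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class IbmTokenizerSketch {
    static StreamTokenizer ibmTokenizer(Reader in, boolean comments) {
        StreamTokenizer tok = new StreamTokenizer(in);
        tok.resetSyntax();            // make every character "ordinary"
        tok.wordChars(33, 255);       // printable characters build words
        tok.whitespaceChars(0, ' ');  // control characters and space separate tokens
        tok.ordinaryChar('[');        // brackets come back as one-character tokens
        tok.ordinaryChar(']');
        if (comments) {
            tok.commentChar(';');     // ';' starts a comment that runs to end of line
        }
        return tok;
    }

    // Hypothetical helper: drains the tokenizer into a list of token strings.
    static List<String> tokens(StreamTokenizer tok) {
        List<String> out = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(tok.ttype == StreamTokenizer.TT_WORD
                        ? tok.sval
                        : String.valueOf((char) tok.ttype));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For example, tokenizing "[x] ;y" with comments enabled gives [, x, ]; with comments disabled, ";y" is read as an ordinary word. Because brackets are not word constituents, a closing bracket that directly follows a word, as in "x]", still ends the word.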

readIbmTree

public static Sexp readIbmTree(SexpTokenizer tok)
                        throws IOException
Returns the S-expression for the IBM-format parse tree in the stream to be tokenized by the specified SexpTokenizer instance. The SexpTokenizer instance is expected to have been constructed using the ibmTokenizer(java.io.Reader, boolean) static factory method, or otherwise to use '[' and ']' as metacharacters.

Parameters:
tok - the S-expression tokenizer where '[' and ']' are metacharacters
Returns:
the S-expression for the IBM-format tree contained in the stream wrapped by the specified tokenizer
Throws:
IOException - if there is a problem reading from the underlying character stream wrapped by the specified tokenizer
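As a hedged illustration of reading one bracketed tree, the sketch below builds a nested List<Object> from a java.io.StreamTokenizer configured with '[' and ']' as ordinary characters, using recursive descent with pushBack() for one token of lookahead. It does no IBM-specific label handling, and its choices to return null at end of input and to signal unbalanced brackets with an IOException are assumptions, not documented properties of readIbmTree. BracketTreeReaderSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

final class BracketTreeReaderSketch {
    // Same syntax idea as ibmTokenizer: brackets are metacharacters.
    static StreamTokenizer bracketTokenizer(Reader in) {
        StreamTokenizer tok = new StreamTokenizer(in);
        tok.resetSyntax();
        tok.wordChars(33, 255);
        tok.whitespaceChars(0, ' ');
        tok.ordinaryChar('[');
        tok.ordinaryChar(']');
        return tok;
    }

    // Reads the next word or bracketed list; returns null at end of input.
    static Object readTree(StreamTokenizer tok) throws IOException {
        int t = tok.nextToken();
        if (t == StreamTokenizer.TT_EOF) {
            return null;
        }
        if (t == StreamTokenizer.TT_WORD) {
            return tok.sval;
        }
        if (t == '[') {
            List<Object> list = new ArrayList<>();
            while (tok.nextToken() != ']') {
                if (tok.ttype == StreamTokenizer.TT_EOF) {
                    throw new IOException("missing ']'");
                }
                tok.pushBack();           // let the recursive call re-read this token
                list.add(readTree(tok));
            }
            return list;
        }
        throw new IOException("unexpected token: " + (char) t);
    }
}
```

Reading "[S [NP the dog] [VP barked]]" yields the nested list (S (NP the dog) (VP barked)), and a second call on the same tokenizer returns null.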

ibmToPenn

public static Sexp ibmToPenn(Sexp sexp)
A utility method that converts the specified IBM-format tree to a Penn Treebank–format tree.

Parameters:
sexp - the IBM-format tree to convert
Returns:
a “standard” Penn Treebank–format tree

Author: Dan Bikel.