Parsing Engine
java.lang.Object
  danbikel.parser.util.Util
public class Util
Contains basic utility functions for Sexp objects that represent parse trees.
Method Summary | | |
---|---|---|
static <K,V> void | addToValueSet(Map<K,Set<V>> map, K key, V value) | Adds value to the set that is the value of key in map; creates this set if a mapping doesn't already exist for key. |
static Sexp | collectLeaves(Sexp tree) | Returns a SexpList that contains all the leaves of the specified parse tree. |
static CountsTable | collectNonterminals(CountsTable counts, Sexp tree, boolean includeTags) | Adds the nonterminals in the specified tree to the specified counts table. |
static Sexp | collectTaggedWords(Sexp tree) | Returns a SexpList that contains all the words of the specified parse tree as well as their part of speech tags, where each word is its own SexpList of the form (word (tag)). |
static CountsTable | collectTags(CountsTable counts, Sexp tree) | Adds the part of speech tags in the specified tree to the specified counts table. |
static ArrayList | collectWordObjects(Sexp tree) | |
static SexpTokenizer | ibmTokenizer(Reader inStream, boolean comments) | Returns a new SexpTokenizer instance where the “ordinary characters” (metacharacters) are '[' and ']'. |
static Sexp | ibmToPenn(Sexp sexp) | A utility method that converts the specified IBM-format tree to a Penn Treebank–format tree. |
static String | prettyPrint(Sexp tree) | Returns a string containing the pretty-printed version of the specified parse tree. |
static Sexp | readIbmTree(SexpTokenizer tok) | Returns the S-expression for the IBM-format parse tree in the stream to be tokenized by the specified SexpTokenizer instance. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static Sexp collectLeaves(Sexp tree)
Returns a SexpList that contains all the leaves of the specified parse tree.
Parameters:
tree - the tree from which to collect leaves (words)
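The leaf collection described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the implementation of collectLeaves: the tree is modeled with nested java.util.List objects instead of Sexp and SexpList, preterminals are assumed to have the Penn-style shape (tag word), and the class name LeafSketch and its node helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in sketch: trees are nested Lists of Strings, not the library's Sexp classes. */
class LeafSketch {
    /** Hypothetical helper that builds one tree node from a label and its children. */
    static List<Object> node(Object... parts) {
        return Arrays.asList(parts);
    }

    /** Returns every leaf (word) of the tree in left-to-right order. */
    static List<String> collectLeaves(Object tree) {
        List<String> leaves = new ArrayList<>();
        collect(tree, leaves);
        return leaves;
    }

    private static void collect(Object current, List<String> leaves) {
        if (!(current instanceof List<?>)) {
            return; // a bare symbol that is not inside a preterminal is skipped
        }
        List<?> list = (List<?>) current;
        // Assumed preterminal shape (tag word): exactly two symbols.
        if (list.size() == 2 && list.get(0) instanceof String && list.get(1) instanceof String) {
            leaves.add((String) list.get(1));
            return;
        }
        // Otherwise element 0 is the nonterminal label; recurse into the children.
        for (int i = 1; i < list.size(); i++) {
            collect(list.get(i), leaves);
        }
    }

    public static void main(String[] args) {
        Object tree = node("S",
            node("NP", node("DT", "the"), node("NN", "dog")),
            node("VP", node("VBD", "barked")));
        System.out.println(collectLeaves(tree)); // prints [the, dog, barked]
    }
}
```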
public static Sexp collectTaggedWords(Sexp tree)
Returns a SexpList that contains all the words of the specified parse tree as well as their part of speech tags, where each word is its own SexpList of the form (word (tag)).
Parameters:
tree - the tree from which to collect tagged words
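To make the (word (tag)) output shape concrete, here is a small sketch under the same kind of assumptions: nested java.util.List objects stand in for Sexp and SexpList, preterminals are assumed to look like (tag word), and TaggedWordSketch and node are hypothetical names rather than part of this API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in sketch: nested Lists model S-expressions; this is not the library's code. */
class TaggedWordSketch {
    /** Hypothetical helper that builds a list node from its elements. */
    static List<Object> node(Object... parts) {
        return Arrays.asList(parts);
    }

    /** Returns one (word (tag)) entry per word, in left-to-right order. */
    static List<Object> collectTaggedWords(Object tree) {
        List<Object> result = new ArrayList<>();
        collect(tree, result);
        return result;
    }

    private static void collect(Object current, List<Object> result) {
        if (!(current instanceof List<?>)) {
            return;
        }
        List<?> list = (List<?>) current;
        // Assumed preterminal shape (tag word): exactly two symbols.
        if (list.size() == 2 && list.get(0) instanceof String && list.get(1) instanceof String) {
            Object tag = list.get(0);
            Object word = list.get(1);
            result.add(node(word, node(tag))); // (word (tag))
            return;
        }
        // Element 0 is the nonterminal label; the rest are subtrees.
        for (int i = 1; i < list.size(); i++) {
            collect(list.get(i), result);
        }
    }

    public static void main(String[] args) {
        Object tree = node("S",
            node("NP", node("DT", "the"), node("NN", "dog")),
            node("VP", node("VBD", "barked")));
        System.out.println(collectTaggedWords(tree));
        // prints [[the, [DT]], [dog, [NN]], [barked, [VBD]]]
    }
}
```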
public static ArrayList collectWordObjects(Sexp tree)
public static CountsTable collectNonterminals(CountsTable counts, Sexp tree, boolean includeTags)
Adds the nonterminals in the specified tree to the specified counts table.
Parameters:
counts - the counts table to which to add the nonterminals present in the specified tree
tree - the tree from which to collect nonterminals
includeTags - indicates whether to treat part of speech tags as nonterminals
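The effect of includeTags can be illustrated with a short sketch. It rests on assumptions this page does not confirm: a TreeMap from label to count stands in for CountsTable, each occurrence of a label is assumed to increment its count by one, and trees are nested java.util.List objects with (tag word) preterminals. NonterminalSketch and node are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in sketch: a Map models CountsTable; this is not the library's code. */
class NonterminalSketch {
    /** Hypothetical helper that builds one tree node from a label and its children. */
    static List<Object> node(Object... parts) {
        return Arrays.asList(parts);
    }

    /** Adds each nonterminal label in the tree to counts and returns counts. */
    static Map<String, Integer> collectNonterminals(
            Map<String, Integer> counts, Object tree, boolean includeTags) {
        if (!(tree instanceof List<?>)) {
            return counts; // words carry no label
        }
        List<?> list = (List<?>) tree;
        if (list.isEmpty() || !(list.get(0) instanceof String)) {
            return counts;
        }
        // Assumed preterminal shape (tag word): the label is a part of speech tag.
        boolean preterminal = list.size() == 2 && list.get(1) instanceof String;
        if (!preterminal || includeTags) {
            counts.merge((String) list.get(0), 1, Integer::sum);
        }
        for (int i = 1; i < list.size(); i++) {
            collectNonterminals(counts, list.get(i), includeTags);
        }
        return counts;
    }

    public static void main(String[] args) {
        Object tree = node("S",
            node("NP", node("DT", "the"), node("NN", "dog")),
            node("VP", node("VBD", "chased"),
                node("NP", node("DT", "a"), node("NN", "cat"))));
        System.out.println(collectNonterminals(new TreeMap<>(), tree, false));
        // prints {NP=2, S=1, VP=1}
        System.out.println(collectNonterminals(new TreeMap<>(), tree, true));
        // prints {DT=2, NN=2, NP=2, S=1, VBD=1, VP=1}
    }
}
```

Passing the same map to several calls accumulates counts across trees, which mirrors how the counts parameter is both an input and the return value in the signature above.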
public static CountsTable collectTags(CountsTable counts, Sexp tree)
Adds the part of speech tags in the specified tree to the specified counts table.
Parameters:
counts - the counts table to which to add the tags present in the specified tree
tree - the tree from which to collect part of speech tags
public static String prettyPrint(Sexp tree)
Returns a string containing the pretty-printed version of the specified parse tree.
Parameters:
tree - the tree to pretty-print
public static final <K,V> void addToValueSet(Map<K,Set<V>> map, K key, V value)
Adds value to the set that is the value of key in map; creates this set if a mapping doesn't already exist for key.
Parameters:
map - the map to be updated
key - the key in map whose value set is to be updated
value - the value to be added to key's value set
public static SexpTokenizer ibmTokenizer(Reader inStream, boolean comments)
Returns a new SexpTokenizer instance where the “ordinary characters” (metacharacters) are '[' and ']'.
Parameters:
inStream - the character stream from which to read IBM-format S-expressions/trees
comments - whether semicolon-delimited line comments are allowed
Returns:
a new SexpTokenizer instance capable of reading IBM-format S-expressions/trees
public static Sexp readIbmTree(SexpTokenizer tok) throws IOException
Returns the S-expression for the IBM-format parse tree in the stream to be tokenized by the specified SexpTokenizer instance. It is expected that the SexpTokenizer instance was constructed using the ibmTokenizer(java.io.Reader, boolean) static factory method, or equivalently uses '[' and ']' as metacharacters.
Parameters:
tok - the S-expression tokenizer where '[' and ']' are metacharacters
Throws:
IOException - if there is a problem reading from the underlying character stream wrapped by the specified tokenizer
public static Sexp ibmToPenn(Sexp sexp)
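A utility method that converts the specified IBM-format tree to a Penn Treebank–format tree.
Parameters:
sexp - the IBM-format tree to convert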
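The bracket handling that ibmTokenizer and readIbmTree describe can be approximated with the JDK alone. The sketch below is an assumption-laden stand-in, not the library's code: java.io.StreamTokenizer plays the role of SexpTokenizer, nested java.util.List objects play the role of Sexp, and the layout of symbols inside the brackets is invented for illustration because this page does not specify the IBM format. BracketSketch, bracketTokenizer and read are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Stand-in sketch: StreamTokenizer with '[' and ']' as ordinary characters. */
class BracketSketch {
    /** Builds a tokenizer whose only metacharacters are '[' and ']'. */
    static StreamTokenizer bracketTokenizer(Reader in, boolean comments) {
        StreamTokenizer tok = new StreamTokenizer(in);
        tok.resetSyntax();             // start with every character ordinary
        tok.wordChars(33, 126);        // printable ASCII forms symbols
        tok.whitespaceChars(0, ' ');   // control characters and space separate tokens
        tok.ordinaryChar('[');
        tok.ordinaryChar(']');
        if (comments) {
            tok.commentChar(';');      // ';' starts a comment that runs to end of line
        }
        return tok;
    }

    /** Reads one symbol or bracketed list; returns null at end of stream. */
    static Object read(StreamTokenizer tok) throws IOException {
        int type = tok.nextToken();
        if (type == StreamTokenizer.TT_EOF) {
            return null;
        }
        if (type == StreamTokenizer.TT_WORD) {
            return tok.sval;
        }
        if (type == '[') {
            List<Object> list = new ArrayList<>();
            while (true) {
                type = tok.nextToken();
                if (type == ']') {
                    return list;
                }
                if (type == StreamTokenizer.TT_EOF) {
                    throw new IOException("end of stream inside '['");
                }
                tok.pushBack();        // let the recursive call re-read this token
                list.add(read(tok));
            }
        }
        throw new IOException("unexpected token: " + (char) type);
    }

    public static void main(String[] args) throws IOException {
        String text = "[S [NP the dog] ; a line comment\n [VP barked]]";
        StreamTokenizer tok = bracketTokenizer(new StringReader(text), true);
        System.out.println(read(tok)); // prints [S, [NP, the, dog], [VP, barked]]
    }
}
```

As in the readIbmTree signature, any read failure in the underlying stream surfaces as an IOException from the reading method.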