danbikel.parser.english
Class WordFeatures
java.lang.Object
danbikel.parser.lang.AbstractWordFeatures
danbikel.parser.english.WordFeatures
- All Implemented Interfaces:
- WordFeatures, Serializable
public class WordFeatures
- extends AbstractWordFeatures
WordFeatures are orthographic and morphological features of
words. Specifically, the word features encoded by the methods of this class
are:
- capitalization
- hyphenization
- inflection
- derivation
- numeric
The features are encoded into a single symbol of the form:
CcHhIiDdNn, where c encodes capitalization, h
encodes hyphenization, i encodes inflection, d encodes
derivation and n encodes the numeric feature. For example,
"C3H0I0D3N0" encodes the features for the word
"Geography" (that is, non-sentence-initial capitalized,
no hyphenization, no inflection, "graphy" derivation and
non-numeric).
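The encoding described above can be illustrated with a small, self-contained sketch that splits a symbol such as "C3H0I0D3N0" into its five fields and reassembles it. The class FeatureSymbolSketch and its decode and encode methods are hypothetical helpers written for this illustration; they are not part of danbikel.parser. The sketch also assumes that each feature value is a non-negative integer written in decimal digits, which matches the example but is not stated by the documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of danbikel.parser) for CcHhIiDdNn strings. */
public class FeatureSymbolSketch {
  // Assumption: every feature value is one or more decimal digits.
  private static final Pattern FORMAT =
      Pattern.compile("C(\\d+)H(\\d+)I(\\d+)D(\\d+)N(\\d+)");

  // Feature names in the order they appear in the encoded symbol.
  private static final String[] NAMES = {
    "capitalization", "hyphenization", "inflection", "derivation", "numeric"
  };

  /** Splits an encoded symbol into its five named feature values. */
  public static Map<String, Integer> decode(String encoded) {
    Matcher m = FORMAT.matcher(encoded);
    if (!m.matches()) {
      throw new IllegalArgumentException("not of the form CcHhIiDdNn: " + encoded);
    }
    Map<String, Integer> values = new LinkedHashMap<>();
    for (int i = 0; i < NAMES.length; i++) {
      values.put(NAMES[i], Integer.parseInt(m.group(i + 1)));
    }
    return values;
  }

  /** Reassembles five feature values into the CcHhIiDdNn form. */
  public static String encode(int c, int h, int i, int d, int n) {
    return "C" + c + "H" + h + "I" + i + "D" + d + "N" + n;
  }

  public static void main(String[] args) {
    // The documented encoding for the word "Geography".
    System.out.println(decode("C3H0I0D3N0"));
    System.out.println(encode(3, 0, 0, 3, 0));
  }
}
```

The sketch only handles the textual layout of the symbol. What each numeric value means (for example, that 3 in the capitalization slot stands for non-sentence-initial capitalization) is determined by the class itself and is not modeled here.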
- See Also:
- Serialized Form
Field Summary
static String useUnderscoresProperty
The property obtained from the Settings class to indicate whether or not to consider underscores when creating the feature vector.
Constructor Summary
WordFeatures()
Constructs a new instance of this class for deterministically mapping English words to word-feature vectors.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
useUnderscoresProperty
public static final String useUnderscoresProperty
- The property obtained from the Settings class to indicate whether or not to consider underscores when creating the feature vector.
- See Also:
- Constant Field Values
WordFeatures
public WordFeatures()
- Constructs a new instance of this class for deterministically mapping
English words to word-feature vectors.
features
public Symbol features(Symbol word,
boolean firstWord)
- Returns the features of the specified word, encoded as a single symbol of the form CcHhIiDdNn.
- Specified by:
features in interface WordFeatures
- Overrides:
features in class AbstractWordFeatures
- Parameters:
word - the word.
firstWord - indicates whether word is the first word of the sentence in which it occurs
- Returns:
- the encoded feature symbol.
- See Also:
AbstractWordFeatures.unknownWordSym
defaultFeatureVector
public Symbol defaultFeatureVector()
- Description copied from class: AbstractWordFeatures
- The symbol that represents the case where none of the features fires for a particular word.
- Specified by:
defaultFeatureVector in interface WordFeatures
- Specified by:
defaultFeatureVector in class AbstractWordFeatures
Author: Dan Bikel.