Parsing Engine
java.lang.Object
  danbikel.parser.Decoder
    danbikel.parser.EMDecoder

public class EMDecoder
extends Decoder
Provides the methods necessary to perform constrained CKY parsing on input sentences so as to perform the E-step of the Inside-Outside EM algorithm.
Nested Class Summary

| Nested classes/interfaces inherited from class danbikel.parser.Decoder |
|---|
| Decoder.TimeoutException |
Field Summary

| Modifier and type | Field | Description |
|---|---|---|
| protected EMChart | chart | The parsing chart. |
| protected double | cummulativeInsideLogProb | The cumulative inside probability of all sentences, stored in log-space. |
| protected CountsTable | eventCounts | The map of events to their expected counts (cleared after every sentence). |
| protected static int | MAX_UNARY_PRODUCTIONS | A hack to limit the number of unary productions, instead of doing The Right Thing and computing infinite sums for looping derivations, as described by Stolcke (1995) and Goodman (1999). |
| protected static double | probCertain | The value of Constants.probCertain. |
| protected static double | probImpossible | The value of Constants.probImpossible. |
| protected Set | topProbItemsToAdd | A temporary storage area used by addTopUnaries(int) for storing items to be added to the chart when iterating over a cell in the chart. |
Constructor Summary

| Constructor | Description |
|---|---|
| EMDecoder(int id, DecoderServerRemote server) | Constructs a new decoder that will use the specified DecoderServer to get all information and probabilities required for decoding (parsing). |
Method Summary

| Modifier and type | Method | Description |
|---|---|---|
| protected void | addPretermHeadEvent(EMItem item, double expectedCount, CountsTable counts) | Whenever a preterminal is generated, either as a head child or as a modifier of some other item, a trivial head-generation event is added, generating the word from the lexicalized preterminal, which by design always generates its head word with probability 1. |
| protected List | addStopProbs(EMItem item, List itemsAdded, int level) | |
| protected void | addSynthesizedTopModEvent(TrainerEvent event, double expectedCount, CountsTable counts) | Adds an event as though a tree's non-hidden root is a modifier of +TOP+ (in addition to being a head child). |
| protected void | addTopUnaries(int end) | Adds hidden root nonterminal probabilities. |
| protected List | addUnaries(EMItem item, List itemsAdded, int level) | |
| protected void | addUnariesAndStopProbs(int start, int end) | Finds all possible parent-head (or unary) productions, using the root node of each existing chart item within the specified span as the head, and creates new items from these existing items, multiplying in the parent-head probability. Using these new items, it then creates additional new items in which stop probabilities have been multiplied. All new items are added to the chart. |
| protected void | complete(int start, int end) | Constructs all possible items spanning the specified indices and adds them to the chart. |
| protected CountsTable | computeEventCounts() | Returns a counts table with the expected count of all top-level events produced when constrain-parsing the current sentence. |
| protected void | computeEventCounts(int start, int end, double sentenceProbInverse, CountsTable counts) | Computes expected counts for top-level (maximal context) events produced for the specified span when decoding the current sentence; stores these events and their expected counts in the specified CountsTable object. |
| protected void | computeOutsideProbs() | Computes outside probabilities for the entire chart. |
| protected void | computeOutsideProbs(int start, int end) | Computes outside probabilities for all derivations in the specified span. |
| protected void | joinItems(EMItem modificand, EMItem modifier, boolean side) | Joins two chart items: one representing the modificand, which has not yet received its stop probabilities, and the other representing the modifier, which has received its stop probabilities. |
| protected CountsTable | parseAndCollectEventCounts(SexpList sentence) | Constrain-parses the specified sentence and computes expected top-level (maximal context) event counts. |
| protected CountsTable | parseAndCollectEventCounts(SexpList sentence, SexpList tags) | Constrain-parses the specified sentence and computes expected top-level (maximal context) event counts. |
| protected CountsTable | parseAndCollectEventCounts(SexpList sentence, SexpList tags, ConstraintSet constraints) | Constrain-parses the specified sentence and computes expected top-level (maximal context) event counts. |
| protected void | seedChart(Symbol word, int wordIdx, Symbol features, boolean neverObserved, SexpList tagSet, boolean wordIsUnknown, Symbol origWord, ConstraintSet constraints) | Adds a chart item for every possible part of speech for the specified word at the specified index in the current sentence. |

Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail

protected static final int MAX_UNARY_PRODUCTIONS
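Capping unary productions amounts to truncating the unary closure: unary rules are applied level by level, and the loop stops after a fixed number of levels even when a unary cycle could keep producing further derivations. The following is a hypothetical sketch of that idea, not the parser's own code; the class UnaryClosureSketch, the string labels, and the child-to-parents rule map are assumptions made for illustration. It sums probabilities over derivations, as an inside computation would.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a depth-limited unary closure. Unary rules are given
 * as child label -> (parent label -> probability); the result maps every label
 * reachable in at most maxUnaries unary steps to its summed probability.
 */
final class UnaryClosureSketch {
  static Map<String, Double> closure(String startLabel, double startProb,
                                     Map<String, Map<String, Double>> unaryRules,
                                     int maxUnaries) {
    Map<String, Double> result = new HashMap<>();
    result.put(startLabel, startProb);
    // Items created at the previous level; only these are extended further.
    Map<String, Double> frontier = new HashMap<>(result);
    for (int level = 0; level < maxUnaries && !frontier.isEmpty(); level++) {
      Map<String, Double> next = new HashMap<>();
      for (Map.Entry<String, Double> item : frontier.entrySet()) {
        Map<String, Double> parents =
            unaryRules.getOrDefault(item.getKey(), Map.of());
        for (Map.Entry<String, Double> rule : parents.entrySet()) {
          // Multiply in the parent-head (unary) probability.
          next.merge(rule.getKey(), item.getValue() * rule.getValue(), Double::sum);
        }
      }
      // Sum over derivations that reach the same label at different depths.
      next.forEach((label, prob) -> result.merge(label, prob, Double::sum));
      frontier = next;
    }
    return result;
  }
}
```

With the cycle NN to NP to NN (each step with probability 0.5) and a cap of two, NN ends up with 1 + 0.25 = 1.25, whereas the exact infinite sum over the cycle is 1 / (1 - 0.25) = 4/3. That difference is the approximation error the hack accepts in exchange for not solving for the infinite sum.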
protected static final double probCertain
The value of Constants.probCertain.
protected static final double probImpossible
The value of Constants.probImpossible.
protected Set topProbItemsToAdd
A temporary storage area used by addTopUnaries(int) for storing items to be added to the chart when iterating over a cell in the chart.
Bugs: It is a design error to have created this Set member with the same name as the ArrayList member in the superclass. The designer of this class should be appropriately flogged.
protected double cummulativeInsideLogProb
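Keeping this total in log space means it is a sum of per-sentence log inside probabilities; once every sentence in a pass has been added, that sum is the corpus log-likelihood under the current model, whereas a product of many small probabilities would underflow a double. The sketch below is hypothetical: the class, its method names, and the relative-change stopping rule are assumptions for illustration and are not part of EMDecoder.

```java
/**
 * Hypothetical accumulator for a corpus log-likelihood, in the spirit of
 * cummulativeInsideLogProb: adding log probabilities multiplies probabilities.
 */
final class LogLikelihoodAccumulator {
  private double cumulativeInsideLogProb = 0.0;

  /** Adds one sentence's inside probability, given in log space. */
  void addSentence(double sentenceInsideLogProb) {
    cumulativeInsideLogProb += sentenceInsideLogProb;
  }

  double total() {
    return cumulativeInsideLogProb;
  }

  /**
   * Assumed stopping rule: the change between two EM iterations is small
   * relative to the previous total.
   */
  static boolean hasConverged(double previousTotal, double currentTotal, double tolerance) {
    return Math.abs(currentTotal - previousTotal) <= tolerance * Math.abs(previousTotal);
  }
}
```

Adding log(0.5) and log(0.25) yields log(0.125), the log of the two sentences' joint probability.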
protected CountsTable eventCounts
protected EMChart chart
Constructor Detail

public EMDecoder(int id, DecoderServerRemote server)
Constructs a new decoder that will use the specified DecoderServer to get all information and probabilities required for decoding (parsing).
Parameters:
id - the id of this parsing client
server - the DecoderServerRemote implementor (either local or remote) that provides this decoder object with information and probabilities required for decoding (parsing)

Method Detail

protected void seedChart(Symbol word, int wordIdx, Symbol features, boolean neverObserved, SexpList tagSet, boolean wordIsUnknown, Symbol origWord, ConstraintSet constraints) throws RemoteException
Description copied from class: Decoder
Adds a chart item for every possible part of speech for the specified word at the specified index in the current sentence.
Overrides:
seedChart in class Decoder
Parameters:
word - the current word
wordIdx - the index of the current word in the current sentence
features - the word-feature vector for the current word
neverObserved - indicates whether the current word was never observed during training (a truly unknown word)
tagSet - a list containing all possible part-of-speech tags for the current word
constraints - the constraint set for this sentence
Throws:
RemoteException - if any calls to the underlying DecoderServerRemote object throw a RemoteException
See Also:
Chart.add(int,int,Item)
protected CountsTable parseAndCollectEventCounts(SexpList sentence) throws RemoteException
Parameters:
sentence - a list of symbols representing the words of a sentence
Throws:
RemoteException
protected CountsTable parseAndCollectEventCounts(SexpList sentence, SexpList tags) throws RemoteException
Parameters:
sentence - a list of symbols representing the words of a sentence
tags - a list of symbols that represent the part-of-speech tags of the words of the specified sentence (coordinated with the specified list of words)
Throws:
RemoteException
protected CountsTable parseAndCollectEventCounts(SexpList sentence, SexpList tags, ConstraintSet constraints) throws RemoteException
Parameters:
sentence - a list of symbols representing the words of a sentence
tags - a list of symbols that represent the part-of-speech tags of the words of the specified sentence (coordinated with the specified list of words)
constraints - a set of parsing constraints
Throws:
RemoteException
protected void computeOutsideProbs()
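Outside probabilities are computed top-down, in the reverse of the order used for inside probabilities: the root item starts with outside probability 1, and each derivation step passes its parent's outside probability, times the step's probability, times the inside probabilities of the other children, down to each child. The following sketch illustrates that recursion on a tiny hypothetical packed forest. The Node and Edge classes are assumptions standing in for the chart items and whatever links record how each item was built, and plain probabilities are used instead of log probabilities for readability.

```java
import java.util.List;

/** Hypothetical sketch of the outside pass over a packed parse forest. */
final class OutsideSketch {
  /** A chart item with its inside probability (from CKY) and outside probability. */
  static final class Node {
    final double inside;
    double outside = 0.0;

    Node(double inside) {
      this.inside = inside;
    }
  }

  /** One way of building parent from children with the given probability. */
  static final class Edge {
    final Node parent;
    final double prob;
    final Node[] children;

    Edge(Node parent, double prob, Node... children) {
      this.parent = parent;
      this.prob = prob;
      this.children = children;
    }
  }

  /**
   * Edges must be ordered top-down: an edge is visited only after every edge
   * that contributes to its parent's outside probability (e.g. by decreasing
   * span length and, within a span, unary parents before their heads).
   */
  static void computeOutside(Node root, List<Edge> topDownEdges) {
    root.outside = 1.0; // nothing lies outside the root item
    for (Edge edge : topDownEdges) {
      for (int i = 0; i < edge.children.length; i++) {
        double contribution = edge.parent.outside * edge.prob;
        for (int j = 0; j < edge.children.length; j++) {
          if (j != i) {
            contribution *= edge.children[j].inside; // siblings' inside probs
          }
        }
        edge.children[i].outside += contribution;
      }
    }
  }
}
```

A useful sanity check: for an item that appears in every parse of the sentence, inside times outside equals the sentence's total probability.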
protected void computeOutsideProbs(int start, int end)
Parameters:
start - the index of the first word in the span whose chart items' outside probabilities are to be computed
end - the index of the last word in the span whose chart items' outside probabilities are to be computed

protected CountsTable computeEventCounts()
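In the E-step, the contribution of one derivation step to an event's expected count is the probability of all parses using that step divided by the sentence probability: the parent's outside probability, times the event's probability, times the children's inside probabilities, times 1/P(sentence). An event's total expected count is the sum of these contributions over every step that uses it. Passing sentenceProbInverse to the four-argument overload presumably lets the division happen once per sentence rather than once per event. The helper below is a hypothetical sketch of the arithmetic for a single step, in plain (not log) probabilities.

```java
/** Hypothetical helper: expected count contributed by one derivation step. */
final class ExpectedCountSketch {
  /**
   * @param parentOutside       outside probability of the item being built
   * @param eventProb           probability of the event under the current model
   * @param childInsides        inside probabilities of the items it is built from
   * @param sentenceProbInverse 1 / P(sentence), computed once per sentence
   */
  static double expectedCount(double parentOutside, double eventProb,
                              double[] childInsides, double sentenceProbInverse) {
    double pathProb = parentOutside * eventProb;
    for (double inside : childInsides) {
      pathProb *= inside;
    }
    // Posterior probability that this step was used, i.e. its expected count.
    return pathProb * sentenceProbInverse;
  }
}
```

If the sentence has only this one derivation (P = 0.5 × 0.2 × 0.4 = 0.04), the step's expected count is 1; if a second, equally probable derivation doubled P to 0.08, it would drop to 0.5.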
protected void computeEventCounts(int start, int end, double sentenceProbInverse, CountsTable counts)
Computes expected counts for top-level (maximal context) events produced for the specified span when decoding the current sentence; stores these events and their expected counts in the specified CountsTable object.
Parameters:
start - the index of the first word in the span whose expected event counts are to be computed
end - the index of the last word in the span whose expected event counts are to be computed
sentenceProbInverse - the inverse of the total inside probability of the current sentence under the current model
counts - the table in which to store expected event counts

protected void addSynthesizedTopModEvent(TrainerEvent event, double expectedCount, CountsTable counts)
Adds an event as though a tree's non-hidden root is a modifier of +TOP+ (in addition to being a head child). Otherwise, the modifier word model (e.g., ModWordModelStructure2) would not contain counts for words that are the head of the entire sentence, since they are not generated as modifiers of anything. Adding this event thus enables the (now deprecated) count-sharing scheme to work, whereby the last back-off level of TopLexModelStructure1 would use the p(w|t) counts from the last level of ModWordModelStructure2.
Parameters:
event - the HeadEvent instance for an observed tree root, from which a ModifierEvent is to be produced
expectedCount - the expected count of the specified head-generation event
See Also:
Settings.trainerShareCounts
protected void addPretermHeadEvent(EMItem item, double expectedCount, CountsTable counts)
See Also:
Settings.trainerShareCounts
protected void addTopUnaries(int end) throws RemoteException
Description copied from class: Decoder
Adds hidden root nonterminal probabilities. … Training.topSym() has been multiplied into the existing item's score.
Overrides:
addTopUnaries in class Decoder
Parameters:
end - the index of the last word of the sentence being parsed
Throws:
RemoteException
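Scoring the hidden root multiplies, for each completed item spanning the whole sentence, the probability of generating that item's root label as the head child of Training.topSym(). Under this sketch's assumptions, summing those products over all such items gives the sentence's total inside probability, the quantity whose inverse computeEventCounts(int,int,double,CountsTable) takes. The code below is a hypothetical illustration using string labels and a plain map of root probabilities; it is not how EMDecoder stores its items.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of hidden-root scoring at the top of the chart. */
final class TopSketch {
  /** A completed chart item: root label, inclusive span, inside probability. */
  static final class Item {
    final String label;
    final int start;
    final int end;
    final double inside;

    Item(String label, int start, int end, double inside) {
      this.label = label;
      this.start = start;
      this.end = end;
      this.inside = inside;
    }
  }

  /**
   * Sums p(label as head child of the hidden root) * inside over items
   * spanning words 0..lastWordIdx. Labels absent from topProbs count as 0.
   */
  static double sentenceInside(List<Item> items, int lastWordIdx, Map<String, Double> topProbs) {
    double total = 0.0;
    for (Item item : items) {
      if (item.start == 0 && item.end == lastWordIdx) {
        total += topProbs.getOrDefault(item.label, 0.0) * item.inside;
      }
    }
    return total;
  }
}
```

Items that do not cover the whole sentence, such as a VP over words 1 to 2 of a three-word sentence, contribute nothing.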
protected void complete(int start, int end) throws RemoteException, Decoder.TimeoutException
Description copied from class: Decoder
Constructs all possible items spanning the specified indices and adds them to the chart.
Overrides:
complete in class Decoder
Parameters:
start - the index of the first word in the span for which all chart items are to be created and added to the chart
end - the index of the last word in the span for which all chart items are to be created and added to the chart
Throws:
RemoteException
Decoder.TimeoutException - if the value of Settings.maxParseTime is greater than zero and that time limit has been reached while parsing
See Also:
Decoder.joinItems(CKYItem,CKYItem,boolean)
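Because complete(start, end) combines items lying entirely inside [start, end], a CKY driver has to visit spans in order of increasing length. The sketch below is hypothetical: it lists spans in a bottom-up order and, for one span, the split points at which a left item covering [start, split] could meet a right item covering [split + 1, end]. The class and method names are assumptions; the inclusive indices follow the start and end parameter descriptions of complete.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a bottom-up CKY visiting order. Word indices are
 * inclusive, so a span [start, end] covers end - start + 1 words.
 */
final class CkySpanOrder {
  /** Spans of two or more words, shortest first; one-word spans are seeded. */
  static List<int[]> spanOrder(int sentenceLength) {
    List<int[]> order = new ArrayList<>();
    for (int length = 2; length <= sentenceLength; length++) {
      for (int start = 0; start + length <= sentenceLength; start++) {
        order.add(new int[] {start, start + length - 1});
      }
    }
    return order;
  }

  /** Each split joins an item over [start, split] with one over [split + 1, end]. */
  static List<Integer> splitPoints(int start, int end) {
    List<Integer> splits = new ArrayList<>();
    for (int split = start; split < end; split++) {
      splits.add(split);
    }
    return splits;
  }
}
```

For a three-word sentence the order is [0,1], [1,2], [0,2], and completing [0,2] tries splits 0 and 1, each of which relies on shorter spans that are already complete.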
protected void joinItems(EMItem modificand, EMItem modifier, boolean side) throws RemoteException
Parameters:
modificand - the chart item representing a partially-completed subtree, to be modified on side by modifier
modifier - the chart item representing a completed subtree that will be added as a modifier on side of modificand's subtree
side - the side on which to attempt to add the specified modifier to the specified modificand
Throws:
RemoteException
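A join is only possible when the modifier sits immediately beside the modificand on the requested side, and the resulting item covers both spans. The sketch below is a hypothetical illustration: the Item class, the LEFT/RIGHT encoding of the boolean side flag, and the logModProb argument (the log probability of generating the modifier in the modificand's context) are assumptions made for the example. Scores are kept in log space, so multiplying probabilities becomes addition, and the sketch does not check that the modifier has already received its stop probabilities.

```java
/** Hypothetical sketch of joining a modificand with an adjacent modifier. */
final class JoinSketch {
  // Assumed encoding of the side flag for this example only.
  static final boolean LEFT = false;
  static final boolean RIGHT = true;

  /** Minimal chart item: inclusive word span and log-space inside score. */
  static final class Item {
    final int start;
    final int end;
    final double logInside;

    Item(int start, int end, double logInside) {
      this.start = start;
      this.end = end;
      this.logInside = logInside;
    }
  }

  /**
   * Returns the joined item, or null when the modifier does not directly
   * abut the modificand on the given side.
   */
  static Item join(Item modificand, Item modifier, boolean side, double logModProb) {
    boolean adjacent = (side == LEFT)
        ? modifier.end + 1 == modificand.start
        : modificand.end + 1 == modifier.start;
    if (!adjacent) {
      return null;
    }
    int start = Math.min(modificand.start, modifier.start);
    int end = Math.max(modificand.end, modifier.end);
    // Adding log probabilities multiplies the underlying probabilities.
    return new Item(start, end, modificand.logInside + modifier.logInside + logModProb);
  }
}
```

A modificand over words 2 to 3 and a modifier over words 0 to 1 join on the left into an item over 0 to 3, but the same pair cannot join on the right.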
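protected void addUnariesAndStopProbs(int start, int end) throws RemoteException
Description copied from class: Decoder
Finds all possible parent-head (or unary) productions, using the root node of each existing chart item within the specified span as the head, and creates new items from these existing items, multiplying in the parent-head probability. Using these new items, it then creates additional new items in which stop probabilities have been multiplied. All new items are added to the chart. … Training.stopSym() as a modifier on either side of a production.
Overrides:
addUnariesAndStopProbs in class Decoder
Parameters:
start - the index of the first word in the span
end - the index of the last word in the span
Throws:
RemoteException
See Also:
Decoder.addUnaries(CKYItem, java.util.List), Decoder.addStopProbs(CKYItem, java.util.List)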
protected List addUnaries(EMItem item, List itemsAdded, int level) throws RemoteException
Throws:
RemoteException
protected List addStopProbs(EMItem item, List itemsAdded, int level) throws RemoteException
Throws:
RemoteException