SPEECH.TAGGER

GOAL: Identify the part of speech of each word in a sentence
SYNOPSIS: Individual project | Spring 2018 | Java
KEY SKILLS: Hidden Markov Models | Viterbi Algorithm | Parsing files, nested Maps, Graphs, Priority Queues
VALIDATION: 96.4% accuracy on the Brown corpus (35,090/36,394 words correct)
Unfortunately, this program reads files that are too large for a web browser to handle.
So instead of a trinket, I will soon add a short video here of the tagger in action.

Key Process:

  1. Started from a training corpus: sentences with the parts of speech already tagged. Parsed the corpus using nested maps to compute the probability of (a) a word being a particular part of speech and (b) one part of speech being followed by another.
  2. Combined this information into a graph, creating a hidden Markov model.
  3. Implemented the Viterbi algorithm. For each word in a sentence, several parts of speech are possible for the word that follows it. The Viterbi algorithm uses the probabilities from the model to score each possible transition, and from those scores finds the most likely sequence of parts of speech for the whole sentence.
  4. Used Scanner to create a user interface that tags each word the user types.
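
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name HmmTaggerSketch, the "#" start state, the -100 log score for unseen words, and the four-sentence toy corpus are all assumptions made for the example. It also keeps the model in nested maps rather than an explicit graph, and works with log probabilities so that scores are added instead of multiplied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an HMM part-of-speech tagger (names are illustrative).
class HmmTaggerSketch {
    static final String START = "#";      // assumed start-of-sentence state
    static final double UNSEEN = -100.0;  // assumed log score for a word never seen with a tag

    // tag -> (next tag -> log probability) and tag -> (word -> log probability)
    final Map<String, Map<String, Double>> transitions = new HashMap<>();
    final Map<String, Map<String, Double>> emissions = new HashMap<>();

    // Count transitions and emissions, then turn counts into log probabilities. Call once.
    void train(String[][] words, String[][] tags) {
        for (int s = 0; s < words.length; s++) {
            String prev = START;
            for (int i = 0; i < words[s].length; i++) {
                String tag = tags[s][i];
                String word = words[s][i].toLowerCase();
                transitions.computeIfAbsent(prev, k -> new HashMap<>()).merge(tag, 1.0, Double::sum);
                emissions.computeIfAbsent(tag, k -> new HashMap<>()).merge(word, 1.0, Double::sum);
                prev = tag;
            }
        }
        normalize(transitions);
        normalize(emissions);
    }

    // Replace each count in a row with log(count / row total).
    private static void normalize(Map<String, Map<String, Double>> table) {
        for (Map<String, Double> row : table.values()) {
            double total = 0;
            for (double count : row.values()) total += count;
            final double rowTotal = total;
            row.replaceAll((key, count) -> Math.log(count / rowTotal));
        }
    }

    // Viterbi decoding: keep the best score per tag at each word, plus back pointers.
    List<String> tag(String[] sentence) {
        Map<String, Double> scores = new HashMap<>();
        scores.put(START, 0.0);
        List<Map<String, String>> back = new ArrayList<>();

        for (String raw : sentence) {
            String word = raw.toLowerCase();
            Map<String, Double> next = new HashMap<>();
            Map<String, String> pointers = new HashMap<>();
            for (Map.Entry<String, Double> cur : scores.entrySet()) {
                Map<String, Double> outgoing =
                        transitions.getOrDefault(cur.getKey(), Collections.emptyMap());
                for (Map.Entry<String, Double> t : outgoing.entrySet()) {
                    String candidate = t.getKey();
                    double emit = emissions.getOrDefault(candidate, Collections.emptyMap())
                            .getOrDefault(word, UNSEEN);
                    double score = cur.getValue() + t.getValue() + emit;
                    if (!next.containsKey(candidate) || score > next.get(candidate)) {
                        next.put(candidate, score);
                        pointers.put(candidate, cur.getKey());
                    }
                }
            }
            scores = next;
            back.add(pointers);
        }

        // Pick the best final tag, then follow the back pointers to the start.
        String best = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > scores.get(best)) best = e.getKey();
        }
        String[] path = new String[sentence.length];
        for (int i = sentence.length - 1; i >= 0 && best != null; i--) {
            path[i] = best;
            best = back.get(i).get(best);
        }
        return Arrays.asList(path);
    }

    public static void main(String[] args) {
        String[][] words = {{"the", "dog", "barks"}, {"a", "cat", "sleeps"},
                            {"dogs", "bark"}, {"the", "bark", "fell"}};
        String[][] tags = {{"DET", "N", "V"}, {"DET", "N", "V"},
                           {"N", "V"}, {"DET", "N", "V"}};
        HmmTaggerSketch tagger = new HmmTaggerSketch();
        tagger.train(words, tags);
        System.out.println(tagger.tag(new String[] {"the", "bark"}));  // [DET, N]
        System.out.println(tagger.tag(new String[] {"dogs", "bark"})); // [N, V]
    }
}
```

In the toy corpus, "bark" appears once as a noun and once as a verb, so the two calls in main show the transition probabilities resolving the ambiguity from context. A real tagger trained on the Brown corpus would need a more careful choice of unseen-word penalty than the fixed value assumed here.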