CSCI 5832: Natural Language Processing

Instructor Fall 2019: James Martin

Natural Language Processing

NLP is about getting computers to perform useful and interesting tasks involving spoken and written human language. NLP is sometimes referred to as Computational Linguistics to emphasize that it combines CS methods with research insights from Linguistics (the study of human language). Practical applications of NLP include question answering, machine translation, information extraction, and interactive dialog systems (both written and spoken). Modern NLP systems rely heavily on methods involving probability, linear algebra, and calculus, often in combination with machine learning methods.

Course Topics (Subject to Change)

  • Words, word counting, and lexicons
  • Probabilistic language modeling
  • Text classification with language models
    • Naive Bayes
  • Text classification with single-layer neural networks
    • Logistic Regression
  • Vector semantics (word embeddings)
    • Word2Vec
  • Part-of-speech tagging with Hidden Markov Models
    • Viterbi algorithm
  • Dependency parsing
    • Transition-based methods
  • Compositional Semantic Analysis
  • Deep learning models
    • Recurrent networks
    • Transformer networks
  • Information extraction
  • Question answering
  • Machine Translation

Readings

We’ll be using draft chapters from the 3rd Edition of Speech and Language Processing by Dan Jurafsky and James H. Martin. You should not need to buy the current edition; draft PDFs of the new chapters will be available from the textbook website. The week-to-week readings are on Moodle.

Grading

Grades will be based on a cumulative score as follows:
● Assignments (40%)
● Quizzes (30%; equally weighted)
● Final (30%)
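
As a rough illustration of how these weights combine, the sketch below computes a cumulative score in Python. The function name and the assumptions behind it are not from the syllabus: it assumes every score is expressed as a percentage from 0 to 100 and that assignments, like quizzes, are averaged with equal weight. The actual calculation used in the course may differ.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"assignments": 0.40, "quizzes": 0.30, "final": 0.30}


def cumulative_score(assignments, quizzes, final):
    """Return a cumulative score on a 0-100 scale.

    Assumptions (not stated in the syllabus): all scores are
    percentages, and assignments are equally weighted.
    """
    assignment_avg = sum(assignments) / len(assignments)
    quiz_avg = sum(quizzes) / len(quizzes)  # quizzes are equally weighted
    return (
        WEIGHTS["assignments"] * assignment_avg
        + WEIGHTS["quizzes"] * quiz_avg
        + WEIGHTS["final"] * final
    )
```

For example, assignment scores of 90 and 80, quiz scores of 70 and 90, and a final exam score of 85 would give 0.40 × 85 + 0.30 × 80 + 0.30 × 85 = 83.5 under these assumptions.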