CS 5134/6034: Natural Language Processing
University of Cincinnati
Fall 2024

Instructor: Tianyu Jiang
TA: Saptarshi Ghosh (ghosh2si at mail.uc.edu)
Time: MWF 9:00 - 9:55 am
Location: RECCENTR 3230
Office Hours:
     Tianyu Jiang, Mon 10 - 11 am, Rhodes Hall 889
     Saptarshi Ghosh, Tue 11 am - 12 pm, Rhodes Hall 850E (within CEAS library)

Course Description
This course provides an introduction to natural language processing (NLP). We will learn the fundamentals of different subfields within NLP and study the theoretical concepts and algorithms behind various NLP problems. Topics include text classification, language modeling, word embeddings, sequence tagging, syntactic parsing, semantic parsing, question answering, and others. By the end of this course, you will have a good understanding of the research questions and methods in different areas of NLP, and the skills to build NLP tools for new problems.

Grading
  • Assignments (3): 30%
  • Midterm Exam (in-class): 30%
  • Project: 40%
  • Bonus: 5% (class attendance)

  • Late Policy: 24-hour grace period with 10% penalty. No points after 24 hours.
  • Regrading Policy: Regrade requests must be made within two weeks of the score being posted on Canvas.
  • Electronic Submission: All assignments and project reports need to be submitted electronically via Canvas.

Prerequisites
This course assumes a good background in basic probability, statistics, and linear algebra, as well as good programming skills in Python 3. Prior knowledge of machine learning is helpful, but not required. The class is mainly for advanced undergraduates and graduate students in computer science, but we welcome other interested students with the necessary background and programming skills.

Textbook
Dan Jurafsky and James Martin. Speech and Language Processing, 3rd Edition (Aug 20, 2024 draft).

Schedule
Week  Date   Topic                               Reading              Assignment
1     08/26  Welcome
      08/28  Introduction to NLP                 Ch. 1 & 2
      08/30  Morphology                          Ch. 2
2     09/02  NO CLASS (Labor Day)                                     a1 out
      09/04  N-gram Language Models              Ch. 3
      09/06  N-gram Language Models contd.
3     09/09  Naive Bayes                         Ch. 4
      09/11  Logistic Regression                 Ch. 5
      09/13  Logistic Regression contd.
4     09/16  Part-of-Speech Tagging              Ch. 17               a1 due, a2 out
      09/18  HMM
      09/20  Viterbi
5     09/23  Sequence Labeling                   Ch. 17
      09/25  Sequence Labeling contd.
      09/27  Lexical Semantics                   Ch. 6 & Appendix G
6     09/30  Distributional Representations      Ch. 6                a2 due
      10/02  Word Embeddings                     Ch. 6
      10/04  Neural Networks for NLP             Ch. 7
7     10/07  Recurrent Neural Network            Ch. 8                proposal due, a3 out
      10/09  Recurrent Neural Network contd.     Ch. 8
      10/11  Reading Day
8     10/14  Machine Translation                 Ch. 13
      10/16  Seq2Seq                             Ch. 13
      10/18  NO CLASS
9     10/21  Midterm Exam                                             a3 due
      10/23  Attention                           Ch. 9
      10/25  Attention contd.
10    10/28  Transformers                        Ch. 9
      10/30  Transformers contd.
      11/01  Pre-train and Fine-tune             Ch. 10 & 11
11    11/04  Pre-train and Fine-tune contd.                           intermediate report due
      11/06  Large Language Models               Ch. 10 & 11
      11/08  Prompting and In-Context Learning   Ch. 12
12    11/11  Veterans Day
      11/13  NO CLASS
      11/15  NO CLASS
13    11/18  Project Presentations
      11/20  Project Presentations
      11/22  Project Presentations
14    11/25  Project Presentations
      11/27  Project Presentations
      11/29  Thanksgiving
15    12/02  Project Presentations
      12/04  Project Presentations
      12/06  Project Presentations                                    slides & final report due

Project Resources
  • You can choose to reimplement or improve a work published at NLP conferences in the last 5 years (since 2019):
      ACL, EMNLP, NAACL

  • Or choose from the following tasks:
    1. Winograd Schema Challenge (WSC2016, WinoGrande2020)
    2. BrainTeaser QA (SemEval2024: Task 9)
    3. Hallucination Detection (SemEval2024: Task 6)
    4. Metonymy Detection (link)
    5. Structured Sentiment Analysis (SemEval2022: Task 10)