CS 5134/6034: Natural Language Processing
University of Cincinnati
Spring 2024

Instructor: Tianyu Jiang
TA: Dylan Hutson (hutsondm at mail.uc.edu)
Time: Tues & Thur, 9:30-10:50 am
Location: RECCENTR 3250
Office Hours:
     Tianyu Jiang, Tues 10:55-11:55 am, Rhodes Hall 889
     Dylan Hutson, Wed 2:00-3:00 pm, Rhodes Hall 850E (inside the CEAS library)

Course Description
This course provides a basic introduction to natural language processing (NLP). We will learn the fundamentals of different subfields within NLP and study theoretical concepts and algorithms for various NLP problems. Topics include text classification, language modeling, word embeddings, sequence tagging, syntactic parsing, semantic parsing, question answering, and others. By the end of this course, you will have a good understanding of the research questions and methods in different areas of NLP and the skills to build NLP tools for new problems.

Grading
  • Assignments (4): 40%
  • Midterm Exam (in-class): 25%
  • Project: 35%
  • Bonus: 5% (class attendance)

  • Late Policy: 24-hour grace period with a 10% penalty. No points after 24 hours.
  • Regrading Policy: Regrade requests must be made within two weeks of the score being posted on Canvas.
  • Electronic Submission: All assignments and project reports must be submitted electronically via Canvas.

Prerequisites
This course assumes a good background in basic probability, statistics, and linear algebra, and good programming skills in Python 3. Prior knowledge of machine learning is helpful but not required. The class is mainly for advanced undergraduates and graduate students in computer science, but we welcome other interested students with the necessary background and programming skills.

Textbook
Dan Jurafsky and James Martin. Speech and Language Processing, 3rd Edition (Feb 3, 2024 draft).

Schedule (Tentative)
Week  Date   Topic                           Reading    Assignment
1     01/09  Introduction to NLP             Ch. 1 & 2
      01/11  Morphology                      Ch. 2
2     01/16  N-gram Language Models          Ch. 3      a1 out
      01/18  N-gram contd.                   Ch. 3
3     01/23  Naive Bayes                     Ch. 4
      01/25  Logistic Regression             Ch. 5
4     01/30  Part-of-Speech Tagging          Ch. 8      a1 due, a2 out
      02/01  HMM and Viterbi                 Ch. 8
5     02/06  Sequence Labeling               Ch. 8      project instructions out
      02/08  Lexical Semantics               Ch. 6
6     02/13  Distributional Representations  Ch. 6      a2 due
      02/15  Word Embeddings                 Ch. 6
7     02/20  Neural Networks for NLP         Ch. 7      proposal due, a3 out
      02/22  Recurrent Neural Network        Ch. 9
8     02/27  Machine Translation & Seq2Seq   Ch. 13
      02/29  Attention                       Ch. 9 & 10
9     03/05  NO CLASS                                   a3 due
      03/07  Midterm Exam
10    03/12  Spring Break
      03/14  Spring Break
11    03/19  Post-exam Review                           intermediate report due, a4 out
      03/21  Transformers                    Ch. 10
12    03/26  Pre-train and Fine-tune         Ch. 11
      03/28  Pre-train and Fine-tune contd.  Ch. 11
13    04/02  Large Language Models                      a4 due
      04/04  Project Presentations
14    04/09  Senior Design Day - NO CLASS
      04/11  Project Presentations
15    04/16  Project Presentations
      04/18  Project Presentations                      slides due
      04/19  Classes End Day                            final report due

Project Resources
  • You can choose to reimplement or improve a paper published at an NLP conference in the last 5 years (since 2018):
         ACL, EMNLP, NAACL

  • Or choose one of the following tasks:
    1. Hate Speech Detection (ISHate Dataset)
    2. Winograd Schema Challenge (WSC2016, WinoGrande2020)
    3. BrainTeaser QA (SemEval2024: Task 9)
    4. Clickbait Challenge (SemEval2023: Task 5)
    5. Structured Sentiment Analysis (SemEval2022: Task 10)