04833510 Computer Vision and Deep Learning
EECS, Peking University
Spring 2022

Instructor:

Teaching Assistant:

Location: Room 302, Teaching Building 2, Peking University

Time: Friday 15:10 - 17:00 (every week)

Office hours: By appointment. Send an email or WeChat message to the instructor or a TA to arrange a face-to-face Q&A session on course-related questions.


Schedule of Lectures

Date Topics
Feb 25, 2022 Introduction - I
  • Course logistics
  • Introduction to computer vision: illustrative applications and demos
March 4, 2022 Introduction - II
  • Basics of machine learning
  • Deep Learning: history, key concepts, back-propagation, neural layers etc.
March 11, 2022 Visual Recognition - I
  • Visual Recognition: Task Definition and Challenges
  • Visual Features: Harris Corner, SIFT, MSER, HOG etc.
  • Bag-of-words Models
  • Spatial Pyramid Matching
  • Pyramid Match Kernel
  • Vocabulary Tree
  • Sparse Coding
March 18, 2022 Visual Recognition - II
  • Deep Learning for Visual Recognition: LeNet-5, AlexNet, VGG-16, GoogLeNet, ResNet
  • Network Visualization
March 25, 2022 Object Detection - I
  • V-J Face Detector (Integral Image, AdaBoost, Cascade)
  • HOG+SVM with NMS
  • Deformable Part Model (DPM) for Pedestrian Detection
April 1, 2022 Object Detection - II
  • R-CNN
  • Fast R-CNN
  • Faster R-CNN
  • R-FCN
  • Multi-Scale R-CNN
  • Feature Pyramid Network
April 8, 2022 Pixel Computing - I
  • Pixel Labeling: Segmentation, Matting, Parsing
  • Unsupervised Image Segmentation: K-means, Mean-Shift, Normalized Cut
  • Interactive Object Cutout: GraphCut, GrabCut, LazySnapping
  • Image Matting: Poisson Matting, Closed-Form Matting, Robust Color Sampling
  • Image Co-segmentation
  • Image Inpainting / Image Completion
  • Visual Parsing
April 15, 2022 Pixel Computing - II
  • Deep Pixel Labeling: FCN, DeepLab, SegNet, CNN-as-RNN, HRNet
  • Human Pose Estimation: Bottom-Up and Top-Down
April 22, 2022 Recurrent Deep Networks / Non-Local Models
  • Unrolling Computational Graph
  • RNN variants (recurrent through output, sequence-input-single-output, teacher forcing, encoder-decoder, bi/quad-directional RNN etc.)
  • Long short-term memory (LSTM)
  • Transformer and its variants
  • DETR, MLP-Mixer
  • Applications
April 29, 2022 Video Computing
  • Introduction to Video Computing Tasks
  • Video Features (STIP, Deep Video, C3D, Trajectory Feature)
  • Optical flow
  • Deep Learning for Video Classification (multi-stream fusion techniques)
  • Video Event Detection and Action Detection
  • An Illustrative System for Video Classification
May 6, 2022 Holiday - no class
May 13, 2022 Visual Tracking
  • Lucas-Kanade
  • Mean-shift
  • KLT
  • Kalman filter
  • Modern visual tracking methods
May 20, 2022 3D Vision
  • Camera models
  • Camera calibration
  • Essential and fundamental matrices
  • Epipolar geometry
  • Stereo
  • SLAM
  • Image-based rendering
  • Neural rendering models
May 27, 2022 Vision-Language Learning / Generative Models
  • Image and Video Captioning
  • Image and Video Synthesis
  • Visual Search with Natural Language
  • Language-based Visual Navigation
  • Generative adversarial networks (GAN)
  • Variational autoencoder (VAE)
  • Autoregressive models and flows
June 3, 2022 Holiday - no class
June 10, 2022 Course Project Presentation TBA


Textbook:

References:


Course Work

Final Grade
  • Grading will be based on homework (25%), a survey (25%) and a course project (50%).
  • The end-of-term grade is curved. Your overall grade will depend on your performance relative to your classmates.
Survey & Project
  • The instructor will provide a list of candidate topics for writing the survey.
  • A course project is mandatory for all students. The only requirement is that the topic be related to computer vision.