04833510 Computer Vision and Deep Learning

Instructor:

Prof. Yadong Mu : myd@pku.edu.cn

Teaching Assistants:

Mr. LI Qiwei : lqw@pku.edu.cn
Mr. Chi Haozhe : haozhe@stu.pku.edu.cn

Location: Room 506, Teaching Building 3, Peking University

Time: Friday 15:10 - 17:00 (every week)

Office hours: Send an email or WeChat message to the instructor or TAs to arrange a face-to-face, course-related Q&A session.

Schedule of Lectures

Date	Topics
March 6, 2026	Introduction - I Course logistics Introduction to computer vision: illustrative applications and demos
March 13, 2026	Introduction - II Basics of machine learning Deep Learning: history, key concepts, back-propagation, neural layers etc.
March 20, 2026	Visual Recognition - I Visual Recognition: Task Definition and Challenges Visual Features: Harris Corner, SIFT, HOG etc. Bag-of-words Models Spatial Pyramid Matching Pyramid Match Kernel Vocabulary Tree Sparse Coding
March 27, 2026	Visual Recognition - II Deep Learning for Visual Recognition: LeNet-5, AlexNet, VGG-16, GoogLeNet, ResNet, DenseNet Network Visualization
April 3, 2026	Object Detection - I V-J Face Detector (Integral Image, AdaBoost, Cascade) HOG+SVM with NMS Deformable Part Model (DPM) for Pedestrian Detection
April 10, 2026	Object Detection - II ZFNet (i.e., Zeiler and Fergus (2013)) R-CNN Fast R-CNN Faster R-CNN YOLO, SSD Feature Pyramid Network
April 17, 2026	Pixel Computing - I Pixel Labeling: Segmentation, Matting, Parsing Unsupervised Image Segmentation: K-means, Mean-Shift, Normalized Cut Interactive Object Cutout: GraphCut, GrabCut, LazySnapping Image Matting: Poisson Matting Image Co-segmentation Image Inpainting / Image Completion
April 24, 2026	Pixel Computing - II Deep Pixel Labeling: FCN, DeepLab, SegNet, CNN-as-RNN, HRNet Human Pose Estimation: Bottom-Up and Top-Down
May 1, 2026	Holiday - no class
May 8, 2026	Sequential Models Unrolling Computational Graph RNN variants (recurrent through output, sequence-input-single-output, teacher forcing, encoder-decoder, bi/quad-directional RNN etc.) Long short-term memory (LSTM) Transformer and its variants DETR, MLP-Mixer Applications
May 15, 2026	Video Computing Introduction of Video Computing Tasks Video Features (STIP, Deep Video, C3D, Trajectory Feature) Optical flow Deep Learning for Video Classification (multi-stream fusion techniques) Video Event Detection An Illustrative System for Video Classification Video Moment Localization Vision-Language Grounding in Videos
May 22, 2026	3D Computer Vision - I Epipolar geometry Camera calibration Image rectification Stereo Structure from motion
May 29, 2026	3D Computer Vision - II Image-based rendering Neural rendering models
	Visual Tracking Mean-shift KLT Kalman filter More visual tracking methods
June 5, 2026	Image / Video Generation Generative adversarial networks (GAN) Variational autoencoder (VAE) Autoregressive models and flows Vision-Language Foundation Models
June 12, 2026	Vision-Robot Learning Basics of reinforcement learning Autonomous driving Robot control

Textbook:

There is no textbook for this course.

References:

Jean Ponce, David Forsyth, Computer Vision: A Modern Approach, Prentice Hall, 2011 (main reference)
Richard Szeliski, Computer Vision: Algorithms and Applications, Springer-Verlag, 2011 (main reference)
Simon Prince, Computer Vision: Models, Learning, and Inference, Cambridge University Press, 2012 (main reference)
Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016
Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006
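Richard Hartley, Andrew Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press
Gang Xu, Zhengyou Zhang, Epipolar Geometry in Stereo, Motion and Object Recognition: A Unified Approach
Yi Ma, Stefano Soatto, Jana Kosecka, S. Shankar Sastry, An Invitation to 3-D Vision, Springer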
Zhi-hua Zhou, Machine Learning (in Chinese), 2016
CVPR / ICCV / ECCV / NIPS / ICML / ICLR proceedings
Domestic conferences: VALSE, PRCV