CPS842 - Information Retrieval and Web Search
Course Management Form
Fall 2011
Basic Information:
Instructor: Dr. Cherie Ding
Office: ENG258
Phone: 416-979-5000 x6965
Email: cding@scs.ryerson.ca
Office Hrs.: Thursday 14:00 - 17:00 (ENG258)
Lectures: Tuesday 15:00 - 16:00 (ILC100) & Thursday 10:00 - 12:00 (VIC300)
Labs: Wednesday 8:00 - 9:00 (ENG201)
Description:
This course discusses basic information retrieval (IR) models, evaluation methods, the state of the art in search engines, and new trends in web search. Topics include basic IR models, indexing, query operations, evaluation, categorization and clustering, web search, link analysis, web crawling, and web mining. After completing this course, students will have acquired the core techniques for building text retrieval systems, hands-on experience in building the core parts of a web-based search engine, and knowledge of IR applications on the World Wide Web.
Main textbook:
- Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition), Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 2010, ISBN 9780321416919
Other References:
- Search Engines: Information Retrieval in Practice, Bruce Croft, Donald Metzler, and Trevor Strohman, Addison-Wesley, 2009, ISBN 0136072240
- Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008, ISBN 9780521865715
- Information Retrieval: Algorithms and Heuristics, David A. Grossman and Ophir Frieder, Springer, 2004, ISBN 1402030045
- Managing Gigabytes: Compressing and Indexing Documents and Images, Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Morgan Kaufmann, 1999, ISBN 1558605703
Tentative Schedule:
- Introduction (Chapter 1)
- IR models: Boolean, Vector Space, Probabilistic (Chapter 3: 3.1, 3.2)
- Document Pre-Processing, Indexing/Searching (Chapter 6: 6.6, Chapter 9)
- Evaluation of IR Systems (Chapter 4)
- Relevance Feedback and Query Expansion (Chapter 5)
- User Interfaces for Search (Chapter 2)
- Documents, Queries (Chapters 6 & 7)
- Web Search (Chapter 11)
- Web Crawling (Chapter 12)
- New Topics for IR: Recommender Systems, Social Network Analysis, etc.
Evaluation:
Item | Percent | Tentative due date
Homework 1 | 5% | Oct. 18
Homework 2 | 5% | Nov. 22
Lab Assignment 1 | 10% | Oct. 6
Lab Assignment 2 | 10% | Nov. 3
Project | 15% | Dec. 1
Mid-Term Exam | 25% | Oct. 20
Final Exam | 30% | During exam period
General Information and Class Policies:
- There are two homework assignments. Each will be posted one week before its due date, and their main purpose is to help students prepare for the exams.
- Labs start in the third week (Sept. 21), and there are 10 labs in total. The first 7 labs should be used to complete the two lab assignments, and the last 3 labs can be used for the course project. Assignment demonstrations will also be scheduled during lab hours.
- There are two lab assignments. Each can be done individually or in a group of two. The assignments are cumulative, with each one building on the results of the previous one, so groups should remain stable. A student may change partners only when necessary (e.g., a group member drops the course); in this case, a written request is required, and the instructor will evaluate the situation and make a decision.
- A list of ideas for the course project will be posted online. Students can either choose from this list or propose their own project; in the latter case, a written description of the proposed project must be approved by the instructor. The project can be done in groups of up to three students. Some of the code from the assignments can be reused in the project, so try to make your programs modular and reusable.
- Late submissions of lab assignments or the project will be penalized. The penalty is 10% for the first day late and 25% for the second day; submissions will not be accepted after that. Dates are subject to change as agreed in class.
- The midterm exam will be held in the 7th week, or as agreed in class. It will be an in-class exam covering all topics from the first 6 weeks. The final exam will cover the topics from the whole semester.
- To pass the course, you must obtain at least 50% on the midterm and final exams.
- Copied work (both the copy and the original) will be given a grade of zero. Involvement in plagiarism can ultimately result in course failure and/or expulsion from the University in accordance with the Student Code of Academic Conduct.
- Grades on tests and assignments will be available on Blackboard. As per Ryerson regulations, final grades will be disclosed only by the Registrar's Office.
All course materials will be posted on Blackboard. Students are responsible for checking the Blackboard site regularly. Any modifications to the above course procedures shall be made in consultation with the students.