IS 733: Data Mining — Spring 2021

Course Details

Times: Tuesday 7:10pm – 9:40pm

Location: Online (Blackboard Collaborate)

Instructor: Nirmalya Roy

Instructor’s Office Location and Hours: ITE 421 Tuesday 9:00 am – 10:00 am, or by appointment

Instructor’s Email: nroy at umbc dot edu

Course Description:

This course will provide an in-depth understanding of the technical, business, and research issues in the area of data mining, including classification, clustering, association rules, visualization, and data warehousing. New areas of research and development in data mining will also be discussed.

Student learning outcomes: By the end of this course, you will be able to:

  • apply a variety of data mining techniques to real-world situations,

  • select appropriate strategies for each step in the data mining process, and

  • discuss the underlying theoretical principles behind data mining methods, and the practical implications of these.

Poll Everywhere: Vote on in-class poll questions at PollEv.com/nirmalyaroy910. Register your account for the course by week 2 in order to receive participation credit.

Course Prerequisites:

  • Requires IS 620, or consent of the instructor.

  • Required knowledge: basic programming ability in a high-level language such as Java, Python, R, or Matlab. No previous background in data mining is required. Although the course will be relatively non-technical, a basic understanding of elementary concepts in continuous and discrete mathematics will be needed (linear algebra, Boolean logic, graphs and trees, ...).

Required Textbook:

Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition (Witten et al.) is the primary textbook. You will need this book for course readings. Until you obtain it, the UMBC library has an electronic copy for online access that you can use. Earlier editions of the textbook are acceptable but not recommended. Some material will be missing, and it will be up to you to convert chapter/section/page numbers to the older edition for the required readings.

Course Requirements and Grading:

  • Homeworks 25% (4 - 5 homeworks)

  • Group projects 35%:

    • Proposal 5% (due 2/23/2021)

    • Mid-term report 5% (due 4/6/2021)

    • Group project poster 10% (presented in class 5/11/2021, digital copy due at the same time)

    • Final report 15% (due Friday 5/14/2021)

  • Final 35%

  • Participation 5% (Poll questions -- 4%, Blackboard participation -- 1%)

The project will be done in groups of 2. Project proposals are to be sent to me by email, and approved by the deadline.

In this course, participation means more than just showing up. It also refers to contributing to everyone's learning, through active engagement in peer instruction exercises, in-class discussions, and BB questions/answers. Participation grades will be assessed as a percentage of peer instruction questions answered (correctly or not), with a 90% response rate being sufficient for full points, and by BB contributions. Two or more contributions (either questions or answers) on BB will earn you 1% of the final grade.
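As a rough illustration of how the weights above combine, the following Python sketch computes final points out of 100. The function names are hypothetical, the linear scaling of poll credit below a 90% response rate is an assumption, and so is awarding no Blackboard credit for fewer than two contributions; the syllabus only states that 90% is sufficient for full points and that two or more contributions earn the 1%.

```python
# Weights (in percentage points) taken from the grading breakdown above.
WEIGHTS = {
    "homework": 25.0,
    "project": 35.0,    # proposal 5 + mid-term report 5 + poster 10 + final report 15
    "final": 35.0,
    "poll": 4.0,
    "blackboard": 1.0,
}


def poll_credit(answered: int, asked: int) -> float:
    """Fraction of poll credit earned.

    A 90% response rate counts as full credit. Scaling linearly below
    90% is an assumption, not something the syllabus specifies.
    """
    if asked <= 0:
        return 0.0
    return min((answered / asked) / 0.9, 1.0)


def blackboard_credit(contributions: int) -> float:
    """Full credit for two or more contributions; otherwise none (assumed)."""
    return 1.0 if contributions >= 2 else 0.0


def final_points(homework: float, project: float, final_exam: float,
                 answered: int, asked: int, contributions: int) -> float:
    """Combine component scores (each a fraction in [0, 1]) into points out of 100."""
    return (
        WEIGHTS["homework"] * homework
        + WEIGHTS["project"] * project
        + WEIGHTS["final"] * final_exam
        + WEIGHTS["poll"] * poll_credit(answered, asked)
        + WEIGHTS["blackboard"] * blackboard_credit(contributions)
    )


if __name__ == "__main__":
    # Example: 80% on homeworks, 90% on the project, 70% on the final,
    # 18 of 20 poll questions answered, 2 Blackboard contributions.
    print(final_points(0.8, 0.9, 0.7, answered=18, asked=20, contributions=2))
```

This sketch stops at numeric points. It does not map points to letter grades, because no numeric cutoffs are given here.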

With respect to final letter grades, UMBC's Catalog states that an A indicates "superior" achievement; B, "good" performance; C, "not satisfactory"; D, "unacceptable"; F, "failure." There is specifically no mention of any numerical scores associated with these letter grades. Below is how grades may be assigned based on your final points, accumulated over the semester. Grades will be assigned using a plus/minus system. It is university policy that A+, D+, and D- are not assigned. I do not grade on a curve, so that everyone in the class has the opportunity to succeed.

Homework and Exam Policies:

  • Homeworks are due at the beginning of class on the dates specified.

  • Late Homeworks will not be accepted unless an extension is approved by me in advance. Requests for extensions must be made at least three days before the due date. 2 points will be deducted for each day after the submission deadline.

  • If you request to change a submitted HW file (subject to my approval), the change must be made within 72 hours after the submission deadline. Modified answers will be evaluated at 50% of the original score.

  • In the event of class cancellation due to inclement weather, any hard-copy paper assignment or test will be due in the next class meeting. Electronic submissions will still be due on the original due date.

Online Instruction

This course will be taught online via Blackboard Collaborate, with synchronous lectures at the scheduled time. You can access lectures by navigating to the course on Blackboard, then clicking on the "Bb Collaborate" tab. Lectures will be recorded on Blackboard for later viewing; however, participation during the scheduled time is expected, and participation in polls during the lessons counts toward your grade.


Instructional Methods

Traditional lectures will be augmented with active learning methods, primarily in the form of peer instruction exercises. Research has strongly indicated that active learning improves student outcomes in STEM fields versus traditional lecturing (Freeman et al., 2014). We will be using the Poll Everywhere service for polls and quizzes.

Pre-class reading assignments will be given for each lesson. These readings are very important for learning and for making the best use of our limited time together (a partially "flipped classroom" approach), and are therefore required.

Schedule

| Week | Date | Topic | Details | Assessment | Notes |
| --- | --- | --- | --- | --- | --- |
| 1 | 1/26 | Course overview, introduction to data mining | Applications, the data mining process, data mining ethics | | |
| 2 | 2/2 | Know your data | Instances and attributes, plotting and visualization | | Witten et al., Ch 2. Know your data |
| 3 | 2/9 | Data preprocessing | Data cleaning, integration, transformation, reduction, discretization. Principal components analysis. | | |
| 4 | 2/16 | Data Mining Tutorial using Python and Google Colab | Project brainstorming. | HW1 due, HW2 posted. Project groups formed by this date | |
| 5 | 2/23 | Data Warehousing | OLAP vs OLTP, data cubes. Sharing project ideas | Project Proposal due | |
| 6 | 3/02 | Knowledge representation | Linear models, trees, rules. | HW2 due (extended to March 5) | Witten et al., Ch 3. Knowledge representation |
| 7 | 3/09 | Supervised learning | Decision trees, decision rules. Naive Bayes, logistic regression, support vector machines. | | Witten et al., Ch 4.2 first subsection only, 4.6 (up to and including Logistic Regression), 7.2 (up to and including Nonlinear Class Boundaries). Supervised learning |
| 8 | 3/16 | Spring Break | | | |
| 9 | 3/23 | Supervised learning (continued) | | | |
| 10 | 3/30 | Evaluation of supervised learning | Hold-out method, cross validation, ROC curves | HW3 due on April 2 (extended) | Evaluation of supervised learning. Witten et al., Ch 5 (up to and including 5.5, can skip The Bootstrap, 5.8 up to and including ROC Curves, can skip Lift Charts) |
| 11 | 4/6 | Ensemble methods | Bagging, boosting, random forests, stacking | Project mid-term progress report due. Project report guidelines. HW4 posted | Ensemble methods. Witten et al., Ch 12 up to 12.4, Ch 12.7 |
| 12 | 4/13 | Unsupervised learning | Association rule learning, K-means, hierarchical clustering | | Unsupervised learning. Witten et al., Ch 4.5, 4.8 |
| 13 | 4/20 | Recommender systems | Content filtering, collaborative filtering | HW4 due | Recommender systems. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8) |
| 14 | 4/27 | Research project progress pitch [5-minute oral presentation] | Exam Review | | |
| 15 | 5/4 | Exam | | | |
| 16 | 5/11 | Group Poster Presentation | | Digital copies of posters due. Project final report due Tuesday 05/18/2021 (11:59pm) | |

The schedule may be subject to change. The summary and details columns are only a guideline of the content likely to be covered, and the dates on which material is covered may shift.

Research Projects

| Team No | Team Members | Project Topic/Titles | Devices/Datasets |
| --- | --- | --- | --- |
| 1 | Sahara Ali, Muhammad Hasan Ferdous, and Ebuka Ekwomchi | Analyzing effects of communal mobility on Covid-19 spread and mortality rate in the United States. | |
| 2 | Bryan Dilone and Brahmani Thota | The Study of Opinion Mining on Amazon Electronics | Amazon’s product data |
| 3 | Len Mancini and David Fahnestock | Study the impact of COVID on ridership and rider patterns in micro-mobility | |
| 4 | Geetham Godavarthi, Rajashekara Chennareddy, and Supriya Gadhiraju | Stock Market Analysis | |
| 5 | Karan Dhamecha, Namrata Kelkar, and Sree Purani | Predicting Fetal Risk Using Cardiotocographic Data | |
| 6 | Nathan Antonicelli and Monica Lingham | Sentiment analysis in Politics | |
| 7 | Leslie Leslie and Guy-Alain Aurelien | Free Vacation, Click Here! | |
| 8 | Ira Winkler and Henry Ejim | Phishing Susceptibility By Message Lure Type | |

COVID-19 Policies

Please see this Google doc for UMBC Policies and Resources during COVID-19.

Software

This course will make extensive use of the free, open source WEKA data mining toolkit.

Academic Integrity

UMBC's policies on academic integrity will be strictly enforced (see the University System of Maryland's policy document, UMBC's academic integrity overview page, the student academic conduct policy and the UMBC catalog). In particular, all of your work must be your own. Acknowledge and cite source material in your papers or assignments. While you may verbally discuss assignments with your peers, you may not copy or look at anyone else's written assignment work or code, or share your own solutions. Any exceptions will result in a zero on the assessment in question, and may lead to further disciplinary action. Some relevant excerpts from UMBC's policies, as linked to above, are:

  • "By enrolling in this course, each student assumes the responsibilities of an active participant in UMBC's scholarly community in which everyone's academic work and behavior are held to the highest standards of honesty. Cheating, fabrication, plagiarism, and helping others to commit these acts are all forms of academic dishonesty, and they are wrong." (UMBC's academic integrity overview)

  • "Students shall not submit as their own work any work which has been prepared by others." (USM policy document)

  • "Students shall refrain from acts of cheating and plagiarism or other acts of academic dishonesty." (USM policy document)

  • "Plagiarism means knowingly, or by carelessness or negligence, representing as one's own, in any academic exercise, the intellectual or creative work of someone else." (student academic conduct policy)

  • "Cheating means using or attempting to use unauthorized material, information, study aids, or another person’s work in any academic exercise" (student academic conduct policy)

Accessibility and Disability Accommodations, Guidance and Resources

Accommodations for students with disabilities are provided for all students with a qualified disability under the Americans with Disabilities Act (ADA & ADAAA) and Section 504 of the Rehabilitation Act who request and are eligible for accommodations. The Office of Student Disability Services (SDS) is the UMBC department designated to coordinate accommodations that would create equal access for students when barriers to participation exist in University courses, programs, or activities.

If you have a documented disability and need to request academic accommodations in your courses, please refer to the SDS website at sds.umbc.edu for registration information and office procedures.

SDS email: disAbility@umbc.edu

SDS phone: (410) 455-2459

If you will be using SDS approved accommodations in this class, please contact me (instructor) to discuss implementation of the accommodations. During remote instruction requirements due to COVID, communication and flexibility will be essential for success.

Counseling Center

Diminished mental health can interfere with optimal academic performance. The source of symptoms might be related to your course work; if so, please speak with me. However, problems with other parts of your life can also contribute to decreased academic performance. UMBC provides cost-free and confidential mental health services through the Counseling Center to help you manage personal challenges that threaten your personal or academic well-being.

Remember, getting help is a smart and courageous thing to do -- for yourself and for those who care about you. For more resources get the Just in Case mental health resources Mobile and Web App. This app can be accessed by clicking: counseling.umbc.edu/justincase.

The UMBC Counseling Center is in the Student Development & Success Center (between Chesapeake and Susquehanna Halls). Phone: 410-455-2472. Hours: Monday-Friday 8:30am-5:00pm.


Diversity Statement on Respect

Students in this class are encouraged to speak up and participate during our meetings. Because the class will represent a diversity of individual beliefs, backgrounds, and experiences, every member of this class must show respect for every other member of this class. (Statement from California State University, Chico’s Office of Diversity and Inclusion).


Family Educational Rights and Privacy Act (FERPA) Notice

Please note that as per federal law I am unable to discuss grades over email. If you wish to discuss grades, please come to my office hours.

Sexual Assault, Sexual Harassment, and Gender Based Violence and Discrimination

UMBC’s Policy on Sexual Misconduct, Sexual Harassment and Gender Discrimination and Federal Title IX law prohibit discrimination and harassment on the basis of sex in University programs and activities. Any student who is impacted by sexual harassment, sexual assault, domestic violence, dating violence, stalking, sexual exploitation, gender discrimination, pregnancy discrimination, gender-based harassment or retaliation should contact the University’s Title IX Coordinator to make a report and/or access support and resources:

Mikhel A. Kushner, Title IX Coordinator (she/her/hers)

410-455-1250 (direct line), kushner@umbc.edu

You can access support and resources even if you do not want to take any further action. You will not be forced to file a formal complaint or police report. Please be aware that the University may take action on its own if essential to protect the safety of the community.

If you are interested in or thinking about making a report, please see the Online Reporting Form. Please note that, while University options to respond may be limited, there is an anonymous reporting option via the online form and every effort will be made to address concerns reported anonymously.

Notice that Faculty are Responsible Employees with Mandatory Reporting Obligations:

All faculty members are considered Responsible Employees, per UMBC’s Policy on Sexual Misconduct, Sexual Harassment, and Gender Discrimination. Faculty are therefore required to report possible violations of the Policy to the Title IX Coordinator, even if a student discloses something they experienced before attending UMBC.

While faculty members want you to be able to share information related to your life experiences through discussion and written work, students should understand that faculty are required to report Sexual Misconduct to the Title IX Coordinator so that the University can inform students of their rights, resources and support.

If you need to speak with someone in confidence, who does not have an obligation to report to the Title IX Coordinator, UMBC has a number of Confidential Resources available to support you:

Other Resources:

Child Abuse and Neglect:

Please note that Maryland law and UMBC policy require that I report all disclosures or suspicions of child abuse or neglect to the Department of Social Services and/or the police.

Campus Resources