MIS 431 Master Syllabus


MIS 431: Data Mining for Business Applications Master Syllabus


Course Instructor:
Office Number:
Office Hours:
Email:
Course Meeting Times:


Course Description

The business models of the top six companies by market capitalization (Apple, Microsoft, Amazon, Alphabet, Tesla, and Facebook) rely on one secret ingredient: DATA. Their ability to stay at the top will also be determined by how well they "mine" that data. Data mining is the art of extracting useful information from large amounts of data. There is a surfeit of data of all kinds, and organizations are struggling to extract the insights that will guide them in making smart business decisions. As future managers, you need to understand the possibilities and limitations of data mining. This course introduces you to the tools and theories required to transform data into insights. It also introduces you to programming in R, a versatile tool that can set you on a path toward data science. In summary, this course will enable you to approach business problems data-analytically, envision data mining opportunities, and scope out data analytics projects that better support business decisions.

This course is also a Writing Intensive (WI) Course in Major in the Mason Core. WI courses further integrate rhetorical and field-specific knowledge as students engage the specific writing, critical thinking, and problem-solving methods of their chosen fields across a range of academic, professional, and civic contexts. The students need to fulfill the requirements of WI courses through course projects. All WI courses should explicitly meet both the WI learning outcomes and the WI course criteria. More information about these expectations is available on the WAC Program’s website.


Course Objectives

Upon completion of this course, students will be able to:

  • Apply data mining techniques to address data management challenges and support business decisions for organizations. 
  • Use R proficiently, applying advanced programming techniques and applied statistics to prepare data, visualize complex datasets, and build machine learning models. 
  • Identify and implement various supervised and unsupervised machine learning algorithms to improve business decisions. 
  • Scope and implement an end-to-end data mining project, and write a comprehensive report detailing the business needs and final recommendations.

Textbooks for Reference

  • R for Data Science (RDS) 
    Hadley Wickham and Garrett Grolemund 
    The authors have made the contents of the book available for free online. 
    Also available on the O’Reilly Learning Platform (Safari) through the GMU library databases. 
     
  • An Introduction to Statistical Learning: with Applications in R (ISLR) 
    Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani 
    The authors have made the PDF version of the book available for free at Introduction to Statistical Learning (statlearning.com). 
    Also available through the Springer Textbooks link in the George Mason University library databases. 
     
  • Statistical Inference via Data Science: A ModernDive into R and the Tidyverse (MD) 
    Chester Ismay and Albert Y. Kim 
    The authors have made the contents of the book available for free online.

Students may benefit immensely from following the recommended lessons in these texts, but they can also learn actively using the videos, tutorials, online courses, and in-class exercises designed specifically for this course.


Course-specific Software

  • R and RStudio
    • RStudio is an open-source data science and statistical computing application available both as a desktop program and as a cloud-based application.
    • An instance will be made available in RStudio Cloud for completing tutorials, in-class exercises, and assignments. For long-term learning, however, students are encouraged to download and use the desktop versions of R and RStudio. The link for signing up for RStudio Cloud (Posit Cloud) will be available in Canvas.
    • Specific instructions for installing R and RStudio will be provided.
  • DataCamp
    • DataCamp offers interactive R and Python courses on topics in data science, statistics, and machine learning. Students can learn through short video tutorials and interactive exercises from within their web browser. Courses on DataCamp do not require any software installation to complete. Students in MIS 431 have been granted access to all DataCamp courses for 6 months. DataCamp will serve as a tool for MIS 431 students to enable learning programming concepts through hands-on interactive exercises. In MIS 431, students are required to complete three DataCamp courses during the semester. Each course requires approximately 4 to 6 hours to complete.
    • Links to the description of the courses are provided below.

Students will receive an access link to join the MIS 431 team in DataCamp through Canvas. After clicking the link, students will be prompted to create an account. Students must enroll with their George Mason e-mail address (username@gmu.edu or username@masonlive.gmu.edu) and enter their name as it appears in George Mason records.


Course Website

Canvas will be used for this course. Log in and click on the “Courses” tab, where you will see the MIS 431 course. NOTE: Your username and password are the same as for your George Mason email account. You must have consistent access to an internet connection in order to complete the quizzes and assignments in this course through Canvas. See the technology requirements for the Costello College of Business in your Canvas course menu, which detail the minimum technology requirements.


Participation

The course will be conducted through a combination of lectures, exercises, quizzes, tutorials, and individual course projects. 

Learning can only happen when you are playing an active role. It is important to place more emphasis on developing your insights and skills, rather than transmitting information. Knowledge is more important than facts and definitions. It is a way of looking at the world, an ability to interpret and organize future information. An active learning approach will more likely result in long-term retention and better understanding because you make the content of what you are learning concrete and real in your mind. 

Although an active role can look different for different individuals, it is expected in this class that you will work to explore issues and ideas under the guidance of the professor and your peers. You can do this by reflecting on the content and activities of this course, asking questions, striving for answers, interpreting observations, and discussing issues with your peers. 

The course has specific participation requirements, which are explained in later sections.


Assignments

Students are expected to complete and submit all course assignments on the dates scheduled. Accepting an assignment after the scheduled close of the class on its due date is at the sole discretion of the instructor. If you will be unable to complete and submit an assignment by the due date and time, you must obtain the instructor’s approval prior to the deadline. The instructor will deduct 10% for every day that the assignment is late.

Technical Issues: Students should anticipate some technical difficulties during the semester and should, therefore, budget their time accordingly. Late deliverables will not be excused based on individual technical issues. If you experience unexpected technical issues during an exam, immediately take screenshots or pictures of the error screens with timestamps, gather any other available evidence, and contact me as soon as possible.
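
To make the late deduction concrete, here is a minimal sketch of the arithmetic. It rests on assumptions the policy above does not spell out: that the 10% is taken from the assignment's maximum points, that lateness is counted in whole days, and that a score cannot drop below zero. The function name `late_score` is hypothetical, and the instructor's actual calculation may differ.

```python
def late_score(raw_score, max_points, days_late):
    """Return the score after a 10%-per-day late deduction.

    Assumptions (not stated in the syllabus): the deduction is 10% of the
    assignment's maximum points per whole day late, and the result is
    floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = max_points * 10 * days_late / 100
    return max(0.0, raw_score - penalty)


# Example: a 30-point homework that earned 27 points, submitted two days late.
# The penalty is 2 days x 3 points, so 6 points are deducted.
print(late_score(27, 30, 2))
```

Under these assumptions, an assignment that is ten or more days late earns no points regardless of its quality.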



Guidance for Writing Projects

This course is designated as writing intensive. I will provide guidance on writing the projects, review your initial draft along with a Teaching Assistant, and provide feedback on your writing before the final submission. It is important that you start working on the final project early so that you can incorporate this feedback into your final version.


Evaluation

Item                                Points Each   Number   Subtotal
Assignment
  DataCamp Courses                  20            3        60
  Homework Assignments              30            3        90
  In-Class Exercises                3             10       30
Quiz
  Online Quiz (top 6 out of 10)     10            6        60
Writing Project
  Mid-Term Data Analysis Project    80            1        80
  Final Project                     150           1        150
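
As a rough illustration of how the points above add up, the sketch below totals the subtotals and shows one way to read the "top 6 out of 10" quiz rule, namely that the six highest quiz scores are summed and the rest are dropped. The names `COMPONENTS`, `total_points`, and `quiz_points` are hypothetical, the sample quiz scores are invented for the example, and Canvas may compute the quiz total differently.

```python
# (points each, number counted) for every graded item in the evaluation table.
COMPONENTS = {
    "DataCamp Courses": (20, 3),
    "Homework Assignments": (30, 3),
    "In-Class Exercises": (3, 10),
    "Online Quiz (top 6 out of 10)": (10, 6),
    "Mid-Term Data Analysis Project": (80, 1),
    "Final Project": (150, 1),
}


def total_points(components):
    """Sum points-each times number-counted across all graded items."""
    return sum(each * count for each, count in components.values())


def quiz_points(scores, keep=6):
    """Sum the highest `keep` quiz scores (assumed reading of the top-6 rule)."""
    return sum(sorted(scores, reverse=True)[:keep])


print(total_points(COMPONENTS))  # 470

# Invented scores for ten 10-point quizzes; two were missed entirely.
sample_quizzes = [10, 8, 0, 9, 7, 10, 6, 0, 9, 5]
print(quiz_points(sample_quizzes))  # 53
```

Read this way, up to four missed or weak quizzes do not lower the quiz subtotal.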

Grading Criteria

The instructor reserves the right to apply a grading curve based on the overall performance of the class. The scale below is the default grading scheme in Canvas.

Letter Grade Range
A 94 to 100%
A- 90 to < 94%
B+ 87 to < 90%
B 83 to < 87%
B- 80 to < 83%
C+ 77 to < 80%
C 73 to < 77%
C- 70 to < 73%
D+ 67 to < 70%
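
The scale above can be read as a lookup on lower bounds. The sketch below is one hedged way to express it, assuming percentages are not rounded before the lookup and that no curve has been applied. The names `GRADE_FLOORS` and `letter_grade` are hypothetical, and because the scale lists no letter below 67%, the function returns `None` in that range rather than guessing.

```python
# Lower bound (percent) for each letter in the scale above, highest first.
GRADE_FLOORS = [
    (94, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"),
]


def letter_grade(percent):
    """Map a course percentage to a letter using the listed lower bounds.

    Assumes no rounding: 93.99 falls in the "90 to < 94%" band.
    Returns None below 67%, where the scale lists no letter.
    """
    for floor, letter in GRADE_FLOORS:
        if percent >= floor:
            return letter
    return None


print(letter_grade(93.99))  # A-
print(letter_grade(87))     # B+
print(letter_grade(60))     # None
```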

Course Schedule

Note: This schedule is tentative and subject to change as necessary. The due dates for quizzes, homework, and projects may shift based on the pace of our class, so please monitor Canvas closely for the actual due dates.

Weekly Schedule

Week Date Lessons Targets
Lesson 1 21-Jan

Lesson 1: Introduction to Data Mining

  • Introduction to R Programming 
  • Setup R and RStudio 
  • Quiz 0 (Bonus), due 23-Jan
  • Software 
  • DataCamp 
  • Set up R and RStudio
  • Access to RStudio Cloud 
  • Course Participation
Lesson 2 28-Jan

Lesson 2: Data Mining Process 

  • Scoping of data-analysis projects 
  • Intermediate R programming 
  • Functions and Tidyverse
  • DataCamp 
  • Introduction to R 
  • Quiz 1
Lesson 3 4-Feb

Lesson 3: Data Preparation and Cleaning 

  • Introduction to data analysis with dplyr 

Lesson 4: Data Decisions 

  • Introduction to making decisions and iterations
  • Quiz 2
     
  • DataCamp 
  • Introduction to the Tidyverse
Lesson 4 11-Feb
  • Intermediate data analysis with dplyr
  • Quiz 3 
Lesson 5 18-Feb

Lesson 5: Working with Multiple Datasets 

  • Relational data– Combine datasets 
  • Reshaping and pivoting data with tidyr
  • DataCamp 
  • Exploratory Data Analysis in R
Lesson 6 25-Feb

Lesson 6: Data Visualization 

  • Data Visualization with ggplot2 

Lesson 7: Statistics for Data Mining- I 

  • Quiz 4
     
  • Homework #1
Lesson 7 4-Mar
  • Probability 
  • Distributions 
  • Descriptive Statistics
  • Quiz 5
Lesson 8 18-Mar

Lesson 8: Statistics for Data Mining- II 

  • Central Limit Theorem 
  • Inferential Statistics 
  • Resampling
  • Quiz 6
Lesson 9 25-Mar

Lesson 9: Introduction to Machine Learning 

  • Model Training Process 
  • Data Preprocessing and Feature Engineering for Machine Learning 
  • Cross Validation
  • Midterm Data Analysis Project
Lesson 10 1-Apr

Lesson 10: Supervised Learning Algorithms- I 

  • Linear Regression 
  • One predictor 
  • Multiple predictors
  • Quiz 7
Lesson 11 8-Apr

Lesson 11: Supervised Learning Algorithms- II 

  • Introduction to Classification 
  • Logistic Regression 
  • Assessing Model Fit
  • Homework #2
  • Quiz 8
Lesson 12 15-Apr

Lesson 12: Supervised Learning Algorithms- III 

  • Discriminant Analysis  
  • LDA, QDA 
  • K-Nearest Neighbors (KNN)
  • Quiz 9
Lesson 13 22-Apr

Lesson 13: Supervised Learning Algorithms- IV 

  • Decision Trees and Random Forests 
  • Hyperparameter tuning with grid search
  • Quiz 10
  • Homework #
Lesson 14 29-Apr

Lesson 14: Introduction to Unsupervised Learning 

  • Principal Components Analysis 
  • K-means Clustering
  • Final Project

Tentative Due Dates

Item Due Date
Quiz 1 28-Jan
Quiz 2 4-Feb
Quiz 3 11-Feb
Quiz 4 25-Feb
Quiz 5 4-Mar
Quiz 6 18-Mar
Quiz 7 1-Apr
Quiz 8 8-Apr
Quiz 9 15-Apr
Quiz 10 22-Apr
   
DataCamp
Introduction to R 30-Jan
Introduction to the Tidyverse 6-Feb
Exploratory Data Analysis in R 20-Feb
   
Homework 1 27-Feb
Homework 2 10-Apr
Homework 3 24-Apr
   
Midterm Project 27-Mar
Final Project 2-May
