USC DSCI 553 - Foundations & Applications of Data Mining - Spring 2024 - Prof. Wei-Min Shen

KayvanShah1/usc-dsci553-data-mining-sp24

DSCI553 - Foundations & Applications of Data Mining - Spring 2024 - USC

Welcome to the repository for DSCI 553 - Foundations & Applications of Data Mining, taught by Professor Wei-Min Shen at USC during the Spring 2024 semester. It collects the coursework materials, including assignment and project solutions, organized into folders, one per module covered in the course.

Tip

Before exploring the materials, take a moment to review the license and disclaimer so that you use them responsibly. The repository covers a range of topics and offers hands-on experience in Data Mining.

Course Details:

  • Course Name: DSCI 553 - Foundations & Applications of Data Mining
  • Instructor: Prof. Wei-Min Shen
  • Semester: Spring 2024

Feel free to explore the assignments, projects, and solutions provided as learning aids. Whether you're a beginner or an experienced practitioner, this repository aims to be your companion in learning the fundamentals of data mining within Data Science & Engineering. Happy learning!

Caution

Please note that this repository is a reference guide, meant as a tool for learning and comprehension. Do not use it for plagiarism of any kind. Use the material to deepen your understanding and build your skills in Data Mining.

Table of contents

| Assignment | Topic Covered | Grade |
| --- | --- | --- |
| HW 1 | Data Exploration of Yelp Reviews Dataset with Spark RDD | 7/7 |
| HW 2 | Implement SON Algorithm to find Frequent Itemsets using Spark and exploration of Ta Feng Dataset | 7/7 |
| HW 3 | Build Hybrid Recommendation systems integrating Item-based Collaborative Filtering and Model-based approaches using XGBRegressor | 7/7 |
| HW 4 | Building Graphs and Community Detection based on GraphFrames and Girvan-Newman algorithm | 7/7 |
| HW 5 | Data Streaming Analysis - Bloom Filter, Flajolet-Martin, and Reservoir Sampling | 7/7 |
| HW 6 | Clustering using Bradley-Fayyad-Reina (BFR) algorithm on synthetic dataset | 7/7 |
| Competition | Recommendation System on Yelp Reviews Dataset | 8/8 |
| Quizzes | PDF documents with question banks for quizzes | |
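
As a taste of the streaming topics listed for HW 5, here is a minimal sketch of reservoir sampling (the classic "Algorithm R"). It is a generic illustration, not the solution stored in this repository: the function name `reservoir_sample` and its `seed` parameter are assumptions made for this example, and the actual assignment may specify a different interface and random-number handling.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration only; the stream length does not
    need to be known in advance and only k items are kept in memory.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n: draw a slot in
            # [0, n) and replace only if it lands inside the reservoir.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 user ids from a simulated stream of 10,000 ids.
    print(reservoir_sample((f"user_{i}" for i in range(10_000)), k=5, seed=553))
```

The sketch keeps memory bounded at `k` items regardless of stream length, which is the property that makes the technique useful for streaming data.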

References

  1. USC DSCI 553 Fall 2023 - rutujabhandigani/DSCI553-Data-Mining
  2. USC DSCI 553 Fall 2022 - CyL97/DSCI-553
  3. USC DSCI 553 Fall 2021 - Shayne-Yang/DSCI_553
  4. USC DSCI 553 Spring 2021 - pohann/DSCI553

Authors

  1. Kayvan Shah | MS in Applied Data Science | University of Southern California

LICENSE

This repository is licensed under the BSD 3-Clause License. See the LICENSE file for details.

Disclaimer

The content and code provided in this repository are for educational and demonstrative purposes only. The project may contain experimental features, and the code might not be optimized for production environments. The authors and contributors are not liable for any misuse, damages, or risks associated with the use of this code. Users are advised to review, test, and modify the code to suit their specific use cases and requirements. By using any part of this project, you agree to these terms.