DS340W.001 Applied Data Sciences

General Guideline

The project is organized as a series of milestones: (i) a research topic proposal, (ii) a midterm report, and (iii) a final report. Each team will also present its project to the class in the last week of the semester.

Format:

  • For LaTeX, please simply use \documentclass[12pt]{article}. Do not change the margin or font.
  • For Word, please use the default Word settings (Cambria font 12, single space). Do not change the margin or font.

Recommended reading:

 Project proposal

Choose one of the following project topics:

  1. Sentiment Analysis on Movie Reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/
  2. Store Item Demand Forecasting Challenge: https://www.kaggle.com/c/demand-forecasting-kernels-only
  3. Kobe Bryant Shot Selection: https://www.kaggle.com/c/kobe-bryant-shot-selection/
  4. Humpback Whale Identification Challenge: https://www.kaggle.com/c/whale-categorization-playground
  5. New York City Taxi Fare Prediction: https://www.kaggle.com/c/new-york-city-taxi-fare-prediction

Then, write a project proposal which contains:

  1. Project Title. Team Name. Team Members. (Names and ID)
  2. Project Summary.  It should be a short description of what your project is.
  3. Project Description.
    1. Problem.
      1. Problem definition. What data science problem is the project aimed at? Be sure it is clearly, precisely, and carefully stated.
      2. Significance. Why is it an interesting and important problem? Which industries and companies may be interested in your solution?
      3. Dataset description. What kind of data will be involved? Where do they originally come from? What does each data sample look like?
      4. Evaluation metrics. How will the performance of a method be evaluated?
    2. Literature review.
      1. What techniques do people often use to tackle this type of problem? List at least three main references. Briefly describe the methods in each reference. Note: The main references must be in the form of books, reports, or research papers. Although you may seek additional help from other sources such as online forums and Wikipedia, they do not count as main references.
    3. Technical approaches.
      1. What are the key technical challenges of this problem?
      2. A high-level description of the initial approach(es) you plan to try. Why do you think such approach(es) would be successful? Be sure to provide enough background for people to judge your claims (e.g., similarities to your main references).
    4. Plan of your team work.
      1. How often to meet?
      2. How to coordinate the team work.

Requirements:

  • The report must be written as an essay. No Q&A style. The questions above serve only as guidance on what content should be provided.
  • The report must include a log of group activities (30 points for individual grade adjustment). See Grading Criteria below for an example group activity log.
  • Submit a single .docx or .pdf file to the appropriate Dropbox in Canvas. Only one person in each team needs to make the submission. Don't make duplicate submissions.

Midterm report 

Each team is required to submit a report describing the team's progress on the course project. The report should have the following sections:

  1. Introduction: Introduce the data science problem you aim to tackle, and why it is important and interesting.
  2. Related work: List at least three main references related to your problem. Briefly describe the methods in each reference. Note: The main references must be in the form of books, reports, or research papers. Although you may seek additional help from other sources such as online forums and Wikipedia, they do not count as your main reference.
  3. Approaches: Explain all the methods you have tried. For each method:
    1. Use appropriate math notation to describe the input data and the output of your method.
    2. Use appropriate math expressions, diagrams, and/or graphic figures to explain step-by-step how your method transforms the input data into the output. Be sure to provide a clear explanation for each math symbol you use.
    3. If your method includes parameters or variables that need to be computed from the available data (e.g., via machine learning), explain how they are computed (e.g., what is your loss function? What optimization method or algorithm are you using to obtain the optimal values?)
    4. Describe any additional implementation details of your method (e.g., learning rate, number of epochs).
  4. Experiments: The midterm report must include at least one experiment you have done. The experiment does not need to be successful (i.e., achieve high accuracy), but you should have attempted something.
    1. Describe what data you use for the experiment. Try to visualize your data and summarize some important characteristics.
    2. Describe the evaluation metrics.
    3. Report the performance of your method(s). For your best performance, please include the screenshot of your Kaggle submission website so we know this is the actual result submitted through the Kaggle system.
    4. Analyze the results. Give some insights about the pros and cons of your method(s). For example: How does your method work on particular data samples? Is there any sample on which it works very well? When does your method fail? When necessary, include tables and/or figures to help illustrate your results.
  5. Future plan: Describe how you plan to improve your current method or other methods you want to try.
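
As an illustration of the level of notation the Approaches section asks for, here is a minimal sketch for a hypothetical multi-class classifier. Everything in it (the softmax linear model, the symbols $W$, $b$, $\eta$, and the cross-entropy loss) is an assumption chosen for the example, not a required method; your own report should define whatever symbols fit your approach.

```latex
% Input data and output (hypothetical setup)
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}, \qquad
x_i \in \mathbb{R}^{d}, \qquad y_i \in \{1, \dots, K\}

% Model: maps a feature vector x to K class probabilities
f_\theta(x) = \operatorname{softmax}(W x + b), \qquad
\theta = (W, b), \quad W \in \mathbb{R}^{K \times d}, \quad b \in \mathbb{R}^{K}

% Loss function: average cross-entropy over the training set
L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log \big[ f_\theta(x_i) \big]_{y_i}

% Optimization: gradient descent with learning rate \eta
\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)
```

Here $n$ is the number of training samples, $d$ the number of features, $K$ the number of classes, and $[\,\cdot\,]_{y_i}$ selects the predicted probability of the true class.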

Requirements:

  • The report must be written as an essay. No Q&A style. The outline above serves only as guidance on what content should be provided.
  • The report must include a log of group activities (30 points for individual grade adjustment). See Grading Criteria below for an example group activity log.
  • Submit a single .zip file to the appropriate Dropbox in Canvas. Your submission should include (i) the report (.pdf or .docx) and (ii) a folder named “code” containing all your code (.py or .ipynb).
  • Only one person in each team needs to make the submission. Don't make duplicate submissions.
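
The .zip layout described above can be produced by hand, but a short script makes it repeatable. The following is a minimal sketch, not an official tool: the function name build_submission and all file names in the usage note are hypothetical, and it assumes your source files live in a single local folder.

```python
import zipfile
from pathlib import Path


def build_submission(report: Path, code_dir: Path, out_zip: Path) -> list[str]:
    """Bundle one report and a code folder into a single .zip archive.

    Returns the archive member names in the order they were written.
    """
    if report.suffix.lower() not in {".pdf", ".docx"}:
        raise ValueError("report must be a .pdf or .docx file")

    names = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # The report sits at the top level of the archive.
        zf.write(report, arcname=report.name)
        names.append(report.name)

        # Source files go under a folder named "code", keeping subfolders.
        for path in sorted(code_dir.rglob("*")):
            if path.is_file() and path.suffix in {".py", ".ipynb"}:
                arcname = (Path("code") / path.relative_to(code_dir)).as_posix()
                zf.write(path, arcname=arcname)
                names.append(arcname)
    return names
```

For example, build_submission(Path("report.pdf"), Path("src"), Path("team_submission.zip")) would write report.pdf plus every .py and .ipynb file found under src into the archive; files of other types are skipped.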

Final report

The final report should have the following sections: 

  1. Introduction: Introduce the data science problem you aim to tackle, and why it is important and interesting.
  2. Related work: List at least three main references related to your problem. Briefly describe the methods in each reference. Note: The main references must be in the form of books, reports, or research papers. Although you may seek additional help from other sources such as online forums and Wikipedia, they do not count as your main reference.
  3. Approaches: Explain all the methods you have tried. For each method:
    1. Use appropriate math notation to describe the input data and the output of your method.
    2. Use appropriate math expressions, diagrams, and/or graphic figures to explain step-by-step how your method transforms the input data into the output. Be sure to provide a clear explanation for each math symbol you use.
    3. If your method includes parameters or variables that need to be computed from the available data (e.g., via machine learning), explain how they are computed (e.g., what is your loss function? What optimization method or algorithm are you using to obtain the optimal values?)
    4. Describe any additional implementation details of your method (e.g., learning rate, number of epochs).
  4. Experiments: For final report grading, more emphasis will be placed on (i) the best result you obtained on Kaggle, and (ii) the analysis of your results.
    1. Describe what data you use for the experiment.
    2. Describe the evaluation metrics.
    3. Report the performance of your method(s). For your best performance, please include the screenshot of your Kaggle submission website.
    4. Analyze the results. Give some insights about the pros and cons of your method(s). For example: How does your method work on particular data samples? Is there any sample on which it works very well? When does your method fail? When necessary, include tables and/or figures to help illustrate your results.
  5. Conclusion: Summarize the results and discuss what you have learned throughout the project.
  6. References

Requirements:

  • The report must be written as an essay. No Q&A style. The outline above serves only as guidance on what content should be provided.
  • The report must include a log of group activities (30 points for individual grade adjustment). See Grading Criteria below for an example group activity log.
  • Submit a single .zip file to the appropriate Dropbox in Canvas. Your submission should include (i) the report (.pdf or .docx) and (ii) a folder named “code” containing all your code (.py or .ipynb).
  • Only one person in each team needs to make the submission. Don't make duplicate submissions.

Final presentation instructions

 Time: 8-min presentation + 2-min Q&A. The total time should be limited to 10 minutes.

Suggested outline:

  • Introduce the problem and dataset. If you are the first team to present a given project, please say a little more about the problem (i.e., input, output, evaluation metrics) so that students working on a different project can better understand the context. If you are not the first team to present, please still include this part in your slides, but you may skip slides that the first team already covered. See the team order below.
  • Describe what methods you have tried and how these methods perform. For your best performance, please include the screenshot of your Kaggle submission website so we know this is the actual result submitted through the Kaggle system.
  • Analyze your results and discuss why the performance is good or bad.
  • Discuss what you have learned through this project and potential future directions.

Notes:

  • All students are required to attend the classes. However, because the presentation time is short, it is fine to have only one person present for the team.
  • Please practice and make sure your presentation is limited to 8 minutes. Make sure you cover the important parts, especially the method with the best performance.

Presentation time and order:

  • Tuesday 4/28: Teams 3, 5, 8, 9, 10, 11 (Sentiment Analysis on Movie Reviews)
  • Thursday 4/30: Teams 1, 2, 4, 6, 7, 12 (Kobe Bryant Shot Selection, New York City Taxi Fare Prediction, Humpback Whale Identification Challenge)

 


Grading Criteria

Group Report (70 points)

Grading is based on the requirements of each report.

Individual Contribution and Participation (30 points)

You should provide an activity log for each project report.

Team member responsibilities (15 points): The report should clearly indicate the responsibility of each group member. If possible, give a table to describe who was responsible for which part of the report, who wrote which section, who coordinated the group work activities, etc. 

 Here is an example of a group activity table:

Team Member | Responsibilities
Jane Doe | Problem formulation, data collection, literature survey, report writing
John Smith | Problem formulation, method design, experiment 1
Jane Smith | Problem formulation, method design, experiment 2

 A group member who contributed to the group project report can receive the full 15 points.

Group activities (15 points): The report should also provide logs of your group activities, including when the group met, what the group did, and who attended.

Here is an example of a group activity log table:

Date | Activity | Attendance
1/20/2019 11:00 – 11:45 AM | Discussing project ideas | all team members
1/23/2019 1:00 – 1:45 PM | Discussing report structure | all members, except John Doe

A group member who attended more than 80% of group activities will get 15 points. If attendance was between 50% and 80%, 10 points will be given. Below 50%, no points will be given.
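
The attendance rule above can be written as a small function. This is only a sketch of how the thresholds might be applied, not the official grading script: the function names are hypothetical, and treating exactly 50% and exactly 80% as falling in the 10-point band, and awarding 0 responsibility points to a non-contributor, are assumptions, since the text does not say.

```python
def attendance_points(attended: int, total: int) -> int:
    """Return the group-activity points (out of 15) for one team member."""
    if total <= 0 or not 0 <= attended <= total:
        raise ValueError("need total > 0 and 0 <= attended <= total")

    # Compare with integer arithmetic to avoid floating-point edge cases.
    if attended * 100 > 80 * total:   # more than 80%
        return 15
    if attended * 100 >= 50 * total:  # 50% through 80% (assumed inclusive)
        return 10
    return 0                          # below 50%


def individual_points(contributed: bool, attended: int, total: int) -> int:
    """Return the individual score (out of 30) for one report.

    Assumption: a member who did not contribute gets 0 of the
    15 responsibility points.
    """
    responsibility = 15 if contributed else 0
    return responsibility + attendance_points(attended, total)
```

For instance, a member who contributed to the report and attended 7 of 10 meetings would receive 15 + 10 = 25 points under these assumptions.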
