
IST 557 Data Mining: Techniques and Applications

Final project report 


The final report should look like a short paper that summarizes everything you learned through the project. It should have the following parts:

  1. Introduction. A brief description of the problem, datasets, input/output.
  2. Method. A detailed description of the method that produced your best performance. Include all the details about features and parameters so that other people can reproduce the results.
  3. Results.
    (1) A brief description of other baseline methods (these could be the methods you used for check points #1 and #2) or variants of your own best method with different features or parameters.
    (2) A table summarizing the results of all the methods (baseline methods and your best method).
    (3) Discuss the performance in the table, e.g., why are some methods better than others?
    (4) A screenshot of your best method performance (should be the same as the one included in check point #3).
  4. Summary.
    (1) Summarize what you learned on the technical side.
    (2) Summarize what you learned throughout the project (e.g., teamwork, communication, finding resources, something you wish you had known earlier, your future plans, etc.)
  5. Team contribution.
    Briefly mention each member's contributions to the project.

What to submit:

  1. project-final.pdf that contains the report (remember to include the screenshot)

  2. only one student needs to make the submission for the team 

Submit the report to Canvas by 11:59pm on Sunday, 12/15. No late submissions will be accepted.

Project check point #3 (final result) 

Report:

  1. The best performance. Show your best performance so far. Include a screenshot of your Kaggle submission page (public leaderboard) so we know this is the actual result submitted through the Kaggle system. Use the assigned team name shown below. If there is no screenshot, the score for this submission will be 0.
  2. The best method. Describe the method that produces the best performance.
  3. Other methods you tried after check point #2. Briefly describe the methods you tried and how they performed. Discuss why the performance is good or bad.
  4. Summary.
  5. Team member contributions. Briefly mention each member's contributions. 

What to submit:

  1. project3.pdf that contains the report (remember to include the screenshot)

  2. project3.zip that contains your source code files for the best method

  3. only one student needs to make the submission for the team 

Submit the report to Canvas by 11:59pm on Sunday, 12/8. No late submissions will be accepted. No extensions will be granted to any team.

 

Project check point #2 (midterm) 

Report:

  1. The best performance. Show your best performance so far. Include a screenshot of your Kaggle submission page (public leaderboard) so we know this is the actual result submitted through the Kaggle system. Use the assigned team name shown below. If there is no screenshot, the score for this submission will be 0.
  2. The best method. Describe the method that produces the best performance.
  3. Other methods you tried after check point #1. Briefly describe the methods you tried and how they performed. Discuss why the performance is good or bad.
  4. Summary and plans.
  5. Team member contributions. Briefly mention each member's contributions.

What to submit:

  1. project2.pdf that contains the report (remember to include the screenshot)

  2. project2.zip that contains your source code files for the best method

  3. only one student needs to make the submission for the team 

Submit the report to Canvas by 11:59pm on Sunday, 11/17. No late submissions will be accepted. No extensions will be granted to any team.


Project check point #1 (getting started) 

Report:

  1. The best performance. Show your best performance so far. Include a screenshot of your Kaggle submission page so we know this is the actual result submitted through the Kaggle system. Use the assigned team name shown below.
  2. The best method. Describe the method that produces the best performance.
  3. Other methods you tried. Briefly describe the methods you tried and how they performed. Discuss why the performance is good or bad.
  4. Summary and plans.
  5. Team member contributions. Briefly mention each member's contributions.

What to submit:

  1. project1.pdf that contains the report

  2. project1.zip that contains your source code files for the best method

  3. only one student needs to make the submission for the team

Submit the report to Canvas by 11:59pm on Sunday, 10/27. No late submissions will be accepted. No extensions will be granted to any team.

 



Results

Project A 

Rank  Check point 1  Check point 2  Check point 3
1     0.96743        0.91864        0.88398
2     0.98537        0.92394        0.899
3     1.00281        0.9246         0.90544
4     1.03252        0.944427       0.9068
5     1.10094        0.96633        0.91157
6     1.22867        0.97134        0.91254
7     2.18099        0.99235        0.91662
8     10.11956       1.02138        0.99235
9     -              1.11038        0.99309

 


Project B
 

Rank  Check point 1  Check point 2  Check point 3
1     0.35725        0.23049        0.3683
2     0.37094        0.35123        0.37122
3     0.42462        0.37422        0.37544
4     0.4369         0.38658        0.37647
5     0.44034        0.38768        0.37648
6     0.53331        0.38821        0.38755
7     0.8697         0.40285        0.39159
8     -              0.45442        0.39422

 


Project C

Rank  Check point 1  Check point 2  Check point 3
1     0.98254        0.9867         0.9881
2     0.98186        0.98639        0.98797
3     0.97896        0.98563        0.98731
4     0.97778        0.98345        0.98703
5     0.97722        0.9831         0.98692
6     0.91471        0.98252        0.98648
7     -              0.98154        0.98211

 




 

Please use the team name assigned below when you make submissions on Kaggle.

Project A. Future sales

https://www.kaggle.com/c/competitive-data-science-predict-future-sales/

Team       Members                                  Check point 1        Check point 2        Check point 3
IST557_A1  Chandu Dasari, Vijay Vamsi Nadella       the best (0.96743)   -                    the best (0.88398)
IST557_A2  JInquan Zhang, Zitong Shang              -                    -                    -
IST557_A3  Mihir Mehta, Mudit Garg                  runner-up (0.98537)  runner-up (0.92394)  runner-up (0.899)
IST557_A4  Szu-Chi Kuan, Ting-Yao Hsu               -                    -                    -
IST557_A5  Tongyao Shen, Yizhi Liu                  -                    -                    -
IST557_A6  Jeongwon Jo, Guillermo Romera Rodriguez  -                    the best (0.91864)   -
IST557_A7  Sanjana Gautam, Vrushali Sanjay Girkar   -                    -                    -
IST557_A8  Sian Lee, Chieh-Yang Huang               -                    -                    -
IST557_A9  Sean Parsons, Iyadunni Adenuga           -                    -                    -

 

Project B. Taxi trip duration

https://www.kaggle.com/c/nyc-taxi-trip-duration  

Team       Members                                      Check point 1        Check point 2        Check point 3
IST557_B1  Dongqin Zhou, Amirreza Bagherzadehkhorasani  the best (0.35725)   runner-up (0.35123)  -
IST557_B2  Enyan Dai, Tianxiang Zhao                    -                    -                    -
IST557_B3  Junzhu Shen, Zihan Ren                       -                    -                    -
IST557_B4  Samarth Gupta, Rao Shivansh                  -                    -                    runner-up (0.3712)
IST557_B5  Xinyang Zhang, Ren Pang                      -                    -                    -
IST557_B6  Yu Liang, Duo Pan                            -                    -                    -
IST557_B7  Jiaqi Wang, Hua Shen                         -                    -                    -
IST557_B8  Kon Woo Kim, Yifan Yang                      runner-up (0.37094)  the best (0.23049)   the best (0.3683)

 

Project C. Toxic comment

https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview

Team       Members                                 Check point 1        Check point 2        Check point 3
IST557_C1  Chacha Chen, Shixiong Jing              -                    the best (0.9867)    runner-up (0.98797)
IST557_C2  Haomiao Ni, Jiachen Liu                 the best (0.98254)   -                    -
IST557_C3  Zeba Karishma, Pranav Narayanan Venkit  -                    -                    the best (0.9881)
IST557_C4  Anita Chen, Chi-Yang Hsu                -                    -                    -
IST557_C5  Rahul Katiki, So Young Choi             -                    runner-up (0.98639)  -
IST557_C6  Tanji Hwang                             -                    -                    -
IST557_C7  Xian Wu, Zhenpeng Lin                   runner-up (0.98186)  -                    -

 

 
