DS340W.001 Applied Data Sciences
Each team will sign up for a topic related to data sciences techniques in the context of real-world scenarios, and teams will take turns giving an in-depth presentation to the class.
Objective: Through case studies in multiple domains, the student presentations highlight two fundamental principles in data sciences: (i) data sciences solutions must carefully consider the problem in its real-world scenario, and (ii) predictive modeling can both inform and be informed by relevant knowledge (including theories, models, and frameworks) of the relevant domains. At the end of each talk, the class should know (1) what the problem is and why people care about it, (2) how the problem is formulated mathematically, (3) how relevant domain knowledge is integrated into the design of the data sciences techniques, and (4) how the solution is evaluated.
Time: 25-minute presentation + 5-minute Q&A. Students may also ask questions during the presentation. The total time is limited to 30 minutes.
Grading: The grading of your presentation will be based on (1) the peer evaluation of your presentation, and (2) the instructor's evaluation of your presentation.
- Peer evaluation form: special-topic-peer-eval.pdf
- Introduction ~ 5 min
- What is the problem? Try to give a formal definition of the problem.
- Why is this an important problem? What are the applications?
- Overview of the literature ~ 5 min
- Categories of the approaches. Try to group the literature into 2-3 categories. How and why did you categorize it this way?
- Alternatively, you can organize the literature around milestones. Try to identify 2-3 milestones. How are the other works related to the milestones?
- Case study – data sciences techniques ~ 10-15 min
- Pick the 1-2 most representative works or milestones.
- Discuss the methods in detail. Begin by teaching your classmates the baseline method they would use to solve this problem. Slow down and make this part very clear. If people do not grasp the idea of the baseline approach, it will be hard for them to follow the rest of the talk.
- Talk about the more complicated methods. What challenges do the proposed methods try to solve? What is the idea behind each proposed method? Are there any interesting experimental results or case studies?
- Summary and discussion ~ 5 min
- What are the limitations of existing approaches? What are the promising future directions?
- Your presentation should be very technical. Read the papers you will present many times, and read their source code if it is available.
- You should know the subject you present thoroughly and be prepared to answer any questions from the class.
- You should try to integrate all aspects of the topic into a coherent presentation, instead of dividing your presentation into disconnected parts.
- For each presentation, upload your slides to the corresponding Canvas dropbox before the class. Only one person in each team needs to make the submission. Don't make duplicate submissions.
- Presentation is an important skill. Practice it with your teammates, friends, or roommates.
- Don't go over time. An over-time presentation usually bores the audience. You can go through some slides faster if you feel you are about to run out of time. Again, practice will help you control the time better.
- Try to use figures and examples to illustrate your ideas. Dry text and formulas are boring, and you will lose the attention of your audience.
- Make sure you convey your content clearly. It is easy to assume, mistakenly, that the audience can follow you, so prepare your slides from the audience's point of view to make sure they understand. Make eye contact with the audience to see their reactions.
Requirements for students in the audience:
- You are required to come to the class. You need to take notes and evaluate the presentations.
- Your participation counts towards "class attendance" in the final grading.
Note: For each topic, we list two representative works for your reference. You may use them in your presentation as case studies, but feel free to choose other papers. Your literature survey should go beyond these two papers to provide a comprehensive overview of the field.
- Image Aesthetics and Computational Photography (Team 6)
- Datta, R., Joshi, D., Li, J., & Wang, J. Z. Studying aesthetics in photographic images using a computational approach. In European Conference on Computer Vision, pp. 288-301, 2006.
- Yao, L., Suryanarayan, P., Qiao, M., Wang, J. Z., & Li, J. Oscar: On-site composition and aesthetics feedback through exemplars for photographers. International Journal of Computer Vision, 96(3), 353-383, 2012.
- Natural Language Processing - Sentiment Analysis (Team 10)
- T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347-354, 2005.
- R. Socher, J. Pennington, E. Huang, A. Ng, C. Manning. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of Empirical Methods in Natural Language Processing, pp. 151-161, 2011.
- Natural Language Processing - Metaphor Detection (Team 5)
- Tsvetkov, Y., Boytsov, L., Gershman, A., Nyberg, E., & Dyer, C. Metaphor detection with cross-lingual model transfer. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1, pp. 248-258, 2014.
- Gao, G., Choi, E., Choi, Y., & Zettlemoyer, L. Neural Metaphor Detection in Context. arXiv preprint arXiv:1808.09653, 2018.
- Medical Image Segmentation (Team 2)
- T. Shen, H. Li, and X. Huang. Active volume models for medical image segmentation. IEEE Trans. Med. Imaging, vol. 30, no. 3, pp. 774–791, 2011.
- Xue, Y., Xu, T., Zhang, H., Long, L. R., and Huang, X. SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics, pp. 1–10, 2018.
- Machine Learning in Economics (Team 9)
- Blumenstock, J., Cadamuro, G., On, R. Predicting poverty and wealth from mobile phone metadata. Science 350 (6264), 1073–1076, 2015.
- Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., and Ermon, S. Combining satellite imagery and machine learning to predict poverty. Science 353 (6301), 790–794, 2016.
- Geometry in Photo Composition (Team 1)
- Zhou, Z., Farhat, F., & Wang, J. Z. Detecting dominant vanishing points in natural scenes with application to composition-sensitive image retrieval. IEEE Transactions on Multimedia, 19(12), 2651-2665, 2017.
- He, S., Zhou, Z., Farhat, F., & Wang, J. Z. Discovering triangles in portraits for supporting photographic creation. IEEE Transactions on Multimedia, 20(2), 496-508, 2018.
- Social Network (Team 4)
- Xu, X., Yuruk, N., Feng, Z., & Schweiger, T. A. SCAN: A structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 824-833, 2007.
- Wang, D., Pedreschi, D., Song, C., Giannotti, F., & Barabasi, A. L. Human mobility, social ties, and link prediction. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1100-1108, 2011.
- Game Playing (Team 12)
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489, 2016.
- Urban Computing - Traffic Prediction (Team 3)
- B. Pan, U. Demiryurek, and C. Shahabi. Utilizing real-world transportation data for accurate traffic prediction. In IEEE 12th International Conference on Data Mining (ICDM), pp. 595–604, 2012.
- H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of 33rd AAAI Conference on Artificial Intelligence, 2019.
- Speech Recognition (Team 7)
- D. A. Reynolds. Speaker identification and verification using Gaussian mixture speaker models. Speech Commun., 17(1-2):91–108, 1995.
- A. Hannun, C. Case, J. Casper, et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
- Scene Text Localization (Team 11)
- L. Neumann and J. Matas. Real-time scene text localization and recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3538–3545, 2012.
- D. He, X. Yang, C. Liang, Z. Zhou, A. G. Ororbia, D. Kifer, and C. L. Giles. Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 474–483, 2017.
- Machine Translation (Team 8)
- P.F. Brown, J. Cocke, V. Della-Pietra, S. Della-Pietra, J.D. Lafferty, R.L. Mercer, and P.S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics, vol. 16, no. 2, pp. 79–85, 1990.
- Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.