Goals
The course project offers you the opportunity to showcase your mastery of the course learning goals, encompassing both soft and hard skills. You will collaborate with other students to formulate and answer a research question of your choosing, based on a suitable dataset, and produce work that you would proudly present to a prospective employer.
The course project is a chance to challenge yourself and work with minimal guidance to explore new territories such as interactive plots, text analysis, and web apps.
Expectations
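Below are a few expectations for your project:
Professionalism: you are expected to produce a professional data-based report comparable in quality to New York Times and FiveThirtyEight reports. Some examples:
German election issues (https://fivethirtyeight.com/features/six-charts-to-help-americans-understand-the-upcoming-german-election/)
NYC Ubers (https://fivethirtyeight.com/features/uber-is-taking-millions-of-manhattan-rides-away-from-taxis/)
Trump tweets (http://varianceexplained.org/r/trump-tweets/)
Fortune cookies (http://fivethirtyeight.com/features/fortune-cookie-math/)
Crossword puzzle scandal (http://fivethirtyeight.com/features/a-plagiarism-scandal-is-unfolding-in-the-crossword-world/)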
Accessible to individuals with little data literacy
Self-contained, i.e., all project-related files, such as code and data, are in a single repo. Always use relative file paths, except for files hosted on the web, which are referenced by their absolute paths (URLs).
Published online via the provided GitHub Assignment repository
Team Size
You are expected to work in a team of 3-4 students.
Domain Area
Each team is free to select any domain area for their project, ranging from agriculture and biology to education and e-sports. The sole requirement is that the data analysis must be sophisticated enough that you could have a meaningful discussion about it with a prospective employer.
Structure
Your data story should include the following, all within context.
Motivation, research question, & background
What question are you trying to answer?
Why is it important or interesting?
What background information is necessary?
What assumptions, terms, and/or acronyms need to be clarified?
Data
Data collection
What was collected?
When was it collected?
Why was it collected?
How was it collected originally?
Who collected it?
Data acquisition
Where / how did you get the data?
What is the source?
Data understanding
How much data do you have?
What types of measurements?
Anything you needed to clean before getting started?
Data insights
It’s your job to explicitly identify and discuss key insights. Don’t simply present the audience with some code and output and expect them to do that work. Specifically address the following questions:
What are the important takeaways from the data? What was interesting?
Why do these takeaways matter?
Was there anything surprising?
Overall, what do you want the audience to walk away with? What do you want them to understand about your data and research questions?
Conclusions / Big Picture
How do the insights connect to answer your research question?
What improvements might someone make to your analysis?
Are there any limitations or weaknesses of your data / analysis?
Deliverables
Due Dates
Check the class Moodle page.
Where to Submit
Most project-related deliverables are submitted via the provided GitHub Classroom Assignment linked in Moodle.
Project Group GitHub Classroom Assignment
The project is a group (not individual) GitHub Classroom Assignment, meaning that all your team members will share the same (online) GitHub repository. Conflicts will therefore arise whenever multiple members edit the same files and try to push their changes to GitHub. To be on the safe side, each team should create a separate file for each member to do their own analysis in. Afterward, the team members should come together and decide how to combine the different pieces into the final report.
I do not expect the GitHub collaboration process to go smoothly. If you encounter an issue that you cannot resolve on your own in a short period of time, let the instructor know as soon as possible to avoid delaying your main task: the data analysis.
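The advice above can be tried out safely before touching the real repository. The following is a minimal sketch, not the required workflow: the teammate names (`Alice`, `Bob`), the file names (`analysis-alice.qmd`, `analysis-bob.qmd`), and the branch name `main` are all hypothetical, and a local bare repository stands in for the GitHub Classroom repository. It shows two members working in separate files, with the second member pulling the other's changes before pushing.

```shell
#!/bin/sh
# Sketch: two teammates share one remote and avoid conflicts by working
# in separate files. All names and paths here are hypothetical.
set -e

work=$(mktemp -d)

# A local bare repository stands in for the shared GitHub repository.
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

# Alice clones the (still empty) repository and adds her own file.
git clone --quiet "$work/remote.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Alice's analysis" > analysis-alice.qmd
git add analysis-alice.qmd
git commit --quiet -m "Add Alice's exploratory analysis file"
git push --quiet -u origin main

# Bob clones afterwards and works only in his own file.
git clone --quiet "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Bob's analysis" > analysis-bob.qmd
git add analysis-bob.qmd
git commit --quiet -m "Add Bob's exploratory analysis file"

# Meanwhile, Alice pushes more work to the shared repository.
cd "$work/alice"
echo "Summary statistics" >> analysis-alice.qmd
git commit --quiet -am "Extend Alice's analysis with summary statistics"
git push --quiet

# Bob's copy is now behind the remote. He pulls first, then pushes.
# Because the two members edited different files, nothing conflicts.
cd "$work/bob"
git pull --quiet --rebase
git push --quiet

commit_count=$(git --git-dir="$work/remote.git" rev-list --count main)
echo "Commits on the shared repository: $commit_count"
```

Pulling before pushing is what keeps the second push from being rejected. Had both members edited the same lines of the same file, the pull step is where the conflict would surface.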
Proposal
Each team must submit a proposal for their project in the form of a rendered HTML Quarto page added as an appendix to the project website. The proposal must include the following:
a title
the names of the team members
a short description of the project
the reasons/inspirations behind choosing this project
a rough implementation and responsibility plan, i.e., what needs to be accomplished, who will do what, and when. Think about the list of deliverables when building the plan.
Reflect
When done, each member needs to reflect on this part.
Sketch/Illustration
Each team must submit a rough sketch/illustration, also known as a low-fidelity prototype, depicting the expected data analysis process. Hand-drawn sketches/illustrations, which can be scanned, are encouraged. The purpose of the sketch is to provide visual insight into the process that will be followed when analyzing the data. The sketch should be linked from a rendered HTML Quarto page added as an appendix to the project website.
Reflect
When done, each member needs to reflect on this part.
Progress Presentation
Each team will be required to present their progress to the class multiple times throughout the semester to solicit feedback.
Demo Presentation
Each team will be required to demo their project to the class to solicit feedback before recording the final version that goes with the report.
Code, Report, Video, Presentation
Each team must push the code of their project to GitHub. Each repository must include a README.md file that includes clear instructions on:
requirements to run your analysis, e.g., required R version and packages
how to run the analysis
any known limitations that the analysis currently suffers from, e.g., known bugs or cases that the analysis cannot currently handle
resources that were referenced while doing the analysis
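screenshots of the top part of the generated report
Examples of Well-Structured README files
The awesome README GitHub page (https://github.com/matiassingers/awesome-readme?tab=readme-ov-file) lists examples of GitHub repositories with well-structured README files. Check some of them for inspiration; the Aimeos TYPO3 extension project repository (https://github.com/aimeos/aimeos-typo3#readme) is a good example.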
The report should be a rendered version of the one or more Quarto files used for the analysis. The rendered version should be part of the project website. See the structure section for details.
The video should be no longer than 5 minutes and should walk the viewer through the analysis process and the results. The video should be thought of as an alternative mode of the written report. The video should be uploaded to the team’s GitHub repository and linked in the report.
The presentation should be the one (or similar to the one) used in the video demonstration. A copy of the presentation should be uploaded to the team’s GitHub repository and linked in the report.
Evaluation and Reflection
Each team member will be required to evaluate and reflect on their own performance, as well as the performance of each of their teammates. Additionally, each student will be asked to evaluate the projects of the other teams.
Important Notes
Teamwork
Working in a team is a chance to improve your communication skills and to get to know your teammates better. However, working in a team sometimes poses challenges. To ensure a successful project outcome, below are a few expectations:
Active participation–be present, attend classes, do your work, keep your team informed about any unexpected events
Active listening–show interest in other team members’ ideas
Inclusive environment–invite teammates to participate
Each member must be involved in all main aspects of the project, including coding, reporting, and presentation. It is not acceptable for a single member to solely handle one aspect, such as coding, while another focuses solely on the report, and another solely on the presentation.
Code Backup
When working on your data analysis, ensure that you commit your code frequently to GitHub, accompanied by meaningful commit messages, and push your changes regularly. This practice helps prevent any unforeseen issues or loss of progress.
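As a rough illustration of this habit, the sketch below makes small commits with descriptive messages and checks whether any commits are still waiting to be pushed. It is an assumption-laden example rather than a prescribed procedure: the file name `clean-data.R`, its contents, the identity settings, and the branch name `main` are hypothetical, and a local bare repository stands in for GitHub.

```shell
#!/bin/sh
# Sketch: small commits with meaningful messages, plus a check for
# unpushed work. File names and contents are hypothetical.
set -e

work=$(mktemp -d)

# A local bare repository stands in for the team's GitHub repository.
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

git init --quiet "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/main
git config user.name "Student"
git config user.email "student@example.com"
git remote add origin "$work/remote.git"

# First small step: commit it and push it right away.
echo 'library(tidyverse)' > clean-data.R
git add clean-data.R
git commit --quiet -m "Load tidyverse in data cleaning script"
git push --quiet -u origin main

# Second small step: committed locally, but not pushed yet.
echo 'raw <- read_csv("data/raw.csv")' >> clean-data.R
git commit --quiet -am "Read raw data using a relative path"

# Commits that exist locally but not on the remote are not backed up.
unpushed=$(git rev-list --count origin/main..HEAD)
echo "Unpushed commits before push: $unpushed"

git push --quiet
unpushed_after=$(git rev-list --count origin/main..HEAD)
echo "Unpushed commits after push: $unpushed_after"
```

A commit message that names what changed and why ("Read raw data using a relative path") is far easier to search later than a generic one such as "update".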
Code Styling
You should follow the COMP112: Code Styling Guidelines. Doing so will make your code more readable and easier to maintain.