Datasets
This appendix provides guidance on how to select a dataset for the course project.
Collect Data
Unfortunately, collecting your own data in this course, for example via surveys, is not feasible: it requires a lot of paperwork, time, and effort.
Have Dataset
If you learned about or collected a dataset in another course, internship, etc., you can use it as long as you have not used it in any other course.
No Project Idea
If you are looking for a project idea, explore the dataset repositories below for inspiration:
- TidyTuesday
- Awesome Public Datasets (includes links to other data repositories at the end)
- Kaggle
- data.world
- ProPublica
- Sports Data Sets by Ohio State U
- Large collection of repositories and datasets linked in this Trello board
Have Project Idea
If you already have a project idea but would like to find a suitable dataset, check the search engines below:
- Google Dataset Search
- U.S. Government’s Open Data
- OpenML
- FiveThirtyEight.com Dataset Repository
- BuzzFeedNews Dataset Repository
- Kaggle Search
- Typical search engines with search operators such as
  - filetype:csv returns only csv files that match the keywords searched; common file types that can be easily processed in R are .csv, .tsv, .xls, .xlsx, and .rds
  - site:data.gov limits results to those from data.gov
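Once a search turns up a file in one of the formats listed above, loading it into R usually takes a single call. The sketch below is only an illustration: the scores data frame and the file names are hypothetical stand-ins for a downloaded dataset, and it writes temporary files so that it can run on its own. It uses base R for .csv, .tsv, and .rds files.

```r
# Hypothetical example data; it stands in for a dataset you downloaded.
scores <- data.frame(
  team = c("A", "B", "C"),
  wins = c(10L, 7L, 3L),
  stringsAsFactors = FALSE
)

# Save the same data in three formats inside a temporary directory.
tmp <- tempdir()
csv_path <- file.path(tmp, "scores.csv")
tsv_path <- file.path(tmp, "scores.tsv")
rds_path <- file.path(tmp, "scores.rds")

write.csv(scores, csv_path, row.names = FALSE)
write.table(scores, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
saveRDS(scores, rds_path)

# Read each file back with base R.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)    # comma-separated
from_tsv <- read.delim(tsv_path, stringsAsFactors = FALSE)  # tab-separated
from_rds <- readRDS(rds_path)                               # R's native format

# Check the column types before starting any analysis.
str(from_csv)
```

Base R has no reader for .xls and .xlsx files. One common option, assuming the readxl package is installed, is readxl::read_excel() with the path to the spreadsheet.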