Learning Goals

The goal of this course is to build confidence in carrying out the entire data science pipeline.

Below are general skills that will be targeted and the specific topics that will be covered.

General Skills

Data Communication

  • In written and oral formats:

    • Explain and justify the data cleaning and analysis process and the resulting conclusions with clear, organized, logical, and compelling details, adapted to the background, values, and motivations of the audience and to the context in which communication occurs.

Collaborative Learning

  • Understand and demonstrate characteristics of effective collaboration (team roles, interpersonal communication, self-reflection, awareness of social dynamics, advocating for yourself and others).
  • Develop a common purpose and agreement on goals.
  • Be able to contribute questions or concerns in a respectful way.
  • Share and contribute to the group’s learning in an equitable manner.
  • Develop familiarity and comfort with collaboration tools such as Git and GitHub.

Course Topics

Specific learning objectives for our course topics are listed below. Use these to guide your synthesis of course material for specific topics. Note that the topics are listed in the order of the data science pipeline, not the order in which we will cover them in class.

Foundation

Intro to R, RStudio, and R Markdown

  • Download and install the necessary tools (R, RStudio)
  • Develop comfort in navigating the tools in RStudio
  • Develop comfort in writing and rendering Quarto documents
  • Identify the characteristics of tidy data
  • Use R code as a calculator and to explore tidy data


Data Visualization

The learning goals may be adjusted before we start the material of this section.

Introduction to Data Visualization

  • Convince ourselves of the importance of data visualization
  • Understand the “grammar of graphics”
  • Use ggplot2 functions to create data viz
  • Understand the different basic univariate visualizations for categorical and quantitative variables


Bivariate

  • Identify appropriate types of bivariate visualizations to visualize relationships between 2 variables, depending on the type of variables (categorical, quantitative)
  • Create basic bivariate visualizations based on real data with ggplot2 functions


Multivariate

  • Understand how to visualize relationships between more than 2 variables.
  • Add aesthetics such as color and size to incorporate a third variable (or more) into a bivariate plot with ggplot2 functions
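
As a rough illustration of the multivariate goal above, the sketch below starts from a bivariate scatterplot and maps two more variables to color and size. It assumes the built-in mtcars data set, which is not part of the course material and is used here only because it ships with R.

```r
library(ggplot2)

# mtcars ships with R: wt and mpg are quantitative,
# cyl is converted to a factor so it is treated as categorical.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    size = "Horsepower"
  )

p  # printing the object draws the plot
```

Removing `color` and `size` from `aes()` gives back the plain bivariate plot, which is one way to see what each added aesthetic contributes.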


Spatial

  • Plot data points on top of a map using ggplot()
  • Create choropleth maps using geom_map()
  • Understand the basics of creating a map using leaflet, including adding points and choropleths to a base map.


Effective Visualization

  • Understand and apply the guiding principles of effective visualizations


Data Wrangling

Wrangling Verbs

  • Understand and be able to use the following verbs appropriately: select, mutate, filter, arrange, summarize, group_by
  • Develop an understanding of what code will do conceptually without running it
  • Develop an understanding of working with dates and lubridate functions
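
The six verbs above can be sketched in a single pipeline. The data frame `animals_toy` and its columns are invented for illustration; try predicting the result before running the code.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
animals_toy <- tibble(
  species = c("A", "A", "B", "B", "B"),
  mass_g  = c(3500, 3700, 5000, 5200, 4800),
  year    = c(2020, 2021, 2020, 2021, 2021)
)

summary_tbl <- animals_toy %>%
  filter(year == 2021) %>%                     # keep rows from 2021 only
  mutate(mass_kg = mass_g / 1000) %>%          # add a new column
  select(species, mass_kg) %>%                 # keep two columns
  group_by(species) %>%                        # one group per species
  summarize(mean_mass_kg = mean(mass_kg)) %>%  # one row per group
  arrange(desc(mean_mass_kg))                  # largest mean first

summary_tbl
```

The result has one row per species, with species B (mean 5.0 kg) listed before species A (mean 3.7 kg).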


Reshaping Data

  • Understand the difference between wide and long data format and distinguish the cases (units of observation) for a given data set
  • Be able to use pivot_wider and pivot_longer from the tidyr package
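
A minimal sketch of moving between the two formats, assuming a made-up table `wide` with one row per country and one column per year. In the wide format the case is a country; in the long format the case is a country-year pair.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- tibble::tibble(
  country = c("X", "Y"),
  `2020`  = c(10, 20),
  `2021`  = c(11, 21)
)

# Wide to long: one row per country-year pair
long <- pivot_longer(
  wide,
  cols = c(`2020`, `2021`),
  names_to = "year",
  values_to = "value"
)

# Long back to wide: one column per year again
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

Here `long` has four rows (two countries times two years), and `wide_again` has the same columns as `wide`.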


Joining Data

  • Understand the concept of variables that uniquely identify rows (also known as cases or units of observation)
  • Understand the different types of joins, i.e., the different ways of combining two data frames
  • Be able to use mutating joins: left_join, inner_join and full_join from the dplyr package
  • Be able to use filtering joins: semi_join, anti_join from the dplyr package
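
The difference between mutating and filtering joins can be seen on two small tables that share an `id` key. Both tables are invented for illustration.

```r
library(dplyr)

# Hypothetical tables: id identifies rows in each
students <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
grades   <- tibble(id = c(2, 3, 4), grade = c("A", "B", "C"))

# Mutating joins add columns from grades to students
left  <- left_join(students, grades, by = "id")   # all students; grade is NA for id 1
inner <- inner_join(students, grades, by = "id")  # only ids present in both (2, 3)
full  <- full_join(students, grades, by = "id")   # every id from either table (1 to 4)

# Filtering joins keep or drop rows of students without adding columns
semi <- semi_join(students, grades, by = "id")    # students with a grade (ids 2, 3)
anti <- anti_join(students, grades, by = "id")    # students without a grade (id 1)
```

Comparing `names(left)` with `names(semi)` shows that only the mutating joins bring the `grade` column along.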


Working with Character Data as Factors

  • Understand the difference between a variable stored as a character vs. a factor
  • Be able to convert a character variable to a factor
  • Be able to manipulate the order and values of a factor with the forcats package to improve summaries and visualizations.
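
A short sketch of the character-versus-factor distinction, using an invented vector of sizes. By default `factor()` orders levels alphabetically, which is rarely the order wanted in a plot or summary table.

```r
library(forcats)

# Hypothetical character data
sizes_chr <- c("small", "large", "medium", "small")

# Convert to a factor: levels default to alphabetical order
sizes_fct <- factor(sizes_chr)
levels(sizes_fct)   # "large" "medium" "small"

# Change the order of the levels
sizes_ord <- fct_relevel(sizes_fct, "small", "medium", "large")

# Change the values (labels) of the levels: new name = old name
sizes_lab <- fct_recode(sizes_ord, S = "small", M = "medium", L = "large")
levels(sizes_lab)   # "S" "M" "L"
```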


Working with Character Data as Strings

  • Be able to work with strings of text data
  • Use regular expressions to search and replace, detect patterns, locate patterns, extract patterns, and separate text with the stringr package.
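
The stringr tasks listed above can be sketched on a small vector. The strings in `courses` are made up for illustration; each consists of a department code, a space, and a number.

```r
library(stringr)

# Hypothetical strings: department code, a space, then a number
courses <- c("STAT 112", "COMP 123", "MATH 135")

detected <- str_detect(courses, "^STAT")     # does the string start with STAT?
numbers  <- str_extract(courses, "\\d+")     # first run of digits in each string
replaced <- str_replace(courses, " ", "-")   # replace the first space with a hyphen
located  <- str_locate(courses, "\\d+")      # start and end positions of the digits
parts    <- str_split_fixed(courses, " ", 2) # split into two columns at the space
```

For this input, `detected` is `TRUE FALSE FALSE` and `numbers` is `"112" "123" "135"`.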


Starting a Data Project

The learning goals may be adjusted before we start the material of this section.

Data Import

  • Find existing data sets
  • Save data sets locally
  • Load data into RStudio
  • Do some preliminary data checking and cleaning steps before further wrangling and visualization:
    • Make sure variables are properly formatted
    • Deal with missing values
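
As a rough sketch of loading and checking a file, the example below writes a tiny CSV to a temporary location and reads it back with readr. The file contents and column names are invented; in practice `path` would point to a data set saved locally.

```r
library(readr)

# Create a small hypothetical CSV file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,height", "1,170", "2,NA", "3,165"), path)

# Load the data; show_col_types = FALSE only silences the column message
dat <- read_csv(path, show_col_types = FALSE)

# Preliminary checks: variable types and missing values
str(dat)
n_missing <- sum(is.na(dat$height))   # count missing heights

# One possible way to handle missing values: drop incomplete rows
dat_complete <- dat[!is.na(dat$height), ]
```

Dropping rows is only one option for missing values; whether it is appropriate depends on the data and the question being asked.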


Exploratory Data Analysis (EDA)

  • Understand the first steps that should be taken when you encounter a new data set
  • Develop comfort in knowing how to explore data to understand it
  • Develop comfort in formulating research questions