Chapter 3 Data transformation
We saved the original data as a CSV file named ‘Hate_Crimes_by_County_and_Bias_Type_Beginning_2010.csv’.
To obtain tidy data, we first dropped the last three columns, namely Total.Incidents, Total.Victims and Total.Offenders, since they can be calculated from the remaining data.
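As a rough illustration of this step, here is a Python sketch standing in for the project's R code, which is not shown. It drops the three Total.* columns from each row. It then recomputes a per-row total by summing the remaining subtype counts. Treating a total as that plain sum is an assumption taken from the claim above. The helpers `drop_total_columns` and `recomputed_total` and the small sample are hypothetical, and the sample values are made up.

```python
import csv
import io

# Column names as given in the text; R's read.csv() is assumed to have
# already turned spaces and dashes in the raw headers into dots.
ID_COLUMNS = ("County", "Year", "Crime.Type")
TOTAL_COLUMNS = ("Total.Incidents", "Total.Victims", "Total.Offenders")


def drop_total_columns(rows):
    """Return copies of the row dicts without the three Total.* columns."""
    return [{k: v for k, v in row.items() if k not in TOTAL_COLUMNS} for row in rows]


def recomputed_total(row):
    """Hypothetical helper: sum every non-ID count in one row.

    Call it only after the Total.* columns have been dropped,
    otherwise the totals themselves would be added in.
    """
    return sum(int(v) for k, v in row.items() if k not in ID_COLUMNS)


# Tiny made-up sample in the original wide layout (illustrative values only).
raw = io.StringIO(
    "County,Year,Crime.Type,Anti.Male,Anti.White,"
    "Total.Incidents,Total.Victims,Total.Offenders\n"
    "Albany,2018,Crimes Against Persons,0,1,1,1,1\n"
)
rows = drop_total_columns(list(csv.DictReader(raw)))

print(list(rows[0]))              # ['County', 'Year', 'Crime.Type', 'Anti.Male', 'Anti.White']
print(recomputed_total(rows[0]))  # 1
```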
We then treated County, Year and Crime.Type as ID columns, which we left unchanged. We used the pivot_longer() function to gather all remaining columns into two new columns, Subtype and Count. Subtype is a specific category of hate crime, such as Anti.Male or Anti.Female. Count is the number of cases recorded for that subtype.

After that, we created a new column, Type, which groups the subtypes into six general classes: Anti.Gender, Anti.Age, Anti.Race, Anti.Religion, Anti.Sexual.Minority and Anti.Disability. Each subtype has one corresponding type. For example, the subtype Anti.Female belongs to the type Anti.Gender, and the subtype Anti.White belongs to the type Anti.Race. We used a for loop to fill in the Type column for all rows.
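The reshaping and the Type lookup can be sketched in Python as follows. This stands in for the R code, which uses tidyr's pivot_longer() and a for loop not shown here. `to_long` turns every non-ID column into its own (Subtype, Count) row. `add_type` fills Type with a dictionary lookup inside a loop; this is a swapped-in technique, not the authors' exact loop. The mapping is hypothetical and partial. It lists only the six subtypes visible in the printed rows, whereas the real data has more bias columns.

```python
# Hypothetical, partial Subtype -> Type lookup: only the subtypes that
# appear in the printed rows are listed; the full table covers every column.
SUBTYPE_TO_TYPE = {
    "Anti.Male": "Anti.Gender",
    "Anti.Female": "Anti.Gender",
    "Anti.Transgender": "Anti.Gender",
    "Anti.Gender.Identity.Expression": "Anti.Gender",
    "Anti.Age.": "Anti.Age",
    "Anti.White": "Anti.Race",
}
ID_COLUMNS = ("County", "Year", "Crime.Type")


def to_long(rows, id_cols=ID_COLUMNS):
    """Turn each wide row into one (Subtype, Count) row per non-ID column."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, "Subtype": col, "Count": int(value)})
    return long_rows


def add_type(long_rows, mapping=SUBTYPE_TO_TYPE):
    """Fill the Type column by lookup; raises KeyError for an unmapped subtype."""
    for row in long_rows:
        row["Type"] = mapping[row["Subtype"]]
    return long_rows


# One wide row rebuilt from the printed Albany 2018 rows (six subtypes only).
wide = [{
    "County": "Albany", "Year": "2018", "Crime.Type": "Crimes Against Persons",
    "Anti.Male": "0", "Anti.Female": "0", "Anti.Transgender": "0",
    "Anti.Gender.Identity.Expression": "0", "Anti.Age.": "0", "Anti.White": "1",
}]
tidy = add_type(to_long(wide))
for row in tidy:
    print(row["Subtype"], row["Count"], row["Type"])
```

A lookup table keeps the Subtype-to-Type rule in one place. It also fails loudly if a new bias column appears without a mapping.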
We saved the modified data as a CSV file named ‘Hate_Crimes_data_Tidy.csv’. Its first few rows are shown below.
##   County Year             Crime.Type                         Subtype Count        Type
## 1 Albany 2018 Crimes Against Persons                       Anti.Male     0 Anti.Gender
## 2 Albany 2018 Crimes Against Persons                     Anti.Female     0 Anti.Gender
## 3 Albany 2018 Crimes Against Persons                Anti.Transgender     0 Anti.Gender
## 4 Albany 2018 Crimes Against Persons Anti.Gender.Identity.Expression     0 Anti.Gender
## 5 Albany 2018 Crimes Against Persons                       Anti.Age.     0    Anti.Age
## 6 Albany 2018 Crimes Against Persons                      Anti.White     1   Anti.Race
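The tidy data was saved to ‘Hate_Crimes_data_Tidy.csv’ from R. Purely as an illustration, here is a Python sketch of that export using the standard csv module. `write_tidy_csv` is a hypothetical helper. The two sample rows are copied from the printed output, and the column order follows it. The file is written to a temporary directory so the example leaves nothing behind.

```python
import csv
import tempfile
from pathlib import Path

# Column order matches the printed tidy data.
FIELDNAMES = ["County", "Year", "Crime.Type", "Subtype", "Count", "Type"]


def write_tidy_csv(rows, path):
    """Write long-format row dicts to a CSV file, header line first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"County": "Albany", "Year": 2018, "Crime.Type": "Crimes Against Persons",
     "Subtype": "Anti.Male", "Count": 0, "Type": "Anti.Gender"},
    {"County": "Albany", "Year": 2018, "Crime.Type": "Crimes Against Persons",
     "Subtype": "Anti.White", "Count": 1, "Type": "Anti.Race"},
]

# Write into a throwaway directory and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "Hate_Crimes_data_Tidy.csv"
    write_tidy_csv(rows, out)
    lines = out.read_text(encoding="utf-8").splitlines()

print(lines[0])  # County,Year,Crime.Type,Subtype,Count,Type
print(lines[2])  # Albany,2018,Crimes Against Persons,Anti.White,1,Anti.Race
```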