San Francisco Crime Data Visualization and Analysis - Tableau


I am interested in the crime incident data for San Francisco. In particular, I am curious about how many crime incidents are processed in each Police Department District, and whether the police prioritize solving violent crimes. I am also interested in whether specific times of day, days of the week, or months of the year see more crime incidents than others. Moreover, after the ISIS terrorist attack in Paris on the night of November 13th, 2015 (Paris local time, which was the afternoon of November 13th, 2015 PST), and the ISIS terrorist attack on December 2nd, 2015 (PST) in San Bernardino, CA, I want to explore the dataset to see whether news of the two attacks affected the number of violent crimes in San Francisco on the following day and over the next few days.
With these interests in mind, I came up with the following 4 sets of hypotheses to explore the crime incidents dataset recorded by the San Francisco Police Department, from the link here.


Hypothesis p1: Do more violent crime incidents end up being processed in San Francisco, compared to property crime incidents, public-misbehavior incidents, other crime incidents and non-criminal incidents?

P1 Conclusion: The number of processed violent crimes is the highest among all 5 categories, which supports the hypothesis that violent crimes are processed more than other types of crime. However, it would help if the original dataset specified what falls under “OTHER OFFENSES”, one of the largest groups of incidents under Violent Crimes. In reality, whether a given offense counts as a violent crime can vary a lot from one offense to another.


Hypothesis p2: Are there any relationships or trends between the number of crime incidents in San Francisco and specific months of the year, specific days of the week, or specific time periods of the day?

Hypothesis p2 - Month

With the peaks and low points visually clear, the chart supports a relationship between month and the number of incidents, both in total and for each incident category. For example, February and December tend to have fewer crime incidents both in total and in each crime category, while Violent Crime has clear peaks in January and May.


Hypothesis p2 - Day of a Week

From this statistical distribution graph, we can see that each category of crime fluctuates across the 7 days of the week, and which range most of the data falls into. The graph also reveals property crime outliers on Friday, at both extremely low and extremely high values. This could be an artifact of how incidents are recorded, because property crimes can be very minor, such as the loss or theft of small personal items.
But since this graph cannot show the number of incidents in total and for each crime category, I supplement it with a stacked bar chart to cover that aspect. Fridays and Saturdays tend to have slightly more incidents, which supports my Hypothesis 2.
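
The counts behind a stacked bar chart like this one can also be reproduced outside Tableau. The following is a minimal pandas sketch, not the workflow used for this report. The column names DayOfWeek and CrimeCategory and the handful of rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration; the column names are assumptions,
# not confirmed field names from the dataset.
toy = pd.DataFrame({
    "DayOfWeek": ["Friday", "Friday", "Friday", "Monday", "Monday", "Saturday"],
    "CrimeCategory": ["Property", "Violent", "Property", "Property", "Other", "Violent"],
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# One row per day, one column per category: the segments of each stacked bar.
counts = pd.crosstab(toy["DayOfWeek"], toy["CrimeCategory"])
# Put the days in calendar order and show days without incidents as 0.
counts = counts.reindex(day_order, fill_value=0)
# Total height of each bar.
counts["Total"] = counts.sum(axis=1)

print(counts)
```

Each row of counts corresponds to one bar, and the Total column gives the overall number of incidents per day that the distribution graph cannot show.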


Hypothesis p2 - Time in a Day

The first dashboard here shows a clear trend for each crime type over the hours of the day, across the different days. The overall pattern of incident counts over 24 hours is remarkably uniform, and the stacked bar chart shows the same trend for the total number of incidents.

The second dashboard on the next page shows the statistical distribution of all the crime incidents. You can also see three Property Crime outliers between 18:00 and 20:00, which makes sense since that period is an evening peak that gives thieves a good opportunity to commit property crimes. These also support my hypothesis 2 in terms of time of day.
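
For readers who want to check outliers like these by hand, here is a minimal sketch of the common box-plot rule, which flags values more than 1.5 times the interquartile range beyond the quartiles. It assumes the distribution chart is a box plot that uses this rule. The counts below are made up and are not taken from the dataset.

```python
import pandas as pd

# Made-up incident counts per hour of day (index = hour); not real data.
hourly_counts = pd.Series(
    [40, 42, 45, 43, 41, 90, 44],
    index=[14, 15, 16, 17, 18, 19, 20],
)

q1 = hourly_counts.quantile(0.25)
q3 = hourly_counts.quantile(0.75)
iqr = q3 - q1

# Values outside these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = hourly_counts[(hourly_counts < lower_fence) | (hourly_counts > upper_fence)]
print(outliers)
```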


Hypothesis p2 – Days Before and After the ISIS Attacks

  • Is there an increase in the number of crime incidents in San Francisco after the ISIS terrorist attack on November 13th, 2015 in Paris, France? (I expected an increase because potential ISIS members in the U.S. might be influenced by the Paris attack)
  • Is there a drop in the number of crime incidents in San Francisco after the ISIS terrorist attack on December 2nd, 2015 (PST) in San Bernardino, CA? (I expected a drop because of greater caution and increased police effort after this nearby shooting in the same state)

For these two hypotheses, it is best to show only November and December of 2015, so that the subtle day-to-day changes following the news of the ISIS attacks are as clear as possible. I also marked the date right after each attack, so that the changes can be identified immediately. In the dashboard above there is an upward trend, which supports my hypothesis that violent crimes would increase right after the ISIS attack in Paris. However, my hypothesis of a drop after the California ISIS attack is refuted, since the trend goes in the opposite direction.
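
A before/after comparison of this kind can be sketched in pandas as follows. This is only an illustration of the idea, not the Tableau view itself. The column names Date and CrimeCategory, the six-day window, and every row of data are assumptions made up for the example.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-11-11", "2015-11-12", "2015-11-13",
        "2015-11-14", "2015-11-14", "2015-11-15", "2015-11-16",
    ]),
    "CrimeCategory": [
        "Violent", "Violent", "Violent",
        "Violent", "Violent", "Property", "Violent",
    ],
})

violent = toy[toy["CrimeCategory"] == "Violent"]

# Violent incidents per day, with days that have none shown as 0.
window = pd.date_range("2015-11-11", "2015-11-16", freq="D")
daily = violent.groupby("Date").size().reindex(window, fill_value=0)

event = pd.Timestamp("2015-11-13")
mean_before = daily[daily.index < event].mean()
mean_after = daily[daily.index > event].mean()

print(daily)
print("mean before:", mean_before, "mean after:", mean_after)
```

With so few days on either side, a difference between the two means is only suggestive; a longer window would be needed to separate a real change from ordinary day-to-day variation.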


Hypothesis p3: Is there a relationship between certain police department districts and the number of unprocessed crime incidents in San Francisco?

During my exploration, I found that simply comparing the number of unprocessed incidents in each police department district does not show whether a district is putting less than average effort into processing its incidents. One PD district, e.g. the Southern PD district, might have the most unprocessed incidents and also the most processed incidents, simply because it has the largest total number of incidents. For that reason, I added another chart showing the percentage of unprocessed and processed incidents for each PD district, to give a more comprehensive view of the relationship between PD districts and both the number and the percentage of unprocessed crime incidents in San Francisco. Since the differences among the PD districts are easy to see in the dashboard below, this hypothesis is also supported.
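
The difference between comparing counts and comparing percentages can be seen in a small pandas sketch. It is not the Tableau calculation used for the dashboard. The column names PdDistrict and Resolution, the rule that a Resolution of "NONE" means unprocessed, and all rows are assumptions made up for illustration.

```python
import pandas as pd

# Made-up rows; column names and the "NONE" convention are assumptions.
incidents = pd.DataFrame({
    "PdDistrict": ["SOUTHERN"] * 6 + ["RICHMOND"] * 2,
    "Resolution": ["NONE", "NONE", "NONE",
                   "ARREST", "ARREST", "ARREST",
                   "NONE", "NONE"],
})

# Assumption: an incident counts as processed if it has any resolution.
incidents["processed"] = incidents["Resolution"] != "NONE"

summary = incidents.groupby("PdDistrict")["processed"].agg(
    total="count",      # all incidents in the district
    processed="sum",    # True values count as 1
)
summary["unprocessed"] = summary["total"] - summary["processed"]
summary["pct_unprocessed"] = 100 * summary["unprocessed"] / summary["total"]

print(summary)
```

In this toy data the SOUTHERN district has more unprocessed incidents than RICHMOND (3 versus 2) but a lower unprocessed percentage (50% versus 100%), which is why the percentage chart is needed alongside the raw counts.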


Hypothesis p4: Do more crime incidents happen in certain geographic areas of San Francisco that are known for being unsafe, as located by the x (longitude) and y (latitude) values in the dataset?

This map is based on the average longitude and average latitude of crime incidents. Color shows the crime category. Size shows the number of crime incidents. Details show the actual crime address. The view is filtered on crime category and shows Property Crimes, Violent Crimes, Other Crimes and Public Misbehavior.
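
The aggregation behind a map like this (one mark per address and category, placed at the average coordinates and sized by incident count) can be sketched in pandas. This is a rough equivalent, not the Tableau configuration itself. The column names Address and CrimeCategory, the placeholder addresses, and the coordinates are made up; only x (longitude) and y (latitude) come from the description above.

```python
import pandas as pd

# Made-up rows; addresses are placeholders and coordinates are illustrative.
toy = pd.DataFrame({
    "Address": ["ADDRESS A", "ADDRESS A", "ADDRESS B"],
    "CrimeCategory": ["Property Crimes", "Property Crimes", "Violent Crimes"],
    "X": [-122.41, -122.43, -122.40],   # longitude
    "Y": [37.78, 37.80, 37.76],         # latitude
})

# One map mark per (address, category): position = mean coordinates,
# size = number of incidents.
points = toy.groupby(["Address", "CrimeCategory"], as_index=False).agg(
    avg_lon=("X", "mean"),
    avg_lat=("Y", "mean"),
    incidents=("X", "size"),
)

print(points)
```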

As you can see, certain areas such as downtown San Francisco and the Mission area show a high density of crime incidents, which supports my hypothesis that more crime incidents happen in certain geographic areas of San Francisco that are known for being unsafe.