
Google Data Analytics Professional Certificate
After I graduated from San Jose State University, I decided to take on additional learning before searching for employment. The Google Data Analytics Professional Certificate is a program covering the data analytics process Google follows in its own operations. I chose this program because Google is a company deeply ingrained in data analytics and could provide insight into the industry like no other. The following sections summarize the key concepts learned and applied in their respective courses.
- Each project file link is located at the bottom of this page in the “Appendix” section. Courses #6 and #7 did not grant completion certificates because they were designated as optional for the student.
Files included:
- Digital Certificate - Course #1
- Digital Certificate - Course #2
- Digital Certificate - Course #3
- Digital Certificate - Course #4
- Digital Certificate - Course #5
- Completed Program Certificate
Course #1: Data, Data, Everywhere
The introductory course of the certification provided an overview of what the Data Analyst profession entails and what skills are needed for success in this career path.
- Curiosity: explore the unknown and don’t be afraid to look where others aren’t looking
- Context: a critical understanding of the problem being solved, viewed from a business perspective
- Technical Mindset: being able to visualize problems in terms of models, components, and outputs
- Data Design: being able to work fluently with data types and store them correctly
- Data Strategy: being able to derive meaning from data and analysis for business solutions
Below is the full data analytics process described by Google and used by them in their daily operations.
- Ask: what is the problem and who are the stakeholders?
- Prepare: gather your data and make sure it’s credible
- Process: clean the data and choose your tools wisely
- Analyze: derive meaning from your data that provides new solutions and insights to your problem
- Share: create visualizations that support your various analyses
- Act: use your analyses to drive actionable change with your team and help stakeholders apply the findings for their intended purpose
Course #2: Ask Questions to Make Data-Driven Decisions
This course took a deep dive into how the analyst must exercise due diligence in their first interaction with the problem or business task. From this carefully conceived questioning phase, the analyst can find the appropriate means for initiating the project.
6 Common Analysis Tasks
- Making predictions: sales forecasting
- Categorizing things: classifying customer service calls
- Spotting something unusual: health tracking
- Identifying themes: product improvements
- Discovering connections: reducing wait times for logistics
- Finding patterns: finding trends for improving operations
“Structured thinking” translates to the “process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options for solutions.” One way of staying structured in your thinking is building a “Scope of Work.” This document can act as a project charter and helps the analyst and stakeholders construct a logical progression toward solving the business task. A Scope of Work typically includes:
- Deliverables
- Timeline
- Milestones
- Reports
Course #3: Preparation of the Data
Preparation of your data is key when performing data analysis. The preparation that goes into this process allows the analyst to confirm the data is reliable and ready to use. Some considerations when preparing data:
- Where did the data come from?
- First-Party Data vs. Second-Party Data vs. Third-Party Data
- Selecting the right data for your purpose
R.O.C.C.C. Data
- Reliable
- Original
- Comprehensive
- Current
- Cited
“Open data” is the free flow of data that allows many different parties to use it and derive real-world value.
- Data should be available as a whole and in a modifiable format
- No restrictions on the data
- Free access and usage
Course #4: Process Data from Dirty to Clean
This course outlines strategies the analyst can take to clean a dataset and make it viable for analysis. Analysis depends on the integrity of the data; if the data is proven clean, you can trust the results of the analysis.
How can an analyst work around insufficient data?
- Use what available data you have
- Wait for more data if the time frame of the “Scope of Work” allows
- Talk with stakeholders and adjust objectives
- Look for a new dataset with the information you require
What is “dirty” data?
- Duplicate data
- Outdated data
- Incomplete data
- Incorrect/inaccurate data
- Inconsistent data
The remainder of the course walks the analyst through different ways of using SQL and spreadsheet functions to help process and filter data to provide better results before moving into the analysis phase.
Spreadsheet Techniques
- Conditional Formatting: applying rules to a specified range of cells so that a formatting action triggers when a cell meets the condition. For example, a rule that highlights red all values falling below the average of the range.
- Remove Duplicates: a built-in spreadsheet feature that removes identical observations to maintain integrity.
- Split Tool: splits the data in one column or variable into multiple columns at a specified delimiter character.
- TRIM() Function: removes leading, trailing, or repeated spaces from values in a range of cells.
- Pivot Table: provides a high-level summary of the dataset and its values; you can customize what you would like to showcase in the pivot table.
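The same cleaning steps can be mirrored in R. Below is a minimal sketch of my own (not from the course), using hypothetical sales data to illustrate deduplication, trimming, splitting, a below-average flag, and a pivot-style summary:

```r
# Hypothetical sales data with the kinds of problems described above
sales <- data.frame(
  rep_name       = c("  Ana Diaz", "Ben Ali ", "Ben Ali ", "Cora Lee"),
  region_product = c("West-Widget", "East-Gadget", "East-Gadget", "West-Widget"),
  amount         = c(120, 95, 95, 210)
)

sales <- unique(sales)                    # Remove Duplicates: drop identical rows
sales$rep_name <- trimws(sales$rep_name)  # TRIM(): strip leading/trailing spaces

# Split Tool analogue: break one column into two at the "-" delimiter
parts <- do.call(rbind, strsplit(sales$region_product, "-", fixed = TRUE))
sales$region  <- parts[, 1]
sales$product <- parts[, 2]

# Conditional Formatting analogue: flag values below the column average
sales$below_avg <- sales$amount < mean(sales$amount)

# Pivot Table analogue: high-level summary of amount by region
aggregate(amount ~ region, data = sales, FUN = sum)
```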
SQL Techniques
- SELECT and FROM statements are used in a SQL query to specify which columns to retrieve and from which table in the database.
- WHERE clauses filter the data so the query returns only the records that meet a condition.
- UPDATE statements allow the analyst to modify existing records in a database.
- SET specifies the new values to write into those records.
- CAST converts data from one type to another (where the conversion is valid), reducing limitations on what kind of data you can work with.
- CONCAT joins values from two or more fields into one string, which can create new data that can be used in unique ways.
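As a rough illustration (not from the course), the sketch below runs these statements against a hypothetical in-memory SQLite database from R, using the DBI and RSQLite packages. Note that SQLite joins strings with the || operator, while many other dialects use CONCAT():

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "customers", data.frame(
  id         = 1:3,
  first_name = c("Ana", "Ben", "Cora"),
  last_name  = c("Diaz", "Ali", "Lee"),
  spend      = c("120.50", "95.00", "210.75")  # stored as text on purpose
))

# SELECT / FROM / WHERE: retrieve only the rows you want
dbGetQuery(con, "SELECT first_name, last_name FROM customers WHERE id > 1")

# UPDATE / SET: modify an existing record in place
dbExecute(con, "UPDATE customers SET last_name = 'Lee-Park' WHERE id = 3")

# CAST: convert the text column to a number so it can be filtered numerically
dbGetQuery(con, "SELECT id FROM customers WHERE CAST(spend AS REAL) > 100")

# String concatenation: SQLite uses ||; many dialects use CONCAT() instead
dbGetQuery(con, "SELECT first_name || ' ' || last_name AS full_name FROM customers")

dbDisconnect(con)
```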
The most important habit proposed is that the analyst should always maintain a record or log of the cleaning process. This allows the analyst to retrace their steps if a mistake occurs, and it gives stakeholders perspective on how the dataset is being cleaned and what might be derived once the analysis phase begins.
Course #5: Analyze Data to Answer Questions
The “Analyze” phase of the data analytics process involves organizing and manipulating data to derive actionable insights related to the business task outlined in the “Ask” phase. Cleaning and analysis sometimes move in parallel: while cleaning the data, the analyst may find interesting trends and want to check whether a potential insight relates to the business task.
Aggregation is key during this phase, as it brings all relevant data into one place the analyst can use for the analysis. The VLOOKUP function is an excellent tool for tracing connections across merged data once it has been aggregated. Another way to track and connect data after aggregation is to look for “primary and foreign keys.” These keys are values that uniquely identify records in each dataset and link the datasets together. An example is an “ID number” that represents a unique user across all datasets; this key allows the analyst to track one user’s actions through the aggregated data.
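A minimal sketch of this idea in R, with hypothetical users and orders tables, where merge() plays the role of a VLOOKUP matching on the shared user_id key:

```r
users  <- data.frame(user_id = c(101, 102, 103),
                     name    = c("Ana", "Ben", "Cora"))
orders <- data.frame(order_id = 1:4,
                     user_id  = c(101, 103, 101, 102),  # foreign key into users
                     total    = c(25, 40, 15, 60))

# merge() matches each order to its user via the shared user_id key,
# letting the analyst follow one user's actions through the combined data
merge(orders, users, by = "user_id")
```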
The remainder of this course outlines different statistical calculations and functions an analyst can use to find statistical similarities between data and find key indicators in the data that relate to the business task.
- Averages
- Summation
- Standard Deviation
- Confidence Intervals
- Margin of Error
- Hypothesis Test for Means and Proportions
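As an illustration with made-up numbers (not course data), each of these measures maps to a built-in R function; t.test() covers the confidence interval, margin of error, and hypothesis test for a mean in one call:

```r
daily_sales <- c(102, 98, 110, 95, 107, 101, 99, 104)

mean(daily_sales)  # average
sum(daily_sales)   # summation
sd(daily_sales)    # standard deviation

# t.test() returns a 95% confidence interval for the mean and a
# hypothesis test against a null value (here, a target mean of 100)
result <- t.test(daily_sales, mu = 100)
result$conf.int                       # confidence interval
margin <- diff(result$conf.int) / 2   # margin of error
result$p.value                        # evidence against the null hypothesis
```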
Course #6: Data Visualization and Report
Data visualization is extremely important in data analysis as this is the phase where you begin to construct the story of your data to be relayed back to your key stakeholders. Strong data visualizations are visuals that use “pre-attentive attributes” to their advantage by highlighting different aspects of the visual to draw the attention of the audience. These attributes include:
- Position: placed in relation to other marks
- Size: big, small, tall, short
- Shape: can help communicate what the mark represents
- Color: red, blue, etc
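A minimal ggplot2 sketch, assuming hypothetical monthly sales, showing how the color attribute can pull the audience’s eye to one unusual point:

```r
library(ggplot2)

monthly <- data.frame(
  month = factor(month.abb[1:6], levels = month.abb[1:6]),
  sales = c(10, 12, 11, 25, 13, 12)
)
monthly$standout <- monthly$sales > 20  # flag the unusual month to highlight

ggplot(monthly, aes(x = month, y = sales, color = standout)) +
  geom_point(size = 4) +                             # size: large marks read first
  scale_color_manual(values = c("grey60", "red")) +  # color: red draws the eye
  theme(legend.position = "none")
```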
Another main aspect of data visualization is the principles of design. Analysts have to think about the design of their visualization at all times to accurately and effectively relay information to the audience. Some of the key design principles are:
- Balance: colors and shapes should be balanced, though not necessarily symmetrical
- Movement: refers to the path a viewer’s eye takes as they navigate your visualization
- Pattern: usage of similar shapes and colors to help provide emphasis on important points of the visual
- Rhythm: creating a sense of movement and flow in your visual
Course #7: Using R Programming Language
The final lecture course of the certificate program focused on an overview of the R statistical programming language. R is an excellent language for constructing statistical models, and learning it can make picking up other programming languages quicker.
The course used RStudio Cloud for ease of access for the student and covered basic concepts of R such as:
- Functions: reusable code to perform general calculations
- Variables: representation of a value in R that can be stored for later use in the analysis
- Comments: used to take notes on your code for “how to use” situations or reminders
- Data Types: the different ways data is interpreted in the program such as logical, integer, double, characters, etc
- Vectors: a group of data elements of the same types stored in a sequence
- Pipes: a tool in R for expressing a sequence of multiple operations using the %>% operator
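A short sketch of my own (not from the lectures) tying several of these concepts together: a function, a variable holding a vector, comments, and the %>% pipe:

```r
library(dplyr)  # provides the %>% pipe operator

to_fahrenheit <- function(celsius) {  # a reusable function
  celsius * 9 / 5 + 32
}

temps_c <- c(18.5, 21.0, 19.2)  # a variable holding a numeric vector

# The pipe passes the left-hand result as input to the next call
temps_c %>% to_fahrenheit() %>% round(1)
```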
Another helpful tool in R is the data frame, which allows the analyst to store a subset of variables from the main dataset for later use in additional analyses. The main packages used in the lectures were the Tidyverse family, such as ggplot2, tidyr, and dplyr, along with other base packages.
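For example, a hypothetical subset could be stored as its own data frame with dplyr (the dataset and column names below are made up):

```r
library(dplyr)

penguin_stats <- data.frame(species = c("Adelie", "Gentoo", "Adelie"),
                            bill_mm = c(38.8, 47.5, 39.3),
                            mass_g  = c(3700, 5000, 3800))

# filter() keeps the rows of interest; select() keeps only the
# variables needed for a follow-up analysis
bill_subset <- penguin_stats %>%
  filter(species == "Adelie") %>%
  select(species, bill_mm)
bill_subset
```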
The final portion of the course covered using the R Markdown tool to create interactive “code notebooks” that can be exported and used to showcase your analyses to others. These R Markdown files make the exchange of information with teammates seamless. The notebooks also have their own syntax, which allows the file to be constructed in different ways for presentation depending on the audience and the analyst’s skill level.
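A minimal R Markdown skeleton (hypothetical title and chunk name) looks like the following; knitting it, for example with rmarkdown::render(), exports a shareable HTML report:

````
---
title: "Hypothetical Analysis Notebook"
output: html_document
---

## Findings

Plain Markdown text narrates the analysis, while chunks run R code inline:

```{r sales-summary}
daily_sales <- c(102, 98, 110, 95)
mean(daily_sales)
```
````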
Appendix
Course #1 Certificate
Course #2 Certificate
Course #3 Certificate
Course #4 Certificate
Course #5 Certificate
Completed Program Certificate