
Google Data Analytics Professional Certificate
After I graduated from San Jose State University, I decided to take on additional learning before searching for employment. The Google Data Analytics Professional Certificate is a program covering the data analytics process Google follows in its own operations. I chose this program because Google is a company deeply ingrained in data analytics and could provide insight into the industry like no other. The following sections summarize the key concepts learned and applied in their respective courses.
- Each project file link is located at the bottom of this page in the “Appendix” section. Courses #6 and #7 did not grant completion certificates because they were designated as optional for the student.
Files included:
- Digital Certificate - Course #1
- Digital Certificate - Course #2
- Digital Certificate - Course #3
- Digital Certificate - Course #4
- Digital Certificate - Course #5
- Completed Program Certificate
Course #1: Data, Data, Everywhere
The introductory course of the certification provided an overview of what the Data Analyst profession entails and what skills are needed for success in this career path.
- Curiosity: explore the unknown and don’t be afraid to look where others aren’t looking
- Context: a critical understanding of the problem being solved, viewed from a business perspective
- Technical Mindset: being able to visualize problems in terms of models, components, and outputs
- Data Design: being able to work fluently with data types and store them correctly
- Data Strategy: being able to derive meaning from data and analysis for business solutions
Below is the full data analytics process described by Google and used by them in their daily operations.
- Ask: what is the problem and who are the stakeholders?
- Prepare: gather your data and make sure it’s credible
- Process: clean the data and choose your tools wisely
- Analyze: derive meaning from your data that provides new solutions and insights to your problem
- Share: create visualizations that support your various analyses
- Act: use your analyses to drive actionable change with your team and help stakeholders apply the findings for their intended purpose
Course #2: Ask Questions to Make Data-Driven Decisions
This course took a deep dive into how the analyst must exercise due diligence in their first interaction with the problem or business task. From this carefully conceived questioning phase, the analyst can find the appropriate means for initiating the project.
6 Common Analysis Tasks
- Making predictions: sales forecasting
- Categorizing things: classifying customer service calls
- Spotting something unusual: health tracking
- Identifying themes: product improvements
- Discovering connections: reducing wait times for logistics
- Finding patterns: finding trends for improving operations
“Structured thinking” translates to the “process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options for solutions.” One way of staying structured in your thinking is building a “Scope of Work.” This document can act as a project charter and helps the analyst and stakeholders construct a logical progression toward solving the business task. A Scope of Work typically includes:
- Deliverables
- Timeline
- Milestones
- Reports
Course #3: Preparation of the Data
Preparation of your data is key when performing data analysis. The preparation that goes into this process allows the analyst to confirm the data is reliable and ready to use. Some considerations when preparing data:
- Where did the data come from?
- First-Party Data vs. Second-Party Data vs. Third-Party Data
- Selecting the right data for your purpose
R.O.C.C.C. Data
- Reliable
- Original
- Comprehensive
- Current
- Cited
“Open data” is the free flow of data that allows many different parties to use it and derive real-world value.
- Data should be available as a whole and in a modifiable format
- No restrictions on the data
- Free access and usage
Course #4: Process Data from Dirty to Clean
This course outlines strategies the analyst can take to clean a dataset and make it viable for analysis. Analysis depends on the integrity of the data; if the data is proven clean, you can trust the results of the analysis.
How can an analyst work around insufficient data?
- Use what available data you have
- Wait for more data if the time frame of the “Scope of Work” allows
- Talk with stakeholders and adjust objectives
- Look for a new dataset with the information you require
What is “dirty” data?
- Duplicate data
- Outdated data
- Incomplete data
- Incorrect/inaccurate data
- Inconsistent data
The remainder of the course walks the analyst through different ways of using SQL and spreadsheet functions to help process and filter data to provide better results before moving into the analysis phase.
Spreadsheet Techniques
- Conditional Formatting: applying rules to a specified range of cells so that a formatting action triggers when a cell meets the condition. For example, a rule that highlights red all values falling below the average of the range.
- Remove Duplicates: a built-in spreadsheet feature that removes identical observations to maintain integrity.
- Split Tool: splits the data in one column or variable into multiple columns at a specified delimiter character.
- TRIM() Function: removes leading, trailing, or repeated spaces from values in a range of cells.
- Pivot Table: provides a high-level summary of the dataset and its values; you can customize what you would like to showcase in the pivot table.
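The same cleaning steps can be mirrored in R. Below is a minimal sketch of my own (not from the course), using hypothetical sales data to illustrate deduplication, trimming, splitting, a below-average flag, and a pivot-style summary:

```r
# Hypothetical sales data with the kinds of problems described above
sales <- data.frame(
  rep_name       = c("  Ana Diaz", "Ben Ali ", "Ben Ali ", "Cora Lee"),
  region_product = c("West-Widget", "East-Gadget", "East-Gadget", "West-Widget"),
  amount         = c(120, 95, 95, 210)
)

sales <- unique(sales)                    # Remove Duplicates: drop identical rows
sales$rep_name <- trimws(sales$rep_name)  # TRIM(): strip leading/trailing spaces

# Split Tool analogue: break one column into two at the "-" delimiter
parts <- do.call(rbind, strsplit(sales$region_product, "-", fixed = TRUE))
sales$region  <- parts[, 1]
sales$product <- parts[, 2]

# Conditional Formatting analogue: flag values below the column average
sales$below_avg <- sales$amount < mean(sales$amount)

# Pivot Table analogue: high-level summary of amount by region
aggregate(amount ~ region, data = sales, FUN = sum)
```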
SQL Techniques
- SELECT and FROM statements are used in a SQL query to specify which columns to retrieve and from which table in the database.
- WHERE clauses filter the data so the query returns only the records that meet a condition.
- UPDATE statements allow the analyst to modify existing records in a database.
- SET specifies the new values to write into those records.
- CAST converts data from one type to another (where the conversion is valid), reducing limitations on what kind of data you can work with.
- CONCAT joins values from two or more fields into one string, which can create new data that can be used in unique ways.
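As a rough illustration (not from the course), the sketch below runs these statements against a hypothetical in-memory SQLite database from R, using the DBI and RSQLite packages. Note that SQLite joins strings with the || operator, while many other dialects use CONCAT():

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "customers", data.frame(
  id         = 1:3,
  first_name = c("Ana", "Ben", "Cora"),
  last_name  = c("Diaz", "Ali", "Lee"),
  spend      = c("120.50", "95.00", "210.75")  # stored as text on purpose
))

# SELECT / FROM / WHERE: retrieve only the rows you want
dbGetQuery(con, "SELECT first_name, last_name FROM customers WHERE id > 1")

# UPDATE / SET: modify an existing record in place
dbExecute(con, "UPDATE customers SET last_name = 'Lee-Park' WHERE id = 3")

# CAST: convert the text column to a number so it can be filtered numerically
dbGetQuery(con, "SELECT id FROM customers WHERE CAST(spend AS REAL) > 100")

# String concatenation: SQLite uses ||; many dialects use CONCAT() instead
dbGetQuery(con, "SELECT first_name || ' ' || last_name AS full_name FROM customers")

dbDisconnect(con)
```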
The most important habit proposed is that the analyst should always maintain a record or log of the cleaning process. This allows the analyst to retrace their steps if a mistake occurs, and it gives stakeholders perspective on how the dataset is being cleaned and what might be derived once the analysis phase begins.
Course #5: Analyze Data to Answer Questions
The “Analyze” phase of the data analytics process involves organizing and manipulating data to derive actionable insights related to the business task outlined in the “Ask” phase. Cleaning and analysis sometimes move in parallel: while cleaning the data, the analyst may find interesting trends and want to check whether a potential insight relates to the business task.
Aggregation is key during this phase, as it brings all relevant data into one place the analyst can use for the analysis. The VLOOKUP function is an excellent tool for tracing connections across merged data once it has been aggregated. Another way to track and connect data after aggregation is to look for “primary and foreign keys.” These keys are values that uniquely identify records in each dataset and link the datasets together. An example is an “ID number” that represents a unique user across all datasets; this key allows the analyst to track one user’s actions through the aggregated data.
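A minimal sketch of this idea in R, with hypothetical users and orders tables, where merge() plays the role of a VLOOKUP matching on the shared user_id key:

```r
users  <- data.frame(user_id = c(101, 102, 103),
                     name    = c("Ana", "Ben", "Cora"))
orders <- data.frame(order_id = 1:4,
                     user_id  = c(101, 103, 101, 102),  # foreign key into users
                     total    = c(25, 40, 15, 60))

# merge() matches each order to its user via the shared user_id key,
# letting the analyst follow one user's actions through the combined data
merge(orders, users, by = "user_id")
```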
The remainder of this course outlines different statistical calculations and functions an analyst can use to find statistical similarities between data and find key indicators in the data that relate to the business task.
- Averages
- Summation
- Standard Deviation
- Confidence Intervals
- Margin of Error
- Hypothesis Test for Means and Proportions
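As an illustration with made-up numbers (not course data), each of these measures maps to a built-in R function; t.test() covers the confidence interval, margin of error, and hypothesis test for a mean in one call:

```r
daily_sales <- c(102, 98, 110, 95, 107, 101, 99, 104)

mean(daily_sales)  # average
sum(daily_sales)   # summation
sd(daily_sales)    # standard deviation

# t.test() returns a 95% confidence interval for the mean and a
# hypothesis test against a null value (here, a target mean of 100)
result <- t.test(daily_sales, mu = 100)
result$conf.int                       # confidence interval
margin <- diff(result$conf.int) / 2   # margin of error
result$p.value                        # evidence against the null hypothesis
```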
Course #6: Data Visualization and Report
Data visualization is extremely important in data analysis as this is the phase where you begin to construct the story of your data to be relayed back to your key stakeholders. Strong data visualizations are visuals that use “pre-attentive attributes” to their advantage by highlighting different aspects of the visual to draw the attention of the audience. These attributes include:
- Position: placed in relation to other marks
- Size: big, small, tall, short
- Shape: can help communicate what the mark represents
- Color: red, blue, etc
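A minimal ggplot2 sketch, assuming hypothetical monthly sales, showing how the color attribute can pull the audience’s eye to one unusual point:

```r
library(ggplot2)

monthly <- data.frame(
  month = factor(month.abb[1:6], levels = month.abb[1:6]),
  sales = c(10, 12, 11, 25, 13, 12)
)
monthly$standout <- monthly$sales > 20  # flag the unusual month to highlight

ggplot(monthly, aes(x = month, y = sales, color = standout)) +
  geom_point(size = 4) +                             # size: large marks read first
  scale_color_manual(values = c("grey60", "red")) +  # color: red draws the eye
  theme(legend.position = "none")
```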
Another main aspect of data visualization is the principles of design. Analysts have to think about the design of their visualization at all times to accurately and effectively relay information to the audience. Some of the key design principles are:
- Balance: colors and shapes should be balanced, though not necessarily symmetrical
- Movement: refers to the path a viewer’s eye takes as they navigate your visualization
- Pattern: usage of similar shapes and colors to help provide emphasis on important points of the visual
- Rhythm: creating a sense of movement and flow in your visual
Course #7: Using R Programming Language
The final lecture course of the certificate program focused on an overview of the R statistical programming language. R is an excellent language for constructing statistical models, and learning it can make picking up other programming languages quicker.
The course used RStudio Cloud for ease of access for the student and covered basic concepts of R such as:
- Functions: reusable code to perform general calculations
- Variables: representation of a value in R that can be stored for later use in the analysis
- Comments: used to take notes on your code for “how to use” situations or reminders
- Data Types: the different ways data is interpreted in the program such as logical, integer, double, characters, etc
- Vectors: a group of data elements of the same types stored in a sequence
- Pipes: a tool in R for expressing a sequence of multiple operations using the %>% operator
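A short sketch of my own (not from the lectures) tying several of these concepts together: a function, a variable holding a vector, comments, and the %>% pipe:

```r
library(dplyr)  # provides the %>% pipe operator

to_fahrenheit <- function(celsius) {  # a reusable function
  celsius * 9 / 5 + 32
}

temps_c <- c(18.5, 21.0, 19.2)  # a variable holding a numeric vector

# The pipe passes the left-hand result as input to the next call
temps_c %>% to_fahrenheit() %>% round(1)
```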
Another helpful tool in R is the data frame, which allows the analyst to store a subset of variables from the main dataset for later use in additional analyses. The main packages used in the lectures were the Tidyverse family, such as ggplot2, tidyr, and dplyr, along with other base packages.
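For example, a hypothetical subset could be stored as its own data frame with dplyr (the dataset and column names below are made up):

```r
library(dplyr)

penguin_stats <- data.frame(species = c("Adelie", "Gentoo", "Adelie"),
                            bill_mm = c(38.8, 47.5, 39.3),
                            mass_g  = c(3700, 5000, 3800))

# filter() keeps the rows of interest; select() keeps only the
# variables needed for a follow-up analysis
bill_subset <- penguin_stats %>%
  filter(species == "Adelie") %>%
  select(species, bill_mm)
bill_subset
```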
The final portion of the course covered using the R Markdown tool to create interactive “code notebooks” that can be exported and used to showcase your analyses to others. These R Markdown files make the exchange of information with teammates seamless. The notebooks also have their own syntax, which allows the file to be constructed in different ways for presentation depending on the audience and the analyst’s skill level.
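A minimal R Markdown skeleton (hypothetical title and chunk name) looks like the following; knitting it, for example with rmarkdown::render(), exports a shareable HTML report:

````
---
title: "Hypothetical Analysis Notebook"
output: html_document
---

## Findings

Plain Markdown text narrates the analysis, while chunks run R code inline:

```{r sales-summary}
daily_sales <- c(102, 98, 110, 95)
mean(daily_sales)
```
````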
Appendix
Course #1 Certificate
Course #2 Certificate
Course #3 Certificate
Course #4 Certificate
Course #5 Certificate
Completed Program Certificate