Power Query – Data Transformation: Difference between revisions
 Created page with "Return to Main Page"  | 
				|||
| (10 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
= Module 3: Data Transformation Techniques =  | |||
== Objective ==  | |||
Clean and reshape data using Power Query’s built-in tools in Excel.  | |||
== Files Used ==  | |||
* [[Media:Survey_Results.xlsx|Download Survey_Results.xlsx]]  | |||
* [[Media:Orders_2023.xlsx|Download Orders_2023.xlsx]]  | |||
== Removing Blank Rows and Unnecessary Columns ==  | |||
'''Steps:'''  | |||
# Go to Excel → Data tab → Get Data → From File → From Workbook  | |||
# Select and load '''Survey_Results.xlsx'''  | |||
# In Navigator → select the worksheet → click '''Transform Data'''  | |||
# In Power Query Editor → Go to '''Home''' tab → click '''Remove Rows''' → choose '''Remove Blank Rows'''  | |||
# To remove unwanted columns:  | |||
#* Click the header of the column you want to remove  | |||
#* Hold Ctrl and select additional columns if needed  | |||
#* Right-click any selected header → choose '''Remove Columns'''  | |||
== Changing Data Types ==  | |||
'''Steps:'''  | |||
# In Power Query Editor, locate the column you want to change  | |||
# In the column header, click the small data type icon (e.g. ABC123 or calendar icon)  | |||
# Select the appropriate data type: '''Text''', '''Whole Number''', '''Decimal Number''', '''Date''', etc.  | |||
# Alternatively, go to the '''Transform''' tab → click '''Data Type''' dropdown → choose desired type  | |||
== Renaming Columns ==  | |||
'''Steps:'''  | |||
# In Power Query Editor, right-click on the column header you want to rename  | |||
# Click '''Rename'''  | |||
# Type the new column name and press Enter  | |||
# OR: double-click on the column name to rename directly  | |||
== Filtering and Sorting Data ==  | |||
'''Steps:'''  | |||
# In Power Query Editor, locate the column by which you want to filter or sort  | |||
# Click the dropdown arrow in the column header  | |||
# To filter:  | |||
#* Check/uncheck boxes or use number/text filters  | |||
# To sort:  | |||
#* Click '''Sort Ascending''' or '''Sort Descending''' in the same dropdown menu  | |||
== Grouping Data ==  | |||
'''Steps:'''  | |||
# Select the column that contains the values to group (e.g. Region or Category)  | |||
# Go to the '''Transform''' tab  | |||
# Click '''Group By'''  | |||
# In the dialog:  | |||
#* Choose '''Basic''' or '''Advanced'''  | |||
#* Select the column to group  | |||
#* Choose operation (e.g. Count Rows, Sum, Average)  | |||
# Click '''OK'''  | |||
== Splitting Columns ==  | |||
'''Steps:'''  | |||
# Select a column (e.g. Full Name or "City, State")  | |||
# Go to the '''Home''' tab → click '''Split Column'''  | |||
# Choose a method (e.g. '''By Delimiter''')  | |||
# In the dialog:  | |||
#* Choose the delimiter (e.g. comma, space, custom)  | |||
#* Choose how to split (e.g. at each occurrence, left-most, right-most)  | |||
# Click '''OK'''  | |||
== Unpivoting Columns ==  | |||
'''Goal:''' Reshape data so that monthly sales are listed in rows rather than across columns.  | |||
'''File used:'''  | |||
* [[Media:Monthly_Sales_2023.xlsx|Download Monthly_Sales_2023.xlsx]]  | |||
'''Steps:'''  | |||
# In Excel → go to the '''Data''' tab → click '''Get Data''' → '''From File''' → '''From Workbook'''  | |||
# Select the file: '''Monthly_Sales_2023.xlsx''' and click '''Import'''  | |||
# In the Navigator window:  | |||
#* Select the worksheet that appears (there is only one sheet in this file)  | |||
#* Click the button '''Transform Data''' to open Power Query Editor  | |||
# In Power Query Editor:  | |||
#* Confirm that columns like `Jan`, `Feb`, ..., `Jun` are present  | |||
#* Select all month columns by clicking on `Jan`, then holding Shift and clicking `Jun`  | |||
# Go to the '''Transform''' tab → click '''Unpivot Columns'''  | |||
# Power Query will convert each of the selected columns into two columns:  | |||
#* One called '''Attribute''' (containing the month names)  | |||
#* One called '''Value''' (containing the corresponding sales values)  | |||
# You can rename the new columns:  | |||
#* Right-click the column '''Attribute''' → choose '''Rename''' → type `Month`  | |||
#* Right-click the column '''Value''' → choose '''Rename''' → type `Sales`  | |||
'''Optional:'''  | |||
* Change the data type of the new `Sales` column to a number:  | |||
#* Click the small icon in the `Sales` column header (e.g. ABC123) and choose '''Whole Number'''  | |||
'''Result:'''  | |||
You now have a normalized table where each row represents a single product's sales in a particular month. This structure is ideal for analysis or pivoting in Excel.  | |||
== Reordering Columns ==  | |||
'''Steps:'''  | |||
# In Power Query Editor, click and drag the column header to a new position  | |||
# OR: right-click the column header → choose '''Move''' → select desired direction (Left, Right, To Beginning, To End)  | |||
== Duplicating and Referencing Queries ==  | |||
'''Steps:'''  | |||
# In the Queries pane on the left side of Power Query Editor:  | |||
#* Right-click the query you want to reuse  | |||
# To create a full copy: choose '''Duplicate'''  | |||
## This creates a completely separate copy of the query  | |||
## All applied steps are copied, and changes to the original will not affect the duplicate  | |||
# To create a linked version that depends on the original: choose '''Reference'''  | |||
## This creates a new query that uses the output of the original query as its starting point  | |||
## It does not copy the steps — instead, it acts like a “live link” to the original query’s final result  | |||
'''Example usage:'''  | |||
Let’s say you have a query called `AllSales` that loads all sales records for the year.  | |||
You want to:  | |||
* Create a second version that only includes data for one region (e.g. “West”)  | |||
* Keep the original intact for full reporting  | |||
To do this:  | |||
# Right-click `AllSales` → choose '''Reference'''  | |||
# A new query appears (e.g. `AllSales (2)`)  | |||
# Rename it to something meaningful like `Sales_West`  | |||
# In this new query, apply a filter:  | |||
#* Select the `Region` column  | |||
#* Use the filter dropdown → check only `West`  | |||
'''Why use Reference instead of Duplicate?'''  | |||
* If the source data or logic in the original query changes, the referenced version automatically updates too  | |||
* This is more efficient and avoids duplication of complex logic  | |||
== Cleanup and Pivot Prep ==  | |||
'''Goal:''' Prepare the survey data for summary analysis in a pivot table.  | |||
'''File used:'''  | |||
* [[Media:Survey_Results.xlsx|Download Survey_Results.xlsx]]  | |||
'''Steps:'''  | |||
=== 1. Load the data ===  | |||
# Open Excel  | |||
# Go to the '''Data''' tab → click '''Get Data''' → '''From File''' → '''From Workbook'''  | |||
# Select the file: '''Survey_Results.xlsx''' and click '''Import'''  | |||
# In the Navigator window:  | |||
#* Select the worksheet that contains the survey data  | |||
#* Click '''Transform Data'''  | |||
=== 2. Remove rows with missing satisfaction values ===  | |||
# In Power Query Editor, locate the column that contains satisfaction scores (e.g. `Satisfaction`)  | |||
# Click the filter icon (▾) in the header of that column  | |||
# Uncheck the option '''(null)''' to exclude blank values  | |||
# Click '''OK'''  | |||
=== 3. Rename the satisfaction column to lowercase ===  | |||
# Right-click the header of the satisfaction column  | |||
# Click '''Rename'''  | |||
# Type: `satisfaction` and press Enter  | |||
=== 4. Group the data ===  | |||
# Click the header of the `satisfaction` column to select it  | |||
# Go to the '''Home''' tab → click '''Group By'''  | |||
# In the dialog box:  | |||
#* Ensure the selected column is `satisfaction`  | |||
#* Set the operation to '''Count Rows'''  | |||
#* Leave the new column name as default or rename it to `Responses`  | |||
# Click '''OK'''  | |||
=== 5. Load the final result to Excel ===  | |||
# Go to the '''Home''' tab → click '''Close & Load To...'''  | |||
# Choose '''Table''' → '''New Worksheet'''  | |||
# Click '''OK'''  | |||
'''Result:'''  | |||
You now have a summary table that shows how many responses fall into each satisfaction score. This can be used directly in Excel for charting or further analysis.  | |||
== Add a Conditional Column ==  | |||
'''Goal:''' Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"  | |||
'''Steps:'''  | |||
# In Power Query Editor → go to '''Add Column''' tab → click '''Conditional Column'''  | |||
# Set:  | |||
#* New column name: `SatisfactionLevel`  | |||
#* If Satisfaction = 4 or 5 → then "Positive"  | |||
#* Else if Satisfaction = 3 → then "Neutral"  | |||
#* Else → "Negative"  | |||
# Click '''OK'''  | |||
----  | |||
→ Continue to [[Power Query – Automation and Parameters|Module 4: Automation and Parameters]]  | |||
Return to [[Power_Query_–_Basic|Main Page]]  | Return to [[Power_Query_–_Basic|Main Page]]  | ||
* [[Power Query – Introduction|Module 1: Introduction to Power Query]]  | |||
* [[Power Query – Importing and Managing Data|Module 2: Importing and Managing Data]]  | |||
* [[Power Query – Data Transformation|Module 3: Data Transformation Techniques]]  | |||
* [[Power Query – Automation and Parameters|Module 4: Automation and Parameters]]  | |||
* [[Power Query – Power Pivot Overview|Module 5: Power Pivot Overview]]  | |||
* [[Power Query – Summary and Exercises|Module 6: Summary and Final Exercises]]  | |||
Latest revision as of 10:34, 22 June 2025
Module 3: Data Transformation Techniques
Objective
Clean and reshape data using Power Query’s built-in tools in Excel.
Files Used
Removing Blank Rows and Unnecessary Columns
Steps:
- Go to Excel → Data tab → Get Data → From File → From Workbook
 - Select and load Survey_Results.xlsx
 - In Navigator → select the worksheet → click Transform Data
 - In Power Query Editor → Go to Home tab → click Remove Rows → choose Remove Blank Rows
 - To remove unwanted columns:
- Click the header of the column you want to remove
 - Hold Ctrl and select additional columns if needed
 - Right-click any selected header → choose Remove Columns
 
 
Changing Data Types
Steps:
- In Power Query Editor, locate the column you want to change
 - In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
 - Select the appropriate data type: Text, Whole Number, Decimal Number, Date, etc.
 - Alternatively, go to the Transform tab → click Data Type dropdown → choose desired type
 
Renaming Columns
Steps:
- In Power Query Editor, right-click on the column header you want to rename
 - Click Rename
 - Type the new column name and press Enter
 - OR: double-click on the column name to rename directly
 
Filtering and Sorting Data
Steps:
- In Power Query Editor, locate the column by which you want to filter or sort
 - Click the dropdown arrow in the column header
 - To filter:
- Check/uncheck boxes or use number/text filters
 
 - To sort:
- Click Sort Ascending or Sort Descending in the same dropdown menu
 
 
Grouping Data
Steps:
- Select the column that contains the values to group (e.g. Region or Category)
 - Go to the Transform tab
 - Click Group By
 - In the dialog:
- Choose Basic or Advanced
 - Select the column to group
 - Choose operation (e.g. Count Rows, Sum, Average)
 
 - Click OK
 
Splitting Columns
Steps:
- Select a column (e.g. Full Name or "City, State")
 - Go to the Home tab → click Split Column
 - Choose a method (e.g. By Delimiter)
 - In the dialog:
- Choose the delimiter (e.g. comma, space, custom)
 - Choose how to split (e.g. at each occurrence, left-most, right-most)
 
 - Click OK
 
Unpivoting Columns
Goal: Reshape data so that monthly sales are listed in rows rather than across columns.
File used:
Steps:
- In Excel → go to the Data tab → click Get Data → From File → From Workbook
 - Select the file: Monthly_Sales_2023.xlsx and click Import
 - In the Navigator window:
- Select the worksheet that appears (there is only one sheet in this file)
 - Click the button Transform Data to open Power Query Editor
 
 - In Power Query Editor:
- Confirm that columns like `Jan`, `Feb`, ..., `Jun` are present
 - Select all month columns by clicking on `Jan`, then holding Shift and clicking `Jun`
 
 - Go to the Transform tab → click Unpivot Columns
 - Power Query will convert each of the selected columns into two columns:
- One called Attribute (containing the month names)
 - One called Value (containing the corresponding sales values)
 
 - You can rename the new columns:
- Right-click the column Attribute → choose Rename → type `Month`
 - Right-click the column Value → choose Rename → type `Sales`
 
 
Optional:
- Change the data type of the new `Sales` column to a number:
 
- Click the small icon in the `Sales` column header (e.g. ABC123) and choose Whole Number
 
Result: You now have a normalized table where each row represents a single product's sales in a particular month. This structure is ideal for analysis or pivoting in Excel.
Reordering Columns
Steps:
- In Power Query Editor, click and drag the column header to a new position
 - OR: right-click the column header → choose Move → select desired direction (Left, Right, To Beginning, To End)
 
Duplicating and Referencing Queries
Steps:
- In the Queries pane on the left side of Power Query Editor:
- Right-click the query you want to reuse
 
 - To create a full copy: choose Duplicate
- This creates a completely separate copy of the query
 - All applied steps are copied, and changes to the original will not affect the duplicate
 
 - To create a linked version that depends on the original: choose Reference
- This creates a new query that uses the output of the original query as its starting point
 - It does not copy the steps — instead, it acts like a “live link” to the original query’s final result
 
 
Example usage: Let’s say you have a query called `AllSales` that loads all sales records for the year.
You want to:
- Create a second version that only includes data for one region (e.g. “West”)
 - Keep the original intact for full reporting
 
To do this:
- Right-click `AllSales` → choose Reference
 - A new query appears (e.g. `AllSales (2)`)
 - Rename it to something meaningful like `Sales_West`
 - In this new query, apply a filter:
- Select the `Region` column
 - Use the filter dropdown → check only `West`
 
 
Why use Reference instead of Duplicate?
- If the source data or logic in the original query changes, the referenced version automatically updates too
 - This is more efficient and avoids duplication of complex logic
 
Cleanup and Pivot Prep
Goal: Prepare the survey data for summary analysis in a pivot table.
File used:
Steps:
1. Load the data
- Open Excel
 - Go to the Data tab → click Get Data → From File → From Workbook
 - Select the file: Survey_Results.xlsx and click Import
 - In the Navigator window:
- Select the worksheet that contains the survey data
 - Click Transform Data
 
 
2. Remove rows with missing satisfaction values
- In Power Query Editor, locate the column that contains satisfaction scores (e.g. `Satisfaction`)
 - Click the filter icon (▾) in the header of that column
 - Uncheck the option (null) to exclude blank values
 - Click OK
 
3. Rename the satisfaction column to lowercase
- Right-click the header of the satisfaction column
 - Click Rename
 - Type: `satisfaction` and press Enter
 
4. Group the data
- Click the header of the `satisfaction` column to select it
 - Go to the Home tab → click Group By
 - In the dialog box:
- Ensure the selected column is `satisfaction`
 - Set the operation to Count Rows
 - Leave the new column name as default or rename it to `Responses`
 
 - Click OK
 
5. Load the final result to Excel
- Go to the Home tab → click Close & Load To...
 - Choose Table → New Worksheet
 - Click OK
 
Result: You now have a summary table that shows how many responses fall into each satisfaction score. This can be used directly in Excel for charting or further analysis.
Add a Conditional Column
Goal: Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"
Steps:
- In Power Query Editor → go to Add Column tab → click Conditional Column
 - Set:
- New column name: `SatisfactionLevel`
 - If Satisfaction = 4 or 5 → then "Positive"
 - Else if Satisfaction = 3 → then "Neutral"
 - Else → "Negative"
 
 - Click OK
 
→ Continue to Module 4: Automation and Parameters
Return to Main Page