Power Query – Data Transformation: Difference between revisions

From Training Material
Jump to navigation Jump to search
No edit summary
 
(8 intermediate revisions by the same user not shown)
Line 2: Line 2:


== Objective ==
== Objective ==
In this module, you'll learn how to clean and reshape data in Power Query using built-in transformation tools.
Clean and reshape data using Power Query’s built-in tools in Excel.


== Files Used ==
== Files Used ==
Line 8: Line 8:
* [[Media:Orders_2023.xlsx|Download Orders_2023.xlsx]]
* [[Media:Orders_2023.xlsx|Download Orders_2023.xlsx]]


== Removing Rows and Columns ==
== Removing Blank Rows and Unnecessary Columns ==
'''Steps:'''
'''Steps:'''
# Import the file '''Survey_Results.xlsx'''
# Go to Excel → Data tab → Get Data → From File → From Workbook
# In Power Query Editor:
# Select and load '''Survey_Results.xlsx'''
#* Go to the '''Home''' tab → '''Remove Rows''' → '''Remove Blank Rows'''
# In Navigator → select the worksheet → click '''Transform Data'''
#* Select any unnecessary columns Right-click → '''Remove Columns'''
# In Power Query Editor Go to '''Home''' tab → click '''Remove Rows''' → choose '''Remove Blank Rows'''
# To remove unwanted columns:
#* Click the header of the column you want to remove
#* Hold Ctrl and select additional columns if needed
#* Right-click any selected header choose '''Remove Columns'''


== Changing Data Types ==
== Changing Data Types ==
'''Steps:'''
'''Steps:'''
# Click the icon next to a column header (e.g. ABC123)
# In Power Query Editor, locate the column you want to change
# Choose appropriate type (e.g. '''Whole Number''', '''Date''', '''Text''')
# In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
# Or: go to the '''Transform''' tab → '''Data Type'''
# Select the appropriate data type: '''Text''', '''Whole Number''', '''Decimal Number''', '''Date''', etc.
# Alternatively, go to the '''Transform''' tab → click '''Data Type''' dropdown → choose desired type


== Renaming Columns ==
== Renaming Columns ==
'''Steps:'''
'''Steps:'''
# Double-click the column header, or
# In Power Query Editor, right-click on the column header you want to rename
# Right-click → '''Rename''' Type the new name
# Click '''Rename'''
 
# Type the new column name and press Enter
== Filtering Data ==
# OR: double-click on the column name to rename directly
'''Steps:'''
# Click the filter icon in any column header
# Use checkboxes, number filters, or text search to apply filters


== Sorting Data ==
== Filtering and Sorting Data ==
'''Steps:'''
'''Steps:'''
# Click the dropdown arrow on any column
# In Power Query Editor, locate the column by which you want to filter or sort
# Choose '''Sort Ascending''' or '''Sort Descending'''
# Click the dropdown arrow in the column header
# To filter:
#* Check/uncheck boxes or use number/text filters
# To sort:
#* Click '''Sort Ascending''' or '''Sort Descending''' in the same dropdown menu


== Grouping Data ==
== Grouping Data ==
'''Steps:'''
'''Steps:'''
# Go to the '''Transform''' tab Click '''Group By'''
# Select the column that contains the values to group (e.g. Region or Category)
# Choose the column to group on (e.g. Product or Region)
# Go to the '''Transform''' tab
# Choose an aggregation (e.g. Sum, Count, Average)
# Click '''Group By'''
# In the dialog:
#* Choose '''Basic''' or '''Advanced'''
#* Select the column to group
#* Choose operation (e.g. Count Rows, Sum, Average)
# Click '''OK'''


== Splitting Columns ==
== Splitting Columns ==
'''Steps:'''
'''Steps:'''
# Select a column with combined values (e.g. full name or product code)
# Select a column (e.g. Full Name or "City, State")
# Go to the '''Home''' tab → '''Split Column'''
# Go to the '''Home''' tab → click '''Split Column'''
# Choose method: by delimiter, number of characters, etc.
# Choose a method (e.g. '''By Delimiter''')
# In the dialog:
#* Choose the delimiter (e.g. comma, space, custom)
#* Choose how to split (e.g. at each occurrence, left-most, right-most)
# Click '''OK'''


== Unpivoting Data ==
== Unpivoting Columns ==
'''Use case:''' When you have month names as column headers and want to convert them into rows.
'''Goal:''' Reshape data so that monthly sales are listed in rows rather than across columns.
 
'''File used:'''
* [[Media:Monthly_Sales_2023.xlsx|Download Monthly_Sales_2023.xlsx]]


'''Steps:'''
'''Steps:'''
# Import the file '''Orders_2023.xlsx'''
# In Excel → go to the '''Data''' tab → click '''Get Data''' → '''From File''' → '''From Workbook'''
# Select the columns with month names
# Select the file: '''Monthly_Sales_2023.xlsx''' and click '''Import'''
# Go to the '''Transform''' tab → '''Unpivot Columns'''
# In the Navigator window:
#* Select the worksheet that appears (there is only one sheet in this file)
#* Click the button '''Transform Data''' to open Power Query Editor
# In Power Query Editor:
#* Confirm that columns like `Jan`, `Feb`, ..., `Jun` are present
#* Select all month columns by clicking on `Jan`, then holding Shift and clicking `Jun`
# Go to the '''Transform''' tab → click '''Unpivot Columns'''
# Power Query will convert each of the selected columns into two columns:
#* One called '''Attribute''' (containing the month names)
#* One called '''Value''' (containing the corresponding sales values)
# You can rename the new columns:
#* Right-click the column '''Attribute''' → choose '''Rename''' → type `Month`
#* Right-click the column '''Value''' → choose '''Rename''' → type `Sales`
 
'''Optional:'''
* Change the data type of the new `Sales` column to a number:
#* Click the small icon in the `Sales` column header (e.g. ABC123) and choose '''Whole Number'''
 
'''Result:'''
You now have a normalized table where each row represents a single product's sales in a particular month. This structure is ideal for analysis or pivoting in Excel.


== Reordering Columns ==
== Reordering Columns ==
'''Steps:'''
'''Steps:'''
# Drag and drop columns directly in the preview grid, or
# In Power Query Editor, click and drag the column header to a new position
# Right-click a column → '''Move''' → choose position (Left, Right, Beginning, End)
# OR: right-click the column header choose '''Move''' → select desired direction (Left, Right, To Beginning, To End)


== Duplicating and Referencing Queries ==
== Duplicating and Referencing Queries ==
'''Steps:'''
'''Steps:'''
# In the Queries pane, right-click a query
# In the Queries pane on the left side of Power Query Editor:
# Choose:
#* Right-click the query you want to reuse
#* '''Duplicate''' – makes a copy of the query with independent steps
# To create a full copy: choose '''Duplicate'''
#* '''Reference''' creates a new query that uses the output of the original
## This creates a completely separate copy of the query
## All applied steps are copied, and changes to the original will not affect the duplicate
# To create a linked version that depends on the original: choose '''Reference'''
## This creates a new query that uses the output of the original query as its starting point
## It does not copy the steps — instead, it acts like a “live link” to the original query’s final result
 
'''Example usage:'''
Let’s say you have a query called `AllSales` that loads all sales records for the year.
 
You want to:
* Create a second version that only includes data for one region (e.g. “West”)
* Keep the original intact for full reporting
 
To do this:
# Right-click `AllSales` → choose '''Reference'''
# A new query appears (e.g. `AllSales (2)`)
# Rename it to something meaningful like `Sales_West`
# In this new query, apply a filter:
#* Select the `Region` column
#* Use the filter dropdown → check only `West`
 
'''Why use Reference instead of Duplicate?'''
* If the source data or logic in the original query changes, the referenced version automatically updates too
* This is more efficient and avoids duplication of complex logic
 
== Cleanup and Pivot Prep ==
'''Goal:''' Prepare the survey data for summary analysis in a pivot table.
 
'''File used:'''
* [[Media:Survey_Results.xlsx|Download Survey_Results.xlsx]]
 
'''Steps:'''
 
=== 1. Load the data ===
# Open Excel
# Go to the '''Data''' tab → click '''Get Data''' → '''From File''' → '''From Workbook'''
# Select the file: '''Survey_Results.xlsx''' and click '''Import'''
# In the Navigator window:
#* Select the worksheet that contains the survey data
#* Click '''Transform Data'''
 
=== 2. Remove rows with missing satisfaction values ===
# In Power Query Editor, locate the column that contains satisfaction scores (e.g. `Satisfaction`)
# Click the filter icon (▾) in the header of that column
# Uncheck the option '''(null)''' to exclude blank values
# Click '''OK'''
 
=== 3. Rename the satisfaction column to lowercase ===
# Right-click the header of the satisfaction column
# Click '''Rename'''
# Type: `satisfaction` and press Enter
 
=== 4. Group the data ===
# Click the header of the `satisfaction` column to select it
# Go to the '''Home''' tab → click '''Group By'''
# In the dialog box:
#* Ensure the selected column is `satisfaction`
#* Set the operation to '''Count Rows'''
#* Leave the new column name as default or rename it to `Responses`
# Click '''OK'''
 
=== 5. Load the final result to Excel ===
# Go to the '''Home''' tab → click '''Close & Load To...'''
# Choose '''Table''' → '''New Worksheet'''
# Click '''OK'''
 
'''Result:'''
You now have a summary table that shows how many responses fall into each satisfaction score. This can be used directly in Excel for charting or further analysis.
 
== Add a Conditional Column ==
'''Goal:''' Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"
 
'''Steps:'''
# In Power Query Editor → go to '''Add Column''' tab → click '''Conditional Column'''
# Set:
#* New column name: `SatisfactionLevel`
#* If Satisfaction = 4 or 5 → then "Positive"
#* Else if Satisfaction = 3 → then "Neutral"
#* Else → "Negative"
# Click '''OK'''


----
----


→ Continue to [[Power Query – Automation and Parameters|Module 4: Automation and Parameters]]
→ Continue to [[Power Query – Automation and Parameters|Module 4: Automation and Parameters]]

Latest revision as of 10:34, 22 June 2025

Module 3: Data Transformation Techniques

Objective

Clean and reshape data using Power Query’s built-in tools in Excel.

Files Used

Removing Blank Rows and Unnecessary Columns

Steps:

  1. Go to Excel → Data tab → Get Data → From File → From Workbook
  2. Select and load Survey_Results.xlsx
  3. In Navigator → select the worksheet → click Transform Data
  4. In Power Query Editor → Go to Home tab → click Remove Rows → choose Remove Blank Rows
  5. To remove unwanted columns:
    • Click the header of the column you want to remove
    • Hold Ctrl and select additional columns if needed
    • Right-click any selected header → choose Remove Columns

Changing Data Types

Steps:

  1. In Power Query Editor, locate the column you want to change
  2. In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
  3. Select the appropriate data type: Text, Whole Number, Decimal Number, Date, etc.
  4. Alternatively, go to the Transform tab → click Data Type dropdown → choose desired type

Renaming Columns

Steps:

  1. In Power Query Editor, right-click on the column header you want to rename
  2. Click Rename
  3. Type the new column name and press Enter
  4. OR: double-click on the column name to rename directly

Filtering and Sorting Data

Steps:

  1. In Power Query Editor, locate the column by which you want to filter or sort
  2. Click the dropdown arrow in the column header
  3. To filter:
    • Check/uncheck boxes or use number/text filters
  4. To sort:
    • Click Sort Ascending or Sort Descending in the same dropdown menu

Grouping Data

Steps:

  1. Select the column that contains the values to group (e.g. Region or Category)
  2. Go to the Transform tab
  3. Click Group By
  4. In the dialog:
    • Choose Basic or Advanced
    • Select the column to group
    • Choose operation (e.g. Count Rows, Sum, Average)
  5. Click OK

Splitting Columns

Steps:

  1. Select a column (e.g. Full Name or "City, State")
  2. Go to the Home tab → click Split Column
  3. Choose a method (e.g. By Delimiter)
  4. In the dialog:
    • Choose the delimiter (e.g. comma, space, custom)
    • Choose how to split (e.g. at each occurrence, left-most, right-most)
  5. Click OK

Unpivoting Columns

Goal: Reshape data so that monthly sales are listed in rows rather than across columns.

File used:

Steps:

  1. In Excel → go to the Data tab → click Get DataFrom FileFrom Workbook
  2. Select the file: Monthly_Sales_2023.xlsx and click Import
  3. In the Navigator window:
    • Select the worksheet that appears (there is only one sheet in this file)
    • Click the button Transform Data to open Power Query Editor
  4. In Power Query Editor:
    • Confirm that columns like `Jan`, `Feb`, ..., `Jun` are present
    • Select all month columns by clicking on `Jan`, then holding Shift and clicking `Jun`
  5. Go to the Transform tab → click Unpivot Columns
  6. Power Query will convert each of the selected columns into two columns:
    • One called Attribute (containing the month names)
    • One called Value (containing the corresponding sales values)
  7. You can rename the new columns:
    • Right-click the column Attribute → choose Rename → type `Month`
    • Right-click the column Value → choose Rename → type `Sales`

Optional:

  • Change the data type of the new `Sales` column to a number:
    • Click the small icon in the `Sales` column header (e.g. ABC123) and choose Whole Number

Result: You now have a normalized table where each row represents a single product's sales in a particular month. This structure is ideal for analysis or pivoting in Excel.

Reordering Columns

Steps:

  1. In Power Query Editor, click and drag the column header to a new position
  2. OR: right-click the column header → choose Move → select desired direction (Left, Right, To Beginning, To End)

Duplicating and Referencing Queries

Steps:

  1. In the Queries pane on the left side of Power Query Editor:
    • Right-click the query you want to reuse
  2. To create a full copy: choose Duplicate
    1. This creates a completely separate copy of the query
    2. All applied steps are copied, and changes to the original will not affect the duplicate
  3. To create a linked version that depends on the original: choose Reference
    1. This creates a new query that uses the output of the original query as its starting point
    2. It does not copy the steps — instead, it acts like a “live link” to the original query’s final result

Example usage: Let’s say you have a query called `AllSales` that loads all sales records for the year.

You want to:

  • Create a second version that only includes data for one region (e.g. “West”)
  • Keep the original intact for full reporting

To do this:

  1. Right-click `AllSales` → choose Reference
  2. A new query appears (e.g. `AllSales (2)`)
  3. Rename it to something meaningful like `Sales_West`
  4. In this new query, apply a filter:
    • Select the `Region` column
    • Use the filter dropdown → check only `West`

Why use Reference instead of Duplicate?

  • If the source data or logic in the original query changes, the referenced version automatically updates too
  • This is more efficient and avoids duplication of complex logic

Cleanup and Pivot Prep

Goal: Prepare the survey data for summary analysis in a pivot table.

File used:

Steps:

1. Load the data

  1. Open Excel
  2. Go to the Data tab → click Get DataFrom FileFrom Workbook
  3. Select the file: Survey_Results.xlsx and click Import
  4. In the Navigator window:
    • Select the worksheet that contains the survey data
    • Click Transform Data

2. Remove rows with missing satisfaction values

  1. In Power Query Editor, locate the column that contains satisfaction scores (e.g. `Satisfaction`)
  2. Click the filter icon (▾) in the header of that column
  3. Uncheck the option (null) to exclude blank values
  4. Click OK

3. Rename the satisfaction column to lowercase

  1. Right-click the header of the satisfaction column
  2. Click Rename
  3. Type: `satisfaction` and press Enter

4. Group the data

  1. Click the header of the `satisfaction` column to select it
  2. Go to the Home tab → click Group By
  3. In the dialog box:
    • Ensure the selected column is `satisfaction`
    • Set the operation to Count Rows
    • Leave the new column name as default or rename it to `Responses`
  4. Click OK

5. Load the final result to Excel

  1. Go to the Home tab → click Close & Load To...
  2. Choose TableNew Worksheet
  3. Click OK

Result: You now have a summary table that shows how many responses fall into each satisfaction score. This can be used directly in Excel for charting or further analysis.

Add a Conditional Column

Goal: Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"

Steps:

  1. In Power Query Editor → go to Add Column tab → click Conditional Column
  2. Set:
    • New column name: `SatisfactionLevel`
    • If Satisfaction = 4 or 5 → then "Positive"
    • Else if Satisfaction = 3 → then "Neutral"
    • Else → "Negative"
  3. Click OK

→ Continue to Module 4: Automation and Parameters


Return to Main Page