Power Query – Data Transformation: Difference between revisions

From Training Material
Jump to navigation Jump to search
No edit summary
No edit summary
Line 2: Line 2:


== Objective ==
== Objective ==
Clean and reshape data using Power Query’s built-in tools.
Clean and reshape data using Power Query’s built-in tools in Excel.


== Files Used ==
== Files Used ==
Line 8: Line 8:
* [[Media:Orders_2023.xlsx|Download Orders_2023.xlsx]]
* [[Media:Orders_2023.xlsx|Download Orders_2023.xlsx]]


== Removing Rows and Columns ==
== Removing Blank Rows and Unnecessary Columns ==
'''Steps:'''
'''Steps:'''
# Load '''Survey_Results.xlsx'''
# Go to Excel → Data tab → Get Data → From File → From Workbook
# In Power Query Editor:
# Select and load '''Survey_Results.xlsx'''
#* Go to the '''Home''' tab → '''Remove Rows''' → '''Remove Blank Rows'''
# In Navigator → select the worksheet → click '''Transform Data'''
#* Select columns to remove Right-click → '''Remove Columns'''
# In Power Query Editor Go to '''Home''' tab → click '''Remove Rows''' → choose '''Remove Blank Rows'''
# To remove unwanted columns:
#* Click the header of the column you want to remove
#* Hold Ctrl and select additional columns if needed
#* Right-click any selected header choose '''Remove Columns'''


== Changing Data Types ==
== Changing Data Types ==
'''Steps:'''
'''Steps:'''
# Click the icon next to a column header (e.g. ABC123)
# In Power Query Editor, locate the column you want to change
# Choose data type (e.g. '''Whole Number''', '''Date''', '''Text''')
# In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
# Or: '''Transform''' tab → '''Data Type'''
# Select the appropriate data type: '''Text''', '''Whole Number''', '''Decimal Number''', '''Date''', etc.
# Alternatively, go to the '''Transform''' tab → click '''Data Type''' dropdown → choose desired type


== Renaming Columns ==
== Renaming Columns ==
'''Steps:'''
'''Steps:'''
# Double-click the column name or
# In Power Query Editor, right-click on the column header you want to rename
# Right-click → '''Rename''' Type new name
# Click '''Rename'''
# Type the new column name and press Enter
# OR: double-click on the column name to rename directly


== Filtering and Sorting ==
== Filtering and Sorting Data ==
'''Steps:'''
'''Steps:'''
# Click filter icon in column header
# In Power Query Editor, locate the column by which you want to filter or sort
# Use built-in filters or checkboxes
# Click the dropdown arrow in the column header
# Click arrow again to access '''Sort Ascending''' / '''Sort Descending'''
# To filter:
#* Check/uncheck boxes or use number/text filters
# To sort:
#* Click '''Sort Ascending''' or '''Sort Descending''' in the same dropdown menu


== Grouping Data ==
== Grouping Data ==
'''Steps:'''
'''Steps:'''
# Select a column
# Select the column that contains the values to group (e.g. Region or Category)
# Go to '''Transform''' tab '''Group By'''
# Go to the '''Transform''' tab
# Choose operation: Sum, Count, etc.
# Click '''Group By'''
# In the dialog:
#* Choose '''Basic''' or '''Advanced'''
#* Select the column to group
#* Choose operation (e.g. Count Rows, Sum, Average)
# Click '''OK'''


== Splitting Columns ==
== Splitting Columns ==
'''Steps:'''
'''Steps:'''
# Select a column (e.g. Full Name)
# Select a column (e.g. Full Name or "City, State")
# Go to '''Home''' tab → '''Split Column'''
# Go to the '''Home''' tab → click '''Split Column'''
# Choose method (e.g. by delimiter)
# Choose a method (e.g. '''By Delimiter''')
# In the dialog:
#* Choose the delimiter (e.g. comma, space, custom)
#* Choose how to split (e.g. at each occurrence, left-most, right-most)
# Click '''OK'''


== Unpivoting Columns ==
== Unpivoting Columns ==
'''Steps:'''
'''Steps:'''
# Load '''Orders_2023.xlsx'''
# Go to Excel → Data tab → Get Data → From File → From Workbook
# Select month columns (e.g. Jan, Feb)
# Select and load '''Orders_2023.xlsx'''
# Go to '''Transform''' tab '''Unpivot Columns'''
# In Navigator → select the worksheet → click '''Transform Data'''
# In Power Query Editor:
#* Select all columns representing months (e.g. Jan, Feb, Mar)
#* Go to the '''Transform''' tab
#* Click '''Unpivot Columns'''
# This will convert columns into attribute-value pairs


== Reordering Columns ==
== Reordering Columns ==
'''Steps:'''
'''Steps:'''
# Drag columns to new position or
# In Power Query Editor, click and drag the column header to a new position
# Right-click column → '''Move''' → choose direction
# OR: right-click the column header choose '''Move''' → select desired direction (Left, Right, To Beginning, To End)


== Duplicating and Referencing Queries ==
== Duplicating and Referencing Queries ==
'''Steps:'''
'''Steps:'''
# Right-click a query in the left pane
# In the Queries pane on the left side of Power Query Editor:
# Choose:
#* Right-click the query you want to reuse
#* '''Duplicate''' – creates a full copy
# To create a full copy: choose '''Duplicate'''
#* '''Reference''' – creates a linked copy
# To create a linked version that depends on the original: choose '''Reference'''


== Additional Exercise: Cleanup and Pivot Prep ==
== Additional Exercise: Cleanup and Pivot Prep ==
'''Goal:''' Prepare survey data for analysis.
'''Goal:''' Prepare survey data for pivot table analysis


'''Steps:'''
'''Steps:'''
# Remove rows with missing satisfaction scores
# Load '''Survey_Results.xlsx''' and remove rows with missing values in the satisfaction column
# Change column names to lowercase
# Rename the satisfaction column to lowercase (e.g. "satisfaction")
# Group by satisfaction rating and count rows
# Group the table by satisfaction rating:
# Load results into a new sheet
#* Go to '''Transform''' → '''Group By''' → select satisfaction → operation: Count Rows
# Load the final grouped table to a new worksheet in Excel


== Extension: Conditional Columns ==
== Extension: Add a Conditional Column ==
'''Goal:''' Create a new column that classifies satisfaction as Positive/Neutral/Negative.
'''Goal:''' Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"


'''Steps:'''
'''Steps:'''
# Go to the '''Add Column''' tab → '''Conditional Column'''
# In Power Query Editor → go to '''Add Column''' tab → click '''Conditional Column'''
# Name: `SatisfactionLevel`
# Set:
# Logic:
#* New column name: `SatisfactionLevel`
#* If Satisfaction = 4 or 5 → "Positive"
#* If Satisfaction = 4 or 5 → then "Positive"
#* If Satisfaction = 3 → "Neutral"
#* Else if Satisfaction = 3 → then "Neutral"
#* If Satisfaction = 1 or 2 → "Negative"
#* Else → "Negative"
# Click OK
# Click '''OK'''


----
----

Revision as of 09:54, 22 June 2025

Module 3: Data Transformation Techniques

Objective

Clean and reshape data using Power Query’s built-in tools in Excel.

Files Used

Removing Blank Rows and Unnecessary Columns

Steps:

  1. Go to Excel → Data tab → Get Data → From File → From Workbook
  2. Select and load Survey_Results.xlsx
  3. In Navigator → select the worksheet → click Transform Data
  4. In Power Query Editor → Go to Home tab → click Remove Rows → choose Remove Blank Rows
  5. To remove unwanted columns:
    • Click the header of the column you want to remove
    • Hold Ctrl and select additional columns if needed
    • Right-click any selected header → choose Remove Columns

Changing Data Types

Steps:

  1. In Power Query Editor, locate the column you want to change
  2. In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
  3. Select the appropriate data type: Text, Whole Number, Decimal Number, Date, etc.
  4. Alternatively, go to the Transform tab → click Data Type dropdown → choose desired type

Renaming Columns

Steps:

  1. In Power Query Editor, right-click on the column header you want to rename
  2. Click Rename
  3. Type the new column name and press Enter
  4. OR: double-click on the column name to rename directly

Filtering and Sorting Data

Steps:

  1. In Power Query Editor, locate the column by which you want to filter or sort
  2. Click the dropdown arrow in the column header
  3. To filter:
    • Check/uncheck boxes or use number/text filters
  4. To sort:
    • Click Sort Ascending or Sort Descending in the same dropdown menu

Grouping Data

Steps:

  1. Select the column that contains the values to group (e.g. Region or Category)
  2. Go to the Transform tab
  3. Click Group By
  4. In the dialog:
    • Choose Basic or Advanced
    • Select the column to group
    • Choose operation (e.g. Count Rows, Sum, Average)
  5. Click OK

Splitting Columns

Steps:

  1. Select a column (e.g. Full Name or "City, State")
  2. Go to the Home tab → click Split Column
  3. Choose a method (e.g. By Delimiter)
  4. In the dialog:
    • Choose the delimiter (e.g. comma, space, custom)
    • Choose how to split (e.g. at each occurrence, left-most, right-most)
  5. Click OK

Unpivoting Columns

Steps:

  1. Go to Excel → Data tab → Get Data → From File → From Workbook
  2. Select and load Orders_2023.xlsx
  3. In Navigator → select the worksheet → click Transform Data
  4. In Power Query Editor:
    • Select all columns representing months (e.g. Jan, Feb, Mar)
    • Go to the Transform tab
    • Click Unpivot Columns
  5. This will convert columns into attribute-value pairs

Reordering Columns

Steps:

  1. In Power Query Editor, click and drag the column header to a new position
  2. OR: right-click the column header → choose Move → select desired direction (Left, Right, To Beginning, To End)

Duplicating and Referencing Queries

Steps:

  1. In the Queries pane on the left side of Power Query Editor:
    • Right-click the query you want to reuse
  2. To create a full copy: choose Duplicate
  3. To create a linked version that depends on the original: choose Reference

Additional Exercise: Cleanup and Pivot Prep

Goal: Prepare survey data for pivot table analysis

Steps:

  1. Load Survey_Results.xlsx and remove rows with missing values in the satisfaction column
  2. Rename the satisfaction column to lowercase (e.g. "satisfaction")
  3. Group the table by satisfaction rating:
    • Go to TransformGroup By → select satisfaction → operation: Count Rows
  4. Load the final grouped table to a new worksheet in Excel

Extension: Add a Conditional Column

Goal: Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"

Steps:

  1. In Power Query Editor → go to Add Column tab → click Conditional Column
  2. Set:
    • New column name: `SatisfactionLevel`
    • If Satisfaction = 4 or 5 → then "Positive"
    • Else if Satisfaction = 3 → then "Neutral"
    • Else → "Negative"
  3. Click OK

→ Continue to Module 4: Automation and Parameters


Return to Main Page