Power Query – Data Transformation

From Training Material
Jump to navigation Jump to search

Module 3: Data Transformation Techniques

Objective

Clean and reshape data using Power Query’s built-in tools in Excel.

Files Used

Removing Blank Rows and Unnecessary Columns

Steps:

  1. Go to Excel → Data tab → Get Data → From File → From Workbook
  2. Select and load Survey_Results.xlsx
  3. In Navigator → select the worksheet → click Transform Data
  4. In Power Query Editor → Go to Home tab → click Remove Rows → choose Remove Blank Rows
  5. To remove unwanted columns:
    • Click the header of the column you want to remove
    • Hold Ctrl and select additional columns if needed
    • Right-click any selected header → choose Remove Columns

Changing Data Types

Steps:

  1. In Power Query Editor, locate the column you want to change
  2. In the column header, click the small data type icon (e.g. ABC123 or calendar icon)
  3. Select the appropriate data type: Text, Whole Number, Decimal Number, Date, etc.
  4. Alternatively, go to the Transform tab → click Data Type dropdown → choose desired type

Renaming Columns

Steps:

  1. In Power Query Editor, right-click on the column header you want to rename
  2. Click Rename
  3. Type the new column name and press Enter
  4. OR: double-click on the column name to rename directly

Filtering and Sorting Data

Steps:

  1. In Power Query Editor, locate the column by which you want to filter or sort
  2. Click the dropdown arrow in the column header
  3. To filter:
    • Check/uncheck boxes or use number/text filters
  4. To sort:
    • Click Sort Ascending or Sort Descending in the same dropdown menu

Grouping Data

Steps:

  1. Select the column that contains the values to group (e.g. Region or Category)
  2. Go to the Transform tab
  3. Click Group By
  4. In the dialog:
    • Choose Basic or Advanced
    • Select the column to group
    • Choose operation (e.g. Count Rows, Sum, Average)
  5. Click OK

Splitting Columns

Steps:

  1. Select a column (e.g. Full Name or "City, State")
  2. Go to the Home tab → click Split Column
  3. Choose a method (e.g. By Delimiter)
  4. In the dialog:
    • Choose the delimiter (e.g. comma, space, custom)
    • Choose how to split (e.g. at each occurrence, left-most, right-most)
  5. Click OK

Unpivoting Columns

Goal: Reshape data so that monthly sales are listed in rows rather than across columns.

File used:

Steps:

  1. In Excel → go to the Data tab → click Get DataFrom FileFrom Workbook
  2. Select the file: Monthly_Sales_2023.xlsx and click Import
  3. In the Navigator window:
    • Select the worksheet that appears (there is only one sheet in this file)
    • Click the button Transform Data to open Power Query Editor
  4. In Power Query Editor:
    • Confirm that columns like `Jan`, `Feb`, ..., `Jun` are present
    • Select all month columns by clicking on `Jan`, then holding Shift and clicking `Jun`
  5. Go to the Transform tab → click Unpivot Columns
  6. Power Query will convert each of the selected columns into two columns:
    • One called Attribute (containing the month names)
    • One called Value (containing the corresponding sales values)
  7. You can rename the new columns:
    • Right-click the column Attribute → choose Rename → type `Month`
    • Right-click the column Value → choose Rename → type `Sales`

Optional:

  • Change the data type of the new `Sales` column to a number:
    • Click the small icon in the `Sales` column header (e.g. ABC123) and choose Whole Number

Result: You now have a normalized table where each row represents a single product's sales in a particular month. This structure is ideal for analysis or pivoting in Excel.

Reordering Columns

Steps:

  1. In Power Query Editor, click and drag the column header to a new position
  2. OR: right-click the column header → choose Move → select desired direction (Left, Right, To Beginning, To End)

Duplicating and Referencing Queries

Steps:

  1. In the Queries pane on the left side of Power Query Editor:
    • Right-click the query you want to reuse
  2. To create a full copy: choose Duplicate
    1. This creates a completely separate copy of the query
    2. All applied steps are copied, and changes to the original will not affect the duplicate
  3. To create a linked version that depends on the original: choose Reference
    1. This creates a new query that uses the output of the original query as its starting point
    2. It does not copy the steps — instead, it acts like a “live link” to the original query’s final result

Example usage: Let’s say you have a query called `AllSales` that loads all sales records for the year.

You want to:

  • Create a second version that only includes data for one region (e.g. “West”)
  • Keep the original intact for full reporting

To do this:

  1. Right-click `AllSales` → choose Reference
  2. A new query appears (e.g. `AllSales (2)`)
  3. Rename it to something meaningful like `Sales_West`
  4. In this new query, apply a filter:
    • Select the `Region` column
    • Use the filter dropdown → check only `West`

Why use Reference instead of Duplicate?

  • If the source data or logic in the original query changes, the referenced version automatically updates too
  • This is more efficient and avoids duplication of complex logic

Cleanup and Pivot Prep

Goal: Prepare the survey data for summary analysis in a pivot table.

File used:

Steps:

1. Load the data

  1. Open Excel
  2. Go to the Data tab → click Get DataFrom FileFrom Workbook
  3. Select the file: Survey_Results.xlsx and click Import
  4. In the Navigator window:
    • Select the worksheet that contains the survey data
    • Click Transform Data

2. Remove rows with missing satisfaction values

  1. In Power Query Editor, locate the column that contains satisfaction scores (e.g. `Satisfaction`)
  2. Click the filter icon (▾) in the header of that column
  3. Uncheck the option (null) to exclude blank values
  4. Click OK

3. Rename the satisfaction column to lowercase

  1. Right-click the header of the satisfaction column
  2. Click Rename
  3. Type: `satisfaction` and press Enter

4. Group the data

  1. Click the header of the `satisfaction` column to select it
  2. Go to the Home tab → click Group By
  3. In the dialog box:
    • Ensure the selected column is `satisfaction`
    • Set the operation to Count Rows
    • Leave the new column name as default or rename it to `Responses`
  4. Click OK

5. Load the final result to Excel

  1. Go to the Home tab → click Close & Load To...
  2. Choose TableNew Worksheet
  3. Click OK

Result: You now have a summary table that shows how many responses fall into each satisfaction score. This can be used directly in Excel for charting or further analysis.

Add a Conditional Column

Goal: Categorize satisfaction ratings into "Positive", "Neutral", or "Negative"

Steps:

  1. In Power Query Editor → go to Add Column tab → click Conditional Column
  2. Set:
    • New column name: `SatisfactionLevel`
    • If Satisfaction = 4 or 5 → then "Positive"
    • Else if Satisfaction = 3 → then "Neutral"
    • Else → "Negative"
  3. Click OK

→ Continue to Module 4: Automation and Parameters


Return to Main Page