Power Query for Analysts

Latest revision as of 18:43, 17 September 2025

Category:Power Query

Module 1: Introduction to Power Query and Basic Transformations

Objective

In this exercise, you will learn to:

Import data from a CSV file into Power Query.
Review and adjust data types, focusing on converting a date stored as text into a proper date type.
Apply basic data filtering.
Use external help (e.g., ChatGPT) to get guidance on creating custom M code, without directly copying solutions.

Files Provided

You can download the CSV file 📂Media:PQsales.csv, which contains sales order data.

Instructions

Step 1: Import the data

Open Power Query in Excel or Power BI.
Import the data from the file PQsales.csv.
Note that the *OrderDate* column is imported as text because of its format (dd/MM/yyyy).

Step 2: Check and convert data types

Verify that each column has the correct data type.
Manually transform the *OrderDate* column from text to date type.
Tip: If you encounter difficulties, you can ask ChatGPT for hints on how to write an M function to convert text to date.
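A minimal sketch of this conversion in M, assuming the column is named `OrderDate` and holds day-first text such as `05/03/2025`. An inline sample table with made-up values stands in for the CSV, so the query can be pasted into a blank query as-is. Passing a day-first culture such as `"en-GB"` as the third argument of `Table.TransformColumnTypes` is what the interface's **Change Type → Using Locale...** option produces.

```powerquery
let
    // Inline sample standing in for PQsales.csv (values are made up)
    Sample = #table(
        type table [OrderID = number, OrderDate = text],
        {{1, "05/03/2025"}, {2, "28/02/2025"}}
    ),
    // Read the text as dd/MM/yyyy by using a day-first culture (en-GB)
    Typed = Table.TransformColumnTypes(Sample, {{"OrderDate", type date}}, "en-GB")
in
    Typed
```

With this culture, `05/03/2025` becomes 5 March 2025 rather than 3 May. In recent Power Query versions, `Date.FromText([OrderDate], [Format = "dd/MM/yyyy"])` in a custom column is an alternative.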

Step 3: Apply basic filtering

Filter the dataset to keep only rows where the cost is greater than 200.
Suggestion: Use Power Query’s graphical interface or write a simple M script to apply the filter. If needed, consult ChatGPT for implementation ideas.
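The filter corresponds to a single `Table.SelectRows` step. This sketch uses a made-up inline table and assumes the column is called `Cost` and is already numeric.

```powerquery
let
    // Inline sample standing in for the imported sales data (made-up values)
    Sample = #table(
        type table [OrderID = number, Cost = number],
        {{1, 150}, {2, 250}, {3, 200}}
    ),
    // Keep only rows where Cost is strictly greater than 200
    Filtered = Table.SelectRows(Sample, each [Cost] > 200)
in
    Filtered
```

Only order 2 remains: order 3 (exactly 200) is dropped because the comparison is strict. If `Cost` is still text, the comparison raises an error, so convert the type first.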

Step 4: Review and save your work

Confirm that the transformations were applied correctly by reviewing the data preview.
Save the query and document the steps you have taken.

Task

Perform the steps described above in Power Query. Experiment with available transformation options and try to understand how each step affects your data. Use ChatGPT for hints or troubleshooting, but avoid copying complete solutions verbatim.

Module 2: Combining and Merging Data from Multiple Sources

Objective

In this exercise, you will learn to:

Import data from multiple CSV files into Power Query.
Merge data from different sources based on a common key.
Use a Left Outer Join to add customer details to sales orders.
Use external help (e.g., ChatGPT) to get guidance on writing custom M code, without copying complete solutions.

Files Provided

You can download the following two CSV files:

📂 Media:PQ_sales2.csv – Contains sales order data

📂 Media:PQ_customers.csv – Contains customer information

Instructions

Step 1: Import the data

Open Power Query in Excel or Power BI. Import data from both files: PQ_sales2.csv and PQ_customers.csv. Check that both queries have been loaded correctly.

Step 2: Check and convert data types

Ensure that each column has the correct data type in both queries. For example, the *OrderDate* column in PQ_sales2.csv may be imported as text because of its non-standard format. Tip: Use the transformation functions if any adjustments are needed.

Step 3: Merge the data

Merge the *PQ_sales2.csv* query with the *PQ_customers.csv* query (**Home → Merge Queries**). Use the *Customer* column as the matching key. Choose the Left Outer Join option so that every sales order is kept along with its matching customer details. Suggestion: If you're not sure how to write the M code for this merge, ask ChatGPT for tips on merging queries.
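A sketch of the merge in M, using two small made-up tables in place of the imported queries. The column names `Customer`, `Region` and `CustomerSince` follow the text; everything else is illustrative. `Table.NestedJoin` puts the matching customer rows into a nested table column, and `Table.ExpandTableColumn` flattens the chosen fields.

```powerquery
let
    // Inline stand-ins for the two imported queries (made-up rows)
    Sales = #table(
        type table [OrderID = number, Customer = text, Cost = number],
        {{1, "Acme", 250}, {2, "Globex", 120}, {3, "Initech", 480}}
    ),
    Customers = #table(
        type table [Customer = text, Region = text, CustomerSince = date],
        {{"Acme", "North", #date(2019, 4, 1)}, {"Globex", "South", #date(2021, 9, 15)}}
    ),
    // Left outer join: every sales row is kept; matches land in a nested table column
    Merged = Table.NestedJoin(Sales, {"Customer"}, Customers, {"Customer"}, "CustomerInfo", JoinKind.LeftOuter),
    // Expand only the customer fields that are needed
    Expanded = Table.ExpandTableColumn(Merged, "CustomerInfo", {"Region", "CustomerSince"})
in
    Expanded
```

Order 3 has no matching customer, so it is kept with `null` in `Region` and `CustomerSince`, which is the behavior a left outer join is chosen for.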

Step 4: Review the merged data

Confirm that the resulting query includes additional columns (e.g., *Region*, *CustomerSince*) from the PQ_customers.csv file. Check the merged data to ensure that customer details have been correctly linked to the corresponding sales orders.

Step 5: Save your work

Save your query and document the transformation steps you applied.

Task

Perform the steps described above in Power Query. Experiment with both the graphical interface and custom M code to complete the merge. Use external resources (e.g., ChatGPT) for guidance or troubleshooting, but avoid copying complete solutions verbatim.

Module 3: Creating Custom Columns and Functions

Objective

In this exercise, you will learn to:

Create custom calculated columns in Power Query.
Use built-in Power Query functions to manipulate text, numbers, and dates.
Write custom M functions to automate transformations.
Use ChatGPT to assist in writing and optimizing M code.

Files Provided

The following datasets are used for this exercise:

📂 Media:PQ_sales2.csv (used in previous modules)
📂 Media:PQ_discounts.csv (new dataset) – contains discount rates based on product type

Instructions

Step 1: Import the data

Open Power Query in Excel or Power BI. Import both files: *PQ_sales2.csv* and *PQ_discounts.csv*. Ensure both tables are loaded correctly.

Step 2: Create a custom column for total cost

In the *PQ_sales* table, add a new custom column: Go to **Add Column → Custom Column**. Name it `TotalCost`. Create a formula to calculate the total cost as:

`[Quantity] * [Cost]`

Click OK and review the results.

Step 3: Apply discounts using merge

Merge *PQ_sales* with *PQ_discounts* using the *Product* column as the key. Expand the `DiscountRate` column into the *PQ_sales* table. Add another custom column named `DiscountedPrice`:

`[TotalCost] - ([TotalCost] * [DiscountRate])`

Check that the new column correctly applies the discounts.
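Steps 2 and 3 can be sketched in M as follows. The rows are made up, and the sketch assumes `DiscountRate` is stored as a fraction (0.10 for 10%); if your file stores percentages such as 10, divide by 100 first. Products without a discount row would otherwise get a `null` price, so the sketch treats a missing rate as 0.

```powerquery
let
    // Made-up rows standing in for PQ_sales and PQ_discounts
    Sales = #table(
        type table [Product = text, Quantity = number, Cost = number],
        {{"Laptop", 2, 900}, {"Mouse", 10, 25}, {"Cable", 5, 8}}
    ),
    Discounts = #table(
        type table [Product = text, DiscountRate = number],
        {{"Laptop", 0.10}, {"Mouse", 0.05}}
    ),
    // Step 2: TotalCost = Quantity * Cost
    WithTotal = Table.AddColumn(Sales, "TotalCost", each [Quantity] * [Cost], type number),
    // Step 3: bring in DiscountRate by Product; the left outer join keeps undiscounted products
    Merged = Table.NestedJoin(WithTotal, {"Product"}, Discounts, {"Product"}, "Disc", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Disc", {"DiscountRate"}),
    // Treat a missing rate as 0 so DiscountedPrice is never null
    WithDiscount = Table.AddColumn(
        Expanded,
        "DiscountedPrice",
        each [TotalCost] - [TotalCost] * (if [DiscountRate] = null then 0 else [DiscountRate]),
        type number
    )
in
    WithDiscount
```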

Step 4: Create a custom function in M

Create a function that sorts products into price bands: create a new blank query, then open **Home → Advanced Editor**. Write an M function that takes `Cost` as input and returns a category:

Low if Cost < 500
Medium if 500 ≤ Cost ≤ 1500
High if Cost > 1500

Step 5: Assign categories

In the *PQ_sales* table, add a custom column using the function. Name the column `PriceCategory`. Make sure the categories display correctly based on the values in the *Cost* column.
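A sketch of Steps 4 and 5 together. To keep the example self-contained, the function is defined as a step of the same query; in the exercise it would live in its own blank query (for example under the hypothetical name `fnPriceCategory`) and be called from the sales query. The sample rows are made up.

```powerquery
let
    // Price-band function: Low below 500, Medium from 500 to 1500, High above 1500
    fnPriceCategory = (Cost as number) as text =>
        if Cost < 500 then "Low"
        else if Cost <= 1500 then "Medium"
        else "High",
    // Made-up rows standing in for PQ_sales
    Sales = #table(
        type table [Product = text, Cost = number],
        {{"Mouse", 25}, {"Monitor", 500}, {"Laptop", 1500}, {"Server", 2400}}
    ),
    // Call the function once per row to fill PriceCategory
    WithCategory = Table.AddColumn(Sales, "PriceCategory", each fnPriceCategory([Cost]), type text)
in
    WithCategory
```

The four rows come out as Low, Medium, Medium and High; the boundary values 500 and 1500 both fall into Medium. Because the parameter is typed `as number`, a `null` cost raises an error, so clean or replace nulls beforehand.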

Task

✔ Complete all steps in Power Query.
✔ Experiment with both the graphical interface and M code.
✔ Use ChatGPT for troubleshooting or refining your M scripts.

Module 4: Advanced Data Transformations in Power Query

Objective

In this module, you will learn:

✔ How to pivot and unpivot data in Power Query
✔ How to split and merge columns for better data structure
✔ How to use conditional transformations
✔ How to leverage ChatGPT to build complex M scripts

Files Provided

This exercise introduces a new dataset: 📂 Media:PQ_sales_pivot.csv – contains monthly sales data in a pivoted format.

Instructions

Step 1: Import the data

Open Power Query in Excel or Power BI. Import the file *PQ_sales_pivot.csv*. Ensure the table loads correctly.

Step 2: Unpivot the data

The table is in a wide format (one column per month), which is not ideal for analysis. Unpivot the monthly columns so the data has the following columns:

Product
Category
Month
Sales Amount

How to do it:

Click **Transform → Use First Row as Headers** to make sure column names are correct.
Select the month columns (e.g., Jan 2025, Feb 2025, etc.).
Click **Transform → Unpivot Columns**.
Rename the resulting columns:
1. Attribute → Month
2. Value → Sales Amount
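In M, the unpivot and the two renames can be expressed as one `Table.UnpivotOtherColumns` call that names the new columns directly. The sketch below uses a made-up two-month table and assumes the identifier columns are `Product` and `Category`. Unpivoting "everything except the identifiers" means a new month column added to the file later is picked up without editing the query.

```powerquery
let
    // Made-up wide table standing in for PQ_sales_pivot.csv after headers are promoted
    Wide = #table(
        type table [Product = text, Category = text, #"Jan 2025" = number, #"Feb 2025" = number],
        {{"Monitor", "Electronics", 500, 650}, {"Desk", "Furniture", 300, 280}}
    ),
    // Turn every column except Product and Category into Month / Sales Amount pairs
    Long = Table.UnpivotOtherColumns(Wide, {"Product", "Category"}, "Month", "Sales Amount")
in
    Long
```

The result has four rows (two products × two months) with the columns `Product`, `Category`, `Month` and `Sales Amount`.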

Step 3: Splitting and merging columns

The `Month` column now contains values like "Jan 2025".
Split this column into `Month Name` and `Year`:
Select the `Month` column.
Click **Transform → Split Column → By Delimiter**.
Choose space (" ") as the delimiter.
Rename the new columns to `Month Name` and `Year`.

Example of merging columns:

To merge `Product` and `Category`, select both columns:

Click **Transform → Merge Columns**.
Use `" - "` as the separator (e.g., `"Monitor - Electronics"`).
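Both operations have direct M equivalents: `Table.SplitColumn` with a delimiter splitter, and `Table.CombineColumns` with a delimiter combiner. The rows below are made up, and the merged column name `Product - Category` is an illustrative choice.

```powerquery
let
    // Made-up long table, as produced by the unpivot step
    Long = #table(
        type table [Product = text, Category = text, Month = text, #"Sales Amount" = number],
        {{"Monitor", "Electronics", "Jan 2025", 500}, {"Desk", "Furniture", "Feb 2025", 280}}
    ),
    // Split "Jan 2025" at the space into Month Name and Year
    Split = Table.SplitColumn(Long, "Month", Splitter.SplitTextByDelimiter(" "), {"Month Name", "Year"}),
    // The split produces text; make Year a whole number
    Typed = Table.TransformColumnTypes(Split, {{"Year", Int64.Type}}),
    // Combine Product and Category, e.g. "Monitor - Electronics"
    Merged = Table.CombineColumns(
        Typed,
        {"Product", "Category"},
        Combiner.CombineTextByDelimiter(" - "),
        "Product - Category"
    )
in
    Merged
```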

Step 4: Adding conditional transformations

Add a new custom column named `Sales Performance` with the following logic:

if [Sales Amount] < 300 then "Low"
else if [Sales Amount] >= 300 and [Sales Amount] < 800 then "Medium"
else "High"

Make sure the column correctly categorizes the sales performance.
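Inside `Table.AddColumn`, the logic above becomes the body of an `each` expression. Because the first branch already handles values below 300, the second test only needs the upper bound. The sample values are made up and chosen to sit on the boundaries.

```powerquery
let
    // Made-up values around the 300 and 800 boundaries
    Sample = #table(
        type table [Product = text, #"Sales Amount" = number],
        {{"Monitor", 250}, {"Desk", 300}, {"Chair", 799}, {"Laptop", 800}}
    ),
    // Low below 300, Medium from 300 up to (not including) 800, High from 800
    WithPerformance = Table.AddColumn(
        Sample,
        "Sales Performance",
        each if [Sales Amount] < 300 then "Low"
            else if [Sales Amount] < 800 then "Medium"
            else "High",
        type text
    )
in
    WithPerformance
```

The rows are classified as Low, Medium, Medium and High, which is a quick way to check that the boundaries behave as intended.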

Task

✔ Complete all steps in Power Query
✔ Experiment with unpivoting, splitting, merging, and conditional logic
✔ Use ChatGPT for troubleshooting or refining your M scripts

Module 5: Parameterization and Dynamic Queries in Power Query

Objective

In this module, you will learn:

✔ How to create parameters in Power Query
✔ How to use parameters for dynamic filtering and query control

Files Provided

This exercise uses the following files:

📂 Media:PQ_sales2.csv
📂 Media:PQ_parameters.xlsx – contains values for dynamic filtering

Instructions

🔹 Step 1: Load the CSV file into Power Query

Open Excel and go to **Data → Get Data → From File → From Text/CSV**
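Select the file `PQ_sales2.csv` and load it into Power Query
Make sure Power Query recognizes the data correctly
Also import *PQ_parameters.xlsx* (**Data → Get Data → From File → From Workbook**) and select the `Parameters` table or sheet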

🔹 Step 2: Process the `Parameters` table

In Power Query, go to the **`Parameters`** table
**Transpose the table** – click **Transform → Transpose**
**Use the first row as headers** – click **Transform → Use First Row as Headers**
**Change the data types** for `startDate` and `endDate` to **Date**:
1. Click the `startDate` column header → choose type `Date`
2. Repeat for `endDate`
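The Step 2 clicks, plus the "Add as New Query" action from Step 3, correspond roughly to the M below. The sketch assumes the sheet holds parameter names in its first column and values in its second; the inline rows and dates are made up. Values read from Excel usually arrive as date/time rather than text, and the same type change handles both.

```powerquery
let
    // Assumed layout of the Parameters sheet: names in Column1, values in Column2
    Raw = #table({"Column1", "Column2"}, {{"startDate", "2025-02-01"}, {"endDate", "2025-03-31"}}),
    // Transform → Transpose: the names move into the first row
    Transposed = Table.Transpose(Raw),
    // Transform → Use First Row as Headers
    Promoted = Table.PromoteHeaders(Transposed, [PromoteAllScalars = true]),
    // Change both columns to Date
    Typed = Table.TransformColumnTypes(Promoted, {{"startDate", type date}, {"endDate", type date}}),
    // Right-click the startDate value → Add as New Query returns this single value
    startDate = Typed{0}[startDate]
in
    startDate
```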

🔹 Step 3: Create separate queries for `startDate` and `endDate`

In the `Parameters` table, right-click the value in `startDate` → **Add as New Query**
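Repeat this for `endDate`
Make sure the two new queries are named `startDate` and `endDate`, because the filter in Step 6 refers to them by these names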

🔹 Step 4: Change the data type of `OrderDate` in the `PQ Sales` table to date

Go back to the `PQ Sales` query
The `OrderDate` column contains dates in `DD MM YY` format
**Split the column into three parts**:
1. Click **Transform → Split Column → By Delimiter**
2. Choose **Space** (` `) as the delimiter
3. You will get: `OrderDate.1`, `OrderDate.2`, `OrderDate.3` (day, month, year)
**Change their types to `Number` (Int64.Type)**
**Merge into proper `YYYY-MM-DD` format**:
1. Select the columns in this order (Ctrl+click) and click **Merge Columns**
2. The order must be: `OrderDate.3`, `OrderDate.2`, `OrderDate.1` (year, month, day)
3. Use `-` as the separator
4. Rename the new column to `DateOrder`
5. Change its type to **Date**
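An alternative to merging text and re-parsing it is to build the date directly from the three numbers with `#date(year, month, day)`, which does not depend on regional date settings. This is a swapped-in technique, not the one the steps above use. The sample rows are made up, and the handling of two-digit years (treated as 20xx) is an assumption to check against your data.

```powerquery
let
    // Made-up rows in the DD MM YY layout
    Sample = #table(type table [OrderID = number, OrderDate = text], {{1, "05 03 25"}, {2, "28 02 25"}}),
    // Split at the spaces into day, month and year parts
    Split = Table.SplitColumn(Sample, "OrderDate", Splitter.SplitTextByDelimiter(" "), {"Day", "Month", "Year"}),
    Typed = Table.TransformColumnTypes(Split, {{"Day", Int64.Type}, {"Month", Int64.Type}, {"Year", Int64.Type}}),
    // Assumption: two-digit years belong to the 2000s; four-digit years are used as-is
    WithDate = Table.AddColumn(
        Typed,
        "DateOrder",
        each #date(if [Year] < 100 then 2000 + [Year] else [Year], [Month], [Day]),
        type date
    )
in
    WithDate
```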

🔹 Step 5: Add a dynamic filter to `DateOrder`

Open the **Advanced Editor** (`View → Advanced Editor`)
Find the last step before `in`, for example:

#"Renamed Columns" = Table.RenameColumns(#"Changed Type2", {{"Merged", "DateOrder"}})

🔹 Step 6: Add a comma at the end of that line, then add the filter step on a new line:

#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [DateOrder] >= startDate and [DateOrder] <= endDate)

Ensure that `startDate` and `endDate` are in Date format.

Update the final `in` line to return the filtered table:

in
#"Filtered Rows"

🔹 Step 7: Check the results

Click **Done**
Verify that the data is correctly filtered
Click **Close & Load** to load the data into Excel

Module 6: Automating Data Combining and Refreshing in Power Query

Objective

In this module, you will learn:

✔ How to automatically import and combine files from a folder
✔ How to handle different column names across files
✔ How to prepare data for reporting regardless of source file structure
✔ How to set up automatic data refresh in Power Query

Files Provided

This exercise uses a set of sales files located in a single folder:

📂 Media:Sales_Jan.xlsx – Sales for January
📂 Media:Sales_Feb.xlsx – Sales for February
📂 Media:Sales_Mar.xlsx – Sales for March

Each file contains similar data, but the sales column names differ:

In *Sales_Jan.xlsx*: the sales column is named `Total Sale`
In *Sales_Feb.xlsx*: the column is named `Revenue`
In *Sales_Mar.xlsx*: the column is named `SalesAmount`

The goal is to combine these files into a single dataset and standardize the column names.

Instructions

Step 1: Load files from a folder

Open Power Query in Excel
Go to **Data → Get Data → From File → From Folder**
Select the folder containing the files (Sales_Jan.xlsx, Sales_Feb.xlsx, Sales_Mar.xlsx)
Click **Transform Data** (rather than **Combine**) to open the file list in Power Query without combining the files automatically

Step 2: Use M code to load the data

Open **Advanced Editor** in Power Query
Paste the following M code and click **Done**:

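let
    // Load the file list from the folder (keep your own folder path here)
    Source = Folder.Files("C:\Users\pathToFolder..."),

    // Keep only Excel workbooks and skip temporary "~$" lock files
    ExcelFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".xlsx" and not Text.StartsWith([Name], "~$")),

    // Add a column to access the Excel file contents
    AddContent = Table.AddColumn(ExcelFiles, "Custom", each Excel.Workbook([Content])),

    // Expand the workbook contents; this inner Name is the sheet or table name, not the file name
    ExpandContent = Table.ExpandTableColumn(AddContent, "Custom", {"Name", "Data"}, {"Sheet Name", "Data"})
in
    ExpandContent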

After applying the code, you will see a new `Data` column

Step 3: Expand the table contents

Click the expand icon next to the `Data` column
This reveals the full data from each file
Ensure that all relevant columns from all files are visible

Step 4: Remove unnecessary columns

Review the table and remove technical columns (e.g., file path) not needed for analysis
Go to **Home → Remove Columns** and select what to discard

Step 5: Rename columns

Rename the varying sales columns to a consistent name (e.g., `Sales`)
Use **Transform → Rename Column** to apply a uniform structure
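Because each file names the sales column differently, expanding all three tables at once puts `Total Sale`, `Revenue` and `SalesAmount` side by side, and three columns cannot be renamed to the same name. One way around this, sketched below with made-up tables and an assumed `Product` column, is to rename inside each file's table before appending. The helper `Standardize` is a hypothetical name.

```powerquery
let
    // Made-up tables standing in for the Data column of the three workbooks
    Jan = #table(type table [Product = text, #"Total Sale" = number], {{"Monitor", 500}}),
    Feb = #table(type table [Product = text, Revenue = number], {{"Desk", 280}}),
    Mar = #table(type table [Product = text, SalesAmount = number], {{"Chair", 120}}),
    // Hypothetical helper: rename whichever known sales column a table has to "Sales"
    Standardize = (t as table) as table =>
        let
            found = List.Intersect({Table.ColumnNames(t), {"Total Sale", "Revenue", "SalesAmount"}})
        in
            if List.IsEmpty(found) then t else Table.RenameColumns(t, {{found{0}, "Sales"}}),
    // Append the standardized tables into one dataset with a single Sales column
    Combined = Table.Combine(List.Transform({Jan, Feb, Mar}, Standardize))
in
    Combined
```

In the real query the same helper could be applied to the list of tables from Step 2, e.g. `Table.Combine(List.Transform(ExpandContent[Data], Standardize))`, after promoting each sheet's first row to headers so that the column names match.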

Step 6: Remove unnecessary rows (e.g., repeated headers)

Apply a filter on the column containing sales values
Remove rows with repeated headers caused by merging files
Go to **Home → Remove Rows → Remove Duplicates**, or filter manually

Step 7: Enable automatic refresh

Go to **Data → Query Properties → Refresh data when opening the file**
Optionally set automatic refresh every X minutes
If a new file (e.g., *Sales_Apr.xlsx*) is added to the folder, Power Query will automatically include it upon refresh!

Task

✔ Load and combine data from *Sales_Jan.xlsx*, *Sales_Feb.xlsx*, and *Sales_Mar.xlsx*
✔ Standardize column names and format the data consistently
✔ Remove empty rows, unnecessary columns, and duplicates
✔ Set up auto-refresh so new files are included automatically
✔ Use ChatGPT to optimize the M code in Power Query

Module 7: Optimizing Query Performance in Power Query

Objective

In this module, you will learn:

✔ How to speed up Power Query when working with large datasets
✔ How to avoid inefficient operations that slow down queries
✔ How to use buffering and database-level transformations
✔ How to minimize the amount of data processed for better performance

Introduction

Power Query enables powerful data transformation, but with large datasets, performance can suffer. In this module, you will learn best practices to reduce query execution time.

Instructions

Step 1: Avoid unnecessary operations on the entire dataset

Load a large CSV file: 📂Media:PQSales_Large.csv
Check the number of rows and columns – the more data, the more important the optimization
Remove unnecessary columns at the beginning of the query instead of the end
Apply early filtering to reduce the number of rows right after import
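A sketch of "reduce first, transform later": the column selection and the row filter are the first steps after the source. An inline table with made-up rows stands in for *PQSales_Large.csv*, and the kept column names are assumptions. Against a database source, early steps like these can also fold into the query sent to the server.

```powerquery
let
    // Made-up rows standing in for PQSales_Large.csv
    Source = #table(
        type table [OrderID = number, Customer = text, Product = text, Cost = number, Comment = text],
        {{1, "Acme", "Monitor", 650, "rush"}, {2, "Globex", "Desk", 120, ""}, {3, "Initech", "Laptop", 1400, ""}}
    ),
    // Remove unneeded columns right away (what Remove Other Columns generates)
    KeptColumns = Table.SelectColumns(Source, {"OrderID", "Customer", "Product", "Cost"}),
    // Filter rows immediately, before any expensive transformations
    EarlyFilter = Table.SelectRows(KeptColumns, each [Cost] > 500)
in
    EarlyFilter
```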

Step 2: Use buffering (Table.Buffer)

Understand how step-by-step processing works – each operation may cause Power Query to recalculate previous steps
Add `Table.Buffer()` after the filter step to avoid re-processing:

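let
    // Adjust the path to where you saved the file
    Source = Csv.Document(File.Contents("C:\Users\gp\Desktop\PQ\Sales_Large.csv"), [Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    // Use the first row as column names so that [Cost] can be referenced
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    // Cost must be numeric for the comparison below
    Typed = Table.TransformColumnTypes(Promoted, {{"Cost", type number}}),
    FilteredRows = Table.SelectRows(Typed, each [Cost] > 500),
    BufferedData = Table.Buffer(FilteredRows)
in
    BufferedData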

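`Table.Buffer()` loads the table into memory once, so later steps that reference it repeatedly (for example, in merges or row-by-row lookups) do not read the source again. It also stops query folding for the steps after it, so it helps mainly when the buffered table is reused; buffering a table that is read only once can make a query slower.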

Step 3: Minimize the number of loaded rows

When working with large databases or CSV files, load only the needed columns and rows
Use **Keep Top Rows** to load e.g., the first 1000 rows for testing
Apply **Remove Duplicates** early to reduce the volume of data being processed
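Both interface commands map to single M functions: **Keep Top Rows** generates `Table.FirstN`, and **Remove Duplicates** generates `Table.Distinct`. The rows below are made up; the 1000-row limit is a development-time sample and should be removed before production use.

```powerquery
let
    // Made-up rows with one duplicated order
    Source = #table(
        type table [OrderID = number, Product = text],
        {{1, "Monitor"}, {1, "Monitor"}, {2, "Desk"}, {3, "Laptop"}}
    ),
    // Keep Top Rows: work on at most 1000 rows while developing
    Sample = Table.FirstN(Source, 1000),
    // Remove Duplicates: drops the repeated order 1
    Deduplicated = Table.Distinct(Sample)
in
    Deduplicated
```

When duplicates should be judged on a key only, pass the columns explicitly, e.g. `Table.Distinct(Sample, {"OrderID"})`.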

Step 4: Optimize database connections

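If working with SQL Server or another database, avoid importing full tables into Power Query
Instead, apply filtering and grouping on the database side, either with a native SQL query (as in the example below) or by keeping early steps simple enough for Power Query to fold them into the SQL it sends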

Example:

let
Source = Sql.Database("ServerName", "DatabaseName", [Query="SELECT OrderID, OrderDate, Customer, Product FROM Sales WHERE Cost > 500"])
in
Source

This ensures Power Query pulls only the filtered data instead of processing the entire table in memory.

Step 5: Avoid "drill-down" operations on large datasets

Power Query often suggests drill-downs (e.g., selecting a single value from a table)
When working with large data, operate on whole tables instead of individual records
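The contrast can be sketched with a price lookup. The commented-out step drills into the `Prices` table once per order, which is slow on large tables; the active steps do the same work with one join. Both tables and their column names are made up for illustration.

```powerquery
let
    Orders = #table(type table [OrderID = number, Product = text], {{1, "Monitor"}, {2, "Desk"}, {3, "Monitor"}}),
    Prices = #table(type table [Product = text, Price = number], {{"Monitor", 500}, {"Desk", 280}}),
    // Row-by-row drill-down (avoid on large data): one lookup into Prices per order
    // PerRow = Table.AddColumn(Orders, "Price", each Prices{[Product = [Product]]}[Price]),
    // Whole-table alternative: a single join, then expand the Price column
    Joined = Table.NestedJoin(Orders, {"Product"}, Prices, {"Product"}, "P", JoinKind.LeftOuter),
    WithPrice = Table.ExpandTableColumn(Joined, "P", {"Price"})
in
    WithPrice
```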

Step 6: Automatically refresh optimized queries

Once optimized, configure the query to refresh regularly
In Excel, go to **Data → Query Properties → Refresh data when opening the file**

Task

Load the large CSV file (*PQSales_Large.csv*)
Limit the number of loaded rows and columns
Apply `Table.Buffer()` and observe performance improvements
If using a database, optimize your SQL query
Set up auto-refresh for the optimized query
Use ChatGPT to analyze performance and further optimize M code

Module 8: Creating Dynamic Reports and Dashboards in Excel with Power Query

Objective

In this module, you will learn:

✔ How to use Power Query to dynamically generate reports
✔ How to combine data from multiple sources into a single report
✔ How to create interactive reports using PivotTables
✔ How to automate report refreshing in Excel

Files Provided

The following files are used for this exercise:

📂 Media:PQ_Sales_Data.xlsx – Sales data
📂 Media:PQ_Regions.xlsx – Sales regions
📂 Media:PQ_Targets.xlsx – Sales targets

Instructions

Step 1: Import and combine data sources

Open Power Query in Excel
Import the files: *PQ_Sales_Data.xlsx*, *PQ_Regions.xlsx*, and *PQ_Targets.xlsx*
Merge the data using a common key – for example, the `Region` column
Verify that the data is correctly combined and properly formatted
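A sketch of the combination step with two left outer joins on `Region`. The three inline tables and their columns (`RegionName`, `Target`) are assumptions standing in for the workbooks. If the targets in your file are set per region and month, join on both keys, e.g. `{"Region", "Month"}`.

```powerquery
let
    // Made-up tables standing in for the three workbooks
    Sales = #table(type table [Region = text, Month = text, Sales = number],
        {{"North", "Jan", 1200}, {"South", "Jan", 800}, {"North", "Feb", 950}}),
    Regions = #table(type table [Region = text, RegionName = text],
        {{"North", "Northern Region"}, {"South", "Southern Region"}}),
    Targets = #table(type table [Region = text, Target = number],
        {{"North", 1000}, {"South", 900}}),
    // Add region details to each sales row
    WithRegion = Table.ExpandTableColumn(
        Table.NestedJoin(Sales, {"Region"}, Regions, {"Region"}, "R", JoinKind.LeftOuter),
        "R", {"RegionName"}),
    // Add the sales target for the same region
    WithTarget = Table.ExpandTableColumn(
        Table.NestedJoin(WithRegion, {"Region"}, Targets, {"Region"}, "T", JoinKind.LeftOuter),
        "T", {"Target"})
in
    WithTarget
```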

Step 2: Create a dynamic report

Click **Close & Load To...** and select **PivotTable Report**
Insert the PivotTable in a new worksheet, using the Power Query output as the source
In the PivotTable Fields pane, set:

 * Rows → `Region`  
 * Columns → `Month`  
 * Values → `Sum of Sales`

Check for accuracy and apply formatting to the table

Step 3: Add conditional formatting

Select the `Sum of Sales` column in the PivotTable
Go to **Conditional Formatting → Color Scales**
Apply gradient colors to highlight low and high sales values
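Add a formula-based rule (**Conditional Formatting → New Rule → Use a formula to determine which cells to format**) that highlights values above the sales target from *PQ_Targets.xlsx*. The formula below assumes the targets are on a sheet named `Targets` (region in column A, target in column B) and that the first value cell is B5; adjust the references to your layout: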

=B5 > VLOOKUP($A5,Targets!$A$2:$B$8,2,0)

Step 4: Automate data refreshing

Go to **Data → Query Properties → Refresh data when opening the file**
Optionally set auto-refresh every X minutes
Test the report by updating the source files and verifying that the report refreshes correctly

Task

✔ Load and combine data from *PQ_Sales_Data.xlsx*, *PQ_Regions.xlsx*, and *PQ_Targets.xlsx*
✔ Create a PivotTable and format it dynamically
✔ Add conditional formatting based on sales targets
✔ Set up automatic data refreshing
✔ Use ChatGPT to analyze and optimize Power Query transformations

Summary: Modules 1–8

Objective

In this exercise, you will summarize all the key concepts learned so far in Power Query by performing a series of transformations on inventory and supplier data. You will apply data import, filtering, merging, column creation, custom functions, and query optimization.

Files Provided

The following files are used for this exercise:

📂 Media:PQ_inventory.csv – Inventory stock data
📂 Media:PQ_suppliers.csv – Supplier information
📂 Media:PQ_orders.csv – Warehouse delivery orders

Instructions

🔹 Step 1: Import data

Open Power Query in Excel or Power BI
Import the three CSV files: `PQ_inventory.csv`, `PQ_suppliers.csv`, `PQ_orders.csv`
Make sure all datasets are loaded correctly

🔹 Step 2: Check and convert data types

Ensure all columns in each dataset have correct data types
Issue to solve: the `StockLevel` column was incorrectly imported as text because it contains units like `150 kg`, `200 l`, `75 pcs`
Transform the `StockLevel` column to extract numeric values and store the unit in a new column `Unit`
Verify that the `SupplierID` column is recognized as an integer
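One way to separate the number from the unit, assuming every value has the form "number, space, unit" as in the examples (`150 kg`, `200 l`, `75 pcs`). The sample rows are made up. The unit is extracted first, while `StockLevel` is still text.

```powerquery
let
    // Made-up rows with units embedded in StockLevel
    Sample = #table(
        type table [ProductID = number, StockLevel = text],
        {{1, "150 kg"}, {2, "200 l"}, {3, "75 pcs"}}
    ),
    // Unit: the text after the space
    WithUnit = Table.AddColumn(Sample, "Unit", each Text.AfterDelimiter([StockLevel], " "), type text),
    // StockLevel: the text before the space, converted to a number
    Numeric = Table.TransformColumns(
        WithUnit,
        {{"StockLevel", each Number.From(Text.BeforeDelimiter(_, " ")), type number}}
    )
in
    Numeric
```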

🔹 Step 3: Merge data

Merge `PQ_inventory.csv` with `PQ_suppliers.csv` using the `SupplierID` key
Use a **Left Outer Join** to retain all inventory records
Then merge the result with `PQ_orders.csv` using the `ProductID` key
Verify that supplier and order info have been successfully added to the inventory table

🔹 Step 4: Create custom columns

Add a column `ReorderLevel` that flags products needing restocking when `StockLevel` is less than `MinimumStock`
Add a column `DaysSinceLastOrder` that calculates the number of days since the last order for each product
Create a custom M function that assigns order priority:

if [StockLevel] < [MinimumStock] and [DaysSinceLastOrder] > 30 then "High" 
else if [StockLevel] < [MinimumStock] then "Medium" 
else "Low"

Add a column `OrderPriority` and apply this function
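A sketch of Step 4 on made-up rows. The column `LastOrderDate` is a hypothetical name for the date of the most recent order after the merge, and `Today` is fixed to a hypothetical date so the results are reproducible; in practice `Date.From(DateTime.LocalNow())` would supply the current date. The priority function takes the three values as parameters so it can also be stored as a separate query.

```powerquery
let
    Inventory = #table(
        type table [ProductID = number, StockLevel = number, MinimumStock = number, LastOrderDate = date],
        {{1, 40, 50, #date(2025, 1, 10)}, {2, 20, 50, #date(2025, 3, 1)}, {3, 90, 50, #date(2025, 2, 1)}}
    ),
    // Hypothetical fixed "today" for a reproducible example
    Today = #date(2025, 3, 15),
    // Flag products whose stock is below the minimum
    WithReorder = Table.AddColumn(Inventory, "ReorderLevel", each [StockLevel] < [MinimumStock], type logical),
    // Subtracting dates gives a duration; Duration.Days turns it into a day count
    WithDays = Table.AddColumn(WithReorder, "DaysSinceLastOrder", each Duration.Days(Today - [LastOrderDate]), Int64.Type),
    // Priority function implementing the rules above
    fnOrderPriority = (stock as number, minimum as number, days as number) as text =>
        if stock < minimum and days > 30 then "High"
        else if stock < minimum then "Medium"
        else "Low",
    WithPriority = Table.AddColumn(
        WithDays,
        "OrderPriority",
        each fnOrderPriority([StockLevel], [MinimumStock], [DaysSinceLastOrder]),
        type text
    )
in
    WithPriority
```

Product 1 (below minimum, last ordered 64 days earlier) gets High, product 2 (below minimum, 14 days) gets Medium, and product 3 (above minimum) gets Low.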

🔹 Step 5: Conditional filtering and transformations

Remove products that have a `Discontinued` status
Add a new column `SupplierRating` that classifies suppliers by reliability (a higher on-time delivery rate means a more reliable supplier):

if [OnTimeDeliveryRate] < 80 then "Poor" 
else if [OnTimeDeliveryRate] <= 90 then "Good" 
else "Excellent"

Verify that the rating logic works correctly

🔹 Step 6: Reshape the data structure

Unpivot columns `Stock_Jan`, `Stock_Feb`, `Stock_Mar` into: `Product`, `Month`, `Stock Level`
Split the `ProductDetails` column into `ProductName` and `Category`
Merge the `SupplierName` and `Country` columns using `" - "` as a separator

🔹 Step 7: Query optimization

Apply `Table.Buffer()` to tables that are referenced several times (for example, a lookup table used in merges)
Remove unused columns and duplicates at the beginning of the transformations, not at the end
If working with large data, limit the loaded rows to a test sample of 1000

🔹 Step 8: Export results

Load the final query as a table into Excel
Test data refresh by updating source files
Set up auto-refresh for the query

Task

✔ Complete all the steps listed above
✔ Experiment with both the graphical interface and M code
✔ Apply query optimization to improve performance
✔ Ensure all transformations are correct and results are as expected
✔ Use ChatGPT for troubleshooting or optimizing your M script

Module 9: Importing and Analyzing PDF Files in Power Query

Objective

In this module, you will learn:

✔ How to import data from PDF files into Power Query
✔ How to transform data and perform analysis on business reports
✔ How to visualize results and draw insights from reports

Files Provided

The following PDF reports are used in this exercise:

📂 Media:Monthly_Sales_Report_Jan2024.pdf – Sales Report
📂 Media:Employee_Attendance_Q1_2024.pdf – Employee Attendance Report
📂 Media:Customer_Feedback_Survey_2024.pdf – Customer Feedback Report
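Each report is first imported with **Get Data → From File → From PDF** (in versions that include the PDF connector). The M behind that choice looks roughly like the sketch below; the file path is an assumption, and which detected table holds the data varies per report, so check the Navigator preview before picking one.

```powerquery
let
    // Assumed local path; adjust to where the report was saved
    Source = Pdf.Tables(File.Contents("C:\Data\Monthly_Sales_Report_Jan2024.pdf")),
    // Pdf.Tables lists whole pages and detected tables; keep only the tables
    TablesOnly = Table.SelectRows(Source, each [Kind] = "Table"),
    // Take the first detected table and use its first row as headers
    FirstTable = Table.PromoteHeaders(TablesOnly{0}[Data], [PromoteAllScalars = true])
in
    FirstTable
```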

Instructions

🔹 Task 1: Sales Report Analysis

Calculate total sales for all products
Identify the product with the highest and lowest sales
Compute the average transaction value based on transaction count and total sales
Group data by region and calculate total sales per region
Create a pivot table showing sales by region and product
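The grouping and average-value tasks can be sketched with **Group By** (`Table.Group`) followed by a custom column. The column names `Region`, `Product`, `Transactions` and `Sales` are assumptions about the extracted table, and the rows are made up.

```powerquery
let
    Sales = #table(
        type table [Region = text, Product = text, Transactions = number, Sales = number],
        {{"North", "Monitor", 10, 5000}, {"North", "Desk", 4, 1200}, {"South", "Monitor", 6, 3000}}
    ),
    // Group By Region: total sales and total number of transactions
    ByRegion = Table.Group(Sales, {"Region"}, {
        {"Total Sales", each List.Sum([Sales]), type number},
        {"Transactions", each List.Sum([Transactions]), type number}
    }),
    // Average transaction value = total sales / transaction count
    WithAvg = Table.AddColumn(ByRegion, "Avg Transaction Value", each [Total Sales] / [Transactions], type number)
in
    WithAvg
```

For North this gives total sales of 6200 over 14 transactions, about 442.86 per transaction.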

🔹 Task 2: Employee Attendance Report Analysis

Calculate the average attendance rate across all departments
Identify the department with the highest and lowest attendance
Add a new column classifying attendance into categories:
1. High: above 95%
2. Medium: 85%–95%
3. Low: below 85%
Filter the data to show only employees with low attendance
Create a bar chart showing average attendance by department

🔹 Task 3: Customer Feedback Report Analysis

Calculate the average customer rating on a 1–5 scale
Count how many customers gave a rating of 1 or 5
Generate a summary report of the most frequent positive and negative comments
Sort the data by customer rating from lowest to highest
Create a pie chart showing the distribution of customer ratings

Summary

✔ Complete the analysis tasks for each report separately
✔ Apply filtering, sorting, and grouping operations
✔ Use pivot tables to aggregate data
✔ Visualize results using charts in Excel or Power BI
✔ Use ChatGPT if you encounter difficulties during analysis

Module 10: Importing and Analyzing Web Data in Power Query

Objective

In this module, you will learn:

✔ How to import data from statistical tables available on Wikipedia into Power Query
✔ How to transform and analyze data about countries of the world
✔ How to visualize comparison results in Excel or Power BI

Data Sources

In this exercise, we’ll use real tabular data about countries of the world imported directly from Wikipedia. These include:

📊 **Surface area of countries** 🌍
📊 **Population by country** 👥
📊 **Gross Domestic Product (GDP) by country** 💰

Sources:

Instructions

🔹 Step 1: Import data from Wikipedia

Open Power Query in Excel or Power BI
Choose **Get Data → From Web**
Enter the URL of one of the Wikipedia pages above
Once the available tables are loaded, select the one containing statistical data (e.g., country area, population, or GDP)
Click **Transform Data** to open the selected table in the Power Query Editor
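For reference, one form of M that a web import can produce is sketched below. The URL is a placeholder to replace with the Wikipedia page you chose, and taking the first table is an assumption; inspect the `Caption` column to find the right one. Depending on your version, the Navigator may generate different but equivalent code.

```powerquery
let
    // Placeholder URL (hypothetical): replace with the page you want to import
    Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/YOUR_PAGE")),
    // Web.Page returns one row per HTML table; keep rows that are tables
    Tables = Table.SelectRows(Source, each [Source] = "Table"),
    // Pick the first table; adjust the index after checking the captions
    FirstTable = Tables{0}[Data]
in
    FirstTable
```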

🔹 Step 2: Transform and clean the data

**Remove unnecessary columns**, keeping only those relevant for analysis
**Change data types** so that numbers are correctly interpreted (e.g., `Area` as number, `GDP` as currency)
**Remove empty values** and correct any errors
**Rename columns** to clearer names, such as `Country`, `Area (km²)`, `Population`, `GDP (billion USD)`

🔹 Step 3: Analyze and compare countries

**Calculate population density** by adding a new column with the formula:

Population Density = Population / Area

**Sort countries by GDP** to identify the largest and smallest economies
**Compare area and population** to find the largest, smallest, and most populated countries
**Apply filtering** to display only selected continents or world regions
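The density formula can be added as a custom column. This sketch assumes the columns were renamed as in Step 2 and uses made-up countries. It returns `null` when the area is missing or zero, so the division cannot produce an error or infinity.

```powerquery
let
    Countries = #table(
        type table [Country = text, #"Area (km²)" = number, Population = number],
        {{"Country A", 1000, 50000}, {"Country B", 0, 100}}
    ),
    // Population Density = Population / Area, guarded against missing or zero area
    WithDensity = Table.AddColumn(
        Countries,
        "Population Density",
        each if [#"Area (km²)"] = null or [#"Area (km²)"] = 0
            then null
            else [Population] / [#"Area (km²)"],
        type number
    )
in
    WithDensity
```

Country A gets a density of 50 people per km²; Country B gets `null`.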

🔹 Step 4: Visualize the results

Create a **pivot table** in Excel to compare area, population, and GDP
Insert a **bar chart** to show the largest economies
Use a **heat map** to illustrate population density by region
Add **conditional formatting** to highlight countries with extreme statistical values

Task

✔ Import world country data from Wikipedia into Power Query
✔ Transform and clean the data to make it analysis-ready
✔ Calculate population density and other statistical indicators
✔ Create charts and comparison tables in Excel or Power BI
✔ Use ChatGPT if you encounter issues during import or analysis

Module 11: Working with Relational Data in Power Query

Objective

In this module, you will learn:

✔ How to import multiple related tables into Power Query
✔ How to use **Merge Queries** to combine employee, department, and salary grade data
✔ How to perform basic aggregations and grouping using the **Group By** feature
✔ How to answer typical business questions using point-and-click operations in Power Query

Data Sources

In this exercise, we will use three small Excel files that simulate a classic HR database:

Download files:

📂 File:Emp.xlsx → Employees (ID, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)
📂 File:Dept.xlsx → Departments (DEPTNO, DNAME, LOC)
📂 File:Salgrade.xlsx → Salary Grades (GRADE, LOSAL, HISAL)

Instructions

🔹 Step 1: Import all three files

Open Excel → Data → Get Data → From File → From Workbook
Select **emp.xlsx**, then repeat for **dept.xlsx** and **salgrade.xlsx**
You will now see three queries in the Power Query editor: `EMP`, `DEPT`, and `SALGRADE`

🔹 Step 2: Explore the EMP table

Check column data types (e.g. SAL should be number, HIREDATE should be date)
Remove unnecessary columns if needed
Rename columns to user-friendly names (e.g. Employee Name, Department No, Salary)

🔹 Step 3: Combine EMP and DEPT

Use **Home → Merge Queries**
Join EMP.DEPTNO with DEPT.DEPTNO
Expand columns from DEPT (Department Name, Location)

🔹 Step 4: Join EMP with SALGRADE

Use **Merge Queries** again
Match EMP.SAL with SALGRADE ranges: SAL between LOSAL and HISAL
(Hint: do a cross join with helper column =1 in both tables, then filter SAL between LOSAL and HISAL)
Expand the GRADE column
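The hint in Step 4 can be done entirely with the interface; for reference, those clicks correspond roughly to the M below (you do not need to type it). The employee rows and the grade ranges are a small made-up sample, not the full tables.

```powerquery
let
    Emp = #table(type table [ENAME = text, SAL = number], {{"SMITH", 800}, {"KING", 5000}}),
    Salgrade = #table(
        type table [GRADE = number, LOSAL = number, HISAL = number],
        {{1, 700, 1200}, {2, 1201, 1400}, {5, 3001, 9999}}
    ),
    // Add Column → Custom Column with the constant 1 in both tables
    EmpKey = Table.AddColumn(Emp, "Key", each 1),
    GradeKey = Table.AddColumn(Salgrade, "Key", each 1),
    // Merge Queries on Key: every employee is paired with every grade (a cross join)
    Cross = Table.ExpandTableColumn(
        Table.NestedJoin(EmpKey, {"Key"}, GradeKey, {"Key"}, "G", JoinKind.Inner),
        "G", {"GRADE", "LOSAL", "HISAL"}),
    // Keep only the pair whose salary range contains SAL
    Graded = Table.SelectRows(Cross, each [SAL] >= [LOSAL] and [SAL] <= [HISAL])
in
    Graded
```

SMITH (800) ends up with grade 1 and KING (5000) with grade 5. The cross join multiplies rows temporarily, which is fine for small lookup tables like SALGRADE.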

🔹 Step 5: Analyze and group data

Use **Group By** to count employees per department
Add aggregations such as Min Salary, Max Salary, Average Salary
Create new columns with custom calculations if needed (e.g. Salary Difference = Max − Min)

Task

Below are 10 questions.

List all employees together with their **department name** and **location**.
Find all employees who have a **commission (COMM)** value greater than 0.
Show the **highest salary** in each department.
Calculate the **average salary** of managers vs. salesmen (group by JOB).
Find the department(s) that have **no employees assigned**.
Display employees hired in **1981**, sorted by hire date.
Show the **top 3 salaries** in the Sales department.
For each department, count how many **different job titles** are present.
Assign each employee a **salary grade** using the SALGRADE table.
Find employees whose salary is **above the average salary of their department**.

Deliverables

✔ Import and combine EMP, DEPT, SALGRADE into Power Query
✔ Use Merge Queries to enrich employees with department and grade information
✔ Apply Group By and filters to answer the 10 tasks above
✔ Load the transformed data into Excel or Power BI for reporting
✔ Practice solving relational queries with Power Query UI only (no coding)
