When working with Power BI or Excel, merging (or joining) tables is a crucial task. It allows you to combine data from different sources, offering comprehensive insights. Here’s an overview of the most common merge options in Power Query, explained with simple analogies and real-life examples.
Join TypesNow we will go into the different types of join functions Power Query has made available to us
1. Left Outer Join (All from first, matching from second)
• Analogy: Imagine you’re planning a wedding. You have a guest list (Table 1), and then you collect RSVPs (Table 2). A left outer join ensures that everyone from your guest list (Table 1) will be included, even if they didn’t RSVP. For those who did RSVP, you’ll also have their response.
• Practical Example: In a company, if you have a list of all employees (Table 1) and another list of their completed training sessions (Table 2), a left outer join shows all employees, whether or not they have completed any training (Table 2).
2. Right Outer Join (All from second, matching from first)
• Analogy: Now, you’re only interested in the RSVPs (Table 2). You still want to know if they were on the original guest list (Table 1), but you also want to include people who RSVP’d without being invited. A right outer join shows you all RSVPs, whether or not they were on the guest list.
• Practical Example: If you have a list of all customers who placed orders (Table 2) and another list of all potential customers (Table 1), a right outer join will show all customers who placed orders, even if they weren’t on the original prospect list.
3. Full Outer Join (All rows from both)
• Analogy: Now you want to see both your guest list and RSVPs entirely. You want to know who was invited and who RSVP’d, and also see if anyone RSVP’d who wasn’t on your list. A full outer join shows all the data from both tables, matching where possible and leaving blanks where there are no matches.
• Practical Example: If you have a list of products sold (Table 1) and a list of products in inventory (Table 2), a full outer join will give you all products, both sold and in inventory, even if some products were sold but are not in inventory and vice versa.
4. Inner Join (Only matching rows)
• Analogy: You’re only interested in the people who were on your guest list and RSVP’d. If someone was invited but didn’t RSVP, or RSVP’d without being on the guest list, you don’t want to see them. An inner join only shows rows that exist in both tables.
• Practical Example: If you have a list of current employees (Table 1) and a list of people who completed mandatory training (Table 2), an inner join will show only those who are both current employees and have completed the training.
5. Left Anti Join (Rows only in first)
• Analogy: You want to see who was on your guest list but didn’t RSVP. A left anti join shows only the people who are in Table 1 (your guest list) but have no matching record in Table 2 (RSVPs).
• Practical Example: If you have a list of all employees (Table 1) and another list of those who completed training (Table 2), a left anti join will show you the employees who haven’t completed the training.
6. Right Anti Join (Rows only in second)
• Analogy: Now you want to see who RSVP’d but wasn’t on the original guest list. A right anti join shows only the people who are in Table 2 (RSVPs) but have no match in Table 1 (the guest list).
• Practical Example: If you have a list of employees who signed up for a training session (Table 2) but weren’t originally required to take the training (Table 1), a right anti join will show you the unexpected attendees.
Key Insights:
• Use left outer join to retain all the data from the first table and match as much as you can from the second.
• Use right outer join to retain all the data from the second table and match as much as you can from the first.
• Use full outer join to combine everything from both tables, regardless of matching.
• Use inner join when you only care about data that exists in both tables.
• Use left anti join and right anti join to find rows that exist only in one of the tables but not both.