Power Query’s "Remove Duplicates" has me questioning EVERYTHING I’ve ever done in PQ
I had an epic failure with Power Query that has me worried about every spreadsheet I’ve ever created.
A simplified version of my issue was that I had a list containing "User", "Password", and "Date" (when the user created their password):
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue1 | 01-Jan-26 |
| Bob | Blue2 | 04-Feb-26 |
| Carl | Cat1 | 02-Jan-26 |
| Carl | Cat2 | 01-Feb-26 |
| Carl | Cat3 | 07-Mar-26 |
I needed a list of only each user’s most recent password. In Excel, I could sort by "Dates" in descending order, remove duplicates in the "Name" column, then sort "Names" in ascending order and would get exactly what I was looking for:
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue2 | 04-Feb-26 |
| Carl | Cat3 | 07-Mar-26 |
But when I tried this in Power Query, it didn’t keep the first name in the sorted table - it kept the first name in the original unsorted table:
| User | Password | Date |
|---|---|---|
| Alice | Apple1 | 01-Jan-26 |
| Bob | Blue1 | 01-Jan-26 |
| Carl | Cat1 | 02-Jan-26 |
I found a convoluted workaround (create a separate query to group by name and max date, and then merge this with the original query to append the password), but why would I ever suspect that the order of removing duplicates would be different in Power Query? The worst part is that it was only a single data point that was incorrect - someone else caught the error - and it took me far too long to figure out what the problem was.
TL;DR: Now I’m concerned there are other non-obvious pitfalls out there that will bite me in the rear. What else do I need to look out for?
[link] [comments]
Want to read more?
Check out the full article on the original site