From Microsoft Excel | Help & Support with your Formula, Macro, and VBA problems | A Reddit Community

How to either remove all duplicate rows (including the original) or isolate all unique rows

I've been doing a lot of googling and coming up empty so far, so if anyone can help at all with this, it would be much appreciated. Sorry for the wall of text; I'm trying to keep it as concise as I can without leaving important details out.

I created an example table below. The table I am working with has hundreds of rows and more columns, but this should get the point across.

I am looking for a way to either:

a) Remove/highlight every duplicate row, including the original (first) appearance of that row. In this example, rows 2 and 5 should both be deleted and everything else should stay. A row counts as a duplicate if its data matches another row in every column except column B.

b) Isolate/highlight every row that is completely unique when column B is ignored. In this example, that would be rows 1, 3, 4, and 6. Rows 2 and 5 are treated as duplicates of each other because every column matches exactly, ignoring column B.
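Both options select the same rows: those whose values in every column other than B occur exactly once in the combined table. Here is a minimal Python sketch of that count-then-filter idea. It illustrates the logic and does not reproduce any Excel feature. The rows are hypothetical. The flattened example table doesn't show which cells are blank, so the blank positions are a guess, chosen so that rows 3 and 6 differ only in where their blanks fall.

```python
from collections import Counter

# Hypothetical rows mirroring the example: columns A..H, blank cells as "".
# Column B (index 1) records which sheet the row came from.
# The blank positions are a guess; the flattened table doesn't preserve them.
ROWS = [
    ["Alpha", "BBB", "1", "5", "blue", "", "red", ""],
    ["Alpha", "BBB", "5", "10", "green", "", "white", ""],
    ["Alpha", "BBB", "10", "20", "", "black", "yellow", ""],
    ["Alpha", "YYY", "1", "5", "blue", "", "green", ""],
    ["Alpha", "YYY", "5", "10", "green", "", "white", ""],
    ["Alpha", "YYY", "10", "20", "black", "", "yellow", ""],
]

IGNORED_COL = 1  # column B


def row_key(row):
    """Every column except the ignored one, as a tuple (blanks keep their position)."""
    return tuple(v for i, v in enumerate(row) if i != IGNORED_COL)


def unique_rows(rows):
    """Return the 1-based numbers of rows whose key appears exactly once.

    Option (a) deletes every row with count > 1 and option (b) keeps every
    row with count == 1; both leave the same rows.
    """
    counts = Counter(row_key(r) for r in rows)
    return [n for n, r in enumerate(rows, start=1) if counts[row_key(r)] == 1]


print(unique_rows(ROWS))  # [1, 3, 4, 6]
```

Counting every occurrence first is what makes this differ from a keep-first dedupe. Each row is judged against the whole table, so the original copy is treated the same way as the later ones.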

In other words, rows 2 and 5 are the only "correct" rows in the table: these rows "pass" and every other row "fails". For every BBB row, there is supposed to be an exact YYY copy. If a BBB row has no equivalent YYY row, or vice versa, I want some way to identify/isolate it.

A lot of Google results pointed towards making a helper column that concatenates the data from every column in a row into one string, and then using that helper column to determine uniqueness. The problem in my scenario is that rows 3 and 6 would produce the same concatenated string (I assume because of the blank cells), yet they are not the same row and must be treated as distinct. I also saw people using COUNTIF in conditional formatting, but those formulas got long and complicated, and honestly I had a hard time following them, especially with how many columns my sheet has. I'm hoping there is a simpler way. I'm not very experienced with Excel, but I can't imagine this is that niche a use case.
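Rows 3 and 6 can indeed collide. Plain concatenation throws away where each value sat, so two rows with blanks in different columns produce the same string. The same happens with values like 1 and 15 versus 11 and 5. Putting a separator between every cell, including empty ones, keeps the positions. In Excel, one way to build such a helper column is TEXTJOIN with ignore_empty set to FALSE, for example `=TEXTJOIN("|",FALSE,A2,C2:H2)`. TEXTJOIN requires Excel 2019 or Microsoft 365. The Python sketch below mimics both approaches on two hypothetical rows with column B already dropped. It assumes the "|" separator never appears in the data.

```python
# Two hypothetical rows (columns A, C..H; column B already dropped) that
# differ only in where one blank cell sits.
ROW_3 = ["Alpha", "10", "20", "", "black", "yellow", ""]
ROW_6 = ["Alpha", "10", "20", "black", "", "yellow", ""]


def plain_concat(cells):
    """Like chaining cells with &: blanks vanish, so column positions are lost."""
    return "".join(cells)


def delimited_key(cells, sep="|"):
    """Join with a separator and keep empty cells, so each blank still occupies a slot.

    Assumes `sep` never occurs inside the data itself.
    """
    return sep.join(cells)


print(plain_concat(ROW_3) == plain_concat(ROW_6))    # True: collision
print(delimited_key(ROW_3) == delimited_key(ROW_6))  # False: distinct
print(delimited_key(ROW_3))                          # Alpha|10|20||black|yellow|
```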

For more context: initially I had two separate sheets, one with all of the BBB rows and one with all of the YYY rows. Every row in the BBB sheet is supposed to have a matching row in the YYY sheet, but it turns out there are some discrepancies between the two. Now I am trying to isolate the rows that are in one sheet but not the other. In the BBB sheet, I would want to take each row, check whether any row in the YYY sheet matches it in every single column, and highlight or otherwise mark it accordingly.

My first attempt was to create a new sheet and paste the data from both sheets into it, adding column B to record which sheet each row came from. Then I used the Remove Duplicates feature with column B unchecked to remove anything considered a duplicate. But Excel keeps the first occurrence and only removes the copies after it. That doesn't help, because I'm left with a sheet where I can't tell which rows had duplicates and which didn't.
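The per-row check described here takes each BBB row, looks for an identical row in the YYY sheet, and then does the reverse. It can be sketched in Python as a set lookup on full-row tuples. The two small sheets are hypothetical. Comparing tuples checks every column position, so blanks in different places don't collide, and nothing is deleted: every row gets a label instead.

```python
# Hypothetical contents of the two original sheets, column B omitted
# (it only recorded the sheet name). Blank cells are "".
BBB = [
    ("Alpha", "1", "5", "blue", "", "red", ""),
    ("Alpha", "5", "10", "green", "", "white", ""),
]
YYY = [
    ("Alpha", "1", "5", "blue", "", "green", ""),
    ("Alpha", "5", "10", "green", "", "white", ""),
]


def mark_unmatched(sheet, other):
    """For each row in `sheet`, report whether an identical row exists in `other`.

    Unlike Remove Duplicates, which keeps the first copy, no row is removed:
    each one is labelled, so rows missing from the other sheet stand out.
    """
    other_rows = set(other)  # tuples compare column by column
    return ["match" if row in other_rows else "NO MATCH" for row in sheet]


print(mark_unmatched(BBB, YYY))  # ['NO MATCH', 'match']
print(mark_unmatched(YYY, BBB))  # ['NO MATCH', 'match']
```

A set lookup ignores multiplicity. If the same row appears twice in BBB but only once in YYY, both BBB copies are still marked "match". If that can happen, comparing per-sheet counts (for example, subtracting one `collections.Counter` from another) would flag the extra copy instead.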

Hopefully this made sense. Thank you in advance to anyone who took the time to read this.

Example table:

A B C D E F G H
Alpha BBB 1 5 blue red
Alpha BBB 5 10 green white
Alpha BBB 10 20 black yellow
Alpha YYY 1 5 blue green
Alpha YYY 5 10 green white
Alpha YYY 10 20 black yellow
submitted by /u/ttappy