Get all unique values in column of pipe-separated values
Let's say I have a column with three rows:
A|D
B|A
C
My goal is to be able to pull all unique values in this list. So in this case, that would be A, B, C, and D. The straightforward way to do this might be to TEXTJOIN all rows into a single string, and then break it into rows with TEXTSPLIT:
=UNIQUE(TOCOL(TEXTSPLIT(TEXTJOIN("|",TRUE,A1:A3), "|")))
This works well in small datasets. However, TEXTJOIN has a 32k character limit so it is not scalable to larger datasets, where the words in the list are long. For context, my dataset isn’t particularly large (2000 rows only), and the number of unique values in the columns I’m working with is fewer than 20. It’s just that the pipe-separated list per row can get very long. For example:
A|B|C
B|C|D|E
(where the letters represent long words, of course). So when you TEXTJOIN hundreds of rows like these, you can easily reach the 32K limit, even if there's just a bunch of duplicated words in that string.
Here are the alternatives I tried so far:
=UNIQUE(TOCOL(TEXTSPLIT(A1:A3, "|")))- It only returned A, B, and C (ignoring the D). It seems TEXTSPLIT only breaks rows into separate columns when I apply it to a single cell (likeTEXTSPLIT(A1, "|")), but when I try to use TEXTSPLIT on a range, it splits each cell and only keeps the first column.=UNIQUE(TOCOL(BYROW(TOCOL(A1:A3), LAMBDA(r, TEXTSPLIT(r,"|")))))- I thought this would work since it would do the TEXTSPLIT row by row. But it ended up giving me the same exact list as in #1.
I'm at a loss now. I'm not very skilled with Excel, so I don't know if there are functions available that I'm just not familiar with. Would appreciate any help on this, thank you!
[link] [comments]
Want to read more?
Check out the full article on the original site