Question about PLS-DA hyperparameter tuning [R]

Hi all! I am a bioinformatician and I am working on learning some ML tools for some disease/biomarker stuff. I am working with sparse PLS-DA at the moment. Before actually tuning the model, I run on overall global model (without sparsity) to get an idea of what my data looks like and to get to a starting point. Here is what that global model ends up looking like:

global model

So from this, I'm seeing that I should include 2 latent components in my model tuning and I chose to use the centroids.dist. So I tune the model with two components, it gives me the # of features to keep on each component and then I run the final model. However, when I do performance assessment on the final model, it looks like this:

final model (sparse)

I guess I am a little confused. From what I am reading online, and from my own data, error rates should go down with added components. It also doesn't make a ton of sense to me because I should have only picked the features that best distinguish two conditions, so again, I should be seeing error rates decrease.

Can someone please help me understand what I'm seeing here and what could be causing this? I am still learning how all of this works, so amy sort of guidance is appreciated. Thank you!

submitted by /u/dacherrr
[link] [comments]

Question about PLS-DA hyperparameter tuning [R]

Want to read more?

Tagged with