Kakakuona Forum

HOW TO KNOW WHICH NUMBER OF COMPONENTS IS GOOD

In the previous post I explained PCA, but I did not explain exactly how to choose the number of components. If we choose the wrong number, the reduced data can lose much more of the original variance than we intended, and that can lead us to train a poor model.

Let's say we have our data array and we don't know how many components will be useful. In that case we use the explained variance ratio, which tells us what fraction of the total variance each component captures:
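
For reference, here is the standard definition written as a formula (a sketch, not taken from the post): with p original features and λ_k the variance along the k-th principal component (the k-th largest eigenvalue of the data's covariance matrix), the ratio reported for component k is its share of the total, and all the ratios together add up to 1.

```latex
\text{ratio}_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
\qquad
\sum_{k=1}^{p} \text{ratio}_k = 1
```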

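```python
import numpy as np
from sklearn.decomposition import PCA

# `array` is your data, shape (n_samples, n_features)
pca = PCA()                          # no n_components: keep every component
new_feature = pca.fit_transform(array)
np.var(new_feature, axis=0)          # variance of each new component
pca.explained_variance_ratio_        # share of the total variance per component
```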

Output:

array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
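To see where these ratios come from, here is a small self-contained check. It uses random stand-in data (an assumption, since the post's own `array` is not shown). Each ratio should equal the variance of one column of `new_feature` divided by the total, which is the same as the matching covariance eigenvalue divided by the sum of all eigenvalues. `ddof=1` is used so the variances are computed the same way scikit-learn does.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for "our array": 200 samples, 4 features with
# deliberately different scales so the components differ in importance.
rng = np.random.default_rng(0)
array = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA()                            # keep all components
new_feature = pca.fit_transform(array)

# Variance of each projected column (ddof=1 to match scikit-learn).
per_component_var = np.var(new_feature, axis=0, ddof=1)

# Eigenvalues of the covariance matrix, sorted largest first
# (eigvalsh returns them in ascending order).
eigvals = np.sort(np.linalg.eigvalsh(np.cov(array, rowvar=False)))[::-1]

ratio_from_var = per_component_var / per_component_var.sum()
ratio_from_eig = eigvals / eigvals.sum()

print(pca.explained_variance_ratio_)
print(ratio_from_var)
print(ratio_from_eig)
```

All three printed arrays should agree up to floating-point rounding, and they are sorted from largest to smallest.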

After that, we take the cumulative sum, which shows how much of the variance the first k components keep together and is easy to plot:

```python
cumsum = np.cumsum(pca.explained_variance_ratio_)  # running total of the ratios
cumsum
```

Output:

array([0.92461872, 0.97768521, 0.99478782, 1. ])
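
Writing r(k) for the cumulative explained variance of the first k components, these values are just running totals of the ratios printed earlier (rounded to four decimals here). The first two components alone already cover about 97.8% of the variance.

```latex
\begin{aligned}
r(1) &= 0.9246 \\
r(2) &= 0.9246 + 0.0531 = 0.9777 \\
r(3) &= 0.9777 + 0.0171 = 0.9948 \\
r(4) &= 0.9948 + 0.0052 = 1.0000
\end{aligned}
```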

Then we plot the cumulative sum, with a red line at the 97% target:

```python
import matplotlib.pyplot as plt

plt.plot(cumsum)                            # cumulative explained variance
plt.axhline(y=0.97, c='r', linestyle='-')   # 97% target
plt.grid(True)
plt.show()
```

[Figure: cumulative explained variance ratio, with the 97% target drawn as a red horizontal line]
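One small caveat with this plot: `plt.plot(cumsum)` places the first component at x = 0. A sketch that labels the x-axis with the actual number of components (1, 2, ...) might look like this. The function name `plot_cumulative_variance` is hypothetical, and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def plot_cumulative_variance(ratios, threshold=0.97):
    """Plot cumulative explained variance against the number of components."""
    cumsum = np.cumsum(ratios)
    n_components = np.arange(1, len(cumsum) + 1)  # 1-based x-axis

    fig, ax = plt.subplots()
    ax.plot(n_components, cumsum, marker="o")
    ax.axhline(y=threshold, c="r", linestyle="-")  # target variance
    ax.set_xticks(n_components)
    ax.set_xlabel("Number of components")
    ax.set_ylabel("Cumulative explained variance")
    ax.grid(True)
    return fig, ax


# Ratios copied from the output shown earlier in this post.
ratios = np.array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
fig, ax = plot_cumulative_variance(ratios)
fig.savefig("cumulative_variance.png")
```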

From the plot we can read off the smallest number of components that preserves more than 97% of the original variance. We can also compute it directly:

```python
d = np.argmax(cumsum > 0.97) + 1  # index of first value above 97%, +1 to get a count
print(d)
```

Output:

2
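Note that `np.argmax` returns 0 when no element is `True`. So if no cumulative value exceeds the threshold (for example a threshold of 1.0 with the strict `>`), the `np.argmax` line above silently reports 1 component. A slightly more defensive sketch, using a hypothetical helper named `choose_n_components`:

```python
import numpy as np


def choose_n_components(ratios, threshold=0.97):
    """Smallest k whose first k components explain more than `threshold`
    of the variance; falls back to all components if none does."""
    cumsum = np.cumsum(ratios)
    above = cumsum > threshold
    if not above.any():
        # np.argmax would return 0 here, i.e. a misleading answer of 1.
        return len(cumsum)
    return int(np.argmax(above)) + 1


ratios = np.array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
print(choose_n_components(ratios, 0.97))  # 2
print(choose_n_components(ratios, 0.99))  # 3
print(choose_n_components(ratios, 1.0))   # 4 (threshold never exceeded)
```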

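So that's how we choose n_components = 2: the first two components keep about 97.8% of the original variance while halving the number of features from 4 to 2, which keeps the model efficient at the same time.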
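scikit-learn can also make this choice for you: passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance is greater than that fraction. The sketch below uses random stand-in data (an assumption, since the post's `array` is not shown) and checks the result against the manual `cumsum` approach. `svd_solver="full"` is set explicitly because the documentation describes fractional `n_components` in terms of the full solver.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 200 samples, 4 features on different scales.
rng = np.random.default_rng(42)
array = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Manual approach from the post: fit all components, then threshold the cumsum.
cumsum = np.cumsum(PCA(svd_solver="full").fit(array).explained_variance_ratio_)
d = int(np.argmax(cumsum > 0.97)) + 1

# Let scikit-learn pick the number of components for a 97% target.
pca_97 = PCA(n_components=0.97, svd_solver="full")
reduced = pca_97.fit_transform(array)

print(d, pca_97.n_components_, reduced.shape)
```

The fitted `n_components_` attribute tells you how many components were kept, so you can still report the number explicitly.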


Performing Principal Component Analysis: Why and How?