    HOW TO KNOW WHICH NUMBER OF COMPONENTS IS GOOD

    In the post above I explained PCA, but I did not explain how to choose the number of components. If we choose the wrong number of components, too much of the original variance is lost, and the model may be trained poorly.

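    Let's say we have an array of data and we don't know how many components will be useful. In that case we can use the explained variance ratio (explained_variance_ratio_), which tells us what fraction of the total variance each principal component captures.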

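    import numpy as np
    from sklearn.decomposition import PCA

    # Fit PCA with all components kept, so we can inspect each one's share of the variance
    pca = PCA()
    new_feature = pca.fit_transform(array)
    np.var(new_feature)  # overall variance of the transformed data (not used below)
    pca.explained_variance_ratio_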

    Return

    array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
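To make it clearer where these ratios come from, here is a minimal sketch that computes them by hand with NumPy. It assumes a synthetic dataset `X` standing in for `array` (the original data is not shown in the post), so the printed numbers will differ from the ones above. Each ratio is one eigenvalue of the covariance matrix divided by the sum of all eigenvalues.

```python
import numpy as np

# Synthetic stand-in for the post's `array` (an assumption, not the original data):
# 200 samples, 4 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Centre the data, then take the eigenvalues of its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigvalsh returns eigenvalues in ascending order; reverse to get the largest first.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# Each ratio is one component's variance divided by the total variance.
explained_variance_ratio = eigenvalues / eigenvalues.sum()
print(explained_variance_ratio)
```

The ratios are sorted from largest to smallest and always sum to 1, which is why the cumulative sum used later ends at 1.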

    Next, we take the cumulative sum of these ratios, which shows how much of the variance the first k components retain together.

    # Running total of the variance explained by the first 1, 2, 3, ... components
    cumsum = np.cumsum(pca.explained_variance_ratio_)
    cumsum

    Return

    array([0.92461872, 0.97768521, 0.99478782, 1. ])

    Then we plot the cumulative sum together with a horizontal line at the 97% threshold:

    import matplotlib.pyplot as plt

    plt.plot(cumsum)
    plt.axhline(y=0.97, c='r', linestyle='-')  # 97% variance threshold
    plt.grid(True)
    plt.show()

    [Figure: cumulative explained variance against component index, with a red horizontal line at 0.97]

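    Rather than reading the value off the plot by eye, we can compute the smallest number of components that preserves more than 97% of the original variance. Note that the x-axis of the plot is a zero-based index, so x = 1 corresponds to 2 components. This is also why we add 1 to the result of np.argmax.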

    # argmax returns the index of the first True value; add 1 to turn the index into a count
    d = np.argmax(cumsum > 0.97) + 1
    print(d)

    Return

    2
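The same selection step can be wrapped in a small helper. This is only a sketch: the function name `components_for_variance` and the fallback of keeping every component when the threshold is never exceeded are my own assumptions, not part of scikit-learn. The fallback matters because np.argmax returns 0 on an all-False array, which would otherwise wrongly suggest a single component.

```python
import numpy as np

def components_for_variance(ratios, threshold=0.97):
    """Smallest number of leading components whose cumulative ratio exceeds `threshold`.

    Hypothetical helper for illustration; falls back to all components
    if the threshold is never exceeded.
    """
    cumsum = np.cumsum(ratios)
    reached = cumsum > threshold
    if not reached.any():
        # np.argmax would return 0 here, so handle this case explicitly.
        return len(cumsum)
    # Index of the first True value, converted from a zero-based index to a count.
    return int(np.argmax(reached)) + 1

# The explained variance ratios reported earlier in this post.
ratios = np.array([0.92461872, 0.05306648, 0.01710261, 0.00521218])

print(components_for_variance(ratios, 0.97))  # 2
print(components_for_variance(ratios, 0.90))  # 1
```

With the ratios from this post, a 97% threshold gives 2 components and a 90% threshold gives 1, since the first component alone already explains about 92% of the variance.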

    So that is how we arrive at n_components = 2, which keeps more than 97% of the original variance while making the model more efficient.
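As a shortcut, scikit-learn's PCA also accepts a float between 0 and 1 for n_components and then picks the number of components needed to exceed that fraction of the variance. The sketch below assumes a synthetic dataset `X` in place of the post's `array`, so the number of components it selects is not expected to match the result above. The solver is set to "full" explicitly, since the float form of n_components requires a full decomposition.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the post's `array` (an assumption, not the original data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# A float in (0, 1) asks PCA to keep just enough components to exceed that variance fraction.
pca = PCA(n_components=0.97, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                    # number of components PCA decided to keep
print(pca.explained_variance_ratio_.sum())  # variance fraction retained by those components
```

This does the cumulative-sum step internally, so it is convenient once you already know the threshold you want. The manual approach is still useful when you want to look at the curve before deciding.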