Advanced Data Analytics and Predictive Modeling Projects
This collection of projects demonstrates advanced techniques in data analytics and predictive modeling, tackling complex problems in classification, clustering, and predictive analytics.
Repo: github.com
- Tech Stack: Python, scikit-learn, pandas, NumPy, Matplotlib; boosting and other ML algorithms
Key Features
- Dimensionality reduction using PCA, t-SNE, and Truncated SVD
- Handling imbalanced data with SMOTE
- Implementation of various classifiers and clustering algorithms
- Performance evaluation using metrics like F1-score and precision
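As a rough illustration of how two of the listed reducers differ, here is a minimal dense NumPy sketch. The helper names `pca` and `truncated_svd` are hypothetical, and the data is random placeholder data. PCA centers the features before the SVD, while Truncated SVD factorizes the raw matrix, which is what lets library implementations work on sparse inputs without densifying them. t-SNE is a nonlinear method usually taken from a library (for example `sklearn.manifold.TSNE`) rather than written by hand, so it is not sketched here.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (features are centered first)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def truncated_svd(X, n_components):
    """Project X onto its top right singular vectors WITHOUT centering.
    Skipping centering is what lets Truncated SVD implementations run on
    sparse data (this dense sketch does not exploit sparsity)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Random placeholder data with a large mean offset (not the project's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) + 5.0
Z_pca = pca(X, 2)
Z_svd = truncated_svd(X, 2)
print(Z_pca.shape, Z_svd.shape)  # → (200, 2) (200, 2)
print(np.allclose(Z_pca.mean(axis=0), 0))  # → True: PCA scores are centered
```

In scikit-learn the corresponding classes are `sklearn.decomposition.PCA` and `sklearn.decomposition.TruncatedSVD`. Because the uncentered data has a large mean, the first Truncated SVD component mostly captures that offset.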
Projects
Project 1: Predictive Modeling for Drug Activity
Developed a binary classification model to distinguish active from inactive compounds, achieving 77.6% accuracy and an F1-score of 0.716 on a dataset of 18,000 compounds.
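On imbalanced data such as active versus inactive compounds, accuracy and F1 can diverge sharply, which is why reporting both is informative. The following minimal sketch computes them from confusion-matrix counts on a hypothetical toy label set (not the project's data). `binary_metrics` is an illustrative helper; scikit-learn's `accuracy_score`, `precision_score` and `f1_score` do the same job in practice.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (here 1 = active compound)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy labels: 2 active compounds out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_inactive = [0] * 10
print(binary_metrics(y_true, always_inactive))
# → {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
some_hits = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, some_hits))
# → {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Both toy predictors reach 80% accuracy, yet only the second finds any active compound, which F1 reflects and accuracy does not.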
Project 2: Ensemble Classifier for Drug Activity Prediction
Improved prediction performance with a Decision Tree classifier, combining SMOTE oversampling of the minority class with AdaBoost, and achieved a precision of 0.83.
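SMOTE balances classes by synthesizing new minority samples rather than duplicating existing ones. The sketch below is a minimal NumPy version of that idea (hypothetical `smote` helper, synthetic placeholder data), not the project's code. In practice the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`, used via `fit_resample`) is the usual choice. Oversampling should be applied to the training split only, so that synthetic points do not leak into evaluation.

```python
import numpy as np

def smote(X_minority, n_synthetic, k=5, seed=None):
    """Minimal SMOTE sketch: each synthetic sample lies on the segment between
    a random minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only.
    dist = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_synthetic)
    nbr = neighbors[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[nbr] - X_minority[base])

# Hypothetical imbalanced data: 90 majority vs 10 minority samples.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(90, 4))
X_minor = rng.normal(3.0, 1.0, size=(10, 4))
synthetic = smote(X_minor, n_synthetic=80, k=5, seed=0)
X_bal = np.vstack([X_major, X_minor, synthetic])
y_bal = np.array([0] * 90 + [1] * (10 + 80))
print(np.bincount(y_bal))  # → [90 90]
```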
Project 3: Predictive Modeling for Drug Activity Using Ensemble Models and Boosting
Used ensemble models and boosting techniques to improve the accuracy of drug-activity prediction and, in turn, the identification of potential drug candidates.
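Boosting builds an ensemble sequentially. It reweights the training samples so that each new weak learner concentrates on the examples its predecessors got wrong. As a hedged sketch of that mechanism, the code below implements discrete AdaBoost with one-level decision trees (stumps) in NumPy. The function names are hypothetical, labels are assumed to be in {-1, +1}, and the project itself presumably relied on scikit-learn's `AdaBoostClassifier` rather than a hand-written version.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Predict +polarity where X[:, feature] > threshold, else -polarity."""
    return np.where(X[:, feature] > threshold, polarity, -polarity)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost with labels y in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w /= w.sum()
        model.append((alpha, feature, threshold, polarity))
    return model

def adaboost_predict(model, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, f, t, p) for alpha, f, t, p in model)
    return np.where(score >= 0, 1, -1)

# Hypothetical toy data: the label depends on a threshold on the first feature.
rng = np.random.default_rng(0)
X = rng.random((50, 2))
y = np.where(X[:, 0] > 0.5, 1, -1)
model = adaboost_fit(X, y, n_rounds=10)
print((adaboost_predict(model, X) == y).mean())  # → 1.0
```

Each stump's vote is weighted by `alpha`, so more accurate stumps carry more weight in the final sign.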
Project 4: K-Means Clustering on High-Dimensional Data
Implemented a custom K-Means algorithm and applied it to 10,000 high-dimensional images, reducing their dimensionality with t-SNE, selecting the number of clusters with the Elbow method, and visualizing the resulting clusters.
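Below is a minimal NumPy sketch of a custom K-Means (Lloyd's algorithm) together with the Elbow method. It assumes k-means++ seeding and a best-of-`n_init` restart, both of which are assumptions here rather than details from the project. The helper names are hypothetical, and the example uses three synthetic 2-D blobs instead of t-SNE-reduced image features.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest existing center."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster sum of squares
    return centroids, labels, inertia

def elbow_curve(X, k_values, n_init=5):
    """Best-of-n_init inertia for each k, for plotting against k."""
    return [min(kmeans(X, k, seed=s)[2] for s in range(n_init)) for k in k_values]

# Synthetic stand-in data: three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
k_values = list(range(1, 7))
inertias = elbow_curve(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(k, round(float(inertia), 1))
```

Plotting `inertias` against `k` (for example with Matplotlib) should show a steep drop up to k = 3 and a flattening afterwards; that bend is the "elbow".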