Advanced Data Analytics and Predictive Modeling Projects

This collection of projects demonstrates advanced techniques in data analytics and predictive modeling, tackling complex problems in classification, clustering, and predictive analytics.

Repo: github.com

  • Tech Stack: Python, scikit-learn, pandas, NumPy, Matplotlib; boosting and other machine learning algorithms

Key Features

  • Dimensionality reduction using PCA, t-SNE, and Truncated SVD
  • Handling imbalanced data with SMOTE
  • Implementation of various classifiers and clustering algorithms
  • Performance evaluation using metrics such as F1-score and precision
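
As a minimal sketch of the evaluation metrics named above, precision and F1-score can be computed directly from prediction counts. The function names and the toy labels below are hypothetical and are not taken from the projects.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = active compound, 0 = inactive compound.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

On imbalanced data, F1 and precision are usually more informative than accuracy, because a model that always predicts the majority class can still score high accuracy.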

Projects

Project 1: Predictive Modeling for Drug Activity

Developed a binary classification model to distinguish active compounds from inactive ones, achieving 77.6% accuracy and an F1-score of 0.716 on a dataset of 18,000 compounds.

Project 2: Ensemble Classifier for Drug Activity Prediction

Improved prediction accuracy with a Decision Tree classifier combined with SMOTE and AdaBoost, achieving a precision of 0.83.
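
The idea behind SMOTE can be sketched as follows: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a simplified NumPy illustration with a hypothetical function name and toy data, not the project's code. In practice the `SMOTE` class from the imbalanced-learn package would normally be used instead.

```python
import numpy as np


def smote_like_oversample(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    n = X_minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # sample to start from
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to use
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[partner] - X_minority[base])


# Hypothetical toy data: 200 "inactive" rows and 20 "active" rows.
rng = np.random.default_rng(42)
X_inactive = rng.normal(0.0, 1.0, size=(200, 4))
X_active = rng.normal(2.0, 1.0, size=(20, 4))

# Generate enough synthetic "active" rows to match the majority class.
X_synth = smote_like_oversample(X_active, n_new=len(X_inactive) - len(X_active))
X_active_balanced = np.vstack([X_active, X_synth])
print(X_active_balanced.shape)  # (200, 4)
```

Oversampling is typically applied to the training split only, so that the evaluation data keeps the original class balance.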

Project 3: Predictive Modeling for Drug Activity Using Ensemble Models and Boosting

Improved the accuracy of drug-compound activity prediction using ensemble models and boosting techniques, making it easier to identify potential drug candidates.
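
A minimal sketch of boosting with scikit-learn is shown below, assuming shallow decision trees as weak learners. The synthetic dataset, the tree depth, and the number of estimators are illustrative assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a compound-activity dataset (assumption).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Shallow trees as weak learners. The tree is passed positionally so the
# call works across scikit-learn versions that renamed this argument.
model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2), n_estimators=100, random_state=0
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
print("precision:", precision)
print("F1:", f1)
```

Each boosting round reweights the training samples so that the next tree concentrates on the compounds the previous trees misclassified.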

Project 4: K-Means Clustering on High-Dimensional Data

Applied K-Means clustering to a high-dimensional dataset of 10,000 images, choosing the number of clusters with the Elbow method and visualizing the results.

Implemented a custom K-Means algorithm, reduced the dataset's dimensionality with t-SNE, and identified the optimal number of clusters using the Elbow method.
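
A custom K-Means with an Elbow-method scan might look like the sketch below. It is a plain NumPy version of Lloyd's algorithm run on hypothetical 2-D toy data. The project's own implementation, its initialization scheme, and the t-SNE step are not shown here.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and inertia (sum of squared distances to centroids).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia


# Hypothetical toy data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = ([0, 0], [5, 5], [0, 5])
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in blob_centers])

# Elbow method: compute inertia for each k and look for the value of k
# after which the decrease flattens out.
inertias = {k: kmeans(X, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(float(value), 1))
```

Plotting `inertias` against k (for example with Matplotlib) gives the elbow curve. Because the result depends on the random initialization, running several seeds per k and keeping the lowest inertia tends to give a smoother curve.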