Muhammad Tanveer Islam
Research Associate, E-33, Sher-E-Bangla Nagar, Agargaon, Dhaka – 1207
Email: tanveer.islam@bigm.edu.bd


Education
  • M.Sc
    Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Bangladesh
    2017 - 2020

  • B.Sc. Engg
    Computer Science and Engineering, Islamic University of Technology, Bangladesh
    2013 - 2016
Professional Experience
  • Research Associate
    Bangladesh Institute of Governance and Management, Bangladesh
    March 2023 - Present

  • Business Intelligence Developer
    Wunderman Thompson, Bangladesh
    January 2022 - March 2022

  • Assistant Manager (IT)
    Apex Footwear Limited, Bangladesh
    January 2021 - December 2021

  • Business Intelligence Developer
    Walton Hi-Tech Industries Limited, Bangladesh
    October 2019 - December 2019

  • Jr. Software Engineer
    IQVIA, Bangladesh
    January 2017 - August 2017
Areas of Interest
  • Data Mining
  • Machine Learning
  • Data Analysis
  • Data Science
  • Deep Learning
Professional Responsibilities
Skills
  • SQL
  • Python (pandas, NumPy, scikit-learn)
  • Power BI
  • Microsoft Excel
Accomplishments
Certifications
  • February 2022
    Awarded for passing one exam covering simple queries, relationships, and aggregators.

  • February 2022
    Awarded for passing one exam covering complex joins, unions, and sub-queries.

  • August 2022
    Awarded for passing one exam covering query optimization, data modeling, indexing, window functions, and pivots in SQL.

  • August 2022
    Awarded for passing one exam covering basic data structures (such as arrays and strings) and algorithms (such as sorting and searching).

  • September 2022
    Introduction to Data Modeling for Power BI is an introductory video course on data modeling, a skill required to get the best out of Power BI, Power Pivot for Excel, and Analysis Services. The training is aimed at users who have no background in data modeling for analytical systems and reporting.

  • August 2019
    Verifies that Tanveer Islam completed the Udemy course Microsoft Power BI Desktop for Business Intelligence, taught by Maven Analytics, Chris Dutton, and Aaron Parry, on 08/23/2019.
Publications
Journals
  • Enhanced power demand forecasting for Bangladesh: using feature engineering associated with environmental and economic impact
    Forecasting power demand is crucial for developing countries like Bangladesh for various reasons, including resource planning under limited resources. Limited research was found on short-term power demand forecasting for Bangladesh. In this study, a preprocessing pipeline is proposed to generate powerful features, including hourly demand, weather and economic data, for both short- and medium-term load forecasting. Our method achieved a lowest MAPE of 2.3% on the PGCB dataset in forecasting energy loads for January and February 2024. The efficacy of the features produced by the preprocessing pipeline was validated using two machine-learning models, FB-Prophet and LSTM.; February 2025

  • Multi-Layer Hybrid (MLH) balancing technique: A combined approach to remove data imbalance
    Data is one of the most important elements currently for business decisions as well as for scientific research. However, data imbalance is a critical issue that affects the outcome of business decisions or the performance of a model, as the decision would be biased towards the majority class (MaC). Existing data balancing techniques have a major drawback: they create new artificial samples randomly, which creates outliers and hampers the potential of the original dataset. In this paper, we propose a Multi-Layer Hybrid (MLH) Balancing Scheme which combines three oversampling techniques in two layers. By combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN with our data processing techniques, our scheme gives a distributed, noise-free output. It also creates new data points within the range of the original dataset, which keeps the originality of the new data points. Thus, the generated dataset is much more suitable for machine learning models to achieve results with higher accuracy for highly imbalanced data. Experimental results on datasets with an imbalance ratio of up to 59 show that our proposed scheme can effectively generate a balanced dataset. We apply the resultant dataset to Random Forest and Artificial Neural Network algorithms; comparison with existing techniques shows that our scheme gives better results.; January 2023

  • Multi-layer hybrid balancing technique to remove data imbalance
    Data is one of the essential elements nowadays for discovering business decisions, decision optimization, and scientific research and growing exponentially due to the use of different kinds of applications in various business organizations and production industries. The proper dataset offers organizations and researchers to analyze their showcasing techniques, make effective data-driven choices and make superior advertisements. In real-life scenarios, most data sources create a gap among class attribute elements which reduces to build a proper decision in the prediction. An imbalanced dataset creates a critical problem that affects the business decisions and makes a biased result towards the major class. However, existing data balancing techniques can solve the problems of data balancing. Existing data balancing techniques have a major drawback: these create new artificial samples randomly, which create outliers and hamper the potentiality of the original dataset. Our thesis work proposes a Multi-Layer Hybrid (MLH) Balancing Scheme that combines three over-sampling techniques and processes output in a proper way. This scheme gives a balanced and noise-free output by combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN. It also creates new data points within the range of the original dataset, which keeps the originality of the new data points. Thus, the generated output from three layers is proper balancing output for machine learning models. We use 34 different imbalanced datasets with different imbalance ratios, and experimental results show balanced and proper output for the proposed scheme. We apply the resultant dataset to Random Forest (RF) and Artificial Neural Network (ANN); comparing existing techniques shows that our scheme gives better results.
    We used various types of the dataset in our thesis and got a different amount of result for these datasets; so we combined the results and got the average output for different metrics. Using the RF, we achieved, 82%, 83%, 83%, 84% and 91% average Accuracy; 45%, 63%, 72%, 58% and 88% average G-Mean; 39%, 55%, 62%, 51% and 83% average F-Measure for Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and Proposed MLH, respectively. Using the ANN, we achieved, 78%, 77%, 74%, 80% and 79% average Accuracy; 30%, 71%, 73%, 69% and 77% average G-Mean; 26%, 59%, 59%, 60% and 67% average F-Measure for Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and Proposed MLH, respectively. Using our proposed approach, we got a better outcome for the imbalanced dataset than the existing approach and observed a better performance for our proposed approach using the Random Forest.; June 2021

  • Forecasting Tetouan energy demand employing shift approach in machine-learning: complementing econometric insights
    GDP growth with sustainable development for a country is highly dependent on power supply and consumption, and in the modern world, human development cannot think without electricity. It is used in every human development process in a particular country. Power consumption is a crying need for economic growth for a growing nation and economy like Morocco. However, producing electricity is costly, and it is necessary to make it practical for future use. Predicting electricity consumption for effective power management is crucial, and many existing research studies have been conducted on the power consumption demand forecast for the Tetouan City of Morocco using the traditional approach. Still, their outputs are not efficient and accurate compared with our approach. Traditional techniques use target variables directly and do not maintain past data trends. Our study solves this by proposing a consumption shift approach where past consumption and other variables form predictor variables to forecast future consumption. In our study, we use our proposed shift approach for the Quads, Boussafou, and Smir power zone data of Tetouan City for 2017 and the combination (average) of these three power zones for 2017. We used two machine-learning models for future consumption prediction: fb-prophet and neural prophet. Our analysis shows that Tetouan City’s power usage forecasts performed better than traditional forecasting. MAPE increased by 2% and R2 by 5% for 10-minute intervals and by 1.5% and R2 by 4.5% for hourly intervals. Compared with the benchmark study on the same dataset, our approach gives 23.33% and 88% better RMSE for 10-minute and hourly interval datasets. Instead of using the machine-learning model for prediction, we use an econometric model (OLS) separately in our study to identify the relationship between power demand and environmental features and observed temperature and wind speed have a positive impact. 
    In contrast, humidity has a negative impact on the power consumption of Tetouan City.

  • PC-NCA: a hybrid feature extraction technique for classification in machine-learning
    In the burgeoning field of artificial intelligence (AI), interaction with high-dimensional data is critical for classification problems due to noisy data points in feature variables and a lack of class separation. This paper introduces PC-NCA, a hybrid feature extraction method that links the statistical robustness of Principal Component Analysis (PCA) with the class-discriminative power of Neighborhood Component Analysis (NCA). By integrating these paradigms, the suggested method compensates for noise and redundancy and enhances class separability in high-dimensional, often non-linear data spaces. Empirical studies on 31 public datasets of varying Imbalance Ratio (1.05–18.1) across medicine, chemistry, finance, and computer security domains reveal statistically significant improvements in F1 score, G-mean, AUC, and MCC metrics, with performance improvement averaging 35.16% over the baseline and 10.57% over traditional reduction methods. The method’s supremacy is additionally established through Wilcoxon’s signed rank statistical test (p value < 0.001) to ensure its robustness at varying magnitudes of class imbalance and feature heterogeneity. Beyond accuracy metrics, diversity analysis highlights that PC-NCA achieves 7.4% higher Neighborhood Purity (NP) and 62% lower Tomek Link Rate (TLR) compared to contemporary approaches, indicating stronger intra-class cohesion and reduced boundary ambiguity. In terms of efficiency, PC-NCA requires more runtime than PCA, LDA, and PCA-LDA (approximately 8x, 5x, and 3x, respectively) but remains consistently faster than standalone NCA while delivering comparable or superior accuracy. By integrating non-linear transformation in a denoised space, PC-NCA is an effective and adaptable alternative to the prevailing dimensionality reduction methodologies, making it a significant addition to the machine learning arsenal.;
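
    The consumption shift approach described in the Tetouan forecasting abstract above, where past consumption forms the predictor variables, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the lag count, and the toy series are assumptions made for illustration, and a naive last-value forecast stands in for the fb-prophet and neural prophet models named in the abstract. MAPE is the error measure reported in the abstracts.

```python
from typing import List, Tuple


def make_shifted_features(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each value with the n_lags values that precede it.

    Hypothetical helper: each row of features holds past consumption,
    and the matching target is the value to be forecast.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # past consumption as predictors
        targets.append(series[t])              # consumption to forecast
    return features, targets


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy consumption series (made-up numbers, not from any dataset).
consumption = [100.0, 110.0, 120.0, 130.0, 140.0]
X, y = make_shifted_features(consumption, n_lags=2)

# Naive baseline (an assumption): forecast the most recent observed value.
naive = [row[-1] for row in X]
print(f"rows={len(X)}, MAPE of naive forecast = {mape(y, naive):.2f}%")
```

    In practice the rows of X would be passed to a forecasting model in place of the naive baseline, and the choice of n_lags would depend on the sampling interval (10-minute or hourly in the abstract).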
Books
  • Multi-layer hybrid balancing technique to remove data imbalance
    June 2021

Conference & Research Seminar
  • Performance Comparison of Three Classifiers for the Classification of Breast Cancer Dataset (4th International conference on electrical information and communication technology (EICT))
    Breast cancer is one of the most threatening issues for women's existence nowadays. It is increasing in our society due to the pursuit of modern/western cultures and carelessness in food and living habits. It has some syndromes, and based on those syndromes we can easily identify whether a patient has breast cancer or not. Support Vector Machine (SVM), Artificial Neural Network (ANN) and Naïve Bayes algorithms are very popular and powerful supervised learning algorithms to classify an unknown label/result. We select a dataset from WBCD (Wisconsin Breast Cancer Diagnosis) which contains 9 attribute columns and 1 class column. The attribute columns are the causes and the class column is the result of the attribute columns. In this paper, we trained different variants of SVM, ANN and Naïve Bayes on a particular training dataset (WBCD). Based on the highest accuracy, we chose the best model from those described in this paper and selected it for future use on the client dataset (clinical data). The best model is Linear SVM for the WBCD dataset, with an accuracy of 96.72%.; December 2019
Op-eds
Languages
  • Bangla
    Native

  • English
    Good