Muhammad Tanveer Islam
Research Associate, E-33, Sher-E-Bangla Nagar, Agargaon, Dhaka – 1207
Email: [email protected]
Official Telephone No: 01626004179

Education
  • M.Sc
    Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Bangladesh
    2017 - 2020

  • B.Sc. Engg
    Computer Science and Engineering, Islamic University of Technology, Bangladesh
    2013 - 2016
Professional Experience
  • Research Associate
    Bangladesh Institute of Governance and Management, Bangladesh
    March 2023 - Present

  • Business Intelligence Developer
    Wunderman Thompson Bangladesh
    January 2022 - March 2022

  • Assistant Manager (IT)
    Apex Footwear Limited, Bangladesh
    January 2021 - December 2021

  • Business Intelligence Developer
    Walton Hi-Tech Industries Limited, Bangladesh
    October 2019 - December 2019

  • Jr. Software Engineer
    IQVIA Bangladesh
    January 2017 - August 2017
Areas of Interest
  • Data Mining
  • Machine Learning
  • Data Analysis
  • Data Science
  • Deep Learning
Professional Responsibilities
Skills
  • SQL
  • Python (pandas, NumPy, scikit-learn)
  • Power BI
  • Microsoft Excel
Accomplishments
Certifications
  • February 2022
    Awarded for passing an exam covering simple queries, relationships, and aggregators.

  • February 2022
    Awarded for passing an exam covering complex joins, unions, and sub-queries.

  • August 2022
    Awarded for passing an exam covering query optimization, data modeling, indexing, window functions, and pivots in SQL.

  • August 2022
    Awarded for passing an exam covering basic data structures (such as arrays and strings) and algorithms (such as sorting and searching).

  • September 2022
    Introduction to Data Modeling for Power BI is an introductory video course about data modeling, a skill required to get the best out of Power BI, Power Pivot for Excel, and Analysis Services. The training is aimed at users who have no background in data modeling for analytical systems and reporting.

  • August 2019
    This certificate verifies that Tanveer Islam successfully completed the course Microsoft Power BI Desktop for Business Intelligence on 08/23/2019, as taught by Maven Analytics, Chris Dutton, and Aaron Parry on Udemy. The certificate indicates that the entire course was completed, as validated by the student. The course duration represents the total video hours of the course at the time of the most recent completion.
Publications
Journals
  • Multi-Layer Hybrid (MLH) balancing technique: A combined approach to remove data imbalance
    January 2023

    Data is currently one of the most important elements for business decisions as well as for scientific research. However, data imbalance is a critical issue that affects the outcome of business decisions or the performance of a model, because the decision becomes biased towards the majority class (MaC). Existing data balancing techniques have a major drawback: they create new artificial samples randomly, which introduces outliers and hampers the potential of the original dataset. In this paper, we propose a Multi-Layer Hybrid (MLH) Balancing Scheme which combines three oversampling techniques in two layers. By combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN with our data processing techniques, our scheme gives a distributed, noise-free output. It also creates new data points within the range of the original dataset, which keeps the new data points faithful to the original data. Thus, the generated dataset is much more suitable for machine learning models to achieve higher accuracy on highly imbalanced data. Experimental results on datasets with an imbalance ratio of up to 59 show that our proposed scheme can effectively generate a balanced dataset. We apply the resultant dataset to Random Forest and Artificial Neural Network algorithms; comparison with existing techniques shows that our scheme gives better results.
Books
  • Multi-layer hybrid balancing technique to remove data imbalance
    June 2021

    Data is one of the essential elements nowadays for reaching business decisions, decision optimization, and scientific research, and it is growing exponentially due to the use of different kinds of applications in various business organizations and production industries. A proper dataset lets organizations and researchers analyze their showcasing techniques, make effective data-driven choices, and produce better advertisements. In real-life scenarios, most data sources produce a gap among the class attribute values, which makes it harder to reach a proper decision in prediction. An imbalanced dataset creates a critical problem that affects business decisions and biases the result towards the majority class. Existing data balancing techniques can address this problem, but they have a major drawback: they create new artificial samples randomly, which introduces outliers and hampers the potential of the original dataset. Our thesis work proposes a Multi-Layer Hybrid (MLH) Balancing Scheme that combines three over-sampling techniques and processes their output appropriately. This scheme gives a balanced and noise-free output by combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN. It also creates new data points within the range of the original dataset, which keeps the new data points faithful to the original data. Thus, the output generated from the three layers is a properly balanced input for machine learning models. We use 34 different imbalanced datasets with different imbalance ratios, and experimental results show balanced and proper output for the proposed scheme. We apply the resultant dataset to Random Forest (RF) and Artificial Neural Network (ANN); comparison with existing techniques shows that our scheme gives better results.
    We used various types of datasets in our thesis and obtained different results for each, so we combined the results and report the average for each metric. Using RF, we achieved 82%, 83%, 83%, 84% and 91% average Accuracy; 45%, 63%, 72%, 58% and 88% average G-Mean; and 39%, 55%, 62%, 51% and 83% average F-Measure for the Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and the Proposed MLH, respectively. Using ANN, we achieved 78%, 77%, 74%, 80% and 79% average Accuracy; 30%, 71%, 73%, 69% and 77% average G-Mean; and 26%, 59%, 59%, 60% and 67% average F-Measure for the Original Dataset, ADASYN, SMOTEENN, SVMSMOTE and the Proposed MLH, respectively. With our proposed approach, we obtained a better outcome on imbalanced datasets than with the existing approaches, and we observed its best performance with Random Forest.
Conference & Research Seminar
  • Performance Comparison of Three Classifiers for the Classification of Breast Cancer Dataset (4th International Conference on Electrical Information and Communication Technology (EICT))
    December 2019

    Breast cancer is one of the most threatening issues for women's lives nowadays. It is increasing in our society due to the adoption of modern/western lifestyles and carelessness in food and living habits. It has some symptoms, and based on those symptoms we can identify whether a patient has breast cancer or not. Support Vector Machine (SVM), Artificial Neural Network (ANN) and Naïve Bayes are very popular and powerful supervised learning algorithms for classifying an unknown label/result. We select a dataset from WBCD (Wisconsin Breast Cancer Diagnosis) which contains 9 attribute columns and 1 class column. The attribute columns are the causes and the class column is the result of the attribute columns. In this paper, we trained different variants of SVM, ANN and Naïve Bayes on a particular training dataset (WBCD). Based on the highest accuracy, we chose the best of the models described in this paper and selected it for future use on the client dataset (clinical data). The best model for the WBCD dataset is Linear SVM, with an accuracy of 96.72%.
Languages
  • Bangla
    Native

  • English
    Good