- Enhanced power demand forecasting for Bangladesh: using feature engineering associated with environmental and economic impact
Forecasting power demand is crucial for developing countries like Bangladesh, where limited resources make careful planning essential. Little prior research addresses short-term power demand forecasting for Bangladesh. In this study, a preprocessing pipeline is proposed that combines hourly demand, weather, and economic data into powerful features for both short- and medium-term load forecasting. Our method achieved a MAPE as low as 2.3% on the PGCB dataset when forecasting energy loads for January and February 2024. The efficacy of the features produced by the preprocessing pipeline was validated using two machine-learning models, FB-Prophet and LSTM.; February 2025
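A minimal sketch of the kind of preprocessing described above, using a synthetic hourly series in place of the PGCB data; all column names, lag choices, and values here are hypothetical illustrations, not the paper's actual pipeline:

```python
import numpy as np
import pandas as pd

# Synthetic hourly demand and weather series standing in for the real data.
rng = pd.date_range("2023-01-01", periods=24 * 60, freq="h")
df = pd.DataFrame({
    "demand_mw": 9000 + 1500 * np.sin(2 * np.pi * rng.hour / 24)
                 + np.random.default_rng(0).normal(0, 100, len(rng)),
    "temp_c": 25 + 8 * np.sin(2 * np.pi * rng.hour / 24),
}, index=rng)

def make_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Derive calendar, lag, and rolling features for load forecasting."""
    out = frame.copy()
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    out["lag_24h"] = out["demand_mw"].shift(24)     # same hour yesterday
    out["lag_168h"] = out["demand_mw"].shift(168)   # same hour last week
    out["roll_mean_24h"] = out["demand_mw"].shift(1).rolling(24).mean()
    return out.dropna()  # drop warm-up rows where lags are undefined

features = make_features(df)
```

Calendar, lag, and rolling-mean columns of this kind are the standard way to expose daily and weekly load seasonality to models such as LSTM.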
- Multi-Layer Hybrid (MLH) balancing technique: A combined approach to remove data imbalance
Data is one of the most important inputs today for business decisions as well as for scientific research. However, data imbalance is a critical issue that biases business decisions and model predictions towards the majority class (MaC). Existing data balancing techniques have a major drawback: they create new artificial samples randomly, which introduces outliers and degrades the value of the original dataset. In this paper, we propose a Multi-Layer Hybrid (MLH) Balancing Scheme that combines three oversampling techniques in two layers. By combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN with our data processing techniques, our scheme produces a well-distributed, noise-free output. It also creates new data points within the range of the original dataset, preserving the character of the original data. The generated dataset is therefore well suited for machine-learning models to achieve higher accuracy on highly imbalanced data. Experimental results on datasets with an imbalance ratio of up to 59 show that our proposed scheme can effectively generate a balanced dataset. We apply the resulting dataset to Random Forest and Artificial Neural Network algorithms; comparison with existing techniques shows that our scheme gives better results.; January 2023
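The key property claimed above — synthetic points that stay within the range of the original data — comes from interpolation-based oversampling, the principle underlying ADASYN and the SMOTE variants. The following is a hand-rolled sketch of that principle only, not the actual two-layer MLH scheme (real implementations would use `imblearn`'s ADASYN, SVMSMOTE, and SMOTEENN classes):

```python
import numpy as np

def interpolate_oversample(X_min: np.ndarray, n_new: int, k: int = 5,
                           seed: int = 0) -> np.ndarray:
    """SMOTE-style oversampling sketch: each synthetic point is a convex
    combination of a minority sample and one of its nearest minority
    neighbours, so it stays within the range of the original data."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise distances among minority samples; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = neighbours[i, rng.integers(min(k, n - 1))]
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy minority class: four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = interpolate_oversample(X_min, n_new=10)
```

Because every synthetic point is a convex combination of two originals, none can fall outside the bounding region of the minority class, which is what keeps the oversampled data free of out-of-range outliers.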
- Multi-layer hybrid balancing technique to remove data imbalance
Data is one of the essential elements nowadays for business decision discovery, decision optimization, and scientific research, and it is growing exponentially due to the use of different kinds of applications in various business organizations and production industries. A proper dataset enables organizations and researchers to analyze their showcasing techniques, make effective data-driven choices, and produce superior advertisements. In real-life scenarios, most data sources create a gap between class attribute elements, which degrades the quality of predictions. An imbalanced dataset creates a critical problem that affects business decisions and biases results towards the majority class. Existing data balancing techniques address this problem, but they have a major drawback: they create new artificial samples randomly, which introduces outliers and degrades the value of the original dataset. Our thesis proposes a Multi-Layer Hybrid (MLH) Balancing Scheme that combines three over-sampling techniques and processes the output in a proper way. This scheme gives a balanced, noise-free output by combining the characteristics of ADASYN, SVM-SMOTE, and SMOTE+ENN. It also creates new data points within the range of the original dataset, preserving the character of the original data. Thus, the output generated from the three layers is a properly balanced input for machine-learning models. We use 34 imbalanced datasets with different imbalance ratios, and experimental results show balanced and proper output for the proposed scheme. We apply the resulting dataset to Random Forest (RF) and Artificial Neural Network (ANN); comparison with existing techniques shows that our scheme gives better results.
We used various datasets in our thesis and obtained different results for each, so we combined the results and report the average for each metric. Using RF, we achieved 82%, 83%, 83%, 84%, and 91% average Accuracy; 45%, 63%, 72%, 58%, and 88% average G-Mean; and 39%, 55%, 62%, 51%, and 83% average F-Measure for the Original Dataset, ADASYN, SMOTEENN, SVMSMOTE, and the Proposed MLH, respectively. Using ANN, we achieved 78%, 77%, 74%, 80%, and 79% average Accuracy; 30%, 71%, 73%, 69%, and 77% average G-Mean; and 26%, 59%, 59%, 60%, and 67% average F-Measure for the same methods, respectively. Overall, our proposed approach outperformed the existing approaches on imbalanced datasets, with the best performance observed using Random Forest.; June 2021
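The G-Mean and F-Measure reported above are the standard imbalance-aware metrics; for reference, this is how both are computed from a binary confusion matrix (the counts here are arbitrary example values, not the thesis results):

```python
def gmean_fmeasure(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Compute G-Mean and F-Measure from a binary confusion matrix,
    taking the minority class as positive."""
    sensitivity = tp / (tp + fn)   # recall on the minority class
    specificity = tn / (tn + fp)   # recall on the majority class
    precision = tp / (tp + fp)
    g_mean = (sensitivity * specificity) ** 0.5
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return g_mean, f_measure

g, f = gmean_fmeasure(tp=40, fp=10, fn=10, tn=140)
```

Unlike plain accuracy, G-Mean collapses to zero if either class is entirely misclassified, which is why it separates the balancing methods far more sharply than accuracy does in the numbers above.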
- Forecasting Tetouan energy demand employing shift approach in machine-learning: complementing econometric insights
A country's GDP growth and sustainable development depend heavily on power supply and consumption; in the modern world, human development is unthinkable without electricity, which is used in every development process. Reliable power consumption is essential for economic growth in a growing nation and economy like Morocco. However, producing electricity is costly, so predicting electricity consumption for effective power management is crucial. Many existing studies forecast power consumption demand for Tetouan City, Morocco, using traditional approaches, but their outputs are less efficient and accurate than ours. Traditional techniques use the target variable directly and do not preserve past data trends. Our study solves this by proposing a consumption shift approach in which past consumption and other variables form the predictor variables used to forecast future consumption. We apply the proposed shift approach to the Quads, Boussafou, and Smir power-zone data of Tetouan City for 2017 and to the combination (average) of these three power zones for 2017. We used two machine-learning models for future consumption prediction: fb-prophet and neural prophet. Our analysis shows that the resulting forecasts of Tetouan City's power usage outperformed traditional forecasting: MAPE improved by 2% and R² by 5% for 10-minute intervals, and MAPE by 1.5% and R² by 4.5% for hourly intervals. Compared with the benchmark study on the same dataset, our approach gives 23.33% and 88% better RMSE for the 10-minute and hourly interval datasets, respectively. In addition to the machine-learning models, we apply an econometric model (OLS) separately to identify the relationship between power demand and environmental features, and we observed that temperature and wind speed have a positive impact, whereas humidity has a negative impact on the power consumption of Tetouan City.;
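The shift approach described above — turning past consumption values into predictor columns — can be sketched with pandas as follows; the series, lag choices, and column names are illustrative stand-ins, not the Tetouan dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute consumption series for one power zone.
idx = pd.date_range("2017-01-01", periods=1000, freq="10min")
s = pd.Series(50 + 10 * np.sin(np.arange(1000) / 20), index=idx,
              name="consumption")

def shift_frame(series: pd.Series, lags: list[int]) -> pd.DataFrame:
    """Shift approach sketch: past consumption values become the
    predictor columns; the unshifted series stays as the target."""
    frame = pd.DataFrame({"target": series})
    for lag in lags:
        frame[f"lag_{lag}"] = series.shift(lag)
    return frame.dropna()  # drop rows where the deepest lag is undefined

data = shift_frame(s, lags=[1, 6, 144])  # 10 min, 1 h, and 1 day back
```

Each row then pairs the current value with its own history, so any regression model trained on it implicitly learns the past trend that direct use of the target variable would discard.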
- PC-NCA: a hybrid feature extraction technique for classification in machine-learning
In the burgeoning field of artificial intelligence (AI), high-dimensional data poses critical challenges for classification due to noisy feature variables and a lack of class separation. This paper introduces PC-NCA, a hybrid feature extraction method that links the statistical robustness of Principal Component Analysis (PCA) with the class-discriminative power of Neighborhood Component Analysis (NCA). By integrating these paradigms, the proposed method compensates for noise and redundancy and enhances class separability in high-dimensional, often non-linear data spaces. Empirical studies on 31 public datasets with varying Imbalance Ratios (1.05–18.1) across the medicine, chemistry, finance, and computer security domains reveal statistically significant improvements in F1 score, G-mean, AUC, and MCC, with performance gains averaging 35.16% over the baseline and 10.57% over traditional reduction methods. The method's superiority is further established through Wilcoxon's signed-rank test (p < 0.001), confirming its robustness across varying magnitudes of class imbalance and feature heterogeneity. Beyond accuracy metrics, diversity analysis shows that PC-NCA achieves 7.4% higher Neighborhood Purity (NP) and 62% lower Tomek Link Rate (TLR) than contemporary approaches, indicating stronger intra-class cohesion and reduced boundary ambiguity. In terms of efficiency, PC-NCA requires more runtime than PCA, LDA, and PCA-LDA (approximately 8x, 5x, and 3x, respectively) but remains consistently faster than standalone NCA while delivering comparable or superior accuracy. By integrating a non-linear transformation in a denoised space, PC-NCA is an effective and adaptable alternative to prevailing dimensionality reduction methodologies, making it a significant addition to the machine-learning arsenal.;
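A PCA-then-NCA cascade in the spirit of the hybrid described above can be assembled directly from scikit-learn components; the dataset, component counts, and classifier below are illustrative choices, not the paper's actual configuration:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# PCA first denoises and removes redundancy; NCA then learns a
# class-discriminative projection on top of the denoised space.
X, y = load_wine(return_X_y=True)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),
    ("nca", NeighborhoodComponentsAnalysis(n_components=2, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
score = cross_val_score(pipe, X, y, cv=5).mean()
```

Running NCA on the 8-dimensional PCA output rather than the raw 13 features is also what drives the runtime advantage over standalone NCA: NCA's cost grows with input dimensionality, so the denoising step doubles as a speed-up.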