Machine Learning models learn to perform their tasks by inferring rules and behaviors from data: they are “trained” on the datasets provided to them.
This is why Artificial Intelligence is described as data-driven, and why developing such algorithms requires quality datasets: extensive, varied, and correctly structured. Training on partial or insufficient datasets can significantly compromise the accuracy and reliability of a Machine Learning model, whereas a varied and precise dataset helps minimize biases and keep the accuracy of the results high.
The composition of the dataset matters especially for analysis models such as forecasting and pattern recognition. For this reason, when we build particularly sophisticated forecasting and analysis systems at Aidia, we complement model development with the collection, processing, and refinement of the available data.
For example, a partner consulting company asked us to develop a financial analysis support solution: an intelligence platform that would forecast stock trends and flag any emerging anomalies or discrepancies.
Because it was based on Classification, Forecasting, and Pattern Recognition algorithms, the recommendation system required a lot of data: accurate, consolidated historical time series, rich in variability and containing enough anomalous events for the algorithms to derive patterns from them.
However, the available dataset was limited in both variety and number of anomalies, which made the desired performance difficult to achieve.
The project's challenge, therefore, concerned not only the development of the platform itself but also the expansion and enrichment of the existing dataset, so that the models could “run” optimally and meet the client's specific needs.
First of all, to overcome the lack of data and ensure the proper functioning of the analysis model, we developed a Data Augmentation system that filled some of the existing gaps and supplemented the data already available to the client.
The solution involved reworking the available dataset and injecting “Gaussian noise” into it. The noise introduced more randomness into the training data and helped the algorithms learn the “normal” behavior and patterns of the variables more precisely. Since patterns can signal a change in trend or an anomaly, including more perturbed (“wrong”) examples in the training data makes detection more precise and more generalizable.
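To make the augmentation step concrete, here is a minimal Python sketch, assuming the data is a univariate price series held in a plain list. The function name `augment_with_gaussian_noise`, the `noise_scale` parameter, and the rule that ties the noise level to the series' own standard deviation are illustrative assumptions, not the project's actual implementation.

```python
import random
import statistics
from typing import List


def augment_with_gaussian_noise(
    series: List[float],
    n_copies: int = 5,
    noise_scale: float = 0.05,
    seed: int = 0,
) -> List[List[float]]:
    """Return `n_copies` perturbed versions of `series`.

    Each point receives independent zero-mean Gaussian noise whose standard
    deviation is `noise_scale` times the population standard deviation of the
    series (an assumed scaling rule, chosen for illustration).
    """
    if not series:
        raise ValueError("series must not be empty")
    rng = random.Random(seed)  # local, seeded generator for reproducible augmentation
    sigma = noise_scale * statistics.pstdev(series)
    return [[x + rng.gauss(0.0, sigma) for x in series] for _ in range(n_copies)]


# Hypothetical price history; the original series is kept alongside its noisy copies.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 102.7]
augmented = augment_with_gaussian_noise(prices, n_copies=3)
training_set = [prices] + augmented
```

Scaling the noise to each series' variability keeps the perturbations plausible: too little noise adds nothing new, while too much would blur the very patterns the models are meant to learn.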
After regularizing the dataset, we developed the analysis system itself. In this phase, we implemented algorithms based on deep neural networks (DNN) alongside statistical models built on key statistical properties, such as variance and correlation.
After testing the predictive capabilities of each algorithm individually, we integrated them into a single analysis system, providing greater interpretive depth and a comprehensive, 360° view of the data.
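The statistical side can be illustrated with two small building blocks, offered as a hedged Python sketch rather than the production code: a rolling z-score check, which flags points that deviate sharply relative to recent variance, and a Pearson correlation coefficient, which measures how strongly two variables move together. The function names, window size, and threshold are assumptions chosen for illustration.

```python
import math
import statistics
from typing import List, Sequence


def rolling_zscore_anomalies(
    series: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Flag indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values (a variance-based check)."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean = statistics.fmean(past)
        std = statistics.pstdev(past)
        # Skip flat windows: with zero variance the z-score is undefined.
        if std > 0 and abs(series[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(sxx * syy)


# Hypothetical readings with a single spike at index 6.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 15.0, 10.0]
spikes = rolling_zscore_anomalies(readings)  # → [6]
```

Comparing each point only with the values before it keeps the check usable on streaming data, since no future observations are needed to decide whether the latest value is anomalous.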
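The article does not specify how the individual algorithms were combined. One common approach, shown here purely as an assumed example, is a weighted ensemble in which each model's forecast counts in inverse proportion to its validation error. The model names `dnn` and `statistical`, the error values, and the helper functions are hypothetical.

```python
from typing import Dict, List


def inverse_error_weights(validation_errors: Dict[str, float]) -> Dict[str, float]:
    """Weight each model by 1 / validation error, normalised so weights sum to 1."""
    if any(err <= 0 for err in validation_errors.values()):
        raise ValueError("validation errors must be positive")
    inverse = {name: 1.0 / err for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def combine_forecasts(
    forecasts: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted average of per-model forecasts, step by step over the horizon."""
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must cover the same non-empty set of models and horizon")
    horizon = horizons.pop()
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(horizon)
    ]


# Hypothetical validation errors (e.g. mean absolute error) and two-step forecasts.
weights = inverse_error_weights({"dnn": 1.0, "statistical": 3.0})
combined = combine_forecasts(
    {"dnn": [100.0, 102.0], "statistical": [104.0, 106.0]}, weights
)
```

This design lets the model that performs better on held-out data dominate the combined forecast without discarding the other models' signal; other integration strategies, such as stacking, would be equally plausible.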
In parallel with the work on creating and fine-tuning the analysis models, we developed the platform for our client.
The platform is built on a microservices architecture, and its core is an interactive dashboard where users can study how the variables move and run several preset analyses covering trend performance and the relationships between variables.
Other sections of the software let users follow more detailed predictions or modify some of the parameters taken into account, so the client's users can customize their analyses and keep the results optimized.
Overall, thanks to the significant initial work on the dataset and the customized development of the models, we were able to address the concerns related to data limitations and create an advanced analytical platform.
Thanks to the greater flexibility and accuracy of the analyses, the client's consultants have also seen their performance improve.
The developed model demonstrated an accuracy of 94.5%.
The dataset has been systematized and optimized.