Today, the data available to businesses is characterized by large volumes and a wide variety of sources and types. Additional dimensions such as time and velocity must also be taken into account, since some data changes in real time. All of this complicates the work of data specialists.
Every day, we generate on average 25 trillion bytes of data. These information flows include published videos, sent messages, climate readings, records of online transactions, GPS signals, posts on social applications, sensor data, and more.
The concept of “Big Data” refers to techniques for collecting, storing, and processing these “big” datasets, where conventional data tools and techniques reach their limits. Many of these techniques originated with the engineering teams of web giants such as Facebook and Google, and as they grew in popularity they gave rise to several open-source technologies, Hadoop and Spark among them.