Basic research into machine learning began in the 1950s, but only since the 1990s have the technical and financial conditions allowed broad commercial use. Essential factors are the rapid dissemination of new research results via blogs and open-source software, the exponentially growing supply of data, sufficient computing power for the commercially worthwhile analysis of this data and, not least, ample capital in the coffers of the technology companies.
The interaction of these factors can be seen, among other things, in the evolution of machine image recognition. A breakthrough came in 1998 with LeNet-5, one of the first so-called Convolutional Neural Networks (CNN), an early form of deep learning that has been in commercial use since the early 1990s. Since 2009, researchers and developers have been working with the public image database ImageNet, which maintains a repository of fourteen million manually labeled images. The commanding victory of the CNN “AlexNet” in the 2012 ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) popularized the use of GPUs to accelerate deep learning through parallelization. The competition took place from 2010 to 2017, and the source code of the winning models was published for use and further development by third parties.
Similar synergies accelerated machine image generation after the invention of Generative Adversarial Networks (GANs) in 2014. GANs are “networks in creative competition”: an artificial neural network, called the generator, produces image or sound data. A second network, the “discriminator” (examiner), is trained to distinguish the generator’s output from real recordings. From the examiner’s feedback, the generator iteratively learns to produce realistic output.
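The interplay of generator and discriminator can be sketched in a few lines of code. The following toy example in PyTorch is a minimal sketch, not taken from the original text: a simple one-dimensional distribution stands in for real image or sound data, and the network sizes and training settings are arbitrary assumptions.

# Minimal GAN sketch: the generator learns to imitate a simple 1-D
# distribution, the discriminator (examiner) learns to tell generated
# samples from "real recordings". Purely illustrative.
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0   # stand-in for real recordings

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    # 1) Train the examiner on real and generated samples.
    real = real_data(64)
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator from the examiner's feedback: it is rewarded
    #    when its output is classified as real.
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(generator(torch.randn(5, 8)))   # samples should cluster around 2.0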
In machine text generation and translation, transformers have been used since 2017, together with the pre-trained ML models built on them, BERT (Bidirectional Encoder Representations from Transformers, 2018) and GPT-3 (Generative Pre-trained Transformer, 2020). From huge corpora, transformers learn, among other things, semantic analogies, such as king is to queen as man is to woman.
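Such analogies can be expressed as simple vector arithmetic on word embeddings. The following sketch uses made-up three-dimensional vectors purely for illustration; real models learn these relations from huge corpora in hundreds of dimensions.

# Toy illustration of analogy arithmetic on word embeddings:
# king - man + woman should land closest to queen.
import numpy as np

# Hypothetical embeddings, not taken from a real model.
emb = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
    "apple": np.array([0.1, 0.2, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, emb[w]))
print(best)   # -> "queen"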
In reinforcement learning, by contrast, the system develops and refines a strategy for solving a problem not from prescribed examples, but via a reward function. This approach has produced artificial intelligences such as AlphaGo (2017) for the complex board game Go and AlphaStar (2019) for the video game StarCraft II, against which human opponents now stand no chance.
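The principle of learning purely from a reward function can be illustrated with tabular Q-learning, a far simpler method than those behind AlphaGo or AlphaStar. In the sketch below, an agent on a five-cell line is rewarded only for reaching the rightmost cell; all parameter values are illustrative assumptions.

# Minimal reinforcement-learning sketch (tabular Q-learning): the agent
# learns by trial and error that walking right leads to the reward.
import random

n_states, actions = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0   # the reward function
        # Update the value estimate from the observed reward.
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

# Learned strategy: in every state the best action is +1 (walk right).
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])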
How can machine learning research be put to use in retail? The following heuristics have proven their worth.
End-to-end digitization requires machine learning.
Almost all business processes include work steps that can only be automated with integrated ML functions. If you want to convert a supermarket to self-service checkouts, for example, you have to automate the weighing of fruit and vegetables. This requires image recognition that reliably distinguishes not only bananas from potatoes, but also varieties of the same species. Doing away with cash registers altogether, as Amazon is currently testing in brick-and-mortar stores in several major U.S. cities, requires even more precise image recognition. Among other things, it must distinguish customer property, such as a newspaper bought elsewhere, from unpaid goods in the assortment, and register whether a customer places an item in the basket with the intention of buying it or puts it back on the shelf.
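One common way to build such a produce classifier is to fine-tune a pretrained CNN. The following sketch assumes a hypothetical folder produce_images/ with one subfolder of photos per variety and retrains only the final classification layer; it is an illustration under these assumptions, not Amazon's or any retailer's actual system.

# Hedged sketch: adapting a pretrained CNN to distinguish produce varieties.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical dataset layout: produce_images/<variety>/<photo>.jpg
dataset = datasets.ImageFolder("produce_images", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # one output per variety

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # train only the new head
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:          # one pass over the labeled photos
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()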