August 30, 2024
A guest post from Fabrício Ceolin, DevOps Engineer at Comet. Inspired by the growing demand…
Machine learning and deep learning algorithms have proliferated everywhere, leaving behind no mobile device with an ounce of processing power to spare, not even our smartphones.
This development can largely be attributed to the increased power of our mobile phones and a strong consumer desire for better features. Hardware has seen dramatic performance improvements, and software has grown to match these capabilities. Better photographs, recommendation systems, text prediction, and other features have become staples of even the simplest modern smartphones. This drives me to ask: what challenges had to be overcome to deliver this seemingly unreachable dream to us?
Let’s look at the hardware and software strides necessary to make this leap in progress.
Implementing deep learning architectures involves numerous stages, including preparing training data, training the model, running inference, and evaluating the model.
The above steps are critical for any deep learning model to succeed, but you quickly hit a constraint wall when you attempt to train these models on smartphone hardware. Trying to train a model on potentially gigabytes or terabytes of data becomes an impossible hill to climb.
For these reasons, engineers and developers came up with clever workarounds that leverage existing deep learning solutions, some of which are also commonly used in other domains.
One of the most effective ways to deal with hardware shortcomings is to use efficient architectures: architectures designed to remain accurate while demanding far less processing capacity.
Some of the architectures that fall under this category are MobileNet and ShuffleNet. Howard et al. describe that they “introduce two simple global hyperparameters that allow the model builder to choose the right sized model for their application based on the constraints of the problem [1].”
These models make inevitable tradeoffs but can still perform exceedingly well on tasks such as facial recognition and object detection on mobile phones.
Many of these models are openly available, and existing implementations allow developers to deploy them on mobile devices for whatever purpose they see fit.
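As a rough sketch of the "global hyperparameters" Howard et al. describe, assuming TensorFlow 2.x with keras.applications is installed, a pretrained MobileNet can be loaded at a reduced width multiplier (alpha) and input resolution, trading a little accuracy for a much smaller, faster model:

```python
# Sketch: loading a pretrained MobileNet at a reduced width multiplier
# (assumes TensorFlow 2.x; weights are downloaded from the Keras model zoo).
import tensorflow as tf

model = tf.keras.applications.MobileNet(
    input_shape=(160, 160, 3),  # resolution multiplier: smaller inputs, less compute
    alpha=0.5,                  # width multiplier: half the filters of the full model
    weights="imagenet",
)
model.summary()
```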
Transfer learning involves using an already-trained model to improve the performance of a new one. Rather than building an implementation that might be inefficient and less capable than what already exists, why not reuse a proven model?
APIs such as Keras democratize access to these pretrained models, and their use is encouraged because they will likely improve the results obtained from the efficient architectures described above.
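A minimal transfer-learning sketch, assuming TensorFlow/Keras and an ImageNet-pretrained MobileNetV2 as the base: the base is frozen so its learned features are reused, and only a small classification head is trained on the new task (the five-class problem and the datasets are hypothetical placeholders):

```python
# Sketch: transfer learning with a frozen MobileNetV2 base (assumes TensorFlow 2.x).
import tensorflow as tf

# Load a model pretrained on ImageNet, without its original classifier head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # reuse the learned features; do not retrain them

# Attach a small head for the new task (here: a hypothetical 5-class problem).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets supplied by the reader
```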
So far, we may have noticed a pattern: smaller is preferable for hardware with lower capacity. The techniques above do a phenomenal job of letting models perform well, but they are not the whole story of getting deep learning models to run optimally on mobile devices.
Additional integral techniques include pruning, quantization, and distillation, among others. The advent of LLMs has brought some of these techniques, primarily quantization, to the limelight. We’ll dive into these techniques.
The simplest definition of pruning is that it reduces the number of parameters in a neural network. It sounds unintuitive, but it tactfully minimizes the amount of storage a model requires while maintaining good-enough performance for a given task (such as image recognition). However, pruning further than is reasonably needed leads to a drop in model performance.
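As an illustration only (not a production pruning pipeline), magnitude pruning can be sketched in plain NumPy: the weights with the smallest magnitudes are zeroed out, and the resulting sparse tensor can then be stored and executed more cheaply. The function name and sparsity level here are assumptions for the example:

```python
# Sketch: magnitude pruning of a single weight matrix (illustrative, NumPy only).
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.8)  # keep only the largest 20% of weights
print("nonzero before:", np.count_nonzero(w), "after:", np.count_nonzero(w_pruned))
```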
Quantization can be defined as a precision-reducing process that also aims to reduce the memory or storage a model requires. This is done by converting floating-point numbers (float data types) to lower-precision integers (e.g., float32 to int8), which use less memory and are more efficient to compute with.
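A toy sketch of the underlying arithmetic, assuming simple asymmetric affine quantization (real toolchains such as TensorFlow Lite apply this per layer or per channel and calibrate the ranges more carefully):

```python
# Sketch: affine quantization of a float32 tensor to int8 and back (illustrative only).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values onto the int8 range [-128, 127] using a scale and zero point."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = np.round(-128 - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale, zp = quantize_int8(x)
print("max reconstruction error:", np.abs(x - dequantize(q, scale, zp)).max())
```

The int8 tensor takes a quarter of the memory of the float32 original, at the cost of a small, bounded reconstruction error.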
The above is a highly simplified explanation of the standard software techniques for building models for devices with comparatively limited processing power. Fortunately, interventions are not limited to the software domain, as robust hardware solutions have steadily emerged since the need arose.
Phones and other mobile devices have increasingly come to include the staples integral to our other computing devices. One of the most important aspects of deep learning is the hardware required to run it. GPUs are essential because their architecture is well suited to running deep learning models.
GPUs and Neural Processing Units (NPUs) are highly specialized hardware, and they perform the necessary computations in combination with the software interventions mentioned above. Lite versions of deep-learning libraries are also dedicated and optimized to run on this class of efficient hardware.
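As an example of such a "Lite" toolchain, assuming TensorFlow is installed, a trained Keras model can be converted for on-device inference with the TensorFlow Lite converter, with post-training quantization enabled during conversion (the model and output filename below are placeholders):

```python
# Sketch: converting a Keras model for mobile deployment with TensorFlow Lite
# (assumes TensorFlow 2.x; `model` could be any trained tf.keras model,
# e.g. the transfer-learning model sketched above).
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")  # placeholder model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training quantization
tflite_model = converter.convert()

with open("mobilenet_v2_quant.tflite", "wb") as f:
    f.write(tflite_model)
```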
Building deep learning models remains difficult despite the advances made to make them accessible. Phones and other mobile devices have become more powerful as the need to leverage more processing power on consumer devices has gained traction. For this reason, multiple software, hardware, and even off-device solutions such as cloud computing have become staples.
Deeper research into these hardware and software interventions will continue to expand what phones and other mobile devices can do.
[1] Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv preprint arXiv:1704.04861 (2017).