AI Use Cases: From training to inference, understanding how to deploy AI models

Posted on: 2025-03-15

It's fair to say that AI is all over the news, and if you work in this field like I do, it can be hard to keep track of the new AI models and tools coming out almost daily. There are a lot of options, from Large Language Models (LLMs) like ChatGPT, to specialized vision models, tools for editing photos and videos, text-to-speech systems that can clone voices, image recognition software, and so on. It can also be hard to pinpoint how easy or cheap it will be to implement a specific use case.

For example, I often hear people claim that because DeepSeek is open source, anyone can self-host the full model on their laptop, which is patently false. In this short primer I will attempt to give some basic information on how to navigate AI models and how to pick the right tool for your use case.


Types of models

First, let's take a look at the various types of models available:

- Large Language Models (LLMs): understand and generate text, powering chatbots and coding assistants.
- Vision models (CNNs or transformers): analyze images for tasks like recognition, classification and quality control.
- Multimodal models: combine several input types, such as text, images and sensor data.
- Time series models (such as LSTMs): forecast values over time, useful for predictive maintenance.
- Anomaly detection models: flag unusual patterns, common in fraud and threat detection.
- Reinforcement learning models: learn by trial and error against a reward signal.
- Recommendation engines: suggest items based on past behavior, often using collaborative filtering.

On top of the type of model used, there are additional features a model may have. For example, the big improvement that DeepSeek brought to the industry is its reasoning ability, and new LLMs now commonly feature some kind of reasoning or deep thought function. The difference is that the model generates intermediate, step-by-step reasoning (so-called chain-of-thought) before committing to a final answer, instead of answering directly. This makes these models better at logical tasks like mathematics and science problems.
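As a toy illustration, the chain-of-thought idea can be expressed at the prompt level: you simply ask the model to show its steps before the answer. The exact wording below is just an example, not any vendor's required format:

```python
def direct_prompt(question: str) -> str:
    # Plain prompt: the model is expected to answer immediately.
    return f"Question: {question}\nAnswer:"

def chain_of_thought_prompt(question: str) -> str:
    # Chain-of-thought prompt: instruct the model to write out its
    # reasoning step by step before committing to a final answer.
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, "
        "then give the final answer on the last line.\nAnswer:"
    )

print(chain_of_thought_prompt("What is 17 * 24?"))
```

Reasoning models bake this behavior into training rather than relying on prompt wording, but the effect on the output is similar: intermediate steps first, conclusion last.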


Model size considerations

The size of a model is a crucial concept to understand. Models are typically categorized by their number of parameters, the values learned during training. AI models tend to range from small (a few hundred million parameters) to massive (100+ billion parameters). Size impacts the resources the model requires:

- Memory: the model's weights must fit in RAM or, ideally, GPU VRAM.
- Compute: larger models need more powerful (and more expensive) hardware.
- Speed: bigger models generate answers more slowly on the same hardware.

One additional concept when it comes to model size is quantization. This is a technique used to reduce the memory and compute requirements for running an AI model by reducing its numerical precision. Instead of using full 16-bit or 32-bit floating point numbers (FP16 or FP32), quantized models use 8-bit, 4-bit or even lower precision representations. This significantly reduces the amount of VRAM needed to run a model, but it also reduces the quality of its output. For example, if you go to the Ollama models page and search for DeepSeek, you can see that the model is available in its full 671B version, but also in much smaller 70B, 32B, 14B, 8B, 7B and even 1.5B versions. Note that these smaller versions are actually distilled models (smaller models trained to imitate the big one) rather than quantizations, but the trade-off is similar: the lower you go, the fewer resources the model needs, but also the dumber it ends up being.
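To get a feel for the numbers, you can estimate a model's weight-memory footprint as parameters × bits per parameter. This is a rough lower bound (activations, context and runtime overhead add more), but it shows why the "run DeepSeek on a laptop" claim falls apart:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# DeepSeek's full 671B model at FP16: far beyond any laptop.
print(weight_memory_gb(671, 16))  # -> 1342.0 GB
# The same weights quantized down to 4 bits: still hundreds of GB.
print(weight_memory_gb(671, 4))   # -> 335.5 GB
# A 7B model at 4 bits fits in ~3.5 GB, feasible on consumer hardware.
print(weight_memory_gb(7, 4))     # -> 3.5 GB
```

This is why small or quantized variants are the realistic option for self-hosting, and the full model is not.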


Training, fine-tuning and RAG

There are two distinct phases in the use of AI: training and inference. Training is when you build the model by feeding it a ton of data, while inference is when you use the finished model to answer questions or make predictions. One of the biggest misconceptions in AI is that every use case requires training a model from scratch. In reality, very few companies do this. There are three main ways to adapt AI to your needs, each with different resource requirements:

- Training from scratch: building a new model on massive datasets; extremely expensive and rarely necessary.
- Fine-tuning: taking an existing base model and continuing its training on a smaller, domain-specific dataset.
- Retrieval-Augmented Generation (RAG): leaving the model untouched and instead retrieving relevant documents at query time to include in the prompt.
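To make the RAG idea concrete, here is a deliberately tiny sketch: it scores documents by word overlap with the question and pastes the best match into the prompt. A real system would use embeddings and a vector store instead of word overlap, but the flow is the same:

```python
def retrieve(question: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str, documents: list[str]) -> str:
    context = retrieve(question, documents)
    # The model answers from the retrieved context instead of relying
    # only on what it memorized during training.
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to Canada takes 5 to 7 business days.",
]
print(build_prompt("What is the refund policy?", docs))
```

Notice that the model itself is never modified, which is why RAG is by far the cheapest of the three approaches.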

When it comes to images, you may have heard terms like model checkpoints and LoRA (Low-Rank Adaptation), which relate to these same concepts. A checkpoint is a saved state of a model's weights, taken during the training process; it allows training to be paused and resumed at a later point. LoRA is a method of fine-tuning an existing model. For example, you may have a model trained on a large quantity of generic images, but if you need one that is especially good at identifying dog breeds, you can create a LoRA by feeding that model hundreds of pictures of dogs with captions specifying the breed.
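The appeal of LoRA is the parameter count: instead of updating a full d×d weight matrix, you train two thin factors of rank r (W + B·A, with B of shape d×r and A of shape r×d), and d·r + r·d is far smaller than d·d. A quick back-of-the-envelope, using illustrative sizes rather than any specific model's dimensions:

```python
def full_finetune_params(d: int) -> int:
    # Full fine-tuning updates one dense d x d weight matrix directly.
    return d * d

def lora_params(d: int, r: int) -> int:
    # LoRA learns W + B @ A where B is d x r and A is r x d,
    # so only the two low-rank factors are trained.
    return d * r + r * d

d, r = 4096, 8  # hidden size and adapter rank (illustrative values)
print(full_finetune_params(d))  # -> 16777216 trainable values
print(lora_params(d, r))        # -> 65536, about 0.4% as many
```

That roughly 250x reduction per layer is what makes fine-tuning feasible on a single consumer GPU.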


Which tools to use

Now that we've covered the basic concepts, we can get to the final question: which tools should you use for your specific use case? There are many options, both cloud services and self-hosted. Some applications are easy to use and work out of the box, like ChatGPT for generic questions or GitHub Copilot as a coding assistant. Then there are services that provide an API so you can embed them in your own applications, such as the OpenAI API. There are also base models that can be fine-tuned directly in the cloud, like Amazon Bedrock. Finally, you may instead want to host your own model, either an open source base model or one you trained or fine-tuned yourself. Here are some common use cases that can be improved with AI, along with suggested tools:

| Field | Use case | Model types | Sample tools |
| --- | --- | --- | --- |
| Healthcare | Image analysis (X-rays, MRIs) | Vision models (CNNs or transformers) | OpenCV, SimpleITK, IBM Watson Health |
| | Drug discovery and protein folding | Transformers | Google DeepMind AlphaFold, OpenFold, Azure Drug Discovery |
| | Virtual health assistants | Chatbots (LLMs) | Rasa, IBM Watson, Google Dialogflow |
| Finance | Fraud detection | Anomaly detection models | Amazon Fraud Detector, H2O.ai, scikit-learn |
| | Algorithmic trading | Reinforcement learning | Stable Baselines, TensorTrade |
| | Risk assessment | Predictive models | XGBoost, LightGBM |
| Manufacturing | Predictive maintenance | Time series models (LSTMs) | Azure AI for Predictive Maintenance, TensorFlow, Prophet |
| | Quality control | Vision models (transformers) | Amazon Rekognition, OpenCV, Google Vision AI |
| Transportation | Autonomous vehicles | Multimodal models | Amazon Bedrock, TensorFlow, Autoware, Apollo Auto |
| | Traffic optimization | Reinforcement learning | Azure Percept, OpenAI Gym, TensorFlow |
| Education | Personalized learning | Recommendation engines | Google Education, Amazon Personalize, Surprise |
| | Automated grading | NLP (transformers) | Google Natural Language, Amazon Comprehend, spaCy |
| Retail | Product recommendations | Collaborative filtering, transformers | Amazon Personalize, PredictionIO, RecBole |
| | Customer service | Chatbots (LLMs) | Google Dialogflow, IBM Watson Assistant, OpenAI API, Rasa |
| Cybersecurity | Threat detection | Anomaly detection models | Amazon GuardDuty, Microsoft Defender, H2O.ai |
| | Automated security response | Reinforcement learning | Microsoft Sentinel, AWS Security Hub, TensorFlow |

Of course, every use case is different and you should evaluate tools based on your specific needs, but hopefully this primer gave you a good basis for doing so.