The Flan-T5-XXL model available on Hugging Face is a Large Language Model capable of various language generation tasks. Running locally, it has good potential for our intent-detection task.
Main use cases: Text generation model usable for translation, text summarization, sentiment analysis, and intent recognition. Its language generation quality lags behind larger, more modern models, while tasks such as intent recognition remain comparably strong.
Input length: 512 tokens (approx. 384 words) by default; trained on inputs of up to 2048 tokens (approx. 1536 words)
Languages: English, French, Romanian, German
Model size: ~11 billion parameters
The F1 scores for the individual concerns vary slightly (0.75 to 1.0) but remain at a high level.
Four out of seven concerns show a balanced recognition pattern with very high values for Precision, Recall and F1: "Product defective/deficient", "Package has not arrived", "Change password" and "Submit meter reading, record".
In two cases, "I would like to receive my money" and "Please no more advertising", the model generalizes too little: it returns hardly any false positives (precision at 95-100%, shown in purple) but finds only a fraction of the actual targets (lower recall, shown in gold). For the request "Delete account or customer account", the proportions are roughly reversed: the model tends to generalize too much and returns false positives. This pattern is not pronounced, however, so it is only a tendency.
Overall, the model shows an above-average recognition rate, although two concerns with F1 scores of 0.75 and 0.8 indicate that concern recognition is not outstanding across the board. Nevertheless, the model is well suited to recognizing customer concerns in emails.
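The precision/recall trade-offs described above follow directly from the standard metric definitions. A minimal sketch, using made-up counts rather than the actual evaluation data, shows how an under-generalizing concern ends up with near-perfect precision but an F1 score in the 0.75 range:

```python
# Per-concern metrics from true-positive (tp), false-positive (fp) and
# false-negative (fn) counts. The counts below are illustrative only.
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# High precision, low recall: the under-generalizing pattern described above.
p, r, f1 = precision_recall_f1(tp=60, fp=1, fn=40)
```

With these example counts, precision is near 1.0 while recall is 0.6, pulling F1 down to roughly 0.75 despite almost no false positives.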
We varied only one parameter in the test: whether the concern names were formulated in German or English. English formulations produced the better results.
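A zero-shot setup like ours typically lists the candidate concern names inside an instruction prompt. The sketch below shows one plausible way to build such a prompt with English label formulations; the exact prompt wording and label names are assumptions for illustration, not our production configuration:

```python
# Illustrative English formulations of the concerns evaluated above
# (assumed label wording, not the exact strings used in the test).
INTENT_LABELS = [
    "product defective or deficient",
    "package has not arrived",
    "change password",
    "submit meter reading",
    "request a refund",
    "unsubscribe from advertising",
    "delete customer account",
]

def build_intent_prompt(email_text: str) -> str:
    """Build an instruction-style prompt asking the model to pick one intent."""
    options = ", ".join(INTENT_LABELS)
    return (
        f"Classify the customer email into one of the following intents: {options}.\n"
        f"Email: {email_text}\n"
        f"Intent:"
    )

prompt = build_intent_prompt("My parcel still has not been delivered.")
```

The resulting string would be passed to the model's generate call; only the label list needs to change to compare German against English formulations.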
In our tests for recognizing requests, we measured very short response times, with a median of 0.07 seconds per email. The model is therefore also suitable for real-time applications.
Median: 0.07 sec.
Mean: 0.08 sec.
Minimum: N/A
Maximum: N/A
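The latency figures above can be reproduced by timing each inference call and aggregating with standard statistics. A minimal sketch, with made-up sample timings standing in for the real measurements:

```python
import statistics
import time

def time_inference(fn, inputs):
    """Return per-item wall-clock latencies in seconds for fn over inputs."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)
    return latencies

# Example measurements (illustrative values, not the actual test data).
latencies = [0.06, 0.07, 0.07, 0.08, 0.12]
median = statistics.median(latencies)
mean = statistics.fmean(latencies)
```

Reporting the median alongside the mean is useful here because a few slow outliers (e.g. long emails) inflate the mean without affecting typical per-email latency.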
This model was run locally on our servers, so there were no direct costs. In practice, the price depends heavily on the setup and the hardware used. In general, larger models are more expensive to run than smaller ones; with ~11 billion parameters, Google's FLAN-T5-XXL counts as large.
Local hosting possible; GPU required
Given the above-average quality, the very short response times, and the option to host the model ourselves, we can give a clear recommendation for this model when recognition of concerns in German-language customer emails is desired. This is especially true when response time matters.