Main use cases: Model fine-tuned for intent recognition and text classification. It is based on the RoBERTa variant of the original BERT model and can recognize intents even in complex emails when only the name of the intent is specified (zero-shot).
Input length: 512 tokens (approx. 384 words)
Languages: English, French, German, Spanish, Greek & 10 others
Model size: ~355 million parameters
The F1 scores for the individual concerns vary relatively little (0.69-0.93). However, only two concerns, "Parcel has not arrived" and "I would like to receive my money", show a balanced pattern of recall and precision. These two concerns also have the highest F1 scores and are recognized very well, at 0.93 and 0.88 respectively.
The requests "Change password" and "Please no more advertising" also performed well, while "Delete account or customer account" and "Product defective/deficient" performed only moderately. In all four cases the model generalizes too little: hardly any false positives are returned (precision >87%, purple), but less than three quarters of the actual targets are found (recall <75%, gold).
The opposite pattern appears for the "Transmit meter reading, record" task, which also performs only moderately. Here the model generalizes too much: all targets are recognized (recall 100%, gold), but many false hits are also reported, which keeps the precision (purple) comparatively low.
Overall, only three of the seven concerns fall just below an F1 score of 0.75, a value which essentially means that one in four hits is a false positive and one in four targets is missed. The model therefore seems quite well suited for concern recognition in customer service emails, especially since the results could likely be improved significantly by further measures (additional training, fine-tuning, etc.).
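The relationship between precision, recall and F1 used in this interpretation can be checked with a short calculation. This is only an illustrative sketch: the function name is ours, and the example scores are not real measurements from the test series.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced pattern: precision == recall == 0.75 gives F1 = 0.75,
# i.e. one in four hits is a false positive and one in four
# targets is missed.
print(f1_score(0.75, 0.75))  # 0.75

# Unbalanced pattern (illustrative values): perfect recall with
# low precision still pulls the F1 score down.
print(round(f1_score(0.5, 1.0), 2))  # 0.67
```

Because F1 is a harmonic mean, a large gap between precision and recall always drags the score toward the weaker of the two values.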
We varied three parameters in the tests. First, we tested different formulations of the concerns, covering aspects such as simplicity of wording and positive vs. negative statements, for example "stop advertising" as opposed to "no more advertising". Second, we varied the threshold (0-1) above which a similarity score counts as a hit; the best configuration used a threshold of 0.4. Finally, for each configuration we tested what effect it has if a text may be assigned to several concerns at the same time, which produced poorer results in our test series. The results shown above represent the best combination of these parameters.
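The thresholding and single-label assignment described above can be sketched as follows. This is a sketch under assumptions: the function and variable names are our own, and the similarity scores shown are made up; only the threshold of 0.4 comes from the test series.

```python
THRESHOLD = 0.4  # best value found in the test series

def assign_concern(scores, threshold=THRESHOLD):
    """Return the single best-matching concern, or None if no
    similarity score reaches the threshold."""
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else None

# Illustrative similarity scores for one email (not real model output):
scores = {
    "Parcel has not arrived": 0.72,
    "Change password": 0.31,
    "Please no more advertising": 0.12,
}
print(assign_concern(scores))  # Parcel has not arrived
```

Allowing several concerns per text would replace the `max` with a filter over all labels above the threshold; in our tests this multi-label variant performed worse.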
Our tests for recognizing requests showed comparatively long response times with an average value of 1.28 seconds per email. The model is therefore only suitable for real-time applications to a limited extent.
Median: N/A
Mean: 1.28 sec.
Minimum: N/A
Maximum: N/A
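Latency figures like these can be collected with a simple timing loop. In this sketch, `classify` is a placeholder for the actual model call; the dummy classifier and emails are illustrative only.

```python
import statistics
import time

def measure_latency(classify, emails):
    """Time each classification call and return summary statistics
    in seconds (mean, median, min, max)."""
    durations = []
    for email in emails:
        start = time.perf_counter()
        classify(email)
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }

# Example with a dummy classifier (the real model averaged 1.28 s):
stats = measure_latency(lambda text: None, ["email 1", "email 2"])
print(sorted(stats))
```

Reporting the median alongside the mean would also reveal whether a few slow outliers are inflating the average.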
This model was run locally on our servers, so there were no direct costs. In practice, the price depends very much on the setup and the hardware used. In general, larger models are more expensive than smaller ones: XLM-RoBERTa-large-XNLI can be considered rather small with a size of ~355 million parameters.
Due to the promising quality and the probably very low costs, and despite the comparatively long response times, we can give a limited product recommendation for this model for identifying the concerns in German customer inquiries (email). The application may be particularly worthwhile if hosting on VIER's own infrastructure is strictly required, for example for data protection reasons.