GPT-3.5 is the model that powered ChatGPT at its release. It is one of the most capable models offered by OpenAI and also one of the cheapest. For this reason, the test is performed with GPT-3.5 first.
Main use cases: Any task that requires text generation, for example summaries, chatbots, voicebots, but also intent or sentiment recognition.
Input length: Two model variants with context windows of 4,096 tokens (approx. 3,072 words) or 16,385 tokens (approx. 12,288 words)
Languages: 95 natural languages
Model size: 110 billion parameters
The quality of GPT-3.5 Turbo's summaries is very good in all evaluation categories, i.e. the summaries produced are fluent, correct in content and concise. To put the quality of the results into context, the chart compares the ratings of the machine summaries with those of the reference summaries written by human experts for the same texts. In the fluency category, GPT-3.5 Turbo even receives better ratings than the human-written summaries. We therefore advocate its use for German texts and in particular for transcripts, some of which are of low quality.
Our tests showed considerable fluctuations in response times, presumably due to load on the OpenAI API. Otherwise, response times were in the middle range, with a mean and median of around 11 seconds, and are acceptable for the complexity of the task and the intended use cases.
Median: 11.16 sec.
Mean: 11.54 sec.
Minimum: 0.51 sec.
Maximum: 22.72 sec.
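Latency figures like those above can be computed from raw per-request timings with Python's standard library; the `timings` list here is illustrative sample data, not our actual measurement set of 109 calls:

```python
import statistics

# Hypothetical per-request response times in seconds (illustrative only;
# the real evaluation comprised 109 summarization calls).
timings = [0.51, 9.8, 10.9, 11.16, 11.4, 12.1, 22.72]

print(f"Median:  {statistics.median(timings):.2f} sec.")
print(f"Mean:    {statistics.mean(timings):.2f} sec.")
print(f"Minimum: {min(timings):.2f} sec.")
print(f"Maximum: {max(timings):.2f} sec.")
```

Reporting the median alongside the mean is useful precisely because of the API-load outliers mentioned above: a few delayed responses inflate the mean but barely move the median.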
Summarizing the 109 transcripts cost around €0.07, i.e. around 1 cent per 15 transcripts. The cost of GPT-3.5 Turbo is therefore very low.
The OpenAI version of the model is hosted in the USA. There is a version on Azure that is hosted in Europe. We already have access to this version and can use it for product purposes.
Due to the good quality, the acceptable speed and the low price, we can give a clear product recommendation for this model when summaries of German conversations or conversation transcripts are required. The model impresses with an overall package of complete, fluently formulated and well-structured summaries, which could only have been a little shorter and more concise.
The F1 scores for all concerns are extremely high (0.93–1.0), i.e. all concerns are detected precisely and reliably. This is also reflected in the equally high precision and recall values. We therefore recommend using the model for German texts, especially e-mails in customer service.
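For reference, F1 is the harmonic mean of precision and recall, so it is only high when both are high. A minimal sketch with illustrative counts (not taken from our evaluation):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 93 true positives, 3 false positives, 4 false negatives
print(round(f1_score(93, 3, 4), 3))  # → 0.964
```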
A general finding during testing was that the model works better if it is not explicitly asked to recognize multiple concerns; if it is confident that several concerns exist in a text, it recognizes them anyway. A prompt mixing German and English (concerns and their descriptions in German, the rest in English) works better than a fully German prompt. An example increases accuracy only marginally; the model is already very good zero-shot (i.e. with only the concern named). The temperature setting did not make a big difference in our examples; we recommend a low temperature of 0.3.
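A zero-shot prompt along these lines can be sketched as follows. The concern labels, their German descriptions and the system wording are illustrative assumptions, not the exact prompt we used; the call to the chat completions endpoint (with temperature=0.3) is deliberately omitted:

```python
# Illustrative concern catalogue: labels/descriptions in German,
# instructions in English, per the mixed-language finding above.
INTENTS = {
    "Vertragskündigung": "Der Kunde möchte seinen Vertrag beenden.",
    "Rechnungsfrage": "Der Kunde hat eine Frage zu einer Rechnung.",
    "Adressänderung": "Der Kunde teilt eine neue Adresse mit.",
}

def build_messages(email_text: str) -> list[dict]:
    """Build a zero-shot chat prompt for classifying a German e-mail."""
    label_block = "\n".join(f"- {name}: {desc}" for name, desc in INTENTS.items())
    system = (
        "You are a customer-service assistant. Classify the following "
        "German e-mail into one of these concerns and answer only with "
        f"the concern name:\n{label_block}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": email_text},
    ]

messages = build_messages(
    "Hallo, ich möchte meinen Vertrag zum Monatsende kündigen."
)
# messages would then be sent to the chat completions endpoint,
# e.g. with temperature=0.3 (API call omitted in this sketch).
```

Note that, in line with the finding above, the prompt does not ask the model to look for multiple concerns; it simply names the candidate concerns and lets the model decide.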
In our request recognition tests, there were fluctuations in response times in the form of a few delayed responses, presumably due to load on the OpenAI API. In general, however, response times are short, with a mean of 0.6 seconds, and are also suitable for real-time applications.
Median: 0.59 sec.
Mean: 0.6 sec.
Minimum: 0.5 sec.
Maximum: 3.49 sec.
Request recognition for the 790 texts cost €0.89, i.e. around 1 cent per 10 customer requests (without data cleansing). Overall, the cost of GPT-3.5 is rather low.
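The per-request cost quoted above follows directly from the totals; a quick back-of-the-envelope check:

```python
# Cost check for the figures above: €0.89 total for 790 classified texts.
total_cost_eur = 0.89
num_texts = 790

cost_per_text = total_cost_eur / num_texts  # ≈ €0.0011 per request
texts_per_cent = 0.01 / cost_per_text       # ≈ 8.9 requests per cent

print(f"{cost_per_text * 100:.3f} cents per request")
print(f"~{texts_per_cent:.0f} requests per cent")
```

At roughly 9 requests per cent, this supports the "around 1 cent per 10 customer requests" figure quoted above.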
The OpenAI version of the model is hosted in the USA. There is a version on Azure that is hosted in Europe. We already have access to this version and can use it for product purposes.
Due to the above-average quality, the short response times and the low price, we can give this model a clear product recommendation for recognizing the concerns in German customer e-mails.