Main use cases: Can be used for any task that requires text generation. Delivers very high quality for summaries, chat applications, intent and sentiment recognition, as well as creative content generation, coding, and general knowledge. Can evaluate the performance of other models in dialog tasks. Also accepts images as input.
Input length: Three model variants with context windows of 8,192 tokens (approx. 6,144 words), 32,768 tokens (approx. 24,576 words), and 128,000 tokens (over 300 pages of continuous text)
Languages: 95 natural languages; outperforms GPT-3.5 Turbo in at least 26 languages
Model size: ~1.8 trillion parameters (unofficial estimate; OpenAI has not published the figure)
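The word counts quoted for the context windows follow from a common rule of thumb of roughly 0.75 words per token. The sketch below reproduces that arithmetic; the ratio and the helper name `approx_words` are assumptions for illustration, and the ratio is an English-language heuristic, so German text will likely fit fewer words into the same window.

```python
# Assumed rule of thumb: 1 token corresponds to roughly 0.75 words (English).
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int) -> int:
    """Estimate how many words fit into a context window of `tokens` tokens."""
    return int(tokens * WORDS_PER_TOKEN)


# Context window sizes of the three model variants (32,768 = 2**15).
for window in (8_192, 32_768, 128_000):
    print(f"{window:>7,} tokens ~ {approx_words(window):,} words")
```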
The quality of GPT-4's summaries is outstanding in all evaluation categories: it produces fluent, concise summaries with correct content. To put the results in context, the chart compares the ratings of the machine-generated summaries with those of reference summaries written by human experts for the same texts. GPT-4 received better ratings than the human-written summaries in all six categories. We therefore recommend it for German texts, and especially for transcripts, some of which are of low quality.
Our tests showed considerable fluctuations in response times, presumably due to load on the OpenAI API. Apart from that, response times were in the middle range, with a mean and median of around 10 seconds, which is acceptable given the complexity of the task and the intended use cases.
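Response-time figures like these can be collected with a small timing harness. The following is a minimal sketch, not the setup used in our tests: `time_call` and `summarize_latencies` are hypothetical helper names, and the call being timed is left to the caller.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock duration of one call to `fn`, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def summarize_latencies(latencies: list[float]) -> dict[str, float]:
    """Aggregate per-request latencies into mean, median, and spread."""
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        # Sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "min": min(latencies),
        "max": max(latencies),
    }


# Usage (hypothetical `summarize` function and `transcripts` list):
# latencies = [time_call(lambda t=t: summarize(t)) for t in transcripts]
# print(summarize_latencies(latencies))
```

Reporting the standard deviation, minimum, and maximum next to the mean and median makes fluctuations visible that the two averages alone would hide.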
Summarizing the 109 transcripts cost around €2.22 in total, i.e. around 2 cents per transcript. The costs for GPT-4 are comparatively high.
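The per-transcript figure is a simple division of the total cost by the number of transcripts. A short sketch of that arithmetic follows; the function names are illustrative, and the projection assumes that future transcripts are of similar length to the ones tested.

```python
def cost_per_item_cents(total_eur: float, n_items: int) -> float:
    """Average cost per item in euro cents."""
    return total_eur / n_items * 100


def projected_cost_eur(total_eur: float, n_items: int, volume: int) -> float:
    """Extrapolate the measured average cost to `volume` items (assumes similar length)."""
    return total_eur / n_items * volume


per_transcript = cost_per_item_cents(2.22, 109)
print(f"{per_transcript:.2f} cents per transcript")  # prints "2.04 cents per transcript"
```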
As with ChatGPT, the version of the model offered by OpenAI is hosted in the US. A version hosted in Europe is available on Azure. We already have access to that version and can use it in our products.
Despite the comparatively high price, we clearly recommend this model for products that require summaries of German conversations or conversation transcripts, on the strength of its outstanding quality and acceptable speed. The model delivers a convincing overall package of complete, fluent, well-structured, and concise summaries.