The model was just released, and some first tests showed that it might be better than the other models we had tested before. This is why we wanted to try it out immediately, even before API access is possible from Germany.
Main use cases: Like GPT-4, it can be used for any form of language generation, for example creative content creation, text summarization, text editing, in-depth dialog, understanding complex contexts, or coding.
Input length: 100,000 tokens (approx. 300 pages of continuous text)
Languages: best in English, but also possible in at least 43 other languages
Model size: ~130 billion parameters
Claude 2 produces very good summaries. In our test, they were rated almost identically to human summaries, especially in the categories "Completeness" and "Structure". The high score for "No hallucinations" shows that the summaries contain hardly any incorrect information; the few errors we did find related exclusively to numbers, e.g. incorrect telephone numbers. The second category in which Claude 2's summaries were rated slightly lower than the human summaries is "Relevance", i.e. whether the summary contains only the most important aspects and is as short as possible. In contrast, Claude 2 performs significantly better than the human summaries in "Fluency" and also visibly better in the overall evaluation.
Although the ratings for GPT-4 are even better in all categories, Claude 2 is on a par with human summaries here. From a quality perspective, the model can definitely be used for summarizing imperfect German-language transcripts.
The response speed in our test was very good for this area of application. The average response time was around 6 seconds, and the fluctuations were significantly lower than with most other models.
- Median: 6 sec.
- Mean: 6.67 sec.
- Minimum: 3 sec.
- Maximum: 15 sec.
Summarizing the 109 transcripts cost around €1.26, i.e. around €0.01 per transcript. The costs are significantly higher than for GPT-3.5 Turbo, but lower than for GPT-4 or Luminous Supreme Control.
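As a quick sanity check, the per-transcript figure follows directly from the totals above:

```python
# Back-of-the-envelope check of the per-transcript cost from our test run.
total_cost_eur = 1.26  # total cost for summarizing all transcripts
num_transcripts = 109

cost_per_transcript = total_cost_eur / num_transcripts
print(round(cost_per_transcript, 4))  # roughly 0.0116, i.e. about €0.01 each
```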
Similar to ChatGPT and Claude v1, the effort to get started with Claude 2 is very low: you can just write a (simple) prompt and get going. However, there is a bigger barrier to getting access to the model in the first place. We requested access some months ago and still have not received it, because for now the model is only available in the US and UK.
As soon as access is granted, the barrier to getting started is similarly low as with ChatGPT.
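Once access is granted, a summarization call really is just a prompt. The following is a minimal sketch against Anthropic's HTTP completions endpoint as available at Claude 2's release; the German instruction text is our own illustrative choice, and an `ANTHROPIC_API_KEY` environment variable is assumed.

```python
import json
import os
import urllib.request

# The Claude completions API expects prompts in a Human/Assistant turn format.
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def build_prompt(transcript: str) -> str:
    """Wrap a summarization instruction and a transcript in the expected format.

    The German instruction text is illustrative, not the prompt we used.
    """
    instruction = "Fasse das folgende Transkript zusammen:"
    return f"{HUMAN_PROMPT} {instruction}\n\n{transcript}{AI_PROMPT}"


def summarize(transcript: str, max_tokens: int = 500) -> str:
    """Send one summarization request; requires granted access and an API key."""
    body = json.dumps({
        "model": "claude-2",
        "prompt": build_prompt(transcript),
        "max_tokens_to_sample": max_tokens,
    }).encode()
    request = urllib.request.Request(
        "https://api.anthropic.com/v1/complete",
        data=body,
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["completion"]
```

Compared to a chat UI, the only extra work is wrapping the transcript in the Human/Assistant format, which is why we consider the entry barrier comparable to ChatGPT.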
Claude 2 is hosted in the US, but Anthropic also makes the model available via Google Cloud, so it should be possible to get an EU-hosted version later.
Due to the very good quality, the comparatively short response times and the moderate price, we can give a clear product recommendation for Claude 2, even when imperfect German-language transcripts are to be summarized. The model produces easily readable, well-structured summaries that contain all the important information.
Attention - the model is not yet available for the European market via API.