Claude is a model released by Anthropic in two versions. For this test, we used Claude v1, the larger model. The second version, Claude Instant, is smaller, faster and much cheaper than the large model. We will test it as soon as we get API access.
Anthropic was founded by former OpenAI leaders and developers to build AI that follows ethical principles. They believe AI will have a vast impact and want to steer it in a better direction. Their goals are reliable, interpretable and steerable systems. That is a worthwhile aim, and they seem like an interesting company.
Claude v1 already produces very good summaries, and its results follow a similar pattern to those of GPT3.5 Turbo. In terms of completeness (i.e. integrating all important information), it is comparable to human summaries, and its summaries are even more pleasant to read (fluidity). However, they contain slightly more incorrect information than human summaries, and the annotators rated the structure of the text slightly lower, so the overall rating is also slightly below that of human summaries. The biggest challenge for this model is still to create summaries that contain only the important aspects and are as short as possible (relevance). Overall, however, the quality is very good and almost comparable to human summaries. From a quality perspective, the model can definitely be used for summarizing imperfect German transcripts.
The response speed in our test was very good for the area of application. The average response time was 6.87 seconds and the median was 7 seconds. The fluctuations were significantly lower than with most other models.
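Figures like these are summary statistics over per-request response times. The following is a minimal sketch of how such numbers could be collected and summarized, not the exact tooling we used. The helper names `time_call` and `summarize_latencies` are hypothetical, and the sample values are placeholders, not our measurements.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock duration of one call to fn, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def summarize_latencies(latencies: Sequence[float]) -> Dict[str, float]:
    """Return average, median and spread (sample standard deviation) in seconds."""
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        # stdev needs at least two data points; report 0.0 for a single value.
        "stdev": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
    }


# Placeholder values for illustration only, NOT the measured response times.
example_latencies = [6.0, 7.0, 8.0]
stats = summarize_latencies(example_latencies)
```

In practice, `time_call` would wrap the request that sends one transcript to the model, and the standard deviation gives a rough handle on the "fluctuations" mentioned above.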
Summarizing the 109 transcripts cost around €0.75, i.e. around €0.007 per transcript. The costs are higher than for GPT3.5 Turbo, but lower than for GPT4 or Luminous Supreme Control.
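The per-transcript figure is simply the total cost divided by the number of transcripts. As a small sketch of that arithmetic (the function name `cost_per_transcript` is ours, for illustration only):

```python
def cost_per_transcript(total_cost_eur: float, n_transcripts: int) -> float:
    """Average cost per transcript in euros."""
    if n_transcripts <= 0:
        raise ValueError("n_transcripts must be positive")
    return total_cost_eur / n_transcripts


# Figures from the text: about EUR 0.75 in total for 109 transcripts.
per_transcript = cost_per_transcript(0.75, 109)
rounded = round(per_transcript, 3)  # 0.007
```

The same helper can be reused to compare models, as long as the total cost for each model was measured on the same set of transcripts.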
As with ChatGPT, the effort to get started with Claude is generally very low: you can just write a prompt and get going. However, getting access to the model in the first place is a big barrier. We requested access some time ago and still have not received it.
Once access is available, the barrier to getting started is about as low as with ChatGPT.
Based on the results, we can give a clear product recommendation for Claude v1, even when imperfect German transcripts are to be summarized. The model produces easy-to-read, structured summaries that contain all the important information, but they could be a little shorter.
Note: the model is not yet available for the European market via API.
Claude is hosted in the US, but Anthropic makes its model available via Google Cloud, so an EU-hosted version should be possible later.