AI LLM Prompting Tests - My Results on Prompt Engineering
I discuss my experience testing prompts across different AI systems (Google Bard, OpenAI GPT-4 and GPT-3.5, Anthropic Claude 2, Meta Llama 2, and Jasper) to generate location-specific content. Most of this is based on the last 18 months of building out prompts, now tested against the models released over the last 4-6 weeks.
Google Bard
- Released major update on July 13, 2023
- Prompt strategy: long paragraphs, numbered tasks, multiple iterations (structure sketched below)
- Couldn't produce high-quality content without heavy editing
- Had trouble following instructions and needed repeated reminders
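As a rough illustration of that "long paragraph plus numbered tasks" structure (the business details and wording here are invented for illustration, not the actual prompt from the episode):

```python
# Hypothetical prompt in the "long context paragraph + numbered tasks"
# style described above; the business details are invented placeholders.
prompt = """
You are writing a service page for a local business. The business is a
family-owned plumbing company that has served the Portland, Oregon metro
area for over twenty years, handles both residential and commercial work,
and wants the page to rank for emergency-repair searches while still
reading naturally to a homeowner in a hurry.

Complete the following tasks in order:
1. Write a 60-character title tag that mentions the city.
2. Write a 155-character meta description.
3. Write three H2 headings for the page body.
4. Draft two paragraphs of body copy under the first H2.
"""
```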
OpenAI GPT-4
- Works well with a conversational, transcribed prompt
- Able to follow directions and produce high-quality content
- No need for few-shot prompting; a single zero-shot prompt is enough (see the sketch below)
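For reference, a minimal sketch of that zero-shot call using the OpenAI Python SDK (the prompt text is an invented stand-in for the conversational, transcribed prompt; the episode doesn't specify the exact tooling used):

```python
# Minimal zero-shot sketch using the OpenAI Python SDK (pip install openai).
# The prompt text is an invented stand-in for a conversational, transcribed
# brief like the one described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "I'm going to describe a local business, and I want you to "
                "write a location page for it the way I'd explain it to a "
                "copywriter out loud. The business is ... [transcribed brief]"
            ),
        }
    ],
)

print(response.choices[0].message.content)
```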
OpenAI GPT-3.5
- Uses a revised version of the GPT-4 prompt plus a follow-up prompt to enforce formatting (sketched below)
- Content is production-ready after the second prompt
- Quality is close to GPT-4 when additional data/content is provided
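A sketch of that two-step pattern: the first call sends the main prompt, then the draft is fed back with a corrective follow-up turn. Prompt wording is invented for illustration; this again assumes the OpenAI Python SDK.

```python
# Two-step pattern: main prompt, then a follow-up turn to enforce
# formatting. Prompt wording is an invented example.
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "Write a location page for ... [brief]"}]

first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
draft = first.choices[0].message.content

# Feed the draft back with a corrective follow-up prompt.
messages += [
    {"role": "assistant", "content": draft},
    {
        "role": "user",
        "content": (
            "Reformat your answer: use H2 headings for each section, "
            "keep paragraphs under 80 words, and end with a bulleted "
            "list of services."
        ),
    },
]

final = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(final.choices[0].message.content)
```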
Anthropic Claude 2
- No API access, so I used the text (chat) interface
- Required revising the prompt structure significantly
- XML tagging of data types improves context (see the sketch below)
- Built-in prompt diagnosis/suggestions were helpful
- A single prompt can produce high-quality output
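A rough sketch of the XML-tagging idea; since this was pasted into the chat interface, it's just a string. Tag names and data are invented examples, not the actual prompt.

```python
# Wrapping each type of input data in XML tags so Claude can tell the
# pieces apart. Tag names and business data are invented placeholders.
prompt = """
Use the information below to write a location-specific service page.

<business>
Acme Plumbing, family-owned, 20+ years in business
</business>

<location>
Portland, Oregon metro area
</location>

<services>
Emergency repairs, drain cleaning, water heater installation
</services>

<instructions>
Write three short sections with H2 headings, one per service, and mention
the city naturally in each section.
</instructions>
"""
```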
Meta Llama 2
- Free to use commercially if you have the hardware
- Expected behavior similar to GPT-3.5
- The GPT-4 prompt worked well
- Quality is closer to GPT-3.5, but with better privacy since it runs on your own hardware
- Output could be refined with prompt chaining (sketched below)
- Had issues following instructions precisely
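A minimal sketch of prompt chaining; `generate()` here is a hypothetical stand-in for whatever local inference stack you run Llama 2 with (llama.cpp, Hugging Face transformers, etc.), and the prompts are invented examples.

```python
# Prompt-chaining sketch: the first call drafts, the second call refines.
# generate() is a hypothetical stand-in for your local Llama 2 inference
# call (e.g. via llama.cpp or Hugging Face transformers).
def generate(prompt: str) -> str:
    raise NotImplementedError("wire this to your local Llama 2 runtime")

draft = generate(
    "Write a location page for Acme Plumbing in Portland, Oregon. "
    "Cover emergency repairs, drain cleaning, and water heaters."
)

# Chain: pass the draft back with a narrower instruction, since Llama 2
# tends to drift from formatting rules in a single pass.
final = generate(
    "Revise the following page so every section has an H2 heading and "
    "no paragraph exceeds 80 words. Return only the revised page.\n\n"
    + draft
)

print(final)
```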
Jasper API
- API access is useful for building AI tools (see the sketch below)
- Supports long prompts
- Appears to use GPT-4 or a variant of it
- Zero-shot prompting performs as well as GPT-4
- Able to produce high-quality content easily
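I haven't verified Jasper's current API surface, so the endpoint URL, payload fields, and response shape below are hypothetical placeholders; the point is simply the long-prompt, zero-shot HTTP call pattern you'd use when building a tool on top of a hosted API.

```python
# Hypothetical sketch of calling a hosted generation API such as Jasper's.
# The URL, payload fields, and response shape are invented placeholders;
# consult Jasper's API docs for the real interface.
import os
import requests

resp = requests.post(
    "https://api.jasper.example/v1/generate",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['JASPER_API_KEY']}"},
    json={
        "prompt": "Write a location page for ... [long zero-shot brief]",
        "max_tokens": 1200,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```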
Conclusion
- GPT-4 and Jasper produce quality results most easily
- Pleasantly surprised by Claude 2's quality and its guidance on prompt formatting
- Llama 2 needs refinement to reach GPT-4 level
- Curious which prompt strategies carry over across models
Full show notes: https://opinionatedseo.com/2023/07/ai-prompting/