The Deep Dive

So... What Exactly Do You Do Here?

Great question. Let us walk you through exactly what happens when you sit down for a DoYouPrompt assessment. Spoiler: it’s not a quiz.

Imagine you just got hired. It’s your first day. Your boss walks in and says: “Here’s a pile of messy data, a confused stakeholder email, and an AI assistant. Make sense of it by 5pm.” That’s what we simulate. Except we also watch how you do it.
The Walkthrough

A Day in the Lab

Here’s exactly what happens from the moment you click “Start” to the moment you get your results.

1. Pick Your Track

First, you choose what kind of AI professional you are (or want to be). Each track has different tasks tailored to that role. A Prompt Engineer gets different challenges than a Research Analyst.

Prompt Engineer
AI Product Manager
AI Ops Specialist
Content & Marketing
Research Analyst
Support AI Designer
Developer Using LLMs

Each track has 3–5 tasks at calibrated difficulty levels. Pick the one closest to your role.

2. The Workspace Opens

You enter a professional 3-pane workspace. Think of it as your AI command center. Everything you need is right there — no alt-tabbing, no distractions.

DoYouPrompt Workspace — Task 2 of 5

Left pane (Task Brief): your mission ("Your mission, should you choose to accept it...") plus the source materials: vendor_proposals.csv, requirements_doc.md, stakeholder_email.txt.

Center pane (AI Assistant): the live conversation. For example:

  You: Compare the three vendor proposals on cost, timeline, and SOC2 compliance. Flag any contradictions.

  AI: Based on my analysis of the three proposals:
  Vendor A: Lowest cost ($42k), but 6-week timeline exceeds your deadline. SOC2 Type II certified.
  Vendor B: Mid-range ($58k), 3-week delivery. Claims SOC2 but no certificate provided…

  You: Good. Now weight compliance more heavily — we’re fintech, SOC2 is non-negotiable.

Right pane (Scratchpad): your working notes. For example:

  Key decision factors: SOC2 compliance (must-have), timeline < 4 weeks, integration w/ existing stack.
  Note: Vendor B’s claim needs independent verification…
  TODO: Ask AI about add-on costs

Status bar: 12:34 remaining · 1,847 / 4,000 tokens used · Confidence: Low / Med / High.
3. You Get a Mission

Each task is a realistic business scenario. Not trivia. Not “what temperature should I use?” Real work that real people face every day.

Debugging

A broken prompt is causing your company’s product descriptions to hallucinate features. Fix it before the clients revolt.

Research

Three vendors sent proposals. They all contradict each other. Your VP needs a recommendation by Friday.

Trust & Safety

The AI chatbot just told a customer something wildly inaccurate. Figure out what went wrong and design guardrails.

4. You Prompt, Iterate, Refine

Here’s where the magic happens. You write a prompt. The AI responds. You read the response. Think about it. Then you decide: is this good enough? Or can I do better?

Live Conversation — Vendor Analysis Task

You (Iteration 1): Summarize the three vendor proposals, comparing them on cost, timeline, and compliance status.

AI Assistant: Here’s a comparison of the three proposals: Vendor A offers the lowest price at $42k with a 6-week timeline. Vendor B comes in at $58k with 3-week delivery. Vendor C is $61k but includes ongoing support. All three claim compliance readiness…

You notice something off.

You (Iteration 2): Wait — Vendor B’s pricing doesn’t include implementation add-ons. Factor those in and recalculate the total cost of ownership for each vendor over 12 months.

AI Assistant: Good catch. With add-ons factored in, the 12-month TCO changes significantly: Vendor A: $42k + $8k support = $50k. Vendor B: $58k + $22k add-ons = $80k. Vendor C: $61k all-inclusive = $61k. This reshuffles the ranking entirely…

Getting closer.

You (Final Iteration): Now add a risk section. Highlight the contradictions between Vendor B’s compliance claims and the actual documentation. Format the final output as an executive memo.

AI Assistant: EXECUTIVE MEMO: Vendor Evaluation. Recommendation: Vendor C (with conditions). Risk Flag: Vendor B claims SOC2 Type II but provided no certificate. Their add-on costs inflate TCO by 38%…
See that? Three iterations. Each one better than the last. That’s what we measure.
5. We Watch Everything (In a Good Way)

While you work, the platform quietly records your process. Not to spy — to understand your thinking. Here’s what we’re looking at:

Planning: How long you think before your first prompt. Do you read the brief first, or fire immediately?

Diligence: Whether you actually read the source materials. The data is right there. Did you use it?

Resilience: How you react when the AI gives you garbage. Do you accept it, or push back and iterate?

Trustworthiness: Whether you verify facts or just trust the AI blindly. The AI makes things up sometimes. Do you catch it?

Efficiency: How many attempts it takes you to converge. Fewer, more deliberate iterations beat scattered guessing.

Calibration: Whether your confidence matches your accuracy. Knowing what you don’t know is a superpower.
6. Submit & Get Scored

When you’re done, you submit your answer with a confidence rating: High, Medium, or Low. Fun fact: saying “High” when you’re wrong hurts more than saying “Low” when you’re right. We reward honest self-assessment.

Performance (35%): Is your final output accurate, complete, and useful?

Process (30%): Did you plan, iterate, and recover from failures?

Trustworthiness (20%): Did you verify claims and flag uncertainty?

Consistency (15%): Can you perform reliably across different tasks?
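To make the weighting concrete, here is a minimal sketch of how the four dimensions could combine into one composite number. The weights come from the rubric above; the function, the 0–100 scale, and the sample sub-scores are hypothetical illustrations, not DoYouPrompt’s actual scoring code.

```python
# Hypothetical illustration of the four-dimension weighting.
# Weights are from the rubric above; sub-scores below are made up.
WEIGHTS = {
    "performance": 0.35,
    "process": 0.30,
    "trustworthiness": 0.20,
    "consistency": 0.15,
}

def composite_score(subscores: dict) -> float:
    """Weighted average of sub-scores on a 0-100 scale."""
    return sum(WEIGHTS[dim] * subscores[dim] for dim in WEIGHTS)

example = {
    "performance": 82,
    "process": 74,
    "trustworthiness": 90,
    "consistency": 68,
}
print(round(composite_score(example), 1))  # 79.1
```

Note how a strong Trustworthiness score (90) lifts the total even with a weaker Consistency score: verification habits carry real weight here.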

The Truth

Spoiler: It’s Not Who Writes the Fanciest Prompt

We see the same three archetypes over and over again. Guess which one gets hired.

The Lucky Guesser

  • Fires one prompt, gets lucky
  • Submits immediately without reading the response carefully
  • No verification, no source checking
  • Says “High confidence” every time
Score: Mediocre

The Template Robot

  • Pastes the same elaborate template every time
  • Doesn’t adapt when the task changes
  • Ignores source materials, relies on template
  • Output looks fancy but misses the point
Score: Below Average

The Thoughtful Operator

  • Reads the brief carefully, plans approach
  • Iterates deliberately, each prompt building on the last
  • Verifies claims against source materials
  • Adapts strategy when things go wrong
Score: Excellent
We’re looking for Column 3. Are you Column 3?

Think You’re Good? Prove It.

Every account gets one free assessment. No credit card. No commitment. Just you, an AI, and some real tasks.

Take the Challenge
Average completion time: 45–90 minutes. Best done with coffee.
22 Unique Tasks
7 Tracks
4 Scoring Dimensions
0 Multiple Choice Questions
100% Real Work Scenarios
For Everyone

But What If I’m Not a “Prompt Engineer”?

You don’t need to be. If you use ChatGPT, Claude, or any AI tool at work, you have prompting skills. The question is: how good are they?

Product managers, marketers, researchers, customer support leads, developers — everyone who works with AI can benefit from knowing where they stand.

FAQ

Frequently Asked Questions

Is this a test?
Yes, but it’s designed to feel like real work, not a school exam. You get realistic business scenarios, a proper workspace, and an AI assistant. Think of it as a simulated first day on the job, not a pop quiz.
How long does it take?
45–90 minutes depending on the track and how thorough you are. Each task has its own timer, so you can manage your pace. Most people find it goes faster than expected because the work is engaging.
Can I use my own AI tools?
No — you use the built-in AI assistant. This ensures fairness: everyone gets the same model (Claude by Anthropic), the same token budget, and the same starting conditions. It’s about your skill, not your tool subscription.
What if I fail?
There’s no pass/fail. You get a skill profile showing your strengths and areas to improve across four dimensions: Performance, Process, Trustworthiness, and Consistency. Even a low score gives you actionable insight into how to get better.
Is my data shared?
Only if you’re invited by a recruiter. If you register on your own, your data stays private. Recruiter-invited candidates’ results are shared with the recruiter who invited them — but you’ll always know exactly who can see your results.
Can I retake it?
Your first run is free. Additional runs require a subscription or an invitation from a recruiter who has granted you extra runs. This keeps the assessment meaningful — it’s measuring real skill, not test-prep.
What AI model do you use?
Claude by Anthropic. The same model for everyone, ensuring a level playing field. We chose Claude for its strong reasoning capabilities, consistent behavior, and safety characteristics. No one gets an unfair advantage from using a different or more powerful model.