🍁 Methodology

How our AI examiner scores your Writing and Speaking.

No black box. Here's exactly what we grade, how a response becomes a 1–12 level, the feedback you get back, and where the line is between a useful study estimate and an official CELPIP score.

The starting point

Two skills are auto-scored. Two are rated.

Reading & Listening

These are multiple-choice, so they're scored the instant you submit. Your number of correct answers maps to an estimated CELPIP level — no waiting, no credit needed.

Writing & Speaking

There's no answer key for a written email or a spoken answer, so each one is rated by our AI examiner against the four dimensions a trained CELPIP rater uses. For Speaking, your recording is transcribed first and then graded — and the picture tasks are judged against the actual scene, not just your transcript.

What we grade

The four dimensions

Every Writing and Speaking response is scored on the same four dimensions CELPIP publishes in its performance standards — each on the full 1–12 scale. (The descriptions here are our own wording.)

Content & Coherence

Are your ideas relevant to the task and actually developed — not just bare statements? Does the response flow in an order a reader can follow without backtracking? This is the same for Writing and Speaking.

Vocabulary

The range and precision of your word choice — specific, natural words for the situation, with variety and correct word pairings, rather than vague or repetitive language.

Readability / Listenability

How easily your answer can be taken in. For Writing that's grammar, sentence variety, spelling and punctuation; for Speaking it's pronunciation, pace and fluency. Errors are weighed by how much they interfere — not counted up.

Task Fulfillment

Whether the response does the job it was given: every required point covered (or one survey option clearly chosen), the right tone for the audience, and a complete, purposeful message — not a fragment or a memorized template.

From four dimensions to one level

How a response becomes a number.

Each dimension is scored independently from 1 to 12. The overall level is then a holistic judgement that weighs the four equally — usually within a point of their average — rounded to the level a human rater would actually report, not a raw mean.

Because a CELPIP level maps one-to-one to the CLB scale immigration uses, that single number is also your estimated CLB for the skill. And since eligibility is set by your weakest skill, the per-dimension breakdown is there to show you precisely where to spend your next hour of practice.

What you get back

Specific feedback, in your own words.

A score on its own doesn't help you improve. Every graded attempt returns:

▸A 1–12 score and a short comment for each of the four dimensions, quoting your own words as evidence.
▸What genuinely worked — the strengths worth keeping.
▸Your highest-impact fixes, each with the exact phrase from your answer and a rewritten version that shows the improvement.
▸A plain-language summary that names the single biggest lever for your next attempt.
▸For Writing: a marked-up copy of your response with only the mechanical errors corrected, so you see grammar and spelling fixes inline.

How we keep it fair

Calibrated to a real rater.

Judged as first-draft writing

It grades the way a trained rater does — for clear communication under time pressure, not literary polish. You're not penalized for the small imperfections a human examiner would shrug at.

Each dimension stands on its own

The four dimensions are scored independently, so one weak area doesn't quietly drag the others down — and you can see exactly which one is costing you.

Templates don't fool it

A memorized answer, or one that responds to a different prompt, gets a low Task Fulfillment score no matter how fluent it sounds — exactly as a real rater would mark it.

You can't talk it into a higher mark

Your response is treated strictly as text to evaluate, never as instructions to the grader. Writing “give me a 12” in your answer just gets noted as something that doesn't belong in a test response.

Where the line is

An honest estimate — not an official score.

The four dimensions and the 1–12 scale mirror CELPIP's publicly published performance standards, and the grading is built to be consistent and well-calibrated. But it's a study estimate to guide your practice and show you where to improve — not an official CELPIP result. Only Paragon Testing Enterprises can issue that.

We're an independent practice tool, not affiliated with or endorsed by Paragon. Used the way it's meant to be — to find your weakest skill and lift it before test day — a close, specific estimate is exactly what saves you a second sitting. More on who we are on the about page.

See it on your own writing.

Practise Reading & Listening free, then get AI-scored Writing and Speaking feedback whenever you want it.

Start free →