If you treat language learning like an experiment (mapping inputs, tracking progress, testing hypotheses), you are already ahead of most learners. The trap is measuring the wrong variable. Logging input hours and cards seen feels scientific, but it does not tell you whether you can write a single character. The only honest test for writing is production from memory, so that is the data you should be collecting. Here is how to point the rigor at the right thing.
The metric most trackers get wrong
Input volume is seductive because it is easy to count: minutes immersed, episodes watched, cards reviewed. But none of it answers the real question, which is whether you can produce the character without looking, and it is easy to rack up impressive input numbers while your writing stays flat. It is the same blind spot that explains why purely reading Mandarin leads to character wipeout: the input was logged, but production was never tested. A good experiment measures the outcome you care about, not the effort you spent.
Production is the only honest test
For writing, the outcome is production: reproducing the character from memory, on a blank grid, with nothing to copy. Make that the measured event. The testing effect shows that retrieval is more than a measurement, because the act of retrieving itself strengthens memory; producing an answer, rather than recognizing one, also engages the generation effect. So a from-memory attempt does two jobs at once: it is the data point that tells you whether you know the character, and it is the practice that helps you learn it. Recognition does neither honestly, which is the heart of the active-output argument.
Let the results steer the schedule
Once production is the measured event, the schedule follows from the data. A character you produce cleanly can wait; one you fumble should come back soon. That is performance-driven spacing, and it is more efficient than reviewing everything on the same cadence because it concentrates effort where recall is actually weak. Combining retrieval with spacing compounds the gain, and decades of distributed-practice research show that spacing reviews out beats massed repetition. Your hit rate becomes the controller, which is also how serious learners try to merge comprehensible input with a real writing path.
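To make the idea concrete, here is a minimal sketch of performance-driven spacing in Python. It assumes a deliberately simple rule that is not how Hanzi Write Practice or any published algorithm (such as SM-2) actually works: a clean from-memory production doubles the wait before the next test, up to an arbitrary 180-day cap, and a fumble resets it to one day. The names `CardState` and `record_attempt` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class CardState:
    """Review state for one character (hypothetical model, not the app's)."""
    character: str
    interval_days: int = 1  # gap before the next from-memory test
    due: date = field(default_factory=date.today)


def record_attempt(card: CardState, produced_correctly: bool, today: date) -> CardState:
    """Update the schedule from one from-memory production attempt.

    Assumed rule: a clean production doubles the interval (capped at
    180 days); a fumble resets it to one day so the character returns soon.
    """
    if produced_correctly:
        card.interval_days = min(card.interval_days * 2, 180)
    else:
        card.interval_days = 1
    card.due = today + timedelta(days=card.interval_days)
    return card


if __name__ == "__main__":
    start = date(2024, 1, 1)
    card = CardState("写", due=start)
    record_attempt(card, True, start)             # written cleanly: wait 2 days
    record_attempt(card, False, date(2024, 1, 3))  # fumbled: back tomorrow
    print(card.character, card.interval_days, card.due)
```

The only input the schedule ever sees is whether the production succeeded; time spent reading or watching never enters the calculation, which is the point of the section above.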
Input tracking versus recall testing
| Tracking input | Testing recall |
|---|---|
| Counts hours and cards | Counts from-memory productions |
| Flatters progress | Exposes the writing gap |
| No correction signal | Stroke-order feedback |
| Fixed review cadence | Performance-driven spacing |
The rigor is welcome; just aim it at production, the variable that actually reflects writing ability. It is the same instinct behind honoring comprehensible output directly.
A plan to track the right variable
- Define success as producing the character from memory.
- Make each from-memory attempt your data point.
- Ignore input hours as a measure of writing ability.
- Let your hit rate set each character’s next review.
- Concentrate practice where recall is weakest.
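If you keep your own log, the plan above boils down to a few lines of code. The sketch below is a hypothetical illustration, not part of any app: each entry records one from-memory attempt as a `(character, produced_correctly)` pair, `hit_rates` computes the per-character success share, and `weakest` returns the characters with the lowest hit rate so practice goes where recall is weakest.

```python
from collections import defaultdict


def hit_rates(attempts: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-character share of from-memory attempts produced correctly."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for character, correct in attempts:
        totals[character] += 1
        if correct:
            hits[character] += 1
    # Every character in totals has at least one attempt, so no division by zero.
    return {ch: hits[ch] / totals[ch] for ch in totals}


def weakest(attempts: list[tuple[str, bool]], n: int = 3) -> list[str]:
    """Characters with the lowest hit rate first (ties broken alphabetically)."""
    rates = hit_rates(attempts)
    return sorted(rates, key=lambda ch: (rates[ch], ch))[:n]


if __name__ == "__main__":
    log = [("写", True), ("写", True), ("龍", False), ("龍", True), ("餐", False)]
    print(hit_rates(log))   # {'写': 1.0, '龍': 0.5, '餐': 0.0}
    print(weakest(log, 2))  # ['餐', '龍']
```

Note that the log has no field for input hours at all: the only data point is whether each character came out of memory.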
How Hanzi Write Practice fits
Hanzi Write Practice makes production the measured event by design. It hides the character, you write it from memory, and it checks your stroke order and structure, then schedules the character's next review by how you did. There is no credit for input you merely consumed; the data it collects is whether you can actually produce the character, which is the metric that predicts writing. For a learner who thinks in hypotheses and measurements, that is the honest instrument, and it complements immersion methods by covering the offline, manual-writing practice they often leave open. The app is in early access.
Bottom line
Self-tracking is good, but track the right variable: not input hours, which flatter progress, but production from memory, which predicts writing. Make each from-memory attempt the data point and let performance drive review. Hanzi Write Practice measures production and schedules by it, and it is in early access, so join the list.
Frequently asked questions
Should I track input hours or recall when learning to write Hanzi?
Track recall. Input hours feel rigorous but do not predict whether you can write a character; the measure that matters is production, reproducing the character from memory. Treat each from-memory attempt as the data point and let your hit rate drive what you review. Hanzi Write Practice is built around that production-as-measurement model.
What does it mean to test a learning hypothesis with characters?
It means deciding what you expect to be able to do, then checking it honestly. The useful test is not recognizing a character but producing it from memory, because that is the skill you are after. If you can write it cold, the hypothesis held; if not, it returns for review. Production is the experiment, recognition is not.
Why is recognition a misleading metric?
Because recognition is far easier than production, so a high recognition score hides a large writing gap. You can recognize thousands of characters you cannot write. Measuring recognition flatters your progress while leaving the actual writing ability untested, which is why the honest metric is from-memory production.
How should performance drive my review schedule?
Characters you produce cleanly should wait longer; ones you fumble should return sooner. That performance-driven spacing concentrates effort where recall is weak, which is more efficient than reviewing everything equally. Hanzi Write Practice schedules each character by how your last from-memory attempt went.
Want to measure what matters? Join early access and make production your metric.