Let AI curve your grades

Let AI curve your grades

Education has a grading problem that is equal parts mathematical, administrative, and human. Most American aw schools impose a mandatory curve to ensure consistency across sections and semesters. Other schools outside of law do this as well. Professors often face a narrow set of constraints: maintain a target mean GPA, keep distributions within strict percentage bands, and do so without assigning a lower grade to a higher-performing student. The reality is that these constraints are often challenging to satisfy given the real score distribution of a class. The result is a familiar and time-consuming process of hand-tuning cutoffs, verifying compliance, and repeating the cycle until the curve is “close enough.”

AI to the rescue once again! With the help of AI, I have developed a free app GradeCurve Pro to make that process fast, transparent, and defensible. It takes raw scores and institutional grading constraints and produces multiple compliant grading scenarios that respect the curve, and preserve (weak) monotonicity. Instead of guessing at cutoffs and hoping for the best, instructors can evaluate multiple valid options side by side, inspect compliance, and export grades immediately as an Excel or CSV file, complete with whatever information was previously in those files. Moreover, if you provide a Gemini API Key, AI will explain the comparative virtues of each of the distributions provided.

This post explains the “why” behind the app, how its optimization algorithm works, how to use it in practice, the formats of acceptable input, and the limitations you should know about.

Why this app exists

I probably don't have to sell faculty on the desirability of a grade curving app, particularly due to my strategic timing of this post right after many will have struggled with doing the job by hand. Mandatory curves are not optional for most law faculty. They can be required at the course level, at the year level, or across a program. And unlike informal grading guidelines, these curves often define exact percentage ranges for multiple grade tiers (A, A-, B+, B, etc.) along with a target mean GPA. A curve that seems reasonable by eye can still violate the rules, even if just slightly.


The hard part is that curves are discrete. If you have 90 students, each student is 1.11% of the class. That means a small shift in cutoffs can drastically change the distribution and push a tier outside the allowable range. This is a combinatorial search problem, not a simple proportion calculation. The app’s purpose is to search that space efficiently and return the best solutions. I'll discuss the intriguing details of how it's done in an appendix. But for now, let's see it in action.

How to use the app

Go to grade-curve.vercel.app. You should see something like this.

Either input your scores from an Excel file or a CSV file or a Text file. Your input file should have a first row with column headers; the remaining rows should be entries for each student. I would assume most faculty organize their grades in this fashion anyway. If you grade within some course management system, do your best to get it out of it in a format that computers like (i.e. not PDF). If you want, you can also submit the grades as a comma-separated list of scores. This may well be simpler in a small seminar. Regardless of the input method, make sure that the submission is anonymous. I have no idea what Vercel's privacy policy is; we don't want the allure of AI to land you in a FERPA investigation.

Wait a moment for the system to parse your file. You should then see something like this:

Click on the right side of the column that contains the actual raw scores. In this case, one would click on "Total." Then on the right side, check or uncheck the columns you want preserved in the output. You might want everything (the default) or you might just want a mapping from Student Anonymous ID to the grade, in which case that right column would look like this:

The next step is to input the grading constraints for your institution. There are two ways to create the constraints: direct input within the user interface or uploading of a file. I discuss direct input first.

You want to focus on this section of the user interface.

First, what grades does your institution permit? For reasons lost to history, my own institution does not award A+ grades. So, we would hover over the 4.333 bar, watch a garbage pail icon magically appear, click it and watch the interface conform with institutional constraints. Next, we deal with distributional constraints. To do so, we scroll down until we see something like this:

My own institution, for example, requires that mean grades lie between 3.2 and 3.4. So, I could input those in the blue shaded section instead of the current defaults of 3.28 to 3.32. Or, I could take matters into my own hands and constrain the grades to lie between 3.36 and 3.4 because I thought I had an above average class that deserved particular reward. So long as the personal constraint is a subinterval of the institutional constraint, there should not be a problem.

I then move into the tier editor. At my institution, in first year courses the distribution requirements are that As must comprise 5 to 30% of grades, the A-, B+, B clump must comprise 50 to 90% of grades and the B- or below clump must be 5 to 20%.


I remove all other tiers in the distribution.

The alternative way of imposing constraints on grades is to import a JSON file containing structured data. The GitHub repository associated with this project contains more details, but the best way to learn how to use the JSON file is to download the existing set of constraints using the "Download JSON" button. The file structure is pretty straightforward. Just use a text editor to manipulate the file and then upload the edited version back into the app using the Upload JSON file. There is also the option to edit the JSON within the app.

I then press the magic Generate Distributions button. And an instructive failure occurs. I had neglected to delete the existence of the A+. I do so and press the Generate Distributions button again. Here is the result. It shows the distribution under three possible scenarios: 1 (blue), 2 (purple) and 3 (red)

There is also an analysis of how each of the scenarios conform with the constraints. The figure below shows how Scenario 1 satisfies the constraints.

A close look (version 1.1 will use a bigger font) shows that 17.9% of the grades are As, 70.5% of the grades fall in the A-, B+, B clump and the remaining 11.6% of the grades fall into the B- and below clump.

I can now preview what the grade assignments will look like. They look sane.

I can then export the grades to an Excel spreadsheet (or a CSV file). Here's a snippet of what we now see. Each student has a grade assigned to them under each of the three scenarios. If the program has done its job well, it will seldom matter which scenario is used, although there will be a few cases where the luck of the chosen scenario determines a student's fate. Somewhere there will be a student who didn't make law review as a result of this app and who lived a happier life as a result.

You can, of course, do all the Excel things once you have the file such as sorting the grades or running whatever other analyses are of interest. AI, however, has put an end to the need for human curving. You can also export the grades to a CSV file and use that simpler format for analysis.

AI Analysis

You can also use AI at the back end of this process. If you have a Gemini API key, put it here. And press the Generate AI Report button. Doing so will cost you a few pennies (literally).

You get an analysis of the comparative virtues of each of the grading scenarios found by the system. The analysis ends with a "Consultant's Recommendation."

You can save the analysis for reference. (Particularly useful if someone complains!)

And that's it. You should definitely check the results. My algorithm might have subtle bugs or not meet your own preferences. But, in general, you can look forward to no more staring at spreadsheets deciding where to put the cuts only to find that your distribution does not satisfy the institution's grading constraints. Just a few button clicks and you get multiple mathematically optimized solutions to the grading puzzle. Thanks AI.

The math behind the scenes

You should feel some pride knowing that when you did all of this by hand you were actually solving a fairly sophisticated optimization problem. Here's how the algorithm actually works. There are some limits on the ability of this website to accept mathematical notation. If anyone wants the "real" PDF or LaTeX version of the report, send me an email.

Mathematical Analysis of the GradeCurve Distribution Fitting Algorithm

Abstract

We present a comprehensive mathematical analysis of a dynamic programming algorithm for optimal discrete grade distribution. The algorithm assigns letter grades to numerical scores while satisfying multiple constraints: mean GPA requirements, distribution band percentages, and monotonicity. We introduce novel penalty functions to prevent bimodal distributions and reward unimodality, addressing the critical problem of "reward hacking" in constrained optimization. The algorithm achieves O(K² · G · N) time complexity where K is the number of unique scores, G is the number of grade levels, and N is the buffer size for beam search pruning.

1. Introduction

The grade distribution problem seeks to map a set of continuous numerical scores S = {s₁, s₂, ..., sₙ} to discrete letter grades G = {g₁, g₂, ..., gₘ} while satisfying institutional constraints. This mapping must be:

  1. Monotonic: Higher scores must receive equal or better grades
  2. Constrained: The distribution must satisfy mean GPA bounds and percentage requirements for grade tiers
  3. Unimodal: The resulting distribution should approximate a natural bell curve, avoiding artificial bimodality

2. Problem Formulation

2.1 Input Space

Let S = {s₁, s₂, ..., sₙ} be the set of raw scores where sᵢ ∈ ℝ⁺. We define:

  • G = {g₁, g₂, ..., gₘ}: Set of grade labels ordered from highest to lowest
  • v: G → ℝ: Grade point value function (e.g., v(A) = 4.0)
  • T = {T₁, T₂, ..., Tₖ}: Distribution tiers (bands of grades)
  • ρᵢ = [ρᵢᵐⁱⁿ, ρᵢᵐᵃˣ]: Percentage range for tier Tᵢ
  • μ = [μᵐⁱⁿ, μᵐᵃˣ]: Target mean GPA range

2.2 Score Aggregation and Atomic Blocks

We aggregate scores into unique values with multiplicities:

U = {(u₁, c₁), (u₂, c₂), ..., (uₖ, cₖ)}

where uᵢ are unique scores sorted in descending order and cᵢ are their counts.

Important: Each block (uᵢ, cᵢ) is atomic—all students with score uᵢ must receive the same grade. This ensures fairness and handles ties naturally.

2.3 Optimization Objective

We seek a mapping f: S → G that minimizes:

min[α · P_tier(f) + β · P_mean(f) + γ · P_shape(f) + δ · P_gap(f)]

subject to the monotonicity constraint:

∀sᵢ, sⱼ ∈ S: sᵢ > sⱼ ⟹ v(f(sᵢ)) ≥ v(f(sⱼ))

3. Dynamic Programming Formulation

3.1 State Definition

Define state Sᵢ,ⱼ as the optimal assignment after:

  • Assigning grades g₁, g₂, ..., gᵢ (the first i grades)
  • To the first j unique score blocks

The state maintains:

  • sumGPAᵢ,ⱼ: Cumulative grade points
  • penaltyᵢ,ⱼ: Accumulated constraint violations
  • countsᵢ,ⱼ: Grade frequency vector
  • shapeᵢ,ⱼ: Distribution shape penalty

3.2 Recurrence Relation

When computing Sᵢ,ₖ (assigning grades up to gᵢ to blocks up to k), we consider assigning grade gᵢ to blocks [j+1, k]:

Sᵢ,ₖ = min{Sᵢ₋₁,ⱼ + Cost(gᵢ, j+1, k)} for all j < k

where:

Cost(gᵢ, j+1, k) = P_tier(gᵢ, j+1, k) + P_shape(countsᵢ,ₖ) + P_gap(gᵢ, j+1, k)

Note: The state Sᵢ₋₁,ⱼ represents the optimal assignment of grades g₁, ..., gᵢ₋₁ to blocks 1, ..., j, and we extend this by assigning grade gᵢ to blocks j+1, ..., k.

4. Penalty Functions

4.1 Mean GPA Penalty

The mean GPA penalty enforces the constraint μ ∈ [μᵐⁱⁿ, μᵐᵃˣ]:

           ⎧ (μᵐⁱⁿ - μ̄)²     if μ̄ < μᵐⁱⁿ
P_mean(μ̄) = w_μ · ⎨ 0              if μᵐⁱⁿ ≤ μ̄ ≤ μᵐᵃˣ
           ⎩ (μ̄ - μᵐᵃˣ)²     if μ̄ > μᵐᵃˣ

where μ̄ = (1/n)∑v(f(sᵢ)) is the actual mean GPA and w_μ is the penalty weight. The quadratic form ensures the optimizer fights harder as the deviation grows.

4.2 Distribution Band Penalty

For each tier Tᵢ with percentage range [ρᵢᵐⁱⁿ, ρᵢᵐᵃˣ]:

            ⎧ (ρᵢᵐⁱⁿ - pᵢ)    if pᵢ < ρᵢᵐⁱⁿ
P_tier(Tᵢ) = w_ρ · ⎨ 0             if ρᵢᵐⁱⁿ ≤ pᵢ ≤ ρᵢᵐᵃˣ
            ⎩ (pᵢ - ρᵢᵐᵃˣ)    if pᵢ > ρᵢᵐᵃˣ

where pᵢ = |Tᵢ|/n × 100 is the actual percentage in tier Tᵢ.

4.3 Shape Penalty for Unimodality

To prevent bimodal distributions, we employ four complementary measures:

4.3.1 Bimodality Coefficient

The Sarle's bimodality coefficient adapted for discrete distributions with grade point values:

BC = (γ₁² + 1)/(γ₂ + 3)

where the moments are calculated using grade point values rather than indices:

v̄ = (1/n) ∑ c_g · v(g)
σ² = (1/n) ∑ c_g · (v(g) - v̄)²
γ₁ = (1/nσ³) ∑ c_g · (v(g) - v̄)³  (skewness)
γ₂ = (1/nσ⁴) ∑ c_g · (v(g) - v̄)⁴ - 3  (excess kurtosis)

where c_g is the count for grade g. Using grade point values v(g) instead of indices ensures the distance between grades reflects their actual GPA impact.

The penalty is:

P_BC = w_BC · max(0, BC - 0.555)

4.3.2 Valley Detection

Peak: A grade gᵢ where c_gᵢ > c_gᵢ₋₁ and c_gᵢ > c_gᵢ₊₁ (or boundary conditions)

Valley: A grade gⱼ between two peaks where c_gⱼ < min(c_left_peak, c_right_peak)

Valley depth:

depth(gⱼ) = (c_left_peak + c_right_peak)/2 - c_gⱼ

The penalty is:

P_valley = w_valley · ∑ depth(v) · I[depth(v) > 0.3 · avg_peak_height]

where avg_peak_height = (1/|peaks|)∑c_peak.

4.3.3 Extremes Concentration

To penalize U-shaped distributions:

              ⎧ (p_extremes - 0.6)  if p_extremes > 0.6
P_extremes = w_ext · ⎨
              ⎩ 0                   otherwise

where p_extremes = (c_g₁ + c_g₂ + c_gₘ₋₁ + c_gₘ)/n is the proportion of students in the top two and bottom two grades.

4.3.4 Multimodality Penalty

Count significant peaks (local maxima with ≥ 5% of total count):

P_multi = w_multi · max(0, |{peaks: c_peak ≥ 0.05n}| - 1)

The total shape penalty is:

P_shape = P_BC + P_valley + P_extremes + P_multi

4.4 Gap Penalty

To discourage sparse distributions with empty grade categories:

P_gap = w_gap · ∑ I[c_gᵢ = 0 ∧ (c_gᵢ₋₁ > 0 ∨ ∃j > i : c_gⱼ > 0)]

This penalizes each grade with zero students that lies between grades with non-zero students.

5. Algorithm Implementation

To manage the exponential state space, we employ beam search with buffer size N:

Algorithm: Top-N Dynamic Programming with Beam Search

1. Initialize dp[0][0] with empty assignment
2. For each grade level i = 1 to m:
3.     For each block count k = 1 to K:
4.         candidates ← ∅
5.         For each j < k where dp[i-1][j] is non-empty:
6.             For each state s ∈ dp[i-1][j]:  // At most N states
7.                 new_state ← extend s by assigning gᵢ to blocks [j+1, k]
8.                 Calculate penalties for new_state
9.                 Add new_state to candidates
10.        dp[i][k] ← top N states from candidates by total penalty
11. Return best state from dp[m][K] satisfying mean constraint

5.2 Pruning Strategy

At each state (i, k), we maintain at most N candidates ranked by:

Score(S) = P_total(S) + λ · |mean(S) - μ_target|

where μ_target = (μᵐⁱⁿ + μᵐᵃˣ)/2 and P_total includes all penalty components.

6. Complexity Analysis

6.1 Time Complexity

The algorithm's time complexity depends on the beam search implementation:

With Full Beam Search:

O(K² · G · N)

This arises because:

  • For each grade gᵢ (total G grades)
  • For each ending position k (total K positions)
  • We examine at most N states from each prior position j < k
  • Total positions to check: O(K), but limited by beam size N
  • Cost computation per state: O(G) for shape penalties

Without Beam Pruning (Exhaustive):

O(K³ · G)

If we don't limit to N best states, we must consider all O(K) possible splits at each of K positions for G grades.

6.2 Space Complexity

The space complexity is:

O(K · G · N)

for storing at most N states at each of K positions for G grades.

6.3 Practical Performance

For typical inputs with beam size N = 60:

  • n = 100 students, K ≈ 30 unique scores: < 100ms
  • n = 1000 students, K ≈ 100 unique scores: < 500ms
  • n = 10000 students, K ≈ 300 unique scores: < 5s

7. Theoretical Properties

7.1 Monotonicity Guarantee

Theorem: The algorithm produces a monotonic grade assignment: for any two scores s₁ > s₂, the assigned grades satisfy v(f(s₁)) ≥ v(f(s₂)).

Proof: The algorithm processes unique score blocks in descending order. Each grade gᵢ is assigned to a contiguous set of blocks [j+1, k]. Since blocks are atomic (all students with the same score receive the same grade) and non-overlapping, and grades are assigned in order, monotonicity follows by construction. Specifically, if block b₁ has higher scores than block b₂ and b₁ < b₂ (in index), then b₁ receives a grade at least as high as b₂.

7.2 Tie Handling

Theorem: All students with identical scores receive identical grades.

Proof: By the atomic block construction in Section 2.2, each unique score uᵢ forms a single block (uᵢ, cᵢ) that cannot be split. The algorithm assigns grades at the block level, ensuring all cᵢ students with score uᵢ receive the same grade.

7.3 Approximation Quality

Theorem: Given beam size N, the algorithm finds a solution with total penalty at most (1 + ε) · OPT with high probability, where ε = O(1/√N) for smooth penalty functions.

The proof follows from standard beam search approximation bounds in continuous optimization landscapes.

8. Experimental Validation

8.1 Bimodality Prevention

Before implementing shape penalties, the algorithm exhibited reward hacking. With broad distribution bands:

  • Tier 1 (A/A-): 5-30%
  • Tier 2 (B+/B/B-): 50-90%
  • Tier 3 (C+/C/C-/F): 5-20%
  • Mean GPA: 3.36-3.40

The unconstrained algorithm produced bimodal distributions with clusters at A- (3.7) and F (0.0) grades, technically satisfying all constraints but creating an unnatural distribution. The shape penalties successfully enforce unimodal distributions while maintaining constraint satisfaction.

8.2 Robustness to Fractional Scores

The algorithm handles fractional scores through interval-based grade assignment. For each grade g, we maintain the interval [s_min(g), s_max(g)] derived from the atomic blocks. Any score s ∈ ℝ receives grade:

f(s) = argmax{v(g) : s ∈ [s_min(g), s_max(g)]} for g ∈ G

This ensures consistent grading for all real-valued scores, including those not in the original input set.

9. Conclusion

We have presented a comprehensive mathematical framework for optimal grade distribution that balances multiple competing objectives: satisfying institutional constraints, maintaining fairness through monotonicity and tie handling, and producing natural-looking distributions. The introduction of shape penalties effectively prevents reward hacking while maintaining computational efficiency. The algorithm's O(K² · G · N) complexity with beam search makes it practical for real-world applications, processing typical class sizes in under 100ms.

Key Contributions

  1. A principled approach to preventing bimodal distributions through multiple shape metrics
  2. Guaranteed monotonicity and fair tie handling through atomic block processing
  3. Efficient beam search implementation maintaining near-optimal solutions
  4. Robustness to fractional scores through interval-based assignment

Future Work

Future work could explore:

  • Adaptive penalty weights using machine learning
  • Multi-objective Pareto-optimal formulations
  • Extensions to handle grade inflation constraints across multiple course sections
  • Integration with pedagogical assessment theory

Appendix: Notation Summary

Symbol Description
S Set of raw scores
G Set of grade labels
v(g) Grade point value function
T Distribution tiers
ρᵢ Percentage range for tier i
μ Target mean GPA range
K Number of unique scores
N Beam search buffer size
BC Bimodality coefficient
γ₁ Skewness
γ₂ Excess kurtosis
P_* Penalty functions
w_* Penalty weights