WA on 1 testcase- Lazy Cow (Silver prefix sums)

Here is the link to the problem: http://www.usaco.org/index.php?page=viewproblem2&cpid=416.

I glanced at the first part of the editorial, and my solution uses the same approach. I rotate the grid 45 degrees clockwise and make the diagonals of the original grid the rows of my new grid (with 0s in between the numbers). Then, I take prefix sums and try all N^2 starting locations and their corresponding (2K+1)\times (2K+1) subarrays.

#include <bits/stdc++.h>

using namespace std;

const int MX = 400;

int n, k;
vector<vector<int>> grid;

int main()
{
    freopen("lazy.in","r",stdin);
    freopen("lazy.out","w",stdout);

    cin >> n >> k;
    grid.resize(2*n, vector<int>(2*n));

    for(int i=0; i<n; ++i){
        for(int j=0; j<n; ++j){
            int diag = i+j+1;
            // convert to a diagonal grid, 0s in between elements
            cin >> grid[diag][n-(diag-1)+2*j];
        }
    }

    for(int i=1; i<2*n; ++i){
        for(int j=1; j<2*n; ++j){
            // take prefix sums
            grid[i][j] += grid[i][j-1] + grid[i-1][j] - grid[i-1][j-1];
        }
    }

    auto f = [&](int x){ // for bounding the indices
        return max(min(x,2*n-1),1);
    };

    int mxSum = 0;
    for(int i=0; i<2*n; ++i){
        for(int j=0; j<2*n; ++j){
            int bi = f(i+k), bj = f(j+k), // big i, big j
                si = f(i-k-1), sj = f(j-k-1); // small i, small j
            int lr = grid[bi][bj],
                ur = grid[si][bj],
                ll = grid[bi][sj],
                ul = grid[si][sj];
            mxSum = max(mxSum, lr - ll - ur + ul);
        }
    }

    cout << mxSum << '\n';
}

This gives a WA on test case 9 even with the correct logic. However, after debugging a little bit, I found that this brute-force loop passes 10/10:

    for(int i=0; i<2*n; ++i){
        for(int j=0; j<2*n; ++j){
            int bi = f(i+2*k+1), bj = f(j+2*k+1), si = i, sj = j;
            int lr = grid[bi][bj],
                ur = grid[si][bj],
                ll = grid[bi][sj],
                ul = grid[si][sj];
            mxSum = max(mxSum, lr - ll - ur + ul);
        }
    }

In my first code, I specify a starting location and sum the surrounding square of side length 2K+1. In the second code, I specify the top-left corner of the rectangle ((i+1,j+1)) and explicitly try all (2K+1)\times (2K+1) squares present in the grid.

By the word problem statement, it seems that the logic of my first code should work, but it doesn’t. Can you please explain why?

Hello, can you please respond?