CSES - Border Subgrid Count I - How can I get rid of log n?

Joacru · August 25, 2025, 6:43pm

I’m trying to solve this problem Border Subgrid Count I.
My current solution runs in O(n^2\log{n}), but it gets TLE, and I’m not sure how to improve it.

Here’s my idea:

Let’s define four functions:

u(i,j): the number consecutive cells equals to grid_{i,j} going upwards from cell (i,j).
d(i,j): the number consecutive cells equals to grid_{i,j} going downwards from cell (i,j).
l(i,j): the number consecutive cells equals to grid_{i,j} going left from cell (i,j).
r(i,j): the number consecutive cells equals to grid_{i,j} going right from cell (i,j).

The main idea is to take advantage of the fact that different squares share vertices by traversing the grid along the diagonals.

When I’m at a cell (i,j) I activate it for the next min(d(i,j),r(i,j)) cells. Then, I can treat this cell as the lower-right corner of a square and match it with any of the previous min(u(i,j),l(i,j)) active cells.

For handling the activations efficently, I’m using 26 Fenwick Trees. To avoid overhead, I don’t create 26 trees per diagonal; instead, I initialize them once at the start, keeping the overall complexity at: O(n^2\log{n}+26n).

I don’t think posting my code is necessary here. The issue is more about the algorithmic idea than the implementation details, I think.

Thanks in advance for the help .

Benq · August 26, 2025, 2:24am

Actually, I’m pretty sure that’s the intended complexity, but the time limit for this problem is really tight …

Here’s my implementation in O(n^2\log n) (0.86s): CSES - Shared code

Not all of the optimizations I made were necessary. I think the most important optimization was copying these values to a temporary array before using them.

FOR(x, L, R + 1) {
	int y = x + diff;
	tmp_c[x] = grid[x][y] - 'A';
	tmp_u[x] = min(u[x][y], r[x][y]);
	tmp_d[x] = min(d[x][y], l[x][y]);
}

Honestly, I’m not 100% sure why this helped, but here was what I was thinking. Basically, I was trying to make sure that all the arrays I was working with in the part of the code after this loop were mostly contiguous and fit into L2 cache (~256KB). This reduces the number of cache misses, which can significantly slow down the program.

Joacru · August 26, 2025, 5:00am

Thanks for reply!

I tried my best, but I’m still getting TLE.
However, with some optimizations, I managed to pass testcases 1, 2, and 4.

The optimization recommended by you.

for(int x=-n+1;x<n;x++){
	int limit=n-abs(x);
	for(int i=x<0?-x:0, j=x<0?0:x, y=0; y<limit; i++,j++,y++){
		h[y]=grid[i][j]-'A';
		f[y]=min(d[i][j],r[i][j]);
		g[y]=min(u[i][j],l[i][j]);
	}
	...

Maintain the number of active vertices, so I avoid a range query in the Fenwick Tree and only have to subtract what my current cell does not reach.

	for(int y=0; y<limit; y++){
		short c=h[y];
		add(c,y,+1);
		on[c]++;
		es[y+f[y]-1].push_back(c<<15|y);	
		ans[c]+=on[c]-get(c,y-g[y]);
		...

Save deactivations with a single integer. I’m not sure if this makes a difference.

		for(auto &e: es[y]){
			add(e>>15,e&((1<<15)-1),-1);
			on[e>>15]--;
		}
		es[y].clear();
	}
}

I don’t really know how to continue, but I feel like I’m pretty close. My last submission.

I saw that you don’t use 26 Fenwick Trees, but I don’t quite understand that approach.