I’m trying to solve this problem Border Subgrid Count I.
My current solution runs in O(n^2\log{n}), but it gets TLE, and I’m not sure how to improve it.
Here’s my idea:
Let’s define four functions:
u(i,j): the number consecutive cells equals to grid_{i,j} going upwards from cell (i,j).
d(i,j): the number consecutive cells equals to grid_{i,j} going downwards from cell (i,j).
l(i,j): the number consecutive cells equals to grid_{i,j} going left from cell (i,j).
r(i,j): the number consecutive cells equals to grid_{i,j} going right from cell (i,j).
The main idea is to take advantage of the fact that different squares share vertices by traversing the grid along the diagonals.
When I’m at a cell (i,j) I activate it for the next min(d(i,j),r(i,j)) cells. Then, I can treat this cell as the lower-right corner of a square and match it with any of the previous min(u(i,j),l(i,j))active cells.
For handling the activations efficently, I’m using 26 Fenwick Trees. To avoid overhead, I don’t create 26 trees per diagonal; instead, I initialize them once at the start, keeping the overall complexity at: O(n^2\log{n}+26n).
I don’t think posting my code is necessary here. The issue is more about the algorithmic idea than the implementation details, I think.
Not all of the optimizations I made were necessary. I think the most important optimization was copying these values to a temporary array before using them.
FOR(x, L, R + 1) {
int y = x + diff;
tmp_c[x] = grid[x][y] - 'A';
tmp_u[x] = min(u[x][y], r[x][y]);
tmp_d[x] = min(d[x][y], l[x][y]);
}
Honestly, I’m not 100% sure why this helped, but here was what I was thinking. Basically, I was trying to make sure that all the arrays I was working with in the part of the code after this loop were mostly contiguous and fit into L2 cache (~256KB). This reduces the number of cache misses, which can significantly slow down the program.