CSES: All Letter Subgrid Count I

Hi all! My first post here.

I’m trying to solve this task: CSES - All Letter Subgrid Count I

We are given an N x N grid of letters. N <= 3000, # letters <= 26. In it, we must count the number of square submatrices that contain every letter.

The best idea I see is as follows:
For each cell, treat it as the bottom right cell of a square. Binary search for the largest square we can make under this condition. To validate the square, do 26 2D prefix sum queries.

This is O(n^2 log n * 26) which is far too slow.

I don’t believe there is a way to 2D range query “are all letters present in this submatrix”. Or even 1D. Since OR operation is non-invertible.

Does anyone have any hints or guidelines to this solution? I’m expecting one of 26 * n^2, or n^2 log n to work. Can’t really find anything online about this question.

1 Like

The constraints for this problem are quite tight – it looks like the fastest solutions all use O(N^2K) time and O(NK) memory …

basically dp[i][j][k] should be the side length of the smallest square with lower-left corner at (i,j) that contains character k. but we don’t need to store all these values simultaneously so we can remove a factor of N from the memory.

Here’s a sample implementation:

Code
int main() {
	// read read read
	setIO();
	def(int, N, K);
	const int big = 1e9;
	int tmp[2][N + 1][K];
	F0R(a, 2) F0R(b, N + 1) F0R(c, K) tmp[a][b][c] = big;
	ll ans = 0;
	R0F(i, N) {
		def(str, s);
		R0F(j, N) {
			int mx = 0;
			F0R(k, K) {
				tmp[i & 1][j][k] =
				    min(s[j] == ('A' + k) ? 0 : big,
				        min({tmp[(i + 1) & 1][j][k], tmp[i & 1][j + 1][k],
				             tmp[(i + 1) & 1][j + 1][k]}) +
				            1);
				ckmax(mx, tmp[i & 1][j][k]);
			}
			ans += max(N - max(i, j) - mx, 0);
		}
	}
	ps(ans);
	dbg(time_elapsed());
	// you should actually read the stuff at the bottom
}

You can use sparse table for this: Sparse Table - Algorithms for Competitive Programming

O(N^2\log N) preprocessing and O(1) time queries.

Hm, I thought the build time for a 2D sparse table is NMlogN*logM which made it untenable?

Urgh, I actually thought of this solution but I did my loop order different. I realized I could switch the loop order but was missing a small observation about the memory requirements. Will study this thank you.

You only need to store the bitwise or for squares with side length having powers of two, so only one log factor.

Ben, I finally got it working thanks to your advice. I had made a mistake in my thinking earlier and failed to realize this loop order is actually more memory efficient O(nk) instead of O(nm).

However this problem has crazy tight time constraints, I had to really optimize it, took hours, had to remove some branches from the hot paths.

I’ve left a working implementation O(nmk) time O(nk) space here, along with my TLE version O(nmk) time O(n^2) space, should anyone encounter this thread in the future: leetcode/problems/CSES/All_Letter_Subgrid_Count_I.cpp at main · ishaanbuildsthings/leetcode · GitHub

My friend also proposed an alternate O(nmk) idea which I have not tested.

Enumerate on each major diagonal, for each major diagonal, we will go from top-left to bottom-right. We maintain a 2d square sliding window, for the largest window that does not contain every letter. As we slide the bottom right corner down, we may need to drop off the top left corner, if we have every letter. The transitions between slides are O(26). However the memory for this is O(26nm) for all these prefix counters.

Alternatively, we could do the sliding window but use a 2d sparse table, that makes the transitions O(1) supposedly (I am yet to understand how to get a fast 2D build for a sparse table) but I’m guessing still TLEs.

To be clear are you talking about a 2D sparse table or N 1D sparse tables? I did misspeak we can definitely query [l, r] using a bitwise OR in 1D using sparse tables. But I don’t see how to do it in 2D or if that applies to this.

2D \quad

I understand now. Since it’s square queries we drop a log. I implemented a 2D sparse table with n^2 log n build time here for posterity: https://github.com/ishaanbuildsthings/leetcode/blob/main/templates/SquareSparse.cpp

Also tried this solution for this problem, it’s n^2 log n time and space which TLEs. But the cool thing is the factor of K is less important, since we do K/W (for bitmasks) operations. So this could be more feasible for like K=500 and N=1000.