Floodfill 2016 January Silver Problem 3

aaravdodhia · March 17, 2021, 3:05am

Link to the problem: http://www.usaco.org/index.php?page=viewproblem2&cpid=596.

I understand the idea of doubling FJ’s path segments so that each enclosed region contains at least one point inside it. Then we can use floodfill to find the number of connected regions in the graph, being careful to exclude the points that FJ has visited in the floodfill.

Please note that the coordinates in the problem vary between [-2N, 2N]. I want my array coordinates in the range [-2N-1, 2N+1], so we can traverse through the outer boundaries of the (potentially infinite) grid. Incrementing each coordinate by 2N+1 transposes the range of array indices to [0, 4N+3).

#include <bits/stdc++.h>

using namespace std;

const int MX = 4003;

int N;
string s;
bool vis[MX][MX], FJ[MX][MX];

int dx[4] = {0, 0, -1, 1};
int dy[4] = {-1, 1, 0, 0};
void ff(int x, int y){
    vis[x][y] = true;
    for(int i=0; i<4; i++){
        int nx = x + dx[i];
        int ny = y + dy[i];
        if(nx < 0 || nx >= 4*N+3 || ny < 0 || ny >= 4*N+3){
            continue;
        }
        if(FJ[x][y] || vis[nx][ny]){
            continue;
        }
        ff(nx, ny);
    }
}

int main(){
    freopen("gates.in","r",stdin);
    freopen("gates.out","w",stdout);
    cin >> N >> s;
    // double the lengths of segments so
    // there's at least 1 point inside each enclosed region
    int cx = 2*N+1, cy = 2*N+1; // FJ starts at the origin
    FJ[cx][cy] = true;
    for(char c : s) for(int i=0; i<2; i++){
        switch(c){
        case 'N':
            cy++;
            break;
        case 'E':
            cx++;
            break;
        case 'W':
            cx--;
            break;
        case 'S':
            cy--;
            break;
        }
        FJ[cx][cy] = true;
    }
    int R = 0;
    // find the number of connected regions
    for(int i=0; i<MX; i++){
        for(int j=0; j<MX; j++){
            if(!FJ[i][j] && !vis[i][j]){
                R++;
                ff(i, j);
            }
        }
    }
    cout << R-1 << endl;
}

I checked that this code is very similar to the internal solution provided on the USACO Guide. However, my code gives a memory limit or runtime error when I run it.

The submission succeeds with 7/10 test cases if I instead make the following bound replacements.

Change the out-of-bounds base case in the floodfill function to

if(nx < 0 || nx >= 2*N || ny < 0 || ny >= 2*N){ // supposed to be nx >= MX || ny >= MX
    continue;
}

Change the origin point of cx and cy to

int cx = N, cy = N; // origin is supposedly (2N+1, 2N+1)

Modify the iterative loops when finding the connected regions to

for(int i=0; i<2*N; i++){ // supposed to be i<MX
    for(int j=0; j<2*N; j++){ // supposed to be j<MX
        if(!FJ[i][j] && !vis[i][j]){
            R++;
            ff(i, j);
        }
    }
}

Can you please explain why I receive a “runtime error or memory limit exceed” exception in my original code? The memory limit is greater than 4003^2, and I don’t think I access an out-of-bounds array index anywhere.

infimum · March 17, 2021, 8:47am

The overall problem is because the stack memory of a program is not designed to go 16million deep: https://en.wikipedia.org/wiki/Stack-based_memory_allocation

To fix this, just replace the recursive DFS with a stack-based DFS: https://www.geeksforgeeks.org/iterative-depth-first-traversal/

Benq · March 17, 2021, 12:45pm

Also, this doesn’t give the right answer on the sample, as your bounds are off:

Mentioned at https://github.com/cpinitiative/usaco-guide/pull/388/files but only for Python …

aaravdodhia · March 17, 2021, 2:38pm

Thanks for the responses. I tried implementing the stack version of DFS, but it exceeded the time limit on the sample test case. The program took nearly 5 seconds, so something seems severely wrong.

#include <bits/stdc++.h>

using namespace std;

const int MX = 4003;

int N;
string s;
bool vis[MX][MX], FJ[MX][MX];

void ff_stack(int a, int b){
    stack<pair<int, int>> nodes;
    nodes.push({a, b});
    while(!nodes.empty()){
        int x = nodes.top().first, y = nodes.top().second;
        nodes.pop();
        // stack may contain the same node twice
        if(vis[x][y]) continue;
        vis[x][y] = true;
        for(int i=0; i<4; i++){
            int nx = x + dx[i];
            int ny = y + dy[i];
            if(nx < 0 || nx >= MX || ny < 0 || ny >= MX){
                continue;
            }
            if(FJ[nx][ny] || vis[nx][ny]){
                continue;
            }
            nodes.push({nx, ny});
        }
    }
}

int main(){
    freopen("gates.in","r",stdin);
    freopen("gates.out","w",stdout);
    cin >> N >> s;
    // double the lengths of segments so
    // there's at least 1 point inside each enclosed region
    int cx = 2001, cy = 2001;
    FJ[cx][cy] = true;
    for(char c : s) for(int i=0; i<2; i++){
        switch(c){
        case 'N':
            cy++;
            break;
        case 'E':
            cx++;
            break;
        case 'W':
            cx--;
            break;
        case 'S':
            cy--;
            break;
        }
        FJ[cx][cy] = true;
    }
    int R = 0;
    // find the number of connected regions
    for(int i=0; i<MX; i++){
        for(int j=0; j<MX; j++){
            if(!FJ[i][j] && !vis[i][j]){
                R++;
                ff_stack(i, j);
            }
        }
    }
    cout << R-1 << endl;
}

I am not sure if it’s because the stack size and runtime are 16 million. This shouldn’t matter, right, since the stack size doesn’t affect the program’s internal memory?

I understand the stack-based memory allocation is an internal process that can only handle a certain amount of function calls. Thus deep recursions are problematic. Is “stack overflow” the same as the this memory allocation error? Does stack overflow still occur when the user implements a manual stack? Does stack overflow interfere with deep recursion calls?

I appreciate you providing me a Wiki article for the above, but I am trying to grasp the bare minimum of this idea so I know when it comes up in my code. I would appreciate it if you can explain the gist of stack overflow in a few simple sentences.

Benq · March 17, 2021, 6:19pm

Good questions.

It doesn’t compile … If I add dx and dy then it passes in ~1200ms. 16\cdot 10^6 is pretty big, so it’s not surprising that it takes a while + a lot of memory.

Probably useful: https://www.learncpp.com/cpp-tutorial/the-stack-and-the-heap/

Take everything below with a grain of salt, since I’m not very familiar with stack vs heap.

True. But USACO doesn’t have a limit on stack size (aside from the usual 256 MB).

#include <iostream>
 
int main()
{
    int stack[10000000];
    std::cout << "hi";
    return 0;
}

The link above says that the above code will produce an error since the stack size is usually limited to something small by default (say, 8MB on Mac). But if you submit this code to USACO it should produce the expected result.

I don’t think this qualifies as a stack overflow error. The issue is that you’re using too much memory, not that the stack specifically is using too much memory.

No. The elements of stack<pair<int, int>> nodes; will be stored on the heap, not the stack (source), so this can’t qualify as stack overflow. A manual stack will (probably) use less memory than the corresponding recursive version, but of course, you’ll still get memory limit exceeded if nodes grows too large.

infimum · March 17, 2021, 6:23pm

You should also be setting [nx][ny] to visited while pushing it into the stack, not after popping.

Benq · March 17, 2021, 6:25pm

Sure, that will result in fewer entries being added to nodes (but the current version works as well).

aaravdodhia · March 17, 2021, 6:33pm

Yes, that’s true. However, it lends itself to more of a BFS approach. Consider the graph

Traversal according to your method gives 0 \rightarrow 1 \rightarrow 3 and 0 \rightarrow 2.
Regular DFS should give 0 \rightarrow 1 \rightarrow 3 \rightarrow 2.

aaravdodhia · March 17, 2021, 6:52pm

Sorry, I don’t follow what dx and dy represent. What are those variables used for? Also, do all test cases on USACO pass in ~1200ms, or only the sample? I think 16\cdot 10^6 is both within the USACO runtime and memory limit of 256MB.

Your code seems to initialize an array of size 10^7 called stack instead of a real stack. However, this code fails on my Windows machine.

#include <bits/stdc++.h>
using namespace std;

int main()
{
    stack<int> st;
    for(int i=0; i<1e8; i++){
        st.push(i);
    }
    cout << "hi" << endl;
}

As I understand it, nodes is a manual stack I created (although its values are stored in the heap). The stack that @infimum referred to in post #2 was an internal ‘call stack’, where every recursive call was pushed on top of this call stack. This internally allocated stack grew too large in size. Am I missing something here?

When you say I use too much memory, do you mean that the program explicitly allocates too much memory (16\cdot 10^6 bytes) on the heap? Doesn’t this lead to overflow since previous memory gets deleted and re-used? OR do you mean that the internal call stack grows too large (note that my modified code doesn’t use recursion).

Benq · March 17, 2021, 6:55pm

They’re in your original code …

All of them.

Yes.

seems correct

This. Your modified code works …

Works for me.

aaravdodhia · March 17, 2021, 6:58pm

Thanks for letting me know my modified code works. I don’t quite understand what you mean by adding dx and dy. Don’t I already use those arrays here?

        for(int i=0; i<4; i++){
            int nx = x + dx[i];
            int ny = y + dy[i];
            if(nx < 0 || nx >= MX || ny < 0 || ny >= MX){
                continue;
            }
            if(FJ[nx][ny] || vis[nx][ny]){
                continue;
            }
            nodes.push({nx, ny});
        }

Benq · March 17, 2021, 6:59pm

As I wrote above, your code here doesn’t compile:

aaravdodhia · March 17, 2021, 7:06pm

That’s strange. When I submit the following solution to USACO, I get TLE on the sample test case. It takes about 5 seconds on my local machine.

#include <bits/stdc++.h>

using namespace std;

const int MX = 4003;

int N;
string s;
bool vis[MX][MX], FJ[MX][MX];

int dx[4] = {0, 0, -1, 1};
int dy[4] = {-1, 1, 0, 0};
void ff_stack(int a, int b){
    stack<pair<int, int>> nodes;
    nodes.push({a, b});
    while(!nodes.empty()){
        int x = nodes.top().first, y = nodes.top().second;
        nodes.pop();
        // stack may contain the same node twice
        if(vis[x][y]) continue;
        vis[x][y] = true;
        for(int i=0; i<4; i++){
            int nx = x + dx[i];
            int ny = y + dy[i];
            if(nx < 0 || nx >= MX || ny < 0 || ny >= MX){
                continue;
            }
            if(FJ[nx][ny] || vis[nx][ny]){
                continue;
            }
            nodes.push({nx, ny});
        }
    }
}

int main(){
    freopen("gates.in","r",stdin);
    freopen("gates.out","w",stdout);
    cin >> N >> s;
    // double the lengths of segments so
    // there's at least 1 point inside each enclosed region
    int cx = 2001, cy = 2001;
    FJ[cx][cy] = true;
    for(char c : s) for(int i=0; i<2; i++){
        switch(c){
        case 'N':
            cy++;
            break;
        case 'E':
            cx++;
            break;
        case 'W':
            cx--;
            break;
        case 'S':
            cy--;
            break;
        }
        FJ[cx][cy] = true;
    }
    int R = 0;
    // find the number of connected regions
    for(int i=0; i<MX; i++){
        for(int j=0; j<MX; j++){
            if(!FJ[i][j] && !vis[i][j]){
                R++;
                ff_stack(i, j);
            }
        }
    }
    cout << R-1 << endl;
}

Benq · March 17, 2021, 7:10pm

Looks like I used

int dx[4]{1,0,-1,0};
int dy[4]{0,1,0,-1};

instead o_O

This also works.

#include <bits/stdc++.h>

using namespace std;

const int MX = 4003;

int N;
string s;
bool vis[MX][MX], FJ[MX][MX];

int dx[4] = {0, 0, -1, 1};
int dy[4] = {-1, 1, 0, 0};
void ff_stack(int a, int b){
    stack<pair<int, int>> nodes;
    nodes.push({a, b}); vis[a][b] = 1;
    while(!nodes.empty()){
        int x = nodes.top().first, y = nodes.top().second;
        nodes.pop();
        // stack may contain the same node twice
        // vis[x][y] = true;
        for(int i=0; i<4; i++){
            int nx = x + dx[i];
            int ny = y + dy[i];
            if(nx < 0 || nx >= MX || ny < 0 || ny >= MX){
                continue;
            }
            if(FJ[nx][ny] || vis[nx][ny]){
                continue;
            }
            vis[nx][ny] = 1;
            nodes.push({nx, ny});
        }
    }
}

int main(){
    freopen("gates.in","r",stdin);
    freopen("gates.out","w",stdout);
    cin >> N >> s;
    // double the lengths of segments so
    // there's at least 1 point inside each enclosed region
    int cx = 2001, cy = 2001;
    FJ[cx][cy] = true;
    for(char c : s) for(int i=0; i<2; i++){
        switch(c){
        case 'N':
            cy++;
            break;
        case 'E':
            cx++;
            break;
        case 'W':
            cx--;
            break;
        case 'S':
            cy--;
            break;
        }
        FJ[cx][cy] = true;
    }
    int R = 0;
    // find the number of connected regions
    for(int i=0; i<MX; i++){
        for(int j=0; j<MX; j++){
            if(!FJ[i][j] && !vis[i][j]){
                R++;
                ff_stack(i, j);
            }
        }
    }
    cout << R-1 << endl;
}

aaravdodhia · March 17, 2021, 7:26pm

Yes, you are right. Small optimizations in the order of traversing neighbors or order of visiting nodes give ~1260ms or ~1940ms. In particular, using BFS or simply marking a node as visited while pushing it to the stack pave a path to a solution.

If all other threads in this discussion are resolved, I should just stick with bounding the floodfill using minX, maxX, minY, maxY. Those give the smallest size of the grid needed to ensure we account for all connected components. Since the sum of FJ’s step lengths is at most 2000, the maximum area I will floodfill on is 1000\cdot 1000 using AM-GM. My key takeaways from this discussion are to bound the search as much as possible if

I will certainly be visiting all values in the matrix. This is the case with floodfill.
I am doubling the lengths of segments to account for midpoints or ensure that a connected component includes at least 1 point.
I am using several deep recursion calls. This can happen with tedious applications of dfs and floodfill.
I am allocating millions of bytes all at once. The gap between 1000^2 and 4000^2 is quite large and can mean solving or not solving a problem.

Are the lessons I learnt characteristic of the insights we had in this thread? If there’s anything more @Benq or @infimum would like to add or suggest, please do let me know.

infimum · March 17, 2021, 8:40pm

I guess BFS is actually optimal for this problem in terms of memory usage, so you can switch the stack for a queue. These constant-factor optimizations are not usually necessary, so I wouldn’t try to memorize rules like such.

If you are confident your solution is fundamentally correct and actually need constant factor optimization, then go for the lowest hanging fruit first. In this case that was recursion -> iteration.

I would stay away from optimizations like computing the bounding box because that can lead to errors.

Benq · March 18, 2021, 12:27am

Iterative solution is still pretty slow though

Alternatively just don’t double the length of each segment. Then each fence segment removes an edge between two adjacent squares.

aaravdodhia · March 18, 2021, 3:16pm

That was my first instinct as well. Consider a 2 \times2 square with vertices (1,1), (-1, 1), (1, -1), (-1, -1). We can then travel from (0, 2) to (0,1) to (0, 0), which would merge the entire grid into a single connected component.

You might mean that we should ignore all points that FJ visits and floodfill on the rest. I tried looking at a 1 \times 1 square. We cannot spot that there is a region inside that 1\times 1 grid unless we have a lattice point inside the region. I couldn’t think of any way to ensure this other than doubling the lengths of the segments.

Benq · March 18, 2021, 3:27pm

Let each entry in your 2D array represent a 1\times 1 square instead of a single point.

firebolt360 · June 26, 2021, 11:46pm

I was working on a this problem and came across these 2 vital discoveries:

For each contiguous closed region, there must be 1 gate. In a minimum state, each region has exactly 1 gate. - This I know how to prove
Each time FJ comes across a node he has visited along an edge he has not, he creates exactly 1 new region. - This I do not know how to prove.

My implementation of such let to 10/10 in 11ms with 3.5 mb of memory. So my question is: How would one go about proving statement 2?

Implementation:

#include <bits/stdc++.h>
using namespace::std;

typedef long long ll;
#define F0R(i, n) for (int i = 0; i < n; i++)
#define R0F(i, n) for (int i = n-1; i >= 0; i--)
#define FOR(i, a, n) for (int i = a; i < n; i++)
#define pb push_back
#define fastio ios::sync_with_stdio(0); cin.tie(0)
#define FMOD 998244353
#define MOD 1000000007

int main() {
    freopen("gates.in", "r", stdin);
    freopen("gates.out", "w", stdout);
    fastio;
    set<pair<pair<int, int>, pair<int, int>>> visedge;
    set<pair<int, int>> visnode;
    int n; cin >> n;
    pair<int, int> prev = {0, 0};
    visnode.insert(prev);
    string s; cin >> s;
    int ans = 0;
    F0R(i, n) {
        int x = prev.first; int y = prev.second;
        if (s[i]=='N') {
            x++;
        } else if (s[i]=='S') {
            x--;
        } else if (s[i]=='E') {
            y--;
        } else {
            y++;
        }
        if (visedge.find({{x, y}, prev})==visedge.end() && visnode.find({x, y})!=visnode.end()) {
            ans++;
        }
        visedge.insert({{x, y}, prev});
        visedge.insert({prev, {x, y}});
        visnode.insert({x, y});
        prev.first = x;
        prev.second = y;
    }
    cout << ans << "\n";
}

Edit: One more thing, I probably will open a PR to put this code into the USACO Guide