Paths on Grids: E - Swap

I was trying to solve the problem E - Swap in this section and I’ve understood the internal solution.

But inside the users solution there is one that is much shorter and seems much more elegant and that is this one:

#include<bits/stdc++.h>
using namespace std;
#define int long long
const string str = "KEY";
string s;
int k;
map<string , map<int,int>> mp;
int dfs(string s,int k) {
    //cout << s << ' ' << k << '\n';
    int n = s.size();
    if(k < 0) return 0;
    if(k == 0 || n == 1) return 1;
    if(mp[s][k] != 0) return mp[s][k];
    for(int i = 0;i < 3;i++) {
        for(int j = 0;j < s.size();j++) {
            if(str[i] == s[j]) {
                mp[s][k] += dfs(s.substr(0,j) + s.substr(j + 1) , k - j);
                break;
            }
        }
    }
    return mp[s][k];
}
int32_t main() {
    cin.tie(0)->sync_with_stdio(0);
    cin >> s >> k;
    cout << dfs(s,k) << '\n';
    return 0;
}

I’ve tried to understand this code for a few hours but in the end I didn’t understand why this code runs within the time limit (if you use unordered_map its almost as fast as the iterative bottom up dp solution).

The depth of the dfs is 30 and at each call you have at maximum 3 calls, that’s 3^30.

There is memoization and that complicates things, what would be the time complexity of this code? I’ve checked and in the worst case the dfs function gets called at maximum ~= 260k times.

I think the worst case is time complexity is O(n^2 * 2^n), still quite slow but ok for the problem.

There are poly(|S|) DP states, because the s in each state is formed by just removing some prefix of each character.