Proof of CSES Stick Divisions?

thecodingwizard · November 18, 2020, 7:21pm

Does anyone know how to (formally) prove the greedy solution to CSES Stick Divisions?

aaravdodhia · November 18, 2020, 10:05pm

I am not sure. The best greedy algorithm I can think of is taking splitting a sum using the maximum available number. For example, the sample test case would be split into:
8
3 5 (+8)
2 3 (+5)

I played around with it, but then I found this case:
36 6
1 4 5 7 9 10

The greedy solution will yield:
36
10 26 (+36)
9 17 (+26)
7 10 (+17)
5 5 (+10)
1 4 (+5)
For a total of 94.

However, while experimenting I found this solution based on my intuition of splitting into nearest-possible halves:

36
18 18 (+36)
9 9 10 8 (+18+18 = +36)
4 5 7 1 (+9+8 = +17)
For a total of 89.

I do not know if this will always work. For instance in the case of 8=2+3+3,
I get
8
4 4
(No solution)

(3,5) would be the closest pair for the second step, but I don’t know how to definitively tell what a step should be. In the above experimentation, I put (18,18), where 18 is not already one of the desired numbers.

If you can guide me to the greedy approach, that would be great. I am not sure how to prove it, though. Thanks!

thecodingwizard · November 19, 2020, 1:09am

Here’s my approach (untested)…

Solution to CSES Stick Divisions

“Reverse” the process: add small sticks to form one big stick
Put all the desired stick lengths into a multiset
Greedily take the two smallest sticks, then merge them, then put the resulting stick back into the multiset
Repeat until there’s only one stick left

nchn27 · November 19, 2020, 5:39am

That should be correct. This problem is essentially equivalent to Huffman encoding (https://en.wikipedia.org/wiki/Huffman_coding#Basic_technique) which I think you could figure out if you look at the similarity between what you do and what the Huffman algorithm does…

So technically I don’t know the formal proof but I know that the proof of Huffman encoding would be it

Benq · November 19, 2020, 3:40pm

There’s a simple proof here.

Azzmi · February 14, 2023, 9:48am

We can use induction.

We’ll induct the number of sticks, n\ge1. n \leq 3 is easy to manually verify. Assume that it’s true for n=k\ge3 sticks. We’ll show that it’s also true for k+1 sticks.

Let the k+1 sticks be a_1,a_2,\dots,a_{k+1}.

If the first step does not involve a_1 and a_2, then in the second step, a_1 and a_2 will be selected by the induction hypothesis since after the first step, there are k sticks. Note that we can switch the first step and the second step. The first step is now a_1 and a_2. We’re done.
After k steps, all k+1 sticks have combined into 1. We discuss the case where a_1 and a_2 have already combined in the x^{th} step where x < k. We focus on the “cluster” that contains a_1 and a_2 in the x^{th} step. We want to show that we can have the same cluster but by choosing a_1 and a_2 in the first step. The strategy is to greedily start by choosing sticks in the cluster. It’s easy to see that this works.
If a_1 and a_2 combined in the final step, then, there are two cases:

a. a_1 and a_3 is the first step. Note that the second step must be a_2 and a_4 (because we have already discussed the case where a_1 and a_2 is combined). Note that that no matter the steps, the cost is always a linear combination of a_1,a_2,\dots,a_{k+1}. Let x(a_1+a_3) + y (a_2+a_4) be the contributions of a_1,a_2,a_3,a_4 to the cost. Since a_2+a_4 \ge a_1+ a_3, we can assume that x \ge y by rearrangement inequality since we want to minimize the cost. Note that if the first step is a_1 and a_2 and the second step is a_3 and a_4, we would have a lower cost since x(a_1+a_3) + y(a_2+a_4) \ge x(a_1+a_2) + y(a_3+a_4) \iff x(a_3 - a_2) \ge y (a_3-a_2) \iff x \ge y.

b. a_1 and a_4 is the first step. Note that the second step must be a_2 and a_3. From, here we continue just like a., except that we must make cases based on which a_2+a_3 and a_1+a_4 is greater.

c. a_1 and a_i where i > 4 is the first step. The second step is a_2 and a_3. Switch the first step and the second step so that the first step is a_2 and a_3. Note that a more optimal choice is choosing a_1 and a_4 as the second step instead. This is case b.