DataStructure_Algorithm_HandBook_PreForLeetCode

40. Combination Sum II

Given a collection of candidate numbers (candidates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target.

Each number in candidates may only be used once in the combination.

Note: The solution set must not contain duplicate combinations.

Example 1:

Input: candidates = [10,1,2,7,6,1,5], target = 8
Output: 
[
[1,1,6],
[1,2,5],
[1,7],
[2,6]
]

Example 2:

Input: candidates = [2,5,2,1,2], target = 5
Output: 
[
[1,2,2],
[5]
]

Constraints:

1 <= candidates.length <= 100
1 <= candidates[i] <= 50
1 <= target <= 30

solution

注意每个元素只能选择一次，所以我们在考虑 i 之后，应该要考虑下一个位置。

但是因为数组是有重复元素的，即便往后选，会出现重复的组合。

比如candidates = [10,1,2,7,6,1,5] target = 8

那么很可能会有[1,7] [7,1]这样的组合。

那么升序排序 [1, 1, 2, 5, 6, 7, 10] 之后，还是可能会出现重复的问题，因此这里涉及到一个技巧，我们需要同层去重。

在同一个 for 循环里面，选择了相同的元素，就会重复了。

在解决组合总和问题时，对数组进行排序有几个关键的好处，尤其是当需要处理重复元素或者需要优化搜索效率时：

var combinationSum2 = function(candidates, target) {
    let result = [];
    candidates.sort((a, b) => a - b);  // 对候选数组进行排序
    dfs(0, 0, [], candidates, target, result);
    return result;
};

function dfs(index, sum, path, candidates, target, result) {
    if (sum > target) return;
    if (sum === target) {
        result.push([...path]);
        return;
    }
    for (let i = index; i < candidates.length; i++) {
        //这边有一个容易出错的点， i 必须大于 index，因为如果说是i>0 的话，可能是这个元素你上一层用到了，但是这一层没有用到，但是也被跳过了。
        //而题目要求的是每个数字只能使用一次，但是不是说相同的数字不能出现
        if (i > index && candidates[i] === candidates[i - 1]) continue;  // 跳过当前递归层次的重复元素
        path.push(candidates[i]);
        dfs(i + 1, sum + candidates[i], path, candidates, target, result);  // 直接传递 path
        path.pop();
    }
}

排序的好处：

方便去重：

当数组排序后，重复的元素会排列在一起，这让我们可以很容易地通过比较当前元素与前一个元素是否相同来跳过重复的组合，而无需使用额外的数据结构（如 Set）来跟踪重复。这样可以减少内存使用并提高代码的执行效率。

优化搜索过程：

在搜索过程中，如果累积的和已经超过目标值 target，则可以立即停止进一步探索当前分支，因为所有后续的元素都将是非负的（假设输入数组全为非负数），进一步加和只会使得总和更大。

排序后，当遇到一个元素使得累积和加上该元素已经超过目标值时，可以直接中断当前循环，避免无效的递归调用。

减少递归深度：排序可以帮助算法更早地识别那些不可能达到目标的路径，从而尽早地剪枝，这在一定程度上减少了递归调用的深度和数量，从而提高算法效率。

Set 方法正确版本

var combinationSum2 = function(candidates, target) {
    let result = [];
    candidates.sort((a, b) => a - b);  // 对候选数组进行排序
    dfs(0, 0, [], candidates, target, result);
    return result;
};

function dfs(index, sum, path, candidates, target, result) {
    if (sum > target) return;
    if (sum === target) {
        result.push([...path]);
        return;
    }
    let set = new Set(); 
    for (let i = index; i < candidates.length; i++) {
        //剪枝优化,因为已经排序了
        if(sum+candidates[i]>target) continue; 
        if(set.has(candidates[i])) continue;  
        // if (i > index && candidates[i] === candidates[i - 1]) continue;  // 跳过当前递归层次的重复元素
        path.push(candidates[i]);
        set.add(candidates[i]);
        dfs(i + 1, sum + candidates[i], path, candidates, target, result);  // 直接传递 path
        path.pop();
    }
}

错误示范

var combinationSum2 = function(candidates, target) {
    let result = [];
    // candidates.sort((a, b) => a - b);  // 对候选数组进行排序
    dfs(0, 0, [], candidates, target, result);
    return result;
};

function dfs(index, sum, path, candidates, target, result) {
    if (sum > target) return;
    if (sum === target) {
        result.push([...path]);
        return;
    }
    let set = new Set(); 
    for (let i = index; i < candidates.length; i++) {
        if(set.has(candidates[i])) continue;  
        // if (i > index && candidates[i] === candidates[i - 1]) continue;  // 跳过当前递归层次的重复元素
        path.push(candidates[i]);
        set.add(candidates[i]);
        dfs(i + 1, sum + candidates[i], path, candidates, target, result);  // 直接传递 path
        path.pop();
    }
}

使用 Set 来判断是否重复而不排序数组确实是一种可能的方法，但它有一些限制和效率问题，特别是在处理具有重复元素的组合问题时。下面是使用这种方法的一些考量和潜在问题：

1. 局限性：

使用 Set 来避免在同一层次递归中使用重复的元素可以防止生成重复的组合。但这种方法只能保证当前递归层级不会选取重复的元素，对于整个解空间的去重来说可能不够。在不排序的情况下，相同的数字可能出现在递归树的不同分支上，你的 Set 在每次递归调用时都会重新初始化，这不能防止跨层级的重复组合。

2. 效率问题：

在不对数组排序的情况下，即使使用 Set 防止了某些重复，你的算法可能仍然会探索许多不必要的路径。比如，如果 candidates 中有大量重复元素或者元素的顺序导致频繁地触发 sum > target 的情况，这会增加算法的计算负担。排序后，当你发现一组元素已经超过 target 时，可以立即停止这一分支的进一步探索，从而更有效地减少递归的次数和深度。

3. 代码复杂性和维护：

使用排序加上比较相邻元素的方式从代码逻辑上更直观，也更容易维护。使用 Set 来控制重复元素，虽然看起来解决了问题，但实际上增加了代码的复杂性，因为你需要在每次递归调用中维护和更新 Set。

总结：

尽管在不排序的情况下使用 Set 可以在某种程度上避免重复组合，但这种方法通常不如排序后直接通过索引和值的比较来得有效和清晰。排序不仅帮助简化去重逻辑，还可以优化算法的整体性能，尤其是在处理包含重复元素的大数组时。因此，虽然你可以按照你的方法实现，但从长远来看，排序可能是一个更优的选择。