2031. Count Subarrays With More Ones Than Zeros
You are given a binary array nums
containing only the integers 0
and 1
. Return the number of subarrays in nums that have more 1
's than 0
's. Since the answer may be very large, return it modulo 109 + 7
.
A subarray is a contiguous sequence of elements within an array.
Example 1:
Input: nums = [0,1,1,0,1]
Output: 9
Explanation:
The subarrays of size 1 that have more ones than zeros are: [1], [1], [1]
The subarrays of size 2 that have more ones than zeros are: [1,1]
The subarrays of size 3 that have more ones than zeros are: [0,1,1], [1,1,0], [1,0,1]
The subarrays of size 4 that have more ones than zeros are: [1,1,0,1]
The subarrays of size 5 that have more ones than zeros are: [0,1,1,0,1]
Example 2:
Input: nums = [0]
Output: 0
Explanation:
No subarrays have more ones than zeros.
Example 3:
Input: nums = [1]
Output: 1
Explanation:
The subarrays of size 1 that have more ones than zeros are: [1]
Constraints:
1 <= nums.length <= 105
0 <= nums[i] <= 1
Solution
Note that this problem is pretty tricky and definitely not a medium question. The categorization is likely wrong on leetcode. Don’t feel bad if you don’t get it the first time. It took me a whole night to figure it out.
Let’s first pretend 0s are actually -1s so that 1 and -1 cancels each other out and we just need to find subarrays with sum > 0.
On first look, this problem looks like a sliding window problem. We loop though each right pointer and expand the window and adjust the left pointer to the right if the window sum is ≤ 0. And the number of subarrays within the window is right — left + 1 which we add to the total count.
On further examination, we come to the realization that this approach is flawed because although the subarray of the window may have a sum greater than 0, the individual subarrays within the window may not necessarily have a sum greater than 0. For example, [0, 1, 1] has sum > 0 but [0, 1] does not.
Since the problem is about incremental sum changes, a natural thing to think about is prefix sum. For each subarray ending at index i with a sum prefix_sum[i], what we want to find is the number of prefix subarrays (subarray that starts from 0 and ending where j < i) whose sum is smaller than prefix_sum[i] so that sum(nums[i:j]) > 0.
To calculate the count quickly, we can use a hashmap of sum: count of prefix subarrays to keep track of the frequency. Let’s call it the sum_to_count hashmap. And we can go through each element, use it as the last element of the subarray, track the running sum, and go through every pair in the hashmap and add the count to the result if a sum is smaller than the current sum.
class Solution:
# O(n^2) solution, does not pass all test cases
def subarraysWithMoreZerosThanOnes(self, nums: List[int]) -> int:
nums = [i if i else -1 for i in nums]
n = len(nums)
presum = [0] * (n + 1)
for i in range(1, n+1):
# sum of nums from nums[0] to nums[i - 1]
prefix_sum[i] = prefix_sum[i-1] + nums[i-1]
cnt = Counter()
res = 0
mod = 1000000007
for i in range(n+1):
cnt[prefix_sum[i]] += 1
res += sum([v for k, v in cnt.items() if k < prefix_sum[i]]) % mod
return res % mod
This passes the smaller test cases but fails the larger cases. And the problem is `sum([v for k, v in cnt.items() if k < prefix_sum[i]]) % mod` takes O(n) time and the overall time complexity is O(n²). The constraints is n ≤ 10⁵ and O(n²) is too slow. As a rule of thumb, 10⁴ is the max for O(n²) solution that could pass all tests in most online coding platform (a list of runtime to constraints mapping).
Dynamic programming
So we need to optimize the inner loop to O(n). Another piece of information we haven’t used is: at index i, we already have the result from index i-1 and there are only two possibility for nums[i], 1 or 0, so we can build on the previous result and consider each case of 1 or 0.
Let’s loop through each element, keeping tracking of the running sum and two states for the previous step
- prev_count_0, the count of subarrays up until index i — 1 whose sum = 0
- prev_count_1, the count of subarrays until index i — 1 whose sum > 1
We also have two states for the current step
- current_count_0, the count of subarrays up until index i whose sum = 0
- current_count_1, the count of subarrays up until index i whose sum = 0
What we can see is
- if nums[i] == 1, since the current element contributes 1 to the sum, all previous subarrays with sum >1 still have sum > 1 with 1 added, so we add the count. Additionally, we can add 1 to all previous subarrays with sum = 0 and make it a qualifying subarray with sum = 1 > 0. The overall transition formula is current_count_1 = prev_count_1 + prev_count_0 + 1
- if nums[i] == 0, the current element contribute net negative 1 to the running sum. What this means is that out of all the previous subarrays whose sum > 1, some of the subarrays would now fail to qualify after adding the current element’s negative 1 contribution. And what is this count? Remember we add the current element to the running sum, and we know the count of that sum from sum_to_count[sum]. All the previous subarrays with sum_to_count[sum] as prefix with the current element added would have their sums go to 0 and disqualify. The formula is current_count_1 = current_count_1 + (previous_count_1 — sum_to_count[sum]).
At the end of each step, we move current_count_1 to previous_count_1 and current_count_0 to previous_count_0.
class Solution:
def subarraysWithMoreZerosThanOnes(self, nums: List[int]) -> int:
MOD = 1000000007
from collections import defaultdict
cur_sum = 0
prev_count_0 = prev_count_1 = 0
sum_to_count = defaultdict(int)
sum_to_count[0] = 1 # empty array has sum 0
ans = 0
for num in nums:
if num == 1:
cur_sum += 1
else:
cur_sum -= 1
if num == 1:
cur_count_1 = (prev_count_1 + prev_count_0 + 1) % MOD
elif num == 0:
cur_count_1 = (prev_count_1 - sum_to_count[cur_sum] + MOD) % MOD
prev_count_0, prev_count_1 = sum_to_count[cur_sum], cur_count_1
sum_to_count[cur_sum] = sum_to_count[cur_sum] + 1
ans = (ans + cur_count_1) % MOD
return ans