Using BucketSort to sort an array of colors

A common question, especially as warm-up question in technical interviews, is the following: given a sorted list of colored balls where some balls are black and some are white, rearrange the set in such a way to have the black balls ahead of the white ones. Once the candidate is done with it, a follow-up question comes with an introduction of a third color, say green. The image below exemplifies the before-after states for this question:


I've seen solutions using the following approaches:
1) Brute-force N^2-time (similar to BubbleSort)
2) Two-pointers (head and tail) solutions (which gets complicated with more colors)
3) Even a QuickSort-like solution (attempting to get it down to NLogN-time)

There is an easier solution that works well for this case, but there is one key characteristic for this problem: the order of the balls within a group with the same color is irrelevant. This is crucial in order to be able to use a bucket (or count) sort approach.

The approach will be the following:
A) Have a HashTable (HT). The max number of elements in the HT will be the number of colors, which should be small
B) Do one pass (O(n)-time) across all the colors and add the count of each color to the HT
C) Decide the order that you want to use to sort the balls (put that order in an array)
D) Go thru the array in (C) and using the count in the HT, assign the colors to the array

And that's it. The total time is O(n) and the extra space is actually constant assuming a small and fixed number of different colors. Code is below - cheers, Marcelo.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Text;
using System.Threading.Tasks;
using System.IO;

namespace LeetCode
{
    class Program
    {
        static void Main(string[] args)
        {
            Solution sol = new Solution();
            sol.SortColors(new char[] { 'R', 'G', 'B', 'B', 'R', 'G', 'B', 'R', 'R', 'G', 'B', 'R', 'B', 'G', 'G', 'B' });
        }
    }
    public class Solution
    {
        public void SortColors(char[] colors)
        {
            Hashtable buckets = new Hashtable();

            //1. Adding to the bucket while printing the "before" order
            Console.WriteLine("Before: ");
            for (int i = 0; i < colors.Length; i++)
            {
                Console.Write("{0} ", colors[i]);
                if (!buckets.ContainsKey(colors[i])) buckets.Add(colors[i], 0);
                buckets[colors[i]] = (int)buckets[colors[i]] + 1;
            }
            Console.WriteLine();

            //2. Set the order that you want to use for the sorting
            char[] sortingOrder = { 'R', 'G', 'B' };

            //3. Bucket-sort it
            int index = 0;
            foreach (char so in sortingOrder)
            {
                int count = 0;
                if (buckets.ContainsKey(so)) count = (int)buckets[so];
                for (int i = 0; i < count; i++)
                {
                    colors[index++] = so;
                }
            }

            //4. Print sorted array
            Console.WriteLine("After: ");
            for (int i = 0; i < colors.Length; i++)
            {
                Console.Write("{0} ", colors[i]);
            }
            Console.WriteLine();
        }
    }
}

Comments

  1. Nice - it's a variation of a famous national dutch flag problem, which is usually given to see an ability to understand and maintain program invariants, since its fastest implementation involves "splitting" an input array into 4 subranges - all red, all green, unexplored and all blue. It's not hard to implement, but requires attention to details, whereas bucket sort is super easy to implement making it a great solution to this problem.

    ReplyDelete
  2. FYI, this problem is also on leetcode - https://leetcode.com/problems/sort-colors/.
    Here is my naive implementation:

    class Solution {
    public:
    void sortColors(vector& nums) {
    int onesStart = 0;
    int twosStart = nums.size();
    for (int i = 0; i < twosStart; ) {
    if (nums[i] == 0) {
    swap(nums[onesStart++], nums[i++]);
    } else if (nums[i] == 1) {
    i += 1;
    } else {
    swap(nums[i], nums[--twosStart]);
    }
    }
    }
    };

    ReplyDelete

Post a Comment

Popular posts from this blog

Golang vs. C#: performance characteristics (simple case study)

Claude vs ChatGPT: A Coder's Perspective on LLM Performance

ProjectEuler Problem 719 (some hints, but no spoilers)