Welcome

My name is Jason and I am a software developer in the Bay Area. For fun I like to hike, run, shoot some photos, and take advantage of the many other activities California state has to offer. To the right you will see my resume.

Recent Books
  • Head First Design Patterns
    Head First Design Patterns
    by Elisabeth Freeman, Eric Freeman, Bert Bates, Kathy Sierra, Elisabeth Robson
« Interview question: Find the most used character using a Hashtable | Main | Quick Fix: “Invalid number” error in DOS batch script »
Sunday
Jan012012

Interview question: Finding the most used ASCII character

This question has come up in a few interviews and there's 2 ways to solve it depending on if you're using ASCII or Unicode character sets. This first post is to example how to solve the problem for the ASCII character set. 

The problems goes like this: You're to make a method that will return the most used character in a given string. The character set of the string is ASCII. How would you go about doing this? 

Hearing this you know you'll have some type of FOR loop happening here as you move through the string counting each character. Now how do you plan on storing the count of each character? The easiest way would be to use some type of hash table where the character is the key and the value is the number of times the character has been found. 

Great! You've got a plan. But, there should be something nagging you. Why did they emphasizing that the string consists of ASCII characters? This is a hint on how they want you to solve the problem and they're testing your knowledge a bit here. In most languages a CHAR can be used as a INT and vice versa. This means that you can create a simple hash table using an INT[] where the position in the array, or “key”, you want is determined by the value of the current CHAR. So if you're currently looking an A character, whose INT value is 65, you would increment the 65th element of the INT[]. Any time you found the letter D, you would increment the 68th element of the INT[]. And so on. 

About here is where you should ask if this algorithm should be case sensitive or not. Either way, it shouldn't affect your implementation too much, but it just shows you're thinking ahead for test cases. You should also know how the language you're using handles the creation of INT[]. Does it zero out all the elements on creation? Or do you have to do this manually? Knowing things like this shows your mastery of the language. 

So this is what the algorithm looks like. Just to make is a little more interesting I made it case insensitive and only counting alphabetical characters. 

public static char FindMostUsedAsciiChar(final String givenString)
      throws IllegalArgumentException {

   // Verifying the string is not NULL.
   if (givenString == null) {
       throw new IllegalArgumentException("The parameter givenString cannot be null.");
   }

   // To reduce the size of the int[] that's storing the count information I'm only making it
   // 26 elements in size. This means that any time the key char is used that its value must
   // be adjusted accordingly.
   int charHash[] = new int[26];

   // Looping through the string and counting the number of times a alphabetical character
   // appears. We're doing a case insensitive search.
   final char givenChars[] = givenString.toUpperCase().toCharArray();
   for (int index = 0; index < givenChars.length; index++) {
      int key = givenChars[index];
      if (65 <= key && key <= 90) {
         charHash[key - 65]++;
      }
   }

   // Finding the first letter to have the highest use count by the order that they appeared.
   int mostUsedChar = 0;
   int highestCharCount = 0;
   for (int index = 0; index < givenChars.length; index++) {
      int key = givenChars[index];
      if (key < 65 || 90 < key) {
         continue;
      }
      int currentCount = charHash[key - 65];
      if (highestCharCount < currentCount) {
         mostUsedChar = key;
         highestCharCount = currentCount;
      }
   }

   // Done
   return (char)mostUsedChar;
}

Things to possibly take into consideration is how the most used character is found in this scenario. I'm returning the first largest value. So if the string used was “caca” with both 'a' and 'c' having the same number of instances, 'c' would be returned as the character with the most hits. Your interviewer may want the results returned by alphabetical order, meaning the result would be 'a' instead of 'c'. But generally speaking, they don't care. They're just seeing if you understand how your algorithm works. If this comes up it's for the purpose of how to test the method and the expected results in this scenario. 

Another question that may come up, or is at least good to bring to up to once again show your knowledge, is what Big O notation this performs at. Although we are accessing the INT[] in O(1) fashion, the bottle necks are the FOR loops. We do have 2 of them, but since they're not nested our algorithm is O(n). 

In the next post I'll explain how to implement this using the Unicode character set. 

Would you have written this any differently? Are there any possible optimizations that you would have added?

Full source code here.

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>