The Curious Case Of Strings In Java
The following extract from the String class (as it appears in JDK 8 and earlier) shows the declaration of the attribute which holds the String's value:
```java
/** The value is used for character storage. */
private final char value[];
```
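This char array is private and final, and no method of the String class ever modifies its contents, so once a String object is assigned a value, that value can never be changed. (The final keyword alone only prevents the field from being reassigned to another array; the immutability comes from the class never exposing or altering the array.) However, the reference variables that point to a String object can be reassigned, which means we can point to another String object using the same reference.
Let's look at the following code, which explains the above concept: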
```java
public class TestStrings {
    public static void main(String[] args) {
        String s1 = "MARY"; // creates a String object "MARY" and makes the reference variable s1 point to it
        s1.concat(" ROCKS"); // looks like s1 is now "MARY ROCKS"; let's see what happens when we print s1
        System.out.println(s1); // prints only "MARY": concat() did not change the String, it returned a new one
        // A String never modifies its internal 'char[] value' after construction.
        // By the way, where did "MARY ROCKS" go? It is lost, as we don't hold a reference to it. So what should be done?
        s1 = s1.concat(" ROCKS"); // make s1 point to the String returned by concat()
        System.out.println(s1); // now it prints "MARY ROCKS"
    }
}
```
The above example makes us curious about the impact on memory usage of creating so many redundant String objects. The JVM handles this by designating an area of memory as a 'String constant pool'. Whenever a String literal is encountered, the JVM first checks this pool to see whether an equal String already exists there. If it does, the new reference is made to point to that same String object, and no new one is created in the pool.
But now that two references point to the same String, what if we manipulate the value of the String through one of them?
```java
public class TestStrings {
    public static void main(String[] args) {
        String s1 = "MARY";
        String s2 = "MARY"; // only one "MARY" object in the pool, with two references, s1 and s2, pointing to it. How do I know? Let's check.
        System.out.println(s1 == s2); // prints true, which confirms our understanding
        s2.replace('A', 'a'); // looks like s2 changed the pooled "MARY" to "MaRY". What happens to s1, which doesn't know about the change?
        System.out.println("s1=" + s1); // prints s1=MARY. How come? Didn't s2 change the value of the String?
        System.out.println("s2=" + s2); // prints s2=MARY. Where did "MaRY" go? Why is s2 still printing "MARY"?
    }
}
```
We cannot, because Strings are immutable. When the literal "MARY" was assigned to s1, an object was created in the String constant pool and the reference s1 was made to refer to it. When the same literal "MARY" appeared again in the declaration of s2, the JVM checked the String constant pool and found that it already exists, so it just made the reference s2 point to that same object. The call to replace() did not touch that object; it returned a new String "MaRY", which was lost because we never stored a reference to it. If String were not immutable, code holding s1 could change the value of the String at any time, and s2, which points to the same object and expects it to contain "MARY", would be in for a shock when at some point it tried to use that value. So immutability is what makes the String pool safe, and the pool in turn makes for efficient use of memory.
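To round off the example above, here is a minimal sketch of what happens when the result of replace() is actually kept. The class name PoolSharingDemo and the helper replaceAndReassign are made up for this illustration. Only s2 is redirected to the new String, while s1 keeps pointing at the pooled "MARY".

```java
public class PoolSharingDemo {

    // Hypothetical helper: returns {s1, s2} after s2 is reassigned to the result of replace().
    static String[] replaceAndReassign() {
        String s1 = "MARY";
        String s2 = "MARY";          // same pooled object as s1
        s2 = s2.replace('A', 'a');   // replace() returns a new String; s2 now refers to it
        return new String[] { s1, s2 };
    }

    public static void main(String[] args) {
        String[] result = replaceAndReassign();
        System.out.println("s1=" + result[0]);      // s1=MARY
        System.out.println("s2=" + result[1]);      // s2=MaRY
        System.out.println(result[0] == result[1]); // false, two different objects now
    }
}
```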
OK, but what if I extend the String class and make it mutable? No, we can't. The String class has been made final to preserve its immutability.
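If you want to check this for yourself, the following small sketch asks the class for its modifiers through reflection. The class name FinalStringCheck and the helper stringIsFinal are illustrative names, not part of the JDK.

```java
import java.lang.reflect.Modifier;

public class FinalStringCheck {

    // Hypothetical helper: reports whether java.lang.String is declared final.
    static boolean stringIsFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(stringIsFinal()); // true
        // class MutableString extends String {}  // uncommenting this line gives a compile error
    }
}
```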
How about the following then? This should also create only one String object in the pool, right?
```java
public class TestStrings {
    public static void main(String[] args) {
        String s1 = "MARY";
        String s2 = new String("MARY"); // so this should have created one object in the pool, with both references pointing to it? Let's see.
        System.out.println(s1 == s2); // oh no, this prints false: they are not pointing to the same object
        System.out.println(s1.equals(s2)); // this one compares the contents, so it prints true
    }
}
```
In the above example, the literal "MARY" assigned to s1 is created in the pool, and the s1 reference points to it. In the next statement, the argument passed to the constructor is the same String literal, so the JVM checks whether it exists in the pool. Yes, it's there, so it doesn't create a new one. The new operator then creates a new String object in the same way as any other object is created on the heap. Mind you, it's not in the String constant pool; it's in the normal area of memory. The new object's value field is made to point to the same char array as the value field of the argument. So, even though the contents of the two Strings are equal, they are not the same object, as the == operator shows.
The two references s1 and s2 point to different objects, so their hashCode() values should be different, shouldn't they? Let's confirm with the following piece of code.
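As a side note, the pooled object can still be reached from the heap copy through String's intern() method, which returns the pool's canonical instance for the given contents. The sketch below is only an illustration; the class name InternDemo and its two helper methods are invented for it.

```java
public class InternDemo {

    // Hypothetical helper: compares the literal with the heap copy created by new.
    static boolean sameObjectBeforeIntern() {
        String s1 = "MARY";
        String s2 = new String("MARY"); // separate object outside the pool
        return s1 == s2;
    }

    // Hypothetical helper: compares the literal with the interned version of the heap copy.
    static boolean sameObjectAfterIntern() {
        String s1 = "MARY";
        String s2 = new String("MARY");
        String s3 = s2.intern();        // returns the pooled "MARY"
        return s1 == s3;
    }

    public static void main(String[] args) {
        System.out.println(sameObjectBeforeIntern()); // false
        System.out.println(sameObjectAfterIntern());  // true
    }
}
```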
```java
public class TestStrings {
    public static void main(String[] args) {
        String s1 = "MARY";
        String s2 = new String("MARY"); // two different objects, so they should have different hashCodes, right? Let's see.
        System.out.println(s1.hashCode()); // prints 2359003
        System.out.println(s2.hashCode()); // prints 2359003, so the hashCodes are the same. How come?
    }
}
```
Oops, both have the same hashCode. How come? Let's see what happens when we construct a new String object by passing another String as an argument to the constructor.
```java
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
```
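So both the value and the hash attributes get the same content as the original String, and hence the two objects report the same hashCode. Copying the cached hash is only a shortcut, though: String overrides hashCode() to compute it from the characters, so any two Strings with equal contents have the same hashCode no matter how they were created. Want to see how the hashCode for a String is calculated?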
```java
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
```
The local variable val simply refers to the same char array as value; nothing is copied. The loop then performs the following computation: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. The result is stored in the hash field, so a non-zero hashCode is calculated only once per String.
Because of this, the hashCode of the String "ARMY" is different from that of "MARY": the order of the characters plays a role in the calculation.
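The formula can be worked through by hand. For "MARY" the characters are M=77, A=65, R=82, Y=89, so the loop gives 77, then 77*31+65 = 2452, then 2452*31+82 = 76094, and finally 76094*31+89 = 2359003. The sketch below repeats that calculation in code. The class name StringHashDemo and the helpers manualHash and maryFromChars are made up for this illustration; the second helper builds the String from a char array, so no cached hash is copied from another String.

```java
public class StringHashDemo {

    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1] with the same loop shape as String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int arithmetic, so long strings simply overflow
        }
        return h;
    }

    // Hypothetical helper: builds "MARY" from a char array instead of from another String.
    static String maryFromChars() {
        return new String(new char[] { 'M', 'A', 'R', 'Y' });
    }

    public static void main(String[] args) {
        System.out.println(manualHash("MARY")); // 2359003
        System.out.println(manualHash("ARMY")); // 2017693
        System.out.println(manualHash("MARY") == "MARY".hashCode());         // true
        System.out.println(maryFromChars().hashCode() == "MARY".hashCode()); // true
    }
}
```

Same letters, different order, different hashCode; same contents, same hashCode, whichever constructor was used.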
Well, this turned out to be a pretty long article. So, wrapping it up now. See you next time with another one.