Category: computers

  • Java String is final and immutable. Yet, Java String contents can be modified.

    I wrote this article several years ago, and the intended JVM target was Java 8. Since then, the JVM has advanced in many areas, so the code in the article may have to be modified in order to work. However, the code is for demonstration purposes only, and the idea of the post is IMHO still as valid as it was when the original text was written.

    The Java ecosystem is vast, and a lot of things happen all the time. It is sometimes difficult to stay up to date and not to forget things one doesn’t use every day. To check myself, I read interview-question blogs and groups from time to time. Sometimes I encounter things that I forgot, sometimes I learn something new, and sometimes I see answers that, well, IMHO are imprecise, incomplete, or incorrect.

    Some time ago I encountered the question “Why is the Java String final and immutable?”. The suggested reasons were thread safety, performance, and security.
    I totally agree with the first reason: it would be a nightmare if anything as heavily used as String required some sort of synchronization for safe use in multi-threaded code.

    I’m not sure about the second reason. “Java Concurrency in Practice” mentions that immutable objects do have some performance advantage, but “Java Performance: The Definitive Guide” points out that this is simply not correct on a modern JVM. Both sources are considered very solid, and I don’t know JVM internals well enough to decide for myself. Perhaps there is no immediate advantage to immutability: the JIT generates the same code, and the interpreter works in the same manner. However, one case where immutability looks advantageous is garbage collection: containers of immutable objects are as “young” as their youngest member, so I can easily imagine situations in which the GC just skips an entire container of strings in a minor collection. And of course, fewer objects to scan means less time. Also, the String class has been around from the very beginning of Java, so perhaps at some point there WAS a performance penalty for not being immutable, and String was designed with this in mind. Then the JVM improved, but there was no reason to change the String class. So, let’s say I think this point is valid.

    But the third reason, that the String class was designed immutable and final for security, seems to me absolutely wrong. The example in the answer was loading a class by name: the name is a string, so if a hacker can change it just before the class loader loads the bytecode, there is a security breach. Well, IMHO the reasoning is flawed, because if a hacker can run arbitrary code in your JVM, you probably have bigger problems than the one string immutability is “protecting” you from. On the other hand, if you want to shoot yourself in the foot, who can prevent you from doing it? After all, it is YOUR JVM.

    With this feeling, I asked myself a question arising from the “security hypothesis”: does the immutable design of String really mean that the contents of a String object can’t be replaced after the string is created? The answer, as you can see below, is that of course they can.

    First of all, anything can be changed via JNI. Once your C code is called, you can break things in so many ways I can’t even count. So, let’s put this aside and modify the contents of a string using only pure Java. It’s not hard to do. Below is a proof of concept:

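    import java.lang.reflect.Field;

    public class StringModifier {
        public static void main(String[] args) {
            try {
                String test = "aaaa";
                String test2 = test;                             // alias of the same object
                String test3 = new String(test);                 // shares the backing array on Java 8
                String test4 = new String(test.toCharArray());   // gets its own copy of the array
                // On Java 8, String keeps its characters in a private final char[] field named "value".
                Field values = String.class.getDeclaredField("value");
                values.setAccessible(true);
                char[] ref = (char[]) values.get(test);
                ref[0] = 'b';

                System.out.println(test + " " + test2 + " " + test3 + " " + test4);
            } catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException ex) {
                // Do not swallow reflection failures silently.
                ex.printStackTrace();
            }
        }
    }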
    

    The output is:

    baaa baaa baaa aaaa
    

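    So it takes only four lines of code to modify the contents of a string. It makes no difference whether the string lives in the constant pool or was allocated on the heap: all aliases “see” the change in the referenced object. On Java 8, test3 changes as well because new String(test) shares the character array of the original, while test4 stays intact because toCharArray() returns a copy.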
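
    The sharing behind that output can be reproduced without reflection. The following is only a toy model: ToyString and overwriteFirstChar are hypothetical names invented for this sketch, not the real java.lang.String. It assumes the Java 8 behaviour in which the copy constructor reuses the original's backing array, while the char[] constructor and toCharArray() copy it.

```java
import java.util.Arrays;

// Toy stand-in for java.lang.String on Java 8; NOT the real class.
final class ToyString {
    private final char[] value;

    // Like String(char[]): copies the caller's array.
    ToyString(char[] chars) {
        this.value = Arrays.copyOf(chars, chars.length);
    }

    // Like String(String) on Java 8: shares the original's backing array.
    ToyString(ToyString original) {
        this.value = original.value;
    }

    // Like toCharArray(): hands out a copy, never the internal array.
    char[] toCharArray() {
        return value.clone();
    }

    // Stands in for the reflective write in the proof of concept.
    void overwriteFirstChar(char c) {
        value[0] = c;
    }

    @Override
    public String toString() {
        return new String(value);
    }
}

class ToyStringDemo {
    public static void main(String[] args) {
        ToyString test = new ToyString("aaaa".toCharArray());
        ToyString test2 = test;                              // same object
        ToyString test3 = new ToyString(test);               // shared array
        ToyString test4 = new ToyString(test.toCharArray()); // private copy
        test.overwriteFirstChar('b');
        System.out.println(test + " " + test2 + " " + test3 + " " + test4);
        // prints: baaa baaa baaa aaaa
    }
}
```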

    Several questions arise from this “discovery”:

    Is it possible to use this? I think sometimes yes. For example, there are situations in which sensitive information is stored in strings. It is most definitely bad practice, exactly because the information stays around after it is no longer needed. Only the garbage collector decides when the memory of an unreferenced object is reclaimed, so your precious info can perhaps be found a little longer than you think. But some APIs require passwords as strings in method arguments, sometimes parameters are passed from the environment, etc. In such cases, where using a string is imposed, one can clean up sensitive data by modifying the string contents. There are probably other scenarios in which it is desirable to modify a string.
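
    Where the API does leave a choice, no reflection is needed: a secret kept in a char[] can be overwritten with plain Java (this is why, for example, java.io.Console.readPassword() returns char[]). A minimal sketch of that alternative follows; authenticate and wipe are hypothetical helpers made up for the illustration.

```java
import java.util.Arrays;

class PasswordWipeDemo {
    // Hypothetical stand-in for an API that accepts the secret as char[].
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Overwrites every character so the secret no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        try {
            System.out.println("authenticated: " + authenticate(password));
        } finally {
            wipe(password); // runs even if authenticate() throws
        }
        System.out.println("first char code after wipe: " + (int) password[0]);
    }
}
```

    The wipe only clears this one array; copies made elsewhere (for instance by converting the array to a String) are not affected.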

    Another note is about the scope of the code: as of today, AFAIK, there is a discussion about changing the way String stores its value. A lot of string usage scenarios don’t require the full range of Unicode, so it is suggested to use one byte per character in those situations. It is quite possible that this change to the String class will be introduced in Java 9, so the above code won’t work as is. But it is probably not too hard to adapt it to the new design.

    The string can be modified, so how do those changes manifest in a multi-threaded environment? After all, the most important reason (IMHO) for the immutability of strings is correct and predictable behaviour in multi-threaded code. Is it possible to observe different values of the same string object in different threads? Well, I believe it is. If one doesn’t operate within the “normal” API, one shouldn’t expect the regular JVM guarantees to hold. I tried to play a bit with concurrent access to and modification of strings by several threads, but wasn’t able to reproduce the discrepancy. I believe it is just me not writing good enough tests.

    That’s all for my thoughts on modifying immutable strings in Java. Hope you enjoyed the reading.

  • What does it take to make a Java class immutable? And what does it mean, actually?

    One can find a lot of publications, blog posts, articles, etc. about Java class immutability, sometimes suggesting conflicting requirements for a class to be immutable. In this post I’ll try to clarify some points by looking at immutability from the JVM’s point of view. I intentionally leave out the importance of immutable objects for system design or code optimisation and concentrate only on the multi-threading aspects of immutability.

    Let’s informally say that a class is immutable if the internal state of its objects can’t be affected after object construction has finished.
    That includes the following:

    • there are no methods in the class interface (including methods inherited from superclasses) that directly or indirectly change the internal state depending on their arguments
    • there are no instance fields that can be directly modified from outside
    • if the class has a field containing a reference to an object, then either the referenced object is immutable itself or the reference can’t be obtained from outside the class
    • references to the object can’t be obtained outside the class before the constructor finishes

    The Java memory model assures two things if the object is immutable:

    1. one can access the object from any thread without synchronisation
    2. objects of an immutable class don’t require safe publication (a reference is safely published by initialising it from a static initializer, by storing it into a volatile field, a final field, or an AtomicReference, or by storing it into a variable guarded by an intrinsic lock)

    Let’s look at the first item:

    To understand it, let me remind the reader why a discrepancy is possible at all. To be efficient, the JVM may let different threads work with their own copies of the same object’s data. On modern multi-core hardware every core has its own registers and caches, and access to shared memory is comparatively slow. Threads write to their copies of the object, and at some point the changes are propagated to other threads. Without the JVM and the compiler taking special care, it is possible that one thread doesn’t observe changes made by others or, even worse, observes only partial changes. The Java memory model specifies the conditions under which copies are guaranteed to be up to date.

    For a particular kind of object, called effectively immutable, the JMM guarantees that any thread observes the correct values of the object’s fields. An object is effectively immutable if it is fully constructed by the time any thread accesses it, and its state doesn’t change after construction.

    Let’s see why this happens. After the object has been constructed, any thread that accesses it for the first time copies the state of the object to its local memory and then works with this copy. But if the object never changes, all local copies contain identical information. That is the reason why no synchronization is needed: the local copy is always up to date. Note that the JVM doesn’t do anything special; it is not even aware of the “immutability” of the object.

    With this in mind, one easily understands why the reference must not escape before the constructor finishes for the class to be effectively immutable. If the constructor hasn’t finished, there is no guarantee that a newly obtained local copy doesn’t contain only partially initialized data.
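
    The danger of an escaping reference can be shown even without a second thread. In the sketch below, LeakyPoint and Listener are hypothetical names made up for the illustration; the constructor hands out this before its final field is assigned, so the observer sees the default value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, only for this sketch.
interface Listener {
    void onCreated(LeakyPoint p);
}

final class LeakyPoint {
    private final int x;

    LeakyPoint(int x, Listener listener) {
        listener.onCreated(this); // BUG: 'this' escapes before x is assigned
        this.x = x;
    }

    int getX() {
        return x;
    }
}

class EscapeDemo {
    // Returns the value of x that the listener saw during construction.
    static int observedDuringConstruction(int x) {
        List<Integer> seen = new ArrayList<>();
        new LeakyPoint(x, p -> seen.add(p.getX()));
        return seen.get(0);
    }

    public static void main(String[] args) {
        System.out.println(observedDuringConstruction(42)); // prints 0, not 42
    }
}
```

    With several threads the situation is only worse, because the observer may run at any moment between the escape and the end of the constructor.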

    For an immutable class there is no way to modify the local copy of the data, by the definition of immutability. Hence an immutable class is always effectively immutable. So the copies of the object maintained by all threads are identical, and this comes “for free” from the JVM’s point of view.

    Now the second item. An immutable object doesn’t require safe publication, which means that the JVM itself must ensure that a thread obtaining the object reference sees the object correctly (e.g. by not applying certain access optimisations that it is allowed to perform on “regular” object references). The JVM does additional work for this, so it must decide whether to apply the additional guarantees and prevent some optimisations or not.

    Many places on the net state that a class must be final in order to be immutable, but this is not true. For example, BigInteger is immutable and not final, as one can see from its documentation. Once again, if one thinks about the matter from the JVM’s “point of view”, the absence of this requirement is quite clear. When an object is constructed, the JVM always knows its exact class. It doesn’t matter that there may be subclasses that are not immutable. At the point of construction the compiler knows whether the object is immutable or not. At some other points in the code it can’t determine this for sure, e.g. if the argument of a method is of such an immutable but non-final class. In this case the compiler can’t assume that the object is indeed immutable and can’t perform the optimisations that would be possible if the class were final. But construction is not such a case, and the compiler is able to produce code accordingly (or perhaps the JVM can take this into account).
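
    Both halves of this argument are easy to check. The sketch below first asks reflection whether BigInteger and String are final, and then defines SneakyBigInteger, a hypothetical subclass invented for this example, to show why code that receives a BigInteger argument cannot blindly assume immutability.

```java
import java.lang.reflect.Modifier;
import java.math.BigInteger;

// Hypothetical subclass: it compiles only because BigInteger is not final.
class SneakyBigInteger extends BigInteger {
    private int counter; // mutable state added by the subclass

    SneakyBigInteger(String val) {
        super(val);
    }

    @Override
    public int intValue() {
        return super.intValue() + counter++; // result changes between calls
    }
}

class FinalCheckDemo {
    static boolean isFinalClass(Class<?> c) {
        return Modifier.isFinal(c.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isFinalClass(BigInteger.class)); // false
        System.out.println(isFinalClass(String.class));     // true

        BigInteger n = new SneakyBigInteger("10");
        System.out.println(n.intValue() + " " + n.intValue()); // prints: 10 11
    }
}
```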

    An immutable class doesn’t have to be final, but what rules does one have to follow so that the class is recognized as immutable? Once again, from the JVM’s point of view, immutability means that there is no way to affect the state of objects of the class after construction. That is, the contents of the memory representing the object and all the objects it references can’t be changed.

    This can be achieved by:

    • making all non-private fields final
    • ensuring there is no method in the class that changes the object’s fields directly or indirectly
    • ensuring that none of the referenced objects can be modified from outside, e.g. by making sure they are immutable themselves
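
    As a positive counterpart to these rules, here is a minimal sketch of a class that follows all three. Playlist and its members are hypothetical names chosen for the illustration; the class is deliberately not declared final, in line with the argument above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Playlist {
    private final String name;         // String is itself immutable
    private final List<String> tracks; // never handed out in modifiable form

    Playlist(String name, List<String> tracks) {
        this.name = name;
        // Copy first, so later changes to the caller's list do not leak in,
        // then wrap, so that the getter cannot be used to modify the copy.
        this.tracks = Collections.unmodifiableList(new ArrayList<>(tracks));
    }

    String getName() {
        return name;
    }

    List<String> getTracks() {
        return tracks;
    }

    // A "modification" returns a new object instead of changing this one.
    Playlist withTrack(String track) {
        List<String> copy = new ArrayList<>(tracks);
        copy.add(track);
        return new Playlist(name, copy);
    }
}

class PlaylistDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        Playlist p = new Playlist("mix", source);
        source.add("c"); // does not affect p
        System.out.println(p.getTracks().size() + " " + p.withTrack("c").getTracks().size());
        // prints: 2 3
    }
}
```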

    Let’s take a look at some examples to get a taste of what can’t be done if one wants to make a class immutable:

    public class Mutable1 {
        // ...
        int samePackageAccessible;   // package-private and not final
        // ...
    }
    

    Even though the member is not public, the possibility of modification (by code in other classes of the same package) disqualifies the class from being immutable. Note that it doesn’t matter whether other classes of the package actually modify samePackageAccessible at the time of compilation. Such code can be added later.

    public class Mutable2 {

        private int mutable;

        private void someMethod(int newVal) {
            // ...
            mutable = newVal;
            // ...
        }

        public void someOtherMethod(int arg) {
            // ...
            if (arg > 0)                // some condition
                someMethod(arg + 1);    // a value that depends on the arguments of someOtherMethod
            // ...
        }
    }
    
    

    The field mutable is not accessible directly from outside, but it can be changed by calling someOtherMethod.

    public class MutableClass {
        int mutable;
    }

    public class Mutable3 {
        public final MutableClass myFinal = new MutableClass();
        // ...
    }
    
    

    Mutable3 is disqualified from being immutable even though all its non-private members are final, because one can change its state by modifying the object that myFinal refers to:

    Mutable3 myObj = new Mutable3();
    myObj.myFinal.mutable = ...

    Making the reference private is not enough either:

    public class MutableClass {
        int mutable;
    }

    public class Mutable4 {
        private final MutableClass myPrivate;

        // there are no methods that modify myPrivate

        public Mutable4(MutableClass myPrivate) {
            // ...
            this.myPrivate = myPrivate;
        }
    }
    
    

    Mutable4 is disqualified from being immutable, because an outside object may retain a reference to the myPrivate object passed to the constructor and can modify its state.
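
    The leak, and one common way to close it, can be sketched as follows. Holder plays the role of MutableClass, Leaky4 has the same shape as Mutable4, and Fixed4 takes a defensive copy in its constructor; all three names are hypothetical and exist only in this example.

```java
class Holder { // plays the role of MutableClass
    int mutable;

    Holder(int mutable) {
        this.mutable = mutable;
    }
}

class Leaky4 { // same shape as Mutable4: keeps the caller's object
    private final Holder myPrivate;

    Leaky4(Holder myPrivate) {
        this.myPrivate = myPrivate;
    }

    int value() {
        return myPrivate.mutable;
    }
}

class Fixed4 { // keeps a private copy instead
    private final Holder myPrivate;

    Fixed4(Holder myPrivate) {
        this.myPrivate = new Holder(myPrivate.mutable); // defensive copy
    }

    int value() {
        return myPrivate.mutable;
    }
}

class DefensiveCopyDemo {
    public static void main(String[] args) {
        Holder shared = new Holder(1);
        Leaky4 leaky = new Leaky4(shared);
        Fixed4 fixed = new Fixed4(shared);
        shared.mutable = 99; // the outside code still holds the reference
        System.out.println(leaky.value() + " " + fixed.value()); // prints: 99 1
    }
}
```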

    import java.util.Random;

    public class Mutable5 {
        private Random r = new Random();
        private int mutable;

        private void someMethod() {
            mutable = r.nextInt();
        }

        // ...
        public void someOtherMethod() {
            // ...
            someMethod();
            // ...
        }
    }
    
    

    Mutable5 is disqualified from being immutable even though the value of mutable doesn’t depend on any input from outside the class. Calling someOtherMethod still changes the internal state of the object.

    At this point it seems that an immutable class just can’t contain mutable fields: all fields must be final, and all referenced objects must be immutable.

    Believe it or not, an immutable object can contain mutable fields that change with time! But every access to such a field from outside must yield the same result. If this can be assured, the class is still immutable.

    Let me explain with an example. String is an immutable class. But for performance reasons it doesn’t compute its hash code at construction time. It contains the field

    private int hash; // defaults to 0
    

    and

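    public int hashCode() {
        int h = hash;   // read the field once, then work on the local copy
        if (h == 0 && value.length > 0) {
            // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1],
            // where s is the array of chars representing the string (the field "value"),
            // n is its length, and ^ denotes exponentiation.
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return h;
    }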
    
    

    What is extremely important is that the computation of the hash depends only on the final array of characters, which doesn’t change after the string is constructed. This way hash can take only two values: 0 and the result of a computation that depends only on unchangeable fields of the String object. Every caller of hashCode() therefore gets the value produced by a deterministic computation over the internal unchangeable data of the class, no matter which thread happened to store it first. Also, it is important that the type of the field is not long or double. Writes to non-volatile fields of those types are not guaranteed to be atomic; if hash were a long, there would be a possible data race in which the thread computing the hash code has written only half of the value when another thread reads this partial data. As one can see, pulling off the trick of having mutable fields in an immutable class requires subtle reasoning and is very error prone. Don’t do it unless you really have to.
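
    The same trick can be applied to one's own class. The sketch below uses Word, a hypothetical class written for this illustration, and follows the reasoning above: an int cache, a single read of the field into a local variable, and a computation that depends only on data that never changes.

```java
import java.util.Arrays;

final class Word {
    private final char[] letters; // private copy, never changed after construction
    private int hash;             // 0 means "not computed yet"

    Word(String s) {
        this.letters = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash; // single read of the mutable field
        if (h == 0) {
            for (char c : letters) {
                h = 31 * h + c;
            }
            hash = h; // racing threads can only ever store the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Word && Arrays.equals(letters, ((Word) o).letters);
    }
}

class WordDemo {
    public static void main(String[] args) {
        Word w = new Word("hello");
        // Same formula as the one documented for String.hashCode()
        System.out.println(w.hashCode() == "hello".hashCode()); // prints: true
    }
}
```

    If the computed value happens to be 0, it is simply recomputed on every call, which is slower but still correct.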

    So, let me summarize the post:

    • an immutable class can define public fields, as long as they are final and refer to immutable objects
    • an immutable class is not required to be final
    • if there is any possibility of changing the internal state, even the state of objects referenced by objects of the class, or by calling a method that calls a method that calls a method changing the internal state, the class is not immutable. This holds even if at compile time no other code calls this method or holds a reference to an object that is part of the internal state
    • an immutable class can even contain some private mutable fields, but the class must behave as if those fields always contained the same value. Creating an immutable class with such fields is very tricky and error prone