Saturday, September 12, 2009

Combine hashCode and equals

Moving further to see how hashCode and equals are combined in Java Code. Consider simple Java classes down which are Employee, EmployeeId and HashCodeTest. I will be using EmployeeId class Object as my key and Employee class Object will be used as values in HashCodeTest (which is mine main class) and creates a HashMap to store key Object (EmployeId) and value Object (Employee).

class Employee {
private String name;
public Employee(String name){
this.name = name;
}}

class EmployeeId {
private String id;
public EmployeeId(String id){
this.id = id;
}
public String toString(){
return id;
}}

public class HashCodeTest{
public static void main(String[] args) {
Map employees = new HashMap();
employees.put(new EmployeeId("111"), new Employee("Top Gun"));
employees.put(new EmployeeId("222"), new Employee("Reader")); // Line A
employees.put(new EmployeeId("333"), new Employee("Million Dollar baby"));
Employee emp = employees.get(new EmployeeId("222")); // Line B
System.out.println(emp); // Line C
}

When we run this program the output at Line C will result in "null". Reason-> EmployeeId added at Line A is not the same EmployeeId, because our put creates a new Key with data as "222" and when we call get it creates a different EmployeeId at Line C. There is no way to tell or identify heap that the Object which we have put at Line A is same as the Object we want to get at Line B. Both are different Objects on heap, and default equals implementation (from Object class) will check for "==" comparison.
To overcome this we have to implement equals in our Key Object class which is EmployeeId i.e. Object identity.

class EmployeeId{
/* All the above implementation remains same, Adding equals to EmployeeId */
@Override
public boolean equals(Object obj){
if(obj == null)
return false;
if(obj.getClass() != getClass()){
return false;
}
EmployeeId empId = (EmployeeId)obj;
if(this.id == empId.id){
return true;
}
return false;
}}

We did a great job by implementing equals and now our code will pass equals and the key Object which we have put at Line A and get we are calling at Line B both are same. Run our HashCodeTest program, surprisingly result at Line C is still "null". HashMap is not perfroming as expected, hashing our object and returing the same value in constant time complexity is not working. What's wrong ?
The problem is Hashing is not working, Object class default hashCode implementation returns 32- bit memroy address as integer. The Object key at Line A is hashed to a different bucket (memory address) and when we call get at Line B it passes equals check, but this Key Object points to a different bucket and there is no value associated with the new bucket, which again results in "null".
We should also implement hashCode in our EmployeeId class to overcome this problem. HashCode implementation will retrun a code which is used for identifying bucket index.

class EmployeeId{
/* All the above implementation remains same, Adding hashCode to EmployeeId */
@Override
public int hashCode(){
return id.hashCode();
}}

Now run our HashCodeTest, Line C will print "Reader". Awesome, Our Key object has passed Object Identity check and is hashed to proper bucket.

Conclusions :-
1. equals and hashCode are mandatory requirement for your Objects.
2. If two objects are equal, their hash code must return same value.
3. We can even return some constant value from hashCode implementation. But in the case where Object identity fails (2 Objects are not equal), this causes a bucket collision.

Purpose of hashCode

Have you ever thought how hashing is implemented in Java. When we call put(key, value) for storage and get(key) for reterival, this get and put API's are from HashMap/HashTable Java classes. Dive one more level, these are classical data structure, where main advantage of using them is will get O(1) or O(constant) time complexity during storage and retreival. So how it works, mapping (indexing) your key to a bucket (storage area) which stores value. Now how this will be done in an Object oriented programing language like Java where your keys can itself be object. While the Java language does not provide direct support for associative arrays -- arrays that can take object as an index. :). The presence of the hashCode() method in the root Object class clearly anticipates the use of HashMap/HashTable. This supports hashing directly in the object model, facilitates the development and use of hash-based containers.

Consider default hashCode() which is provided by Object class in Java, this method uses the 32-bit internal JVM address of the object as hashCode. But hashCode methods returns integer, on JVM it is stored as hex. Now consider HashMap as array of arrays (or combination of arrays and linked list) . When we store a key/value pair for a class a specific subarray id is utilized. Those arrays are pointed by these addresses which are in hex form and when we access any element JVM uses the same hashCode or hex to refer where the values are stored. In hashMap, hashTable keys are used to compute hashCode.

We can conclude that all objects are bind to a memory location on heap, so JVM will create 32-bit hashCode for all objects and returns to us in the form interger memory location.
We will see more by diving. Currently I will put some thoughts around, which will help to understand more why we have to overrride hashCode from Object class.

1) Is hashCode a unique value for all Objects. If its a memory address ?
2) What happens to the hashCode, in case garbage collection shifts or swaps memroy addresses.

HashCode is not a unique identifier, as there can be two objects of same class which are equal and they should return same hashCode. This kind of restriciton is forced when we Override hashCode and equals to support efficient insertion and efficient retreival. We can also return a constant value from hashCode implementaiton. We will see this by an example further.

If the object is moved in memory during garbage collection, hashCode stays constant, this default hashCode is not very useful since to lookup an object in a hashMap we need the exact same key Object by which the key/value pair orignally filled. Normally, when we go to look up we don't have original key Object itself, we have just have same data for a key.

So if we have key as String or Wrapper classes we don't have to implement hashCode, because hashCode implementation for these classes returns thier exact value in hashCode (check thier implementation to get a hang on it), If lookup (retreival) passes equals checks it will hash to correct bucket, where these values are stored.

Define now the purpose of hashCode:-
This method is used to get the hash value(an integer) of an object. What is the purpose of this hashcode/number ? This number is used to place/retrieve the object in its corresponding bucket (Used in hashing algorithms used by HashTable & Hashmap datastructures) .

What happens if hashcode () returns different value for the same object on the same run. Object will be placed in the hashtable and it is not possible to find it. Because hashcode of the key is used to place the object in the bucket. If while placing hashcode is 1 it will be placed in bucket_1. If it returns 2 while retrieval search for an object will be in bucket_2 instead of bucket_1 where it is actually placed. And hence search will result in empty(or null).

But this is not a problem between runs. First time object's hashcode would be 10 in the first run and 67 in the second run. This difference doesn't affect. But during same run the value should be always same.

Friday, September 11, 2009

Thought on equals

Consider this class and we will provide an equals method for it.

Class A{
String test;
Integer i;

public A(String str, Integer i){
this.test = str; this. i = i;
}

@Override
public boolean equals(Object obj) {
if (obj == null)return false;
if (obj instanceof A) {
A a= (A) obj;
if (this.test==a.test && this.i==a.i)
return true;
return false;
}
else
return false;
}}}

Now consider Object creation for A class

A a = new (new String("testing"), new Integer(1));
A a1 = new (new String("testing"), new Integer(1));

Both objects a and a1 have same data values for String test and Integer i now when we call

a==a1 --> Result is false which is correct because a and a1 are two references pointing to two different memory locations. And "==" checks for object references and not equality of contents, Its the responsibility of class equals method to declare 2 objects are equal when they have same content. So calling

a.equals(a1) --> This also returns false. Dangerously wrong. Here there are 2 culprits:-

1. The creation of Object

A a = new A(new String("test"), new Integer(1) );
A a1 = new A(new String("test"), new Integer(1) );

Both objects a and a1 creates new String and new Integer, which will never be equal when we call "==" (reference check). Hence result in a false comparison. This can be corrected by creating objects as:-

A a = new A("test",1);
A a1 = new A("test",1);

Now a.equals(a1) will result in true. So your caller should be aware, while creating Objects.

2. Equals should be Overriden to check contents, instead depending on the object/caller. Now modifying code so that it can work in both the case of object creation.

if (this.test.equals(a.test) && this.i.equals(a.i)).

now if we invoke a.equals(a1) will result in true, in either case. The reason that String class equals compare exact character sequence which is data content and Integer class equals compares the exact value in the implementation.

So calling equals on the primitive wrappers or String class will ensure that it will compare data sequences.

As part of good practice avoid using wrapper classes as member variable, use primitves for which "==" will check for code values, even if the object creation happens as new Integer(1) in constructor. This will again fall in the category of unboxing Integer to int.

Some thought on "=="

In terms of java objects, "==" comparison means two references pointing to the same object instance. Now consider Immutable classes (immutable :? Well immutable class is a class whose objects can't be modified once created or state can't be changed, they don't expose any setter methods). String and all wrapper classes are immutable in java (wrapper:? Well wrapper classes are Integer for int, Float for float, Double for double etc ).

Some code snippet :-

String i = "test"; String i1 = "test"; String j = new ("test"); String j1 = new ("test");

Now compare:-

1) i==i1 2) j==j1 3) i==j.

1- is true, 2 and 3 are false. As String is a class in java, so when we create a String it will create a String variable which points to "test". So i and i1 should be 2 different references and would have returned false. Here's the reason when we create i="test", this gets created on method area and when we initalize another i1="test", before creating a new memory allocation on method area content checking is done to identify wether "test" String exist. If a match is found i1 points to the same "test" to which i was pointing too, resulting i==i1, true.

When we create the same String j = new String("test"). This creates a new object and memory allocation happens on heap. So if we compare j==j1, which are two different memory location results in false.

Considering the same scenario for wrapper classes, I am taking int and Integer

int i=1; int i1=1; Integer j = new Integer(1); Integer j1= new Integer(1);

Now Compare :-

1) i==i1 2) j==j1 3) i==j

Here 1,3 are true and 2 is false. Reason for having 2 is false because we have created two new Objects and comparing thier references which points to two different memory locations.

Now int is a prmitive type and is not a classs. For comparing two primitives is same as comparing two values, but comparing primitives and wrapper class type it return true and that's the catch for autoboxing and unboxing. Converting int to an Integer is called autoboxing and Integer to an int is called unboxing. This is implemented in Integer class something called as IntegerCache to support the object identity semantics of autoboxing for values.