Internally Hashset maintains buckets as given in the following diagram
Internally hashset maintains an array...
and each element points to list of objects
bucket[0] or bucket1 --- contains list of objects
bucket[1] or bucket2 ---contains list of objects.
Go through below content and then you will understand better about the bucket
concept I am talking about.
Internally storing an object in to hashset involves following steps
1)Generate hashcode of the object
2)And find the bucket to which the object to be inserted with following formula
This is only an example to show how hashset internally works....
Formula:
hashcode mod no of buckets
Example if hashcode of object is 11 and no of buckets is 5
11 modulus operator 5 = 1
3)Suppose if the generated bucket no for the object is 5.Hashset will check in bucket 5,
if there is any duplicate object using equals method.
4)If the equals method returns true on any existing object that means there is similar object
already and the object won't be added to set
5)Otherwise object gets added to set.
Important Points to remember for the rest of the discussion:
-------------------------------------------------------------
1)If you don't override the hashcode every object generates unique hashcode.
2)If you don't override the equals method equals method checks if the hashcode
of the objects are equal or not and returns true or false based on hashcode.
We will see an example with Student class
class Student{
String sname;
public Student(String sname){
this.sname = sname;
}
public Student(int sid){
}
}
Suppose we want to store 5 student objects in to hashset.
There are 2 students with same sid 1 in the below example.As you know we never want
want to add duplicate students (students having same sid) in to set.
Only one student with sid1 should be stored in hashset.
Student st1 = new Student(1); --Assume hashcode is 6
The bucket to in to which this object need to be stored is
hashcode mod noofbuckets = 6/5 =1.Calculate the bucket for the rest
using the same formula
Student st2 = new Student(2) -- Assume hashcode is 7
7/5 = second bucket
Student st3= new Student(3)-- Assume hashcode is 8
8/5=3 bucket
Student st4= new Student(4) -- Assume hashcode is 9
9/5=4th bucket
Student st5 = new Student(1)-- Assume hashcode is11
11/5 = 1st bucket
Now remember the process that happens while adding an object in
to hashset and place in to buckets.
Bucket1
St1 need to be stored here as per above formula
Bucket2
st2 lands here
Bucket3
st3 lands here
Bucket4
st4 lands here
Now the interesting question comes ?
st5 need to be stored in to hashset or not.Ideally it should not get stored
because st5 has sid 1 and is same as st1's sid in bucket1.
But as per the rules to add we will try to see if hashset prevents adding duplicate
student with same sid 1.
As per the first rule generate hashcode for st5,assume hashcode generated is 11,
so its bucket is 11 mod 5 =1st bucket
Next we have to see in first bucket if there is any object which is equal to
st5 using equals method.As you know since we didn't override
the equals method equals method checks hashcode of the st5 and the
existing object s1 in the bucket one
As st1 hashcode is 6 and st5 hashcode is 11 as they are not equal
poor hashset allows insertion of.... st5 which is same as st1 object
Oops................ Now you might have understood the importance of equals method
So..... You have to override the equals method in student such that....when 2 students
are having same sid it returns true...If you don't override equals,default equals mehod checks
the hashcode which is unique per object even if student ids are same.
public boolean equals(Object obj) {
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final StudentTo other = (StudentTo) obj;
if (this.sid != other.sid) {
return false;
}
return true;
}
After adding equals method in to Student Class
Now we will try to add student5(new Student(1)) in to bucket1 and see if it allows or not
First step find hashcode of student5.We assumed it is 11.
Find the bucket using hashcode mod noofbuckets 11 mod 5 =1
Second step check the st5 with st1 using equals method to check if
any equal object exists already or not...
Now the overridden equals method checks if sid in st1 and st5 is equal or
not.As both st5 and st1 are having same sid 1,equals method returns true and
hashset rejects insertion of st5 object.
Conclusion about importance of equals method:
-------------------------------------------------
If equals method is not overriden.. hashset couldn't find the duplicates
with equals because default equals method checks hashcodes and as
hashcode is unique for every object, equals method cant' find duplicate.
So we have to override the equals method to check based on student
name not based on hashcode so that equals method can find if any
duplicate student object in the bucket
2)Steps involved to find an object from HashSet:
-------------------------------------------------------------
1)Calculate hashcode
2)Calculate bucket from hashcode using formula hashcode mod noofbuckets
3)Check if there is any duplicate already in that bucket using
equals method.
Now we will see importance of the hashcode:
---------------------------------------------
Now we will try to check if student with sid 1 contains in hashset.
Let us create a student with sid 1 and try to find using
contains....
Student findStudent = new Student(1);
As you know if you don't override the hashcode method,hashcode generated will be unique for
every objet.Assume the hashcode generated is 13.
boolean isExisting= hset.contains(findStudent);--This will not find existing student with sid 1
in bucket1 because of the following.
Now recollect the 3 steps to find the object from hash set
1)Find the hashcode of the object you are trying to find...
the hashcode of student is 13
2)Find the bucket using 13 mod 5 so bucket is 3
3)It will check if there is any student with sid 1exists in bucket3....
and it won't find.....
OOPSSSSSSSSSSSSS
Eventhough there is student with sid 1 exists in bucket1,hashset find
procedure is unable to find becasue... the hashcode generated for findStudent
is generated as 13 and it lead to wrong bucket 13 mod 5 = 3.
So you have to override the hashcode such that.........
whenever 2 objects are equal it should return same hashcode.
Now override the hashcode such that it returns the same hashcode for
students which have same sid
public int hashCode() {
return this.sid *6;
}
After adding above method
repeat the steps to find the object
Find the hashcode of the findStudent now it will return hashcode as
6 using overridden hashcode method 1*6 = 6
2)Calculate the bucket with formula hashcode mod noofbuckets
= 6 mod 5 = 1
3)Now hashset tries to find if any object is existing with sid 6
using equals method....
so.. findStudent.equals(st1)-- returns true because of overriden method
so now the object will be found succesfully..
So finally
If equals method is not overriden for equal objects hashset can't prevent duplicates
If you don't override hashcode properly for equal object 2 problems can occur
1)object won't be found even if it is existing.
2)duplicate objects can be added
because suppose student1 (new Student(1)) sid is added in to bucket1--
Assume hashcode is 6.
and if you are trying to add another duplicate student (new Student(1))
with sid same as first student as you know while inserting,
hashcode of the object is used to find the bucket in to which hashset need to store the object
and as you didn't override the hashcode for equal objects for the second object it will generate another hashcode say 13 ,instead of 6 so hashset checks if the object with sid 1 exists or not in
bucket2(13 mod 5) and couldn't find the duplicate student with sid in 3 and duplicate get added
If you override the hashcode for the second object the hashcode generated will be
1*6=6 so bucket will be mapped correctly to first bucket (6 mod 5=1) and as there is
student with sid 1 already in bucket1 hashset reject the insertion
3) Two non equal objects can be stored in the same bucket if the bucket number generated
is same.In the following example you see that student1 and student2 can be stored in
same bucket even if they are not equal.This is the reason hashset can't simply generate
the bucket value for the object you are finding and return the object.It has to find the
correct object using equals method.
For example:
new Student(1); -- hashcode is 6(sid*6)
bucket formula = 6 mod 5 =1
so It will be stored in
bucket1 if there are not students with sid 1.
new Student(6);
hashcode = 36(6*6)
bucket = 36 mod 5 =1
So it will be stored in bucket1 if there are no students with sid 1.
No comments:
Post a Comment