We all know they exist and they are a pain in the neck. The best way to learn from them is to let everyone around you know about them as well.
This one is for Java. The Set.add() javadoc states:
More formally, adds the specified element, o, to this set if this set contains no element e such that (o==null ? e==null : o.equals(e))
The gotcha here is that an implementation I am using: HashSet, will only consult the equals method if the two objects have the same hash code.
I’m using: org.apache.commons.lang.builder.EqualsBuilder and org.apache.commons.lang.builder.HashCodeBuilder to keep the hash code and equals code nice and clean.
1. Fuzz | April 26th, 2007 at 3:15 pm
You posted about Big Brother being your project… then a gotcha’s section…
So can we assume that the security issues they were having may have been your bug?
Nice site anyway
2. Zooba | April 26th, 2007 at 7:26 pm
A hash code is (intended as) a unique fingerprint for the contents, ie. no two different objects should have the same hash code. An easy optimisation is to compare hash codes for equality before doing a deep compare.
The only way this is a gotcha is if the equals method has side-effects. In this case, the gotcha is worse, because the equals method will be called for each item which already exists in the set.
Not convinced…
3. pimaster | April 26th, 2007 at 8:12 pm
Zooba, I was just trying to point out that the HashSet does not follow what the javadoc says it is meant to do.
I know that it is fair for the HashSet to check the hashcode first, but it would have been nice if it was documented.
Fuzz. I was part of the team that worked on BB. The security issue is not related to this gotcha. It was a small issue with the latest videos that are shown on the site. I also helped with the news section and the searching. I was also working on some of the caching which keeps the site snappy.
Unfortunately I don’t think I’m at liberty to talk about the security issue. Big problem, simple cause.
4. Zooba | April 26th, 2007 at 10:33 pm
Your sample of javadoc is from the Set class, not the HashSet class. The HashSet documentation for add simply says:
“Adds the specified element to this set if it is not already present.”
The small code snippet is simply a convenient way of explaining to programmers how the algorithm works without resorting to an overly verbose description. Personally, I think the only gotcha here is that the comment has been interpreted as the actual implementation.
5. pimaster | April 26th, 2007 at 10:50 pm
Isn’t that the definition of a gotcha, thinking the documentation reflects the implementation?
6. Zooba | April 26th, 2007 at 11:53 pm
It does reflect the implementation. The snippet “(o==null ? e==null : o.equals(e))” reflects the condition for an item being added is that there isn’t an item already in there which is equal to it. The fact that this condition is expressed in Java does not mean that the actual code looks like this:
[java,N]if(o==null ? e==null : o.equals(e))
{
// add item
}
That is a case of the documentation being the implementation. It is foolish to assume that documentation requires no interpretation. Even more so when the documentation belongs to an interface rather than the class in question.
7. Zooba | April 26th, 2007 at 11:56 pm
Well, apparently I don’t remember how to get syntax highlighting to work correctly. I also accidentally negated the action. Hopefully when the condition is true the action will be to stop searching through the set and return false. The rest of my comment is valid, it is only the comment in the code that is incorrect.