Zip: filenames with non-ASCII chars
ZipFile works unexpectedly since JDK 1.2 - Diagnosis and Solutions
Introduction
You can't get the contents of a ZipEntry whose name contains 'umlauts'
(precisely, any char > 127) with Java 1.2 - 1.4.
Symptoms
If you let ZipFile enumerate the entries, you get all the names, perhaps wrongly encoded, but you get them.
If you try to get an InputStream for such an entry
(with chars > 127 in the name), ZipFile returns null.
There is no way around it.
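The two symptoms can be checked together with a small program. The following is a minimal sketch and not part of the original page: the class name ZipNameCheck and the method unreachableNames are made up for illustration. It enumerates an archive and reports every name that ZipFile.getEntry() cannot resolve afterwards. It avoids generics so that it should also compile on the JDKs discussed here; on a JDK without the problem the list is expected to be empty.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, named for this example only.
class ZipNameCheck {

    /** Returns the names that entries() reports but getEntry() cannot find. */
    static List unreachableNames(File zip) throws IOException {
        List missing = new ArrayList();
        ZipFile zf = new ZipFile(zip);
        try {
            Enumeration en = zf.entries();
            while (en.hasMoreElements()) {
                String name = ((ZipEntry) en.nextElement()).getName();
                // Symptom: enumeration works, but lookup by name returns null.
                if (zf.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        } finally {
            zf.close();
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        List missing = unreachableNames(new File(args[0]));
        for (int i = 0; i < missing.size(); i++) {
            System.out.println("unreachable: " + missing.get(i));
        }
    }
}
```

Run it with the path of a zip file as the only argument; every printed name is an entry you can see but not open.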
Diagnosis
My first (wrong) idea, some years ago, was that Sun had
more than one idea of how to encode filenames in zips. An extra difficulty (which I saw on Windows): if you have names with chars encoded in ISO 8859-1 format (e.g. Info-ZIP creates those) and others encoded as UTF-8, those ZipEntries may have the same (Java String) names, so you can get a stream for only one of them.
Solutions
I have three; the first is the best.
Solution 1
Use the pure-Java zip classes, e.g. from
jazzlib.sourceforge.net. They are free to use in any
product, including non-GPL ones. Be aware of some bugs, and that there are differences from the JDK classes too.
Some differences: entries are enumerated in random order; the ZipEntry class handles some extra info.
Performance
I tested only decompression, which took around 2.5 times longer than the native implementation (JRE 1.4.1 on Windows, jazzlib 0.0.6). Get the test code here. I found jazzlib compiled natively with gcj another 2.5 times slower than jazzlib on JRE 1.4.1.
Mixed use
You can use the jazzlib classes to read the zip structure and the native zlib from the JDK to decompress. This way you get the best performance and the most thoroughly tested code. Here are the changes needed to use the native zlib: just insert some import statements and change the deprecated API call that converts the name (byte[]) to a String. Put the three files ZipConstants.java, ZipEntry.java and ZipFile.java in one folder (you may change package names) and add these imports:

    // ZipEntry.java
    import java.util.zip.ZipOutputStream;

    // ZipFile.java
    import java.util.zip.Inflater;
    import java.util.zip.InflaterInputStream;
    import java.util.zip.ZipException;
    import java.util.zip.ZipOutputStream;
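As a rough illustration of the decompression half of this mixed use: zip entries store raw deflate data without the zlib header, which the JDK's native Inflater handles when it is constructed with nowrap set to true. The sketch below is not taken from jazzlib or from the changes described above. The class RawInflate and its method names are assumptions for this example, and the charset passed to decodeName is the caller's choice, since the right one depends on the tool that wrote the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of jazzlib or the JDK.
class RawInflate {

    /** Produces raw deflate data (no zlib header), only to have sample input. */
    static byte[] deflateRaw(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses raw deflate data with the JDK's native zlib binding. */
    static byte[] inflateRaw(byte[] compressed) throws IOException {
        // The Inflater javadoc asks for one extra dummy byte when nowrap is true.
        byte[] padded = new byte[compressed.length + 1];
        System.arraycopy(compressed, 0, padded, 0, compressed.length);
        Inflater inflater = new Inflater(true); // true = nowrap, i.e. raw deflate
        InputStream in = new InflaterInputStream(new ByteArrayInputStream(padded), inflater);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Non-deprecated way to turn the stored name bytes into a String. */
    static String decodeName(byte[] rawName, String charsetName)
            throws UnsupportedEncodingException {
        return new String(rawName, charsetName);
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello zip".getBytes("US-ASCII");
        byte[] restored = inflateRaw(deflateRaw(original));
        System.out.println(new String(restored, "US-ASCII"));
    }
}
```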
Solution 2
When opening a zip, put all filenames into a hash map whose value is the 'index' of the native zip entry (look into the sources). Do this in the constructor. Then just override ZipFile.getEntry() and look up the entry in the hash map there, done! Oops, too much is private.
Recipe
Hacking ZipFile. You have to patch java.util.zip.ZipFile. The changes follow below. After compiling, put the resulting jar on the boot class path, e.g. 'java ... -Xbootclasspath/p:zippatch1.3.jar ...'. A special class loader would be nice, but loading java.* packages is forbidden.
Side effects
Performance may decrease with big archives. I am not aware of any bug that this introduces, but you never know.
Patch
Here are the three places to change. I tested it with 1.2 up to 1.4.

1. Add a hashtable/map (HashMap isn't synchronized)

    public class ZipFile implements ZipConstants {
        private java.util.Hashtable entries;

2. Insert this block at the end of

    public ZipFile(File file, int mode) throws IOException {  // JDK 1.3, 1.4
    public ZipFile(String name) throws IOException {          // JDK 1.2
        ...
        entries = new java.util.Hashtable((int)(this.total*1.1), 0.95f);
        for (int i = 0; i < total; i++) {
            long jzentry = getNextEntry(jzfile, i);
            if (jzentry == 0) throw new InternalError("jzentry == 0");
            ZipEntry ze = new ZipEntry(jzentry);
            entries.put(ze.name, new Integer(i));
            freeEntry(jzfile, jzentry); // not in jdk1.2.2
        }

3. Replace method getEntry(long, String)

    // private static native long getEntry(long jzfile, String name);
    private long getEntry(long jzfile, String name) {
        Integer i = (Integer)entries.get(name);
        if (i == null) return 0L;
        long jzentry = getNextEntry(jzfile, i.intValue());
        // as you like: if (jzentry == 0) {throw new InternalError("jzentry == 0");}
        return jzentry;
    }
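Stripped of the JDK internals, the idea behind the patch is a lookup table that is filled once when the archive is opened and consulted on every lookup by name. The following sketch shows only that idea; it is not the patch. NameIndex and its methods are hypothetical names, it works on a plain array of entry names instead of native zip entries, and it is written for a current JDK (generics, Integer.valueOf) rather than for 1.2 to 1.4.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical illustration of the name -> index table used by the patch.
class NameIndex {

    // Hashtable is synchronized, like the one chosen in the patch.
    private final Map<String, Integer> indexByName = new Hashtable<String, Integer>();

    /** Builds the table once, from the names in archive order. */
    NameIndex(String[] namesInArchiveOrder) {
        for (int i = 0; i < namesInArchiveOrder.length; i++) {
            // If two entries decode to the same name, the later one
            // replaces the earlier one and only it stays reachable.
            indexByName.put(namesInArchiveOrder[i], Integer.valueOf(i));
        }
    }

    /** Returns the position of the entry, or -1 if the name is unknown. */
    int indexOf(String name) {
        Integer i = indexByName.get(name);
        return i == null ? -1 : i.intValue();
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex(new String[] {"a.txt", "\u00e4.txt", "b.txt"});
        System.out.println(index.indexOf("\u00e4.txt"));
        System.out.println(index.indexOf("missing.txt"));
    }
}
```

The table costs one pass over all entries when the archive is opened, which is where the possible slowdown with big archives comes from.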
Solution 3
This is nearly the same as no. 2, but we don't change ZipFile: we extend it and get access to the private methods, fields and classes with reflection. So we can use one class for all JDKs and don't need the boot class path. It may not work if security is involved, so it surely doesn't work in an applet. ZipPatch.java
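The reflection part of this approach can be sketched independently of ZipFile. In the example below, Archive is a made-up stand-in for a class with a private method, and PrivateAccess.callLookup is a hypothetical helper; neither reflects the real member names inside java.util.zip.ZipFile, which differ between JDK versions. setAccessible(true) is the call that a restricted environment such as an applet may refuse.

```java
import java.lang.reflect.Method;

// Made-up stand-in for a library class whose useful method is private.
class Archive {
    private long lookup(String name) {
        return name.length();
    }
}

// Hypothetical helper showing the reflective call.
class PrivateAccess {

    /** Calls Archive.lookup(String) although it is declared private. */
    static long callLookup(Archive archive, String name) {
        try {
            Method m = Archive.class.getDeclaredMethod("lookup", String.class);
            // May throw SecurityException where reflection is restricted.
            m.setAccessible(true);
            Object result = m.invoke(archive, name);
            return ((Long) result).longValue();
        } catch (Exception e) {
            throw new IllegalStateException("reflective call failed: " + e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callLookup(new Archive(), "abc")); // prints 3
    }
}
```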
© July 2003 Peter Büttner