Has anyone faced this issue and found out how to adjust the limit via ZipSecureFile.setMinInflateRatio() in the Unstructured Stage to fix the "Zip bomb detected!" error?
We have been using the Unstructured Stage to process regular MS Excel files for a while without issues, but recently, while processing bigger files, we ran into the error below.
UnStr_MinorSubs_Xls,0: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:65)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:601)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:174)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:249)
at com.ibm.is.cc.unstructured.poi.SSFile.readFile(SSFile.java:146)
at com.ibm.is.cc.unstructured.api.ExcelFile.readFile(ExcelFile.java:143)
at com.ibm.is.cc.unstructured.api.ExcelFile.readFile(ExcelFile.java:127)
at com.ibm.is.cc.unstructured.runtime.impl.excel.read.ExcelFileHandler.initWorkbook(ExcelFileHandler.java:183)
at com.ibm.is.cc.unstructured.runtime.impl.excel.read.ExcelFileHandler.nextFile(ExcelFileHandler.java:245)
at com.ibm.is.cc.unstructured.runtime.impl.excel.read.ExcelProcessorRep.process(ExcelProcessorRep.java:100)
at com.ibm.is.cc.unstructured.UnstructuredProcessor.process(UnstructuredProcessor.java:167)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run(CC_JavaAdapter.java:443)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:86)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:58)
at java.lang.reflect.Constructor.newInstance(Constructor.java:542)
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:56)
at org.apache.poi.POIXMLFactory.createDocumentPart(POIXMLFactory.java:62)
... 11 more
Caused by: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit. Counter: 821896, cis.counter: 8192, ratio: 0.009967197796314862 Limits: MIN_INFLATE_RATIO: 0.01
at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.advance(ZipSecureFile.java:258) ....
The error states that the limit can be adjusted with ZipSecureFile.setMinInflateRatio(0); but not where to make that call. I've tried setting custom properties on the stage, checked all the environment variables related to the stage, and tried various workarounds such as renaming the file to a regular zip file (Excel files are zip files under the hood), and I've run out of ideas. Any help would be greatly appreciated.
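For reference, the numbers in the message can be checked by hand. The sketch below uses only the JDK and reproduces the ratio from the two counters in the trace (compressed bytes read divided by expanded bytes produced), then compares it with the limit. The class and method names are made up for illustration and are not part of POI or the Unstructured Stage. In standalone POI code the static call ZipSecureFile.setMinInflateRatio(double) would have to run before the XSSFWorkbook is constructed; whether the stage offers any place to make that call is exactly the open question here.

```java
// Hypothetical helper, not part of Apache POI or DataStage.
// It mirrors the arithmetic reported in the "Zip bomb detected!" message.
final class InflateRatioCheck {

    // Ratio of compressed bytes read to expanded bytes produced.
    static double ratio(long compressedBytes, long expandedBytes) {
        return (double) compressedBytes / (double) expandedBytes;
    }

    // A file is flagged when its ratio falls below the configured minimum.
    static boolean isBelowLimit(long compressedBytes, long expandedBytes, double minInflateRatio) {
        return ratio(compressedBytes, expandedBytes) < minInflateRatio;
    }

    public static void main(String[] args) {
        long cisCounter = 8192L;   // "cis.counter" from the error message
        long counter = 821896L;    // "Counter" from the error message

        System.out.println("ratio = " + ratio(cisCounter, counter));
        System.out.println("flagged at limit 0.01: " + isBelowLimit(cisCounter, counter, 0.01));
        System.out.println("flagged at limit 0.0:  " + isBelowLimit(cisCounter, counter, 0.0));
    }
}
```

The ratio in the trace is only slightly below 0.01, so a marginally lower limit would already let this file through. Setting the limit to 0 disables the check entirely, which is worth weighing against the security risk the message describes.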
Thanks in advance!
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
Were you there? The video is probably all over the internet by now. I did pull the trigger and called them, and the fire department too. They checked all my drawers, the garbage can, and the hidden places, and they couldn't find the bomb... What they did find was the bottle of clear liquid that we add to the coffee only on Fridays, so now they know why people come to my desk so frequently.
I work for a global financial institution, so you can imagine what the drill was like around here... especially since it sounded like a threat!
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
It turned out to be a mix of numeric and alphabetic values in a column that was expecting only numeric values. There were also non-printable characters.
After the user cleaned up the data, the process finished normally. I'm still looking for an official answer from Big Blue on how to set the limit.
Thanks!
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses