org.apache.dubbo.common.utils.IOUtils 这个类看起来像是一个公共类,但实际上里面有坑,不要在你自己的项目里面使用这个类!
该类有一个 write() 方法,内容如下:
public static long write(InputStream is, OutputStream os, int bufferSize) throws IOException {
int read;
long total = 0;
byte[] buff = new byte[bufferSize];
while (is.available() > 0) { // !!!
read = is.read(buff, 0, buff.length);
if (read > 0) {
os.write(buff, 0, read);
total += read;
}
}
return total;
}
其中的 while 循环使用了 InputStream.available() 的返回值来作为结束判断条件,这个方法在网络环境当中是不能作为读取结束的依据的,因为当网络传输出现延迟时,该方法立刻会返回 0,导致循环结束。测试代码如下:
@Test
public void testWriteURL() throws Exception {
String url = "https://pan.baidu.com/static/images/16new/bg2.jpg";
long expectedSize = 77_949L;
InputStream inputStream = new URL(url).openStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
long writeSize = org.apache.dubbo.common.utils.IOUtils.write(inputStream, bos);
assertThat(writeSize, equalTo(expectedSize)); // org.hamcrest.MatcherAssert
}
这个测试通常会失败,write() 方法在读取了部分内容后就结束循环并返回了。
正确的写法请参考 org.apache.commons.io.IOUtils#copy(java.io.InputStream, java.io.OutputStream) 的实现,对应的测试方法如下:
public static void main(String[] args) throws IOException {
String url = "https://pan.baidu.com/static/images/16new/bg2.jpg";
long expectedSize = 77_949L;
InputStream inputStream = new URL(url).openStream();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
long writeSize = org.apache.commons.io.IOUtils.copy(inputStream, bos);
assertThat(writeSize, equalTo(expectedSize)); // org.hamcrest.MatcherAssert
}
上面这个方法就不会运行失败。
为什么这个问题一直没有暴露出来呢?因为这个方法在整个 dubbo 项目中只有一个地方用到了:org.apache.dubbo.common.io.Bytes#unzip(),而这个方法只用来处理内存中的字节串,而与网络无关。万一 dubbo 项目将来在某个地方用它来读取网络内容,问题就会暴露出来。
Thanks for the issue, would you please submit a pr ?
@tswstarplanet OK I'll create a PR.
could some one translate this for me in english. I tried but failed to it. Thanks
- [x] I have searched the issues of this repository and believe that this is not a duplicate.
- [x] I have checked the FAQ of this repository and believe that this is not a duplicate.
Environment
- Dubbo version: 2.7.1-SNAPSHOT
org.apache.dubbo.common.utils.IOUtilsit seems that this is a class for common use but there is a pitfall in it. DO NOT USE IT IN YOUR OWN PROJECT!Problem Description
There is a method called
write()in the class:public static long write(InputStream is, OutputStream os, int bufferSize) throws IOException { int read; long total = 0; byte[] buff = new byte[bufferSize]; while (is.available() > 0) { // !!! read = is.read(buff, 0, buff.length); if (read > 0) { os.write(buff, 0, read); total += read; } } return total; }which uses the returned value of
InputStream.available()as the break condition of the while loop, in network environment, it returns 0 immediately in case of network latency and thus is not applicable to work as the EOF mark of the stream. Test code as follows:@Test public void testWriteURL() throws Exception { String url = "https://pan.baidu.com/static/images/16new/bg2.jpg"; long expectedSize = 77_949L; InputStream inputStream = new URL(url).openStream(); ByteArrayOutputStream bos = new ByteArrayOutputStream(); long writeSize = org.apache.dubbo.common.utils.IOUtils.write(inputStream, bos); assertThat(writeSize, equalTo(expectedSize)); // org.hamcrest.MatcherAssert }The test will usually fail, method
write()returns after reading part of the content and the loop breaks.for a correct implementation please refer to
org.apache.commons.io.IOUtils#copy(java.io.InputStream, java.io.OutputStream), and corresponding test code is:public static void main(String[] args) throws IOException { String url = "https://pan.baidu.com/static/images/16new/bg2.jpg"; long expectedSize = 77_949L; InputStream inputStream = new URL(url).openStream(); ByteArrayOutputStream bos = new ByteArrayOutputStream(); long writeSize = org.apache.commons.io.IOUtils.copy(inputStream, bos); assertThat(writeSize, equalTo(expectedSize)); // org.hamcrest.MatcherAssert }the test above passes
why this issue didn't expose up to now? Because it is only used once in
org.apache.dubbo.common.io.Bytes#unzip()within the whole project, and is only used to deal with data in memory. If we use it to read data from network in the future in some places, problem occurs.
@khanimteyaz
Most helpful comment