Introduction to Java I/O

Jun 17, 2023

Have you ever wondered what the OutputStream and InputStream classes are? Let’s find out.

I/O stands for Input and Output. Java has a rich set of I/O classes in the core API — especially in the java.io package.

I/O in Java is divided into two categories:

Byte- and number-oriented I/O, which is handled by input and output streams.
Character and text I/O, which is handled by readers and writers.

Streams

A stream is an ordered sequence of bytes of undetermined length. There are 2 types of streams:

Input streams: they move bytes of data into a Java program from some external source.
Output streams: they move bytes of data from a Java program to an external target.

An input stream may read from a finite source of bytes, such as a file, or from a potentially unbounded source, such as System.in.
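As a minimal sketch of what reading from a stream of undetermined length looks like, the example below reads one byte at a time until read() reports the end of the stream by returning -1. The class name ReadUntilEnd and the helper countBytes are made up for illustration, and an in-memory ByteArrayInputStream stands in for a real external source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadUntilEnd {

    // Reads the stream one byte at a time and returns how many bytes it held.
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        // read() returns the next byte as an int from 0 to 255, or -1 at end of stream.
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {72, 101, 108, 108, 111}; // "Hello" in ASCII
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(countBytes(in)); // prints 5
        }
    }
}
```

The same loop would keep running for as long as the source keeps producing bytes, which is why a source like System.in can behave as if it never ends.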

Where do streams come from

Streams may come from various sources among them:

System.in
Files
Network connections
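To make one of these sources concrete, here is a small sketch that opens a FileInputStream on a file and reads its first byte. The class name StreamSources and the helper firstByteOf are hypothetical, and a temporary file stands in for a real data file. The other sources work the same way once you hold the stream: System.in is already an InputStream, and a java.net.Socket hands one out through getInputStream().

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamSources {

    // Returns the first byte of a file as a value from 0 to 255, or -1 if the file is empty.
    static int firstByteOf(Path file) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real data file.
        Path temp = Files.createTempFile("stream-demo", ".bin");
        try {
            Files.write(temp, new byte[] {65, 66, 67}); // 'A', 'B', 'C'
            System.out.println(firstByteOf(temp)); // prints 65
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```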

The Stream Classes

There are 2 main stream classes:

OutputStream
InputStream

They are abstract base classes for many different subclasses with more specialized abilities, including:

BufferedInputStream
ByteArrayInputStream
DataInputStream
FileInputStream
FilterInputStream
LineNumberInputStream
ObjectOutputStream
PipedOutputStream
PushbackInputStream
StringBufferInputStream
BufferedOutputStream
ByteArrayOutputStream
DataOutputStream
FileOutputStream
FilterOutputStream
ObjectInputStream
PipedInputStream
PrintStream
SequenceInputStream
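Many of these subclasses are meant to be wrapped around another stream to add an ability. The sketch below, with the illustrative class name StreamChaining and helpers encode and decode, wraps a ByteArrayOutputStream in a DataOutputStream to write an int, then reads it back through a DataInputStream. It is one possible pairing chosen for brevity, not the only way to combine them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class StreamChaining {

    // Writes an int through a DataOutputStream and returns the raw bytes produced.
    static byte[] encode(int value) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeInt(value); // four bytes, high byte first
        }
        return sink.toByteArray();
    }

    // Reads the int back by wrapping a ByteArrayInputStream in a DataInputStream.
    static int decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1245);
        System.out.println(bytes.length);  // prints 4
        System.out.println(decode(bytes)); // prints 1245
    }
}
```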

Input streams read bytes and output streams write bytes. Readers read characters and writers write characters.

To understand input and output streams we need a solid understanding of how Java deals with bytes, integers, characters and other primitive data types and when and why one is converted into another.

Integer Data

The most common integer data type in Java is the int, a 32-bit, big-endian, two’s complement integer. It takes values between -2,147,483,648 and 2,147,483,647.

Longs are 64-bit, big-endian, two’s complement integers. They take values between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.

Shorts are 16-bit, big-endian, two’s complement integers that range from -32,768 to 32,767.

Bytes are 8-bit two’s complement integers that range from -128 to 127. Like the other integer types, a byte is signed.
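The ranges above are available as constants on the wrapper classes, so they can be checked directly. The short sketch below prints them and shows what two's complement means in practice when a value steps past the top of the byte range. The class name IntegerRanges and the helper wrapPastMaxByte are made up for this example.

```java
class IntegerRanges {

    // Adds one to the largest byte value and narrows the result back to a byte.
    static byte wrapPastMaxByte() {
        return (byte) (Byte.MAX_VALUE + 1); // 127 + 1 wraps around to -128
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println(wrapPastMaxByte()); // prints -128
    }
}
```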

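By default, a literal like 1245 is an int. Converting it to a byte means truncating the higher-order bits. A cast such as (byte) 1245 does this and yields a signed byte. To keep the low-order eight bits as a value from 0 to 255 instead, you can use a bitwise AND as follows, where value is any int:

int lowByte = value & 0x000000ff;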
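To see the truncation at work, here is a small sketch that applies both the mask and a plain cast to 1245. The class and method names (LowOrderByte, lowByteUnsigned, lowByteSigned) are illustrative only. In hexadecimal 1245 is 0x04DD, so the low-order byte is 0xDD.

```java
class LowOrderByte {

    // Keeps the low-order eight bits and returns them as an int from 0 to 255.
    static int lowByteUnsigned(int value) {
        return value & 0x000000ff;
    }

    // Narrows to a byte: the same eight bits, but read as a signed value.
    static byte lowByteSigned(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(lowByteUnsigned(1245)); // prints 221
        System.out.println(lowByteSigned(1245));   // prints -35
        // Masking a signed byte recovers the unsigned reading of its bits.
        System.out.println(lowByteSigned(1245) & 0xff); // prints 221
    }
}
```

The mask is also the usual way to turn a signed byte taken from a byte array into the 0 to 255 value that InputStream.read() would have returned for it.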

Character Data

Computers only understand numbers.

When dealing with characters, we need to map integers to characters. In ASCII for example, character Z is mapped to 90.

Different encodings have different mappings.
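A quick way to see both points is to ask Java for the numbers. In the sketch below, the class name CharacterCodes and its helpers are hypothetical. It converts Z to its code and back, then encodes the character é (written as the escape \u00e9 so the source file's own encoding does not matter) in two of the encodings discussed in this article, ISO Latin-1 and UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharacterCodes {

    // A char widens to the integer code it is mapped to.
    static int codeOf(char c) {
        return c;
    }

    // An integer code narrows back to the char it stands for.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('Z')); // prints 90
        System.out.println(charOf(90));  // prints Z

        String eAcute = "\u00e9";
        // Same character, different bytes depending on the encoding.
        System.out.println(Arrays.toString(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // prints [-23]
        System.out.println(Arrays.toString(eAcute.getBytes(StandardCharsets.UTF_8)));      // prints [-61, -87]
    }
}
```

The byte values print as negative numbers because Java bytes are signed: -23 is 0xE9, and -61, -87 are 0xC3, 0xA9.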

ASCII

It is a seven-bit character set.

It defines 2⁷, or 128, different characters. These are sufficient for most American English text and allow rough approximations of most European languages.

ISO Latin-1

It is an eight-bit character set.

It defines 2⁸, or 256, characters. The first 128 correspond to ASCII; characters 128 to 255 are where it goes beyond ASCII.

It provides just enough characters to write most Western European languages.
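The relationship between the two character sets can be checked with the standard charsets that ship with Java. The sketch below uses the made-up class name Latin1VsAscii. It verifies that characters 0 to 127 encode to the same byte in both, then encodes é (code 233) in each. String.getBytes substitutes the charset's replacement byte, which for US-ASCII is a question mark, when a character cannot be represented.

```java
import java.nio.charset.StandardCharsets;

class Latin1VsAscii {

    // True if every character from 0 to 127 encodes to the same byte in both charsets.
    static boolean first128Agree() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            byte ascii = s.getBytes(StandardCharsets.US_ASCII)[0];
            byte latin1 = s.getBytes(StandardCharsets.ISO_8859_1)[0];
            if (ascii != latin1) {
                return false;
            }
        }
        return true;
    }

    // Encodes one char as ISO Latin-1 and returns the byte as a value from 0 to 255.
    static int latin1Byte(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1)[0] & 0xff;
    }

    // Encodes one char as US-ASCII and returns the byte as a value from 0 to 255.
    static int asciiByte(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.US_ASCII)[0] & 0xff;
    }

    public static void main(String[] args) {
        System.out.println(first128Agree());      // prints true
        System.out.println(latin1Byte('\u00e9')); // prints 233
        System.out.println(asciiByte('\u00e9'));  // prints 63, the code for '?'
    }
}
```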

Unicode

ISO Latin-1 suffices for most Western European languages but does not work for Greek, Arabic, Hebrew, or Persian.

Unicode began as a 16-bit character set, defining 2¹⁶, or 65,536, possible characters, and Java’s char type is still 16 bits wide. Unicode has since grown beyond that range; Java represents the additional characters as pairs of chars.

The first 256 characters correspond to ISO Latin-1.

As you may have realized, byte streams alone are a poor fit for this. A stream reads one byte at a time, but a 16-bit character occupies two bytes. This is why we have readers and writers. Without them, you would have to read two bytes, multiply the first by 256, add the second, and cast the result to a char.
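The manual approach can be sketched as follows. The class name TwoByteChar and the helper combine are illustrative, and the two bytes 0x03 and 0xA9 were picked because together they form U+03A9, the Greek capital omega, with the high byte first. The result is printed as a hexadecimal code so the output does not depend on what the console can display.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class TwoByteChar {

    // Combines a high byte and a low byte (each 0 to 255, as returned by read()) into a char.
    static char combine(int high, int low) {
        return (char) (high * 256 + low);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x03, (byte) 0xA9}; // U+03A9, high byte first
        try (InputStream in = new ByteArrayInputStream(data)) {
            int high = in.read();
            int low = in.read();
            char c = combine(high, low);
            System.out.println(Integer.toHexString(c)); // prints 3a9
        }
    }
}
```

This only works if you already know the bytes are two-byte characters in this order, which is exactly the bookkeeping a reader takes off your hands.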

Readers handle the conversion of bytes in one character set to Java chars without any extra effort. For similar reasons, you should use a writer rather than an output stream to write text.
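As a rough illustration, the sketch below wraps a byte stream in an InputStreamReader to decode text and uses an OutputStreamWriter to encode it. The class name ReaderDecoding and the helpers decode and encode are invented for this example. The same two bytes are decoded twice, once as UTF-8 and once as ISO Latin-1, to show that the charset passed to the reader decides which characters come out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderDecoding {

    // Decodes bytes into text by reading chars from a reader layered over a byte stream.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            // Reader.read() returns one char as an int, or -1 at the end of the stream.
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    // Encodes text into bytes by writing it through a writer layered over a byte stream.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0xCE, (byte) 0xA9}; // U+03A9 encoded as UTF-8
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());      // prints 1
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length()); // prints 2
        System.out.println(encode("\u03a9", StandardCharsets.UTF_8).length);     // prints 2
    }
}
```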

UTF-8

A fixed two-byte encoding of Unicode is relatively inefficient when most of your text consists of ASCII characters. Every character requires the same number of bytes, two, even though some characters are used much more frequently than others. A more efficient encoding would use fewer bytes for the more common characters. This is what UTF-8 does.

In UTF-8, the ASCII characters are encoded using a single byte, just as in ASCII. The next 1,920 characters are encoded in two bytes. The remaining characters in the 16-bit range are encoded in three bytes, and characters beyond that range take four. However, since these longer sequences are relatively uncommon, especially in English text, the savings achieved by encoding ASCII in a single byte more than make up for it.
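The variable length is easy to observe by encoding individual characters and counting the bytes. In this sketch the class name Utf8Lengths and the helper utf8Length are hypothetical; it takes a Unicode code point, turns it into a String, and measures its UTF-8 encoding. A code point beyond the 16-bit range is included for comparison.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {

    // Returns how many bytes UTF-8 needs for a single Unicode code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));     // prints 1 (ASCII)
        System.out.println(utf8Length(0x00E9));  // prints 2 (e with acute accent)
        System.out.println(utf8Length(0x07FF));  // prints 2 (last two-byte code point)
        System.out.println(utf8Length(0x0800));  // prints 3 (first three-byte code point)
        System.out.println(utf8Length(0x20AC));  // prints 3 (euro sign)
        System.out.println(utf8Length(0x1F600)); // prints 4 (beyond the 16-bit range)
    }
}
```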

I hope this explains to some extent what streams are and why readers and writers are needed.

We shall cover Output Streams next.