I’ve found that many novice programmers are confused when it comes to reading files. I had to teach an apprentice about this today, so I thought I would put down a short summary of the lesson.
To most people, there are many different types of files out there: videos, images, music, text and whatnot.
None of these distinctions matter to programmers. A programmer is only concerned with two types of files: text and binary. A text file is divided into lines, each terminated by a line-feed (LF) on Unices and by a carriage-return line-feed pair (CR LF) on Windows. A binary file has no delimiter. Therefore, a text file is read in line by line while a binary file is read in blob by blob. In addition, on DOS and Windows it is possible to terminate a text file with an end-of-file marker (^Z, the byte 0x1A), which is honoured when the file is opened in text mode. That is not possible in a binary file, where the same byte is just ordinary data.
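To make the difference concrete, here is a small sketch in Python (my choice for illustration, the lesson is not tied to any language). The helper name read_both_ways is made up for this example. It writes the same bytes to a temporary file and reads them back once as binary and once as text, assuming Python's default newline handling, which translates CR LF to a single line-feed on reading.

```python
import os
import tempfile


def read_both_ways(raw: bytes):
    """Write raw bytes to a temp file, then read them back as binary and as text.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)

        # Binary mode: the bytes come back exactly as stored.
        with open(path, "rb") as f:
            as_bytes = f.read()

        # Text mode: the data is split into lines, and newline=None
        # (the default) translates "\r\n" into "\n" on the way in.
        with open(path, "r", encoding="ascii", newline=None) as f:
            as_lines = f.readlines()

        return as_bytes, as_lines
    finally:
        os.remove(path)


# One Windows-style line followed by one Unix-style line.
as_bytes, as_lines = read_both_ways(b"one\r\ntwo\n")
print(as_bytes)  # the raw bytes, delimiters included
print(as_lines)  # a list of lines with normalised line endings
```

The binary read sees the carriage-return as just another byte, while the text read treats it as part of a line delimiter.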
So, when reading a text file, it is possible to read it in line by line until EOF is encountered. However, there are a few different ways to read a binary file.
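For the text case, the line-by-line loop might look something like the sketch below. The function name read_lines is hypothetical, and I am assuming UTF-8 text. In Python, readline() returns an empty string once the end of the file is reached, which plays the role of the EOF check.

```python
def read_lines(path: str) -> list[str]:
    """Read a text file line by line until EOF (illustrative sketch)."""
    lines = []
    with open(path, "r", encoding="utf-8") as f:
        while True:
            line = f.readline()
            if line == "":
                # An empty string means EOF. A blank line in the file
                # would come back as "\n", never as "".
                break
            # Drop the line terminator; the last line may not have one.
            lines.append(line.rstrip("\n"))
    return lines
```

In everyday Python you would normally just write "for line in f:", which does the same thing. The explicit loop is only here to show where the EOF test happens.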
It is possible to keep reading the file in fixed-size blobs until the number of bytes actually read is less than the size of a blob. For an ordinary file on disk, this indicates that the end of file has been reached. This is the preferred method of reading binary files. It is also possible to find out the size of the file up front and keep track of it with a counter, decrementing it after each read until it reaches zero. This requires careful management of the size counter and limits the size of the file to the maximum value of the integer type used for the counter.
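Both approaches can be sketched in Python as follows. The names CHUNK_SIZE, iter_chunks and iter_chunks_by_size are invented for this example, and 4096 is an arbitrary blob size. I am assuming an ordinary file on disk, where a buffered read only comes up short at the end of the file.

```python
import os

CHUNK_SIZE = 4096  # arbitrary blob size chosen for illustration


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Preferred method: read blobs until a read comes up short."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                yield chunk
            if len(chunk) < chunk_size:
                # Fewer bytes than requested (possibly zero): end of file.
                break


def iter_chunks_by_size(path, chunk_size=CHUNK_SIZE):
    """Alternative: look up the file size and count down to zero."""
    remaining = os.path.getsize(path)
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # The file shrank after we measured it; stop rather
                # than loop forever on a stale counter.
                break
            remaining -= len(chunk)
            yield chunk
```

The second version needs the extra guard for a file that changes size underneath it, which is the kind of careful counter management mentioned above. Python integers do not overflow, so the file size limit does not bite here. It matters in languages with fixed-width integers, such as C.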