Session 28: Files

Textbook: Sections 5.1 and 5.2

Files (Section 5.1)
  types
  access types
  operations
Directories (Section 5.2)
  path names
  file attributes
  operations

Just as multiprocessing is the way that the OS allocates the CPU among processes, and pages and/or segments are the way that the OS allocates memory among processes, so too files are the primary way that the OS shares the disk among processes. Of these three mechanisms, files are the most complex, because the OS tends to give individual programs a great deal of control over the files they access. So the file system (the name for this particular subsystem of the overall operating system) is one thing we certainly cannot neglect.

Files

Types

There are many kinds of files, but we can characterize them in a rough hierarchy.

               ______files______
              /        |        \
regular files    directories     special files
    /   \                           /     \
binary  text                  character  block

The regular files are what you typically think of as files. They are divided between binary files and text files. This division is not particularly strong - it's just a matter of what the file happens to contain. A text file contains only characters (typically encoded in ASCII), divided up into lines. (In Unix, the ASCII newline character, also called line feed, ends each line. On Windows systems, a pair of ASCII characters, the carriage return followed by the newline, is used. And on classic Macintosh systems, before Mac OS X, the carriage return alone separates lines.) Binary files include everything that isn't a text file - that is, any file that contains any non-character data. In a text file, a number would be represented as a sequence of ASCII digits; in a binary file, its binary representation would typically appear directly in the file.

A directory is a special type of file that lists the locations of other files.

A special file is a file that represents some entity other than ordinary data stored on the disk. Unix, for example, uses files to represent devices - and so the special files are split between character special files (to represent character devices, like terminals or printers) and block special files (to represent block devices, like disks). On other operating systems, a special file might represent a user, a font, or a dial-up connection, even though these may not actually be files on the disk as such.

We'll be discussing regular files and directories; special files are a detail we won't spend much time on.

Access types

Operating systems typically support two techniques for reading files: sequential access and random access. A sequential-access file involves reading through the file from beginning to end. You never back up: each read picks up where the previous one left off. It's by far the most common technique, and it's probably the only one most students have ever had occasion to try.

In larger programs, however - especially ones that deal with larger files - it's essential that the program be able to access any part of the file at any time. A particularly important example is a database, which may be dealing with a file that's hundreds of megabytes or even gigabytes long. Reading the entire file into memory isn't viable, and scanning through the whole file for every query would be far too slow, yet people want to be able to get to any record quickly. Some form of random access is essential.

Typically, random-access files will involve some form of fixed-length data unit, called a record. If each record is n bytes long and records are numbered starting from 1, then it's straightforward to compute where to find the ith record: it begins at byte n(i - 1), counting bytes from 0.

Text files, on the other hand, usually don't have fixed-length divisions. They are divided into lines - which often behave similarly to records - but lines can typically be of different lengths.

But there are exceptions to both rules: random-access files can work around the fixed-length restrictions, and text files do occasionally have fixed-length lines.

Operations

Operating systems typically provide a variety of functions for programs to use for accessing data in a file.

  1. Create for creating a new, empty file that did not exist previously.

  2. Open for starting up access to a file. Typically, the opener might designate whether the file will be read or changed. In the latter case, there is often an Append option, so that changes are put at the end of the file.

  3. Close for freeing resources related to an open file.

  4. Read for retrieving information from a file.

  5. Write for putting information into a file.

  6. Seek for moving the file pointer to a different location in the file. This function provides the random-access capability.

Directories

Path names

In the most common modern operating systems, files are arranged in a hierarchy. This gives a way of organizing the vast number of files you find in a typical system. (Personally, I have over 50,000 files just under my home directory! And of course there are many more users than just me using the computers.)

It wasn't always that way - many old systems would have just a single directory. And that was fine when a disk could hold around 200KB. But nowadays, disks are much larger, and as a result people have many more files to store.

Each file is identified by a path name that lists the directories leading to it. You might find the following file in your system, for example.

/usr/people/classes/writeups/LabStuff/350/PROJECTS/Project6/server.C
That's designating the file named server.C, located in the Project6 subdirectory of the PROJECTS subdirectory of the 350 subdirectory of the LabStuff subdirectory of the writeups subdirectory of the classes subdirectory of the people subdirectory of the usr subdirectory of the root directory of the disk. (I should have chosen a file higher in the hierarchy!) A path name like this, which starts from the root directory, is called an absolute path name.

Every operating system seems to have to choose its own character for separating directory names in a path name. Unix uses the slash `/'; MS-DOS and later Windows adopted the backslash `\'; classic Mac OS used the colon `:'; and MULTICS used `>'. In Unix, MS-DOS/Windows, and MULTICS, starting a path name with the separator makes the path absolute; classic Mac OS was the exception, where a leading colon marked a relative path instead.

Other path names are relative path names. The operating system tracks the process's current working directory, and a relative path is taken starting at this location. For example, if the current working directory is /usr/include then the path name ``sys/types.h'' refers to the file whose absolute path name is /usr/include/sys/types.h.

Unix also has each directory contain two special entries that can't be removed: ``.'' (referring to the directory itself) and ``..'' (referring to its parent directory). So if the current working directory is /usr/include, the relative path name ../bin/ls refers to the file whose absolute path name is /usr/bin/ls (since the .. in the path refers to the parent of the working directory, which is /usr).

File attributes

For each file in a directory, the directory tracks a number of pieces of information about the file. The minimum information would be the file's name and where on the disk its data can be found. But typically the directory also leads to additional information about the file, called the file's attributes.

When you run ls on a file, it lists just the file's name. But the command also lets you view the file's attributes if you add the -l option. (The `l' stands for long, since it provides much more information than normal.)

% ls -l /usr/bin/ls
-r-xr-xr-x   1 root     bin        18844 Jan  5  2000 /usr/bin/ls*
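This gives you an idea of some of the attributes Unix tracks. Reading the line from left to right, they are the file type and permissions (-r-xr-xr-x: a regular file, indicated by the leading -, with read and execute permission for the owner, the group, and everyone else, and write permission for no one), the number of links to the file (1), the owner (root), the group (bin), the size in bytes (18844), and the date of last modification (Jan 5 2000), followed by the name.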

Operations

Of course, an operating system with directories must give programmers ways to work with them, including ways to manage the files listed in a directory.

In Unix, two of the most important functions for working with directories are link and unlink. When you want to delete an entry from a directory, you unlink it. When you want to rename a file, you can create a new link with the new name to the same file, and then unlink the old name.