Understanding Files & Directories in Linux

This is an oldie but goodie. Although it references UNIX, the basic concepts of working in the shell on Linux are little changed.

In UNIX, almost everything can be thought of as a file, even physical devices and processes. As long as information can be read from or written to them they are treated the same in many ways.

The UNIX file system is hierarchical. Every file is in a directory. Every directory is contained within some other directory, except for the highest-level directory (the root directory), whose designation is the forward slash (/).

When describing the location of a file within the directory structure, the names of directories are separated with slashes. Files are owned by users. A user can determine what access she and others will have to files and directories that she owns.

Directories

The Working Directory

Whenever you are interacting with the shell, you are in what is called the working directory. This is your virtual location on the UNIX file system you are logged in to. To determine the working directory use the pwd command. If the current working directory was /var/spool, the command pwd would display:

prompt> pwd
/var/spool

When you log in, your current directory will be your home directory. This is a directory that is owned by you. You can create files or directories in it, allow or deny access to others, or remove files and directories. On our UNIX system, users’ home directories are of the form /u/username.

Path

A path is a sequence of directories that can be specified either in absolute terms or relative to the current working directory. An absolute path always starts at the root. A relative path starts from your current working directory, or from your home directory. Relative paths may be specified with combinations of directory names and the following:

The dot “.” specifies the current working directory.
Two dots “..” specify the directory above the current directory (the parent directory).
The tilde “~” specifies a user’s home directory, or in an expression of the form ~userx the home directory of userx.

For example, given this absolute path: /u/shazbot/school/classes/CS202/assignments/, if the current working directory of the user shazbot is /u/shazbot/school/classes/, and if shazbot wanted to specify the assignments directory, then the following are all equivalent:

/u/shazbot/school/classes/CS202/assignments/
CS202/assignments
./CS202/assignments
../classes/CS202/assignments/
~/school/classes/CS202/assignments/

Given the previous example, if shazbot wanted to specify her home directory, then the following are all equivalent:

/u/shazbot/
~
~/
../..
./../../
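The equivalences above can be checked by resolving each spelling to an absolute path. This is a minimal sketch, assuming a POSIX-like shell; it builds a throwaway copy of the example tree under a temporary directory and temporarily points HOME at it, so the resolve helper and the sandbox layout are illustrative assumptions rather than part of any real system.

```shell
unset CDPATH                      # keep cd from searching or printing extra paths
old_home=$HOME
sandbox=$(mktemp -d)              # throwaway root that stands in for "/"
HOME="$sandbox/u/shazbot"         # assumption: pretend this is shazbot's home
mkdir -p "$HOME/school/classes/CS202/assignments"
cd "$HOME/school/classes" || exit 1

# Hypothetical helper: print the absolute path a directory name resolves to.
resolve() ( cd "$1" && pwd -P )

abs=$(resolve "$HOME/school/classes/CS202/assignments/")
rel=$(resolve CS202/assignments)
dot=$(resolve ./CS202/assignments)
dotdot=$(resolve ../classes/CS202/assignments/)
tilde=$(resolve ~/school/classes/CS202/assignments/)
home_up=$(resolve ../..)
home_tilde=$(resolve ~)

for p in "$rel" "$dot" "$dotdot" "$tilde"; do
    [ "$p" = "$abs" ] && echo "same directory: $p"
done
[ "$home_up" = "$home_tilde" ] && echo "../.. is the home directory"

HOME=$old_home                    # put the real home directory back
cd / && rm -rf "$sandbox"
```

Each spelling prints the same absolute path because the shell resolves ".", "..", and "~" before the directory is looked up.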

Changing Directories – cd

To change your working directory use the cd command. To change to a directory, simply enter at the prompt:

prompt> cd path/directoryname

If no directory is specified you will be returned to your home directory.

Making and Removing Directories – mkdir/rmdir

Directories can be created with the mkdir command. The directory name may be specified absolutely or it is assumed to be relative to the current working directory.

prompt> mkdir directoryname

If a directory is empty, it may be removed with rmdir.

prompt> rmdir directoryname
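The following sketch strings cd, mkdir, and rmdir together, assuming a POSIX-like shell and a scratch directory made with mktemp. The names directoryname, other, and file are placeholders. It also shows that rmdir refuses to remove a directory that still has something in it.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

mkdir directoryname               # relative to the working directory
mkdir "$sandbox/other"            # the same command with an absolute path
cd directoryname || exit 1
where=$(basename "$(pwd)")        # directoryname
cd ..

touch directoryname/file          # make the directory non-empty
if rmdir directoryname 2>/dev/null; then
    nonempty="removed"
else
    nonempty="refused"            # rmdir only removes empty directories
fi

rm directoryname/file
rmdir directoryname other
[ -d directoryname ] && after="still there" || after="gone"
echo "$where / non-empty: $nonempty / empty: $after"

cd / && rm -rf "$sandbox"
```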

Groups

Every user belongs to one or more groups. Groups and their users are specified in the file /etc/group. Students are in the group “them.” Files and directories that you create are associated with your group. It is possible to belong to more than one group. A student who is also a systems administrator and who is part of a special project called X might be in the three groups: them, root, and projectX. If she wanted the directory Xpapers to be associated with the group projectX, she could change the group identification of it with the command chgrp.

prompt> chgrp projectX Xpapers
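To see which groups you are in before trying chgrp, the id command helps. The sketch below assumes a POSIX-like shell. Because the group projectX from the example is unlikely to exist on your system, your own primary group stands in for it here, which is purely an assumption for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

id -Gn                            # names of every group you belong to
primary=$(id -gn)                 # your primary group

mkdir Xpapers
# Assumption: projectX does not exist here, so the primary group stands in.
chgrp "$primary" Xpapers
status=$?                         # 0 when the change was permitted
ls -ld Xpapers                    # the group appears after the owner's name

cd / && rm -rf "$sandbox"
```

An ordinary user can normally only hand a file to a group that she herself belongs to.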

Permissions

Every file or directory has a mode or permissions associated with it. The mode consists of ten characters (e.g. -rwxr-xr-x). The first character represents the file type (e.g. d for a directory, or - for an ordinary file); the next three characters represent the permissions of the owner of the file; the three after that represent the permissions of the group; and the last three represent the permissions of others.

The permissions of the owner of the file, members of the group associated with the file, and the rest of the world (others) are represented by the letters r, w, and x: (r)ead from, (w)rite to, or e(x)ecute the file. A hyphen in place of a letter means that the permission is not granted.

For instance, a file with the mode -r--r--r-- can be read by everyone (the owner, the group, and the rest of the world). -rwxr-xr-x means that the file can be read or executed by members of the group and the rest of the world, but only the owner can write to it. Permissions can be altered with the chmod command by the owner of the file (or by the superuser).

chmod [mode] filename
MODES:

u ~ user’s permissions
g ~ group’s permissions
o ~ other’s permissions
a ~ all permissions (user, group, and other)
+ ~ add permissions
- ~ remove permissions

prompt> chmod go+rx foobar

The file named foobar has its mode changed such that group and others get the read and execute permissions added.
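The effect of a symbolic mode is easiest to see by printing the mode string before and after each change. This is a minimal sketch, assuming a POSIX-like shell; the mode helper is a hypothetical convenience, and the first chmod uses the = operator (set exactly), which is standard but not listed in the table above.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch foobar

# Hypothetical helper: print only the ten-character mode string of a file.
mode() { ls -ld "$1" | cut -c1-10; }

chmod u=rw,go= foobar             # known starting point
before=$(mode foobar)             # -rw-------
chmod go+rx foobar                # add read and execute for group and others
after=$(mode foobar)              # -rw-r-xr-x
chmod a-x foobar                  # remove execute for everyone
final=$(mode foobar)              # -rw-r--r--
echo "$before -> $after -> $final"

cd / && rm -rf "$sandbox"
```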

If one or more of the files is a directory and the -R option is set, chmod will recursively apply the change to everything beneath that directory. Read and execute permissions are fairly straightforward for regular files. They are a little different for directories:

If a directory is not readable, its contents cannot be listed.
If it is not executable, you cannot cd into it.

Caution: you can set permissions such that you may not be able to see the contents of your own directories.
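The two directory rules can be tried safely in a scratch directory. This sketch assumes a POSIX-like shell and an ordinary (non-root) user; the superuser bypasses permission checks, so the two messages would not appear when run as root. The names private and secret are placeholders.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir private
touch private/secret
chmod u=rwx,go=rx private         # start from drwxr-xr-x

chmod a-r private                 # no read: contents cannot be listed
noread=$(ls -ld private | cut -c1-10)      # d-wx--x--x
ls private 2>/dev/null || echo "cannot list private"

chmod u+r,a-x private             # no execute: cannot cd into it
noexec=$(ls -ld private | cut -c1-10)      # drw-------
( cd private 2>/dev/null ) || echo "cannot cd into private"

chmod u+rwx private               # restore access before cleaning up
cd / && rm -rf "$sandbox"
```

As the owner you can always put the permissions back with chmod, which is what the last chmod line does.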

File Names

File names in UNIX are case sensitive. KRUFT, kruft, and Kruft are three different files. While you may use almost any character in a filename, it is considered wise to use only letters, digits, underscores, periods, and dashes. Many of the others are either non-printing or have other meanings in the context of the shell that can lead to trouble.

Chief among these characters that may have unintended effects are the *, ?, [, ], {, and } characters.

The * character stands for any string of zero or more characters. So U*NIX would match UNIX, USENIX, and USER_isVERYfondOFUNIX.

? matches any single character. ls U?NIX, for instance, would not match UNIX, but ls U?IX would.

Square brackets [ and ] match any single character from the set listed within them. For instance, ls *.[cs] would list all files ending with .c or .s

Curly brackets { and } expand to one string for each of the comma-separated items in the list inside the brackets. ls *.{gif,GIF,jpg,JPG} would match all files ending with .gif, .GIF, .jpg, or .JPG.
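Because the shell expands these characters before the command runs, echo is a harmless way to preview what a pattern will match. The sketch below assumes a POSIX-like shell with default settings (a pattern that matches nothing is passed through unchanged) and uses made-up file names. Curly brackets are left out because they are not available in every shell.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch UNIX USENIX main.c boot.s notes.txt

star=$(echo U*NIX)                # UNIX USENIX
question=$(echo U?IX)             # UNIX
nomatch=$(echo U?NIX)             # U?NIX (nothing matches, pattern left as typed)
bracket=$(echo *.[cs])            # boot.s main.c
quoted=$(echo 'U*NIX')            # U*NIX (quotes switch expansion off)
printf '%s\n' "$star" "$question" "$nomatch" "$bracket" "$quoted"

cd / && rm -rf "$sandbox"
```

The last line is the reason these characters cause trouble in file names: a file literally called U*NIX has to be quoted every time it is mentioned.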

Basic File Manipulation

Listing Files and Directories – ls

One of the most useful commands is ls, which lists the contents of a directory. When used without other options ls lists the contents of the current working directory. When a directory is specified, the contents of that directory will be listed.

Users can see what type of file (directory, link, ordinary file, etc.) is being listed with the -F option. The -l option shows even more information about the files.

Files beginning with a period (.filename) are known as “dot files” or “invisible files” because they do not appear in the listing unless the -a option is used.

Suppose a directory contains the files .alamode, anchovy, curry, and melba

prompt> ls
anchovy curry melba
prompt> ls a*
anchovy
prompt> ls -a
. .. .alamode anchovy curry melba
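The same directory can be recreated in a scratch location to compare the options side by side. This is a sketch assuming a POSIX-like shell; the extra directory sides is an assumption added so that -F has something to mark.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch .alamode anchovy curry melba
mkdir sides

plain=$(ls | paste -sd ' ' -)     # anchovy curry melba sides
all=$(ls -a | paste -sd ' ' -)    # also lists . .. and .alamode
typed=$(ls -F | paste -sd ' ' -)  # anchovy curry melba sides/
printf '%s\n' "$plain" "$all" "$typed"
ls -l                             # long listing: mode, owner, group, size, date

cd / && rm -rf "$sandbox"
```

The trailing slash printed by -F marks sides as a directory.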

Moving Files – mv

The name or location of a file can be changed with the mv command.

prompt> mv sourcefile destinationfile

The command mv foo /tmp/bar would move the file foo to the /tmp directory and rename it bar.

If the source is a file and the destination is a file that does not exist, the source file will be renamed.
If the destination is the same as a file that already exists, the existing file will be overwritten by the source file.
If the source is one or more files, and the destination is a directory, the files will be moved there.
If the source is a directory and the destination does not yet exist, the source will be renamed with the new name.
If the source is a directory and the destination is a directory that already exists, the source directory will become a subdirectory of it.
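The cases above can be walked through in order. This is a minimal sketch, assuming a POSIX-like shell and a scratch directory; every file and directory name in it is a placeholder.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
echo "new" > foo
echo "old" > existing
mkdir tmp dirA

mv foo bar                        # destination does not exist: plain rename
mv bar existing                   # destination exists: it is overwritten
overwritten=$(cat existing)       # new
mv existing tmp/bar               # move into tmp and rename in one step
mv dirA dirB                      # directory renamed (dirB did not exist)
mv dirB tmp                       # tmp exists: dirB becomes tmp/dirB
result=$(ls tmp | paste -sd ' ' -)    # bar dirB
echo "$overwritten / $result"

cd / && rm -rf "$sandbox"
```

Note that the overwrite in the second mv happens silently, so it pays to check the destination first.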

Copying Files – cp

Files can be copied with the cp command. To copy fileA to fileB, do the following command:

prompt>  cp fileA fileB

The contents of fileA will be copied to fileB. If fileB already exists, then its contents will be replaced. Otherwise, fileB will be created and will contain a copy of the contents of fileA.

To copy the contents of a directory, use the -r option with the cp command. For instance, to copy the contents in dirA to dirB, type the following command:

prompt>  cp -r dirA/* dirB

The contents of dirA will be copied to dirB. If dirB already exists, then the contents of dirA will be added to dirB. Otherwise, you will typically get an error such as “cp: dirB not found”; the exact wording varies between systems.
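Both forms of cp can be tried together. The sketch assumes a POSIX-like shell and a scratch directory; the files inside dirA are invented for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
echo "alpha" > fileA
echo "stale" > fileB
mkdir -p dirA/sub dirB
echo "one" > dirA/one.txt
echo "two" > dirA/sub/two.txt

cp fileA fileB                    # fileB existed, so its contents are replaced
copied=$(cat fileB)               # alpha
cp -r dirA/* dirB                 # dirB exists: contents of dirA are added to it
nested=$(cat dirB/sub/two.txt)    # two
[ -f fileA ] && kept="original kept"   # cp leaves the source in place
echo "$copied / $nested / $kept"

cd / && rm -rf "$sandbox"
```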

Removing Files – rm

One or more files can be deleted with the rm command.

prompt>  rm filename1 filename2 ... filenameX

To remove a file, you do not need to have write permission on the file itself, but you do need to have write permissions on the directory it is in.

The -f option removes write-protected files without a prompt or warning.
When the -i option is used, the user is prompted for each file and asked to verify deletion.
If the -r option is used and if filename is a directory, the directory and all of its files, subdirectories, and subdirectory files will be recursively removed.

Needless to say, commands such as rm -rf * can cause severe damage.
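The options are safest to explore inside a directory created only for that purpose. This sketch assumes a POSIX-like shell; all names are placeholders, and nothing outside the scratch directory is removed.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch filename1 filename2 locked
chmod a-w locked                  # write-protect one file
mkdir -p tree/branch
touch tree/leaf tree/branch/leaf

rm filename1 filename2            # several files in one command
rm -f locked                      # -f: no prompt, even though it is write-protected
rm -r tree                        # -r: the directory and everything beneath it
left=$(ls -a | paste -sd ' ' -)   # . ..
echo "remaining entries: $left"

cd / && rmdir "$sandbox"
```

The write-protected file is removed because deletion depends on write permission for the directory, not for the file.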

Links – ln

A link allows a file to be accessed by a different name. Links are created with the ln command (and removed with the rm command).

prompt>  ln sourcefile linkname

This creates a new file name linkname that can access the data in the file specified by sourcefile. However, a link is not a copy of the file but simply another name for it. In fact, you could rename a file by creating a link to the file and removing the original file name.

prompt>  ln foo bar
prompt>  rm foo

A new name (bar) is created for the original file (foo), then foo is removed and the only name for the file is the new name bar (Note: this will not work with a symbolic link).

Ordinary (hard) links cannot be created across file systems. If you need to do so, use the -s option. It will create a special kind of link (called a symbolic link) that will work.
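The difference between the two kinds of link shows up as soon as the original name is removed. A minimal sketch, assuming a POSIX-like shell; foo and bar follow the example above, and sym is a hypothetical name for the symbolic link.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
echo "data" > foo

ln foo bar                        # hard link: a second name for the same file
ln -s foo sym                     # symbolic link: a pointer to the name foo
ls -li foo bar sym                # foo and bar share an inode number; sym has its own

rm foo                            # remove the original name
hard=$(cat bar)                   # data (the file lives on under its other name)
if [ -L sym ] && [ ! -e sym ]; then
    soft="dangling"               # sym still exists but points at nothing
fi
echo "$hard / $soft"

cd / && rm -rf "$sandbox"
```

This is why the rename trick works only with an ordinary link: a symbolic link refers to a name, and once that name is gone it leads nowhere.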

Displaying and Printing Files to the Screen

Files may be displayed on the screen in several ways.

Displaying Entire Files – cat

An entire file may be written to the screen with the cat command. However, large files will often overflow the terminal window buffer, which means when you send it to the screen, you won’t be able to see all of the data even if you try to scroll back.

cat is used to concatenate files together through stdout (standard output, which is printed to the screen by default). The command cat can be used as follows:

prompt>  cat fileA {files}

{…} means 0 or more times. Thus, the cat command can take 1 or more files as arguments, and display the contents of all the listed files one after the other. Or you can redirect the stdout to a file:

prompt>  cat fileA {files} > fileB

This will overwrite or create fileB with all of the data from fileA followed by the data from the files specified in {files}.
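A short sketch of both uses of cat, assuming a POSIX-like shell and a scratch directory. fileC is a placeholder for the additional files, and the >> operator (append rather than overwrite) is a standard shell feature that goes slightly beyond the text above.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
printf 'first\n' > fileA
printf 'second\n' > fileC
printf 'stale\n' > fileB

cat fileA fileC                   # both files, one after the other, on the screen
cat fileA fileC > fileB           # same output, but it replaces fileB's contents
lines=$(paste -sd ' ' fileB)      # first second
cat fileA >> fileB                # >> appends instead of overwriting
total=$(wc -l < fileB | tr -d ' ')    # 3
echo "$lines / $total lines"

cd / && rm -rf "$sandbox"
```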

Displaying Files One Page at a Time – more

more is used to view the contents of a file one screen at a time.

prompt>  more fileA {files}

{…} means 0 or more times (see cat above).

more will display the contents of each file specified one screen at a time. Press ‹Enter› to scroll down one line at a time. To scroll down one screen at a time, press ‹Space›. Press b to back up one screen.

For instance, if the user shazbot wishes to look at her email she could use the command:

prompt>  more /var/spool/mail/shazbot

This command instructs the shell to print the contents of the user’s (shazbot) mail spool one screen at a time.

Displaying Non-Text Files

Some files require the use of special tools if they are to be properly displayed. HTML files must be viewed in a web browser. PostScript files are best examined with the ghostview utility. PDF files should be viewed with Acrobat Reader, and many images can be displayed with xv.

Other Operations on Files

The UNIX operating system contains a large number of commands for examining, manipulating, and modifying files. As you gain more familiarity with UNIX, you will find dozens of useful utilities. Some of the more useful operations are comparing, searching, compressing, and archiving.

Comparing – diff

The diff command can be used to look for differences between two files. It takes two filenames as arguments. When differences are found between the two, the differing lines are printed to the screen. The lines from the first file are preceded with a less than symbol (<); lines from the second file are preceded with a greater than symbol (>).
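Here is a small sketch of diff on two nearly identical files, assuming a POSIX-like shell and a diff that prints the traditional default format (some minimal implementations print a different format). The file names first and second are placeholders.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
printf 'apple\nbanana\n' > first
printf 'apple\ncherry\n' > second

out=$(diff first second)
status=$?                         # 0 = identical, 1 = different, 2 = trouble
echo "$out"
# With the traditional format this prints:
# 2c2
# < banana
# ---
# > cherry

cd / && rm -rf "$sandbox"
```

The line 2c2 says that line 2 of the first file must be changed to produce line 2 of the second.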

Searching – grep

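UNIX has a very powerful utility called grep (the name comes from the ed editor command g/re/p: globally search for a regular expression and print). grep takes a pattern and one or more files as arguments. The pattern is a regular expression, and the file names may contain shell wild-card characters. grep prints each line that contains a match for the pattern; when more than one file is searched, each line is preceded by the name of the file it occurs in and a colon. For instance: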

prompt>  grep foobar *.{c,cc,pl}

This finds all lines containing the string foobar in files that end with .c, .cc, or .pl.
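The output format is easier to picture with a few sample files. This sketch assumes a POSIX-like shell; the file contents are invented, and the two patterns *.c and *.pl are written out separately instead of using curly brackets so that it runs in any shell.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
printf 'int foobar;\nint other;\n' > a.c
printf 'my $foobar = 1;\n' > b.pl
printf 'nothing here\n' > c.c

hits=$(grep foobar *.c *.pl)      # file name and colon precede each matching line
echo "$hits"
# a.c:int foobar;
# b.pl:my $foobar = 1;
names=$(grep -l foobar *.c *.pl | paste -sd ' ' -)   # -l: names only (a.c b.pl)
count=$(grep -c 'foo.ar' a.c)     # regular expression: . matches any character (1)

cd / && rm -rf "$sandbox"
```

Quoting the pattern, as in the last grep, keeps the shell from treating its special characters as wild cards.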

Compressed Files – gzip/gunzip

Files may be compressed with the gzip command to reduce their size.

prompt>  gzip filename

Files compressed with this utility will have the extension .gz appended to the filenames.

To uncompress a file with a .gz suffix, use the command gunzip:

prompt>  gunzip filename

There are variations of the common UNIX file utilities that can be used on gzip compressed files. They behave like the original commands do, but they work on compressed files. Some of these commands are gzgrep, gzmore, gzcat, and gzdiff. On Linux they are usually installed under the names zgrep, zmore, zcat, and zdiff.
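A round trip through gzip and gunzip looks like this. The sketch assumes a POSIX-like shell with gzip installed; the copy named original exists only so the result can be compared afterwards.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
printf 'hello hello hello\n' > filename
cp filename original

gzip filename                     # replaces filename with filename.gz
[ -f filename.gz ] && [ ! -e filename ] && packed="yes"
peek=$(gzip -dc filename.gz)      # -dc: decompress to stdout, keep the .gz file
gunzip filename.gz                # restores filename and removes filename.gz
cmp -s filename original && roundtrip="identical"
echo "$packed / $peek / $roundtrip"

cd / && rm -rf "$sandbox"
```

gzip -dc is a handy way to look inside a compressed file without unpacking it on disk.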

Recovering Deleted Files

When a file is deleted in UNIX it is gone. There is no practical way of recovering it. However, the system administrators do nightly backups of the file system. If you have some version of the file that existed prior to the last backup, it is likely that it can be restored. If you lose or delete such files, send mail to support@cat.pdx.edu and give the path for each of the files that were deleted and when. If possible, also tell support when the latest version was known to exist. Time permitting, the staff will try to recover the most recently backed-up version if one exists. Turn-around time on file discovery and recovery is measured in hours, so do not get dependent on this service. Save copies of important files you are working on.