2.0 Installing Condor

2.1 Installing Condor
First, download Condor. We have mirrored it locally so you can
download it quickly. It is 146MB, so please be patient. You will get a file named condor-6.7.19-linux-x86-glibc23-dynamic.tar.gz. Let us pick apart the name a bit:
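One way to read the name is field by field, splitting on the dashes. The sketch below does that in bash. The variable names and the interpretation of each field are our own assumptions, not something the Condor distribution defines: we read glibc23 as "built against glibc 2.3" and dynamic as "dynamically linked".

```shell
# Split the tarball name into its dash-separated fields.
# Field names (package, version, os, arch, libc, linking) are illustrative assumptions.
file="condor-6.7.19-linux-x86-glibc23-dynamic.tar.gz"
base="${file%.tar.gz}"          # strip the archive suffix

IFS=- read -r package version os arch libc linking <<EOF
$base
EOF

echo "package=$package version=$version os=$os arch=$arch libc=$libc linking=$linking"
```

The version field is the one that matters most later on, because it also names the directory you get when you unpack the file.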
To install and start up Condor, you will take three steps. First, unpack Condor from this file and see what you get:

    % tar xzf condor-6.7.19-linux-x86-glibc23-dynamic.tar.gz
    % cd condor-6.7.19
    % ls
    DOC  INSTALL  LICENSE.TXT  README  condor_configure*  condor_install*  examples/  release.tar
If you explore these files a bit, you'll notice something a bit strange: all of the Condor binaries are contained in the release.tar file. We'll run condor_configure (not condor_install) to install Condor.

The second step is to run condor_configure:

    % ./condor_configure --install --make-personal-condor
    [pause]
    Condor has been installed into:
        /home/users/roy/condor-6.7.19

    In order for Condor to work properly you must set your
    CONDOR_CONFIG environment variable to point to your
    Condor configuration file:
        /home/users/roy/condor-6.7.19/etc/condor_config
    before running Condor commands/daemons.

If you look at the contents of the directory, you'll see that there are now bin and sbin directories for Condor. You'll also see configuration files in etc and local.<machinename>:

    % ls -CF
    bin/               condor_install*  etc/       include/  lib/      LICENSE.TXT   man/    release.tar  src/
    condor_configure*  DOC              examples/  INSTALL   libexec/  local.ws-01/  README  sbin/
    % ls etc
    condor_config  examples/
    % ls local.ws-01
    condor_config.local  execute/  log/  spool/

If you tell Condor where to find condor_config, it will know how to find condor_config.local because it is listed in the file:

    % grep "^LOCAL_CONFIG" etc/condor_config
    LOCAL_CONFIG_FILE = /home/users/roy/condor-6.7.19/local.ws-01/condor_config.local

Condor will read all of condor_config, then read all of condor_config.local. Anything in condor_config.local overrides what is in condor_config. In general, it doesn't matter where you put configuration variables.

So tell Condor where to find the configuration file, and tell the shell how to find the Condor binaries. If you are using a shell other than bash, the commands may be different. You need to edit these commands to be appropriate for you--no copy and paste!

    % export CONDOR_CONFIG=/home/users/roy/condor-6.7.19/etc/condor_config
    % export PATH=/home/users/roy/condor-6.7.19/bin:${PATH}
    % export PATH=/home/users/roy/condor-6.7.19/sbin:${PATH}

Make sure it worked:

    % echo $CONDOR_CONFIG
    /home/users/roy/condor-6.7.19/etc/condor_config
    % which condor_master
    ~/condor-6.7.19/sbin/condor_master
    % condor_version
    $CondorVersion: 6.7.19 May 10 2006 $
    $CondorPlatform: I386-LINUX_RH9 $

You might be surprised that it reports RedHat 9 instead of CERN Scientific Linux 3.0.4 (the version of Linux installed on these computers). It is reporting the operating system that it was compiled on, not the operating system that is in use. Don't worry, the RedHat 9 binaries work just fine on Scientific Linux 3: we've done plenty of testing of that.

Now that you've installed Condor, you need to run it. This is easy, just run condor_master. Then check if Condor is running:

    % condor_master
    % ps -x
      PID TTY      STAT   TIME COMMAND
    15652 ?        S      0:00 sshd: roy@pts/0
    15654 pts/0    S      0:00 -bash
    15824 ?        S      0:00 condor_master
    15825 ?        S      0:00 condor_collector -f
    15826 ?        S      0:00 condor_negotiator -f
    15827 ?        S      0:00 condor_schedd -f
    15828 ?        S      0:04 condor_startd -f
    15846 pts/0    R      0:00 ps -x

Most excellent! You have installed Condor and gotten it running. The output you see from ps may be slightly different than ours, but as long as it lists all of those Condor programs, it's okay. Let's look at what we see:

condor_master: This program runs constantly and ensures that all other parts of Condor are running. If they hang or crash, it restarts them.

condor_collector: This program is part of the Condor central manager. It collects information about all computers in the pool as well as which users want to run jobs. It is what normally responds to the condor_status command.

condor_negotiator: This program is part of the Condor central manager. It decides what jobs should be run where.

condor_startd: If this program is running, it allows jobs to be started up on this computer--that is, your computer is an "execute machine". This advertises your computer to the central manager (more on that later, but in this case it's also your computer) so that it knows about this computer. It will start up the jobs that run.

condor_schedd: If this program is running, it allows jobs to be submitted from this computer--that is, your computer is a "submit machine". This will advertise jobs to the central manager so that it knows about them. It will contact a condor_startd on other execute machines for each job that needs to be started.

condor_shadow: (Not shown above) For each job that has been submitted from this computer, there is one condor_shadow running. It will watch over the job as it runs remotely. In some cases it will provide some assistance (see the standard universe later). You may or may not see any condor_shadow processes running, depending on what is happening on the computer when you try it out.

We have a graphic representation of these daemons, drawn by Sarah Miller, age 12.

2.2 Condor_q

You can find out what jobs have been submitted on your computer with the condor_q command:

    % condor_q

    -- Submitter: ws-01.gs.unina.it : <192.167.1.21:33443> : ws-01.gs.unina.it
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

    0 jobs; 0 idle, 0 running, 0 held

Nothing is running right now.

If something was running, you would see output like this:

    % condor_q

    -- Submitter: royal01.cs.wisc.edu : <128.105.112.101:32775> : royal01.cs.wisc.edu
     ID      OWNER     SUBMITTED       RUN_TIME ST PRI SIZE CMD
    4589.0   doronn    3/30  18:07  19+09:26:01 I  0    0.0 go1
    5140.0   araddan   7/18  15:59   8+08:16:47 I  0    0.0 .condor_run.23359
    5145.0   araddan   7/18  17:22   0+21:29:41 I  0    0.0 matlab-script.txt
    6041.0   grishas   12/7  18:41   7+08:03:25 R  0   45.7 a.out
    6042.0   grishas   12/7  18:42   8+07:47:14 R  0   45.7 a.out
    6044.0   grishas   12/9  11:15   6+17:14:46 R  0   45.7 a.out

The output that you see will be different depending on what jobs are running. Notice what we can see from this:
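One thing the listing tells you is the state of each job: the ST column shows I for an idle job and R for a running one. As a small sketch, the six job rows from the sample above can be tallied by that column with awk. The script assumes the column layout shown in the sample (state in the sixth field) and recognizes job rows by their ID of the form cluster.process; on a live pool you would pipe the output of condor_q into the same awk program instead of using the embedded sample.

```shell
# Job rows copied from the sample condor_q output; header lines are included
# to show that the filter skips them.
sample='-- Submitter: royal01.cs.wisc.edu : <128.105.112.101:32775> : royal01.cs.wisc.edu
 ID      OWNER     SUBMITTED       RUN_TIME ST PRI SIZE CMD
4589.0   doronn    3/30  18:07  19+09:26:01 I  0    0.0 go1
5140.0   araddan   7/18  15:59   8+08:16:47 I  0    0.0 .condor_run.23359
5145.0   araddan   7/18  17:22   0+21:29:41 I  0    0.0 matlab-script.txt
6041.0   grishas   12/7  18:41   7+08:03:25 R  0   45.7 a.out
6042.0   grishas   12/7  18:42   8+07:47:14 R  0   45.7 a.out
6044.0   grishas   12/9  11:15   6+17:14:46 R  0   45.7 a.out'

# Keep only lines whose first field looks like a job ID, then count by state.
summary=$(printf '%s\n' "$sample" |
  awk '$1 ~ /^[0-9]+\.[0-9]+$/ { count[$6]++ }
       END { printf "idle=%d running=%d\n", count["I"], count["R"] }')

echo "$summary"
```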
Extra credit
What else can you find out with condor_q? Try any one of:
How do you use the -constraint or -format options to condor_q? When would you want them?
When would you use the -l option?

2.3 Condor_status

You can find out what computers are in your Condor pool. (A pool is similar to a cluster, but it doesn't have the connotation that all computers are dedicated full-time to computation: some may be desktop computers owned by users.) To look, use condor_status:

    % condor_status

    Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

    vm1@ws-01.gs. LINUX       INTEL  Unclaimed  Idle       0.150   500  0+00:00:04
    vm2@ws-01.gs. LINUX       INTEL  Unclaimed  Idle       0.000   500  0+00:00:05

                         Total Owner Claimed Unclaimed Matched Preempting Backfill

             INTEL/LINUX     2     0       0         2       0          0        0

                   Total     2     0       0         2       0          0        0

Right now, we just see our local computer, since you are just running a personal Condor. It appears as two computers, because Condor thinks that there are two processors, due to hyperthreading. Later on, you will see many more computers. Let's look at exactly what you can see:
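Each row is one virtual machine, and the State column says what it is doing; the summary table at the bottom is just a count of rows per state. A minimal sketch of that count, using the two rows from the sample above, follows. It assumes the layout shown (state in the fourth field) and picks out machine rows by the "@" in their names, which would miss machines that advertise only a single, unprefixed name.

```shell
# Machine rows copied from the sample condor_status output, plus the header line.
sample='Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
vm1@ws-01.gs. LINUX       INTEL  Unclaimed  Idle       0.150   500  0+00:00:04
vm2@ws-01.gs. LINUX       INTEL  Unclaimed  Idle       0.000   500  0+00:00:05'

# Count rows per State; only rows whose name contains "@" are treated as machines.
slots=$(printf '%s\n' "$sample" |
  awk '$1 ~ /@/ { n[$4]++ } END { for (s in n) print s "=" n[s] }')

echo "$slots"
```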
Extra credit
What else can you find out with condor_status? Try any one of:
2.4 Configuring Condor

Look at the condor_config.local, which is located in condor-6.7.19/local.<your-computer-name>. It was created automatically, so it's not as well commented as condor_config. One thing to notice is the START expression. For you, it should be "TRUE". This means that Condor will accept jobs (from any authorized user) at any time. This is where you can change Condor to say things like, "Don't run jobs if someone is using the computer" or "Don't run jobs between 8am and 5pm." Look in condor_config to see some examples of this sort of configuration. These examples are not used because they are overridden in condor_config.local.

You can see which version of START is being used with the condor_config_val command:

    % condor_config_val -v START
    START: TRUE
      Defined in '/home/users/roy/condor-6.7.19/local.ws-01/condor_config.local', line 82.
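The override rule (condor_config is read first, then condor_config.local, and the later definition wins) can be mimicked with a toy lookup. This is only an illustration of "last definition wins", not how Condor actually parses its configuration, and the file contents are hypothetical stand-ins: the global START line in particular is made up for the example.

```shell
# Toy illustration only: two scratch files standing in for the real configuration.
tmp=$(mktemp -d)
printf 'START = KeyboardIdle > 900\n' > "$tmp/condor_config"        # hypothetical global setting
printf 'START = TRUE\n'               > "$tmp/condor_config.local"  # local override

# Read the files in the same order Condor does; the last definition of START wins.
effective_start=$(awk -F' = ' '$1 == "START" { value = $2 } END { print value }' \
  "$tmp/condor_config" "$tmp/condor_config.local")

echo "START: $effective_start"
rm -rf "$tmp"
```

On a real installation, condor_config_val -v is the better tool, since it also tells you which file and line supplied the value.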
Extra credit
Change Condor to allow multiple jobs. To do this, you tell Condor that you have multiple CPUs. Set NUM_CPUS = 2 in your condor_config.local, and use condor_restart. Wait a bit for Condor to restart, and see what you have.

Experiment. Look at condor_config and look at some other examples for the START expression. What do they mean?
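The first exercise could be scripted roughly as follows. LOCAL_CONFIG is a hypothetical variable introduced for this sketch, not something Condor reads: set it to the path of your own condor_config.local to apply the change for real. If it is unset, the script writes to a scratch file and skips the restart, so it is safe to try as written.

```shell
# LOCAL_CONFIG is a made-up name for this sketch; it defaults to a scratch file.
local_config="${LOCAL_CONFIG:-$(mktemp)}"

# Append the setting; a later definition in the file overrides any earlier one.
echo "NUM_CPUS = 2" >> "$local_config"
grep '^NUM_CPUS' "$local_config"

# Restart Condor only when a real config file was named and Condor is on the PATH.
if [ -n "${LOCAL_CONFIG:-}" ] && command -v condor_restart >/dev/null 2>&1; then
  condor_restart
fi
```

After the restart has finished, condor_status is the place to look for the effect.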