2007年5月22日火曜日

5/15分の宿題

  1. Experimentally construct a rough memory map for an application on your operating system.
    1. Write a program that prints out (in hexadecimal) the addresses of the following:
      1. main()
      2. a variable on the outermost stack frame (main()'s stack frame)
      3. a variable on the stack frame of a recursively-called function, called to a depth of five times
      4. a statically-defined but uninitialized variable
      5. a statically-defined, initialized variable
      6. several large chunks of malloc()ed memory
      7. a library routine, such as strcpy()
      8. a system call wrapper, such as the one for write()
    2. Take that information and draw a memory map for your OS. It should indicate which direction the stack and the heap grow in. An ASCII picture is okay, or you can use a drawing program of some sort if you wish.
      1. How big is the distance between your stack and your heap?
      2. Was your program compiled with static libraries or shared libraries?

A:
1: 0x8048454
2: 0xbfd8cc70
3: 0xbfd8cc04
4: 0x80497e0
5: 0x80497d8
6: 0xb5fc6008
7: 0x8048364
8: 0x8048334

source code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define ALLOC_SIZE (1024 * 1024 * 32)

void rec_call(int);

int uninitialized_int;
int initialized_int = 0xbeaf;

int main(){
char *buff;
printf("main: 0x%x\n", main);
printf("uninitialized_int: 0x%x\n", &uninitialized_int);
printf("initialized_int: 0x%x\n", &initialized_int);
printf("calling recursive function...\n");
rec_call(5);
printf("...done.\n");
buff = malloc(sizeof(char) * ALLOC_SIZE);
printf("32MB malloc() address: 0x%x\n", buff);
free(buff);
printf("strcpy() address: 0x%x\n", strcpy);
printf("write() address: 0x%x\n", write);
}

void rec_call(int count){
if(count > 0){
rec_call(--count);
}
}

result:
$ ./a.out
main: 0x8048454 (1)
uninitialized_int: 0x80497e0 (4)
initialized_int: 0x80497d8 (5)
calling recursive function...
...done.
32MB malloc() address: 0xb5f34008 (6)
strcpy() address: 0x8048364 (7)
write() address: 0x8048334 (8)

GDB result:
$ gdb ./a.out
GNU gdb Red Hat Linux (6.5-15.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) b main
Breakpoint 1 at 0x8048465: file HWA.c, line 14.
(gdb) run
Starting program: /home/morimori/SS/06/a.out

Breakpoint 1, main () at HWA.c:14
14 printf("main: 0x%x\n", main);
(gdb) info f
Stack level 0, frame at 0xbff57f30 (2):
eip = 0x8048465 in main (HWA.c:14); saved eip 0xbcff2c
source language c.
Arglist at 0xbff57f28, args:
Locals at 0xbff57f28, Previous frame's sp at 0xbff57f24
Saved registers:
ebp at 0xbff57f28, eip at 0xbff57f2c
(gdb) b rec_call
Breakpoint 2 at 0x8048529: file HWA.c, line 28.
(gdb) c
Continuing.
main: 0x8048454
uninitialized_int: 0x80497e0
initialized_int: 0x80497d8
calling recursive function...

Breakpoint 2, rec_call (count=5) at HWA.c:28
28 if(count > 0){
(gdb) c
Continuing.

Breakpoint 2, rec_call (count=4) at HWA.c:28
28 if(count > 0){
(gdb) c
Continuing.

Breakpoint 2, rec_call (count=3) at HWA.c:28
28 if(count > 0){
(gdb) c
Continuing.

Breakpoint 2, rec_call (count=2) at HWA.c:28
28 if(count > 0){
(gdb) c
Continuing.

Breakpoint 2, rec_call (count=1) at HWA.c:28
28 if(count > 0){
(gdb) c
Continuing.

Breakpoint 2, rec_call (count=0) at HWA.c:28
28 if(count > 0){
(gdb) info f
Stack level 0, frame at 0xbfbac324 (3):
eip = 0x8048529 in rec_call (HWA.c:28); saved eip 0x804853e
called by frame at 0xbfbac330
source language c.
Arglist at 0xbfbac31c, args: count=0
Locals at 0xbfbac31c, Previous frame's sp is 0xbfbac324
Saved registers:
ebp at 0xbfbac31c, eip at 0xbfbac320

B:Mac OS Xの場合
カーネルアドレススペース(32bit,PowerPC, Mac OS X 10.4 Mac OS X Internals pp.910 TABLE 8-3)

0x00000000-0x00004fff : Exception vectors and low-memory code
0x00005000-0x00005fff: Low-memory globals
0x00006000-0x00006fff: Low-memory shared page used for low-level debugging
0x00007000-0x0000dfff: Boot processor interrupt and debug stacks
0x0000e000-0x0fffffff: Kernel code and data
0x10000000-0xdfffffff: Physical memory window
0xe0000000-0xffffffff: User memory window

ユーザアドレススペース(Mac OS X Internals pp.910-911 TABLE 8-4)
0x00000000-0x00001000: So-called zero page(__PAGEZERO)--inaccessible by default so that dereferencing a NULL pointer(including small offsets from a NULL pointer) causes a protection fault
0x00001000-0x8fdfffff: Application address range(about 2.3GB)
0x8fe00000-0x8fffffff: Space reserved exclusively for Apple system libraries; e.g., the dynamic linker's text segment, mapped starting at 0x8fe00000
0x90000000-0x9fffffff: Global shared text segment, reserved exclusively for Apple system libraries; e.g., the system library's text segment, mapped starting at 0x90000000
0xa0000000-0xafffffff: Global shared data segment, reserved exclusively for Apple system libraries; e.g., the system library's data segment, mapped starting at 0xa0000000
0xb0000000-0xbfffffff: Preferred address range for the application's main thread
0xc0000000-0xebffffff: Additional space available for third-party applications and framework code
0xf0000000-0xfdffffff: Range preferred for use by additional thread stacks, although applications may use this range as necessary
0xfe000000-0xffbfffff: Range reserved for use by the pasteboard and other system services; not to be used by user programs
0xffc00000-0xfffdffff: Range preferred for use by other system services, although applications may use this range as necessary
0xfffe0000-0xffff7fff: Range reserved for use by system services and not to be used by user programs; e.g., a portion of the address range starting at 0xfffec000 is used by the Objective-C library as a commpage for optimizing message dispatch
0xffff8000-0xffffefff: System-shared commpage(seven pages)
0xfffff000-0xffffffff: Last page of a 32-bit address space; cannot be mapped by the Mack VM subsystem

1.メモリマップを調べた限りでは,スタックフレームとヒープのアドレスは分からなかったので,プログラムを書いて推測した.


source:
#include

void func();

int main(){
char *c, *c2;
c = malloc(sizeof(char));
c2 = malloc(sizeof(char));
printf("heap c: 0x%x\n", c);
printf("heap c2: 0x%x\n", c2);
func();
free(c);
free(c2);
printf("Hello, world.\n");
}

void func(){
int tmp;
tmp = 1;
}


result:
$ ./a.out
heap c: 0x3000f0
heap c2: 0x300100
Hello, world.


$ gdb ./a.out
GNU gdb 6.3.50-20050815 (Apple version gdb-563) (Wed Jul 19 05:10:58 GMT 2006)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-apple-darwin"...Reading symbols for shared libraries .. done

(gdb) b main
Breakpoint 1 at 0x1efa: file ./hello.c, line 7.
(gdb) run
Starting program: /Users/mmori/tmp/a.out
Reading symbols for shared libraries . done

Breakpoint 1, main () at ./hello.c:7
7 c = malloc(sizeof(char));
(gdb) info f
Stack level 0, frame at 0xbffffa50:
eip = 0x1efa in main (./hello.c:7); saved eip 0x1ed2
source language c.
Arglist at 0xbffffa48, args:
Locals at 0xbffffa48, Previous frame's sp is 0xbffffa50
Saved registers:
ebx at 0xbffffa44, ebp at 0xbffffa48, eip at 0xbffffa4c
(gdb) b func
Breakpoint 2 at 0x1f77: file ./hello.c, line 19.
(gdb) c
Continuing.
heap c: 0x3000f0
heap c2: 0x300100

Breakpoint 2, func () at ./hello.c:19
19 tmp = 1;
(gdb) info f
Stack level 0, frame at 0xbffffa20:
eip = 0x1f77 in func (./hello.c:19); saved eip 0x1f47
called by frame at 0xbffffa50
source language c.
Arglist at 0xbffffa18, args:
Locals at 0xbffffa18, Previous frame's sp is 0xbffffa20
Saved registers:
ebp at 0xbffffa18, eip at 0xbffffa1c
(gdb) c
Continuing.
Hello, world.

Program exited with code 016.


実行結果から,ヒープの開始アドレスは0x3000f0か,それよりも手前から下方向にのびていくことが分かる.
また,スタックフレームは,0xbffffa50から,上方向にのびていることがわかる.
スタックとヒープの間の距離は,0xbfcff960,すなわち10進で3218078048,3068MBとなる.

2.GCCでは,通常明示的にstaticオプションを付けない限りはshared libraryとしてコンパイルされる.そのため,今回使用したプログラムは全てshared libraryである.

2. Extend your project proposal to answer the questions above.

調査内容: Apacheのログファイルが2GBを超えた際に,システムのLoadが非常に高くなるのはなぜか調べる.
詳細: Redhat系Linuxで,Apache 2.0系を利用してサーバを運用していた際,ログファイルaccess_logのファイルサイズが2GBを超えた時点で極端にシステムの負荷が高くなった.恐らくEXT3ファイルシステムの問題か何かであると予想されるが,この原因について調査する.

Output: システム負荷増大の原因と,その対策
Midterm Milestones: 実験環境の構築と再現性の確認
Equipment and Skills: サーバ構築,及びファイルシステムに関する技術.必要であればソースコードを参照することで原因を調査したい
What will you learn if the project is successful? : ファイルシステムに関する動作の詳細,及び実際の挙動を知ることができる
What will you learn if the project fails? : システム障害の原因を辿るプロセスを学ぶことができる.

2007年5月14日月曜日

5/8分の宿題

1.Take last week's memory copy program, and modify it to fork() to a certain depth, then have each one of the processes time the copy of a certain amount of memory. Your program should take three arguments, the depth, the amount of memory to copy, and the number of times to copy the memory. Ideally, the amount of memory to copy should be large enough to require several seconds, but that's not practical, so have it repeat the copy some number of times. For example, fork five processes, malloc() ten megabytes each, and have the processes copy that memory one hundred times.

program source:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define TMP_FILE ".tmpfile"
#define LINE_BUFFSIZE (40)
#define COPY_SOURCE (1024 * 1024) /* 1M Bytes */

double cputime();
void add_count();
void print_result();
void usage();
void do_alloc(int, size_t, int);
void die(char*);

int count, depth;
char source[COPY_SOURCE];

typedef struct statistics{
double time;
int process_number;
} statistics_t;

int main(int argc, char **argv){
int i;
if(argc != 4){
usage();
return 0;
}

count = 0;
signal(SIGUSR1, add_count);/* counter signal */
int times;
size_t amount;/* amount may be bigger than int */
depth = atoi(argv[1]);/* number of processes */
amount = atol(argv[2]);/* amount of memory(per process) */
times = atoi(argv[3]);/* times of copy */

if(depth < 1 || amount < 1 || times < 1){
usage();
return -1;
}

char *ptr;
int current_processes = 0;

for(i = 0;i < depth;i++){
pid_t child_pid = fork();
if(child_pid == -1){
die("fork() failed.\n");
}
if(child_pid == 0){
do_alloc(i, amount, times);
return 0;
}
// printf("fork() successed, pid: %d\n", child_pid);
}
wait();
}

void add_count(){
count++;
if(count == depth){
print_result();
}
}

void print_result(){
FILE *fp;
int i, index;
double tmpd;
char buff[LINE_BUFFSIZE];
char *cp;
statistics_t *statistics;

if((statistics = malloc(sizeof(statistics_t) * depth)) == NULL){
die("statistics memory allocation fault.");
}

if((fp = fopen(TMP_FILE, "r")) == NULL){
die("file read error.\n");
}
for(i = 0;i < depth;i++){
if((cp = fgets(buff, LINE_BUFFSIZE, fp)) == NULL){
die("file read error.\n");
}
printf("%s", buff);
/*
sscanf(buff, "%d,%.10f\n", &index, &amp;amp;amp;amp;amp;amp;amp;amp;amp;tmpd);
statistics[index].time = tmpd;
statistics[index].process_number = index;
*/
}
fclose(fp);
remove(TMP_FILE);
/*
tmpd = .0;
for(i = 0;i < depth;i++){
printf("%d -> %.10f ms\n", statistics[i].process_number, statistics[i].t\
ime);
tmpd += statistics[i].time;
}
*/
}

/**
* do_alloc() : allocate memory
*/

void do_alloc(int process, size_t size, int times){
int i, j;
double stime, etime;
char *ptr;
size_t remain;
int current_offset;

/* start memory allocation */
stime = cputime();
for(i = 0;i < times;i++){
/* allocate memory */
if((ptr = malloc(size)) == NULL){
printf("malloc() allocation fault!!\n");
}
current_offset = 0;
for(remain = size;remain > 0;){
if(remain >= COPY_SOURCE){
memcpy(ptr, source, COPY_SOURCE);
remain -= COPY_SOURCE;
}else{
memcpy(ptr, source, remain);
remain = 0;
}
}
free(ptr);
}
etime = cputime();
/* end memory allocation */

FILE *fp;
if((fp = fopen(TMP_FILE, "a")) == NULL){
die("can't open temporary file.");
}
char result[LINE_BUFFSIZE];
sprintf(result, "%d,%.10f\n", process, etime - stime);
if(fputs(result, fp) == EOF){
die("can't write file.");
}
fclose(fp);

kill(getppid(), SIGUSR1);/* termination of process */
return;
}

/**
* usage() : print usage.
*/
void usage(){
printf("Usage: forkmemcopy DEPTH AMOUNT_OF_MEMORY_TO_COPY NUMBER_OF_TIMES\n")\
;
}

/**
* get cputime
*/
double cputime(){
struct timeval t;
// void *v;
gettimeofday(&t, NULL);
return t.tv_sec + (double)t.tv_usec*1e-6;
}

/**
* common error function
*/
void die(char *s){
printf(s);
perror("ERROR(die)");
exit(-1);
}

補足: 最初に立ち上げたプロセスがManagerプロセスとなり,子プロセスをfork()していき,子プロセスがdo_alloc()内でmemcpy()を実行する.
do_alloc()は内部で結果をTMP_FILEへ追記書き込みし,終了時にManagerプロセスにSIGUSR1を送る.
ManagerプロセスはSIGUSR1の数をカウントしており,depthの数だけのSIGUSR1を受け取ったらTMP_FILEの内容を読み込み表示する.
データのやりとりにファイルを使っているが,追記モード("a")でのみ書き込みを行っているため,書き込み時に競合が起こる心配は無いと考えている.

当初,平均値の計算などもTMP_FILEから読み出した値から計算するよう実装しようとしたが,sscanf()によってdoubleの値がうまく読み込めなかったため,急遽結果を出力するだけのプログラムとした.
平均値計算には,5〜6行のPHPスクリプトを用いた.

A. Run the program with a depth of one, and report how long it takes. Repeat this program twenty-five times and report the average and the individual times.

Result:
0.8175361156
0.9691882133
0.7505040169
0.4270858765
0.3950059414
0.4412739277
0.4337899685
0.4545879364
0.3823759556
0.6414270401
0.4668221474
0.3328568935
0.3372659683
0.3725678921
0.3376910686
0.3227369785
0.3254480362
0.4398460388
0.3319430351
0.3406119347
0.6110649109
0.3521721363
0.3620688915
0.3821530342
0.3682980537

Average: 0.45585288

B. Now run the program with a depth of five. Again, repeat at least five times and report the average. Is the average higher than five times the depth one case? Why?
Result:
1.9947190285
2.0967659950
2.1174561977
2.1399757862
2.3340249062
1.6144938469
2.1689970493
2.3224248886
2.5233850479
2.6110658646
0.7902989388
1.9660778046
2.2923190594
2.4991791248
2.4989440441
0.8868319988
1.9140131474
2.2147798538
2.3721489906
2.4381871223
0.7857530117
1.3553609848
2.0533030033
2.3023898602
2.3273780346

Average: 2.024810944
Aに比べて4倍以上の実行時間がかかっている。これは、メモリ確保を同時に行うことで、メモリに対するI/Oの負荷が高まり、動作に時間がかかったことが理由に考えられる。

C. Plot the density function for the execution time. (You should have twenty-five data points here for copying a gigabyte of memory each, for the depth one and depth five cases.) Is there more variability in the depth five case?

グラフの生成にはRを用いた.
> dataA <- read.table("dataA.txt")
> hist(dataA$V1)

> dataB <- read.table("dataB.txt")
> hist(dataB$V1)




2. Run one copy of your program at the normal priority, and at the same time a second copy at lower priority, e.g. by using nice. Does the first one completely monopolize the CPU until it is finished?

以下の様なシェルスクリプトを作り実験した。

#!/bin/bash

./a.out 1 10240000 1000 > nice10.out &
nice -n 5 ./a.out 1 10240000 1000 > nice5.out &

結果、nice5.outは8.2090461254 sec, nice10.outは8.3474671841 secとなった。
なぜこうなったか考えてみると、プログラム中のメモリコピー部分(時間を計測している部分)に入る前の時点でnice10のプロセスがnice5のプロセスに実行を譲っているため、このような結果となったのではないかと考える。
そして、実際にそうであるのかどうか、timeコマンドを間に挟んで以下の様に実行した

#!/bin/bash

time ./a.out 1 10240000 1000 > nice10.out &
time nice -n 0 ./a.out 1 10240000 1000 > nice5.out &

結果、
real 0m7.870s
user 0m3.102s
sys 0m0.769s

real 0m8.384s
user 0m3.499s
sys 0m0.830s

という出力が得られた、これではやはり優先度制御がうまく行われていないのではないかと考える。


3. Find and report the time quantum for your particular system.

# cat /proc/driver/rtc
rtc_time : 15:05:37
rtc_date : 2007-05-14
rtc_epoch : 1900
alarm : 02:17:18
DST_enable : no
BCD : yes
24hr : yes
square_wave : no
alarm_IRQ : no
update_IRQ : no
periodic_IRQ : no
periodic_freq : 1024
batt_status : okay

以上より、1024HzでRTCのクロックが取れていることがわかる。


参考: プログラミング言語C,B.W.カーニハン/D.M.リッチー : http://www.amazon.co.jp/%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0%E8%A8%80%E8%AA%9EC-ANSI%E8%A6%8F%E6%A0%BC%E6%BA%96%E6%8B%A0-B-W-%E3%82%AB%E3%83%BC%E3%83%8B%E3%83%8F%E3%83%B3/dp/4320026926/ref=pd_bbs_1/250-6458066-9954607?ie=UTF8&s=books&qid=1179155847&sr=8-1

RjpWiki: http://www.okada.jp.org/RWiki/

2007年5月11日金曜日

5/1分の宿題

すっかり遅れてしまったが,5/1出題分の宿題について.

1:Ignoring the deadlock for a moment, what happens at the philosophers' table when one philosopher dies while hold a fork?

->死んだ時にフォークを手放すかどうかによる.フォークを手放さない場合,当然全ての哲学者は飢え死ぬことになるだろう.
死んだ際に左手に持っていたフォークを手放す場合,その死んだ哲学者の両隣の哲学者LとRは,それぞれLなら右側,Rなら左側のフォークを独占できる.
この場合,左のフォークを先に確保するアルゴリズムのため,Rは相変わらず右のフォークを獲得できないでロックする事があるが,Lは左のフォークを確保した地点で右のフォークを確実に確保できる.そのため,Lから順々にデッドロックの輪が解放されていき,結果としてRのロックは解放される.
しかし,このケースでは,生き残った哲学者全ては平等な立場では無くなる.すなわち,Lが最も有利であり,Rが最も不利な立場になる.

2:One possible solution to the dining philosophers problem is to number the forks, and require the philophers to always pick up an even-numbered fork first.
  • Does this scheme work?
  • What happens when there are an odd number of forks?
  • Can this scheme be extended to work when you need three forks to eat?
このアルゴリズムは哲学者の人数が偶数の場合,正常に動作する.
人数が奇数の場合,正常に動作しない.なぜならば1番のフォークとMAX_COUNTのフォークに囲まれている哲学者はフォークを一生取ることができない(1からナンバリングした場合,両方奇数となる)か,0番のフォークとMAX_COUNTのフォークに囲まれている哲学者はどちらのフォークを取れば良いか分からない(0からナンバリングした場合,両方偶数となる)からである.
3つのフォークをどの位置から取ってくるのかがよく分からないが,恐らくうまく動かないのではないかと思う.3つ目のフォークを確保する際に最大2つのフォークを確保済みで無ければならず,それがロックの原因になるのではないかと思う.

3:
  • Another possible solution, since all forks are identical, is to pile all of the forks in the middle of the table, and have the philosophers grab any two when they want to eat. Does this work better?
    • Analyze the probability of deadlock, treating time as discrete, based on the number of philosophers, probability of a philospher wanting to eat, and number of forks.
うまく動作しない.哲学者の数とフォークの数が同じ以上,全員が一つのフォークを持って立ち往生するという本質は変わらないからである.
以下に哲学者が4人(A,B,C,D)の場合を例示する.
  1. A,B,C,Dが同時にフォークを一つ取る
  2. テーブルの上にフォークは一つも残らない(デッドロック)
4:I give you a red disk that you can use as a marker. How would create a protocol that guarantees deadlock avoidance? Is it robust against the death of one of the philosophers?
  • 二つめのフォークが取れなかった場合に,マークされた哲学者であれば一度食べることをあきらめてフォークを机に戻す.
  • フォークを机に戻した後,左の哲学者にマークを渡す
これで不平等さはほとんど無いのではないかと思う.

5:Describe how the original Ethernet CSMA/CD scheme is like the dining philosophers problem.

フォークを確保する際,取りたいフォークが机の上に無ければ手に持ったフォークを全て机に置き(Reset),ランダムな時間空腹を我慢する.その後またフォークを取りに行けば良い.

6:Find the synchronization primitive used in an Intel Core Duo dual-processor or an AMD on-chip multiprocessor.

Core2 DuoではLOCK命令が該当する命令だと思われる.
参照元: Intel® 64 and IA-32 Architectures Software Developer's Manuals: http://www.intel.com/products/processor/manuals/index.htm
 Volume 2A: Instruction Set Reference, A-MにLOCK命令の詳細が記述されている.

7:Write a program that copies one chunk of memory to another (your OS certainly provides some memory copy library routine). Measure its performance. (We will use this information in later exercises in the course.)

source:

$ cat memorycopy.c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define SIZE 1024000

double cputime();

int main(){
unsigned char buff[SIZE] = "homework no.7";
unsigned char buff2[SIZE];
double stime, etime;
stime = cputime();
memcpy(buff2, buff, SIZE);
etime = cputime();
printf("%d Bytes memcpy() time: %10.6f ms.\n", SIZE, etime - stime);
}

double cputime(){
struct timeval t;
gettimeofday(&t, NULL);
return t.tv_sec + (double)t.tv_usec*1e-6;
}

output:
$ ./a.out
1024000 Bytes memcpy() time: 0.003304 ms.