vine-users ML archive



[vine-users:079726] Re: About a memory leak at startup in an AMD64 environment

  • From: "hamabo" <hamabo@xxxxxxxxxxxxxxx>
  • Subject: [vine-users:079726] Re: About a memory leak at startup in an AMD64 environment
  • Date: Sun, 4 Oct 2009 07:31:43 +0900
Nomiya-san, thank you.

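You were right that my earlier test was incomplete, so since then I ran a full-pass memory check and confirmed that there are no memory errors.
After that, while the machine was running under a test load, the same symptom (the link repeatedly going up and down) suddenly came back.
I reverted the kernel to 2.6.27, swapped in a spare adapter of the same model, and updated the eth0 driver to the latest version (e1000-8.0.16). While testing with roughly 500 connections opened to the HTTP daemon from another machine, it happened yet again...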

The log shows:

Oct  4 04:54:57 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:54:57 localhost kernel:   Tx Queue             <0>
Oct  4 04:54:57 localhost kernel:   TDH                  <b4>
Oct  4 04:54:57 localhost kernel:   TDT                  <fa>
Oct  4 04:54:57 localhost kernel:   next_to_use          <fa>
Oct  4 04:54:57 localhost kernel:   next_to_clean        <b1>
Oct  4 04:54:57 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:54:57 localhost kernel:   time_stamp           <100028ea8>
Oct  4 04:54:57 localhost kernel:   next_to_watch        <b6>
Oct  4 04:54:57 localhost kernel:   jiffies              <100029109>
Oct  4 04:54:57 localhost kernel:   next_to_watch.status <0>
Oct  4 04:54:59 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:54:59 localhost kernel:   Tx Queue             <0>
Oct  4 04:54:59 localhost kernel:   TDH                  <b4>
Oct  4 04:54:59 localhost kernel:   TDT                  <fa>
Oct  4 04:54:59 localhost kernel:   next_to_use          <fa>
Oct  4 04:54:59 localhost kernel:   next_to_clean        <b1>
Oct  4 04:54:59 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:54:59 localhost kernel:   time_stamp           <100028ea8>
Oct  4 04:54:59 localhost kernel:   next_to_watch        <b6>
Oct  4 04:54:59 localhost kernel:   jiffies              <1000292fd>
Oct  4 04:54:59 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:01 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:01 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:01 localhost kernel:   TDH                  <b4>
Oct  4 04:55:01 localhost kernel:   TDT                  <fa>
Oct  4 04:55:01 localhost kernel:   next_to_use          <fa>
Oct  4 04:55:01 localhost kernel:   next_to_clean        <b1>
Oct  4 04:55:01 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:01 localhost kernel:   time_stamp           <100028ea8>
Oct  4 04:55:01 localhost kernel:   next_to_watch        <b6>
Oct  4 04:55:01 localhost kernel:   jiffies              <1000294f1>
Oct  4 04:55:01 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:03 localhost kernel: ------------[ cut here ]------------
Oct  4 04:55:03 localhost kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x136/0x1d8()
Oct  4 04:55:03 localhost kernel: NETDEV WATCHDOG: eth0 (e1000): transmit timed out
Oct  4 04:55:03 localhost kernel: Modules linked in: xt_length ipt_REJECT xt_limit ipt_LOG ipt_recent xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack xt_multiport iptable_filter ip_tables x_tables cpufreq_ondemand powernow_k8 freq_table dm_mod firewire_ohci firewire_core crc_itu_t i2c_piix4 i2c_core ohci1394 thermal e1000 ieee1394 sg shpchp processor pcspkr button wmi usb_storage ahci pata_atiixp libata dock ide_cd_mod cdrom sd_mod scsi_mod crc_t10dif uhci_hcd ohci_hcd ehci_hcd
Oct  4 04:55:03 localhost kernel: Pid: 0, comm: swapper Not tainted 2.6.27-43vl5 #1
Oct  4 04:55:03 localhost kernel:
Oct  4 04:55:03 localhost kernel: Call Trace:
Oct  4 04:55:03 localhost kernel:  <IRQ>  [<ffffffff80239eb1>] warn_slowpath+0xb4/0xe0
Oct  4 04:55:03 localhost kernel:  [<ffffffffa0170696>] ipt_do_table+0x501/0x56b [ip_tables]
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022ed69>] source_load+0x2a/0x4f
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022edb8>] target_load+0x2a/0x4f
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022f79b>] place_entity+0x6c/0x9a
Oct  4 04:55:03 localhost kernel:  [<ffffffff80230236>] enqueue_entity+0x9c/0xbd
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022eb50>] enqueue_task+0x13/0x1e
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022fa6f>] resched_task+0x2d/0x74
Oct  4 04:55:03 localhost kernel:  [<ffffffff802338a2>] try_to_wake_up+0x175/0x187
Oct  4 04:55:03 localhost kernel:  [<ffffffff8024ba00>] autoremove_wake_function+0x9/0x2e
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022f049>] __wake_up_common+0x41/0x75
Oct  4 04:55:03 localhost kernel:  [<ffffffff80469bba>] dev_watchdog+0x136/0x1d8
Oct  4 04:55:03 localhost kernel:  [<ffffffff8022fbeb>] __wake_up+0x38/0x4e
Oct  4 04:55:03 localhost kernel:  [<ffffffff80469a84>] dev_watchdog+0x0/0x1d8
Oct  4 04:55:03 localhost kernel:  [<ffffffff802421e3>] run_timer_softirq+0x16f/0x1ec
Oct  4 04:55:03 localhost kernel:  [<ffffffff8023e5e3>] __do_softirq+0x65/0xdb
Oct  4 04:55:03 localhost kernel:  [<ffffffff8021189c>] call_softirq+0x1c/0x28
Oct  4 04:55:03 localhost kernel:  [<ffffffff802139d3>] do_softirq+0x3c/0x81
Oct  4 04:55:03 localhost kernel:  [<ffffffff8023e538>] irq_exit+0x3f/0x85
Oct  4 04:55:03 localhost kernel:  [<ffffffff8021fc0b>] smp_apic_timer_interrupt+0x8f/0xa8
Oct  4 04:55:03 localhost kernel:  [<ffffffff802110a3>] apic_timer_interrupt+0x83/0x90
Oct  4 04:55:03 localhost kernel:  <EOI>  [<ffffffff8021f9b4>] lapic_next_event+0x0/0x13
Oct  4 04:55:03 localhost kernel:  [<ffffffff80223c3c>] native_safe_halt+0x2/0x3
Oct  4 04:55:03 localhost kernel:  [<ffffffff8024f194>] notifier_call_chain+0x29/0x4c
Oct  4 04:55:03 localhost kernel:  [<ffffffff80217454>] default_idle+0x2a/0x46
Oct  4 04:55:03 localhost kernel:  [<ffffffff80217682>] c1e_idle+0x10a/0x10f
Oct  4 04:55:03 localhost kernel:  [<ffffffff8020eca5>] cpu_idle+0x9e/0xc8
Oct  4 04:55:03 localhost kernel:
Oct  4 04:55:03 localhost kernel: ---[ end trace 003e90f6d4b5fa06 ]---
Oct  4 04:55:06 localhost kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct  4 04:55:12 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:12 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:12 localhost kernel:   TDH                  <7f>
Oct  4 04:55:12 localhost kernel:   TDT                  <ba>
Oct  4 04:55:12 localhost kernel:   next_to_use          <ba>
Oct  4 04:55:12 localhost kernel:   next_to_clean        <7c>
Oct  4 04:55:12 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:12 localhost kernel:   time_stamp           <100029dbe>
Oct  4 04:55:12 localhost kernel:   next_to_watch        <80>
Oct  4 04:55:12 localhost kernel:   jiffies              <100029fb2>
Oct  4 04:55:12 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:14 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:14 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:14 localhost kernel:   TDH                  <7f>
Oct  4 04:55:14 localhost kernel:   TDT                  <ba>
Oct  4 04:55:14 localhost kernel:   next_to_use          <ba>
Oct  4 04:55:14 localhost kernel:   next_to_clean        <7c>
Oct  4 04:55:14 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:14 localhost kernel:   time_stamp           <100029dbe>
Oct  4 04:55:14 localhost kernel:   next_to_watch        <80>
Oct  4 04:55:14 localhost kernel:   jiffies              <10002a1a6>
Oct  4 04:55:14 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:16 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:16 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:16 localhost kernel:   TDH                  <7f>
Oct  4 04:55:16 localhost kernel:   TDT                  <ba>
Oct  4 04:55:16 localhost kernel:   next_to_use          <ba>
Oct  4 04:55:16 localhost kernel:   next_to_clean        <7c>
Oct  4 04:55:16 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:16 localhost kernel:   time_stamp           <100029dbe>
Oct  4 04:55:16 localhost kernel:   next_to_watch        <80>
Oct  4 04:55:16 localhost kernel:   jiffies              <10002a39a>
Oct  4 04:55:16 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:18 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:18 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:18 localhost kernel:   TDH                  <7f>
Oct  4 04:55:18 localhost kernel:   TDT                  <ba>
Oct  4 04:55:18 localhost kernel:   next_to_use          <ba>
Oct  4 04:55:18 localhost kernel:   next_to_clean        <7c>
Oct  4 04:55:18 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:18 localhost kernel:   time_stamp           <100029dbe>
Oct  4 04:55:18 localhost kernel:   next_to_watch        <80>
Oct  4 04:55:18 localhost kernel:   jiffies              <10002a58e>
Oct  4 04:55:18 localhost kernel:   next_to_watch.status <0>
Oct  4 04:55:20 localhost kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct  4 04:55:20 localhost kernel:   Tx Queue             <0>
Oct  4 04:55:20 localhost kernel:   TDH                  <7f>
Oct  4 04:55:20 localhost kernel:   TDT                  <ba>
Oct  4 04:55:20 localhost kernel:   next_to_use          <ba>
Oct  4 04:55:20 localhost kernel:   next_to_clean        <7c>
Oct  4 04:55:20 localhost kernel: buffer_info[next_to_clean]
Oct  4 04:55:20 localhost kernel:   time_stamp           <100029dbe>
Oct  4 04:55:20 localhost kernel:   next_to_watch        <80>
Oct  4 04:55:20 localhost kernel:   jiffies              <10002a782>
Oct  4 04:55:20 localhost kernel:   next_to_watch.status <0>

<after this, the log keeps repeating the same up/down cycle>
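For reference, each "Detected Tx Unit Hang" block above can be reduced to two rough numbers: how many transmit descriptors were queued but not yet consumed by the hardware (TDT minus TDH), and how long the watched buffer had been waiting (jiffies minus time_stamp). The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the ring size of 256 descriptors is assumed to be the e1000 default (the log does not show it), and HZ=250 is inferred from the log itself, since reports two seconds apart differ by 500 jiffies (0x29109 to 0x292fd).

```python
import re

# Hypothetical helper: matches the hex fields printed in a "Tx Unit Hang" block,
# with or without the syslog prefix ("Oct  4 04:54:57 localhost kernel:").
FIELD_RE = re.compile(r"\b(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")

RING_SIZE = 256  # assumption: default e1000 TX ring size, not shown in the log
HZ = 250         # assumption: inferred from the log (500 jiffies per 2 seconds)


def parse_hang_block(lines):
    """Return a dict mapping field name -> integer value for one hang report."""
    fields = {}
    for line in lines:
        m = FIELD_RE.search(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields


def summarize(fields, ring_size=RING_SIZE, hz=HZ):
    """Return (descriptors still queued, seconds the watched buffer has waited)."""
    # TDT is where the driver appends; TDH is where the hardware is reading.
    # The modulo handles the case where the tail has wrapped around the ring.
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # time_stamp is roughly when the buffer was queued; jiffies is "now".
    stalled = (fields["jiffies"] - fields["time_stamp"]) / hz
    return pending, stalled


if __name__ == "__main__":
    block = [
        "Oct  4 04:54:57 localhost kernel:   TDH                  <b4>",
        "Oct  4 04:54:57 localhost kernel:   TDT                  <fa>",
        "Oct  4 04:54:57 localhost kernel:   time_stamp           <100028ea8>",
        "Oct  4 04:54:57 localhost kernel:   jiffies              <100029109>",
    ]
    pending, stalled = summarize(parse_hang_block(block))
    print(f"{pending} descriptors pending, stalled {stalled:.2f} s")
    # prints: 70 descriptors pending, stalled 2.44 s
```

Applied to the first block, this gives 70 queued descriptors with the buffer waiting about 2.4 s. In the log, TDH stays at b4 (and later at 7f) across successive reports, meaning the hardware head is not advancing at all.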

Then, once I closed the connections, the link stayed up and no longer went down.

...

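Suspecting the PCI NIC, I then switched the cable to the onboard Ethernet adapter (Realtek 8111C), and the problem does not occur there.
I also checked the 1000MT Desktop Adapter in a different machine and found no particular problems, so I suspect the NIC itself is not faulty hardware.
According to the motherboard specifications, PCI is handled by the southbridge (SB700) and PCI-E by the northbridge (780G), and the onboard Ethernet adapter (Realtek 8111C) appears to be attached to the northbridge.
The problem is starting to look less like a software issue and more like a hardware quirk of the motherboard design, so I'll get a PCI-E NIC and try that.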

Sorry for the trouble...