#67 DV1394 breaks EXT3 (crash), possible buffer overrun

bug (general)
open
dv1394 (5)
5
2003-09-17
2003-09-17
No

This is a very repeatable problem. It shows up when using
dvgrab to pull video from my Canon ZR70MC DV camcorder.
The video download works fine for a variable amount of
time (10-30 minutes), and then dvgrab exits with a bus
error.

After this event, an endless stream of messages from the
EXT3 filesystem appears on the console for the /home and
/var partitions (I'm assuming /home because that's where
the video is being stored, and /var because that's where
syslog would be attempting to write messages). They refer
to errors in ext3_orphan_add, ext3_get_inode_loc, and
ext3_write_inode, or something like that. This is from
memory, since logging doesn't work once all of the
filesystems are inaccessible. The only resolution is to
power-cycle the machine.

I've tried several kernels with the exact same result.
memtest86 verifies the RAM is okay, and running various
disk tests shows no defects there either. The system is
otherwise rock-solid.

Output from lspci:

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 [IGD4-1P] System Controller (rev 13)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 [IGD4-1P] AGP Bridge
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586/B/686A/B PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.4 SMBus: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:08.0 PCI bridge: Hint Corp HB1-SE33 PCI-PCI Bridge (rev 15)
00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
00:0a.0 Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 10)
00:0c.0 Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 02)
01:05.0 VGA compatible controller: nVidia Corporation NV15 [GeForce2 GTS/Pro] (rev a4)
02:08.0 USB Controller: NEC Corporation USB (rev 41)
02:08.1 USB Controller: NEC Corporation USB (rev 41)
02:08.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
02:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)

The associated entry in /proc/pci for the card:

Bus 2, device 9, function 0:
FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 70).
IRQ 5.
Master Capable. Latency=32. Max Lat=32.
Non-prefetchable 32 bit memory at 0xda003000 [0xda0037ff].
I/O at 0xc000 [0xc07f].
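
A note for anyone comparing the two listings: the revision appears as 46 in the lspci output and as 70 in /proc/pci. Assuming lspci prints the revision in hexadecimal and /proc/pci prints it in decimal (an assumption about the tools, not something stated in this report), the two agree. The sketch below checks that and also computes the size of the memory region listed above. The function names are made up for illustration.

```python
# Values copied from the lspci and /proc/pci listings above.
LSPCI_REV = "46"          # lspci: "(rev 46)", assumed to be hexadecimal
PROC_PCI_REV = 70         # /proc/pci: "(rev 70)", assumed to be decimal

MMIO_START = 0xda003000   # "memory at 0xda003000"
MMIO_END = 0xda0037ff     # "[0xda0037ff]", inclusive end address


def same_revision(hex_text: str, decimal_value: int) -> bool:
    """Return True if a hexadecimal revision string equals a decimal revision."""
    return int(hex_text, 16) == decimal_value


def region_size(start: int, end: int) -> int:
    """Return the size in bytes of an inclusive address range."""
    return end - start + 1


if __name__ == "__main__":
    print("revisions match:", same_revision(LSPCI_REV, PROC_PCI_REV))
    print("MMIO region size in bytes:", region_size(MMIO_START, MMIO_END))
```

Under that assumption, both listings describe the same device revision, so the difference is not a sign of misdetected hardware.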

My current kernel is 2.4.23-pre4, but the problem also
occurs under the stock 2.4.22 and 2.4.21 kernels and under
2.4.21-0.13mdk (Mandrake's patched distribution kernel).

libraw1394 version is 0.10
dvgrab version is 1.2 (and I'm actually grabbing from
/dev/raw1394).

No information is written to the logs, because all of the
filesystems become inaccessible as a result of the error.

Output from dmesg associated with firewire:

raw1394: /dev/raw1394 device initialized
ohci1394: $Rev: 1010 $ Ben Collins <bcollins@debian.org>
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[5] MMIO=[da003000-da0037ff] Max Packet=[2048]
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ieee1394: Node added: ID:BUS[0-00:1023] GUID[000085000077916e]
ieee1394: Host added: ID:BUS[0-01:1023] GUID[000108003701714a]
ieee1394: unsolicited response packet received - np
ieee1394: contents: ffc10120 ffc07000 00000000 8411f8a1
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node removed: ID:BUS[0-00:1023] GUID[000085000077916e]

My guess, and mind you this is only a guess, is that there's
a buffer overrun somewhere in the driver that spills over
into the filesystem code and leaves it in some indeterminate
state, possibly as the result of a strange or malformed
packet coming in from the camera. If anyone could provide
suggestions for checking this out, I would appreciate it.
