From: marcello.carla <mar...@gm...> - 2024-07-28 21:08:10
Dear Michael and David,
1)
here is a proposed patch for the system hang that occurs when the addressed
device is off-line or nonexistent. Thanks to Michael for spotting the problem.
When bb_write() is called, NDAC must already be asserted (low); otherwise no
device is listening. This was not checked, and it made bb_NRFD_interrupt()
go crazy.
In the patch, I have also moved the (normally useless) warnings for
out-of-order or idle interrupts to debug level 1. But see also point 2.
--- gpib_bitbang.c-76c3dc 2024-07-27 23:35:01.412798005 +0200
+++ gpib_bitbang.c 2024-07-27 23:48:26.639818311 +0200
@@ -417,7 +417,7 @@
int send_eoi, size_t *bytes_written)
{
unsigned long flags;
- int retval = 0;
+ int retval = -1;
bb_private_t *priv = board->private_data;
@@ -438,6 +438,7 @@
dbg_printk(1,"Enabling interrupts - NRFD: %d NDAC: %d\n",
gpiod_get_value(NRFD), gpiod_get_value(NDAC));
+ if (gpiod_get_value(NDAC)) goto write_end;
spin_lock_irqsave (&priv->rw_lock, flags);
priv->w_busy = 1; /* make the interrupt routines active */
@@ -506,13 +507,13 @@
if (priv->phase == 99) ENABLE_IRQ (priv->irq_NRFD, IRQ_TYPE_EDGE_RISING);
if (priv->w_busy == 0) {
- dbg_printk(0,"interrupt while idle after %zu/%zu at %d\n",
+ dbg_printk(1,"interrupt while idle after %zu/%zu at %d\n",
priv->w_cnt, priv->length,
priv->phase);
priv->nrfd_idle++;
goto nrfd_exit; /* idle */
}
if (nrfd == 0) {
- dbg_printk(0,"out of order interrupt after %zu/%zu at %d cmd %d " LINFMT ".\n",
+ dbg_printk(1,"out of order interrupt after %zu/%zu at %d cmd %d " LINFMT ".\n",
priv->w_cnt, priv->length,
priv->phase, priv->cmd, LINVAL);
priv->phase = 3;
priv->nrfd_seq++;
@@ -565,12 +566,12 @@
irq, gpiod_get_value(NRFD), ndac, board->status,
priv->direction, priv->w_busy, priv->r_busy);
if (priv->w_busy == 0) {
- dbg_printk(0,"interrupt while idle.\n");
+ dbg_printk(1,"interrupt while idle.\n");
priv->ndac_idle++;
goto ndac_exit;
}
if (ndac == 0) {
- dbg_printk(0,"out of order interrupt at %zu:%d.\n", priv->w_cnt, priv->phase);
+ dbg_printk(1,"out of order interrupt at %zu:%d.\n", priv->w_cnt, priv->phase);
priv->phase = 5;
priv->ndac_seq++;
goto ndac_exit;
2)
I see the problem of a buffer overflow while writing, but to reproduce
repeated interrupts from slow edges I had to use rise and fall times
longer than 100 us, which is uncommon, and it worked only on the RPi3b. The
RPi4 and RPi5 have Schmitt triggers on their inputs, which makes the event
impossible. Yet a good interlock between the NDAC and NRFD interrupts is
advisable, and the current implementation (for historical reasons) is a
mess. Sorry for that, and thanks again to Michael for pointing out the
problem. A revision of this part of the code was already scheduled. Asap,
as usual.
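To make the idea of an interlock concrete, here is a minimal user-space
sketch in C. It is only an illustration, not code from gpib_bitbang.c: the
names wr_phase, wr_state, wr_start(), on_nrfd() and on_ndac() are
hypothetical, and a real driver would also need the spinlock and the IRQ
enable/disable handling around these steps. The sketch assumes that each
handler acts only in the phase where its edge is expected, and that the
byte counter is checked against the length before the buffer is read.

```c
#include <stddef.h>

/* Hypothetical write phases; not the phase numbers used by the driver. */
enum wr_phase { WR_IDLE, WR_WAIT_NRFD, WR_WAIT_NDAC };

struct wr_state {
    enum wr_phase phase;
    const unsigned char *buf;
    size_t length;      /* number of bytes to send */
    size_t cnt;         /* bytes already put on the data lines */
    unsigned spurious;  /* edges rejected by the interlock */
};

/* Arm a transfer: the first expected event is an NRFD edge. */
static void wr_start(struct wr_state *s, const unsigned char *buf,
                     size_t length)
{
    s->buf = buf;
    s->length = length;
    s->cnt = 0;
    s->spurious = 0;
    s->phase = length ? WR_WAIT_NRFD : WR_IDLE;
}

/* NRFD edge: listeners are ready. Returns 1 if a byte was put in *out. */
static int on_nrfd(struct wr_state *s, unsigned char *out)
{
    /* Locked out unless an NRFD edge is expected and data remains. */
    if (s->phase != WR_WAIT_NRFD || s->cnt >= s->length) {
        s->spurious++;
        return 0;
    }
    *out = s->buf[s->cnt++];
    s->phase = WR_WAIT_NDAC;   /* further NRFD edges are now ignored */
    return 1;
}

/* NDAC edge: byte accepted. Returns 1 when the transfer is complete. */
static int on_ndac(struct wr_state *s)
{
    if (s->phase != WR_WAIT_NDAC) {
        s->spurious++;
        return 0;
    }
    s->phase = (s->cnt < s->length) ? WR_WAIT_NRFD : WR_IDLE;
    return s->phase == WR_IDLE;
}
```

With this arrangement a repeated NRFD edge between two NDAC edges only
increments the spurious counter; it cannot advance cnt or read past the end
of the buffer.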
Bye
Marcello Carla'
On 7/28/24 21:17, Michael Schwingen wrote:
> On 26.07.24 17:48, marcello.carla via Linux-gpib-general wrote:
>> Dear Michael,
>>
>> bb_DAV_interrupt():
>>
>> the check for buffer overflow is at line 390 of current version
>> [76c3dc] (line 379 of last unpatched [b4cbd1]):
>>
>> priv->end_flag = ((priv->count >= priv->request) || priv->end);
>>
>> 'count' is the number of characters read; 'request' is the buffer
>> length; when the buffer is full, the operation is terminated even
>> before an EOI or newline. Can you reproduce the conditions when
>> this mechanism does not work correctly?
>
> No, I currently can't reproduce it, but I am quite sure I had cases
> where the count incremented way beyond the expected transfer count.
>
> It might have been the write case:
>
> in bb_NRFD_interrupt, we have
>
> set_data_lines(priv->w_buf[priv->w_cnt++]); // put the data on the lines
> with no check of the transfer size - it checks the size to assert EOI,
> but does not stop further interrupts from happening.
>
> The check to end the transfer is in bb_NDAC_interrupt.
>
> Now if you get lots of NRFD interrupts before NDAC interrupt is
> called, you will increment w_cnt without limit.
>
> Also, if you get multiple NRFD interrupts for one transfer (which can
> happen with sloppy rising/falling edges and reflections), you will get
> wrong data in the buffer. The NRFD interrupt should be locked out
> until the NDAC phase has happened (and vice-versa).
>
>
>>
>> system hang:
>>
>> yes, there is a problem; when you address a nonexistent device
>> with ibrd(), you correctly obtain a timeout error; when you try
>> an ibwrt() on a nonexistent device, the system hangs. I shall
>> try to spot the error and propose a remedy asap.
>
> The interesting thing is this only happens some of the time. I had to
> power-cycle the DMM about 5 times before I could catch the hang.
>
> cu
>
> Michael
>
>
>
> _______________________________________________
> Linux-gpib-general mailing list
> Lin...@li...
> https://lists.sourceforge.net/lists/listinfo/linux-gpib-general