Kaldi / Bugs / #19 bug in shuffle

Jan "yenda" Trmal - 2015-06-16

BTW, I still cannot see why the current implementation should work the way
you described (unless the rand() is broken)
y.

On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:

I'd suggest using the second option (sort -R might not be available
everywhere -- I remember running into troubles with it somewhere).
Let's wait for Dan.
y.

On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
wrote:

**[bugs:#19] bug in shuffle_list.pl** (https://sourceforge.net/p/kaldi/bugs/19/)

Status: open
Group: v1.0_(example)
Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
Last Updated: Tue Jun 16, 2015 04:54 PM UTC
Owner: nobody

Hi,

First, thanks for this great ASR package called Kaldi!
Next, I believe I ran into a bug in utils/shuffle_list.pl
To reproduce (bug manifestation depends on perl implementation of sort)

i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
cat nums-10k | ./utils/shuffle_list.pl | tail -40

The above creates a file of 10k lines where each line contains its index.
Then we look at the tail after the shuffle, using perl v5.18.2 on Ubuntu.
The end of the file is not properly shuffled.

Instead of shuffle_list.pl, if lines are unique, one could use sort -R
or something like this perl one-liner: http://stackoverflow.com/a/886250

Basically, the problem is that providing perl's sort algorithm a fair
coin to flip doesn't guarantee shuffled output.
I'm not sure where else this script is called, but in my case it made
nnet1 train on a small set of speakers at the end of each iteration.

Eric
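
The fair-coin point can be checked by hand on a tiny case. The sketch below is a toy model, not Perl's actual sort: it assumes a merge sort that splits (a, b, c) into [a] and [b, c] and answers every comparison with a fair coin. The function name `coin_merge_positions` is made up for illustration. It enumerates all 8 equally likely coin sequences and tallies where "a" ends up.

```shell
# Toy model (an assumption, not Perl's actual sort): merge-sort the list
# (a, b, c) as [a] + [b, c], answering every comparison with a fair coin.
# Enumerate all 2^3 equally likely coin sequences and tally where "a" lands.
coin_merge_positions() {
  awk 'BEGIN {
    for (f1 = 0; f1 < 2; f1++)        # f1 orders b and c; it never moves "a"
      for (f2 = 0; f2 < 2; f2++)      # f2: is "a" emitted before the head of the pair?
        for (f3 = 0; f3 < 2; f3++) {  # f3: is "a" emitted before the second of the pair?
          if (f2 == 0)      pos = 1
          else if (f3 == 0) pos = 2
          else              pos = 3
          count[pos]++
        }
    for (p = 1; p <= 3; p++)
      printf "position %d: %d/8\n", p, count[p]
  }'
}

coin_merge_positions
```

In this model "a" lands first with probability 4/8 and second or third with probability 2/8 each, whereas a uniform shuffle would give 1/3 for every position. How the bias shows up in a real sort depends on its implementation.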

- Daniel Povey - 2015-06-16
  
  The reason for using that script is reproducibility, which sort -R lacks.
  The core of the script is
  @lines = sort { rand() <=> rand() } @lines;
  which Karel or I found somewhere online. This algorithm is
  probably incorrect (i.e. does not give fully random output), depending
  on the implementation of 'sort'.
  I think it would be better to prepend each line with the output of
  rand() and then \t, sort using string order, and then remove
  everything up to and including the \t before printing out. This will
  still be consistent from run to run but will properly shuffle the input.
  Yenda, do you have time to test this out?
  
  Dan
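
The prepend, sort, strip idea described above can be sketched as a shell pipeline. This is only an illustration under stated assumptions, not the fix that was later committed: it uses awk's `srand`/`rand` rather than Perl's, and the function name `shuffle_lines` and its optional seed argument are hypothetical.

```shell
# Hypothetical helper, not the committed fix: shuffle stdin reproducibly.
# $1 is an optional integer seed (default 0).
shuffle_lines() {
  seed="${1:-0}"
  # Decorate: prefix each line with a fixed-width random key and a tab.
  awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.15f\t%s\n", rand(), $0 }' |
    # Sort on the key only, in byte order so the locale does not matter.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    # Undecorate: drop everything up to and including the first tab.
    cut -f2-
}
```

Usage would look like `shuffle_lines 0 < nums-10k | tail -40`. Because each line gets its key once, before sorting, the comparator is consistent and the sort implementation no longer affects the distribution. Key ties are possible but rare, and sort breaks them deterministically.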
  
  - Jan "yenda" Trmal - 2015-06-16
    
    I will look into it in the evening.
    y.
    
  - Eric Shellef - 2015-06-16
    
    Just a small note: if reproducibility across platforms is also a concern,
    I'm not sure Perl's rand() is consistent. See e.g.
    http://www.perlmonks.org/bare/?node_id=437589
    
    - Daniel Povey - 2015-06-16
      
      I'm more concerned about reproducibility on the same platform, from run to run.
      Across platforms, things won't be exactly reproducible for other reasons.
      Dan
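
Run-to-run reproducibility on one machine can be checked mechanically. The helper below is a hypothetical sketch (the name `is_reproducible` is not part of Kaldi): it runs a filter command twice on the same input file and succeeds only if the two outputs are byte-identical.

```shell
# Hypothetical helper: run a filter command twice on the same input file and
# succeed (exit 0) only if both runs produce byte-identical output.
# Usage: is_reproducible INPUT_FILE COMMAND [ARGS...]
is_reproducible() {
  input="$1"; shift
  a="$(mktemp)"; b="$(mktemp)"
  "$@" < "$input" > "$a"
  "$@" < "$input" > "$b"
  if cmp -s "$a" "$b"; then rc=0; else rc=1; fi
  rm -f "$a" "$b"
  return "$rc"
}
```

For example, `is_reproducible nums-10k ./utils/shuffle_list.pl && echo same` should print `same` if the script seeds its generator with a fixed value, while `is_reproducible nums-10k sort -R` would be expected to fail on most runs.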
      
      - Jan "yenda" Trmal - 2015-06-17
        
        I just committed a fix for this. Eric, can you please check whether it
        fixes your issue? I checked the outputs and they seem "random enough"
        on our cluster.
        y.
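
One rough way to put a number on "random enough" for the nums-10k test file is to look for leftover runs of consecutive values in the tail. The sketch below is an assumption about what a useful statistic might be, since the thread does not show the faulty output. The name `count_consecutive_in_tail` is hypothetical. It counts adjacent pairs in the last N lines where the second value is exactly one more than the first.

```shell
# Hypothetical check: read integers (one per line) on stdin and count, within
# the last $1 lines (default 40), adjacent pairs where next == previous + 1.
count_consecutive_in_tail() {
  last_n="${1:-40}"
  tail -n "$last_n" |
    awk 'NR > 1 && $1 == prev + 1 { n++ } { prev = $1 } END { print n + 0 }'
}
```

On the unshuffled file, `count_consecutive_in_tail 40 < nums-10k` gives 39. After a uniform shuffle of 10,000 distinct values the count should almost always be 0, so a clearly nonzero count on shuffled output would suggest input order is leaking through.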
        
        On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        I'm more concerned about reproducibility on the same platform, from run to
        run.
        Across platforms, things won't be exactly reproducible for other reasons.
        Dan
        
        On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        Just a small note, if reproducibility across platforms is also a concern,
        I'm not sure perl random is consistent. See e.g.
        http://www.perlmonks.org/bare/?node_id=437589
        
        On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        The reason for using that script is reproducibility, which sort -R lacks.
        The core of the sorting is
        @lines = sort { rand() <=> rand() } @lines;
        which Karel or I got from online somewhere. This algorithm is
        probably incorrect (i.e. does not give fully random output), depending
        on the implementation of 'sort'.
        I think it would be better to prepend each line with the output of
        rand() and then \t, and then sort using string order, and then remove
        everything up to and including the \t before printing out. This will
        still be consistent but will properly sort the input. Yenda, do you
        have time to test this out?
        
        Dan
        
        On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:
        
        BTW, I still cannot see why the current implementation should work the
        way
        you described (unless the rand() is broken)
        y.
        
        On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:
        
        I'd suggest using the second option (sort -R might not be available
        everywhere -- I remember running into troubles with it somewhere).
        Let's wait for Dan.
        y.
        
        On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl
        
        Status: open
        Group: v1.0_(example)
        Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
        Last Updated: Tue Jun 16, 2015 04:54 PM UTC
        Owner: nobody
        
        Hi,
        
        First, thanks for this great ASR package called Kaldi!
        Next, I believe I ran into a bug in utils/shuffle_list.pl
        To reproduce (bug manifestation depends on perl implementation of sort)
        
        i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
        cat nums-10k | ./utils/shuffle_list.pl | tail -40
        
        Above creates file of 10k lines where each line contains its index. Then
        we look at tail after shuffle using perl v5.18.2 on ubuntu. End of file
        is
        not properly shuffled.
        
        Instead of shuffle_list.pl, if lines are unique, one could use sort -R
        or something like this perl one
        liner
        
        Basically, the problem is that providing perl's sort algorithm a fair
        coin to flip doesn't guarantee shuffled output.
        I'm not sure where else this script is called, but in my case it made
        nnet1 train on a small set of speakers at the end of each iteration.
        
        Eric
        
        Sent from sourceforge.net because you indicated interest in <
        https://sourceforge.net/p/kaldi/bugs/19/>
        
        To unsubscribe from further messages, please visit <
        https://sourceforge.net/auth/subscriptions/>
        
        [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl
        
        Status: open
        Group: v1.0_(example)
        Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
        Last Updated: Tue Jun 16, 2015 04:54 PM UTC
        Owner: nobody
        
        Hi,
        
        First, thanks for this great ASR package called Kaldi!
        Next, I believe I ran into a bug in utils/shuffle_list.pl
        To reproduce (bug manifestation depends on perl implementation of sort)
        
        i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
        cat nums-10k | ./utils/shuffle_list.pl | tail -40
        
        Above creates file of 10k lines where each line contains its index. Then
        we
        look at tail after shuffle using perl v5.18.2 on ubuntu. End of file is
        not
        properly shuffled.
        
        Instead of shuffle_list.pl, if lines are unique, one could use sort -R
        or
        something like this perl one liner
        
        Basically, the problem is that providing perl's sort algorithm a fair
        coin
        to flip doesn't guarantee shuffled output.
        I'm not sure where else this script is called, but in my case it made
        nnet1
        train on a small set of speakers at the end of each iteration.
        
        Eric
        
        Sent from sourceforge.net because you indicated interest in
        https://sourceforge.net/p/kaldi/bugs/19/
        
        To unsubscribe from further messages, please visit
        https://sourceforge.net/auth/subscriptions/
        
        
        Eric Shellef - 2015-06-19
        
        Hi Jan,
        
        The numbers look mixed and the logic of the code makes sense.
        
        FYI, I saw quicker convergence on a validation set when training nnet1 with
        the properly shuffled audio (several hundred hours) than with the same
        audio under the previous shuffle. The WER on a test set was accordingly
        better after ten epochs with the properly shuffled sentences.
        I haven't verified this trend on more than one test set, but it's worth
        checking.
        
        Thanks,
        Eric
        
        On Wed, Jun 17, 2015 at 12:07 PM, Jan jtrmal@users.sf.net wrote:
        
        I just committed a fix to this. Eric, can you please check if it fixes
        your issues? I checked the output and it seems "random enough" on our
        cluster.
        y.
        
        On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        I'm more concerned about reproducibility on the same platform, from run to
        run.
        Across platforms, things won't be exactly reproducible for other reasons.
        Dan
        
        On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        Just a small note, if reproducibility across platforms is also a concern,
        I'm not sure perl random is consistent. See e.g.
        http://www.perlmonks.org/bare/?node_id=437589
        
        On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        The reason for using that script is reproducibility, which sort -R lacks.
        The core of the sorting is
        @lines = sort { rand() <=> rand() } @lines;
        which Karel or I got from online somewhere. This algorithm is
        probably incorrect (i.e. does not give fully random output), depending
        on the implementation of 'sort'.
        I think it would be better to prepend each line with the output of
        rand() and then \t, and then sort using string order, and then remove
        everything up to and including the \t before printing out. This will
        still be consistent but will properly shuffle the input. Yenda, do you
        have time to test this out?
        
        Dan
        
        On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:
        
        BTW, I still cannot see why the current implementation should work the
        way you described (unless the rand() is broken)
        y.
        
        On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:
        
        I'd suggest using the second option (sort -R might not be available
        everywhere -- I remember running into troubles with it somewhere).
        Let's wait for Dan.
        y.
        
        On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl
        
        Status: open
        Group: v1.0_(example)
        Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
        Last Updated: Tue Jun 16, 2015 04:54 PM UTC
        Owner: nobody
        
        Hi,
        
        First, thanks for this great ASR package called Kaldi!
        Next, I believe I ran into a bug in utils/shuffle_list.pl
        To reproduce (bug manifestation depends on perl implementation of sort)
        
        i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
        cat nums-10k | ./utils/shuffle_list.pl | tail -40
        
        Above creates file of 10k lines where each line contains its index. Then
        we look at tail after shuffle using perl v5.18.2 on ubuntu. End of file
        is not properly shuffled.
        
        Instead of shuffle_list.pl, if lines are unique, one could use sort -R
        or something like this http://stackoverflow.com/a/886250 perl one liner
        
        Basically, the problem is that providing perl's sort algorithm a fair
        coin to flip doesn't guarantee shuffled output.
        I'm not sure where else this script is called, but in my case it made
        nnet1 train on a small set of speakers at the end of each iteration.
        
        Eric
        
        Daniel Povey - 2015-06-19
        
        Interesting, and that makes sense. Cc'ing Karel for his info.
        
        Dan
        
        Jan "yenda" Trmal - 2015-06-19
        
        I'm happy it works for you. I think your observation makes sense. The
        question is whether only some specific version of perl/OS/glibc (or some
        combination of those) caused your sorting problems, or whether you were
        simply the one who noticed (and the issue affects many more people and
        systems).
        y.
        
        On Thu, Jun 18, 2015 at 11:54 PM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        Interesting, and that makes sense. Cc'ing Karel for his info.
        
        Dan
        
        On Thu, Jun 18, 2015 at 11:40 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        Hi Jan,
        
        The numbers look mixed and the logic of the code makes sense.
        
        FYI, I saw quicker convergence on a validation set when training nnet1
        with the properly shuffled audio (several hundred hours) as compared to
        the same audio under previous shuffle. The WER on a test set was
        accordingly better after ten epochs with the properly shuffled sentences.
        I haven't verified this trend on more than one test set, but it's worth
        checking.
        
        Thanks,
        Eric
        
        On Wed, Jun 17, 2015 at 12:07 PM, Jan jtrmal@users.sf.net wrote:
        
        I just committed a fix to this. Eric, can you please check if it fixes
        your issues? I checked the output and they seem "random enough" on our
        cluster.
        y.
        
        On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        I'm more concerned about reproducibility on the same platform, from run
        to run.
        Across platforms, things won't be exactly reproducible for other reasons.
        Dan
        
        On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        Just a small note, if reproducibility across platforms is also a concern,
        I'm not sure perl random is consistent. See e.g.
        http://www.perlmonks.org/bare/?node_id=437589
        
        On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
        wrote:
        
        The reason for using that script is reproducibility, which sort -R lacks.
        The core of the sorting is
        @lines = sort { rand() <=> rand() } @lines;
        which Karel or I got from online somewhere. This algorithm is
        probably incorrect (i.e. does not give fully random output), depending
        on the implementation of 'sort'.
        I think it would be better to prepend each line with the output of
        rand() and then \t, and then sort using string order, and then remove
        everything up to and including the \t before printing out. This will
        still be consistent but will properly sort the input. Yenda, do you
        have time to test this out?
        
        Dan
        
        On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:
        
        BTW, I still cannot see why the current implementation should work the
        way you described (unless the rand() is broken)
        y.
        
        On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:
        
        I'd suggest using the second option (sort -R might not be available
        everywhere -- I remember running into troubles with it somewhere).
        Let's wait for Dan.
        y.
        
        On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
        wrote:
        
        [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl
        
        Status: open
        Group: v1.0_(example)
        Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
        Last Updated: Tue Jun 16, 2015 04:54 PM UTC
        Owner: nobody
        
        Hi,
        
        First, thanks for this great ASR package called Kaldi!
        Next, I believe I ran into a bug in utils/shuffle_list.pl
        To reproduce (bug manifestation depends on perl implementation of sort)
        
        i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
        cat nums-10k | ./utils/shuffle_list.pl | tail -40
        
        Above creates file of 10k lines where each line contains its index. Then
        we look at tail after shuffle using perl v5.18.2 on ubuntu. End of file
        is not properly shuffled.

        Instead of shuffle_list.pl, if lines are unique, one could use sort -R
        or something like this http://stackoverflow.com/a/886250 perl one liner

        Basically, the problem is that providing perl's sort algorithm a fair
        coin to flip doesn't guarantee shuffled output.
        I'm not sure where else this script is called, but in my case it made
        nnet1 train on a small set of speakers at the end of each iteration.
        
        Eric
        

Jan "yenda" Trmal - 2015-06-16

I'd suggest using the second option (sort -R might not be available
everywhere -- I remember running into troubles with it somewhere).
Let's wait for Dan.
y.

On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
wrote:

**[bugs:#19] bug in shuffle_list.pl**

Status: open
Group: v1.0_(example)
Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
Last Updated: Tue Jun 16, 2015 04:54 PM UTC
Owner: nobody

Hi,

First, thanks for this great ASR package called Kaldi!
Next, I believe I ran into a bug in utils/shuffle_list.pl
To reproduce (bug manifestation depends on perl implementation of sort)

i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
cat nums-10k | ./utils/shuffle_list.pl | tail -40

Above creates file of 10k lines where each line contains its index. Then
we look at tail after shuffle using perl v5.18.2 on ubuntu. End of file is
not properly shuffled.

Instead of shuffle_list.pl, if lines are unique, one could use sort -R or
something like this perl one liner

Basically, the problem is that providing perl's sort algorithm a fair coin
to flip doesn't guarantee shuffled output.
I'm not sure where else this script is called, but in my case it made
nnet1 train on a small set of speakers at the end of each iteration.

Eric

Jan "yenda" Trmal - 2015-07-22

status: open --> closed