#19 bug in shuffle_list.pl

Group: v1.0_(example)
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-07-22
Created: 2015-06-16
Private: No

Hi,

First, thanks for this great ASR package called Kaldi!
Next, I believe I ran into a bug in utils/shuffle_list.pl.
To reproduce (whether the bug shows up depends on perl's implementation of sort):

i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
cat nums-10k | ./utils/shuffle_list.pl | tail -40

The above creates a file of 10k lines where each line contains its index, then looks at the tail of the shuffled output. With perl v5.18.2 on Ubuntu, the end of the file is not properly shuffled.

Instead of shuffle_list.pl, if lines are unique, one could use sort -R or something like this perl one-liner: http://stackoverflow.com/a/886250

Basically, the problem is that handing perl's sort a fair coin flip as its comparator doesn't guarantee shuffled output.
I'm not sure where else this script is called, but in my case it made nnet1 train on a small set of speakers at the end of each iteration.

Eric
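
To check the effect on a given Perl build, the frequency with which each item lands in the last output slot can be tallied directly. This is a minimal sketch, not part of Kaldi: the list size, trial count, and seed are arbitrary assumptions, and List::Util::shuffle (a core module) serves only as a reference. How much skew appears, if any, depends on that Perl's sort implementation.

```perl
#!/usr/bin/env perl
# Sketch: count how often each item ends up LAST under a random-comparator
# sort versus List::Util::shuffle. All sizes and the seed are illustrative.
use strict;
use warnings;
use List::Util qw(shuffle);

my @items  = (0 .. 7);   # small list so the tallies are easy to read
my $trials = 20000;      # arbitrary number of repetitions
my (%last_sort, %last_shuffle);

srand(0);                # fixed seed (assumption) so reruns on one build match
for (1 .. $trials) {
    # The idiom under discussion: the comparator ignores its arguments.
    my @by_sort    = sort { rand() <=> rand() } @items;
    my @by_shuffle = shuffle(@items);
    $last_sort{ $by_sort[-1] }++;
    $last_shuffle{ $by_shuffle[-1] }++;
}

printf "%-6s %12s %12s\n", 'item', 'sort{rand}', 'shuffle';
for my $item (@items) {
    printf "%-6d %12d %12d\n", $item,
        $last_sort{$item} // 0, $last_shuffle{$item} // 0;
}
```

With a uniform shuffle, each of the 8 items should land last in roughly 20000 / 8 = 2500 trials; large deviations in the first column would point to the bias described above.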

Discussion

  • Jan "yenda" Trmal

    BTW, I still cannot see why the current implementation would behave the way
    you described (unless rand() is broken).
    y.

    On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:

    I'd suggest using the second option (sort -R might not be available
    everywhere -- I remember running into troubles with it somewhere).
    Let's wait for Dan.
    y.

    • Daniel Povey

      Daniel Povey - 2015-06-16

      The reason for using that script is reproducibility, which sort -R lacks.
      The core of the script is
      @lines = sort { rand() <=> rand() } @lines;
      which Karel or I got from somewhere online. This algorithm is
      probably incorrect (i.e. it does not give fully random output), depending
      on the implementation of 'sort'.
      I think it would be better to prepend each line with the output of
      rand() and then \t, sort in string order, and then remove
      everything up to and including the \t before printing out. This will
      still be reproducible but will properly shuffle the input. Yenda, do you
      have time to test this out?

      Dan
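
The prepend, sort, and strip steps described above could be sketched roughly as follows. This is not the fix that was committed: shuffle_lines is a hypothetical helper name, the fixed seed is an assumption, and the sample input stands in for lines read from STDIN.

```perl
#!/usr/bin/env perl
# Sketch of the "prepend rand(), sort as strings, strip the prefix" shuffle.
use strict;
use warnings;

# shuffle_lines is a hypothetical name, not taken from shuffle_list.pl.
sub shuffle_lines {
    my @lines = @_;
    # Decorate: give each line its own random key, followed by a tab.
    my @keyed = map { rand() . "\t" . $_ } @lines;
    # Sort in plain string order. Every key is fixed before sorting starts,
    # so comparisons stay consistent no matter how sort is implemented.
    @keyed = sort @keyed;
    # Undecorate: remove everything up to and including the first tab.
    s/^[^\t]*\t// for @keyed;
    return @keyed;
}

srand(0);                                  # assumed fixed seed
my @input = map { "$_\n" } 0 .. 9;         # stand-in for <STDIN>
print shuffle_lines(@input);
```

Because the keys are drawn independently before sorting, any consistent ordering of them yields an unbiased permutation, apart from the negligible chance of two identical keys.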

      • Jan "yenda" Trmal

        I will look into it in the evening.
        y.

      • Eric Shellef

        Eric Shellef - 2015-06-16

        Just a small note: if reproducibility across platforms is also a concern,
        I'm not sure perl's rand() is consistent. See e.g.
        http://www.perlmonks.org/bare/?node_id=437589

        • Daniel Povey

          Daniel Povey - 2015-06-16

          I'm more concerned about reproducibility on the same platform, from run to run.
          Across platforms, things won't be exactly reproducible for other reasons.
          Dan
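
Run-to-run reproducibility on one platform usually comes down to seeding the generator with a fixed value. The sketch below assumes that approach (the thread does not show how shuffle_list.pl seeds rand()); draws is a hypothetical helper and the seed value is arbitrary.

```perl
#!/usr/bin/env perl
# Sketch: with the same seed, rand() repeats its sequence on the same Perl.
use strict;
use warnings;

# draws is a hypothetical helper: seed the generator, then take $n values.
sub draws {
    my ($seed, $n) = @_;
    srand($seed);
    return map { rand() } (1 .. $n);
}

my @first  = draws(0, 5);
my @second = draws(0, 5);
print "same seed, same sequence\n" if "@first" eq "@second";
```

The check only covers a single Perl build; the same seed may well produce a different sequence under another Perl version or platform, which is the cross-platform concern raised above.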

          • Jan "yenda" Trmal

            I just committed a fix for this. Eric, can you please check whether it
            fixes your issue? I checked the outputs and they seem "random enough" on
            our cluster.
            y.

            On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
            wrote:

            I'm more concerned about reproducibility on the same platform, from run to
            run.
            Across platforms, things won't be exactly reproducible for other reasons.
            Dan

            On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
            wrote:

            Just a small note, if reproducibility across platforms is also a concern,
            I'm not sure perl random is consistent. See e.g.
            http://www.perlmonks.org/bare/?node_id=437589

            On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
            wrote:

            The reason for using that script is reproducibility, which sort -R lacks.
            The core of the sorting is
            @lines = sort { rand() <=> rand() } @lines;
            which Karel or I got from online somewhere. This algorithm is
            probably incorrect (i.e. does not give fully random output), depending
            on the implementation of 'sort'.
            I think it would be better to prepend each line with the output of
            rand() and then \t, and then sort using string order, and then remove
            everything up to and including the \t before printing out. This will
            still be consistent but will properly sort the input. Yenda, do you
            have time to test this out?

            Dan

            On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:

            BTW, I still cannot see why the current implementation should work the
            way
            you described (unless the rand() is broken)
            y.

            On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:

            I'd suggest using the second option (sort -R might not be available
            everywhere -- I remember running into troubles with it somewhere).
            Let's wait for Dan.
            y.

            On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
            wrote:


            [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl

            Status: open
            Group: v1.0_(example)
            Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
            Last Updated: Tue Jun 16, 2015 04:54 PM UTC
            Owner: nobody

            Hi,

            First, thanks for this great ASR package called Kaldi!
            Next, I believe I ran into a bug in utils/shuffle_list.pl
            To reproduce (bug manifestation depends on perl implementation of sort)

            i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
            cat nums-10k | ./utils/shuffle_list.pl | tail -40

            Above creates file of 10k lines where each line contains its index. Then
            we look at tail after shuffle using perl v5.18.2 on ubuntu. End of file
            is
            not properly shuffled.

            Instead of shuffle_list.pl, if lines are unique, one could use sort -R
            or something like this perl one
            liner

            Basically, the problem is that providing perl's sort algorithm a fair
            coin to flip doesn't guarantee shuffled output.
            I'm not sure where else this script is called, but in my case it made
            nnet1 train on a small set of speakers at the end of each iteration.

            Eric

            Sent from sourceforge.net because you indicated interest in <
            https://sourceforge.net/p/kaldi/bugs/19/>

            To unsubscribe from further messages, please visit <
            https://sourceforge.net/auth/subscriptions/>




            • Eric Shellef

              Eric Shellef - 2015-06-19

              Hi Jan,

              The numbers look mixed and the logic of the code makes sense.
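A rough way to put a number on "looks mixed" is sketched below. It is an illustration only, not something used in this thread: the function name `tail_ascending_fraction` is hypothetical, and it assumes the shuffled lines are plain integers, as in the nums-10k reproduction. It reports the fraction of adjacent pairs in the last 40 lines that are in ascending order. A well-shuffled tail should land somewhere near 0.5, while a value close to 0 or 1 suggests the tail is still ordered.

```shell
# Hypothetical helper: reads shuffled integer lines on stdin and prints the
# fraction of adjacent pairs in the last 40 lines that are ascending.
tail_ascending_fraction() {
  tail -n 40 |
    awk 'NR > 1 { total++; if ($1 + 0 > prev) up++ }
         { prev = $1 + 0 }
         END { if (total) printf "%.2f\n", up / total }'
}

# Usage (assumes nums-10k from the reproduction steps):
#   ./utils/shuffle_list.pl < nums-10k | tail_ascending_fraction
```

This is only a coarse heuristic; it can flag an obviously ordered tail but does not show that a shuffle is uniform.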

              FYI, I saw quicker convergence on a validation set when training nnet1 with
              the properly shuffled audio (several hundred hours) than with the same
              audio under the previous shuffle. Accordingly, the WER on a test set was
              better after ten epochs with the properly shuffled sentences.
              I haven't verified this trend on more than one test set, but it's worth
              checking.

              Thanks,
              Eric

              On Wed, Jun 17, 2015 at 12:07 PM, Jan jtrmal@users.sf.net wrote:

              I just committed a fix for this. Eric, can you please check if it fixes
              your issue? I checked the output and it seems "random enough" on our
              cluster.
              y.

              On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
              wrote:

              I'm more concerned about reproducibility on the same platform, from run to
              run.
              Across platforms, things won't be exactly reproducible for other reasons.
              Dan

              On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
              wrote:

              Just a small note, if reproducibility across platforms is also a concern,
              I'm not sure perl random is consistent. See e.g.
              http://www.perlmonks.org/bare/?node_id=437589

              On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
              wrote:

              The reason for using that script is reproducibility, which sort -R lacks.
              The core of the sorting is
              @lines = sort { rand() <=> rand() } @lines;
              which Karel or I got from online somewhere. This algorithm is
              probably incorrect (i.e. does not give fully random output), depending
              on the implementation of 'sort'.
              I think it would be better to prepend each line with the output of
              rand() and then \t, and then sort using string order, and then remove
              everything up to and including the \t before printing out. This will
              still be consistent but will properly shuffle the input. Yenda, do you
              have time to test this out?

              Dan
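The approach Dan describes (prepend a random key and a tab, sort by the key as a string, then strip the key) can be sketched as a shell pipeline. This is an illustration under stated assumptions, not the fix that was committed to shuffle_list.pl: the function name `keyed_shuffle` and its seed argument are hypothetical, and awk's `srand`/`rand` stand in for Perl's. Keys are printed at fixed width so that string order matches numeric order.

```shell
# Hypothetical sketch of a keyed ("decorate, sort, undecorate") shuffle.
# $1 is an optional seed; the default 777 is an arbitrary choice.
keyed_shuffle() {
  seed="${1:-777}"
  # Decorate: prefix every input line with a fixed-width random key and a tab.
  awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.17f\t%s\n", rand(), $0 }' |
    # Sort on the key only, in byte order, so the order is locale-independent.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    # Undecorate: drop the key and the first tab, keep the rest of the line.
    cut -f2-
}

# Usage (assumes nums-10k from the reproduction steps):
#   keyed_shuffle 777 < nums-10k | tail -40
```

Because the sort compares real keys rather than fresh coin flips on every comparison, the result does not depend on how the sort algorithm is implemented. With a fixed seed, the output should repeat from run to run with the same awk implementation; different awk implementations may produce different orders.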

              On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:

              BTW, I still cannot see why the current implementation should work the way
              you described (unless the rand() is broken)
              y.

              On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:

              I'd suggest using the second option (sort -R might not be available
              everywhere -- I remember running into troubles with it somewhere).
              Let's wait for Dan.
              y.



              • Daniel Povey

                Daniel Povey - 2015-06-19

                Interesting, and that makes sense. Cc'ing Karel for his info.

                Dan

                [bugs:#19] bug in shuffle_list.pl

                Status: open
                Group: v1.0_(example)
                Created: Tue Jun 16, 2015 04:54 PM UTC by Eric Shellef
                Last Updated: Tue Jun 16, 2015 04:54 PM UTC
                Owner: nobody

                Hi,

                First, thanks for this great ASR package called Kaldi!
                Next, I believe I ran into a bug in utils/shuffle_list.pl
                To reproduce (bug manifestation depends on perl implementation of sort)

                i=0 ; while [ $i -lt 10000 ] ; do echo $i >> nums-10k ; i=$((i+1)) ; done
                cat nums-10k | ./utils/shuffle_list.pl | tail -40

                Above creates file of 10k lines where each line contains its index. Then we
                look at tail after shuffle using perl v5.18.2 on ubuntu. End of file is not
                properly shuffled.

                Instead of shuffle_list.pl, if lines are unique, one could use sort -R or
                something like this perl one liner

                Basically, the problem is that providing perl's sort algorithm a fair coin
                to flip doesn't guarantee shuffled output.
                I'm not sure where else this script is called, but in my case it made nnet1
                train on a small set of speakers at the end of each iteration.

                Eric


                Sent from sourceforge.net because you indicated interest in
                https://sourceforge.net/p/kaldi/bugs/19/

                To unsubscribe from further messages, please visit
                https://sourceforge.net/auth/subscriptions/

                 

                Related

                Bugs: #19

                • Jan "yenda" Trmal

                  I'm happy it works for you. I think your observation makes sense. The
                  question is if it's only some specific version of perl/OS/glibc (or some
                  combination of those) that caused you the sorting problems or if it's just
                  you who actually noticed (and the issue affects many more people and
                  systems).
                  y.

                  On Thu, Jun 18, 2015 at 11:54 PM, Daniel Povey danielpovey@users.sf.net
                  wrote:

                  Interesting, and that makes sense. Cc'ing Karel for his info.

                  Dan

                  On Thu, Jun 18, 2015 at 11:40 PM, Eric Shellef ericshellef@users.sf.net
                  wrote:

                  Hi Jan,

                  The numbers look mixed and the logic of the code makes sense.

                  FYI, I saw quicker convergence on a validation set when training nnet1
                  with the properly shuffled audio (several hundred hours) as compared to
                  the same audio under previous shuffle. The WER on a test set was
                  accordingly better after ten epochs with the properly shuffled sentences.
                  I haven't verified this trend on more than one test set, but it's worth
                  checking.

                  Thanks,
                  Eric

                  On Wed, Jun 17, 2015 at 12:07 PM, Jan jtrmal@users.sf.net wrote:

                  I just committed a fix to this. Eric, can you please check if it fixes
                  your issues? I checked the output and they seem "random enough" on our
                  cluster.
                  y.

                  On Tue, Jun 16, 2015 at 5:09 PM, Daniel Povey danielpovey@users.sf.net
                  wrote:

                  I'm more concerned about reproducibility on the same platform, from run
                  to run.
                  Across platforms, things won't be exactly reproducible for other reasons.
                  Dan

                  On Tue, Jun 16, 2015 at 4:57 PM, Eric Shellef ericshellef@users.sf.net
                  wrote:

                  Just a small note, if reproducibility across platforms is also a concern,
                  I'm not sure perl random is consistent. See e.g.
                  http://www.perlmonks.org/bare/?node_id=437589

                  On Tue, Jun 16, 2015 at 11:21 AM, Daniel Povey danielpovey@users.sf.net
                  wrote:

                  The reason for using that script is reproducibility, which sort -R lacks.
                  The core of the sorting is
                  @lines = sort { rand() <=> rand() } @lines;
                  which Karel or I got from online somewhere. This algorithm is
                  probably incorrect (i.e. does not give fully random output), depending
                  on the implementation of 'sort'.
                  I think it would be better to prepend each line with the output of
                  rand() and then \t, and then sort using string order, and then remove
                  everything up to and including the \t before printing out. This will
                  still be consistent but will properly sort the input. Yenda, do you
                  have time to test this out?

                  Dan
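
For readers following along, the approach Dan describes (attach one random key to each line, sort on the key, then strip the key) can be sketched roughly as below. This is an illustration in Python, not the fix that was committed to utils/shuffle_list.pl. The function name `shuffle_lines` and the `seed` parameter are hypothetical, and the sketch sorts on a numeric key rather than on a string prefix followed by a tab.

```python
import random


def shuffle_lines(lines, seed=0):
    """Shuffle lines by sorting on a random key drawn once per line.

    Hypothetical helper for illustration only; the name and the seed
    argument are assumptions, not part of Kaldi.
    """
    rng = random.Random(seed)  # fixed seed: same order on every run
    # Decorate: each line gets exactly one random key, so the sort sees
    # a consistent ordering. The index breaks (very unlikely) key ties
    # without ever comparing the line contents.
    decorated = [(rng.random(), i, line) for i, line in enumerate(lines)]
    decorated.sort()
    # Undecorate: drop the key and the index again.
    return [line for _key, _i, line in decorated]


if __name__ == "__main__":
    nums = [str(i) for i in range(10000)]
    print(shuffle_lines(nums, seed=0)[-40:])  # inspect the tail, as in the report
```

Because every line keeps the same key for the whole sort, the result does not depend on how the sort algorithm chooses its comparisons, and a fixed seed keeps the output the same from run to run on one platform.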

                  On Tue, Jun 16, 2015 at 1:19 PM, Jan jtrmal@users.sf.net wrote:

                  BTW, I still cannot see why the current implementation should work the
                  way you described (unless the rand() is broken)
                  y.
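
One way to see why a coin-flip comparator can give biased output even when rand() is fine: the result depends on which comparisons the sort algorithm makes. The sketch below is only an illustration in Python and is not a model of Perl's sort. It assumes a plain insertion sort in which every comparison is an independent fair coin flip, and computes the exact probability of each output order. The function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def coin_flip_insertion_sort_distribution(items):
    """Exact output distribution of insertion sort with a fair-coin comparator.

    Illustrative assumption: each new item starts at the end and moves one
    place to the left per "swap" flip, stopping at the first "stay" flip
    or when it reaches the front.
    """
    dist = {(): Fraction(1)}
    for x in items:
        new = defaultdict(Fraction)
        for perm, p in dist.items():
            n = len(perm)
            for k in range(n + 1):  # k = number of places x moves left
                # k swap flips, plus one stay flip unless x reached the front
                flips = k + 1 if k < n else n
                pos = n - k
                new[perm[:pos] + (x,) + perm[pos:]] += p * Fraction(1, 2) ** flips
        dist = dict(new)
    return dist


if __name__ == "__main__":
    for order, prob in sorted(coin_flip_insertion_sort_distribution("abc").items()):
        print("".join(order), prob)
```

For three items, the two orders that leave the last item in place each get probability 1/4, while the other four orders get 1/8 each, instead of 1/6 for all six. Under this assumed algorithm the last input item stays at the end half of the time, so a poorly mixed region in the output is possible without any problem in rand() itself.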

                  On Tue, Jun 16, 2015 at 1:05 PM, Jan Trmal jtrmal@gmail.com wrote:

                  I'd suggest using the second option (sort -R might not be available
                  everywhere -- I remember running into troubles with it somewhere).
                  Let's wait for Dan.
                  y.

                  On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
                  wrote:


                  [bugs:#19] http://sourceforge.net/p/kaldi/bugs/19 bug in shuffle_list.pl


  • Jan "yenda" Trmal

    I'd suggest using the second option (sort -R might not be available
    everywhere -- I remember running into troubles with it somewhere).
    Let's wait for Dan.
    y.

    On Tue, Jun 16, 2015 at 12:54 PM, Eric Shellef ericshellef@users.sf.net
    wrote:


    [bugs:#19] bug in shuffle_list.pl


  • Jan "yenda" Trmal

    • status: open --> closed
     
