Aggregation function support in PMML

2015-01-09
2021-07-22
  • Debashis Mishra

    Debashis Mishra - 2015-01-09

    Problem statement: Using aggregation functions alongside the analytic models defined in PMML, i.e. facilitating data transformation through PMML.
    Worries:
    [1] The Zementis PMML validator does not accept a PMML document with aggregation defined in the Transformation Dictionary.

    [2] None of the open-source PMML consumers (KNIME, Augustus 0.6, and others, most of which use JPMML code internally) can consume aggregation functions defined through PMML, even though the PMML 4.2 XSD defines such functions for transformation. WEKA doesn't support aggregation either.

    [3] I also came across H2O and Sparkling Water (both Apache License 2.0), which suggest doing the math (especially parsing data through GroupBy aggregation, finding unique elements in data columns, etc.) on Hadoop/YARN/Spark. They are planning PMML support for their analytic models, but it is still at the incubating stage, and they have no plans to support these math functions through PMML either.

    Basically, my question is: what is the reasoning behind this avoidance of aggregation functions in PMML?
    I would appreciate a hint so that we can explore further and finalize our design plan.

     
    • Svetlana Levitan

      Hi Debashis:

      Thank you for the good question.
      I think the reason for this avoidance is that most scoring engines are
      designed to score one record at a time, and aggregation requires all
      records at once. I think it is better to prepare the aggregated statistics
      first (maybe using a database or whatever tools you have), then score the
      model.
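
      As a rough illustration of that two-step approach, here is a minimal Python sketch. The sample records, the field names, and the `score` rule are all hypothetical stand-ins, not part of any PMML engine: the batch step sees every record, while the scoring step only ever sees one pre-aggregated record at a time.

```python
from collections import defaultdict

# Hypothetical input: one row per transaction (made-up data for illustration).
transactions = [
    {"customer": "A", "amount": 10.0},
    {"customer": "A", "amount": 30.0},
    {"customer": "B", "amount": 5.0},
]


def aggregate_by_customer(rows):
    """Batch step: needs all records at once (e.g. done in a database)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": float("-inf")})
    for row in rows:
        agg = totals[row["customer"]]
        agg["count"] += 1
        agg["sum"] += row["amount"]
        agg["max"] = max(agg["max"], row["amount"])
    # Derive the average once the full group has been seen.
    return {
        key: {
            "count": a["count"],
            "sum": a["sum"],
            "avg": a["sum"] / a["count"],
            "max": a["max"],
        }
        for key, a in totals.items()
    }


def score(record):
    """Stand-in for a record-at-a-time scoring engine (hypothetical rule)."""
    return 1 if record["avg"] > 15.0 else 0


# Step 1: aggregate outside the model. Step 2: score each aggregated record.
aggregated = aggregate_by_customer(transactions)
scores = {customer: score(features) for customer, features in aggregated.items()}
```

      The model then only needs the aggregated fields (here `count`, `sum`, `avg`, `max`) as ordinary inputs, so no aggregation has to be expressed in the PMML itself.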

      Thank you and have a nice day.

      Svetlana Levitan, PhD
      IBM SPSS Analytic Components and PMML
      slevitan@us.ibm.com

      From: "Debashis Mishra" debashis121@users.sf.net
      To: "[pmml:discussion] " 187860@discussion.pmml.p.re.sf.net
      Date: 01/09/2015 06:03 AM
      Subject: [pmml:discussion] Aggregation function support in PMML


       
  • Gabriele Cardosi

    Hi Svetlana,
    I know this is a very old post, but I think it is a good place for my questions.
    I'm trying to understand the "Aggregate" specification:
    1) When "groupBy" is provided and is not null, should sum, max, min and average be calculated per group, based on the size of each group?
    2) What should the data type of an Aggregate multiset or count with groupBy be? It looks like a map to me, but PMML does not have such a data type.
    3) Without "sqlWhere", the multiset just returns some sets of items where, inside each set, every element is exactly the same as the "grouping" term.
    4) Where can I read the expected syntax for the "sqlWhere" attribute?

    Many thanks

    Best regards

    Gabriele

     
