Predictive Model Markup Language (PMML) / Discussion / Open Discussion: Aggregation function support in PMML

Debashis Mishra - 2015-01-09

Problem statement: using aggregation functions together with the analytic models defined in PMML, i.e. facilitating data transformation through PMML.
Worries:
[1] The Zementis PMML validator does not approve a PMML document with an aggregation defined in the Transformation Dictionary.

[2] None of the open-source PMML consumers (KNIME, Augustus 0.6), most of which use JPMML code internally, can consume aggregation functions defined through PMML, even though the PMML 4.2 XSD defines such functions for transformation. WEKA does not support aggregation either.

[3] I also came across H2O and Sparkling Water (both under Apache License 2.0), which suggest doing the math (especially parsing data through GroupBy aggregation, finding unique elements in data columns, etc.) on Hadoop/YARN/Spark. They are planning PMML support for their analytic models, but it is still at the incubating stage, and they have no plans to support these math functions through PMML.

Basically, my question is: what is the reasoning behind this avoidance of aggregation functions in PMML?
I would appreciate a hint so that we can explore further and finalize our design plan.

- Svetlana Levitan - 2015-01-09
  
  Hi Debashis:
  
  Thank you for the good question.
  I think the reason for this avoidance is that most scoring engines are
  designed to score one record at a time, and aggregation requires all
  records at once. I think it is better to prepare the aggregated statistics
  first (maybe using a database or whatever tools you have), then score the
  model.
  
  Thank you and have a nice day.
  
  Svetlana Levitan, PhD
  IBM SPSS Analytic Components and PMML
  slevitan@us.ibm.com
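
The two-step approach suggested in the reply above (aggregate first, then score one record at a time) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `transactions` data, the field names, and the `score` function, which is only a stand-in for a PMML scoring engine and not any real consumer's API.

```python
from collections import defaultdict

# Hypothetical transactional input: several rows per customer.
transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]


def aggregate_by_customer(rows):
    """Step 1: needs all rows at once, so it runs outside the scoring engine."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount"]
        counts[row["customer"]] += 1
    return [
        {"customer": c, "total": totals[c], "average": totals[c] / counts[c]}
        for c in sorted(totals)
    ]


def score(record):
    """Step 2: stand-in for a scoring engine; it sees one record at a time."""
    # Hypothetical linear model, for illustration only.
    return 0.5 * record["total"] + 0.25 * record["average"]


aggregated = aggregate_by_customer(transactions)
scores = {rec["customer"]: score(rec) for rec in aggregated}
print(scores)
```

In practice step 1 would more likely be a database query or a batch job, and only the already aggregated records would be handed to the PMML consumer.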
  

Gabriele Cardosi - 2021-07-22

Hi Svetlana,
I know it is a very old post, but I think it is a good place for my questions.
I'm trying to understand the "Aggregate" specification:
1) When "groupBy" is provided and is not null, should sum, max, min and average be calculated based on the size of the different groups?
2) What should the data type of an Aggregate multiset or count with groupBy be? It looks like a map to me, but PMML does not have such a data type.
3) Without "sqlWhere", the multiset just returns a collection of sets in which every element is exactly the same as the "grouping" term.
4) Where can I read the expected syntax for the "sqlWhere" attribute?
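
To make questions 1) to 3) concrete, here is a minimal sketch of one possible reading of Aggregate with "groupBy", in which each function is applied separately within each group. This is an assumption, not the specification's definition: the `aggregate` helper, the sample rows and the field names are hypothetical, and "sqlWhere" is left out because its syntax is exactly what question 4) asks about.

```python
from collections import defaultdict

# Hypothetical transactional rows; field names are made up for illustration.
rows = [
    {"item": "pen", "price": 2.0},
    {"item": "pen", "price": 4.0},
    {"item": "ink", "price": 9.0},
]

# The function names mirror the Aggregate "function" attribute values.
FUNCTIONS = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "average": lambda values: sum(values) / len(values),
    "multiset": list,
}


def aggregate(rows, field, function, group_by=None):
    """Apply `function` to `field`, per group when `group_by` is given."""
    if group_by is None:
        return FUNCTIONS[function]([r[field] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_by]].append(r[field])
    # Under this reading the result is map-like: group key -> aggregated value.
    return {key: FUNCTIONS[function](values) for key, values in groups.items()}


print(aggregate(rows, "price", "average", group_by="item"))
print(aggregate(rows, "item", "multiset", group_by="item"))
```

Under this reading the grouped result is naturally a mapping from group key to value, which is the data type issue raised in 2), and a multiset over the grouping field itself only repeats the group key, as observed in 3).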

Many thanks

Best regards

Gabriele
