Problem statement: Using aggregation functions together with the analytic models defined in PMML, i.e. performing data transformation through PMML.
Worries:
[1] The Zementis PMML validator does not accept a PMML file with aggregation defined in the Transformation Dictionary.
[2] None of the open-source PMML consumers (KNIME, Augustus 0.6, most of them using JPMML code internally) can consume aggregation functions defined through PMML, even though the PMML 4.2 XSD defines such functions for transformations. WEKA does not support aggregation either.
[3] I also came across H2O and Sparkling Water (both Apache License 2.0), which suggest doing the math (especially parsing data through GroupBy aggregation, finding unique elements in data columns, etc.) on Hadoop/YARN/Spark. They are also planning PMML support for their analytic models, but it is still at an incubating stage, and they have no plans to support these math functions through PMML.
Basically, my question is: what is the reasoning behind this avoidance of aggregation functions in PMML?
I would appreciate a hint for exploring this further so that we can finalize our design plan.
Hi Debashis:
Thank you for the good question.
I think the reason for this avoidance is that most scoring engines are
designed to score one record at a time, and aggregation requires all
records at once. I think it is better to prepare the aggregated statistics
first (maybe using a database or whatever tools you have), then score the
model.
Thank you and have a nice day.
Svetlana Levitan, PhD
IBM SPSS Analytic Components and PMML
slevitan@us.ibm.com
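The two-step approach suggested in this reply (aggregate first, then score one record at a time) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the field names, the sample data, and the `score` function, which is only a stand-in for a record-at-a-time scoring engine and not a real PMML API.

```python
from collections import defaultdict

# Hypothetical transaction records; the field names are illustrative only.
transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]


def aggregate_by_customer(rows):
    """Step 1: needs all rows at once, so it runs outside the scoring engine."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount"]
        counts[row["customer"]] += 1
    # One output record per customer, carrying the precomputed statistics.
    return [
        {
            "customer": c,
            "total_amount": totals[c],
            "avg_amount": totals[c] / counts[c],
        }
        for c in sorted(totals)
    ]


def score(record):
    """Step 2: stand-in for a scoring engine that sees one record at a time."""
    return 1 if record["avg_amount"] > 15.0 else 0


aggregated = aggregate_by_customer(transactions)
scores = {rec["customer"]: score(rec) for rec in aggregated}
```

With this split, the model only ever receives fields that already exist on the record, so the scoring side does not need any aggregation support.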
Hi Svetlana,
I know this is a very old post, but I think it is a good place for my questions.
I'm trying to understand the "Aggregate" specification:
1) When "groupBy" is provided and is not null, should sum, max, min and average be calculated based on the size of the different groups?
2) What should the data type of an Aggregate multiset or count with "groupBy" be? It looks like a map to me, but PMML has no such data type.
3) Without "sqlWhere", the multiset just returns a set of items, where, inside each set, every element is exactly the same as the "grouping" term.
4) Where can I read the expected syntax for the "sqlWhere" attribute?
Many thanks
Best regards
Gabriele
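To make question 2 concrete, here is a minimal sketch of one possible reading of a grouped aggregation. It is not taken from the PMML specification, and the function and parameter names are hypothetical. Under this reading, an aggregate with a grouping field naturally yields one value per group, which is why the result looks like a map.

```python
from collections import defaultdict


def aggregate(rows, field, function, group_field=None):
    """Apply an aggregate function to `field`, optionally per group.

    This is one assumed interpretation, not the PMML-defined semantics.
    """
    groups = defaultdict(list)
    for row in rows:
        # Without a grouping field, every row lands in a single group (None).
        key = row[group_field] if group_field is not None else None
        groups[key].append(row[field])

    reducers = {
        "count": len,
        "sum": sum,
        "average": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
        "multiset": list,  # keep all values, duplicates included
    }
    reduce_values = reducers[function]
    result = {key: reduce_values(values) for key, values in groups.items()}

    # Grouped: a mapping from group to value. Ungrouped: a single value.
    return result if group_field is not None else result.get(None)


rows = [
    {"g": "x", "v": 1},
    {"g": "x", "v": 3},
    {"g": "y", "v": 5},
]
grouped_sum = aggregate(rows, "v", "sum", "g")
overall_average = aggregate(rows, "v", "average")
grouped_multiset = aggregate(rows, "v", "multiset", "g")
```

Whether a consumer should return such a mapping, or a single value for the group of the record being scored, is exactly the open point raised above.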
From: "Debashis Mishra" debashis121@users.sf.net
To: "[pmml:discussion] " 187860@discussion.pmml.p.re.sf.net
Date: 01/09/2015 06:03 AM
Subject: [pmml:discussion] Aggregation function support in PMML