I get an error when running my data mining experiment: the first run fails at the extraction step, and clicking Run a second time fails again right away.
Error log from the first extraction run:
2020-10-16 10:09:57.003 [85] INFO node.GenericNode.start:90 - Node start. (id:8b8ad4503f2aeb6f80912e7edcbfc366,name:FIT_NODE)
2020-10-16 10:09:57.140 [85] ERROR node.GenericNode.handleExecuteError:117 - Node execution failed.(id:8b8ad4503f2aeb6f80912e7edcbfc366,name:FIT_NODE)
java.lang.IllegalArgumentException: requirement failed: Column CITY must be of type numeric but was actually of type string.
	at scala.Predef$.require(Predef.scala:224) ~[scala-library-2.11.12.jar:?]
	at org.apache.spark.ml.util.SchemaUtils$.checkNumericType(SchemaUtils.scala:76) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.feature.QuantileDiscretizer$$anonfun$transformSchema$1.apply(QuantileDiscretizer.scala:196) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.feature.QuantileDiscretizer$$anonfun$transformSchema$1.apply(QuantileDiscretizer.scala:195) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library-2.11.12.jar:?]
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) ~[scala-library-2.11.12.jar:?]
	at org.apache.spark.ml.feature.QuantileDiscretizer.transformSchema(QuantileDiscretizer.scala:195) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57) ~[scala-library-2.11.12.jar:?]
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66) ~[scala-library-2.11.12.jar:?]
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186) ~[scala-library-2.11.12.jar:?]
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:184) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:136) ~[spark-mllib_2.11-2.4.0.jar:2.4.0]
	at smartbix.datamining.engine.execute.node.feature.BucketizerNode$BucketizerEvent.fit(BucketizerNode.java:109) ~[EngineCommonNode-1.0-SNAPSHOT.jar:?]
	at smartbix.datamining.engine.execute.node.feature.BucketizerNode$BucketizerEvent.fit(BucketizerNode.java:58) ~[EngineCommonNode-1.0-SNAPSHOT.jar:?]
	at smartbix.datamining.engine.execute.node.train.FitNode.execute(FitNode.java:27) ~[EngineCommonNode-1.0-SNAPSHOT.jar:?]
	at smartbix.datamining.engine.execute.node.GenericNode.start(GenericNode.java:101) [EngineCore-1.0-SNAPSHOT.jar:?]
	at smartbix.datamining.engine.experiment.execute.node.ExperimentNodeExecutor.run(ExperimentNodeExecutor.java:40) [EngineExperiment-1.0-SNAPSHOT.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_202-ea]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202-ea]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202-ea]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202-ea]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202-ea]
2020-10-16 10:09:57.204 [85] INFO reflections.Reflections.scan:232 - Reflections took 54 ms to scan 7 urls, producing 88 keys and 501 values
2020-10-16 10:09:57.259 [85] INFO flow.ExperimentGenericFlow.fail:204 - Flow failed,(id:I8ac26e200174d26bd26beaea0174d3a424b40023,name:日利润挖掘)
2020-10-16 10:09:57.259 [85] INFO flow.ExperimentGenericFlow.close:235 - Flow closed.(id:I8ac26e200174d26bd26beaea0174d3a424b40023)
2020-10-16 10:09:57.260 [85] INFO flow.ExperimentSparkFlowContext.close:34 - clear active session
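The key line is the IllegalArgumentException thrown by Spark MLlib's QuantileDiscretizer inside the binning step (FIT_NODE / BucketizerNode): the discretizer requires a numeric input column, but the CITY column in my source data is a string. As an illustration only, here is a minimal standalone Java sketch of the same Spark 2.4 schema check, with hypothetical sample values for CITY; it is not the platform's internal node code, it just shows that the fit fails on a string column and passes once the column is cast to a numeric type.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.feature.QuantileDiscretizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class CityBucketizerSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("quantile-discretizer-type-check")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical rows: CITY arrives as a string column, as in the failing flow.
        List<Row> rows = Arrays.asList(
                RowFactory.create("530100"),
                RowFactory.create("530300"),
                RowFactory.create("530400"),
                RowFactory.create("530600"));
        StructType schema = new StructType().add("CITY", DataTypes.StringType);
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        QuantileDiscretizer discretizer = new QuantileDiscretizer()
                .setInputCol("CITY")
                .setOutputCol("CITY_BUCKET")
                .setNumBuckets(2);

        // Fitting directly on the string column throws:
        //   java.lang.IllegalArgumentException: requirement failed:
        //   Column CITY must be of type numeric but was actually of type string.
        // discretizer.fit(df);

        // Casting the column to a numeric type first satisfies the schema check.
        Dataset<Row> numeric = df.withColumn("CITY", col("CITY").cast("double"));
        discretizer.fit(numeric).transform(numeric).show();

        spark.stop();
    }
}
```

If that reading is right, the fix in the experiment itself would presumably be to make sure the column fed to the binning node is numeric, either by converting its type upstream or by excluding CITY from the binned columns.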