Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Registering User Defined Functions in SparkSQL #125

Open
gareth625 opened this issue Oct 16, 2017 · 6 comments
Open

Registering User Defined Functions in SparkSQL #125

gareth625 opened this issue Oct 16, 2017 · 6 comments

Comments

@gareth625
Copy link

Hi

I asked this question over on the Flambo google group a few days ago however I'm not sure how active it is. Sorry if I'm just being impatient, I have made some progress since then and have a proper example project to go with my question this time.

I've been attempting to register a User Defined Function in SparkSQL and have made some progress by analogy with how Flambo has implemented the map, filter, etc. interface in flambo.api using the flambo.function namespace. I've produced an analogue function namespace to implement the org.apache.spark.sql.api.java.UDF1 and .UDF2, .UDF3, etc. classes which can then be called in Clojure. This has worked locally with no problem but now I'm submitting to EMR I've getting a java.lang.IllegalStateException: Attempting to call unbound fn: #'my-spark-sql.core/the-work-horse.

I've attached a sample project, my-spark-sql.zip, which you can call with lein run local and that should work. It does for me :) However I haven't managed to get it to run on EMR. I haven't tried using spark-submit locally yet.

Does anyone have any thoughts on what might be wrong here?

@gareth625
Copy link
Author

I spotted my mistake. In the demo project in the my-spark-sql.function namespace I have the macro

(defmacro udf
  [args & body]
  (let [udf# (ffn/mk-sym "my-spark-sql.function/udf%d" (count args))]
    `(~udf# (fn ~args ~@body))))

however it should be

(defmacro udf
  [args & body]
  (let [udf# (ffn/mk-sym "my-spark-sql.function/udf%d" (count args))]
    `(~udf# (sfn/fn ~args ~@body))))

In the last line I should be creating an anonymous serializable.fn rather than an anonymous fn. I didn't manage to strip away enough complexity when building the test project.

Is this something that I could contribute to the Flambo project?

@NonaryR
Copy link

NonaryR commented Nov 23, 2017

@gareth625 hello! great work, thank you! do you have any examples with udaf (defined aggregate function)? I want to implement group by function over sql.dataset

@gareth625
Copy link
Author

@NonaryR unfortunately I haven't. I've just had a look and I think it's a slightly different problem as the UDAF isn't an interface it's an abstract class. My Java and Clojure/Java interop isn't strong enough to know how much of a difference this makes.

I'd be interesting to hear what you find out.

I have some ideas but can't test them at the moment and don't want to suggested deadends as I don't really have enough experience to know if they're pointless.

@leon-barrett
Copy link
Contributor

@gareth625 Good catch--yes, to send the function, it needs to be serializable.

And yes, your udf macro looks helpful, and it'd be nice to have it in flambo.function. Would you like to make a PR for that?

Note that you've duplicated UDF16 a couple times, but I imagine you've caught that by now. :) You might even use a macro to generate numbers 1-16.

Thanks for the proposal, and we look forward to a PR if you're willing.

@jiyouyou125
Copy link

#131 add udf1-udf3, sql test

@gareth625
Copy link
Author

Hi @jiyouyou125

Thanks for that. I'd done a bit of work last week but was having trouble in my repl. I've created a PR against your repo with the helper macros to generate all the possible UDF helpers.

I think there's a bit more tidying up that can be done with regard to registering the UDFs but I'm away now until next week.

Sorry for the slow response.

Thanks
Gareth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants