Uniform returned type in the NLPModel API #454

Closed
tmigot opened this issue Mar 26, 2024 · 9 comments · Fixed by #455

Comments

@tmigot (Member) commented Mar 26, 2024

I think we should clarify the returned type in the API.

Essentially, when calling grad(nlp::AbstractNLPModel{T, S}, x::V) where {T, S, V} we have two options:

  • i) The returned type is S
  • ii) The returned type is V

In general, I think assuming S == V is too restrictive.
Personally, I prefer option ii), as it makes multi-precision easier to implement.
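
To make the two options concrete, here is a small sketch (BROWNDEN from NLPModelsTest is used purely for illustration):

using NLPModels, NLPModelsTest

nlp = BROWNDEN()                     # AbstractNLPModel{Float64, Vector{Float64}}, i.e. S = Vector{Float64}
x32 = Vector{Float32}(nlp.meta.x0)   # V = Vector{Float32}
grad(nlp, x32)                       # option i)  would return a Vector{Float64} (follows S)
                                     # option ii) would return a Vector{Float32} (follows V)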

@amontoison @abelsiqueira @dpo What do you think? i) or ii) or ... ?

Illustration

There are some inconsistencies right now; for instance, grad(nlp, x) and residual(nls, x) do not have the same behavior.

using CUDA, NLPModels, NLPModelsTest # https://github.com/JuliaSmoothOptimizers/NLPModelsTest.jl#use-gpu

nls = LLS(CuArray{Float32})
x32 = nls.meta.x0
CUDA.allowscalar() do 
  residual(nls, x32) # OK: x32 = nls.meta.x0 has the model's storage type CuArray{Float32}
end
x64 = fill!(CuArray{Float64}(undef, 2), 0)
CUDA.allowscalar() do 
  residual(nls, x64) # ERROR: MethodError: no method matching residual(::LLS{Float32, CuArray{Float32}}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer})
end

CUDA.allowscalar() do 
  grad(nls, x64) # Return a CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}
end

This is connected to JuliaSmoothOptimizers/NLPModelsTest.jl#105

@amontoison (Member)

I think that i) is the best solution. The gradient should have the same type as the model.

@tmigot (Member, Author) commented Mar 26, 2024

Oh sorry, I fixed my suggestion. I liked ii) better, as it allows having a model and still evaluating it in another precision.

Is there a particular reason, @amontoison?

@dpo (Member) commented Mar 26, 2024

I feel it should really be

grad(nlp::AbstractNLPModel{T, S}, x::S) where {T, S}  # -> return a value of type S

and there should be conversion methods if the type of x is not S.

EDIT: No, never mind. Now that I think about it, we allowed x::V so that views, among other types, work efficiently. However, if the user calls grad() and not grad!(), we have to decide what the return type should be. It should be S, because x could be a view, and it doesn’t make sense to return a new view.
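
For reference, a quick Base-only check (no model involved) of what similar does with a view:

x = view(ones(5), 1:4)   # SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}
similar(x)               # Vector{Float64}: the view wrapper is already dropped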

@tmigot (Member, Author) commented Mar 27, 2024

The views are a good point. The current implementation of grad does this:

function grad(nlp::AbstractNLPModel, x::AbstractVector)
  @lencheck nlp.meta.nvar x
  g = similar(x)
  return grad!(nlp, x, g)
end

to keep the type of x and avoid the potential issue with views. This is actually a third option, iii).

However, I suspect the compiler will prefer what you both suggested:

function grad(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
  @lencheck nlp.meta.nvar x
  g = S(undef, nlp.meta.nvar)
  return grad!(nlp, x, g)
end
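
For what it is worth, allocating from the storage type works the same way for CPU and GPU containers; a quick sketch (assuming CUDA is available):

using CUDA

S_cpu = Vector{Float64}
S_gpu = CuArray{Float32}
S_cpu(undef, 4)   # Vector{Float64}
S_gpu(undef, 4)   # CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}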

@dpo (Member) commented Mar 27, 2024

What makes you say that the compiler would prefer S as return type?

Just to think things through, let’s say S is a CUDA array and consider the following cases:

  1. x is a view into an object of type S: then I presume similar(x) will return another object of type S, and that is good.
  2. x is a Vector{Float64} (for some reason). What should we return then and why?

@tmigot (Member, Author) commented Mar 27, 2024

What makes you say that the compiler would prefer S as return type?

From https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing, I think using similar(x) (option iii) will not specialize, while using S (option i) does. This could probably be tested, though.


I think that, in general, choosing S as the return type sort of "fixes" things: the return type can be known in advance. Relying on x is more flexible, but also less predictable in a sense.

This example sort of illustrates it. The @btime results are exactly the same; only the compilation time changes.

using BenchmarkTools, NLPModels, NLPModelsTest

function grad1(nlp::AbstractNLPModel, x::AbstractVector)
    @lencheck nlp.meta.nvar x
    g = similar(x)
    return grad!(nlp, x, g)
end

function grad2(nlp::AbstractNLPModel{T, S}, x::AbstractVector) where {T, S}
    @lencheck nlp.meta.nvar x
    g = S(undef, nlp.meta.nvar)
    return grad!(nlp, x, g)
end

nlp = BROWNDEN()
g = similar(nlp.meta.x0)
xview = view(fill!(Vector{Float64}(undef, 5), 1), 1:4)
grad!(nlp, nlp.meta.x0, g) # pre-compile grad!
grad!(nlp, xview, g) # pre-compile grad!

@time grad1(nlp, nlp.meta.x0) # 0.009147 seconds (1.99 k allocations: 136.719 KiB, 99.43% compilation time)
@btime grad1(nlp, nlp.meta.x0)
@time grad1(nlp, xview) # 0.012386 seconds (5.15 k allocations: 346.609 KiB, 99.33% compilation time)
@time grad2(nlp, nlp.meta.x0) # 0.007110 seconds (2.25 k allocations: 155.812 KiB, 99.45% compilation time)
@btime grad2(nlp, nlp.meta.x0)
@time grad2(nlp, xview) # 0.006264 seconds (2.69 k allocations: 177.188 KiB, 99.41% compilation time)

The conclusion I am trying to reach is that grad2 is more "robust" to unexpected types of x.


Beyond that, I think it's a choice. The following table tests nlp = BROWNDEN(S) and x::V with the functions grad1 and grad2.

S                  V                  grad1 (similar)    grad2 (S)
Vector{Float64}    Vector{Float32}    Vector{Float32}    Vector{Float64}
Vector{Float32}    Vector{Float64}    Vector{Float64}    Vector{Float32}
CuArray{Float64}   Vector{Float64}    Vector{Float64}    CuArray{Float64}
Vector{Float64}    CuArray{Float64}   CuArray{Float64}   Vector{Float64}

but it's definitely worth having this discussion and documenting the solution.
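
For completeness, a hypothetical snippet to reproduce the second row of the table, reusing grad1 and grad2 from above (it assumes BROWNDEN(S) accepts the storage type S, as in NLPModelsTest):

using NLPModels, NLPModelsTest

nlp = BROWNDEN(Vector{Float32})      # S = Vector{Float32}
x64 = Vector{Float64}(nlp.meta.x0)   # V = Vector{Float64}
typeof(grad1(nlp, x64))              # Vector{Float64}: follows x (similar)
typeof(grad2(nlp, x64))              # Vector{Float32}: follows S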

@tmigot (Member, Author) commented Mar 28, 2024

This seems related to JuliaSmoothOptimizers/JSOSolvers.jl#135. @paraynaud, in case you have some feedback too.

@paraynaud (Member)

From what I read, option iii) should satisfy everyone.
Additionally, there is no risk of interfering with the partially-separable models while revamping grad, since PartiallySeparableNLPModels.jl defines its own grad and grad! methods.

@tmigot linked a pull request on Apr 5, 2024 that will close this issue.
@tmigot (Member, Author) commented Apr 5, 2024

PR #455 applies what we have been discussing here.
