Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[blas] Resolve baseline problems #38467

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

BillyONeal
Copy link
Member

@BillyONeal BillyONeal commented Apr 29, 2024

Extended from that originally authored by @Cheney-W in #38097

@BillyONeal BillyONeal changed the title Enable blas and skip blas-test for arm64-neon-android [blas] Make all the Android platforms consistent in ci.baseline.txt Apr 29, 2024
@BillyONeal BillyONeal marked this pull request as draft April 29, 2024 21:55
Comment on lines +121 to +122
blas-test:x64-osx=pass # accelerate framework
lapack-test:x64-osx=pass # accelerate framework
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should group the passes separately. Also the comment does not matter here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with making those separate is that I feel I would need to explain the expected selection on each triplet twice

@BillyONeal BillyONeal marked this pull request as ready for review April 30, 2024 00:11
@BillyONeal
Copy link
Member Author

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JonLiu1993 JonLiu1993 self-assigned this Apr 30, 2024
@JonLiu1993 JonLiu1993 added info:internal This PR or Issue was filed by the vcpkg team. category:infrastructure Pertaining to the CI/Testing infrastrucutre labels Apr 30, 2024
JonLiu1993
JonLiu1993 previously approved these changes Apr 30, 2024
@JonLiu1993 JonLiu1993 added the info:reviewed Pull Request changes follow basic guidelines label Apr 30, 2024
@Neumann-A
Copy link
Contributor

So this is my analysis:

# Required fails due to broken

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ?
lapack-reference:arm-neon-android=fail # broken!


clapack:x64-android=skip
lapack-reference:x64-android=fail # broken!
clapack:arm64-android=skip
lapack-reference:arm64-android=fail # broken!

######## Maybe clapack works on android if openblas works?

# Test -> Should always pass if blas/lapack is supported
blas-test:x86-windows=pass # openblas
lapack-test:x86-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static=pass # lapack-reference[blas]
lapack-test:x64-windows-static=pass # lapack-reference[blas]
blas-test:x64-windows=pass # openblas
lapack-test:x64-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static-md=pass # lapack-reference[blas]
lapack-test:x64-windows-static-md=pass # lapack-reference[blas]
blas-test:arm64-windows=pass # openblas
lapack-test:arm64-windows=pass # clapack
blas-test:x64-uwp=pass # openblas
lapack-test:x64-uwp=pass # clapack
blas-test:arm64-uwp=pass # openblas
lapack-test:arm64-uwp=pass # clapack
blas-test:x64-linux=pass # openblas
lapack-test:x64-linux=pass # lapack-reference[noblas]
blas-test:arm64-osx=pass # accelerate framework
lapack-test:arm64-osx=pass # accelerate framework
blas-test:x64-osx=pass # accelerate framework
lapack-test:x64-osx=pass # accelerate framework

blas-test:arm64-android=pass # openblas ############## why does this work but x64-android not ?

#Failing tests (might be unnecessary if the dep already is broken.):
lapack-test:arm64-android=fail # lapack-reference[noblas] broken!
# arm-neon-android is broken!
blas-test:arm-neon-android=fail # openblas broken!
lapack-test:arm-neon-android=fail # lapack-reference[noblas] broken!
# x64-android is broken!

lapack-test:x64-android=fail # lapack-reference[noblas]

# unnecessary fails:
blas-test:x64-android=fail ###### openblas -> openblas is already failing

# Unnecessary skips/no collision only good to avoid pulling them accidently in, 
# however having an error due to them would be good since it would indicate a 
# location in need of a fix:
lapack-reference:arm64-osx=skip # Either this or the one before
openblas:arm64-osx=skip
lapack-reference:x64-osx=skip # Either this or the one before
openblas:x64-osx=skip

openblas:x64-windows-static=skip
openblas:x64-windows-static-md=skip

# Required skips due to possible collisions between clapack/lapack-reference
clapack:x86-windows=skip
clapack:x64-windows=skip
clapack:x64-windows-static-md=skip
clapack:x64-windows-static=skip
clapack:x64-linux=skip
clapack:arm64-osx=skip
clapack:x64-osx=skip
lapack-reference:x64-uwp=skip
lapack-reference:arm64-windows=skip
lapack-reference:arm64-uwp=skip

clapack:arm-neon-android=skip #### clapack should probably be used if openblas works. 

# Unnecessary pass (will be enforce by blas/lapack-test):
lapack-reference:x86-windows=pass
openblas:x86-windows=pass
lapack-reference:x64-windows=pass
openblas:x64-windows=pass
lapack-reference:x64-windows-static=pass
lapack-reference:x64-windows-static-md=pass
openblas:x64-uwp=pass
clapack:x64-uwp=pass
clapack:arm64-windows=pass
clapack:arm64-uwp=pass
openblas:arm64-windows=pass
openblas:arm64-uwp=pass
openblas:arm64-android=pass

lapack-reference:x64-linux=pass
openblas:x64-linux=pass

# Questionable pass since seemingly not useable:
openblas:arm-neon-android=pass


a) Please don't add unnecessary pass/fail, switching blas/lapack implementations is otherwise a pain. Having the test ports pass is enough.
b) Please don't mark stuff as skip if there is no collision between the ports e.g. only clapack/lapack-reference should be marked with skip
c) Only mark root ports like openblas/lapack-reference/clapack as fail if the corresponding blas/lapack-test port is failing. The test ports do a compile check if the found blas/lapack implementation has an expected symbol and only fails if that does not compile. If that is the case the root ports could also be marked with "supports": "!<what-failed>"
d) the android situation needs some more debugging. arm64 works but x64 does not for openblas? That seems questionable. Also forcing clapack here could be a way forward (currently using lapack-reference?). Android cannot use lapack-reference due to the gfortran libs not being compiled for android??

So from that analyzes the following should be applied:

blas-test:x86-windows=pass # openblas
lapack-test:x86-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static=pass # lapack-reference[blas]
lapack-test:x64-windows-static=pass # lapack-reference[blas]
blas-test:x64-windows=pass # openblas
lapack-test:x64-windows=pass # lapack-reference[noblas]
blas-test:x64-windows-static-md=pass # lapack-reference[blas]
lapack-test:x64-windows-static-md=pass # lapack-reference[blas]
blas-test:arm64-windows=pass # openblas
lapack-test:arm64-windows=pass # clapack
blas-test:x64-uwp=pass # openblas
lapack-test:x64-uwp=pass # clapack
blas-test:arm64-uwp=pass # openblas
lapack-test:arm64-uwp=pass # clapack
blas-test:x64-linux=pass # openblas
lapack-test:x64-linux=pass # lapack-reference[noblas]
blas-test:arm64-osx=pass # accelerate framework
lapack-test:arm64-osx=pass # accelerate framework
blas-test:x64-osx=pass # accelerate framework
lapack-test:x64-osx=pass # accelerate framework

blas-test:arm64-android=pass # openblas ############## why does this work but x64-android not ?

# Required skips due to possible collisions between clapack/lapack-reference
clapack:x86-windows=skip
clapack:x64-windows=skip
clapack:x64-windows-static-md=skip
clapack:x64-windows-static=skip
clapack:x64-linux=skip
clapack:arm64-osx=skip
clapack:x64-osx=skip
lapack-reference:x64-uwp=skip
lapack-reference:arm64-windows=skip
lapack-reference:arm64-uwp=skip

<+ more skips for the android situation probably 
lapack-reference:x64-android=skip
lapack-reference:arm64-android=skip 
# and force clapack
>

#### Should be marked via supports:
# Questionable pass since seemingly not useable:
openblas:arm-neon-android=pass

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ?
# clapack:x64-android=fail # openblas already fails
clapack:arm64-android=????? # openblas works so this could also work?

@Neumann-A
Copy link
Contributor

Furthermore: If Android uses GCC then the Fortran flags for Android are probably not being setup by the Android toolchain? So switching to clapack could solve that issue maybe.

@BillyONeal
Copy link
Member Author

BillyONeal commented Apr 30, 2024

openblas:x64-android=fail # broken! ######### Why is this broken but arm64-android works ?
lapack-reference:arm-neon-android=fail # broken!

I don't know. I am making ci.baseline.txt consistent with the status quo so that people editing other PRs don't experience baseline issues, I'm not trying to improve Android support right now.

The status quo I'm looking at is Friday's build: https://dev.azure.com/vcpkg/public/_build/results?buildId=102383&view=results

Installing 207/2154 lapack-reference[blas-select,core,noblas]:[email protected]#5...
Building lapack-reference[blas-select,core,noblas]:[email protected]#5...
-- Downloading https://github.com/Reference-LAPACK/lapack/archive/v3.11.0.tar.gz -> Reference-LAPACK-lapack-v3.11.0.tar.gz...
-- Extracting source /vcpkg/downloads/Reference-LAPACK-lapack-v3.11.0.tar.gz
-- Applying patch cmake-config.patch
-- Applying patch lapacke.patch
-- Applying patch fix_prefix.patch
-- Using source at /mnt/vcpkg-ci/b/lapack-reference/src/v3.11.0-6ae738f586.clean
-- The Fortran compiler identification is unknown
CMake Error at scripts/cmake/vcpkg_find_fortran.cmake:62 (message):
  Unable to find a Fortran compiler using 'CMakeDetermineFortranCompiler'.
  Please install one (e.g.  gfortran) and make it available on the PATH!
Call Stack (most recent call first):
  ports/lapack-reference/portfile.cmake:47 (vcpkg_find_fortran)
  scripts/ports.cmake:175 (include)


error: building lapack-reference:arm-neon-android failed with: BUILD_FAILED
Elapsed time to handle lapack-reference:arm-neon-android: 1.1 s
Installing 208/2154 lapack:arm-neon-android@2023-06-10...
Building lapack:arm-neon-android@2023-06-10...
error: building lapack:arm-neon-android failed with: CASCADED_DUE_TO_MISSING_DEPENDENCIES
  due to the following missing dependencies:
    lapack-reference[blas-select]:arm-neon-android
    lapack-reference[core]:arm-neon-android
Elapsed time to handle lapack:arm-neon-android: 47.7 us
Installing 206/2253 openblas:[email protected]...
Building openblas:[email protected]...
-- Using cached OpenMathLib-OpenBLAS-v0.3.27.tar.gz.
-- Extracting source /vcpkg/downloads/OpenMathLib-OpenBLAS-v0.3.27.tar.gz
-- Applying patch uwp.patch
-- Applying patch fix-redefinition-function.patch
-- Applying patch install-tools.patch
-- Using source at /mnt/vcpkg-ci/b/openblas/src/v0.3.27-b066c33329.clean
-- Configuring x64-android
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:112 (message):
    Command failed: /vcpkg/downloads/tools/ninja/1.10.2-linux/ninja -v
    Working Directory: /mnt/vcpkg-ci/b/openblas/x64-android-rel/vcpkg-parallel-configure
    Error code: 1
    See logs for more information:
      /mnt/vcpkg-ci/b/openblas/config-x64-android-dbg-CMakeCache.txt.log
      /mnt/vcpkg-ci/b/openblas/config-x64-android-rel-CMakeCache.txt.log
      /mnt/vcpkg-ci/b/openblas/config-x64-android-out.log

Call Stack (most recent call first):
  /mnt/vcpkg-ci/installed/x64-linux/share/vcpkg-cmake/vcpkg_cmake_configure.cmake:252 (vcpkg_execute_required_process)
  ports/openblas/portfile.cmake:63 (vcpkg_cmake_configure)
  scripts/ports.cmake:175 (include)


error: building openblas:x64-android failed with: BUILD_FAILED
Elapsed time to handle openblas:x64-android: 1.6 s

# unnecessary fails:
blas-test:x64-android=fail ###### openblas -> openblas is already failing

I want someone who fixes openblas to need to change this to =pass.

a) Please don't add unnecessary pass/fail, switching blas/lapack implementations is otherwise a pain. Having the test ports pass is enough.

I think anything that changes anything about which blas/lapack backend is selected needs to evaluate this list; that this was not done in #38097 is why we are having this baseline issue. (I didn't know about this separate handling in ci.baseline.txt when I reviewed over there)

b) Please don't mark stuff as skip if there is no collision between the ports e.g. only clapack/lapack-reference should be marked with skip

I marked them skip because it is important that blas-test and lapack-test pass without them.

c) Only mark root ports like openblas/lapack-reference/clapack as fail if the corresponding blas/lapack-test port is failing.

That is not possible. People editing PRs unrelated to blas and lapack should not be getting baseline errors. ci.baseline.txt is a reflection of the status quo, not what we want the world to be.

d) the android situation needs some more debugging.

I agree! But I don't myself have time to do that right now, and blocking unrelated PRs while that is debugged is an unacceptable outcome.

@BillyONeal BillyONeal added requires:vcpkg-team-review This PR or issue requires someone on the vcpkg team to take a further look. and removed info:reviewed Pull Request changes follow basic guidelines labels Apr 30, 2024
@BillyONeal
Copy link
Member Author

I would like the other maintainers to consider Neumann-A's feedback above

@Neumann-A
Copy link
Contributor

I marked them skip because it is important that blas-test and lapack-test pass without them.

No the important part is that they pass selecting the correct blas/lapack implementation. Since BLA_VENDOR is fixed that shouldn't be an issue. You could probably inspect the cmakecache of the test to make sure the correct implementation was selected. (Before BLA_VENDOR was fixed it would have been an issue.) Marking them as skipped in the baseline just means reduced ci test coverage no longer building the ports.

That is not possible. People editing PRs unrelated to blas and lapack should not be getting baseline errors. ci.baseline.txt is a reflection of the status quo, not what we want the world to be.

I agree with that but the only observed CI error seems to be in android. So marking:

openblas:x64-android=fail/skip
blas:arm-neon-android=fail/skip

should be enough or not?

I think anything that changes anything about which blas/lapack backend is selected needs to evaluate this list;

No. This only needs to be evaluated for
lapack-test:=pass (lapack should always depend on blas, so marking this pass means blas-test also passes)
(or blas-test:=pass # if lapack-test is a known failure)

The idea is to mark the final downstream dep as passing since that means all upstream deps pass. Marking everything as pass is just micromanagement.

also changing blas/lapack backend should have no visible impact on the test. They still should pass. The only impact it can have is conflicting libraries being installed which would require a selection in the baseline.

@BillyONeal
Copy link
Member Author

@ras0219-msft
@AugP
@data-queue
@BillyONeal
@JavierMatosD

Before the meeting @JavierMatosD indicated that the supports expression outcome seemed reasonable.

(discussion for) We are happy with skipping more than absolutely necessary: (Neumann-A part B)

@ras0219-msft : This is morally what we have to do whenever there are alternatives so it isn't that much of a disservice if we do that for the BLAS alternatives.

(discussion for) We want supports expressions for blas and lapack if the corresponding tests are broken: (Neumann-A part C)

@ras0219-msft: In this case, supports expressions make sense however if for whatever reason supports did not make sense we want ci.baseline.txt to reflect the reality of the baseline, even if that combination might not make sense.

(discussion for) We want extra pass for the backends: (Neumann-A part A)

@BillyONeal I'm OK with this on the grounds that they don't do anything; they're more of a comment than an actually meaningful directive. I'm not sold on the "but that means if the backend changes I have to edit ci.baseline.txt" because I think any time backend selection changes the list needs to be re-audited.

We did not discuss Neumann-A's part D because we agree it should be fixed but we can't block baseline issues on that being fixed.

Result action items:

  • Billy to remove the extra =pass on each of the backends
  • Billy to change the supports expression on blas and lapack to reflect which tests work
  • The extra skips in ci.baseline.txt for supported platforms for BLAS not made irrelevant by the supports expressions remain

…kg/public/_build/results?buildId=102893&view=logs&j=18f767b8-9e55-53aa-9c57-42862973482f&t=c8899ec1-5880-5fd7-4dca-ef3731d2bc7b

-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - no
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
CMake Error at /vcpkg/downloads/tools/cmake-3.29.2-linux/cmake-3.29.2-linux-x86_64/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Threads (missing: Threads_FOUND)
Call Stack (most recent call first):
@BillyONeal BillyONeal changed the title [blas] Make all the Android platforms consistent in ci.baseline.txt [blas] Resolve baseline problems May 17, 2024
ports/blas/vcpkg.json Outdated Show resolved Hide resolved
@BillyONeal BillyONeal removed the requires:vcpkg-team-review This PR or issue requires someone on the vcpkg team to take a further look. label May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:infrastructure Pertaining to the CI/Testing infrastrucutre info:internal This PR or Issue was filed by the vcpkg team.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants