OpenCL fails with 2 fully connected layers #105

Open
joaopauloschuler opened this issue Dec 11, 2022 · 10 comments

@joaopauloschuler
Owner

I'm having a problem adapting one of my programs that uses CAI to work with OpenCL.

I've tested "SimpleImageClassifierGPU" and it works on my computer (after removing the -dAVX option, because my CPU is old).

When I try to add OpenCL to my test program with 2 fully connected layers (no convolution), it fails.

@joaopauloschuler joaopauloschuler added the bug Something isn't working label Dec 11, 2022
@joaopauloschuler joaopauloschuler self-assigned this Dec 11, 2022
@Dzandaa

Dzandaa commented Dec 11, 2022

When calling:

NeuralFit.Fit(NeuralNet, TrainingPairs, nil, nil, {batchsize=}4, {epochs=}SEEpoch.Value);

Without neural.cl in the same directory as the executable:
From line 584 of neuralopencl.pas
'File neural.cl could not be found.'

With neural.cl in the same directory as the executable:
From line 948 of neuralopencl.pas
'clCreateContext OK!'
then it crashes...

I think one of the problems may be that my 'neural' directory is at '../neural' rather than '../../../neural'.

@Dzandaa

Dzandaa commented Dec 11, 2022

I just tried the same program on Linux Mint 20.2.

I added "-dUseCThreads" in Custom Options
and changed

{$IFDEF UseCThreads}
cthreads, cmem,
{$ENDIF}

to

{$IFDEF UseCThreads}
cthreads,
{$ENDIF}

It works, but I don't see any acceleration.
For 1000 epochs:
With OpenCL and with AVX: 34.62 seconds
Without OpenCL and with AVX: 33.82 seconds
Without OpenCL and without AVX: 62.76 seconds
With OpenCL and without AVX: 62.92 seconds

@joaopauloschuler
Owner Author

OpenCL is actually slightly slower in this experiment (34.62 seconds versus 33.82 seconds with AVX). I wonder whether the number of weights/neurons is so small here that OpenCL has no advantage.

@Dzandaa

Dzandaa commented Dec 12, 2022

I don't know why it crashes on Windows after clCreateContext OK!

@joaopauloschuler
Owner Author

I'm about to start working on this.

@joaopauloschuler
Owner Author

On dense (fully connected) layers, OpenCL is called only when there are enough neurons/weights to compensate for the overhead that it adds:

FShouldOpenCL := (FNeurons.Count >= 512) and (pPrevLayer.Output.Size >= 128);

Depending on how many neurons you have on each layer, maybe it's not even in use.
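
To see which dense layers would reach the OpenCL path, the condition quoted above can be evaluated by hand. The following is only a sketch: `WouldUseOpenCL` is a hypothetical helper that is not part of CAI, the layer sizes are illustrative, and it assumes that the output size of a dense layer equals its neuron count and that the threshold stays exactly as quoted.

```pascal
program OpenCLThresholdCheck;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of CAI): mirrors the condition quoted above.
// NeuronCount stands in for FNeurons.Count,
// PrevOutputSize stands in for pPrevLayer.Output.Size.
function WouldUseOpenCL(NeuronCount, PrevOutputSize: integer): boolean;
begin
  Result := (NeuronCount >= 512) and (PrevOutputSize >= 128);
end;

begin
  // Dense layer with 512 neurons fed by a 2-value input:
  // the previous output is too small.
  WriteLn('512 neurons after 2 inputs:    ', WouldUseOpenCL(512, 2));
  // Dense layer with 512 neurons fed by 512 values: both conditions hold.
  WriteLn('512 neurons after 512 outputs: ', WouldUseOpenCL(512, 512));
  // Output layer with 1 neuron fed by 512 values: too few neurons.
  WriteLn('1 neuron after 512 outputs:    ', WouldUseOpenCL(1, 512));
end.
```

Under these assumptions, only a layer that has at least 512 neurons and follows a layer with at least 128 outputs would use OpenCL, so a small network may run entirely on the CPU even with OpenCL enabled.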

@joaopauloschuler
Owner Author

I've just tested the following and it works for me:

program Hypotenuse;
(*
Hypotenuse: learns how to calculate hypotenuse sqrt(X^2 + Y^2).
Copyright (C) 2019 Joao Paulo Schwarz Schuler

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*)

{$mode objfpc}{$H+}

uses {$IFDEF UNIX} {$IFDEF UseCThreads}
  cthreads, {$ENDIF} {$ENDIF}
  Classes,
  neuralnetwork,
  neuralvolume,
  neuralfit,
  neuralopencl;

  function CreateHypotenusePairList(MaxCnt: integer): TNNetVolumePairList;
  var
    Cnt: integer;
    LocalX, LocalY, Hypotenuse: TNeuralFloat;
  begin
    Result := TNNetVolumePairList.Create();
    for Cnt := 1 to MaxCnt do
    begin
      LocalX := Random(100);
      LocalY := Random(100);
      Hypotenuse := sqrt(LocalX*LocalX + LocalY*LocalY);

      Result.Add(
        TNNetVolumePair.Create(
          TNNetVolume.Create([LocalX, LocalY]),
          TNNetVolume.Create([Hypotenuse])
        )
      );
    end;
  end;

  // Returns TRUE if difference is smaller than 0.1 .
  function LocalFloatCompare(A, B: TNNetVolume; ThreadId: integer): boolean;
  begin
    Result := ( Abs(A.FData[0]-B.FData[0])<0.1 );
  end;

  procedure RunAlgo();
  var
    NN: TNNet;
    NFit: TNeuralFit;
    TrainingPairs, ValidationPairs, TestPairs: TNNetVolumePairList;
    Cnt: integer;
    pOutPut: TNNetVolume;
    EasyOpenCL: TEasyOpenCL;
  begin
    NN := TNNet.Create();
    NFit := TNeuralFit.Create();
    TrainingPairs := CreateHypotenusePairList(10000);
    ValidationPairs := CreateHypotenusePairList(1000);
    TestPairs := CreateHypotenusePairList(1000);

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

    EasyOpenCL := TEasyOpenCL.Create();
    if EasyOpenCL.GetPlatformCount() = 0 then
    begin
      WriteLn('No OpenCL capable platform has been found.');
      exit;
    end;
    WriteLn('Setting platform to: ', EasyOpenCL.PlatformNames[0]);
    EasyOpenCL.SetCurrentPlatform(EasyOpenCL.PlatformIds[0]);
    if EasyOpenCL.GetDeviceCount() = 0 then
    begin
      WriteLn('No OpenCL capable device has been found for platform ',EasyOpenCL.PlatformNames[0]);
      exit;
    end;
    EasyOpenCL.SetCurrentDevice(EasyOpenCL.Devices[0]);

    NFit.EnableOpenCL(EasyOpenCL.PlatformIds[0], EasyOpenCL.Devices[0]);

    WriteLn('Computing...');
    NFit.InitialLearningRate := 0.00001;
    NFit.LearningRateDecay := 0;
    NFit.L2Decay := 0;
    NFit.InferHitFn := @LocalFloatCompare;
    NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, ValidationPairs, TestPairs, {batchsize=}32, {epochs=}50);
    NN.DebugWeights();

    pOutPut := TNNetVolume.Create({pSizeX=}1, {pSizeY=}1, {pDepth=}1, {FillValue=}1);

    // tests the learning
    for Cnt := 0 to 9 do
    begin
      NN.Compute(TestPairs[Cnt].I);
      NN.GetOutput(pOutPut);
      WriteLn
      ( 'Inputs:',
        TestPairs[Cnt].I.FData[0]:5:2,', ',
        TestPairs[Cnt].I.FData[1]:5:2,' - ',
        'Output:',
        pOutPut.Raw[0]:5:2,' ',
        ' Desired Output:',
        TestPairs[Cnt].O.FData[0]:5:2
      );
    end;

    EasyOpenCL.Free;
    pOutPut.Free;
    TestPairs.Free;
    ValidationPairs.Free;
    TrainingPairs.Free;
    NFit.Free;
    NN.Free;
    Write('Press ENTER to exit.');
    ReadLn;
  end;

var
  // Stops Lazarus errors
  Application: record Title:string; end;

begin
  Application.Title:='Hypotenuse Example';
  RunAlgo();
end.

@joaopauloschuler
Owner Author

joaopauloschuler commented Dec 13, 2022

I've just tested the following and it also works for me:

    //NFit.MaxThreadNum := 1;
    NFit.Fit(NN, TrainingPairs, nil, nil, {batchsize=}32, {epochs=}50);

and

    NN.AddLayer([
      TNNetInput.Create(2),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectReLU.Create(512),
      TNNetFullConnectLinear.Create(1)
    ]);

Given that I can't reproduce the crash, you'll need to share the full source code of a Lazarus project that provokes the error.

@joaopauloschuler
Owner Author

In case it helps, this is how neural.cl is loaded:

constructor TNeuralKernel.Create(pCurrentPlatform: cl_platform_id;
  pCurrentDevice: cl_device_id; kernelname: string = 'cai_dot_product');
begin
  inherited Create();
  SetCurrentPlatform(pCurrentPlatform);
  SetCurrentDevice(pCurrentDevice);

  // Create the OpenCL Kernel Here:
  if FileExists('../../../neural/neural.cl') then
  begin
    CompileProgramFromFile('../../../neural/neural.cl');
  end
  else if FileExists('neural.cl') then
  begin
    CompileProgramFromFile('neural.cl');
  end
  else
  begin
    MessageProc('File neural.cl could not be found.');
  end;
  PrepareKernel(kernelname);
end; 
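
Because both paths in the constructor above are relative, they are resolved against the current working directory of the process, which is not necessarily the directory of the executable. A small stand-alone check along the following lines could show which file, if any, would be picked up. This is a sketch only: `FirstExistingKernelFile` and the `Candidates` constant are hypothetical names introduced for illustration, and the program uses only `FileExists`, `GetCurrentDir` and `ExpandFileName` from the standard `SysUtils` unit.

```pascal
program FindNeuralCL;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // The two locations checked by the constructor quoted above, in order.
  Candidates: array[0..1] of string =
    ('../../../neural/neural.cl', 'neural.cl');

// Hypothetical helper: returns the first candidate that exists,
// or an empty string when neither is found.
function FirstExistingKernelFile(): string;
var
  I: integer;
begin
  Result := '';
  for I := Low(Candidates) to High(Candidates) do
    if FileExists(Candidates[I]) then
    begin
      Result := Candidates[I];
      exit;
    end;
end;

var
  Found: string;

begin
  WriteLn('Working directory: ', GetCurrentDir());
  Found := FirstExistingKernelFile();
  if Found = '' then
    WriteLn('neural.cl was not found in either location.')
  else
    WriteLn('neural.cl would be loaded from: ', ExpandFileName(Found));
end.
```

Running it from the same directory, and in the same way, as the failing program should show whether the working directory is the one expected.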

@Dzandaa

Dzandaa commented Dec 14, 2022

Hi,

Thank you very much for your tests :)
Here is my little test program.

You have to change the path to /neural and put neural.cl in the same directory as the executable.

NetSpectrum.zip

Just train (500-100 epochs) and test.

B->
