2024. 11. 11. 12:06ㆍMemorizing/Jetson
A record of the steps for getting LibTorch working on the Jetson Nano.
The key point is that PyTorch does distribute prebuilt LibTorch, but only for the x86-64 architecture, so on arm64 devices such as the Raspberry Pi or Jetson it has to be built from source.
For the detailed steps I followed the official PyTorch page.
git clone -b v2.0.0 --recurse-submodules https://github.com/pytorch/pytorch.git
mkdir pytorch-build
cd pytorch-build
cmake -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_BUILD_TYPE:STRING=Release -DPYTHON_EXECUTABLE:PATH=`which python3` -DCMAKE_INSTALL_PREFIX:PATH=../pytorch-install ../pytorch
cmake --build . --parallel 4 --target install
However, partway through this build an error came up saying the PyTorch and CMake versions did not match.
The PyTorch version has to match the CUDA version that ships with your JetPack release. In my case JetPack 4.6 comes with CUDA 10.2, so I went with PyTorch 1.8.
For CUDA 10.2 on the Jetson Nano, CMake 3.18 or newer is required, so I had to install a newer CMake and build again.
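The version requirement can be checked up front before kicking off a long build. A small sketch using `sort -V` for the comparison (the `version_ge` helper is my own, not part of any build tool):

```shell
# version_ge A B: succeeds if version string A >= B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

current="$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')"
if version_ge "${current:-0}" "3.18.0"; then
  echo "CMake $current is new enough"
else
  echo "CMake ${current:-missing} is too old; build 3.18+ from source"
fi
```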
cd ~
wget https://github.com/Kitware/CMake/releases/download/v3.18.0/cmake-3.18.0.tar.gz
tar -zxvf cmake-3.18.0.tar.gz
cd cmake-3.18.0/
./bootstrap -- -DCMAKE_USE_OPENSSL=OFF
make
sudo make install
In my case this step failed because OpenSSL was not installed, so I added the -DCMAKE_USE_OPENSSL=OFF option to get around it.
With the new CMake in place, the following error then kept occurring during the PyTorch build:
[ 83%] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o
/usr/include/c++/7/cmath: In static member function ‘static scalar_t at::native::div_floor_kernel_cuda(at::TensorIterator&)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t)>::_FUN(scalar_t, scalar_t)’:
/usr/include/c++/7/cmath:1302:38: internal compiler error: Segmentation fault
{ return __builtin_copysignf(__x, __y); }
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
CMake Error at torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake:281 (message):
Error generating file
/home/mingi/pytorch-build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_BinaryMulDivKernel.cu.o
caffe2/CMakeFiles/torch_cuda.dir/build.make:79721: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o] Error 1
CMakeFiles/Makefile2:3683: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/torch_cuda.dir/all] Error 2
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2
So I added swap memory with a swapfile, but the build still kept failing:
sudo swapoff /swapfile
sudo rm /swapfile
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
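The internal compiler error above is a classic symptom of the OOM killer on the Nano's limited RAM. Before retrying, it is easy to confirm how much memory and swap the board actually has by reading /proc/meminfo directly, so no extra tools are assumed:

```shell
# Print total RAM and swap in GiB; after swapon, SwapTotal should reflect
# the new 8 GiB swapfile. Values in /proc/meminfo are given in KiB.
awk '/^(MemTotal|SwapTotal):/ { printf "%s %.1f GiB\n", $1, $2 / 1048576 }' /proc/meminfo
```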
Once I realized the failure above was an out-of-memory problem, I deleted everything from the previous attempt and rebuilt with --parallel set to 1.
The final LibTorch install commands that came out of all of the above:
git clone -b v1.8.0 --recurse-submodules https://github.com/pytorch/pytorch.git
mkdir pytorch-build
cd pytorch-build
cmake -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_BUILD_TYPE:STRING=Release -DPYTHON_EXECUTABLE:PATH=`which python3` -DCMAKE_INSTALL_PREFIX:PATH=../pytorch-install -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc ../pytorch
cmake --build . --parallel 1 --target install
Afterwards, build your own project against it as follows.
mkdir build
cd build
cmake ..
make
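The `cmake ..` step above assumes your project has a CMakeLists.txt, which the post does not show. Here is a minimal sketch following the usual LibTorch pattern; the project name and source file are placeholders:

```cmake
cmake_minimum_required(VERSION 3.18)
project(example-app)   # placeholder project name

# Locates TorchConfig.cmake from the install tree built above.
find_package(Torch REQUIRED)

add_executable(example-app main.cpp)   # placeholder source file
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
```

For `find_package(Torch)` to succeed, point CMake at the install prefix produced earlier, e.g. `cmake -DCMAKE_PREFIX_PATH=$HOME/pytorch-install ..` (adjust the path to wherever your pytorch-install directory ended up).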
When I actually used the LibTorch built this way, I discovered it could not detect CUDA, so the CMake build has to be set up as below so that CUDA gets picked up.
# Setting the environment variables like this seemed to be what made the build detect CUDA
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export MAX_JOBS=1
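Once the variables are exported, a quick check that `nvcc` really resolves can save a failed configure run. The `on_path` helper is just my wrapper around `command -v`:

```shell
# Verify the CUDA compiler is reachable through PATH after the exports.
on_path() { command -v "$1" >/dev/null 2>&1; }

if on_path nvcc; then
  nvcc --version | tail -n1
else
  echo "nvcc not found on PATH; check the exports above"
fi
```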
git clone -b v1.8.0 --recurse-submodules https://github.com/pytorch/pytorch.git
mkdir pytorch-build
cd pytorch-build
cmake -DBUILD_SHARED_LIBS=ON \
-DCMAKE_BUILD_TYPE=Release \
-DPYTHON_EXECUTABLE=$(which python3) \
-DCMAKE_INSTALL_PREFIX=../pytorch-install \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
-DCUDA_ARCH_LIST="5.3" \
-DUSE_CUDA=ON \
-DUSE_CUDNN=ON \
-DUSE_OPENMP=ON \
-DUSE_NNPACK=ON \
-DBUILD_TEST=OFF \
../pytorch
cmake --build . --parallel 1 --target install
Additionally, when you run the first cmake command, the summary it prints should report USE_CUDA : ON, like this:
-- ******** Summary ********
-- General:
-- CMake version : 3.18.0
-- CMake command : /usr/local/bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler id : GNU
-- C++ compiler version : 7.5.0
-- CXX flags : -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow
-- Build type : Release
-- Compile definitions : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-- CMAKE_PREFIX_PATH : /usr/local/cuda
-- CMAKE_INSTALL_PREFIX : /home/mingi/pytorch-install
--
-- TORCH_VERSION : 1.8.0
-- CAFFE2_VERSION : 1.8.0
-- BUILD_CAFFE2 : ON
-- BUILD_CAFFE2_OPS : ON
-- BUILD_CAFFE2_MOBILE : OFF
-- BUILD_STATIC_RUNTIME_BENCHMARK: OFF
-- BUILD_TENSOREXPR_BENCHMARK: OFF
-- BUILD_BINARY : OFF
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 3.6.9
-- Python executable : /usr/bin/python3
-- Pythonlibs version : 3.6.9
-- Python library : /usr/lib/python3.6
-- Python includes : /usr/include/python3.6m
-- Python site-packages: lib/python3.6/site-packages
-- BUILD_SHARED_LIBS : ON
-- CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
-- BUILD_TEST : OFF
-- BUILD_JNI : OFF
-- BUILD_MOBILE_AUTOGRAD : OFF
-- INTERN_BUILD_MOBILE :
-- USE_BLAS : 1
-- BLAS : open
-- USE_LAPACK : 1
-- LAPACK : open
-- USE_ASAN : OFF
-- USE_CPP_CODE_COVERAGE : OFF
-- USE_CUDA : ON
-- Split CUDA : OFF
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 10.2
-- cuDNN version : 8.2.1
-- CUDA root directory : /usr/local/cuda
-- CUDA library : /usr/local/cuda/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda/lib64/libcudart.so
-- cublas library : /usr/local/cuda/lib64/libcublas.so
-- cufft library : /usr/local/cuda/lib64/libcufft.so
-- curand library : /usr/local/cuda/lib64/libcurand.so
-- cuDNN library : /usr/lib/aarch64-linux-gnu/libcudnn.so
-- nvrtc : /usr/local/cuda/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda/include
-- NVCC executable : /usr/local/cuda/bin/nvcc
-- NVCC flags : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_53,code=sm_53;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
-- CUDA host compiler : /usr/bin/cc
-- NVCC --device-c : OFF
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FBGEMM : OFF
-- USE_FAKELOWP : OFF
-- USE_KINETO : OFF
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_PYTORCH_METAL : OFF
-- USE_FFTW : OFF
-- USE_MKL : OFF
-- USE_MKLDNN : OFF
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NNPACK : ON
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : ON
-- USE_TBB : OFF
-- USE_VULKAN : OFF
-- USE_PROF : OFF
-- USE_QNNPACK : ON
-- USE_PYTORCH_QNNPACK : ON
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : ON
-- USE_MPI : ON
-- USE_GLOO : ON
-- USE_TENSORPIPE : ON
-- USE_DEPLOY : OFF
-- Public Dependencies : Threads::Threads
-- Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
-- Generating done
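Scanning that long summary by eye is error-prone. If you capture the configure output, for example with `| tee configure.log`, a grep can gate the multi-hour compile; the log filename and helper here are my own choices:

```shell
# Fail fast if the configure summary did not report CUDA support.
check_cuda_on() {
  grep -Eq 'USE_CUDA[[:space:]]*:[[:space:]]*ON' "$1"
}

# Demonstration against a fabricated one-line log:
printf -- '--   USE_CUDA              : ON\n' > /tmp/configure.log
if check_cuda_on /tmp/configure.log; then
  echo "CUDA enabled, safe to build"
fi
```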