How to fix symbol lookup error: undefined symbol errors in a cluster environment

After two dozens of comments to understand the situation, it was found that the libhdf5.so.7 was actually a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive processes. This means even though the symlink itself lies on a shared filesystem, the contents of the file do not and as a result the process was seeing different versions of the library.

For future reference: other than checking LD_LIBRARY_PATH, it’s always a good idea to check a library with nm -D to see if the symbols actually exist. In this case it was found that they do exist in interactive mode but not when run in the queue. A quick md5sum revealed that the files were actually different.

Leave a Comment