I was one of the authors on the underlying C++ job distributors, but I'll freely admit I know nothing about the python layer. I can tell you this:

A) The underlying C++ needs to be recompiled with extras=mpi to get MPI-style parallel processing. I'm sure you don't have this if you got binaries, and I'm not sure how to generate it for the python bindings.

B) The job distributor you are likely to be using supports an option called -run::multiple_processes_writing_to_one_directory. In this mode, non-communicating Rosetta processes will use the filesystem to signal by making temporary files (myjob_0001.in_progress, etc.). This method has NO GUARANTEES against overwriting, duplication of effort, or screwing up the scorefile with simultaneous writes. If the number of processors is small and the jobs are long, then this method allows use of multiple processors on one big job in one directory with minimal overwriting.

C) You can always run 8 jobs in 8 different directories. It should be sufficient for your use. Just make sure you start them with independent random number seeds, and you'll get trajectories as effectively as one job eight times as long. This is functionally equivalent to (and the same speed as) MPI parallelization.

If you have more questions about the underlying C++, you may want to repost on the Rosetta 3.0 board; I don't check this one often.

Running multiple processes is not the same as distributing across multiple cores, but what the previous posts are saying is: for those people with multicore machines, simply run your PyRosetta script multiple times. This can be done quite easily with the background command (python myPyRosettaScript.py &) in a Linux terminal, or by just running the other job from another terminal window. This will run many serial jobs.

Furthermore, I'm not sure why people want to parallelize the JobDistributor. Docking simulations, or other methods that need to generate hundreds or thousands of configurations, are already "embarrassingly parallel": running one job on many cores is no different from running many jobs on many single cores. If I need to generate 10,000 configurations, why don't I just run 4 JobDistributors churning out 2,500 structures each? (Actually, Amdahl's law says it would be faster to run many serial jobs.) Please correct me if I'm wrong, but I believe PyRosetta is inherently serial, because any secondary access to the Rosetta shared library through the PyRosetta interface will be blocked by the first Python thread that calls into the library.

As a previous poster says, if Rosetta is compiled with MPI enabled (I have no experience with this), then yes, scoring function computation and other functions that are parallelized might benefit. "Furthermore, I'm not sure why people want to parallelize the JobDistributor." I can't address this at the python level. At the C++ level, it's all about jumping through hoops: the supercomputers and clusters that most of the academic developers have access to REQUIRE parallel distribution of jobs. Thus, a parallelizing job distributor lets Rosetta run on those systems.
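The marker-file scheme described for -run::multiple_processes_writing_to_one_directory can be sketched in a few lines of Python. This is not Rosetta's actual code, just an illustration of the idea: each process tries to create jobname.in_progress atomically and backs off if another process got there first. The same caveat as above applies: this signals intent between processes but guarantees nothing about simultaneous scorefile writes.

```python
import os
import tempfile

def try_claim(job_name, workdir):
    """Attempt to claim a job by atomically creating job_name.in_progress.

    O_CREAT | O_EXCL makes the open fail if the marker already exists,
    so exactly one process wins the race for each job.
    """
    marker = os.path.join(workdir, job_name + ".in_progress")
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

workdir = tempfile.mkdtemp()
print(try_claim("myjob_0001", workdir))  # True: first process claims the job
print(try_claim("myjob_0001", workdir))  # False: a second process backs off
```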
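Option C (8 jobs in 8 directories with independent seeds) can be staged with a short helper. Everything here is a sketch: the myPyRosettaScript.py name and the trailing seed argument are assumptions, and a real script would have to read that argument and forward the seed to Rosetta itself (for example through the -run:constant_seed / -run:jran options).

```python
import os
import tempfile

def stage_jobs(base_dir, n_jobs, script="myPyRosettaScript.py", first_seed=1111111):
    """Create one working directory per job and build its background launch command.

    Distinct seeds keep the trajectories independent, which is what makes
    the 8 separate runs equivalent to one job eight times as long.
    """
    commands = []
    for i in range(n_jobs):
        job_dir = os.path.join(base_dir, f"run_{i:02d}")
        os.makedirs(job_dir, exist_ok=True)
        # the trailing '&' backgrounds each job, as in the post above
        commands.append(f"cd {job_dir} && python {script} {first_seed + i} &")
    return commands

base = tempfile.mkdtemp()
for cmd in stage_jobs(base, 8):
    print(cmd)
```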
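The "4 JobDistributors churning out 2,500 structures each" arithmetic generalizes to totals that don't divide evenly. A small helper (hypothetical, not part of PyRosetta) makes the split explicit:

```python
def split_structures(total, n_workers):
    """Divide a target structure count across independent serial workers.

    Each worker gets an (almost) equal share; the remainder is spread over
    the first few workers so the counts always sum exactly to `total`.
    """
    base, extra = divmod(total, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]

print(split_structures(10_000, 4))  # [2500, 2500, 2500, 2500]
print(split_structures(10, 3))      # [4, 3, 3]
```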
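The parenthetical about Amdahl deserves one line of arithmetic. Amdahl's law bounds the speedup from parallelizing a single job by that job's serial fraction, whereas N independent serial jobs scale throughput linearly. The 90%-parallelizable figure below is an illustrative assumption, not a measured property of any Rosetta protocol.

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup of one job on n cores, limited by its serial part."""
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / n)

# A single trajectory that is 90% parallelizable tops out well below 8x
# on 8 cores, while 8 independent serial jobs deliver the full 8x throughput.
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
```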