Scheduling All Kinds of Recurring Jobs with Python

Author : carsonlorene
Publish Date : 2020-12-16 15:52:56


Before exploring any external libraries, let's first check what we have in Python's standard library. Most of the time, Python's standard library contains a solution to whatever problem you might have, and if the problem is running deferred jobs, like with the Linux at command, then the sched module might be the way to go.

sched is a very simple module, which can be used to schedule one-off tasks for some specified time, so it's important to realise that it is not meant for recurring jobs (like cron jobs). It works on all platforms, which might seem obvious, but will not necessarily be the case with all the libraries shown later.

One use case for such a deferred task is a scheduled shutdown. Another one: if you are working with network connections or a firewall, you can create a one-time job that reverts your changes in case you mess up and lock yourself out of the system.

Enough talking, let’s see an example:
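A minimal sketch of such a program, using only the standard library. The job function, the event names and the short delays are illustrative assumptions. For the two events that share a time, it uses enterabs() rather than enter(), because two separate enter() calls compute their absolute times at slightly different moments and would therefore never tie exactly.

```python
import sched
import threading
import time

executed = []  # names of the events that actually ran, in order


def job(name):
    executed.append(name)
    print(f"{time.strftime('%X')} running {name}")


# time.monotonic is immune to wall-clock changes; time.sleep does the waiting
scheduler = sched.scheduler(time.monotonic, time.sleep)

# enter(delay, priority, action, argument) returns an Event object (kept in an *_id variable here)
event_1_id = scheduler.enter(0.3, 1, job, argument=("event_1",))
event_2_id = scheduler.enter(0.4, 1, job, argument=("event_2",))

# Two events at exactly the same moment: the lower priority number runs first.
# enterabs() takes an absolute time, so both events get an identical timestamp.
same_time = time.monotonic() + 0.1
scheduler.enterabs(same_time, 2, job, argument=("low_priority",))
scheduler.enterabs(same_time, 1, job, argument=("high_priority",))

scheduler.cancel(event_2_id)  # event_2 is removed from the queue and never runs

# Run the event loop in a worker thread so the main thread is not blocked
worker = threading.Thread(target=scheduler.run)
worker.start()
worker.join()  # wait until all remaining events have been executed
```

After join() returns, executed holds high_priority, low_priority and event_1, in that order, and event_2 never ran.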

The code above defines a scheduler, which is used to create events (.enter) to be executed at a later time. Each call to .enter receives 4 arguments: the delay in seconds (in how many seconds should the event happen?), the priority, the function to be called and an optional tuple of arguments for that function. The priority argument doesn't matter most of the time, but it becomes important when 2 events are scheduled for exactly the same time: sched always runs events one after another, and the event with the highest priority (lowest number) goes first.

In this code snippet we can also see that the .enter method returns an event object, which can later be used to cancel the event, as demonstrated with scheduler.cancel(event_2_id).

To avoid blocking the main thread of the program, we also used threading.Thread to run the scheduler and called .join() on it, which waits until the scheduler is done with all of its tasks.
Full Power of Crontab

There are quite a few libraries for running recurring jobs using Python, but let's start with the one that gives you the full cron "experience". This library is called python-crontab and can be installed with pip install python-crontab.

python-crontab, unlike the other libraries and modules listed here, creates and manages real crontabs on Unix systems and tasks on Windows. Therefore, it doesn't emulate the behavior of these operating system tools, but rather leverages them and uses what's already there.

For an example, let's look at a practical use case. A common reason for running recurring tasks is checking the status of a database server. This is generally done by connecting to and logging into the database and running a dummy query like SELECT 1, just like so:
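The article doesn't say which database is being checked, so as a stand-in this sketch uses SQLite from the standard library. The function name database_is_up and the read-only URI approach are assumptions; for a networked database server you would swap in its own driver and connection parameters, but the SELECT 1 idea stays the same.

```python
import sqlite3
from contextlib import closing


def database_is_up(path: str) -> bool:
    """Return True if the database at `path` answers a dummy SELECT 1 query."""
    try:
        # mode=ro makes connect() fail for a missing file instead of creating an empty one
        uri = f"file:{path}?mode=ro"
        # closing() guarantees the connection is closed even if the query fails
        with closing(sqlite3.connect(uri, uri=True, timeout=5)) as conn:
            return conn.execute("SELECT 1").fetchone() == (1,)
    except sqlite3.Error:
        return False
```

A small script wrapping this function (exiting with a non-zero status when it returns False) is the kind of command a cron job would then run.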

As I previously mentioned, python-crontab provides the real cron "experience", which includes the generally disliked cron syntax. To set the schedule, we use the .setall method, which sets all five fields at once. Before setting the schedule, however, we need to create the crontab using CronTab() and specify the owning user. If True is passed in, the user executing the program is used. We also have to create an individual job (.new()) in this crontab, passing in the command to be executed and, optionally, a comment.

When we have the crontab and its job ready, we need to write it, but it's a good idea to check its syntax using .is_valid() before we do so.

Another basic database admin task is creating periodic backups, which can also be done easily with python-crontab, this time with slightly different syntax:

If you're not super comfortable with cron syntax, this library also provides a declarative syntax, which is shown in the example above. This syntax is, in my opinion, very confusing and even harder to read and use than normal cron syntax, so I'd recommend sticking with cron syntax or choosing a different library (see next section).

Apart from the different syntax, we can also see the crontab being used as a Python context manager, which writes the crontab automatically when the with block exits and so allows us to omit the .write method shown previously. One more thing to keep in mind is that if you decide to run cron jobs as the root user (not recommended), as shown above, you will have to run the program with sudo.

This library also has other useful features apart from basic creation and management of crontabs. One of them is listing and inspecting both user and system crontabs, as well as looking up jobs based on criteria like their command or comment:

As I mentioned in previous section, not all libraries shown here work exactly the same way on all platforms. python-crontab works on Linux and Windows, but on Windows only user crontabs (Windows tasks) are supported.
If You Really Hate Cron Syntax

We've seen how to schedule a job with the declarative syntax of python-crontab in the previous section, but it wasn't really readable or user friendly. If you're looking for the most user-friendly and most popular library with a very simple interface, then schedule is the library for you.

schedule is inspired by the article "Rethinking Cron", which describes some of cron's problems and weaknesses, and the library does a good job of solving them.

The biggest complaint about cron is definitely its terse, hard-to-write syntax, so let's see how schedule addresses that:

The first 5 scheduled jobs above don't really need much of an explanation. The code is very human-readable and quite self-explanatory. The interface only contains a few properties for days (.monday) and time units (.seconds, .hours, ...), which makes it very easy to use.

Apart from simple scheduling, the interface also contains a .tag() method for grouping jobs by tag. This is useful, for example, for cancelling whole groups of jobs at once (with .clear()).

One downside of such a simple interface is the lack of explicit month or range scheduling: scheduling jobs only between 10:00 and 14:00, or only from January to March, isn't really possible.

Aside from recurring jobs, you can also use schedule to run one-off tasks and achieve the same effect as with sched, but with nicer syntax:

Apart from the deferred job, this code snippet also shows that we need to keep the program running for the jobs to run. That's because this library doesn't create actual cron or at jobs; jobs only fire while your program keeps calling run_pending(). If you don't want to block the main thread of your program like in the example above, you can also run the scheduler in a background thread, as described in the schedule documentation.
All The Features You Might Ever Need

All the previously mentioned tools have their pros and cons, and some specific features and design that make them good for specific use cases. If you, however, need to run both deferred and periodic jobs, need to store jobs in a database, need built-in logging features, etc., then most likely none of the tools mentioned above are going to cut it.

The most feature rich and powerful library for scheduling jobs of any kind in Python is definitely APScheduler, which stands for Advanced Python Scheduler.

It ticks all the boxes when it comes to the features mentioned above, and these kinds of features require extensive configuration, so let's see how APScheduler handles it:

This code snippet shows a sample configuration, which can be used to set up SQLite and MongoDB job stores, which house the scheduled jobs. It also shows the configuration of executors, which handle the running of jobs: here we specify the size of our pools. We also specify some job defaults, such as the maximum number of instances of the same job that can run concurrently. All the configs are passed to the scheduler, which is used to manage jobs.

Next comes the creation of our jobs using the .add_job() method. It takes quite a few arguments, the first being the function to run. Next is the trigger type, which can be interval, cron or date. The interval trigger runs jobs periodically at the given interval. The cron trigger is good old cron-like scheduling, which accepts either keyword arguments (such as hour or day_of_week) or a classic crontab expression. Finally, the date trigger creates one-time jobs at a specific date and time.

One more important argument to .add_job() is misfire_grace_time, which provides an anacron-like feature: if your job doesn't run because the scheduler is down, the scheduler will still run it once it's back up (provided the job is kept in a persistent job store), as long as no more than misfire_grace_time seconds have passed since the scheduled run time.

Scheduled jobs are generally annoying to debug. APScheduler tries to alleviate that with the ability to easily configure logging levels, as well as the ability to add listeners to scheduler events, e.g. when a job is executed or when a job fails. You can see such a listener and sample log output below:

For The Gevent Users

The last, and maybe actually the least desirable, solution is to use Gevent. Not because Gevent is bad, but because it isn't really built for running scheduled tasks. If you are, however, already using Gevent in your application, it might make sense to use it to schedule jobs too.

If you aren't familiar with Gevent, it is a concurrency library based on coroutines. It uses greenlets to provide pseudo-concurrency for running multiple tasks in a single OS thread. For a better understanding, let's see a basic example:
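The article's version fetches real URLs; to keep this sketch runnable offline, it simulates each request with gevent.sleep, which yields to other greenlets just like a cooperative network call would. The URLs and the 0.5 s delay are placeholders.

```python
import time

import gevent  # pip install gevent


def fetch(url, delay):
    """Pretend to fetch `url`; gevent.sleep lets other greenlets run meanwhile."""
    started = time.monotonic()
    print(f"started {url}")
    gevent.sleep(delay)  # stands in for network I/O
    print(f"finished {url}")
    return url, started


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

begin = time.monotonic()
greenlets = [gevent.spawn(fetch, url, 0.5) for url in urls]
gevent.joinall(greenlets)  # wait for all three greenlets to finish
elapsed = time.monotonic() - begin

results = [g.value for g in greenlets]
print(f"3 simulated requests took {elapsed:.2f} s")
```

Because the three waits overlap, the total is close to a single 0.5 s delay rather than 1.5 s.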

This example shows how we can query multiple URLs concurrently using Gevent and its gevent.spawn. In the output above, you can see that all 3 jobs that were created started at (roughly) the same time and returned their data later.

To perform the same task, but scheduled in the future, we can do the following:
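A sketch of both variants. gevent.spawn_later is gevent's own call for running a function after a delay. run_regularly is a hypothetical helper whose body is an assumption: it subtracts the previous run's duration from the next sleep, which reduces but does not eliminate drift. The times parameter exists only so the example terminates.

```python
import time

import gevent  # pip install gevent

events = []


def task(name):
    events.append((name, time.monotonic()))


def run_regularly(function, interval, *args, times=None, **kwargs):
    """Call function every `interval` seconds (at most `times` times if given)."""
    runtime = 0.0
    count = 0
    while times is None or count < times:
        # Subtract the previous run's duration so runs start roughly `interval` apart
        gevent.sleep(max(0.0, interval - runtime))
        started = time.monotonic()
        function(*args, **kwargs)
        runtime = time.monotonic() - started
        count += 1


one_off = gevent.spawn_later(0.3, task, "one-off")  # runs once, 0.3 s from now
periodic = gevent.spawn(run_regularly, task, 0.1, "periodic", times=5)

gevent.joinall([one_off, periodic])
```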

Above we can see examples of both one-off and periodic jobs. Both of these solutions are kind of a hack/trick and should only be considered if you're already using Gevent in your application. It's also important to mention that the above run_regularly function will slowly start to drift, even if we account for the runtime of tasks. Therefore, you should preferably use the GeventScheduler available in the APScheduler library instead, as it's a more robust solution.
Conclusion

Running deferred or periodic jobs is a very general task, and you won't find one library or tool that does it all perfectly, but I hope this article gave you a decent overview of what is available. Whichever tool you choose for your particular use case, it's important to keep the more general best practices in mind, for example: add comments to your cron jobs for clarity, avoid using the root user (principle of least privilege), don't put passwords into your crontabs, etc. Also, if you're going to use actual cron jobs, you might want to leverage /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly to keep things organized and tidy. Last but not least, looking into anacrontab might also be useful if you need to ensure that your jobs will run even when the machine goes offline for a bit.


