You should treat tasks in Airflow as equivalent to transactions in a database. This implies that you should never produce incomplete results from your tasks. An example is not to produce incomplete data in HDFS or S3 at the end of a task.

Airflow can retry a task if it fails, so a task should produce the same outcome on every re-run. Some of the ways you can avoid producing a different result:

Do not use INSERT during a task re-run; an INSERT statement might lead to duplicate rows in your database. Replace it with an UPSERT.

Do not read whatever data happens to be the latest inside a task: someone may update the input data between re-runs, which results in different outputs. A better way is to read the input data from a specific partition. You can use data_interval_start as a partition. You should follow this partitioning method while writing data to S3/HDFS as well.

The Python datetime now() function gives the current datetime object. This function should never be used inside a task, especially for critical computation, as it leads to different outcomes on each run. It is fine to use it, for example, to generate a temporary log.

Airflow executes the tasks of a DAG on different servers if you are using the Kubernetes executor or the Celery executor. Consequently, a task should not rely on files or state kept on the local filesystem, since a retry or a downstream task may run on a different machine.
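The INSERT-on-retry pitfall can be sketched with an in-memory SQLite table. The table name, column names, and values below are hypothetical; the point is that an upsert keyed on the partition's primary key stays idempotent when Airflow retries the task, where a plain INSERT would duplicate or fail.

```python
import sqlite3

# Hypothetical load step: running it twice (a simulated retry) must not
# duplicate rows, so we use an upsert (INSERT OR REPLACE) instead of INSERT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER)")

def load_partition(day: str, total: int) -> None:
    # A plain INSERT would add a second row (or raise on the PRIMARY KEY)
    # when the task is retried; the upsert simply overwrites the same row.
    conn.execute(
        "INSERT OR REPLACE INTO daily_totals (day, total) VALUES (?, ?)",
        (day, total),
    )
    conn.commit()

load_partition("2024-01-01", 42)
load_partition("2024-01-01", 42)  # simulated Airflow retry of the same task

rows = conn.execute("SELECT * FROM daily_totals").fetchall()
print(rows)  # one row, not two
```

`INSERT OR REPLACE` is SQLite's upsert spelling; in PostgreSQL or MySQL the equivalent would be `INSERT ... ON CONFLICT DO UPDATE` or `INSERT ... ON DUPLICATE KEY UPDATE`.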
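The partitioning advice can be sketched as a small helper that turns data_interval_start into a deterministic storage path. The bucket name and year/month/day layout are made up for illustration; in a real DAG you would take the timestamp from the task context (or the `{{ data_interval_start }}` template) rather than hard-coding it.

```python
from datetime import datetime, timezone

# Hypothetical helper: derive a fixed partition path from the run's
# data_interval_start instead of reading "latest" data. Re-runs of the same
# interval always read and write the same location.
def partition_path(data_interval_start: datetime) -> str:
    return (
        "s3://my-bucket/events/"
        f"year={data_interval_start:%Y}/"
        f"month={data_interval_start:%m}/"
        f"day={data_interval_start:%d}/"
    )

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(partition_path(start))
# s3://my-bucket/events/year=2024/month=01/day=01/
```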
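The now() problem can be shown in miniature: a cutoff computed with datetime.now() differs on every retry, while one derived from the run's logical timestamp is identical across re-runs. The function names here are illustrative, not part of Airflow's API.

```python
from datetime import datetime, timezone

def cutoff_nondeterministic() -> datetime:
    # Different value on every call, so every retry computes a different result.
    return datetime.now(timezone.utc)

def cutoff_deterministic(logical_date: datetime) -> datetime:
    # Same value no matter how many times the task is retried.
    return logical_date

run_ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(cutoff_deterministic(run_ts) == cutoff_deterministic(run_ts))  # True
```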