Batch Processing
This chapter describes Jakarta Batch, which provides support for defining, implementing, and running batch jobs. Batch jobs are tasks that can be executed without user interaction. The batch framework is composed of a job specification language based on XML, a Java API, and a batch runtime.
Introduction to Batch Processing
Some enterprise applications contain tasks that can be executed without user interaction. These tasks are executed periodically or when resource usage is low, and they often process large amounts of information such as log files, database records, or images. Examples include billing, report generation, data format conversion, and image processing. These tasks are called batch jobs.
Batch processing refers to running batch jobs on a computer system. Jakarta EE includes a batch processing framework that provides the batch execution infrastructure common to all batch applications, enabling developers to concentrate on the business logic of their batch applications. The batch framework consists of a job specification language based on XML, a set of batch annotations and interfaces for application classes that implement the business logic, a batch container that manages the execution of batch jobs, and supporting classes and interfaces to interact with the batch container.
A batch job can be completed without user intervention. For example, consider a telephone billing application that reads phone call records from the enterprise information systems and generates a monthly bill for each account. Since this application does not require any user interaction, it can run as a batch job.
The phone billing application consists of two phases: The first phase associates each call from the registry with a monthly bill, and the second phase calculates the tax and total amount due for each bill. Each of these phases is a step of the batch job.
Batch applications specify a set of steps and their execution order. Different batch frameworks may specify additional elements, like decision elements or groups of steps that run in parallel. The following sections describe steps in more detail and provide information about other common characteristics of batch frameworks.
Steps in Batch Jobs
A step is an independent and sequential phase of a batch job. Batch jobs contain chunk-oriented steps and task-oriented steps.
-
Chunk-oriented steps (chunk steps) process data by reading items from a data source, applying some business logic to each item, and storing the results. Chunk steps read and process one item at a time and group the results into a chunk. The results are stored when the chunk reaches a configurable size. Chunk-oriented processing makes storing results more efficient and facilitates transaction demarcation.
Chunk steps have three parts.
-
The input retrieval part reads one item at a time from a data source, such as entries on a database, files in a directory, or entries in a log file.
-
The business processing part manipulates one item at a time using the business logic defined by the application. Examples include filtering, formatting, and accessing data from the item for computing a result.
-
The output writing part stores a chunk of processed items at a time.
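-
Task-oriented steps (task steps) execute tasks that do not fit the chunk processing model, such as removing files or directories.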
Chunk steps are often long-running because they process large amounts of data. Batch frameworks enable chunk steps to bookmark their progress using checkpoints. A chunk step that is interrupted can be restarted from the last checkpoint. The input retrieval and output writing parts of a chunk step save their current position after the processing of each chunk, and can recover it when the step is restarted.
Figure 1, “Chunk Steps in a Batch Job” shows the three parts of two steps in a batch job.
For example, the phone billing application consists of two chunk steps.
-
In the first step, the input retrieval part reads call records from the registry; the business processing part associates each call with a bill and creates a bill if one does not exist for an account; and the output writing part stores each bill in a database.
-
In the second step, the input retrieval part reads bills from the database; the business processing part calculates the tax and total amount due for each bill; and the output writing part updates the database records and generates printable versions of each bill.
This application could also contain a task step that cleaned up the files from the bills generated for the previous month.
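The read-process-write cycle and the checkpointing behavior described in this section can be sketched in plain Java. This is only an illustration of the programming model under simplifying assumptions, not how a batch runtime is implemented: the class ChunkLoopSketch, its run method, and the in-memory lists that stand in for the data source and the data store are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of chunk-oriented processing; not a batch runtime.
class ChunkLoopSketch {

    // Reads items from 'source' starting at 'startPosition' (the last checkpoint),
    // processes them one at a time, and writes them to 'sink' in chunks of
    // 'chunkSize' items. Returns the new checkpoint position.
    static int run(List<String> source, List<String> sink,
                   Function<String, String> processor,
                   int chunkSize, int startPosition) {
        List<String> chunk = new ArrayList<>();
        int checkpoint = startPosition;
        int position = startPosition;
        while (position < source.size()) {
            String item = source.get(position++);   // input retrieval
            chunk.add(processor.apply(item));       // business processing
            if (chunk.size() == chunkSize) {
                sink.addAll(chunk);                 // output writing
                chunk.clear();
                checkpoint = position;              // bookmark the progress
            }
        }
        if (!chunk.isEmpty()) {                     // final, partial chunk
            sink.addAll(chunk);
            checkpoint = position;
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        int checkpoint = run(List.of("a", "b", "c", "d", "e"), sink,
                             String::toUpperCase, 2, 0);
        System.out.println(sink + " checkpoint=" + checkpoint);
    }
}
```

Passing a previously returned checkpoint as startPosition mimics a restart: items before that position are not read again.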
Parallel Processing
Batch jobs often process large amounts of data or perform computationally expensive operations. Batch applications can benefit from parallel processing in two scenarios.
-
Steps that do not depend on each other can run on different threads.
-
Chunk-oriented steps where the processing of each item does not depend on the results of processing previous items can run on more than one thread.
Batch frameworks provide mechanisms for developers to define groups of independent steps and to split chunk-oriented steps into parts that can run in parallel.
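As a rough sketch of the second scenario, the following plain Java code divides a list of independent items into contiguous ranges and processes each range on its own thread. It does not use the batch framework; the class ParallelPartsSketch and its methods are hypothetical, and summing string lengths merely stands in for real item processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of processing independent items in parallel.
class ParallelPartsSketch {

    // Processes the items in [from, to); each item is independent of the others.
    static long processRange(List<String> items, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += items.get(i).length();
        }
        return total;
    }

    // Splits the items into 'parts' ranges and processes each range on its own thread.
    static long processInParallel(List<String> items, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int size = items.size();
            for (int p = 0; p < parts; p++) {
                final int from = p * size / parts;
                final int to = (p + 1) * size / parts;
                Callable<Long> task = () -> processRange(items, from, to);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();              // wait for every part
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parallel processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The combined result is only available once every part has finished, which is the same constraint that applies to steps running in parallel.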
Status and Decision Elements
Batch frameworks keep track of a status for every step in a job. The status indicates if a step is running or if it has completed. If the step has completed, the status indicates one of the following.
-
The execution of the step was successful.
-
The step was interrupted.
-
An error occurred in the execution of the step.
In addition to steps, batch jobs can also contain decision elements. Decision elements use the exit status of the previous step to determine the next step or to terminate the batch job. Decision elements set the status of the batch job when terminating it. Like a step, a batch job can terminate successfully, be interrupted, or fail.
Figure 2, “Steps and Decision Elements in a Job” shows an example of a job that contains chunk steps, task steps and a decision element.
Batch Framework Functionality
Batch applications have the following common requirements.
-
Define jobs, steps, decision elements, and the relationships between them.
-
Execute some groups of steps or parts of a step in parallel.
-
Maintain state information for jobs and steps.
-
Launch jobs and resume interrupted jobs.
-
Handle errors.
Batch frameworks provide the batch execution infrastructure that addresses the common requirements of all batch applications, enabling developers to concentrate on the business logic of their applications. Batch frameworks consist of a format to specify jobs and steps, an application programming interface (API), and a service available at runtime that manages the execution of batch jobs.
Batch Processing in Jakarta EE
This section lists the components of the batch processing framework in Jakarta EE and provides an overview of the steps you have to follow to create a batch application.
The Batch Processing Framework
Jakarta EE includes a batch processing framework that consists of the following elements:
-
A batch runtime that manages the execution of jobs
-
A job specification language based on XML
-
A Java API to interact with the batch runtime
-
A Java API to implement steps, decision elements, and other batch artifacts
Batch applications in Jakarta EE contain XML files and Java classes. The XML files define the structure of a job in terms of batch artifacts and the relationships between them. (A batch artifact is a part of a chunk-oriented step, a task-oriented step, a decision element, or another component of a batch application.) The Java classes implement the application logic of the batch artifacts defined in the XML files. The batch runtime parses the XML files and loads the batch artifacts as Java classes to run the jobs in a batch application.
Creating Batch Applications
The process for creating a batch application in Jakarta EE is the following.
-
Design the batch application.
-
Identify the input sources, the format of the input data, the desired final result, and the required processing phases.
-
Organize the application as a job with chunk-oriented steps, task-oriented steps, and decision elements. Determine the dependencies between them.
-
Determine the order of execution in terms of transitions between steps.
-
Identify steps that can run in parallel and steps that can run in more than one thread.
-
Create the batch artifacts as Java classes by implementing the interfaces specified by the framework for steps, decision elements, and so on. These Java classes contain the code to read data from input sources, format items, process items, and store results. Batch artifacts can access context objects from the batch runtime using dependency injection.
-
Define jobs, steps, and their execution flow in XML files using the Job Specification Language. The elements in the XML files reference batch artifacts implemented as Java classes. The batch artifacts can access properties declared in the XML files, such as names of files and databases.
-
Use the Java API provided by the batch runtime to launch the batch application.
The following sections describe in detail how to use the components of the batch processing framework in Jakarta EE to create batch applications.
Elements of a Batch Job
A batch job can contain one or more of the following elements:
-
Steps
-
Flows
-
Splits
-
Decision elements
Steps are described in Introduction to Batch Processing, and can be chunk-oriented or task-oriented. Chunk-oriented steps can be partitioned steps. In a partitioned chunk step, the processing of one item does not depend on other items, so these steps can run in more than one thread.
A flow is a sequence of steps that execute as a unit. A sequence of related steps can be grouped together into a flow. The steps in a flow cannot transition to steps outside the flow. The flow transitions to the next element when its last step completes.
A split is a set of flows that execute in parallel; each flow runs on a separate thread. The split transitions to the next element when all its flows complete.
Decision elements use the exit status of the previous step to determine the next step or to terminate the batch job.
Properties and Parameters
Jobs and steps can have a number of properties associated with them. You define properties in the job definition file, and batch artifacts access these properties using context objects from the batch runtime. Using properties in this manner enables you to decouple static parameters of the job from the business logic and to reuse batch artifacts in different job definition files.
Specifying properties is described in Using the Job Specification Language, and accessing properties in batch artifacts is described in Creating Batch Artifacts.
Jakarta EE applications can also pass parameters to a job when they submit it to the batch runtime. This enables you to specify dynamic parameters that are only known at runtime. Parameters are also necessary for partitioned steps, since each partition needs to know, for example, what range of items to process.
Specifying parameters when submitting jobs is described in Submitting Jobs to the Batch Runtime. Specifying parameters for partitioned steps and accessing them in batch artifacts is demonstrated in The phonebilling Example Application.
Job Instances and Job Executions
A job definition can have multiple instances, each with different parameters. A job execution is an attempt to run a job instance. The batch runtime maintains information about job instances and job executions, as described in Checking the Status of a Job.
Batch and Exit Status
The state of jobs, steps, splits, and flows is represented in the batch runtime as a batch status value. Batch status values are listed in Batch Status Values. They are represented as strings.
Value | Description |
---|---|
STARTING | The job has been submitted to the batch runtime. |
STARTED | The job is running. |
STOPPING | The job has been requested to stop. |
STOPPED | The job has stopped. |
FAILED | The job finished executing because of an error. |
COMPLETED | The job finished executing successfully. |
ABANDONED | The job was marked abandoned. |
Jakarta EE applications can submit jobs and access the batch status of a job using the JobOperator interface, as described in Submitting Jobs to the Batch Runtime.
Job definition files can refer to batch status values using the Job Specification Language (JSL), as described in Using the Job Specification Language.
Batch artifacts can access batch status values using context objects, as described in Using the Context Objects from the Batch Runtime.
For flows, the batch status is that of its last step. For splits, the batch status is the following:
-
COMPLETED: If all its flows have a batch status of COMPLETED
-
FAILED: If any flow has a batch status of FAILED
-
STOPPED: If any flow has a batch status of STOPPED, and no flows have a batch status of FAILED
The batch status for jobs, steps, splits, and flows is set by the batch runtime. Jobs, steps, splits, and flows also have an exit status, which is a user-defined value based on the batch status. You can set the exit status inside batch artifacts or in the job definition file. You can access the exit status in the same manner as the batch status, described above. The default value for the exit status is the same as the batch status.
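The rules for the batch status of a split can be expressed as a small function. The following sketch is an illustration only: the class SplitStatusSketch and its Status enum are hypothetical, are not part of the batch API, and model only the three values that the rules mention.

```java
import java.util.List;

// Hypothetical illustration of how the batch status of a split is derived.
class SplitStatusSketch {

    // Only the three values that appear in the split rules are modeled here.
    enum Status { COMPLETED, FAILED, STOPPED }

    static Status splitStatus(List<Status> flowStatuses) {
        if (flowStatuses.contains(Status.FAILED)) {
            return Status.FAILED;       // any failed flow fails the split
        }
        if (flowStatuses.contains(Status.STOPPED)) {
            return Status.STOPPED;      // stopped flows, but none failed
        }
        return Status.COMPLETED;        // all flows completed
    }
}
```

A failed flow takes precedence over a stopped one, which is why the FAILED check comes first.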
Simple Use Case
This section demonstrates how to define a simple job using the Job Specification Language (JSL) and how to implement the corresponding batch artifacts. Refer to the rest of the sections in this chapter for detailed descriptions of the elements in the batch framework.
The following job definition specifies a chunk step and a task step:
<?xml version="1.0" encoding="UTF-8"?>
<job id="simplejob" xmlns="https://jakarta.ee/xml/ns/jakartaee"
     version="2.0">
    <properties>
        <property name="input_file" value="input.txt"/>
        <property name="output_file" value="output.txt"/>
    </properties>
    <step id="mychunk" next="mytask">
        <chunk>
            <reader ref="MyReader"></reader>
            <processor ref="MyProcessor"></processor>
            <writer ref="MyWriter"></writer>
        </chunk>
    </step>
    <step id="mytask">
        <batchlet ref="MyBatchlet"></batchlet>
        <end on="COMPLETED"/>
    </step>
</job>
Chunk Step
In most cases, you have to implement a checkpoint class for chunk-oriented steps. The following class just keeps track of the line number in a text file:
import java.io.Serializable;

public class MyCheckpoint implements Serializable {
    private long lineNum = 0;
    public void increase() { lineNum++; }
    public long getLineNum() { return lineNum; }
}
The following item reader implementation continues reading the input file from the provided checkpoint if the job was restarted. The items consist of each line in the text file (in more complex scenarios, the items are custom Java types and the input source can be a database):
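import jakarta.batch.runtime.context.JobContext;
import jakarta.enterprise.context.Dependent;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Serializable;

@Dependent
@Named("MyReader")
public class MyReader implements jakarta.batch.api.chunk.ItemReader {
    private MyCheckpoint checkpoint;
    private BufferedReader breader;
    @Inject
    JobContext jobCtx;

    public MyReader() {}

    @Override
    public void open(Serializable ckpt) throws Exception {
        // Start from a fresh checkpoint unless the job was restarted
        if (ckpt == null)
            checkpoint = new MyCheckpoint();
        else
            checkpoint = (MyCheckpoint) ckpt;
        String fileName = jobCtx.getProperties()
                                .getProperty("input_file");
        breader = new BufferedReader(new FileReader(fileName));
        // Skip the lines that were already read before the restart
        for (long i = 0; i < checkpoint.getLineNum(); i++)
            breader.readLine();
    }
    @Override
    public void close() throws Exception {
        breader.close();
    }
    @Override
    public Object readItem() throws Exception {
        // Returning null signals the end of the input
        String line = breader.readLine();
        if (line != null)
            checkpoint.increase();
        return line;
    }
    @Override
    public Serializable checkpointInfo() throws Exception {
        // Report the current position so that a restart can resume from it
        return checkpoint;
    }
}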
In the following case, the item processor only converts the line to uppercase. More complex examples can process items in different ways or transform them into custom output Java types:
import jakarta.enterprise.context.Dependent;
import jakarta.inject.Named;

@Dependent
@Named("MyProcessor")
public class MyProcessor implements jakarta.batch.api.chunk.ItemProcessor {
    public MyProcessor() {}

    @Override
    public Object processItem(Object obj) throws Exception {
        String line = (String) obj;
        return line.toUpperCase();
    }
}
The batch processing API does not support generics. In most cases, you need to cast items to their specific type before processing them.
The item writer writes the processed items to the output file. It overwrites the output file if no checkpoint is provided; otherwise, it resumes writing at the end of the file. Items are written in chunks:
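import jakarta.batch.runtime.context.JobContext;
import jakarta.enterprise.context.Dependent;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Serializable;
import java.util.List;

@Dependent
@Named("MyWriter")
public class MyWriter implements jakarta.batch.api.chunk.ItemWriter {
    private BufferedWriter bwriter;
    @Inject
    private JobContext jobCtx;

    @Override
    public void open(Serializable ckpt) throws Exception {
        String fileName = jobCtx.getProperties()
                                .getProperty("output_file");
        // Append to the file on restart; overwrite it otherwise
        bwriter = new BufferedWriter(new FileWriter(fileName,
                                                    (ckpt != null)));
    }
    @Override
    public void close() throws Exception {
        bwriter.close();
    }
    @Override
    public void writeItems(List<Object> items) throws Exception {
        for (int i = 0; i < items.size(); i++) {
            String line = (String) items.get(i);
            bwriter.write(line);
            bwriter.newLine();
        }
        // Flush so that the chunk is written out before the checkpoint is taken
        bwriter.flush();
    }
    @Override
    public Serializable checkpointInfo() throws Exception {
        return new MyCheckpoint();
    }
}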
Task Step
The task step displays the length of the output file. In more complex scenarios, task steps perform any task that does not fit the chunk processing programming model:
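import jakarta.batch.runtime.context.JobContext;
import jakarta.enterprise.context.Dependent;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import java.io.File;

@Dependent
@Named("MyBatchlet")
public class MyBatchlet implements jakarta.batch.api.Batchlet {
    @Inject
    private JobContext jobCtx;
    @Override
    public String process() throws Exception {
        String fileName = jobCtx.getProperties()
                                .getProperty("output_file");
        // Print the size of the output file in bytes
        System.out.println(new File(fileName).length());
        return "COMPLETED";
    }
    @Override
    public void stop() throws Exception {
        // Nothing to interrupt in this short-running task
    }
}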
Using the Job Specification Language
The Job Specification Language (JSL) enables you to define the steps in a job and their execution order using an XML file. The following example shows how to define a simple job that contains one chunk step and one task step:
<job id="loganalysis" xmlns="https://jakarta.ee/xml/ns/jakartaee"
     version="2.0">
    <properties>
        <property name="input_file" value="input1.txt"/>
        <property name="output_file" value="output2.txt"/>
    </properties>
    <step id="logprocessor" next="cleanup">
        <chunk checkpoint-policy="item" item-count="10">
            <reader ref="com.example.pkg.LogItemReader"></reader>
            <processor ref="com.example.pkg.LogItemProcessor"></processor>
            <writer ref="com.example.pkg.LogItemWriter"></writer>
        </chunk>
    </step>
    <step id="cleanup">
        <batchlet ref="com.example.pkg.CleanUp"></batchlet>
        <end on="COMPLETED"/>
    </step>
</job>
This example defines the loganalysis batch job, which consists of the logprocessor chunk step and the cleanup task step. The logprocessor step transitions to the cleanup step, which terminates the job when completed.
The job element defines two properties, input_file and output_file. Specifying properties in this manner enables you to run a batch job with different configuration parameters without having to recompile its Java batch artifacts. The batch artifacts can access these properties using the context objects from the batch runtime.
The logprocessor step is a chunk step that specifies batch artifacts for the reader (LogItemReader), the processor (LogItemProcessor), and the writer (LogItemWriter). This step creates a checkpoint for every ten items processed.
The cleanup step is a task step that specifies the CleanUp class as its batch artifact. The job terminates when this step completes.
The following sections describe the elements of the Job Specification Language (JSL) in more detail and show the most common attributes and child elements.
The job Element
The job element is always the top-level element in a job definition file. Its main attributes are id and restartable. The job element can contain one properties element and zero or more of each of the following elements: listener, step, decision, flow, and split.
For example:
<job id="jobname" restartable="true">
    <listeners>
        <listener ref="com.example.pkg.ListenerBatchArtifact"/>
    </listeners>
    <properties>
        <property name="propertyName1" value="propertyValue1"/>
        <property name="propertyName2" value="propertyValue2"/>
    </properties>
    <step ...> ... </step>
    <step ...> ... </step>
    <decision ...> ... </decision>
    <flow ...> ... </flow>
    <split ...> ... </split>
</job>
The listener element specifies a batch artifact whose methods are invoked before and after the execution of the job. The batch artifact is an implementation of the jakarta.batch.api.listener.JobListener interface. See The Listener Batch Artifacts for an example of a job listener implementation.
The first step, flow, or split element inside the job element executes first.
The step Element
The step element can be a child of the job and flow elements. Its main attributes are id and next. The step element can contain the following elements.
-
One chunk element for chunk-oriented steps or one batchlet element for task-oriented steps.
-
One properties element (optional).
This element specifies a set of properties that batch artifacts can access using batch context objects.
-
One listener element (optional); one listeners element if more than one listener is specified.
This element specifies listener artifacts that intercept various phases of step execution.
For chunk steps, the batch artifacts for these listeners can be implementations of the following interfaces: StepListener, ItemReadListener, ItemProcessListener, ItemWriteListener, ChunkListener, RetryReadListener, RetryProcessListener, RetryWriteListener, SkipReadListener, SkipProcessListener, and SkipWriteListener.
For task steps, the batch artifact for these listeners must be an implementation of the StepListener interface.
See The Listener Batch Artifacts for an example of an item processor listener implementation.
-
One partition element (optional).
This element is used in partitioned steps, which execute in more than one thread.
-
One end element if this is the last step in a job.
This element sets the batch status to COMPLETED.
-
One stop element (optional) to stop a job at this step.
This element sets the batch status to STOPPED.
-
One fail element (optional) to terminate a job at this step.
This element sets the batch status to FAILED.
-
One or more next elements if the next attribute is not specified.
This element is associated with an exit status and refers to another step, a flow, a split, or a decision element.
The following is an example of a chunk step:
<step id="stepA" next="stepB">
    <properties> ... </properties>
    <listeners>
        <listener ref="MyItemReadListenerImpl"/>
        ...
    </listeners>
    <chunk ...> ... </chunk>
    <partition> ... </partition>
    <end on="COMPLETED" exit-status="MY_COMPLETED_EXIT_STATUS"/>
    <stop on="MY_TEMP_ISSUE_EXIT_STATUS" restart="step0"/>
    <fail on="MY_ERROR_EXIT_STATUS" exit-status="MY_ERROR_EXIT_STATUS"/>
</step>
The following is an example of a task step:
<step id="stepB" next="stepC">
    <batchlet ...> ... </batchlet>
    <properties> ... </properties>
    <listener ref="MyStepListenerImpl"/>
</step>
The chunk Element
The chunk element is a child of the step element for chunk-oriented steps. The attributes of this element are listed in Attributes of the chunk Element.
Attribute Name | Description | Default Value |
---|---|---|
checkpoint-policy | Specifies how to commit the results of processing each chunk: "item" commits the chunk after processing item-count items; "custom" commits the chunk according to a checkpoint algorithm specified with the checkpoint-algorithm element. The checkpoint is updated when the results of a chunk are committed. Every chunk is processed in a global Jakarta EE transaction. If the processing of one item in the chunk fails, the transaction is rolled back and no processed items from this chunk are stored. | "item" |
item-count | Specifies the number of items to process before committing the chunk and taking a checkpoint. | 10 |
time-limit | Specifies the number of seconds before committing the chunk and taking a checkpoint when checkpoint-policy is "item". If item-count items have not been processed by time-limit seconds, the chunk is committed and a checkpoint is taken. | 0 (no limit) |
buffer-items | Specifies if processed items are buffered until it is time to take a checkpoint. If true, a single call to the item writer is made with a list of the buffered items before committing the chunk and taking a checkpoint. | true |
skip-limit | Specifies the number of skippable exceptions to skip in this step during chunk processing. Skippable exception classes are specified with the skippable-exception-classes element. | No limit |
retry-limit | Specifies the number of attempts to execute this step if retryable exceptions occur. Retryable exception classes are specified with the retryable-exception-classes element. | No limit |
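The interplay between the item-count and time-limit attributes of the chunk element can be sketched as a simple predicate. This is an illustration under the assumption that the item checkpoint policy is in effect; the class CheckpointPolicySketch and its method are hypothetical and are not part of the batch API.

```java
// Hypothetical illustration of when a chunk is committed under the
// item checkpoint policy; not part of the batch API.
class CheckpointPolicySketch {

    // itemCount and timeLimit mirror the item-count and time-limit attributes.
    // A timeLimit of 0 means that no time limit applies.
    static boolean shouldCheckpoint(int itemsInChunk, long elapsedSeconds,
                                    int itemCount, long timeLimit) {
        boolean countReached = itemsInChunk >= itemCount;
        boolean timeExpired = timeLimit > 0 && elapsedSeconds >= timeLimit;
        return countReached || timeExpired;
    }
}
```

With an item count of 5 and a time limit of 180 seconds, for example, a chunk holding three items would be committed once 180 seconds have elapsed.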
The chunk element can contain the following elements.
-
One reader element.
This element specifies a batch artifact that implements the ItemReader interface.
-
One processor element.
This element specifies a batch artifact that implements the ItemProcessor interface.
-
One writer element.
This element specifies a batch artifact that implements the ItemWriter interface.
-
One checkpoint-algorithm element (optional).
This element specifies a batch artifact that implements the CheckpointAlgorithm interface and provides a custom checkpoint policy.
-
One skippable-exception-classes element (optional).
This element specifies a set of exceptions thrown from the reader, writer, and processor batch artifacts that chunk processing should skip. The skip-limit attribute from the chunk element specifies the maximum number of skipped exceptions.
-
One retryable-exception-classes element (optional).
This element specifies a set of exceptions thrown from the reader, writer, and processor batch artifacts that chunk processing will retry. The retry-limit attribute from the chunk element specifies the maximum number of attempts.
-
One no-rollback-exception-classes element (optional).
This element specifies a set of exceptions thrown from the reader, writer, and processor batch artifacts that should not cause the batch runtime to roll back the current chunk, but to retry the current operation without a rollback instead.
For exception types not specified in this element, the current chunk is rolled back by default when an exception occurs.
The following is an example of a chunk-oriented step:
<step id="stepC" next="stepD">
    <chunk checkpoint-policy="item" item-count="5" time-limit="180"
           buffer-items="true" skip-limit="10" retry-limit="3">
        <reader ref="pkg.MyItemReaderImpl"></reader>
        <processor ref="pkg.MyItemProcessorImpl"></processor>
        <writer ref="pkg.MyItemWriterImpl"></writer>
        <skippable-exception-classes>
            <include class="pkg.MyItemException"/>
            <exclude class="pkg.MyItemSeriousSubException"/>
        </skippable-exception-classes>
        <retryable-exception-classes>
            <include class="pkg.MyResourceTempUnavailable"/>
        </retryable-exception-classes>
    </chunk>
</step>
This example defines a chunk step and specifies its reader, processor, and writer artifacts. The step updates a checkpoint and commits each chunk after processing five items. It skips all MyItemException exceptions and all its subtypes, except for MyItemSeriousSubException, up to a maximum of ten skipped exceptions. The step retries a chunk when a MyResourceTempUnavailable exception occurs, up to a maximum of three attempts.
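The include and exclude rules for skippable exceptions in the preceding example can be sketched as a type check in which exclusions take precedence. This is a simplified illustration, not the algorithm used by any particular batch runtime: the class SkipRulesSketch is hypothetical, and its nested exception classes only mirror the names used in the job definition.

```java
import java.util.List;

// Hypothetical illustration of include/exclude matching for skippable exceptions.
class SkipRulesSketch {

    // Stand-ins for the exception types named in the job definition.
    static class MyItemException extends Exception {}
    static class MyItemSeriousSubException extends MyItemException {}

    // An exception is skippable if it is an instance of an included class
    // (or a subtype) and is not an instance of any excluded class.
    static boolean isSkippable(Throwable t,
                               List<Class<? extends Throwable>> include,
                               List<Class<? extends Throwable>> exclude) {
        for (Class<? extends Throwable> excluded : exclude) {
            if (excluded.isInstance(t)) {
                return false;
            }
        }
        for (Class<? extends Throwable> included : include) {
            if (included.isInstance(t)) {
                return true;
            }
        }
        return false;
    }
}
```

A runtime would additionally count each skipped exception against the skip-limit attribute; that bookkeeping is left out of this sketch.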
The batchlet Element
The batchlet element is a child of the step element for task-oriented steps. This element only has the ref attribute, which specifies a batch artifact that implements the Batchlet interface. The batchlet element can contain a properties element.
The following is an example of a task-oriented step:
<step id="stepD" next="stepE">
    <batchlet ref="pkg.MyBatchletImpl">
        <properties>
            <property name="pname" value="pvalue"/>
        </properties>
    </batchlet>
</step>
This example defines a task step and specifies its batch artifact.
The partition Element
The partition element is a child of the step element. It indicates that a step is partitioned. Most partitioned steps are chunk steps where the processing of each item does not depend on the results of processing previous items. You specify the number of partitions in a step and provide each partition with specific information on which items to process, such as the following.
-
A range of items. For example, partition 1 processes items 1 through 500, and partition 2 processes items 501 through 1000.
-
An input source. For example, partition 1 processes the items in input1.txt and partition 2 processes the items in input2.txt.
When the number of partitions, the number of items, and the input sources for a partitioned step are known at development or deployment time, you can use partition properties in the job definition file to specify partition-specific information and access these properties from the step batch artifacts. The runtime creates as many instances of the step batch artifacts (reader, processor, and writer) as partitions, and each artifact instance receives the properties specific to its partition.
In most cases, the number of partitions, the number of items, or the input sources for a partitioned step can only be determined at runtime. Instead of specifying partition-specific properties statically in the job definition file, you provide a batch artifact that can access your data sources at runtime and determine how many partitions are needed and what range of items each partition should process. This batch artifact is an implementation of the PartitionMapper interface. The batch runtime invokes this artifact and then uses the information it provides to instantiate the step batch artifacts (reader, writer, and processor) for each partition and to pass them partition-specific data as parameters.
The rest of this section describes the partition element in detail and shows two examples of job definition files: one that uses partition properties to specify a range of items for each partition, and one that relies on a PartitionMapper implementation to determine partition-specific information.
See The Phone Billing Chunk Step in The phonebilling Example Application for a complete example of a partitioned chunk step.
The partition element can contain the following elements.
-
One plan element, if the mapper element is not specified.
This element defines the number of partitions, the number of threads, and the properties for each partition in the job definition file. The plan element is useful when this information is known at development or deployment time.
-
One mapper element, if the plan element is not specified.
This element specifies a batch artifact that provides the number of partitions, the number of threads, and the properties for each partition. The batch artifact is an implementation of the PartitionMapper interface. You use this option when the information required for each partition is only known at runtime.
-
One reducer element (optional).
This element specifies a batch artifact that receives control when a partitioned step begins, ends, or rolls back. The batch artifact enables you to merge results from different partitions and perform other related operations. The batch artifact is an implementation of the PartitionReducer interface.
-
One collector element (optional).
This element specifies a batch artifact that sends intermediary results from each partition to a partition analyzer. The batch artifact sends the intermediary results after each checkpoint for chunk steps and at the end of the step for task steps. The batch artifact is an implementation of the PartitionCollector interface.
-
One analyzer element (optional).
This element specifies a batch artifact that analyzes the intermediary results from the partition collector instances. The batch artifact is an implementation of the PartitionAnalyzer interface.
The following is an example of a partitioned step using the plan element:
<step id="stepE" next="stepF">
    <chunk>
        <reader ...></reader>
        <processor ...></processor>
        <writer ...></writer>
    </chunk>
    <partition>
        <plan partitions="2" threads="2">
            <properties partition="0">
                <property name="firstItem" value="0"/>
                <property name="lastItem" value="500"/>
            </properties>
            <properties partition="1">
                <property name="firstItem" value="501"/>
                <property name="lastItem" value="999"/>
            </properties>
        </plan>
        <reducer ref="MyPartitionReducerImpl"/>
        <collector ref="MyPartitionCollectorImpl"/>
        <analyzer ref="MyPartitionAnalyzerImpl"/>
    </partition>
</step>
In this example, the plan
element specifies the properties for each partition in the job definition file.
The following example uses a mapper
element instead of a plan
element.
The PartitionMapper
implementation dynamically provides the same information as the plan
element provides in the job definition file:
<step id="stepE" next="stepF">
  <chunk>
    <reader ...></reader>
    <processor ...></processor>
    <writer ...></writer>
  </chunk>
  <partition>
    <mapper ref="MyPartitionMapperImpl"/>
    <reducer ref="MyPartitionReducerImpl"/>
    <collector ref="MyPartitionCollectorImpl"/>
    <analyzer ref="MyPartitionAnalyzerImpl"/>
  </partition>
</step>
Refer to The phonebilling Example Application for an example implementation of the PartitionMapper
interface.
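As a rough illustration of the kind of information a mapper supplies at runtime, the following sketch computes one Properties object per partition using only the JDK. The class name PartitionRanges, the method name rangesFor, and the choice to give the leftover items to the last partition are assumptions made for this example. A real PartitionMapper implementation would hand these values to the batch runtime through a partition plan object rather than return the array directly.

```java
import java.util.Properties;

/* Hypothetical helper, not part of Jakarta Batch: splits the item
 * indexes 0..totalItems-1 into contiguous ranges, one per partition. */
final class PartitionRanges {

    static Properties[] rangesFor(long totalItems, int partitions) {
        if (partitions < 1 || totalItems < 0) {
            throw new IllegalArgumentException("invalid partition request");
        }
        long perPartition = totalItems / partitions;
        long remainder = totalItems % partitions;
        Properties[] props = new Properties[partitions];
        for (int i = 0; i < partitions; i++) {
            long first = i * perPartition;
            long count = perPartition;
            /* The last partition also takes the leftover items */
            if (i == partitions - 1) {
                count += remainder;
            }
            props[i] = new Properties();
            props[i].setProperty("firstItem", String.valueOf(first));
            props[i].setProperty("lastItem", String.valueOf(first + count - 1));
        }
        return props;
    }

    public static void main(String[] args) {
        for (Properties p : rangesFor(1000, 2)) {
            System.out.println(p.getProperty("firstItem")
                    + " to " + p.getProperty("lastItem"));
        }
    }
}
```

The property names firstItem and lastItem match the ones used in the plan example, so the same item reader could consume either source of partition properties.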
The flow Element
The flow
element can be a child of the job
, flow
, and split
elements.
Its attributes are id
and next
.
Flows can transition to flows, steps, splits, and decision elements.
The flow
element can contain the following elements:
-
One or more
step
elements -
One or more
flow
elements (optional) -
One or more
split
elements (optional) -
One or more
decision
elements (optional)
The last step
in a flow is the one with no next
attribute or next
element.
Steps and other elements in a flow cannot transition to elements outside the flow.
The following is an example of the flow
element:
<flow id="flowA" next="stepE">
<step id="flowAstepA" next="flowAstepB">...</step>
<step id="flowAstepB" next="flowAflowC">...</step>
<flow id="flowAflowC" next="flowAsplitD">...</flow>
<split id="flowAsplitD" next="flowAstepE">...</split>
<step id="flowAstepE">...</step>
</flow>
This example flow contains three steps, one flow, and one split.
The last step does not have the next
attribute.
The flow transitions to stepE
when its last step completes.
The split Element
The split
element can be a child of the job
and flow
elements.
Its attributes are id
and next
.
Splits can transition to splits, steps, flows, and decision elements.
The split
element can only contain one or more flow
elements that can only transition to other flow
elements in the split.
The following is an example of a split with three flows that execute concurrently:
<split id="splitA" next="stepB">
<flow id="splitAflowA">...</flow>
<flow id="splitAflowB">...</flow>
<flow id="splitAflowC">...</flow>
</split>
The decision Element
The decision
element can be a child of the job
and flow
elements.
Its attributes are id
and next
.
Steps, flows, and splits can transition to a decision
element.
This element specifies a batch artifact that decides the next step, flow, or split to execute based on information from the execution of the previous step, flow, or split.
The batch artifact implements the Decider
interface.
The decision
element can contain the following elements.
-
One or more
end
elements (optional).This element sets the batch status to
COMPLETED
. -
One or more
stop
elements (optional).This element sets the batch status to
STOPPED
. -
One or more
fail
elements (optional).This element sets the batch status to
FAILED
. -
One or more
next
elements (optional). -
One
properties
element (optional).
The following is an example of the decision
element:
<decision id="decisionA" ref="MyDeciderImpl">
<fail on="FAILED" exit-status="FAILED_AT_DECIDER"/>
<end on="COMPLETED" exit-status="COMPLETED_AT_DECIDER"/>
<stop on="MY_TEMP_ISSUE_EXIST_STATUS" restart="step2"/>
</decision>
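The batch artifact behind a decision element returns an exit status string, and the batch runtime matches that string against the on attributes of the transition elements. The sketch below isolates such a choice in a plain static method so that it compiles without the Jakarta Batch API. The class name, the method name, and the rule itself (fail if any earlier execution failed) are assumptions for illustration. A real Decider implementation would derive the input strings from the StepExecution objects it receives.

```java
import java.util.List;

/* Hypothetical helper, not part of Jakarta Batch. */
final class DecisionRule {

    /* Picks the exit status that a decider could return, given the exit
     * statuses of the executions that ran before the decision element. */
    static String chooseExitStatus(List<String> previousExitStatuses) {
        if (previousExitStatuses.contains("FAILED")) {
            return "FAILED";
        }
        if (previousExitStatuses.contains("MY_TEMP_ISSUE_EXIST_STATUS")) {
            return "MY_TEMP_ISSUE_EXIST_STATUS";
        }
        return "COMPLETED";
    }

    public static void main(String[] args) {
        System.out.println(chooseExitStatus(List.of("COMPLETED", "FAILED")));
    }
}
```

With the decision element shown above, a returned value of FAILED would select the fail element, COMPLETED the end element, and MY_TEMP_ISSUE_EXIST_STATUS the stop element.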
Creating Batch Artifacts
After you define a job in terms of its batch artifacts using the Job Specification Language (JSL), you create these artifacts as Java classes that implement the interfaces in the jakarta.batch.api
package and its subpackages.
This section lists the main batch artifact interfaces, demonstrates how to access context objects from the batch runtime, and provides some examples.
Batch Artifact Interfaces
The following tables list the interfaces that you implement to create batch artifacts. The interface implementations are referenced from the elements described in Using the Job Specification Language.
Main Batch Artifact Interfaces lists the interfaces to implement batch artifacts for chunk steps, task steps, and decision elements.
Partition Batch Artifact Interfaces lists the interfaces to implement batch artifacts for partitioned steps.
Listener Batch Artifact Interfaces lists the interfaces to implement batch artifacts for job and step listeners.
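Package | Interface | Description |
---|---|---|
jakarta.batch.api | Batchlet | Implements the business logic of a task-oriented step. It is referenced from the batchlet element. |
jakarta.batch.api | Decider | Decides the next step, flow, or split to execute based on information from the execution of the previous step, flow, or split. It is referenced from the decision element. |
jakarta.batch.api.chunk | CheckpointAlgorithm | Implements a custom checkpoint policy for chunk steps. It is referenced from the checkpoint-algorithm element. |
jakarta.batch.api.chunk | ItemReader | Reads items from an input source in a chunk step. It is referenced from the reader element. |
jakarta.batch.api.chunk | ItemProcessor | Processes input items to obtain output items in chunk steps. It is referenced from the processor element. |
jakarta.batch.api.chunk | ItemWriter | Writes output items in chunk steps. It is referenced from the writer element. |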
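Package | Interface | Description |
---|---|---|
jakarta.batch.api.partition | PartitionPlan | Provides details on how to execute a partitioned step, such as the number of partitions, the number of threads, and the parameters for each partition. This artifact is not referenced directly from the job definition file. |
jakarta.batch.api.partition | PartitionMapper | Provides a PartitionPlan object. It is referenced from the mapper element. |
jakarta.batch.api.partition | PartitionReducer | Receives control when a partitioned step begins, ends, or rolls back. It is referenced from the reducer element. |
jakarta.batch.api.partition | PartitionCollector | Sends intermediary results from each partition to a partition analyzer. It is referenced from the collector element. |
jakarta.batch.api.partition | PartitionAnalyzer | Processes data and final results from each partition. It is referenced from the analyzer element. |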
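Package | Interface | Description |
---|---|---|
jakarta.batch.api.listener | JobListener | Intercepts job execution before and after running a job. It is referenced from the listener element. |
jakarta.batch.api.listener | StepListener | Intercepts step execution before and after running a step. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | ChunkListener | Intercepts chunk processing in chunk steps before and after processing each chunk, and on errors. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | ItemReadListener | Intercepts item reading in chunk steps before and after reading each item, and on errors. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | ItemProcessListener | Intercepts item processing in chunk steps before and after processing each item, and on errors. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | ItemWriteListener | Intercepts item writing in chunk steps before and after writing each item, and on errors. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | RetryReadListener | Intercepts retry item reading in chunk steps when an exception occurs. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | RetryProcessListener | Intercepts retry item processing in chunk steps when an exception occurs. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | RetryWriteListener | Intercepts retry item writing in chunk steps when an exception occurs. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | SkipReadListener | Intercepts skippable exception handling for item readers in chunk steps. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | SkipProcessListener | Intercepts skippable exception handling for item processors in chunk steps. It is referenced from the listener element. |
jakarta.batch.api.chunk.listener | SkipWriteListener | Intercepts skippable exception handling for item writers in chunk steps. It is referenced from the listener element. |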
Dependency Injection in Batch Artifacts
To ensure that Jakarta Contexts and Dependency Injection (CDI) works in your batch artifacts, follow these steps.
-
Define your batch artifact implementations as CDI named beans using the Named annotation.
For example, define an item reader implementation in a chunk step as follows:
@Named("MyItemReaderImpl")
public class MyItemReaderImpl implements ItemReader {
    /* ... Override the ItemReader interface methods ... */
}
-
Provide a public, empty, no-argument constructor for your batch artifacts.
For example, provide the following constructor for the artifact above:
public MyItemReaderImpl() {}
-
Specify the CDI name for the batch artifacts in the job definition file, instead of using the fully qualified name of the class.
For example, define the step for the artifact above as follows:
<step id="stepA" next="stepB">
  <chunk>
    <reader ref="MyItemReaderImpl"></reader>
    ...
  </chunk>
</step>
This example uses the CDI name (MyItemReaderImpl) instead of the fully qualified name of the class (com.example.pkg.MyItemReaderImpl) to specify a batch artifact.
-
Ensure that your module is a CDI bean archive by annotating your batch artifacts with the jakarta.enterprise.context.Dependent annotation or by including an empty beans.xml deployment descriptor with your application.
For example, the following batch artifact is annotated with @Dependent:
@Dependent
@Named("MyItemReaderImpl")
public class MyItemReaderImpl implements ItemReader { ... }
For more information on bean archives, see Packaging CDI Applications in Jakarta Contexts and Dependency Injection: Advanced Topics.
Jakarta Contexts and Dependency Injection (CDI) is required in order to access context objects from the batch runtime in batch artifacts.
You may encounter the following errors if you do not follow this procedure.
-
The batch runtime cannot locate some batch artifacts.
-
The batch artifacts throw null pointer exceptions when accessing injected objects.
Using the Context Objects from the Batch Runtime
The batch runtime provides context objects that implement the JobContext
and StepContext
interfaces in the jakarta.batch.runtime.context
package.
These objects are associated with the current job and step, respectively, and enable you to do the following:
-
Get information from the current job or step, such as its name, instance ID, execution ID, batch status, and exit status
-
Set the user-defined exit status
-
Store user data
-
Get property values from the job or step definition
You can inject context objects from the batch runtime inside batch artifact implementations like item readers, item processors, item writers, batchlets, listeners, and so on. The following example demonstrates how to access property values from the job definition file in an item reader implementation:
@Dependent
@Named("MyItemReaderImpl")
public class MyItemReaderImpl implements ItemReader {
    @Inject
    JobContext jobCtx;

    public MyItemReaderImpl() {}

    @Override
    public void open(Serializable checkpoint) throws Exception {
        String fileName = jobCtx.getProperties()
                                .getProperty("log_file_name");
        ...
    }
    ...
}
See Dependency Injection in Batch Artifacts for instructions on how to define your batch artifacts to use dependency injection.
Do not access batch context objects inside artifact constructors. Because the job does not run until you submit it to the batch runtime, the batch context objects are not available when CDI instantiates your artifacts upon loading your application. The instantiation of these beans fails and the batch runtime cannot find your batch artifacts when your application submits the job.
Submitting Jobs to the Batch Runtime
The JobOperator
interface in the jakarta.batch.operations
package enables you to submit jobs to the batch runtime and obtain information about existing jobs.
This interface provides the following functionality.
-
Obtain the names of all known jobs.
-
Start, stop, restart, and abandon jobs.
-
Obtain job instances and job executions.
The BatchRuntime
class in the jakarta.batch.runtime
package provides the getJobOperator
factory method to obtain JobOperator
objects.
Starting a Job
The following example code demonstrates how to obtain a JobOperator
object and submit a batch job:
JobOperator jobOperator = BatchRuntime.getJobOperator();
Properties props = new Properties();
props.setProperty("parameter1", "value1");
...
long execID = jobOperator.start("simplejob", props);
The first argument of the JobOperator.start
method is the name of the job as specified in its job definition file.
The second argument is a Properties
object that represents the parameters for this job execution.
You can use job parameters to give a job information that is known only at runtime.
Checking the Status of a Job
The JobExecution
interface in the jakarta.batch.runtime
package provides methods to obtain information about submitted jobs.
This interface provides the following functionality.
-
Obtain the batch and exit status of a job execution.
-
Obtain the time the execution was started, updated, or ended.
-
Obtain the job name.
-
Obtain the execution ID.
The following example code demonstrates how to obtain the batch status of a job using its execution ID:
JobExecution jobExec = jobOperator.getJobExecution(execID);
String status = jobExec.getBatchStatus().toString();
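Because a job runs asynchronously after it is submitted, an application often checks the batch status more than once. The following sketch polls a status source until it reports a status that does not change on its own. The class JobStatusPoller and its methods are hypothetical, and the status is modeled as a Supplier of strings so that the example compiles with the JDK alone. In an application, the supplier could wrap the call shown above, for example () -> jobOperator.getJobExecution(execID).getBatchStatus().toString().

```java
import java.util.Set;
import java.util.function.Supplier;

/* Hypothetical helper, not part of Jakarta Batch. */
final class JobStatusPoller {

    /* Names of the batch status values after which an execution
     * no longer changes state on its own */
    private static final Set<String> FINISHED =
            Set.of("COMPLETED", "FAILED", "STOPPED", "ABANDONED");

    static boolean isFinished(String status) {
        return FINISHED.contains(status);
    }

    /* Reads the status up to maxPolls times, sleeping between reads,
     * and returns the last status seen. */
    static String awaitFinished(Supplier<String> statusSource,
                                long pollMillis, int maxPolls) {
        String status = statusSource.get();
        for (int polls = 1; polls < maxPolls && !isFinished(status); polls++) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                /* Preserve the interrupt flag and stop polling */
                Thread.currentThread().interrupt();
                break;
            }
            status = statusSource.get();
        }
        return status;
    }
}
```

Bounding the number of polls keeps a caller from waiting forever on a job that never finishes. A user-driven check, like the one in the example applications later in this chapter, avoids blocking a request thread altogether.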
Invoking the Batch Runtime in Your Application
The component from which you invoke the batch runtime depends on the architecture of your particular application. For example, you can invoke the batch runtime from an enterprise bean, a servlet, a managed bean, and so on.
See The webserverlog Example Application and The phonebilling Example Application for details on how to invoke the batch runtime from a managed bean driven by a Jakarta Faces user interface.
Packaging Batch Applications
Job definition files and batch artifacts do not require separate packaging and can be included in any Jakarta EE application.
Package the batch artifact classes with the rest of the classes of your application, and include the job definition files in one of the following directories:
-
META-INF/batch-jobs/ for jar packages
-
WEB-INF/classes/META-INF/batch-jobs/ for war packages
The name of each job definition file must match its job ID.
For example, if you define a job as follows, and you are packaging your application as a WAR file, include the job definition file in WEB-INF/classes/META-INF/batch-jobs/simplejob.xml
:
<job id="simplejob" xmlns="https://jakarta.ee/xml/ns/jakartaee"
version="2.0">
...
</job>
The webserverlog Example Application
The webserverlog
example application, located in the jakartaee-examples/tutorial/batch/webserverlog/
directory, demonstrates how to use the batch framework in Jakarta EE to analyze the log file from a web server.
This example application reads a log file and finds what percentage of page views from tablet devices are product sales.
Architecture of the webserverlog Example Application
The webserverlog
example application consists of the following elements:
-
A job definition file (webserverlog.xml) that uses the Job Specification Language (JSL) to define a batch job with a chunk step and a task step. The chunk step acts as a filter, and the task step calculates statistics on the remaining entries.
-
A log file (log1.txt) that serves as input data to the batch job.
-
Two Java classes (LogLine and LogFilteredLine) that represent input items and output items for the chunk step.
-
Three batch artifacts (LogLineReader, LogLineProcessor, and LogFilteredLineWriter) that implement the chunk step of the application. This step reads items from the web server log file, filters them by the web browser used by the client, and writes the results to a text file.
-
Two batch artifacts (InfoJobListener and InfoItemProcessListener) that implement two simple listeners.
-
A batch artifact (MobileBatchlet) that calculates statistics on the filtered items.
-
Two Facelets pages (index.xhtml and jobstarted.xhtml) that provide the front end of the batch application. The first page shows the log file that will be processed by the batch job, and the second page enables the user to check on the status of the job and shows the results.
-
A managed bean (JsfBean) that is accessed from the Facelets pages. The bean submits the job to the batch runtime, checks on the status of the job, and reads the results from a text file.
The Job Definition File
The webserverlog.xml
job definition file is located in the WEB-INF/classes/META-INF/batch-jobs/
directory.
The file specifies seven job-level properties and two steps:
<?xml version="1.0" encoding="UTF-8"?>
<job id="webserverlog" xmlns="https://jakarta.ee/xml/ns/jakartaee"
version="2.0">
<properties>
<property name="log_file_name" value="log1.txt"/>
<property name="filtered_file_name" value="filtered1.txt"/>
<property name="num_browsers" value="2"/>
<property name="browser_1" value="Tablet Browser D"/>
<property name="browser_2" value="Tablet Browser E"/>
<property name="buy_page" value="/auth/buy.html"/>
<property name="out_file_name" value="result1.txt"/>
</properties>
<listeners>
<listener ref="InfoJobListener"/>
</listeners>
<step id="mobilefilter" next="mobileanalyzer"> ... </step>
<step id="mobileanalyzer"> ... </step>
</job>
The first step is defined as follows:
<step id="mobilefilter" next="mobileanalyzer">
  <listeners>
    <listener ref="InfoItemProcessListener"/>
  </listeners>
  <chunk checkpoint-policy="item" item-count="10">
    <reader ref="LogLineReader"></reader>
    <processor ref="LogLineProcessor"></processor>
    <writer ref="LogFilteredLineWriter"></writer>
  </chunk>
</step>
This step is a normal chunk step that specifies the batch artifacts that implement each phase of the step.
The batch artifact names are not fully qualified class names, so the batch artifacts are CDI beans annotated with @Named
.
The second step is defined as follows:
<step id="mobileanalyzer">
<batchlet ref="MobileBatchlet"></batchlet>
<end on="COMPLETED"/>
</step>
This step is a task step that specifies the batch artifact that implements it. This is the last step of the job.
The LogLine and LogFilteredLine Items
The LogLine
class represents entries in the web server log file and it is defined as follows:
public class LogLine {
    private final String datetime;
    private final String ipaddr;
    private final String browser;
    private final String url;
    /* ... Constructor and getters ... */
}
The LogFilteredLine
class is similar to this class but only has two fields: the IP address of the client and the URL.
The Chunk Step Batch Artifacts
The first step is composed of the LogLineReader
, LogLineProcessor
, and LogFilteredLineWriter
batch artifacts.
The LogLineReader
artifact reads records from the web server log file:
@Dependent
@Named("LogLineReader")
public class LogLineReader implements ItemReader {
private ItemNumberCheckpoint checkpoint;
private String fileName;
private BufferedReader breader;
@Inject
private JobContext jobCtx;
public LogLineReader() { }
/* ... Override the open, close, readItem, and
* checkpointInfo methods ... */
}
The open
method reads the log_file_name
property and opens the log file with a buffered reader.
In this example, the log file has been included with the application under webserverlog/WEB-INF/classes/log1.txt
:
fileName = jobCtx.getProperties().getProperty("log_file_name");
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
InputStream iStream = classLoader.getResourceAsStream(fileName);
breader = new BufferedReader(new InputStreamReader(iStream));
If a checkpoint object is provided, the open
method advances the reader up to the last checkpoint.
Otherwise, this method creates a new checkpoint object.
The checkpoint object keeps track of the line number from the last committed chunk.
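The ItemNumberCheckpoint class is not listed in this section. The following sketch shows one shape it could take, together with the way a reader might skip the lines that were already committed before a restart. The field name, the getLineNum method, and the CheckpointRestart helper are assumptions for illustration; only the nextLine method appears in the example code. The log is passed as a string so that the sketch runs without any files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;

/* Assumed shape of the checkpoint class: a serializable line counter */
class ItemNumberCheckpoint implements Serializable {
    private static final long serialVersionUID = 1L;
    private long lineNum = 0;

    public void nextLine() { lineNum++; }
    public long getLineNum() { return lineNum; }
}

/* Hypothetical helper that mimics what an open method does on restart */
final class CheckpointRestart {

    /* Returns the first line that has not been read yet according to
     * the checkpoint, or null if the whole log was already read. */
    static String firstUncommittedLine(String logText,
                                       ItemNumberCheckpoint checkpoint) {
        try (BufferedReader reader =
                new BufferedReader(new StringReader(logText))) {
            /* Advance the reader past the lines counted so far */
            for (long i = 0; i < checkpoint.getLineNum(); i++) {
                reader.readLine();
            }
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The checkpoint class must be serializable because the batch runtime stores it and passes it back to the open method as a Serializable argument.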
The readItem
method returns a new LogLine
object or null at the end of the log file:
@Override
public Object readItem() throws Exception {
    String entry = breader.readLine();
    if (entry != null) {
        checkpoint.nextLine();
        return new LogLine(entry);
    } else {
        return null;
    }
}
The LogLineProcessor
artifact obtains a list of browsers from the job properties and filters the log entries according to the list:
@Override
public Object processItem(Object item) {
    /* Obtain a list of browsers we are interested in */
    if (nbrowsers == 0) {
        Properties props = jobCtx.getProperties();
        nbrowsers = Integer.parseInt(props.getProperty("num_browsers"));
        browsers = new String[nbrowsers];
        for (int i = 1; i < nbrowsers + 1; i++)
            browsers[i - 1] = props.getProperty("browser_" + i);
    }
    LogLine logline = (LogLine) item;
    /* Filter for only the mobile/tablet browsers as specified */
    for (int i = 0; i < nbrowsers; i++) {
        if (logline.getBrowser().equals(browsers[i])) {
            return new LogFilteredLine(logline);
        }
    }
    return null;
}
The LogFilteredLineWriter
artifact reads the name of the output file from the job properties.
The open
method opens the file for writing.
If a checkpoint object is provided, the artifact continues writing at the end of the file; otherwise, it overwrites the file if it exists.
The writeItems
method writes filtered items to the output file:
@Override
public void writeItems(List<Object> items) throws Exception {
    /* Write the filtered lines to the output file */
    for (int i = 0; i < items.size(); i++) {
        LogFilteredLine filtLine = (LogFilteredLine) items.get(i);
        bwriter.write(filtLine.toString());
        bwriter.newLine();
    }
}
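The append-or-overwrite behavior described for the open method can be sketched with the append flag of java.io.FileWriter. The OutputFiles class and the openForStep method are hypothetical names, and the sketch assumes that the presence of a checkpoint is the only thing that decides between the two modes.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Serializable;

/* Hypothetical helper, not taken from the example application */
final class OutputFiles {

    /* Appends when restarting from a checkpoint; otherwise starts
     * with an empty file, replacing any previous content. */
    static BufferedWriter openForStep(String fileName,
                                      Serializable checkpoint)
            throws IOException {
        boolean append = (checkpoint != null);
        return new BufferedWriter(new FileWriter(fileName, append));
    }
}
```

A writer opened this way keeps the lines from chunks that were committed before a restart and discards stale output from earlier, unrelated runs.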
The Listener Batch Artifacts
The InfoJobListener
batch artifact implements a simple listener that writes log messages when the job starts and when it ends:
@Dependent
@Named("InfoJobListener")
public class InfoJobListener implements JobListener {
...
@Override
public void beforeJob() throws Exception {
logger.log(Level.INFO, "The job is starting");
}
@Override
public void afterJob() throws Exception { ... }
}
The InfoItemProcessListener
batch artifact implements the ItemProcessListener
interface for chunk steps:
@Dependent
@Named("InfoItemProcessListener")
public class InfoItemProcessListener implements ItemProcessListener {
    ...
    @Override
    public void beforeProcess(Object o) throws Exception {
        LogLine logline = (LogLine) o;
        logger.log(Level.INFO, "Processing entry {0}", logline);
    }
    ...
}
The Task Step Batch Artifact
The task step is implemented by the MobileBatchlet
artifact, which computes what percentage of the filtered log entries are purchases:
@Override
public String process() throws Exception {
    /* Get properties from the job definition file */
    ...
    /* Count from the output of the previous chunk step */
    breader = new BufferedReader(new FileReader(fileName));
    String line = breader.readLine();
    while (line != null) {
        String[] lineSplit = line.split(", ");
        /* Only matching lines count as purchases; every line counts as a visit */
        if (buyPage.compareTo(lineSplit[1]) == 0) {
            pageVisits++;
        }
        totalVisits++;
        line = breader.readLine();
    }
    breader.close();
    /* Write the result */
    ...
}
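The counting logic above can be tried in isolation with the following sketch, which also shows one way to turn the two counters into a percentage. The PurchaseStats class is hypothetical. The input format (an IP address and a URL separated by a comma and a space) is inferred from the split call above, and the filtered log is passed as a string instead of a file so that the sketch is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/* Hypothetical helper, not taken from the example application */
final class PurchaseStats {

    /* Returns the percentage of lines whose second field equals buyPage,
     * or 0.0 if there are no lines at all. */
    static double percentPurchases(String filteredLog, String buyPage) {
        int pageVisits = 0;
        int totalVisits = 0;
        try (BufferedReader reader =
                new BufferedReader(new StringReader(filteredLog))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(", ");
                if (fields.length > 1 && buyPage.equals(fields[1])) {
                    pageVisits++;
                }
                totalVisits++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totalVisits == 0 ? 0.0 : 100.0 * pageVisits / totalVisits;
    }
}
```

The try-with-resources statement closes the reader even if a line cannot be read, and the length check guards against lines that lack a second field.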
The Jakarta Faces Pages
The index.xhtml
page contains a text area that shows the web server log.
The page provides a button for the user to submit the batch job and navigate to the next page:
<body>
...
<textarea cols="90" rows="25"
readonly="true">#{jsfBean.getInputLog()}</textarea>
<p> </p>
<h:form>
<h:commandButton value="Start Batch Job"
action="#{jsfBean.startBatchJob()}" />
</h:form>
</body>
This page calls the methods of the managed bean to show the log file and submit the batch job.
The jobstarted.xhtml
page provides a button to check the current status of the batch job and displays the results when the job finishes:
<p>Current Status of the Job: <b>#{jsfBean.jobStatus}</b></p>
<p>#{jsfBean.showResults()}</p>
<h:form>
<h:commandButton value="Check Status"
action="jobstarted"
rendered="#{jsfBean.completed==false}" />
</h:form>
The Managed Bean
The JsfBean
managed bean submits the job to the batch runtime, checks on the status of the job, and reads the results from a text file.
The startBatchJob
method submits the job to the batch runtime:
/* Submit the batch job to the batch runtime.
* JSF Navigation method (return the name of the next page) */
public String startBatchJob() {
jobOperator = BatchRuntime.getJobOperator();
execID = jobOperator.start("webserverlog", null);
return "jobstarted";
}
The getJobStatus
method checks the status of the job:
/* Get the status of the job from the batch runtime */
public String getJobStatus() {
return jobOperator.getJobExecution(execID).getBatchStatus().toString();
}
The showResults
method reads the results from a text file.
Running the webserverlog Example Application
You can use either NetBeans IDE or Maven to build, package, deploy, and run the webserverlog
example application.
To Run the webserverlog Example Application Using NetBeans IDE
-
Make sure that GlassFish Server has been started (see Starting and Stopping GlassFish Server).
-
From the File menu, choose Open Project.
-
In the Open Project dialog box, navigate to:
jakartaee-examples/tutorial/batch
-
Select the
webserverlog
folder. -
Click Open Project.
-
In the Projects tab, right-click the webserverlog project and select Run.
This command builds and packages the application into a WAR file, webserverlog.war, located in the target/ directory; deploys it to the server; and launches a web browser window at the following URL:
http://localhost:8080/webserverlog/
To Run the webserverlog Example Application Using Maven
-
Make sure that GlassFish Server has been started (see Starting and Stopping GlassFish Server).
-
In a terminal window, go to:
jakartaee-examples/tutorial/batch/webserverlog/
-
Enter the following command to deploy the application:
mvn install
-
Open a web browser window at the following URL:
http://localhost:8080/webserverlog/
The phonebilling Example Application
The phonebilling
example application, located in the jakartaee-examples/tutorial/batch/phonebilling/
directory, demonstrates how to use the batch framework in Jakarta EE to implement a phone billing system.
This example application processes a log file of phone calls and creates a bill for each customer.
Architecture of the phonebilling Example Application
The phonebilling
example application consists of the following elements.
-
A job definition file (phonebilling.xml) that uses the Job Specification Language (JSL) to define a batch job with two chunk steps. The first step reads call records from a log file and associates them with a bill. The second step computes the amount due and writes each bill to a text file.
-
A Java class (CallRecordLogCreator) that creates the log file for the batch job. This is an auxiliary component that does not demonstrate any key functionality in this example.
-
Two Jakarta Persistence entities (CallRecord and PhoneBill) that represent call records and customer bills. The application uses a Jakarta Persistence entity manager to store instances of these entities in a database.
-
Three batch artifacts (CallRecordReader, CallRecordProcessor, and CallRecordWriter) that implement the first step of the application. This step reads call records from the log file, associates them with a bill, and stores them in a database.
-
Four batch artifacts (BillReader, BillProcessor, BillWriter, and BillPartitionMapper) that implement the second step of the application. This step is a partitioned step that gets each bill from the database, calculates the amount due, and writes it to a text file.
-
Two Facelets pages (index.xhtml and jobstarted.xhtml) that provide the front end of the batch application. The first page shows the log file that will be processed by the batch job, and the second page enables the user to check on the status of the job and shows the resulting bill for each customer.
-
A managed bean (JsfBean) that is accessed from the Facelets pages. The bean submits the job to the batch runtime, checks on the status of the job, and reads the text files for each bill.
The Job Definition File
The phonebilling.xml
job definition file is located in the WEB-INF/classes/META-INF/batch-jobs/
directory.
The file specifies three job-level properties and two steps:
<?xml version="1.0" encoding="UTF-8"?>
<job id="phonebilling" xmlns="https://jakarta.ee/xml/ns/jakartaee"
version="2.0">
<properties>
<property name="log_file_name" value="log1.txt"/>
<property name="airtime_price" value="0.08"/>
<property name="tax_rate" value="0.07"/>
</properties>
<step id="callrecords" next="bills"> ... </step>
<step id="bills"> ... </step>
</job>
The first step is defined as follows:
<step id="callrecords" next="bills">
<chunk checkpoint-policy="item" item-count="10">
<reader ref="CallRecordReader"></reader>
<processor ref="CallRecordProcessor"></processor>
<writer ref="CallRecordWriter"></writer>
</chunk>
</step>
This step is a normal chunk step that specifies the batch artifacts that implement each phase of the step.
The batch artifact names are not fully qualified class names, so the batch artifacts are CDI beans annotated with @Named
.
The second step is defined as follows:
<step id="bills">
  <chunk checkpoint-policy="item" item-count="2">
    <reader ref="BillReader">
      <properties>
        <property name="firstItem" value="#{partitionPlan['firstItem']}"/>
        <property name="numItems" value="#{partitionPlan['numItems']}"/>
      </properties>
    </reader>
    <processor ref="BillProcessor"></processor>
    <writer ref="BillWriter"></writer>
  </chunk>
  <partition>
    <mapper ref="BillPartitionMapper"/>
  </partition>
  <end on="COMPLETED"/>
</step>
This step is a partitioned chunk step.
The partition plan is specified through the BillPartitionMapper
artifact instead of using the plan
element.
The CallRecord and PhoneBill Entities
The CallRecord
entity is defined as follows:
@Entity
public class CallRecord implements Serializable {
@Id @GeneratedValue
private Long id;
@Temporal(TemporalType.DATE)
private Date datetime;
private String fromNumber;
private String toNumber;
private int minutes;
private int seconds;
private BigDecimal price;
public CallRecord() { }
public CallRecord(String datetime, String from,
String to, int min, int sec) throws ParseException { ... }
public CallRecord(String jsonData) throws ParseException { ... }
/* ... Getters and setters ... */
}
The id
field is generated automatically by the Jakarta Persistence implementation to store and retrieve CallRecord
objects to and from a database.
The last constructor, which takes a single String argument, creates a CallRecord
object from an entry of JSON data in the log file using Jakarta JSON Processing.
Log entries look as follows:
{"datetime":"03/01/2013 04:03","from":"555-0101",
"to":"555-0114","length":"03:39"}
The PhoneBill
entity is defined as follows:
@Entity
public class PhoneBill implements Serializable {
@Id
private String phoneNumber;
@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
@OrderBy("datetime ASC")
private List<CallRecord> calls;
private BigDecimal amountBase;
private BigDecimal taxRate;
private BigDecimal tax;
private BigDecimal amountTotal;
public PhoneBill() { }
public PhoneBill(String number) {
this.phoneNumber = number;
calls = new ArrayList<>();
}
public void addCall(CallRecord call) {
calls.add(call);
}
public void calculate(BigDecimal taxRate) { ... }
/* ... Getters and setters ... */
}
The OneToMany
annotation defines the relationship between a bill and its call records.
The FetchType.EAGER
attribute specifies that the collection should be retrieved eagerly.
The CascadeType.PERSIST
attribute indicates that the elements in the call list should be automatically persisted when the phone bill is persisted.
The OrderBy
annotation defines an order for retrieving the elements of the call list from the database.
The batch artifacts use instances of these two entities as items to read, process, and write.
For more information on Jakarta Persistence, see Introduction to Jakarta Persistence. For more information on Jakarta JSON Processing, see JSON Processing.
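The JSON log entries shown earlier carry the call time and the call length as strings, so a constructor that accepts such an entry has to convert them. The following sketch shows one possible conversion using only the JDK. The CallFields class is hypothetical, the month/day/year reading of the date is an assumption, and the sketch uses java.time where the entity above uses java.util.Date. Extracting the string values from the JSON text is left to Jakarta JSON Processing and is not shown.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/* Hypothetical helper, not taken from the example application */
final class CallFields {

    /* Assumed layout of the "datetime" value, such as 03/01/2013 04:03 */
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu HH:mm");

    static LocalDateTime parseDatetime(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    /* Converts a "length" value such as 03:39 (minutes:seconds)
     * into a total number of seconds. */
    static int lengthInSeconds(String length) {
        String[] parts = length.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected mm:ss but got " + length);
        }
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }
}
```

The entity stores minutes and seconds in separate fields, which correspond to the two parts of the length string before they are combined here.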
The Call Records Chunk Step
The first step is composed of the CallRecordReader
, CallRecordProcessor
, and CallRecordWriter
batch artifacts.
The CallRecordReader
artifact reads call records from the log file:
@Dependent
@Named("CallRecordReader")
public class CallRecordReader implements ItemReader {
private ItemNumberCheckpoint checkpoint;
private String fileName;
private BufferedReader breader;
@Inject
JobContext jobCtx;
/* ... Override the open, close, readItem,
* and checkpointInfo methods ... */
}
The open
method reads the log_file_name
property and opens the log file with a buffered reader:
fileName = jobCtx.getProperties().getProperty("log_file_name");
breader = new BufferedReader(new FileReader(fileName));
If a checkpoint object is provided, the open
method advances the reader up to the last checkpoint.
Otherwise, this method creates a new checkpoint object.
The checkpoint object keeps track of the line number from the last committed chunk.
The readItem
method returns a new CallRecord
object or null at the end of the log file:
@Override
public Object readItem() throws Exception {
    /* Read a line from the log file and
     * create a CallRecord from JSON */
    String callEntryJson = breader.readLine();
    if (callEntryJson != null) {
        checkpoint.nextItem();
        return new CallRecord(callEntryJson);
    } else
        return null;
}
The CallRecordProcessor
artifact obtains the airtime price from the job properties, calculates the price of each call, and returns the call object.
This artifact overrides only the processItem
method.
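The price calculation itself is not listed here. As a sketch under stated assumptions, the following code charges the airtime price per minute, prorates the seconds, and rounds to two decimal places. The CallPricing class, the per-minute interpretation of the airtime_price property, and the rounding mode are all assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/* Hypothetical helper, not taken from the example application */
final class CallPricing {

    /* Computes pricePerMinute * (minutes + seconds / 60), rounded to
     * two decimal places. */
    static BigDecimal priceOf(int minutes, int seconds,
                              BigDecimal pricePerMinute) {
        BigDecimal totalSeconds = BigDecimal.valueOf(minutes * 60L + seconds);
        /* Multiply first and divide last to keep the division exact
         * until the final rounding step */
        return pricePerMinute.multiply(totalSeconds)
                .divide(BigDecimal.valueOf(60), 2, RoundingMode.HALF_EVEN);
    }
}
```

For example, a call of 3 minutes and 39 seconds at a price of 0.08 per minute comes to 0.292 before rounding, which this sketch rounds to 0.29.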
The CallRecordWriter artifact associates each call record with a bill and stores the bill in the database.
This artifact overrides the open, close, writeItems, and checkpointInfo methods.
The writeItems method looks like this:
@Override
public void writeItems(List<Object> callList) throws Exception {
    for (Object callObject : callList) {
        CallRecord call = (CallRecord) callObject;
        PhoneBill bill = em.find(PhoneBill.class, call.getFromNumber());
        if (bill == null) {
            /* No bill for this customer yet, create one */
            bill = new PhoneBill(call.getFromNumber());
            bill.addCall(call);
            em.persist(bill);
        } else {
            /* Add call to existing bill */
            bill.addCall(call);
        }
    }
}
The Phone Billing Chunk Step
The second step is composed of the BillReader, BillProcessor, BillWriter, and BillPartitionMapper batch artifacts.
This step gets the phone bills from the database, computes the tax and total amount due, and writes each bill to a text file.
Since the processing of each bill is independent of the others, this step can be partitioned and run in more than one thread.
The BillPartitionMapper artifact specifies the number of partitions and the parameters for each partition.
In this example, the parameters represent the range of items each partition should process.
The artifact obtains the number of bills in the database to calculate these ranges.
It provides a partition plan object that overrides the getPartitions and getPartitionProperties methods of the PartitionPlan interface.
The getPartitionProperties method looks like this:
@Override
public Properties[] getPartitionProperties() {
    /* Assign an (approximately) equal number of elements
     * to each partition. */
    long totalItems = getBillCount();
    long partItems = totalItems / getPartitions();
    long remItems = totalItems % getPartitions();

    /* Populate a Properties array. Each Properties element
     * in the array corresponds to a partition. */
    Properties[] props = new Properties[getPartitions()];
    for (int i = 0; i < getPartitions(); i++) {
        props[i] = new Properties();
        props[i].setProperty("firstItem",
                String.valueOf(i * partItems));
        /* Last partition gets the remainder elements */
        if (i == getPartitions() - 1) {
            props[i].setProperty("numItems",
                    String.valueOf(partItems + remItems));
        } else {
            props[i].setProperty("numItems",
                    String.valueOf(partItems));
        }
    }
    return props;
}
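To see the ranges this arithmetic produces, the same calculation can be run outside the batch runtime. The sketch below is a standalone illustration: the PartitionRangeSketch class and its ranges method are hypothetical, and the item and partition counts are passed in as arguments instead of coming from getBillCount and getPartitions.

```java
import java.util.Properties;

class PartitionRangeSketch {
    /* Splits totalItems into the given number of ranges (partitions > 0);
     * the last range also takes the remainder. */
    static Properties[] ranges(long totalItems, int partitions) {
        long partItems = totalItems / partitions;
        long remItems = totalItems % partitions;
        Properties[] props = new Properties[partitions];
        for (int i = 0; i < partitions; i++) {
            boolean last = (i == partitions - 1);
            long numItems = last ? partItems + remItems : partItems;
            props[i] = new Properties();
            props[i].setProperty("firstItem", String.valueOf(i * partItems));
            props[i].setProperty("numItems", String.valueOf(numItems));
        }
        return props;
    }

    public static void main(String[] args) {
        /* 10 bills in 3 partitions: prints "0 3", "3 3", "6 4" */
        for (Properties p : ranges(10, 3)) {
            System.out.println(p.getProperty("firstItem") + " "
                    + p.getProperty("numItems"));
        }
    }
}
```

With 10 bills and 3 partitions, the first two partitions process 3 bills each and the last one processes 4, so every bill belongs to exactly one partition.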
The BillReader artifact obtains the partition parameters as follows:
@Dependent
@Named("BillReader")
public class BillReader implements ItemReader {
    @Inject @BatchProperty(name = "firstItem")
    private String firstItemValue;
    @Inject @BatchProperty(name = "numItems")
    private String numItemsValue;
    private ItemNumberCheckpoint checkpoint;
    @PersistenceContext
    private EntityManager em;
    private Iterator iterator;

    @Override
    public void open(Serializable ckpt) throws Exception {
        /* Get the range of items to work on in this partition */
        long firstItem0 = Long.parseLong(firstItemValue);
        long numItems0 = Long.parseLong(numItemsValue);

        if (ckpt == null) {
            /* Create a checkpoint object for this partition */
            checkpoint = new ItemNumberCheckpoint();
            checkpoint.setItemNumber(firstItem0);
            checkpoint.setNumItems(numItems0);
        } else {
            checkpoint = (ItemNumberCheckpoint) ckpt;
        }

        /* Adjust range for this partition from the checkpoint */
        long firstItem = checkpoint.getItemNumber();
        long numItems = numItems0 - (firstItem - firstItem0);
        ...
    }
    ...
}
This artifact also obtains an iterator to read items from the Jakarta Persistence entity manager:
/* Obtain an iterator for the bills in this partition */
String query = "SELECT b FROM PhoneBill b ORDER BY b.phoneNumber";
Query q = em.createQuery(query)
        .setFirstResult((int) firstItem)
        .setMaxResults((int) numItems);
iterator = q.getResultList().iterator();
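The range adjustment in the open method is easiest to follow with concrete numbers. The following standalone sketch repeats only that arithmetic; the PartitionRestartSketch class and the remaining method are hypothetical names used for illustration.

```java
class PartitionRestartSketch {
    /* Items still to be read in a partition, given its original range
     * and the item number stored in the checkpoint. */
    static long remaining(long firstItem0, long numItems0,
                          long checkpointItem) {
        return numItems0 - (checkpointItem - firstItem0);
    }

    public static void main(String[] args) {
        /* The partition covers items 6 to 9; the job restarts at item 8,
         * so items 8 and 9 are left. Prints "2". */
        System.out.println(remaining(6, 4, 8));
        /* A fresh start: the checkpoint equals the first item. Prints "4". */
        System.out.println(remaining(6, 4, 6));
    }
}
```

The adjusted values play the role of firstItem and numItems in the query above, so a restarted partition skips the bills it had already processed.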
The BillProcessor artifact iterates over the list of call records in a bill and calculates the tax and total amount due for each bill.
The BillWriter artifact writes each bill to a plain text file.
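The kind of calculation the BillProcessor artifact performs can be sketched as follows. This is an illustration under stated assumptions rather than the example's source: the BillTotalSketch class, the flat tax rate, and the rounding to two decimal places are all hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

class BillTotalSketch {
    /* Sums the call prices and applies a flat tax rate.
     * Returns an array holding { subtotal, tax, total }. */
    static BigDecimal[] totals(List<BigDecimal> callPrices,
                               BigDecimal taxRate) {
        BigDecimal subtotal = BigDecimal.ZERO;
        for (BigDecimal price : callPrices) {
            subtotal = subtotal.add(price);
        }
        BigDecimal tax = subtotal.multiply(taxRate)
                .setScale(2, RoundingMode.HALF_EVEN);
        BigDecimal total = subtotal.add(tax)
                .setScale(2, RoundingMode.HALF_EVEN);
        return new BigDecimal[] { subtotal, tax, total };
    }

    public static void main(String[] args) {
        List<BigDecimal> prices = Arrays.asList(
                new BigDecimal("1.20"), new BigDecimal("0.80"));
        BigDecimal[] t = totals(prices, new BigDecimal("0.10"));
        /* prints "2.00 0.20 2.20" */
        System.out.println(t[0] + " " + t[1] + " " + t[2]);
    }
}
```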
The Jakarta Faces Pages
The index.xhtml page contains a text area that shows the log file of call records.
The page provides a button for the user to submit the batch job and navigate to the next page:
<body>
<h1>The Phone Billing Example Application</h1>
<h2>Log file</h2>
<p>The batch job analyzes the following log file:</p>
<textarea cols="90" rows="25"
readonly="true">#{jsfBean.createAndShowLog()}</textarea>
<p> </p>
<h:form>
<h:commandButton value="Start Batch Job"
action="#{jsfBean.startBatchJob()}" />
</h:form>
</body>
This page calls the methods of the managed bean to show the log file and submit the batch job.
The jobstarted.xhtml page provides a button to check the current status of the batch job and displays the bills when the job finishes:
<p>Current Status of the Job: <b>#{jsfBean.jobStatus}</b></p>
<h:dataTable var="_row" value="#{jsfBean.rowList}"
border="1" rendered="#{jsfBean.completed}">
<!-- ... show results from jsfBean.rowList ... -->
</h:dataTable>
<!-- Render the check status button if the job has not finished -->
<h:form>
<h:commandButton value="Check Status"
rendered="#{jsfBean.completed==false}"
action="jobstarted" />
</h:form>
The Managed Bean
The JsfBean managed bean submits the job to the batch runtime, checks on the status of the job, and reads the text files for each bill.
The startBatchJob method of the bean submits the job to the batch runtime:
/* Submit the batch job to the batch runtime.
 * JSF Navigation method (return the name of the next page) */
public String startBatchJob() {
    jobOperator = BatchRuntime.getJobOperator();
    execID = jobOperator.start("phonebilling", null);
    return "jobstarted";
}
The getJobStatus method of the bean checks the status of the job:
/* Get the status of the job from the batch runtime */
public String getJobStatus() {
    return jobOperator.getJobExecution(execID).getBatchStatus().toString();
}
The getRowList method of the bean creates the list of bills that the jobstarted.xhtml Faces page displays in a table.
Running the phonebilling Example Application
You can use either NetBeans IDE or Maven to build, package, deploy, and run the phonebilling example application.
To Run the phonebilling Example Application Using NetBeans IDE
-
Make sure that GlassFish Server has been started (see Starting and Stopping GlassFish Server).
-
From the File menu, choose Open Project.
-
In the Open Project dialog box, navigate to:
jakartaee-examples/tutorial/batch
-
Select the phonebilling folder.
-
Click Open Project.
-
In the Projects tab, right-click the phonebilling project and select Run.
This command builds and packages the application into a WAR file, phonebilling.war, located in the target/ directory; deploys it to the server; and launches a web browser window at the following URL:
http://localhost:8080/phonebilling/
To Run the phonebilling Example Application Using Maven
-
Make sure that GlassFish Server has been started (see Starting and Stopping GlassFish Server).
-
In a terminal window, go to:
jakartaee-examples/tutorial/batch/phonebilling/
-
Enter the following command to deploy the application:
mvn install
-
Open a web browser window at the following URL:
http://localhost:8080/phonebilling/
Further Information about Batch Processing
For more information on batch processing in Jakarta EE, see Jakarta Batch:
https://jakarta.ee/specifications/batch/2.0/