Spring Batch 示例
欢迎使用 Spring Batch 示例。Spring Batch 是用于执行批处理作业的Spring 框架模块。我们可以使用 Spring Batch 来处理一系列作业。
Spring Batch 示例
在介绍 spring batch 示例程序之前,让我们先了解一下spring batch术语。
- 一个作业可以由“n”个步骤组成。每个步骤包含读取-处理-写入任务,也可以包含单个操作,称为 tasklet。
- 读取-处理-写入基本上是从数据库、CSV 等源读取,然后处理数据并将其写入数据库、CSV、XML 等源。
- Tasklet 意味着执行单个任务或操作,例如清理连接、处理完成后释放资源。
- 读取-处理-写入和小任务可以链接在一起来运行一项作业。
Spring Batch 示例
让我们考虑一个用于 Spring Batch 实现的工作示例。我们将考虑以下场景以实现目的。包含数据的 CSV 文件需要转换为 XML,并且数据和标签将以列名命名。以下是用于 Spring Batch 示例的重要工具和库。
- Apache Maven 3.5.0 - 用于项目构建和依赖项管理。
- Eclipse Oxygen Release 4.7.0——用于创建 spring batch maven 应用程序的 IDE。
- Java 1.8
- Spring Core 4.3.12.发布
- Spring OXM 4.3.12.发布
- Spring JDBC 4.3.12.发布
- Spring Batch 3.0.8.发布
- MySQL Java Driver 5.1.25 - 根据你的 MySQL 安装使用。这是 Spring Batch 元数据表所必需的。
Spring Batch 示例目录结构
下图说明了我们的 Spring Batch 示例项目中的所有组件。
Spring Batch Maven 依赖项
以下是 pom.xml 文件的内容,其中包含我们的 spring batch 示例项目所需的所有依赖项。
<project xmlns="https://maven.apache.org/POM/4.0.0" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.journaldev.spring</groupId>
<artifactId>SpringBatchExample</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>SpringBatchDemo</name>
<url>https://maven.apache.org</url>
<properties>
<jdk.version>1.8</jdk.version>
<spring.version>4.3.12.RELEASE</spring.version>
<spring.batch.version>3.0.8.RELEASE</spring.batch.version>
<mysql.driver.version>5.1.25</mysql.driver.version>
<junit.version>4.11</junit.version>
</properties>
<dependencies>
<!-- Spring Core -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>${spring.version}</version>
</dependency>
<!-- Spring jdbc, for database -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-jdbc</artifactId>
<version>${spring.version}</version>
</dependency>
<!-- Spring XML to/back object -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-oxm</artifactId>
<version>${spring.version}</version>
</dependency>
<!-- MySQL database driver -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.driver.version}</version>
</dependency>
<!-- Spring Batch dependencies -->
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-core</artifactId>
<version>${spring.batch.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-infrastructure</artifactId>
<version>${spring.batch.version}</version>
</dependency>
<!-- Spring Batch unit test -->
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<version>${spring.batch.version}</version>
</dependency>
<!-- Junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.thoughtworks.xstream</groupId>
<artifactId>xstream</artifactId>
<version>1.4.10</version>
</dependency>
</dependencies>
<build>
<finalName>spring-batch</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>2.9</version>
<configuration>
<downloadSources>true</downloadSources>
<downloadJavadocs>false</downloadJavadocs>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>${jdk.version}</source>
<target>${jdk.version}</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
Spring 批处理 CSV 输入文件
这是我们用于 Spring 批处理的示例 CSV 文件的内容。
1001,Tom,Moody, 29/7/2013
1002,John,Parker, 30/7/2013
1003,Henry,Williams, 31/7/2013
Spring Batch 作业配置
我们必须在配置文件中定义spring bean和 spring batch job。下面是文件的内容job-batch-demo.xml
,它是 spring batch 项目中最重要的部分。
<beans xmlns="https://www.springframework.org/schema/beans"
xmlns:batch="https://www.springframework.org/schema/batch" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.springframework.org/schema/batch
https://www.springframework.org/schema/batch/spring-batch-3.0.xsd
https://www.springframework.org/schema/beans
https://www.springframework.org/schema/beans/spring-beans-4.3.xsd
">
<import resource="../config/context.xml" />
<import resource="../config/database.xml" />
<bean id="report" class="com.journaldev.spring.model.Report"
scope="prototype" />
<bean id="itemProcessor" class="com.journaldev.spring.CustomItemProcessor" />
<batch:job id="DemoJobXMLWriter">
<batch:step id="step1">
<batch:tasklet>
<batch:chunk reader="csvFileItemReader" writer="xmlItemWriter"
processor="itemProcessor" commit-interval="10">
</batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="csvFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="classpath:csv/input/report.csv" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="id,firstname,lastname,dob" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.journaldev.spring.ReportFieldSetMapper" />
<!-- if no data type conversion, use BeanWrapperFieldSetMapper to map
by name <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="report" /> </bean> -->
</property>
</bean>
</property>
</bean>
<bean id="xmlItemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
<property name="resource" value="file:xml/outputs/report.xml" />
<property name="marshaller" ref="reportMarshaller" />
<property name="rootTagName" value="report" />
</bean>
<bean id="reportMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="classesToBeBound">
<list>
<value>com.journaldev.spring.model.Report</value>
</list>
</property>
</bean>
</beans>
- 我们正在使用
FlatFileItemReader
读取 CSV 文件、CustomItemProcessor
处理数据并使用写入 XML 文件StaxEventItemWriter
。 batch:job
– 此标签定义我们要创建的作业。Id 属性指定作业的 ID。我们可以在单个 xml 文件中定义多个作业。- batch:step –此标签用于定义 spring batch 作业的不同步骤。
- Spring Batch Framework 提供了两种不同类型的处理风格,即“面向 TaskletStep”和“面向块”。本例中使用的面向块风格是指在事务边界内逐个读取数据并创建将要写出的“块”。
- reader: spring bean used for reading the data. We have used
csvFileItemReader
bean in this example that is instance ofFlatFileItemReader
. - processor: this is the class which is used for processing the data. We have used
CustomItemProcessor
in this example. - writer: bean used to write data into xml file.
- commit-interval: This property defines the size of the chunk which will be committed once processing is done. Basically it means that ItemReader will read the data one by one and ItemProcessor will also process it the same way but ItemWriter will write the data only when it equals the size of commit-interval.
- Three important interface that are used as part of this project are
ItemReader
,ItemProcessor
andItemWriter
fromorg.springframework.batch.item
package.
Spring Batch Model Class
First of all we are reading CSV file into java object and then using JAXB to write it to xml file. Below is our model class with required JAXB annotations.
package com.journaldev.spring.model;
import java.util.Date;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
@XmlRootElement(name = "record")
public class Report {
private int id;
private String firstName;
private String lastName;
private Date dob;
@XmlAttribute(name = "id")
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
@XmlElement(name = "firstname")
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
@XmlElement(name = "lastname")
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
@XmlElement(name = "dob")
public Date getDob() {
return dob;
}
public void setDob(Date dob) {
this.dob = dob;
}
@Override
public String toString() {
return "Report [id=" + id + ", firstname=" + firstName + ", lastName=" + lastName + ", DateOfBirth=" + dob
+ "]";
}
}
Note that the model class fields should be same as defined in the spring batch mapper configuration i.e. property name="names" value="id,firstname,lastname,dob"
in our case.
Spring Batch FieldSetMapper
A custom FieldSetMapper
is needed to convert a Date. If no data type conversion is required, then only BeanWrapperFieldSetMapper
should be used to map the values by name automatically. The java class which extends FieldSetMapper
is ReportFieldSetMapper
.
package com.journaldev.spring;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;
import com.journaldev.spring.model.Report;
public class ReportFieldSetMapper implements FieldSetMapper<Report> {
private SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy");
public Report mapFieldSet(FieldSet fieldSet) throws BindException {
Report report = new Report();
report.setId(fieldSet.readInt(0));
report.setFirstName(fieldSet.readString(1));
report.setLastName(fieldSet.readString(2));
// default format yyyy-MM-dd
// fieldSet.readDate(4);
String date = fieldSet.readString(3);
try {
report.setDob(dateFormat.parse(date));
} catch (ParseException e) {
e.printStackTrace();
}
return report;
}
}
Spring Batch Item Processor
Now as defined in the job configuration an itemProcessor will be fired before itemWriter. We have created a CustomItemProcessor.java
class for the same.
package com.journaldev.spring;
import org.springframework.batch.item.ItemProcessor;
import com.journaldev.spring.model.Report;
public class CustomItemProcessor implements ItemProcessor<Report, Report> {
public Report process(Report item) throws Exception {
System.out.println("Processing..." + item);
String fname = item.getFirstName();
String lname = item.getLastName();
item.setFirstName(fname.toUpperCase());
item.setLastName(lname.toUpperCase());
return item;
}
}
We can manipulate data in ItemProcessor implementation, as you can see that I am converting first name and last name values to upper case.
Spring Configuration Files
In our spring batch configuration file, we have imported two additional configuration files - context.xml
and database.xml
.
<beans xmlns="https://www.springframework.org/schema/beans"
xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
https://www.springframework.org/schema/beans
https://www.springframework.org/schema/beans/spring-beans-4.3.xsd">
<!-- stored job-meta in memory -->
<!--
<bean id="jobRepository"
class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
<property name="transactionManager" ref="transactionManager" />
</bean>
-->
<!-- stored job-meta in database -->
<bean id="jobRepository"
class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="transactionManager" ref="transactionManager" />
<property name="databaseType" value="mysql" />
</bean>
<bean id="transactionManager"
class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<bean id="jobLauncher"
class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
</bean>
</beans>
- jobRepository - The JobRepository is responsible for storing each Java object into its correct meta-data table for spring batch.
- transactionManager- this is responsible for committing the transaction once size of commit-interval and the processed data is equal.
- jobLauncher – This is the heart of spring batch. This interface contains the run method which is used to trigger the job.
<beans xmlns="https://www.springframework.org/schema/beans"
xmlns:jdbc="https://www.springframework.org/schema/jdbc" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.springframework.org/schema/beans
https://www.springframework.org/schema/beans/spring-beans-4.3.xsd
https://www.springframework.org/schema/jdbc
https://www.springframework.org/schema/jdbc/spring-jdbc-4.3.xsd">
<!-- connect to database -->
<bean id="dataSource"
class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName" value="com.mysql.jdbc.Driver" />
<property name="url" value="jdbc:mysql://localhost:3306/Test" />
<property name="username" value="test" />
<property name="password" value="test123" />
</bean>
<bean id="transactionManager"
class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<!-- create job-meta tables automatically -->
<!-- <jdbc:initialize-database data-source="dataSource"> <jdbc:script location="org/springframework/batch/core/schema-drop-mysql.sql"
/> <jdbc:script location="org/springframework/batch/core/schema-mysql.sql"
/> </jdbc:initialize-database> -->
</beans>
Spring Batch uses some metadata tables to store batch jobs information. We can get them created from spring batch configurations but it’s advisable to do it manually by executing the SQL files, as you can see in commented code above. From security point of view, it’s better to not give DDL execution access to spring batch database user.
Spring Batch Tables
Spring Batch tables very closely match the Domain objects that represent them in Java. For example - JobInstance, JobExecution, JobParameters and StepExecution map to BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS and BATCH_STEP_EXECUTION respectively. ExecutionContext maps to both BATCH_JOB_EXECUTION_CONTEXT and BATCH_STEP_EXECUTION_CONTEXT. The JobRepository is responsible for saving and storing each java object into its correct table. Below are the details of each meta-data table.
- Batch_job_instance: The BATCH_JOB_INSTANCE table holds all information relevant to a JobInstance.
- Batch_job_execution_params: The BATCH_JOB_EXECUTION_PARAMS table holds all information relevant to the JobParameters object.
- Batch_job_execution: The BATCH_JOB_EXECUTION table holds data relevant to the JobExecution object. A new row gets added every time a Job is run.
- Batch_step_execution: The BATCH_STEP_EXECUTION table holds all information relevant to the StepExecution object.
- Batch_job_execution_context: The BATCH_JOB_EXECUTION_CONTEXT table holds data relevant to an Job’s ExecutionContext. There is exactly one Job ExecutionContext for every JobExecution, and it contains all of the job-level data that is needed for that particular job execution. This data typically represents the state that must be retrieved after a failure so that a JobInstance can restart from where it had failed.
- Batch_step_execution_context: The BATCH_STEP_EXECUTION_CONTEXT table holds data relevant to an Step’s ExecutionContext. There is exactly one ExecutionContext for every StepExecution, and it contains all of the data that needs to persisted for a particular step execution. This data typically represents the state that must be retrieved after a failure so that a JobInstance can restart from where it failed.
- Batch_job_execution_seq: This table holds the data execution sequence of job.
- Batch_step_execution_seq: This table holds the data for sequence for step execution.
- Batch_job_seq: This table holds the data for sequence of job in case we have multiple jobs we will get multiple rows.
Spring Batch Test Program
Our Spring Batch example project is ready, final step is to write a test class to execute it as a java program.
package com.journaldev.spring;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;
public class App {
public static void main(String[] args) {
String[] springConfig = { "spring/batch/jobs/job-batch-demo.xml" };
ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(springConfig);
JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
Job job = (Job) context.getBean("DemoJobXMLWriter");
JobParameters jobParameters = new JobParametersBuilder().addLong("time", System.currentTimeMillis())
.toJobParameters();
try {
JobExecution execution = jobLauncher.run(job, jobParameters);
System.out.println("Exit Status : " + execution.getStatus());
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Done");
context.close();
}
}
Just run above program and you will get output xml like below.
<?xml version="1.0" encoding="UTF-8"?><report><record id="1001"><dob>2013-07-29T00:00:00+05:30</dob><firstname>TOM</firstname><lastname>MOODY</lastname></record><record id="1002"><dob>2013-07-30T00:00:00+05:30</dob><firstname>JOHN</firstname><lastname>PARKER</lastname></record><record id="1003"><dob>2013-07-31T00:00:00+05:30</dob><firstname>HENRY</firstname><lastname>WILLIAMS</lastname></record></report>
That’s all for Spring Batch example, you can download final project from below link.
Download Spring Batch Example Project
Reference: Official Guide