To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:
split [options] filename prefix
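Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files; split appends a suffix (aa, ab, ac, and so on) to it, and uses x if you omit prefix. You can omit [options], or replace it with either of the following:
-l linenumber
-b bytes
With -l (a lowercase L), each output file contains linenumber lines (the default is 1,000). With -b, each output file contains bytes bytes.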
In this simple example, assume myfile is 3,000 lines long:
split myfile
This will output three 1000-line files: xaa, xab, and xac.
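The default behavior can be checked with a short sketch like the one below. It assumes the seq utility is available to generate a 3,000-line stand-in for myfile, and it works in a scratch directory created with mktemp; both are illustrative choices and are not part of the original example.

```shell
# Work in a scratch directory so the output files do not clutter the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 3,000-line stand-in for myfile (one number per line).
seq 1 3000 > myfile

# No options and no prefix: 1,000 lines per file, default prefix "x".
split myfile

# Show the line count of each piece.
wc -l xaa xab xac
```

Each of the three pieces should report 1,000 lines, and the original myfile is left untouched.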
Working on the same file, this next example is more complex:
split -l 500 myfile segment
This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.
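As a rough illustration, the following sketch runs the -l form on a generated 3,000-line file (called myfile here) and then reassembles the pieces with cat to confirm that nothing was lost. The scratch directory, the use of seq, and the output name rejoined are assumptions made for this example.

```shell
# Work in a scratch directory with a generated 3,000-line input file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 3000 > myfile

# 500 lines per piece, with "segment" as the output prefix.
split -l 500 myfile segment

# List the pieces: segmentaa through segmentaf.
ls segment*

# The shell expands segment* in sorted order, so cat restores the original sequence.
cat segment* > rejoined
cmp myfile rejoined && echo "rejoined matches myfile"
```

Because the suffixes sort alphabetically, concatenating the pieces in glob order is one simple way to undo the split.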
The following Job iterates over a list of files, merges their content, and displays the resulting 2-column content on the console.
Dropping and linking the components
Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and tLogRow.
Connect the tFileList to the tFileInputDelimited using an Iterate connection, and connect the other components using Row Main links.
Configuring the components
In the tFileList Basic settings view, browse to the directory where the files to merge are stored.
The files are pretty basic and contain a list of countries and their respective scores.
In the Case Sensitive field, select Yes to consider the letter case.
Select the tFileInputDelimited component, and display this component's Basic settings view.
Fill in the File Name/Stream field: press Ctrl+Space to open the variable completion list, then select tFileList.CURRENT_FILEPATH from the global variable list so that every file in the directory defined in the tFileList is processed.
Click the Edit Schema button and manually set the 2-column schema to reflect the content of the input files.
For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.
Click OK to validate the settings, and accept the prompt to propagate the schema throughout the Job.
Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.
In the Basic settings view of tLogRow, select the Table option to display the output values properly.
Saving and executing the Job
Press Ctrl+S to save your Job.
Press F6, or click Run on the Run console to execute the Job.
The console shows the data from the various files, merged into one single table.
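For readers who want to see the same data flow outside the Studio, the sketch below imitates it in the shell. It is only a conceptual analog, not what the Job generates: the two input files, their names, their contents, and the semicolon delimiter are all hypothetical and were chosen for illustration.

```shell
# Scratch directory with two hypothetical semicolon-delimited input files.
indir=$(mktemp -d)
printf 'France;12\nItaly;9\n' > "$indir/file1.csv"
printf 'Spain;15\nJapan;7\n' > "$indir/file2.csv"

# Iterate over the files (the tFileList role), read each one
# (tFileInputDelimited), and append its rows to one stream (tUnite).
merged=$(
  for f in "$indir"/*.csv; do
    cat "$f"
  done
)

# Print the merged rows as a 2-column table (the tLogRow Table role).
printf '%-10s %s\n' Country Points
printf '%s\n' "$merged" | awk -F';' '{ printf "%-10s %s\n", $1, $2 }'
```

The loop plays the part of the Iterate connection: each pass handles one file, and the rows from every pass end up in the same output.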