While working with U-SQL language, I noticed that there are few ways of writing data to an output file.
Let's assume the SalesDetails.csv file in the Azure Data Lake Store includes the Count and the Unit Price of an item. We want to calculate the Total Sale value and save it to the TotalSales.csv file, which will be stored in the Azure Data Lake Store.
DECLARE @inputFilePath string="/InputFiles/SalesDetails.csv";
DECLARE @outputFilePath string="/OutputFiles/TotalSales.csv";
@salesDetails=EXTRACT ItemId Int, Item string, UnitPrice decimal, Count int FROM @inputFilePath USING Extractors.Csv(skipFirstNRows:1);
To get the Total Sales values, we have to multiply Unit Price with the Count in the @salesDetails Rowset. But if the @salesDetails Rowset contains the column headers, it will cause issue while multiplying. Therefore to skip the column headers, skipFirstNRows:1 is used.
Below transformation will calculate the Total Sales value using the @salesDetails Rowset
@totalSales= SELECT ItemId AS ItemId, Item AS Item, UnitPrice AS UnitPrice, Count AS Count, UnitPrice*Count AS TotalSales FROM @salesDetails;
In the above, alias for each column is added since I need to add the column names for the output file.
Then the last step is to write the transformed data to the output file with column headers.
OUTPUT @totalSales
TO @outputFilePath
USING Outputters.Csv(outputHeader:true);
outputHeader:true is used to write the column headers to the output file. Anyway if you just only use the above statement, it will quote the header values.
To avoid it, the below statement can be used.
OUTPUT @totalSales
TO @outputFilePath
USING Outputters.Csv(outputHeader:true,quoting:false);
Hope this will be helpful...
Let's assume the SalesDetails.csv file in the Azure Data Lake Store includes the Count and the Unit Price of an item. We want to calculate the Total Sale value and save it to the TotalSales.csv file, which will be stored in the Azure Data Lake Store.
DECLARE @inputFilePath string="/InputFiles/SalesDetails.csv";
DECLARE @outputFilePath string="/OutputFiles/TotalSales.csv";
@salesDetails=EXTRACT ItemId Int, Item string, UnitPrice decimal, Count int FROM @inputFilePath USING Extractors.Csv(skipFirstNRows:1);
To get the Total Sales values, we have to multiply Unit Price with the Count in the @salesDetails Rowset. But if the @salesDetails Rowset contains the column headers, it will cause issue while multiplying. Therefore to skip the column headers, skipFirstNRows:1 is used.
Below transformation will calculate the Total Sales value using the @salesDetails Rowset
@totalSales= SELECT ItemId AS ItemId, Item AS Item, UnitPrice AS UnitPrice, Count AS Count, UnitPrice*Count AS TotalSales FROM @salesDetails;
In the above, alias for each column is added since I need to add the column names for the output file.
Then the last step is to write the transformed data to the output file with column headers.
OUTPUT @totalSales
TO @outputFilePath
USING Outputters.Csv(outputHeader:true);
outputHeader:true is used to write the column headers to the output file. Anyway if you just only use the above statement, it will quote the header values.
To avoid it, the below statement can be used.
OUTPUT @totalSales
TO @outputFilePath
USING Outputters.Csv(outputHeader:true,quoting:false);
Hope this will be helpful...
No comments:
Post a Comment