Libreoffice change csv encoding
Change default encoding of Excel to UTF-8?
I am using a variety of tools to regularly prepare data for the web. One stage requires me to open a CSV in Excel, make changes and save the file.
Is there a way to force Excel to accept UTF-8 encoding, and to save its files with that encoding?
- In the Registry Editor, go to HKEY_CURRENT_USER\Software\Microsoft\Office\[your Excel version, likely the highest number in this folder]\Excel\Options
- Right-click on the right-hand pane and choose New > DWORD
- Name the item DefaultCPG and save
- Right-click on DefaultCPG and choose Modify
- Set the base to Decimal
- Enter the code shown in the Excel import wizard (for UTF-8, it is 65001)
- Click OK.
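The same steps can be collapsed into a one-line registry command. A sketch using the Windows reg tool; the Office version 16.0 here is an assumption, so adjust it to match the folder you found above:

```
reg add "HKCU\Software\Microsoft\Office\16.0\Excel\Options" /v DefaultCPG /t REG_DWORD /d 65001 /f
```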
As Vasille says in the comment on this question, if your file is not actually in UTF-8 format, you may want to convert the characters within the file to the encoding you want before opening it in Excel. For my purposes, though, UTF-8 does a good enough job of displaying characters without corruption.
One easy way to change Excel's ANSI encoding to UTF-8 is to open the .csv file in Notepad, then select File > Save As. At the bottom you will see the encoding set to ANSI; change it to UTF-8, save the file under a new name, and you're done.
I tried to solve a similar problem before, without success. However, you can use LibreOffice, which uses UTF-8 by default.
There's an Excel addin available here to work with Unicode CSV files that should help you.
Here's the developer Jaimon Mathew's note:
Excel treats .csv files as text files and will replace all Unicode characters with “?” when saved normally. If you want to preserve the Unicode characters, you would need to save the file as “Unicode text (*.txt)”, which is a Tab delimited file. Since I couldn’t find any existing solution to preserve Unicode characters in CSV format, I thought I’ll give it a go in creating an Excel Addin to do just that.
It's not the best solution, but it's an option: upload your Excel file to Google Drive, open it with Google Sheets, and download it as a CSV file. It worked for me.
You need to use the File > Import option, start with a blank document, and specify UTF-8.
But this is far from optimal, and there is no way to make it a default setting for all files; anyway, it is unnecessary to route the files through Google Drive or LibreOffice. The defaults are just badly chosen, and the inability to change them is irritating.
The question "How to set character encoding when opening Excel" covers opening a single file, but does not offer an option to change the defaults so that all files are automatically opened as UTF-8 instead of Macintosh encoding on OS X.
Specify encoding with libreoffice --convert-to csv
Excel files can be converted to CSV using:
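The command itself is not reproduced above; a typical headless invocation looks like the sketch below. The file name data.xlsx is a placeholder, and the command -v guard is only there so the sketch degrades gracefully where LibreOffice is not installed:

```shell
# Convert an Excel workbook to CSV from the command line.
# "data.xlsx" is a placeholder; the CSV is written to the current directory.
if command -v libreoffice >/dev/null 2>&1; then
  libreoffice --headless --convert-to csv data.xlsx || true
else
  echo "LibreOffice is not installed; nothing to convert"
fi
```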
Everything appears to work just fine, except that the encoding is set to something wonky. Instead of the UTF-8 em dash (—) that I get when doing a "Save As" manually from LibreOffice Calc, it gives me \227 (�). Running the file utility on the CSV reports "Non-ISO extended-ASCII text, with very long lines". So, two questions:
- What on earth is happening here?
- How do I tell libreoffice to convert to UTF-8?
The specific file that I'm trying to convert is here.
Apparently LibreOffice tries to use ISO-8859-1 by default, which is what causes the problem. In response to this bug report, a new parameter, --infilter, has been added. The following command produces a proper U+2014 em dash:
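The command itself is missing above. Based on the bug report discussion, the invocation would look roughly like the sketch below; data.xlsx is a placeholder, and in LibreOffice's CSV filter options 44 is the comma separator, 34 the double-quote text delimiter, and 76 the UTF-8 character set:

```shell
# Force UTF-8 output by passing CSV filter options via --infilter
# (LibreOffice reuses these options for the CSV output filter).
if command -v libreoffice >/dev/null 2>&1; then
  libreoffice --headless --convert-to csv --infilter=CSV:44,34,76 data.xlsx || true
else
  echo "LibreOffice is not installed; nothing to convert"
fi
```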
I tested this with LO 5.0.3.2. From the bug report, it looks like the earliest version containing this option is LO 4.4.
How to change LibreOffice default text encoding?
I want to change the default text encoding used by LibreOffice when saving a document as a Text document. Where can I find this setting?
I want it to be UTF-8 WITHOUT the BOM, which I believe is called ASCII/US in LibreOffice.
I do know that there is a "Text encoded" option where you can (in theory, if it actually worked) choose the encoding of each plain-text file. I have three problems with this:
1 Answer
To show the encoding options dialog, go to File > Save As… and check "Edit filter settings".
To avoid the slowness of Save As…, you could use a macro like this:
Set it to a hotkey or toolbar button by going to Tools > Customize.
It could be modified to use a global variable and save to the previously used location.
UTF-8 WITHOUT the BOM, which I believe is called ASCII/US
No, this produces ASCII-encoded text, which will destroy most Unicode characters.
I do not see any filter options that can save without a BOM from LibreOffice. Instead, there are various command-line tools, such as iconv, that can remove the BOM.
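Because the UTF-8 BOM is always the three bytes EF BB BF, it can also be stripped with standard tools such as tail; a self-contained sketch (the file names are placeholders, and the sample file is fabricated first):

```shell
# Create a sample UTF-8 file that starts with a BOM (bytes EF BB BF).
printf '\357\273\277hello\n' > with_bom.txt
# Drop the first 3 bytes; only do this if the file really starts with a BOM.
tail -c +4 with_bom.txt > no_bom.txt
cat no_bom.txt
```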
If you have some time, the best solution may be to create a Python or Java macro that reads the Writer document and writes it to a file without the BOM. It could be done in perhaps 30 lines of Python, or about twice that in Java. Note: I would not recommend doing this in Basic, because of its poor file-handling functions.
Excel to CSV with UTF8 encoding
I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV, it mangles the "special" Spanish characters that aren't ASCII characters. It also does this with the left and right smart quotes and long dashes, which appear to come from the original user having created the Excel file on a Mac.
Since CSV is just a text file, I'm sure it can handle UTF-8 encoding, so I'm guessing this is an Excel limitation, but I'm looking for a way to get from Excel to CSV while keeping the non-ASCII characters intact.
36 Answers
A simple workaround is to use Google Spreadsheet. Paste the data (values only, if you have complex formulas) or import the sheet, then download it as CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful of sensitive data with Google Sheets.
EDIT: Another alternative: these basically use a VBA macro or add-ins to force the save as UTF-8. I have not tried any of these solutions, but they sound reasonable.
I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.
In the "Save As…" dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way, AFAIK.
Save the Excel sheet as "Unicode Text (*.txt)". The good news is that all the international characters are preserved, in UTF-16 (note: not UTF-8). However, the new *.txt file is tab-delimited, not comma-delimited, and therefore not a true CSV.
(Optional) Unless you can use a tab-delimited file for import, use your favorite text editor to replace the tab characters with commas.
Import your *.txt file into the target application. Make sure it can accept the UTF-16 format.
Since UTF-16, when properly implemented with support for non-BMP code points, can represent everything Unicode can, you can also convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
I use this procedure to import data from Excel to Moodle.
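The procedure above, minus the Excel export itself, can also be done in one pass with command-line tools. A sketch that first fabricates a UTF-16 tab-delimited file like Excel's "Unicode Text" export (and assumes no field itself contains a comma, tab, or quote):

```shell
# Fabricate a sample UTF-16 tab-delimited file, standing in for
# Excel's "Unicode Text (*.txt)" export.
printf 'name\tcity\nJosé\tMálaga\n' | iconv -f utf-8 -t utf-16 > export.txt
# Convert UTF-16 -> UTF-8 and tabs -> commas to get a simple CSV.
iconv -f utf-16 -t utf-8 export.txt | tr '\t' ',' > export.csv
cat export.csv
```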
I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.
Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.
Fortunately, I have found that the lost-character issue only happens (in my case) when saving from the xlsx format to the csv format. I tried saving the xlsx file to xls first, and then to csv. It actually worked.
Please give it a try and see if it works for you. Good luck.
You can use the iconv command under Unix (also available on Windows as libiconv).
After saving as CSV from Excel, run the following on the command line:
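The command itself was dropped above; a typical invocation looks like this. To keep the sketch self-contained, a sample CP1250 file is fabricated first (the file names are placeholders):

```shell
# Fabricate a CP1250-encoded sample file, then re-encode it to UTF-8.
printf 'Zażółć gęślą jaźń\n' | iconv -f utf-8 -t cp1250 > input.csv
iconv -f cp1250 -t utf-8 input.csv > output.csv
cat output.csv
```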
(remember to replace cp1250 with your encoding).
It works fast and is great for big files, such as a postcode database, which cannot be imported into Google Docs (400,000-cell limit).
The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is actually stored in the Excel .csv file.
(1) Open an Excel file where you have the info (.xls, .xlsx).
(2) In Excel, choose "CSV (Comma Delimited) (*.csv)" as the file type and save as that type.
(3) In Notepad (found under Programs > Accessories in the Start menu), open the saved .csv file.
(4) Then choose File > Save As, and at the bottom of the Save As box there is a select box labelled "Encoding". Select UTF-8 (do NOT use ANSI, or you lose all accents etc.). After selecting UTF-8, save the file under a slightly different name from the original.
This file is in UTF-8, retains all characters and accents, and can be imported, for example, into MySQL and other database programs.
This answer is taken from this forum.
Another one I've found useful: "Numbers" allows encoding-settings when saving as CSV.
"nevets1219" is right about Google Docs; however, if you simply "import" the file, it often does not convert it to UTF-8.
But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.
- On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
- From the "File" menu choose "Import"
- Click "Choose File"
- Choose "Replace spreadsheet"
- Choose whichever character you are using as a Separator
- Click "Import"
- From the "File" menu choose "Download as" -> CSV (current sheet)
The resulting file will be in UTF-8
You can do this on a modern Windows machine without third-party software. This method is reliable and handles data that includes quoted commas, quoted tab characters, CJK characters, and so on.
1. In Excel, save the data to file.txt using the type "Unicode Text (*.txt)".
2. Run PowerShell from the Start menu.
3. Load the file in PowerShell.
For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.
Do the conversion:
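A sketch of that conversion (legacy.xls is a placeholder file name; the guard only makes the sketch safe to run where catdoc is not installed):

```shell
# xls2csv ships with the catdoc package; -d sets the destination charset.
if command -v xls2csv >/dev/null 2>&1; then
  xls2csv -d utf-8 legacy.xls > legacy.csv || true
else
  echo "catdoc/xls2csv is not installed"
fi
```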
This is blazing fast.
Note that it's important to include the -d utf-8 flag; otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.
Note also that xls2csv only works with .xls files; it does not work with .xlsx files.
What about using PowerShell?
I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.
So I finally had to give up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.
Assuming a Windows environment, save and work with the file as usual in Excel, but then open the saved Excel file in Gnome Gnumeric (free). Save Gnumeric's spreadsheet as CSV, which (for me, anyway) saves it as UTF-8 CSV.
Save the .xls file (Excel file) as Unicode Text; the file will be saved in text format (.txt).
Change the format from .txt to .csv (rename the file from XYX.txt to XYX.csv).
As funny as it may seem, the easiest way I found to save my 180 MB spreadsheet as a UTF-8 CSV file was to select the cells in Excel, copy them, and paste the contents of the clipboard into SublimeText.
A second option to "nevets1219"'s is to open your CSV file in Notepad++ and do a conversion to ANSI.
Choose from the top menu: Encoding > Convert to ANSI.
Microsoft Excel has an option to export spreadsheet using Unicode encoding. See following screenshot.
Easiest way: no need for OpenOffice or Google Docs.
- Save your file as a "Unicode text file";
- now you have a Unicode (UTF-16) text file;
- open it with Notepad and "Save As" it, selecting "UTF-8" or whichever other code page you want;
- rename the file extension from "txt" to "csv".
Don't open it with MS Office again. You now have a tab-delimited CSV file.
I have written a small Python script that can export worksheets in UTF-8.
You just have to provide the Excel file as first parameter followed by the sheets that you would like to export. If you do not provide the sheets, the script will export all worksheets that are present in the Excel file.
Encoding > Convert to ANSI will encode it as ANSI, not UTF-8. UTF-8 is an encoding of Unicode. Perhaps ANSI will encode it correctly in some cases, but here we are talking about UTF-8, @SequenceDigitale.
There are faster ways, like exporting as CSV (comma delimited) and then opening that CSV with Notepad++ (free) and doing Encoding > Convert to UTF-8. But that only makes sense if you have to do this once per file. If you need to change and export frequently, the best option is the LibreOffice or Google Docs solution.
Open the .csv file with Notepad++. If you see that your encoding is good (you see all characters as they should be), press Encoding, then Convert to ANSI; otherwise, find out what your current encoding is first.
Another solution is to open the file in WinWord and save it as a .txt file, then reopen it with Excel, and it will work.
I came across the same problem and googled my way to this post. None of the above worked for me. In the end I converted my Unicode .xls to .xml (choose Save As > XML Spreadsheet 2003) and it produced the correct characters. Then I wrote code to parse the XML and extract the content for my use.
I used the following solution: in Mac Excel 2008, go to File > Save As and then, under Format, use "MS-DOS Comma Separated (.csv)". Worked perfectly.
Another way is to open the UTF-8 CSV file in Notepad, where it will be displayed correctly. Then replace all the commas with tabs and paste all of that into a new Excel file.
I had the same problem and came across this add-in, and it works perfectly fine in Excel 2013, besides the Excel 2007 and 2010 for which it is advertised.
Save Dialog > Tools Button > Web Options > Encoding Tab
This is the PHP script (process.php):
And this is the shell command I used to convert the HTML documents to csv:
This is a really, really roundabout way of doing this, but it was the most reliable method that I found.
I use a program that I found on the net (not mine, so no credit to me), but it works flawlessly.
You can export to UTF-8 or import a UTF-8 file, manage it in Excel, then re-export it as UTF-8.