When Samba runs on a Unix machine so that Windows machines can store their files on a Unix network share, character encoding problems can sometimes prevent files from being backed up properly.
The default encoding for Samba differs between version 2 and version 3. The version 2 default seems to be whatever the client (Windows) uses, possibly CP850 or ISO8859-1. The version 3 default encoding is UTF8, although it may be set up to use a different encoding.
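To see why the encoding matters, the sketch below prints the bytes that one accented character occupies in each of the three encodings mentioned above. It assumes `iconv` and `od` are installed; the helper name `show_bytes` is made up for this illustration.

```shell
# Hypothetical helper: print the hex bytes of a UTF-8 string after
# converting it to the given target encoding.
show_bytes() {
    # $1 = target encoding, $2 = text (supplied as UTF-8)
    printf '%s' "$2" | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

e_acute=$(printf '\303\251')       # the UTF-8 bytes for "é"

show_bytes UTF-8      "$e_acute"   # c3a9 (two bytes)
show_bytes ISO-8859-1 "$e_acute"   # e9   (one byte)
show_bytes CP850      "$e_acute"   # 82   (one byte)
```

A filename written to disk in one of these encodings and read back under another contains bytes that do not mean what the reader expects, which is how a file can be listed and still not be found.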
For the Server Edition client to be able to read the files properly, the filenames must be stored in UTF8 format. For Samba to store the files in UTF8 format, the following setting must be present in smb.conf:
    unix charset = UTF8
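The parameter is a global one, so it belongs in the [global] section of smb.conf. As a quick check of what a given file currently sets, something like the following could be used. This is only a sketch: `get_unix_charset` is a hypothetical helper, it reads a single file and does not follow include directives, and the path in the usage line is an assumption. Samba's own `testparm` tool remains the better way to see the configuration as Samba actually parses it.

```shell
# Hypothetical helper: report the last "unix charset" value set in an
# smb.conf-style file, or "(not set)" if the parameter is absent.
get_unix_charset() {
    awk -F= '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            key = tolower($1)
            gsub(/[ \t]/, "", key)             # names ignore case and spaces
            if (key == "unixcharset") {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val
            }
        }
        END { print (found != "" ? found : "(not set)") }
    ' "$1"
}

# Usage, assuming the configuration lives at /etc/samba/smb.conf:
# get_unix_charset /etc/samba/smb.conf
```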
Instructions to change and test
- Before making any changes to Samba, first create a new folder on the Unix box from a Windows client
- Create some new files in there from a Windows box, including some with accented characters such as é and ü in the name, e.g. tést.txt
- Open the backup client on the Unix box and include only the new folder for backup
- Although the files may be displayed properly, the native code that reads the files for backup may not work yet. Start a backup and look at the logs for any errors
- If there are errors like "Warning:07:27:43 Failed to read security descriptor for file:/home/.../A<garbled characters>ro.txt, reason: No such file or directory", then you will need to make some changes to your SMB configuration.
- Edit smb.conf and add the setting as shown above.
- Restart Samba
- Refresh the directory on the Windows box. The filenames will probably be corrupt now - do not be alarmed.
- Create a new file again containing accented characters in the name
- Do another backup and ensure the file is backed up now by looking at the log.
- The other files may still fail to back up.
- If the new file still does not back up at this point, you may need to repeat the steps from editing smb.conf through the second backup, using a different encoding such as ISO8859-1 or CP850
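While working through the steps above, it can help to check directly on the Unix box whether the names in the test folder are stored as UTF8. The sketch below is one way to do that, assuming `iconv` is installed; `is_utf8` and `list_non_utf8_names` are hypothetical helper names, and the folder path in the usage line is only an example.

```shell
# Hypothetical helper: succeed if the given string is valid UTF-8.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical helper: list entries in a directory whose names are not
# valid UTF-8.
list_non_utf8_names() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue          # directory empty or glob unmatched
        name=${path##*/}
        is_utf8 "$name" || printf 'not UTF-8: %s\n' "$path"
    done
}

# Usage, with an example path:
# list_non_utf8_names /shares/testfolder
```

Plain ASCII names always pass this check, so only names containing accented characters say anything about the encoding in use.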
Converting existing files
Once you can back up a newly created file, Samba is configured correctly. However, all the files already on the share are still encoded with the original encoding, and you may want to convert them to UTF8 (or whatever the correct encoding is); otherwise they will no longer display correctly in Windows. This can be done with a utility like convmv. Look at the man page for examples. The recommended steps are:
- Delete files from the directory created above until only the original files that did not work (which are encoded the same way as all the other files on the share) are left in the folder. The filenames will probably be corrupt now in Windows. This is to be expected.
- Now run the conversion tool ONLY on your test folder, e.g.
convmv -f cp850 -t utf8 --notest /shares/testfolder/* (this assumes the original encoding was cp850)
- Ensure the files are now correctly visible in Windows. If not, run the reverse of the command, e.g.
convmv -f utf8 -t cp850 --notest /shares/testfolder/*
and try the conversion step again with a different encoding specified after -f (e.g. iso8859-1)
- Once the files are visible in Windows, ensure they can be backed up
- Delete the already converted files (to prevent double conversion) and only now convert the complete share using the convmv tool.
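The warning about double conversion can be illustrated without touching any files. The sketch below uses `iconv` to show what a name becomes when it is converted once and then, by mistake, a second time. It assumes the original encoding was CP850 and uses a made-up filename; `preview_as_utf8` is a hypothetical helper, not part of convmv.

```shell
# Hypothetical helper: show what a name would become if its bytes were
# read as encoding $1 and rewritten as UTF-8.
preview_as_utf8() {
    printf '%s' "$2" | iconv -f "$1" -t UTF-8
    echo
}

cp850_name=$(printf 't\202st.txt')        # "tést.txt" stored as CP850 bytes

once=$(preview_as_utf8 CP850 "$cp850_name")
twice=$(preview_as_utf8 CP850 "$once")    # converting an already converted name

echo "converted once:  $once"             # tést.txt
echo "converted twice: $twice"            # garbled, no longer tést.txt
```

convmv itself offers a similar safety net: when run without --notest it only prints what it would rename, so the full-share conversion can be previewed before any filename is changed.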
Posted: 25 Feb, 2014 by Marais D.
Updated: 25 Feb, 2014 by Marais D.