Troubleshooting steps for an Xsan volume acting as an AFP Bridgehead:
1. Fragmentation – Run the SNFS defrag utilities per the article on Xsanity that I referenced earlier. This will most likely give the biggest bang for the buck in terms of troubleshooting time.
2. DNS – Rule out DNS by having all users connect by IP address. This is probably not a DNS issue, but we need to be sure.
3. Number of files and size of files. Try to limit each folder to 100 files for now, just to see if there is a difference between 1000 and 100 files (and keep in mind that subfolders count toward the total).
4. 3rd party indexing applications. Try to temporarily not use any 3rd party indexing applications.
5. Backups during the day. Try to verify that Atempo is not running during the heavy utilization of the SAN (during the day).
6. Encryption. Do not use AFP over SSH (Secure AFP).
7. Switching. Review the switching infrastructure and disable all features that could be limiting bandwidth.
8. DAS. Test using a little Direct Attached Storage where possible to verify that issues are definitely related to resharing of the SAN as opposed to using DAS.
9. AFP Tuning. Consider enabling Jumbo frames. This likely will not net a performance gain but it’s always worth a shot.
10. Network Home Folders. If you’re trying to run any network home folders off the SAN, try disabling this for the initial rollout.
11. Wiring. Verify that all wiring is clean Cat 5e or Cat6 cables. I realize this is kinda’ stupid considering you were using all new patch cables that I pulled out of the bags, but please just look through them and make sure they’re good.
12. Infrastructure. From a switching perspective make sure that there aren’t any bottlenecks along the way where one switch feeds another over a 100Mb link or a single gigabit cable stacking the two. If you need to stack, use a real stacking cable (typically giving a 10Gb backplane link between switches).
13. LUNs. Make sure you have enough LUNs to provide the bandwidth. I believe we’re at 2GB per Volume, so you should be good here, but just wanted to mention that.
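For step 3, a quick way to see which folders are over the limit is to walk the share and count entries. This is only a rough sketch: the function name and the choice to count subfolders along with files are my assumptions, not anything Xsan or AFP provides.

```python
import os

def folders_over_limit(root, limit=100):
    """Return (path, entry_count) for each folder under root with more than limit entries."""
    crowded = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: subfolders count toward the total, so both the files
        # and the directories sitting directly in this folder are counted.
        entries = len(filenames) + len(dirnames)
        if entries > limit:
            crowded.append((dirpath, entries))
    # Most crowded folders first.
    crowded.sort(key=lambda item: item[1], reverse=True)
    return crowded
```

Point it at the mounted volume (for example folders_over_limit("/Volumes/YourSanVolume"), where the path is a placeholder) and start by splitting up whatever lands at the top of the list.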
These steps should at a minimum help us narrow down what issues you are running into. You can also use the debugger in Xsan to get very verbose logs. With these logs we might be able to find some more issues, but make sure to disable this feature shortly after enabling it, as it will fill up the boot volume of the machine running it.
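While the verbose logging is on, it may be worth keeping an eye on free space on the boot volume. Here is a minimal sketch of that check; the function name and the 10 GB threshold are arbitrary assumptions, so pick whatever margin makes sense for that machine.

```python
import shutil

def free_space_ok(path="/", min_free_gb=10.0):
    """Return (ok, free_gb); ok is False once free space on path drops below min_free_gb."""
    usage = shutil.disk_usage(path)      # total, used and free, in bytes
    free_gb = usage.free / (1024 ** 3)   # convert bytes to GB
    return free_gb >= min_free_gb, free_gb
```

Run it every few minutes while debugging is enabled, and turn the debug logging off as soon as the first value comes back False.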
Also, Kerberos and LDAP issues are likely not going to net any bang for the buck in terms of troubleshooting. Can you mount the volume for clients? Yes, which likely rules out any OD issues. Just an FYI to help conserve valuable time in isolating your bandwidth issues.
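To put rough numbers on the bandwidth side when comparing the reshared SAN with DAS (step 8), something like the following could be run against each mount point. It is a sketch under assumptions: the function name, scratch file name and sizes are made up for illustration, and a single sequential write is only a crude stand-in for real AFP traffic.

```python
import os
import time

def write_throughput_mb_s(directory, size_mb=256, block_kb=1024):
    """Write a scratch file of about size_mb MB into directory and return the rate in MB/s."""
    path = os.path.join(directory, "throughput_test.tmp")  # hypothetical scratch file
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Ask the OS to push its buffers out to the storage, so the
            # timing is not just measuring the local cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        if os.path.exists(path):
            os.remove(path)
    return (blocks * block_kb / 1024) / elapsed
```

Run it once against the AFP mount of the SAN and once against the DAS volume from the same client. If the two numbers are close, resharing the SAN is probably not the bottleneck.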
We have seen fragmentation cause this a few times, and a defrag may resolve the issue. If so, the fragmentation will recur, and when it does you will need to defrag again. Due to the side effects of a defrag, you will likely find that you eventually need to rebuild the volume from scratch to clear up the orphaned inodes caused by the defrag process.